The book 'Reinforcement Learning: An Introduction' by Sutton and Barto is the standard textbook for introductory courses on reinforcement learning. Besides concrete algorithms and extensive examples, the book contains several fundamental results on Markov decision processes (MDPs) and Bellman equations in Chapters 3 and 4. Unfortunately, some proofs are missing, some theorems lack a precise formulation, and for some results the line of argument is quite garbled.
In this note we provide all missing proofs, give precise formulations of the theorems, and untangle the line of argument. Further, we avoid using random variables and their expected values. Since we (like Sutton/Barto) restrict our attention to finite MDPs, all expected values can be written out explicitly, which avoids overloaded notation and murky conclusions.
This article bridges the gap between introductory literature like Sutton/Barto and research literature. The latter contains exact formulations and proofs of the relevant results but is less accessible to beginners because of its greater generality and complexity.