An explanation of the mathematics behind backward propagation.
How to model an RL problem: Markov Decision Processes
How to solve an RL problem: Dynamic Programming