L16A Introduction
L16B Markov Decision Process (MDP)
L16C Value Iteration
L16D Policy Iteration
L16E Reinforcement Learning
L16F Model-Free RL based on MC Estimation
L16G Temporal Difference Learning SARSA
L16H Exploration Strategies
L16I Q-Learning
L16J SARSA vs. Q-Learning