Reinforcement Learning Crash Course
Reinforcement Learning Basics with OpenAI Gym
Welcome to Chapter 1! (3:39)
OpenAI Gym Installation (12:43)
Jupyter Installation (9:32)
Setting Up an RL Problem (12:54)
Exercise: Set Up the MountainCar-v0 Environment
The Agent and Its Environment (15:37)
Exercise: Investigate Observations in the MountainCar-v0 Environment
Actions (18:21)
Exercise: Actions in the MountainCar Environment
Rewards (19:09)
Exercise: Investigate Rewards in the MountainCar Environment
Goals and Corresponding Reward Functions (16:03)
Episodes (23:55)
Exercise: What Are the Terminal States in MountainCar?
Exercise: Calculate Average Total Reward Per Episode
See You in Chapter 2! (3:29)
How to Maximize Rewards
Markov Decision Process (22:45)
RL vs. Other Forms of ML (Supervised/Unsupervised Learning) (11:38)
Policy (20:34)
Exercise: Implement the Sampling Function for the Epsilon Pole Direction Policy
Model-Based vs. Model-Free Learning (6:57)
Modifying Gym Environments with Wrappers (56:40)
Exercise: Modify the CartPole-v0 Environment to Return Rounded Observations
Value Function (20:50)
Calculating Value Function Samples (25:56)
Exercise: Calculate Value Samples for Pole Direction Policy
Discounted Reward Sum (34:33)
Exercise: Calculate Value Samples Using Discounted Reward Sum
Action Value Function (20:35)
Calculating Q Values (28:06)
Exercise: Calculate Average Values of States Over Many Episodes
Bellman Expectation Equation for the Value Function (21:09)
Exercise: Verify the Bellman Expectation Equation for the Q-Value Function
Comparing Policies (14:22)
Policy Improvement (25:22)
Greedy Policy Improvement in CartPole-v0 (22:51)
Exercise: Implement Greedy Policy Sampling Function with Random Tie Breaking
Exercise: Implementing Sampling Functions for Any Environment with Discrete Actions
Optimal Policy (4:46)
Exploration vs. Exploitation (20:27)
Exercise: Plot Growth of State-Action Pairs in Exploration Mode
Epsilon Greedy Policy (15:40)
Iterative Epsilon Greedy Policy Improvement (19:06)
Exercise: Implement an Exponential Schedule for Epsilon
Important Announcement