Reinforcement Learning Crash Course
Reinforcement Learning Basics with OpenAI Gym (available)
Welcome to Chapter 1! (3:39)
OpenAI Gym Installation (12:43)
Jupyter Installation (9:32)
Setting Up a RL Problem (12:54)
Exercise: Set Up the MountainCar-v0 Environment
The Agent and Its Environment (15:37)
Exercise: Investigate Observations in the MountainCar-v0 Environment
Actions (18:21)
Exercise: Actions in the MountainCar Environment
Rewards (19:09)
Exercise: Investigate Rewards in the MountainCar Environment
Goals and Corresponding Reward Functions (16:03)
Episodes (23:55)
Exercise: What are the Terminal States in MountainCar?
Exercise: Calculate Average Total Reward Per Episode
See You in Chapter 2! (3:29)
How to Maximize Rewards (available)
Markov Decision Process (22:45)
RL vs. Other Forms of ML (Supervised/Unsupervised Learning) (11:38)
Policy (20:34)
Exercise: Implement the Sampling Function for the Epsilon Pole Direction Policy
Model Based vs. Model Free Learning (6:57)
Modifying Gym Environments With Wrappers (56:40)
Exercise: Modify the CartPole-v0 Environment to Return Rounded Observations
Value Function (20:50)
Calculating Value Function Samples (25:56)
Exercise: Calculate Value Samples for Pole Direction Policy
Discounted Reward Sum (34:33)
Exercise: Calculate Value Samples Using Discounted Reward Sum
Action Value Function (20:35)
Calculating Q Values (28:06)
Exercise: Calculate Average Values of States Over Many Episodes
Bellman Expectation Equation for the Value Function (21:09)
Exercise: Verify the Bellman Expectation Equation for the Q-Value Function
Comparing Policies (14:22)
Policy Improvement (25:22)
Greedy Policy Improvement in CartPole-v0 (22:51)
Exercise: Implement Greedy Policy Sampling Function with Random Tie Breaking
Exercise: Implementing Sampling Functions for any Environment with Discrete Actions
Optimal Policy (4:46)
Exploration vs. Exploitation (20:27)
Exercise: Plot Growth of State-Action Pairs in Exploration Mode
Epsilon Greedy Policy (15:40)
Iterative Epsilon Greedy Policy Improvement (19:06)
Exercise: Implement an Exponential Schedule for Epsilon
Important Announcement
Comparing Policies