Study Circle in Reinforcement Learning
NEWS: The project presentation on Friday May 3 has been postponed to Friday May 24, 10:15 - 12:00. One reason is the lack of finished projects to present; another is that Karl-Erik is away that day.
This is a graduate/PhD course on Reinforcement Learning (RL) given in study circle form, i.e., the participants do most of the work.
We will mainly follow the Reinforcement Learning Course given by David Silver at UCL.
We will have course meetings once per week. Before the meeting the course participants should have gone through the lecture slides and watched the corresponding lecture video.
The UCL course follows the standard textbook on RL quite closely:
- Richard S. Sutton and Andrew G. Barto: "Reinforcement Learning: An Introduction", The MIT Press
Most of the algorithms are available in Python in the following repos:
A new version of the course that combines Advanced NN and Tensorflow with RL can be found here
Neural-MMO - Multi-agent Reinforcement Learning Environment from OpenAI
DeepMind's Python tools for connecting to, training RL agents in, and playing StarCraft 2
The background to the name Dynamic Programming is explained in Richard Bellman's acceptance speech for the Norbert Wiener Prize
Course responsible: Karl-Erik Årzén
Meetings (the default meeting room is the Seminar Room at Dept of Automatic Control, 2nd floor):
- Meeting 1: January 25, 13:00 - 15:00. Introduction. Before the meeting each participant should have gone through Lecture 1 in the UCL course. Notes from Meeting 1.
- Before Meeting 2: Watch Lecture 2 and work through the OpenAI Gym Tutorial from dennybritz
- Meeting 2: Friday February 1, 13:15 - 15:00 Markov Decision Processes
- Before Meeting 3: Watch Lecture 3 and do the following exercises from dennybritz (a minimal sketch of the core update follows the list)
- Implement Policy Evaluation in Python
- Implement Policy Iteration in Python
- Implement Value Iteration in Python
- Implement Gambler's Problem
- GridWorld example that works https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_dp.html
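The dennybritz notebooks describe the environment model as a dictionary P[s][a] of (probability, next_state, reward, done) tuples. As a reference point, below is a minimal sketch of iterative policy evaluation on a tiny four-state random-walk MDP defined inline; the MDP and all names are made up for illustration and are not one of the course environments. Policy iteration and value iteration reuse the same one-step lookahead.

    import numpy as np

    n_states, gamma, theta = 4, 1.0, 1e-8
    # P[s][a] is a list of (probability, next_state, reward, done) tuples;
    # states 0 and 3 are terminal and self-loop with zero reward.
    P = {
        0: {0: [(1.0, 0, 0.0, True)],   1: [(1.0, 0, 0.0, True)]},
        1: {0: [(1.0, 0, -1.0, True)],  1: [(1.0, 2, -1.0, False)]},
        2: {0: [(1.0, 1, -1.0, False)], 1: [(1.0, 3, -1.0, True)]},
        3: {0: [(1.0, 3, 0.0, True)],   1: [(1.0, 3, 0.0, True)]},
    }
    policy = np.full((n_states, 2), 0.5)   # equiprobable random policy

    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # One-step lookahead: expected reward plus discounted value of the successor.
            v = sum(policy[s, a] * prob * (r + gamma * V[s2])
                    for a in P[s] for prob, s2, r, _ in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            break
    print(V)   # state values under the random policy

Under the random policy this sketch converges to V = [0, -2, -2, 0].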
- Meeting 3: Monday February 11, 13:15 - 15:00 Planning by Dynamic Programming
- Before Meeting 4: Watch Lecture 4 and do the following exercises from dennybritz (a minimal sketch follows the list)
- Get familiar with the Blackjack environment (Blackjack-v0)
- Implement the Monte Carlo Prediction to estimate state-action values
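Below is a minimal sketch of first-visit Monte Carlo prediction, assuming the classic Gym interface of the time (reset() returns the observation, step() returns (obs, reward, done, info)) and the Blackjack-v0 environment; the fixed stick-on-20 policy and all helper names are illustrative, not the dennybritz solution. The state-action version asked for in the exercise follows the same pattern with (state, action) keys.

    from collections import defaultdict
    import gym

    def sample_policy(obs):
        # Stick (0) on 20 or 21, otherwise hit (1) -- the fixed policy from Sutton & Barto Ch. 5.
        player_sum, dealer_card, usable_ace = obs
        return 0 if player_sum >= 20 else 1

    def mc_prediction(env, policy, num_episodes=50000, gamma=1.0):
        returns_sum = defaultdict(float)
        returns_count = defaultdict(int)
        V = defaultdict(float)
        for _ in range(num_episodes):
            episode, obs, done = [], env.reset(), False
            while not done:
                action = policy(obs)
                next_obs, reward, done, _ = env.step(action)
                episode.append((obs, reward))
                obs = next_obs
            # Walk the episode backwards; the last write for a state is its first visit.
            G, first_return = 0.0, {}
            for s, r in reversed(episode):
                G = gamma * G + r
                first_return[s] = G
            for s, G_s in first_return.items():
                returns_sum[s] += G_s
                returns_count[s] += 1
                V[s] = returns_sum[s] / returns_count[s]
        return V

    V = mc_prediction(gym.make("Blackjack-v0"), sample_policy)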
- Meeting 4: Monday February 18, 13:15 - 15:00 Model-Free Prediction
- Before Meeting 5: Watch Lecture 5 and do the following exercises from dennybritz (a minimal sketch follows the list)
- Implement the on-policy first-visit Monte Carlo Control algorithm
- Implement the off-policy every-visit Monte Carlo Control algorithm using Weighted Importance Sampling
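A minimal sketch of the on-policy first-visit Monte Carlo control exercise with an epsilon-greedy policy, again assuming the classic Gym Blackjack interface; names and hyperparameters are illustrative, not the dennybritz solution. The off-policy variant additionally keeps a cumulative importance weight per (state, action) pair.

    from collections import defaultdict
    import numpy as np
    import gym

    def mc_control_epsilon_greedy(env, num_episodes=100000, gamma=1.0, epsilon=0.1):
        nA = env.action_space.n
        Q = defaultdict(lambda: np.zeros(nA))
        returns_count = defaultdict(int)

        def epsilon_greedy(obs):
            probs = np.full(nA, epsilon / nA)
            probs[np.argmax(Q[obs])] += 1.0 - epsilon
            return int(np.random.choice(nA, p=probs))

        for _ in range(num_episodes):
            episode, obs, done = [], env.reset(), False
            while not done:
                action = epsilon_greedy(obs)
                next_obs, reward, done, _ = env.step(action)
                episode.append((obs, action, reward))
                obs = next_obs
            # First-visit returns per (state, action): the earliest visit wins.
            G, first_return = 0.0, {}
            for s, a, r in reversed(episode):
                G = gamma * G + r
                first_return[(s, a)] = G
            for (s, a), G_sa in first_return.items():
                returns_count[(s, a)] += 1
                # Incremental mean update of Q towards the sampled return.
                Q[s][a] += (G_sa - Q[s][a]) / returns_count[(s, a)]
        return Q

    Q = mc_control_epsilon_greedy(gym.make("Blackjack-v0"))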
- Meeting 5: Friday February 22, 10:15 - 12:00 Model-Free Control (Note: In Lab F, First floor, M-building)
- Before Meeting 6: Watch Lecture 6 and do the following exercises from dennybritz (a minimal sketch follows the list)
- Get familiar with the Windy Gridworld Playground
- Implement SARSA
- Implement Q-Learning in Python (or some other language)
- Get familiar with the Mountain Car Playground
- Solve Mountain Car Problem using Q-Learning with Linear Function Approximation
- TD(Lambda) example in Julia from Fredrik Bagge Carlsson
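A minimal sketch of tabular Q-learning, assuming a discrete-state environment with the classic Gym API (for example the Windy Gridworld playground above); hyperparameters are illustrative. SARSA differs only in bootstrapping from the action actually taken in the next state rather than the greedy one, and the Mountain Car exercise replaces the table with a linear function approximator over state features.

    from collections import defaultdict
    import numpy as np

    def q_learning(env, num_episodes=500, gamma=1.0, alpha=0.5, epsilon=0.1):
        nA = env.action_space.n
        Q = defaultdict(lambda: np.zeros(nA))
        for _ in range(num_episodes):
            state, done = env.reset(), False
            while not done:
                # Epsilon-greedy behaviour policy.
                if np.random.rand() < epsilon:
                    action = env.action_space.sample()
                else:
                    action = int(np.argmax(Q[state]))
                next_state, reward, done, _ = env.step(action)
                # Off-policy TD target: greedy value of the next state, zero at termination.
                td_target = reward + gamma * np.max(Q[next_state]) * (not done)
                Q[state][action] += alpha * (td_target - Q[state][action])
                state = next_state
        return Q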
- Meeting 6: Friday March 1, 10:15 - 12:00 Value Function Approximation
- Before Meeting 7: Watch Lecture 7 and study the following exercises (a minimal sketch follows the list)
- Cliffwalk REINFORCE
- Cliffwalk Actor-Critic
- Mountain Car Actor Critic
- REINFORCE example in Julia from Fredrik Bagge Carlsson
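A minimal sketch of REINFORCE (Monte Carlo policy gradient) with a tabular softmax policy, assuming a small discrete environment such as the Cliff Walking playground and the classic Gym API; the episode-length cap and learning rate are illustrative, and the notebook solutions instead use a parameterised policy network.

    import numpy as np

    def reinforce(env, num_episodes=2000, gamma=0.99, alpha=0.01, max_steps=1000):
        nS, nA = env.observation_space.n, env.action_space.n
        theta = np.zeros((nS, nA))           # policy parameters, one row per state

        def policy_probs(s):
            z = theta[s] - np.max(theta[s])  # softmax with numerical stabilisation
            e = np.exp(z)
            return e / e.sum()

        for _ in range(num_episodes):
            episode, s = [], env.reset()
            for _ in range(max_steps):       # cap episode length for the early, near-random policy
                a = int(np.random.choice(nA, p=policy_probs(s)))
                s2, r, done, _ = env.step(a)
                episode.append((s, a, r))
                s = s2
                if done:
                    break
            # Monte Carlo policy gradient: push up log-probability of actions weighted by the return.
            G = 0.0
            for s_t, a_t, r_t in reversed(episode):
                G = gamma * G + r_t
                probs = policy_probs(s_t)
                grad_log = -probs
                grad_log[a_t] += 1.0         # gradient of log softmax w.r.t. theta[s_t]
                theta[s_t] += alpha * G * grad_log
        return theta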
- Meeting 7: Friday March 8, 10:15 - 12:00 Policy Gradient Methods
- Before Meeting 8: Watch Lecture 8 and study the following exercises (note that these are based on Lecture 6; there are no new exercises for Lecture 8). A minimal sketch follows the list.
- Deep-Q Learning for Atari Games
- Double-Q Learning
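A minimal sketch of tabular Double Q-learning, corresponding to the second exercise above: two value tables are updated on alternating coin flips, so that one table selects the greedy action and the other evaluates it. A Deep Q-Network replaces the tables with neural networks and adds experience replay, which is too long to sketch here. The sketch assumes a discrete environment with the classic Gym API; hyperparameters are illustrative.

    from collections import defaultdict
    import numpy as np

    def double_q_learning(env, num_episodes=500, gamma=1.0, alpha=0.5, epsilon=0.1):
        nA = env.action_space.n
        Q1 = defaultdict(lambda: np.zeros(nA))
        Q2 = defaultdict(lambda: np.zeros(nA))
        for _ in range(num_episodes):
            s, done = env.reset(), False
            while not done:
                # Act epsilon-greedily on the sum of the two estimates.
                if np.random.rand() < epsilon:
                    a = env.action_space.sample()
                else:
                    a = int(np.argmax(Q1[s] + Q2[s]))
                s2, r, done, _ = env.step(a)
                if np.random.rand() < 0.5:
                    # Q1 selects the argmax, Q2 evaluates it.
                    a_star = int(np.argmax(Q1[s2]))
                    target = r + gamma * Q2[s2][a_star] * (not done)
                    Q1[s][a] += alpha * (target - Q1[s][a])
                else:
                    # Q2 selects the argmax, Q1 evaluates it.
                    a_star = int(np.argmax(Q2[s2]))
                    target = r + gamma * Q1[s2][a_star] * (not done)
                    Q2[s][a] += alpha * (target - Q2[s][a])
                s = s2
        return Q1, Q2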
- Meeting 8: Friday March 15, 10:15 - 12:00 Integrating Learning and Planning
- Before Meeting 9: Watch Lecture 9
- Meeting 9: Friday March 22, 10:15 - 12:00 Exploration and Exploitation
- Before Meeting 10: Watch Lecture 10
- Select a project for the extended version of the course, see below.
- Meeting 10: Friday March 29, 10:15 - 12:00 Case Study: RL in Classic Games
Paper about Mastering the game of Go
Paper about DeepStack playing poker
Possible Projects for the Extended Version of the Course
Two types of projects:
- Programming projects, e.g.,
- Participate in one of DeepMind's or OpenAI's web competitions
- Implement RL on your own research topic
- Advanced Topic projects
- Study some advanced and/or less widely covered topic in RL and present it at a lecture, e.g.,
- Connections between RL and control (B. Recht: "A Tour of Reinforcement Learning: The View from Continuous Control")
- RL with continuous action and state spaces
- Dual Control and how it connects to RL
- RL for Robotics
- Connections between adaptive control and RL
- The Stanford Helicopter RL Case Study
- Advanced Research Topics from the new version of the UCL course (RL8 Lecture + video gives a set of topics)
- Your own project