Study Circle in Reinforcement Learning
NEWS: The project presentation on Friday May 3 has been postponed to Friday May 24, 10:15 - 12:00. One reason is the lack of finished projects to present; another is that Karl-Erik is away that day.
This is a graduate/PhD course on Reinforcement Learning (RL) given in study circle form, i.e., the participants do most of the work.
We will mainly follow the Reinforcement Learning Course given by David Silver at UCL.
We will have course meetings once per week. Before the meeting the course participants should have gone through the lecture slides and watched the corresponding lecture video.
The UCL course follows the standard textbook on RL quite closely:
- Richard S. Sutton and Andrew G. Barto: "Reinforcement Learning: An Introduction", The MIT Press
Most of the algorithms are available in Python in the following repos:
A new version of the course that combines Advanced NN and Tensorflow with RL can be found here
Neural-MMO - Multi-agent Reinforcement Learning Environment from OpenAI
DeepMind's Python tools for connecting to, training RL agents in, and playing StarCraft 2
The background to the name Dynamic Programming is explained in Richard Bellman's acceptance speech for the Norbert Wiener Prize
Course responsible: Karl-Erik Årzén
Meetings (the default meeting room is the Seminar Room at Dept of Automatic Control, 2nd floor):
- Meeting 1: January 25, 13:00 - 15:00. Introduction. Before the meeting each participant should have gone through Lecture 1 in the UCL course. Notes from Meeting 1.
- Before Meeting 2: Watch Lecture 2 and work through the OpenAI Gym Tutorial from dennybritz
- Meeting 2: Friday February 1, 13:15 - 15:00 Markov Decision Processes
- Before Meeting 3: Watch Lecture 3 and do the following exercises from dennybritz (a minimal sketch of the core update follows the list)
- Implement Policy Evaluation in Python
- Implement Policy Iteration in Python
- Implement Value Iteration in Python
- Implement Gambler's Problem
- GridWorld example that works https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_dp.html
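The dennybritz notebooks describe the environment model as a dictionary P[s][a] of (probability, next_state, reward, done) tuples. As a reference point, below is a minimal sketch of iterative policy evaluation on a tiny four-state random-walk MDP defined inline; the MDP and all names are made up for illustration and are not one of the course environments. Policy iteration and value iteration reuse the same one-step lookahead.

    import numpy as np

    n_states, gamma, theta = 4, 1.0, 1e-8
    # P[s][a] is a list of (probability, next_state, reward, done) tuples;
    # states 0 and 3 are terminal and self-loop with zero reward.
    P = {
        0: {0: [(1.0, 0, 0.0, True)],   1: [(1.0, 0, 0.0, True)]},
        1: {0: [(1.0, 0, -1.0, True)],  1: [(1.0, 2, -1.0, False)]},
        2: {0: [(1.0, 1, -1.0, False)], 1: [(1.0, 3, -1.0, True)]},
        3: {0: [(1.0, 3, 0.0, True)],   1: [(1.0, 3, 0.0, True)]},
    }
    policy = np.full((n_states, 2), 0.5)   # equiprobable random policy

    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # One-step lookahead: expected reward plus discounted value of the successor.
            v = sum(policy[s, a] * prob * (r + gamma * V[s2])
                    for a in P[s] for prob, s2, r, _ in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            break
    print(V)   # state values under the random policy

Under the random policy this sketch converges to V = [0, -2, -2, 0].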
- Meeting 3: Monday February 11, 13:15 - 15:00 Planning by Dynamic Programming
- Before Meeting 4: Watch Lecture 4 and do the following exercises from dennybritz (a minimal sketch follows the list)
- Get familiar with the Blackjack environment (Blackjack-v0)
- Implement the Monte Carlo Prediction to estimate state-action values
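Below is a minimal sketch of first-visit Monte Carlo prediction, assuming the classic Gym interface of the time (reset() returns the observation, step() returns (obs, reward, done, info)) and the Blackjack-v0 environment; the fixed stick-on-20 policy and all helper names are illustrative, not the dennybritz solution. The state-action version asked for in the exercise follows the same pattern with (state, action) keys.

    from collections import defaultdict
    import gym

    def sample_policy(obs):
        # Stick (0) on 20 or 21, otherwise hit (1) -- the fixed policy from Sutton & Barto Ch. 5.
        player_sum, dealer_card, usable_ace = obs
        return 0 if player_sum >= 20 else 1

    def mc_prediction(env, policy, num_episodes=50000, gamma=1.0):
        returns_sum = defaultdict(float)
        returns_count = defaultdict(int)
        V = defaultdict(float)
        for _ in range(num_episodes):
            episode, obs, done = [], env.reset(), False
            while not done:
                action = policy(obs)
                next_obs, reward, done, _ = env.step(action)
                episode.append((obs, reward))
                obs = next_obs
            # Walk the episode backwards; the last write for a state is its first visit.
            G, first_return = 0.0, {}
            for s, r in reversed(episode):
                G = gamma * G + r
                first_return[s] = G
            for s, G_s in first_return.items():
                returns_sum[s] += G_s
                returns_count[s] += 1
                V[s] = returns_sum[s] / returns_count[s]
        return V

    V = mc_prediction(gym.make("Blackjack-v0"), sample_policy)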
- Meeting 4: Monday February 18, 13:15 - 15:00 Model-Free Prediction
- Before Meeting 5: Watch Lecture 5 and do the following exercises from dennybritz (a minimal sketch follows the list)
- Implement the on-policy first-visit Monte Carlo Control algorithm
- Implement the off-policy every-visit Monte Carlo Control algorithm using Weighted Importance Sampling
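A minimal sketch of the on-policy first-visit Monte Carlo control exercise with an epsilon-greedy policy, again assuming the classic Gym Blackjack interface; names and hyperparameters are illustrative, not the dennybritz solution. The off-policy variant additionally keeps a cumulative importance weight per (state, action) pair.

    from collections import defaultdict
    import numpy as np
    import gym

    def mc_control_epsilon_greedy(env, num_episodes=100000, gamma=1.0, epsilon=0.1):
        nA = env.action_space.n
        Q = defaultdict(lambda: np.zeros(nA))
        returns_count = defaultdict(int)

        def epsilon_greedy(obs):
            probs = np.full(nA, epsilon / nA)
            probs[np.argmax(Q[obs])] += 1.0 - epsilon
            return int(np.random.choice(nA, p=probs))

        for _ in range(num_episodes):
            episode, obs, done = [], env.reset(), False
            while not done:
                action = epsilon_greedy(obs)
                next_obs, reward, done, _ = env.step(action)
                episode.append((obs, action, reward))
                obs = next_obs
            # First-visit returns per (state, action): the earliest visit wins.
            G, first_return = 0.0, {}
            for s, a, r in reversed(episode):
                G = gamma * G + r
                first_return[(s, a)] = G
            for (s, a), G_sa in first_return.items():
                returns_count[(s, a)] += 1
                # Incremental mean update of Q towards the sampled return.
                Q[s][a] += (G_sa - Q[s][a]) / returns_count[(s, a)]
        return Q

    Q = mc_control_epsilon_greedy(gym.make("Blackjack-v0"))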
- Meeting 5: Friday February 22, 10:15 - 12:00 Model-Free Control (Note: In Lab F, First floor, M-building)
- Before Meeting 6: Watch Lecture 6 and do the following exercises from dennybritz (a minimal sketch follows the list)
- Get familiar with the Windy Gridworld Playground
- Implement SARSA
- Implement Q-Learning in Python (or some other language)
- Get familiar with the Mountain Car Playground
- Solve Mountain Car Problem using Q-Learning with Linear Function Approximation
- TD(Lambda) example in Julia from Fredrik Bagge Carlsson
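A minimal sketch of tabular Q-learning, assuming a discrete-state environment with the classic Gym API (for example the Windy Gridworld playground above); hyperparameters are illustrative. SARSA differs only in bootstrapping from the action actually taken in the next state rather than the greedy one, and the Mountain Car exercise replaces the table with a linear function approximator over state features.

    from collections import defaultdict
    import numpy as np

    def q_learning(env, num_episodes=500, gamma=1.0, alpha=0.5, epsilon=0.1):
        nA = env.action_space.n
        Q = defaultdict(lambda: np.zeros(nA))
        for _ in range(num_episodes):
            state, done = env.reset(), False
            while not done:
                # Epsilon-greedy behaviour policy.
                if np.random.rand() < epsilon:
                    action = env.action_space.sample()
                else:
                    action = int(np.argmax(Q[state]))
                next_state, reward, done, _ = env.step(action)
                # Off-policy TD target: greedy value of the next state, zero at termination.
                td_target = reward + gamma * np.max(Q[next_state]) * (not done)
                Q[state][action] += alpha * (td_target - Q[state][action])
                state = next_state
        return Q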
- Meeting 6: Friday March 1, 10:15 - 12:00 Value Function Approximation
- Before Meeting 7: Watch Lecture 7 and study the following exercises (a minimal sketch follows the list)
- Cliffwalk REINFORCE
- Cliffwalk Actor-Critic
- Mountain Car Actor Critic
- REINFORCE example in Julia from Fredrik Bagge Carlsson
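A minimal sketch of REINFORCE (Monte Carlo policy gradient) with a tabular softmax policy, assuming a small discrete environment such as the Cliff Walking playground and the classic Gym API; the episode-length cap and learning rate are illustrative, and the notebook solutions instead use a parameterised policy network.

    import numpy as np

    def reinforce(env, num_episodes=2000, gamma=0.99, alpha=0.01, max_steps=1000):
        nS, nA = env.observation_space.n, env.action_space.n
        theta = np.zeros((nS, nA))           # policy parameters, one row per state

        def policy_probs(s):
            z = theta[s] - np.max(theta[s])  # softmax with numerical stabilisation
            e = np.exp(z)
            return e / e.sum()

        for _ in range(num_episodes):
            episode, s = [], env.reset()
            for _ in range(max_steps):       # cap episode length for the early, near-random policy
                a = int(np.random.choice(nA, p=policy_probs(s)))
                s2, r, done, _ = env.step(a)
                episode.append((s, a, r))
                s = s2
                if done:
                    break
            # Monte Carlo policy gradient: push up log-probability of actions weighted by the return.
            G = 0.0
            for s_t, a_t, r_t in reversed(episode):
                G = gamma * G + r_t
                probs = policy_probs(s_t)
                grad_log = -probs
                grad_log[a_t] += 1.0         # gradient of log softmax w.r.t. theta[s_t]
                theta[s_t] += alpha * G * grad_log
        return theta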
- Meeting 7: Friday March 8, 10:15 - 12:00 Policy Gradient Methods
- Before Meeting 8: Watch Lecture 8 and study the following exercises (note that these are based on Lecture 6; there are no new exercises for Lecture 8). A minimal sketch follows the list.
- Deep-Q Learning for Atari Games
- Double-Q Learning
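A minimal sketch of tabular Double Q-learning, corresponding to the second exercise above: two value tables are updated on alternating coin flips, so that one table selects the greedy action and the other evaluates it. A Deep Q-Network replaces the tables with neural networks and adds experience replay, which is too long to sketch here. The sketch assumes a discrete environment with the classic Gym API; hyperparameters are illustrative.

    from collections import defaultdict
    import numpy as np

    def double_q_learning(env, num_episodes=500, gamma=1.0, alpha=0.5, epsilon=0.1):
        nA = env.action_space.n
        Q1 = defaultdict(lambda: np.zeros(nA))
        Q2 = defaultdict(lambda: np.zeros(nA))
        for _ in range(num_episodes):
            s, done = env.reset(), False
            while not done:
                # Act epsilon-greedily on the sum of the two estimates.
                if np.random.rand() < epsilon:
                    a = env.action_space.sample()
                else:
                    a = int(np.argmax(Q1[s] + Q2[s]))
                s2, r, done, _ = env.step(a)
                if np.random.rand() < 0.5:
                    # Q1 selects the argmax, Q2 evaluates it.
                    a_star = int(np.argmax(Q1[s2]))
                    target = r + gamma * Q2[s2][a_star] * (not done)
                    Q1[s][a] += alpha * (target - Q1[s][a])
                else:
                    # Q2 selects the argmax, Q1 evaluates it.
                    a_star = int(np.argmax(Q2[s2]))
                    target = r + gamma * Q1[s2][a_star] * (not done)
                    Q2[s][a] += alpha * (target - Q2[s][a])
                s = s2
        return Q1, Q2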
- Meeting 8: Friday March 15, 10:15 - 12:00 Integrating Learning and Planning
- Before Meeting 9: Watch Lecture 9
- Meeting 9: Friday March 22, 10:15 - 12:00 Exploration and Exploitation
- Before Meeting 10: Watch Lecture 10
- Select a project for the extended version of the course, see below.
- Meeting 10: Friday March 29, 10:15 - 12:00 Case Study: RL in Classic Games
Paper about Mastering the game of Go
Paper about DeepStack playing poker
Possible Projects for the Extended Version of the Course
Two types of projects:
- Programming projects, e.g.,
- Participate in one of DeepMind's or OpenAI's web competitions
- Implement RL on your own research topic
- Advanced Topic projects
- Study some advanced and/or less widely covered topic in RL and present it at a lecture, e.g.,
- Connections between RL and control (B. Recht: "A Tour of Reinforcement Learning: The View from Continuous Control")
- RL with continuous action and state spaces
- Dual Control and how it connects to RL
- RL for Robotics
- Connections between adaptive control and RL
- The Stanford Helicopter RL Case Study
- Advanced Research Topics from the new version of the UCL course (RL8 Lecture + video gives a set of topics)
- Your own project