Jivko Sinapov (Instructor)
Time: Tuesdays 3:00-4:00 pm or by appointment
Office: Virtual Halligan (see Canvas link)
Andre Cleaver (Graduate Teaching Assistant)
Time: Th 5:00 - 6:00 pm and F 12:00 - 1:00 pm
Office: see Canvas announcement
Email: firstname.lastname -- AT -- tufts -- DOT -- edu
Class Diary (including links to slides and readings) [top]
All slides are available here.
- 10/20 Planning and Learning
- Chapter 9 of Sutton and Barto
- Knox, W.B., and Stone, P. "Interactively shaping agents via human reinforcement: The TAMER framework.", Proceedings of the 5th ACM International Conference on Knowledge Capture, 2009.
- 10/15 n-Step TD part 2
- 10/13 n-Step TD
- Chapter 8 of Sutton and Barto
- Narvekar et al, "Source task creation for curriculum learning", Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems (AAMAS), 2016.
- 10/8 Temporal Difference Learning II
- 10/6 Temporal Difference Learning
- Chapter 7 of Sutton and Barto
- Griffith et al, "Policy shaping: Integrating human feedback with reinforcement learning", Advances in neural information processing systems (NIPS). 2013.
- 10/1 Dynamic Programming for Solving MDPs II
- 9/29 Dynamic Programming for Solving MDPs
- Chapter 6 of Sutton and Barto
- Taylor et al, "Transfer Learning via Inter-Task Mappings for Temporal Difference Learning", Journal of Machine Learning Research, 8(1):2125-2167, 2007.
- 9/24 MDPs II
- 9/22 MDPs I
- Assigned Reading: Chapters 4 and 5 of Sutton and Barto.
- 9/17 Q-Learning
- 9/15 Multi-Armed Bandits (II)
- Assigned Reading: Chapter 3 of Sutton and Barto.
- 9/10 Multi-Armed Bandits (I)
- Assigned Homework: Homework 1 (see Canvas, due 9/20)
- 9/8 Class Introduction -- What is Reinforcement Learning?
- Assigned Reading: Chapters 1 and 2 of Sutton and Barto.
Final Projects [top]
Information about the course project will appear here.
Team Formation (up to 3 people per team): October 22nd
Project Proposal Writeup due: Monday Nov 4th
Final Project Presentations: TBD
Final Project Report and Deliverables: TBD
Course Overview [top]
"Reinforcement learning problems involve learning what to do --- how to map situations to actions --- so as to maximize a numerical reward signal." - Sutton and Barto ("Reinforcement Learning: An Introduction", course textbook)
This course focuses on agents that must learn, plan, and act in complex, non-deterministic environments. We will cover the main theory and approaches of Reinforcement Learning (RL), along with common software libraries and packages used to implement and test RL algorithms. The course is a graduate seminar with assigned readings and discussions, and its content will be guided in part by the interests of the students. It will cover at least the first several chapters of the course textbook; beyond that, we will move to more advanced and recent readings from the field (e.g., transfer learning and deep RL), with a focus on the practical successes and challenges of reinforcement learning.
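To give a flavor of the kind of algorithm the course covers (tabular Q-learning appears in the 9/17 class), here is a minimal sketch of Q-learning on a toy chain-walk environment. The environment, function name, and parameters are illustrative examples, not course-provided code:

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain: states 0..n_states-1, actions
    0 (left) and 1 (right); reaching the last state yields reward 1 and
    ends the episode."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection (ties broken at random).
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[s])
                a = rng.choice([i for i in range(n_actions) if Q[s][i] == best])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update toward the bootstrapped target
            # r + gamma * max_a' Q(s', a').
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

With these settings the greedy policy learns to move right in every non-terminal state; varying `gamma` or `epsilon` illustrates the discounting and exploration trade-offs discussed in the textbook.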
There will be a programming component to the course in the form of a few short assignments and a final project. Students are expected to be proficient programmers in at least one of the following languages: C++, Java, or Python. Prior coursework (or experience) in Artificial Intelligence and/or Machine Learning is highly recommended, but not required.
Course Requirements [top]
Grades will be based on
- class participation (including attendance) (10%);
- programming assignments (45%);
- reading responses (10%);
- class discussion moderation (10%); and
- a final project (25%).
You are expected to attend each class and actively participate by taking part in discussions and activities, and asking questions. If you anticipate missing a class, let me know as soon as you are able.
Students will be required to complete four short programming assignments of their own choosing. In most cases these will come from the exercises in Sutton and Barto, though other options are possible in consultation with the instructor. These exercises will generally not involve extensive or elaborate programs; the emphasis should be on empirically analyzing various learning algorithms and reporting the results. The reports, along with all relevant code and data, should be submitted on Canvas.
Grades for each programming assignment will be out of 10 points. Some example reports are included below:
- 7 to 7.5: Adequate, but the bare minimum in terms of experiments and analysis (Example)
- 8 to 8.5: Good job, but with some room for improvement (Example)
- 9 to 9.5: Good analysis and good presentation of results (Example)
- 10+: Excellent; does more than what was asked (Example)
Students should post responses to the readings on the Canvas forum. Reading response assignments will be announced in class and will be due before class on the day the assigned reading is discussed. Credit will be based on evidence that you have done the readings carefully. You are also encouraged to read some of your peers' responses. The response should include a summary of the reading along with any of the following:
- Insightful questions;
- Clarification questions about ambiguities;
- Comments about the relation of the reading to previous readings;
- Critiques on the research;
- Critiques on the writing style or clarity;
- Thoughts on what you would like to learn about in more detail;
- Possible extensions or related studies;
- Thoughts on the paper's importance; and
- Summaries of the most important things you learned.
Each student will lead a discussion on one of the readings. The discussion can begin with a brief summary/overview of the important points in the readings, but the assumption should be that everyone has already completed them. The student may either present material related to the readings (perhaps from an outside source) or moderate a class discussion about them. The student must be prepared to keep the conversation flowing. Some tips on leading a good discussion are available here, courtesy of Peter Stone. You are required to email your plan for the discussion, including any slides you intend to show, to the Professor and TA at least two nights prior to your discussion. The sign-up sheet for discussion slots will be posted on Canvas via an announcement. As there are more students than days of class, some days will feature two discussion moderators.
Text and Website [top]
The main reference used throughout the course will be "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto, Second edition, in progress.
In addition, relevant research papers and book chapters will be assigned, with later readings chosen by students according to their project topics.
Software Resources [top]
The Brown-UMBC Reinforcement Learning and Planning java code library: BURLAP
Ms. Pac-Man domain and RL environment in Java: [ZIP]
Simple Q-Network for playing Atari games: [ZIP]. The code is based on Andrej Karpathy's code described here.
MATLAB example: Q-learning for pendulum control: [LINK]
Reinforcement Learning with PyTorch: [LINK]
A collection of RL examples from WILDML: [LINK]
Credits and Similar Courses [top]
This class is heavily inspired by a course on Reinforcement Learning taught at UT Austin by Peter Stone. Feel free to thank him if you enjoy it.
Academic Dishonesty Policy [top]
You are encouraged to form study groups and discuss the reading materials assigned for this class. You are allowed to discuss the reading response assignments with your colleagues. You are also allowed to discuss the programming assignments (e.g., in front of a white board). However, each student will be expected to write their own response and code. Sharing of code is not allowed!
Collaboration is expected for the final projects -- as soon as you can, you will form teams of 2-3 members. If you absolutely insist on working alone, I won't stop you, but you'll be facing a larger workload. For the final project, you're allowed (and expected) to use various open-source libraries, published code, algorithms, datasets, etc. As long as you cite everything you use that was developed by someone else, you'll be fine.
IMPORTANT: Cheating, plagiarism, and other academic misconduct will not be tolerated and will be handled according to Tufts' policy on academic dishonesty. According to that policy, if I find any evidence of dishonesty, I am required to report it.