COMP 150: Reinforcement Learning

Fall 2018

Instructor: Jivko Sinapov
Department of Computer Science

Tuesday, Thursday 1:30 - 2:45 pm
Classroom: Halligan 108

Office hours

Jivko Sinapov (Instructor)

Time: Tuesday 3:30-4:30 pm or by appointment
Office: Halligan 213 (may also be at the lab at Halligan 228)
Email: jivko--DOT--sinapov--AT--tufts--DOT--edu

Srijith Rajeev (Teaching Assistant)

Time: Monday 3:00-4:15 pm, Tuesday noon-1:15 pm, Wednesday 11:00-12:15 pm
Office: Halligan 228 A-B
Email: srijith2311 -- AT -- gmail -- DOT -- com

Class Diary (including links to slides and readings) [top]

All slides are available here.

Final Projects [top]

Projects:

Important Dates:

Team Formation (up to 2 people per team): October 30th

Project Proposal Writeup due: November 6th

Final Project Presentations: Dec 4 and Dec 6

Final Project Report and Deliverables: Dec 18


Course Overview [top]

"Reinforcement learning problems involve learning what to do --- how to map situations to actions --- so as to maximize a numerical reward signal." - Sutton and Barto ("Reinforcement Learning: An Introduction", course textbook)

This course will focus on agents that must learn, plan, and act in complex, non-deterministic environments. We will cover the main theory and approaches of Reinforcement Learning (RL), along with common software libraries and packages used to implement and test RL algorithms. The course is a graduate seminar with assigned readings and discussions, and its content will be guided in part by the interests of the students. It will cover at least the first several chapters of the course textbook; beyond that, we will move to more advanced and recent readings from the field (e.g., transfer learning and deep RL), focusing on the practical successes and challenges of reinforcement learning.
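To make the setting concrete, the sketch below shows the basic agent-environment loop and a tabular Q-learning update on a toy, non-deterministic chain-walk environment. The environment, reward values, and hyperparameters are illustrative assumptions, not part of the course materials.

    import random
    from collections import defaultdict

    # Toy non-deterministic chain environment: states 0..N-1, actions left/right.
    # Reaching the last state ends the episode and pays +1.
    # All names and parameters here are illustrative, not course-provided code.
    N_STATES = 6
    ACTIONS = [0, 1]          # 0 = left, 1 = right
    SLIP_PROB = 0.1           # with this probability the chosen action is flipped

    def step(state, action):
        """Return (next_state, reward, done) for a noisy chain walk."""
        if random.random() < SLIP_PROB:
            action = 1 - action
        next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
        done = next_state == N_STATES - 1
        reward = 1.0 if done else 0.0
        return next_state, reward, done

    # Tabular Q-learning with an epsilon-greedy behavior policy.
    alpha, gamma, epsilon = 0.1, 0.95, 0.1
    Q = defaultdict(float)    # Q[(state, action)] -> estimated return

    for episode in range(500):
        state, done = 0, False
        while not done:
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            best_next = max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state

    # Print the greedy action learned for each state.
    print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})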

There will be a programming component to the course in the form of a few short assignments and a final project. Students are expected to be proficient programmers in at least one of the following languages: C++, Java, or Python. Prior coursework (or experience) in Artificial Intelligence and/or Machine Learning is highly recommended, but not required.

Course Requirements [top]

Grades will be based on


Class Participation

You are expected to attend each class and actively participate by taking part in discussions and activities, and asking questions. If you anticipate missing a class, let me know as soon as you are able.


Programming Assignments

Students will be required to complete four short programming assignments of their own choosing. In most cases these will come from the exercises in Sutton and Barto, though other options are possible upon consultation with the instructor. These exercises will generally not involve extensive or elaborate programs; the emphasis should be on empirically analyzing various learning algorithms and reporting the results. The reports, along with all relevant code and data, should be submitted on Canvas.
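As a rough illustration of the expected flavor (an empirical comparison plus a written analysis), the sketch below compares two epsilon values on a 10-armed testbed in the spirit of the Chapter 2 exercises. The arm count, run count, step count, and epsilon values are assumptions chosen for illustration, not assignment requirements.

    import random
    import statistics

    # 10-armed bandit testbed in the spirit of Sutton & Barto, Chapter 2.
    # The number of arms, runs, steps, and epsilon values below are illustrative.
    N_ARMS, N_RUNS, N_STEPS = 10, 200, 1000

    def run_bandit(epsilon):
        """Average reward over the last 100 steps of one epsilon-greedy run."""
        true_means = [random.gauss(0.0, 1.0) for _ in range(N_ARMS)]
        estimates = [0.0] * N_ARMS
        counts = [0] * N_ARMS
        rewards = []
        for _ in range(N_STEPS):
            if random.random() < epsilon:
                arm = random.randrange(N_ARMS)
            else:
                arm = max(range(N_ARMS), key=lambda a: estimates[a])
            reward = random.gauss(true_means[arm], 1.0)
            counts[arm] += 1
            # Incremental sample-average update of the action-value estimate.
            estimates[arm] += (reward - estimates[arm]) / counts[arm]
            rewards.append(reward)
        return statistics.mean(rewards[-100:])

    for eps in (0.01, 0.1):
        scores = [run_bandit(eps) for _ in range(N_RUNS)]
        print(f"epsilon={eps}: mean late reward {statistics.mean(scores):.3f}")

A report for such an experiment would typically plot or tabulate the averaged results and discuss which setting learns faster and why.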

Grades for each programming assignment will be out of 10 points. Some examples of reports are included below:
7 to 7.5 - Adequate, but the bare minimum in terms of experiments and analysis (Example)
8 to 8.5 - Good job, but with some room for improvement (Example)
9 to 9.5 - Good analysis and good presentation of results (Example)
10+ - Excellent; does more than what was asked (Example)


Reading Responses

Students should post responses to the readings on the Canvas forum. Reading response assignments will be announced in class and are due before class on the day the assigned reading is to be discussed. Credit will be based on evidence that you have done the readings carefully. You are also encouraged to read some of your peers' responses. The response should include a summary of the reading along with any of the following:


Discussion Moderation

Each student will lead a discussion on one of the readings. The discussion may begin with a brief summary/overview of the important points in the readings, but it should assume that everyone has already completed them. The student may either present material related to the readings (perhaps from an outside source) or moderate a class discussion about the readings, and must be prepared to keep the conversation flowing. Some tips on leading a good discussion are available here, courtesy of Peter Stone. You are required to email your plan for the discussion, including any slides you intend to show, to the professor and TA at least two nights before your discussion. The sign-up sheet for discussion slots will be posted on Canvas via an announcement. As there are more students than days of class, some days will feature two discussion moderators.

Text and Website [top]

The main reference used throughout the course will be "Reinforcement Learning: An Introduction" by Sutton and Barto, Second Edition (in progress).

In addition, relevant research papers and book chapters will be assigned; later in the semester, students will choose readings related to their project topics.

Software Resources [top]

The Brown-UMBC Reinforcement Learning and Planning java code library: BURLAP

Ms. Pac-Man domain and RL environment in Java: [ZIP]

Simple Q-Network for playing Atari games: [ZIP]. The code is based on Andrej Karpathy's code described here.

MATLAB example: Q-learning for pendulum control: [LINK]

Reinforcement Learning with PyTorch: [LINK]

A collection of RL examples from WILDML: [LINK]

Related Conferences and Journals [top]

Credits and Similar Courses [top]

This class is heavily inspired by a course on Reinforcement Learning taught at UT Austin by Peter Stone. Feel free to thank him if you enjoy it.

Academic Dishonesty Policy [top]

You are encouraged to form study groups and discuss the reading materials assigned for this class. You are allowed to discuss the reading response assignments with your colleagues. You are also allowed to discuss the programming assignments (e.g., in front of a white board). However, each student will be expected to write their own response and code. Sharing of code is not allowed!

Collaboration is expected for the final projects -- you will form teams of two as soon as possible. If you absolutely insist on working alone, I won't stop you, but you will be facing a larger workload. For the final project, you are allowed (and expected) to use various open-source libraries, published code, algorithms, datasets, etc. As long as you cite everything you use that was developed by someone else, you'll be fine.

IMPORTANT: Cheating, plagiarism, and other academic misconduct will not be tolerated and will be handled according to Tufts' policy on academic dishonesty. According to that policy, if I find any evidence of dishonesty, I am required to report it.


[Back to Department Homepage]