Office hours
Jivko Sinapov (Instructor)
Time: Tuesday 3:30-4:30 pm or by appointment
Office: Halligan 213 (may also be at the lab at Halligan 228)
Email: jivko--DOT--sinapov--AT--tufts--DOT--edu
Srijith Rajeev (Teaching Assistant)
Time: Monday 3:00-4:15 pm, Tuesday noon-1:15 pm, Wednesday 11:00 am-12:15 pm
Office: Halligan 228 A-B
Email: srijith2311--AT--gmail--DOT--com
Class Diary (including links to slides and readings) [top]
All slides are available here.
- 11/29 Conclusion
- 11/27 Policy Gradient Methods
- 11/20 Project Breakout
- 11/15 Eligibility Traces II
- 11/13 Eligibility Traces I
Assigned Reading:
- Chapter 13 of Sutton and Barto
- Research Article of your choice
- 11/8 Review -- Link to Sutton's RL slides
- 11/6 PacMan Demo
Assigned Reading:
- Chapter 12 of Sutton and Barto
- Research Article of your choice
- 11/2 In-Class Exercise
- 10/30 Function Approximation II
Assigned Reading:
- Chapter 11 of Sutton and Barto
- Sutton, Richard S., Doina Precup, and Satinder Singh. "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning." Artificial Intelligence 112.1-2 (1999): 181-211.
- 10/23 Function Approximation I
Assigned Reading:
- Chapter 10 of Sutton and Barto
- Mnih, Volodymyr, et al. "Playing Atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
- 10/18 Project Brainstorm Activity
- 10/16 Planning and Learning
Assigned Reading:
- Chapter 9 of Sutton and Barto
- Knox, W.B., and Stone, P. "Interactively shaping agents via human reinforcement: The TAMER framework.", Proceedings of the 5th ACM International Conference on Knowledge Capture, 2009.
- 10/11 n-Step Temporal Difference Learning
Assigned Reading:
- Chapter 8 of Sutton and Barto
- Narvekar et al, "Source task creation for curriculum learning", Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems (AAMAS), 2016.
- 10/4 Temporal Difference Learning II
- 10/2 Temporal Difference Learning
Assigned Reading:
- Chapter 7 of Sutton and Barto
- Griffith et al, "Policy shaping: Integrating human feedback with reinforcement learning", Advances in neural information processing systems (NIPS). 2013.
- 9/27 Monte Carlo Methods
Moderated Discussion slides available here.
- 9/25 Dynamic Programming for Policy Evaluation and Improvement
Assigned Reading:
- Chapter 6 of Sutton and Barto
- Taylor et al, "Transfer Learning via Inter-Task Mappings for Temporal Difference Learning", Journal of Machine Learning Research, 8(1):2125-2167, 2007.
- 9/20 Markov Decision Processes II
- 9/18 Markov Decision Processes I
Assigned Reading: Chapters 4 and 5 of Sutton and Barto.
- 9/13 Q-Learning
- 9/11 Exploration in Multi-Armed Bandits
Assigned Reading: Chapter 3 of Sutton and Barto.
- 9/6 Multi-Armed Bandits
- 9/4 Class Introduction -- What is Reinforcement Learning?
Assigned Reading: Chapters 1 and 2 of Sutton and Barto.
Final Projects [top]
Projects:
- CARL: Cloud Assisted Reinforcement Learning
Abdullah Bin Faisal
[Proposal] [Final Report]
- Using Traffic Data Management to Enhance Safety of a Reinforcement Learning Agent
Evana Gizzi and David Zabner
[Proposal] [Final Report]
- Path Planning Amidst Moving Obstacles
Hifza Khalid and Jong Seo Yoon
[Proposal] [Final Report]
- Creating Harmony Using Human Reinforcement and Machine Learning
Thomas Klimek and Ben Machlin
[Proposal] [Final Report]
- A competition between travelling salesmen
Cuong Nguyen and Daniel Dinjian
[Proposal] [Final Report]
- Better Gaming: Policy Control with Reward Approximation
Dan Pechi, Jeremy Shih, and Rui Sun
[Proposal] [Final Report]
- Improving Sentences with Online Policy Gradient Methods
Andrew Savage and Bhushan Suwal
[Proposal] [Final Report]
- Stable Locomotion in Unstructured Terrain using Curriculum Learning for Online Parameter Adaptation
Sam Shaw and Mateo Guaman
[Proposal] [Final Report]
- An In-Depth Investigation Of The Effects Of Feature Reduction On The Performance Of DQN Atari Agents
Holt Spalding and Oliver Newland
[Proposal] [Final Report]
- Mission Impossible: Deep Q Network Agent in DOOM
Yirong Tang and Shucheng Tian
[Proposal] [Final Report]
- Deep Asynchronous Reinforcement Knowledge (DARK) for Embodied Intelligent Agent
Gyan Tatiya and Sambit Pradhan
[Proposal] [Final Report]
Important Dates:
Team Formation (up to 2 people per team): October 30th
Project Proposal Writeup due: November 6th
Final Project Presentations: Dec 4 and Dec 6
Final Project Report and Deliverables: Dec 18
Course Overview [top]
"Reinforcement learning problems involve learning what to do --- how to map situations to actions --- so as to maximize a numerical reward signal." - Sutton and Barto ("Reinforcement Learning: An Introduction", course textbook)
This course focuses on agents that must learn, plan, and act in complex, non-deterministic environments. We will cover the main theory and approaches of Reinforcement Learning (RL), along with common software libraries and packages used to implement and test RL algorithms. The course is a graduate seminar with assigned readings and discussions, and its content will be guided in part by the interests of the students. It will cover at least the first several chapters of the course textbook; beyond that, we will move to more advanced and recent readings from the field (e.g., transfer learning and deep RL), with a focus on the practical successes and challenges of reinforcement learning.
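To make the quoted problem setting concrete, here is a minimal, self-contained illustration of "mapping situations to actions so as to maximize a numerical reward": an epsilon-greedy learner on a multi-armed bandit (the first topic in the class diary). This is an illustrative sketch, not assigned material; the function name, arm means, and parameters are all made up for the example.

```python
import random

def run_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Epsilon-greedy action selection on a stationary multi-armed bandit.

    Maintains a sample-average value estimate Q[a] for each arm and
    returns the estimates after the given number of pulls.
    """
    rng = random.Random(seed)
    k = len(true_means)
    Q = [0.0] * k          # value estimate for each arm
    N = [0] * k            # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:              # explore: random arm
            a = rng.randrange(k)
        else:                                   # exploit: current best estimate
            a = max(range(k), key=lambda i: Q[i])
        reward = rng.gauss(true_means[a], 1.0)  # noisy numerical reward
        N[a] += 1
        Q[a] += (reward - Q[a]) / N[a]          # incremental sample average
    return Q

estimates = run_bandit([0.2, 0.8, 0.5])
best_arm = max(range(3), key=lambda i: estimates[i])
print(best_arm)  # index of the arm the learner currently believes is best
```

Even in this one-state setting, the core RL tension is visible: the epsilon parameter trades off exploring arms that look bad against exploiting the arm that currently looks best.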
There will be a programming component to the course in the form of a few short assignments and a final project. Students are expected to be proficient programmers in at least one of the following languages: C++, Java, or Python. Prior coursework (or experience) in Artificial Intelligence and/or Machine Learning is highly recommended, but not required.
Course Requirements [top]
Grades will be based on:
- class participation (including attendance): 10%
- programming assignments: 45%
- reading responses: 10%
- class discussion moderation: 10%
- a final project: 25%
Class Participation
You are expected to attend each class and actively participate by taking part in discussions and activities, and asking questions. If you anticipate missing a class, let me know as soon as you are able.
Programming Assignments
Students will be required to complete four short programming assignments of their own choosing. In most cases these will come from the exercises in Sutton and Barto, though other options are possible in consultation with the instructor. These exercises will generally not involve extensive or elaborate programs; the emphasis should be on empirically analyzing various learning algorithms and reporting the results. The reports, along with all relevant code and data, should be submitted on Canvas.
Grades for each programming assignment will be out of 10 points. Some examples of reports are included below:
- 7 to 7.5: Adequate, but the bare minimum in terms of experiments and analysis. Example
- 8 to 8.5: Good job, but with some room for improvement. Example
- 9 to 9.5: Good analysis and good presentation of results. Example
- 10+: Excellent; does more than what was asked. Example
Reading Responses
Students should post responses to the readings on the Canvas forum. Reading response assignments will be announced in class and will be due on the day the assigned reading is due, before class. Credit will be based on evidence that you have done the readings carefully. You are encouraged to also read some of your peers' responses. The response should include a summary of the reading along with any of the following:
- Insightful questions;
- Clarification questions about ambiguities;
- Comments about the relation of the reading to previous readings;
- Critiques on the research;
- Critiques on the writing style or clarity;
- Thoughts on what you would like to learn about in more detail;
- Possible extensions or related studies;
- Thoughts on the paper's importance; and
- Summaries of the most important things you learned.
Discussion Moderation
Each student will lead a discussion on one of the readings. The discussion can begin with a brief summary/overview of the important points in the readings, but the assumption is that everyone has already completed them. The student may either present material related to the readings (perhaps from an outside source) or moderate a class discussion about them, and must be prepared to keep the conversation flowing. Some tips on leading a good discussion are available here, courtesy of Peter Stone. You must email your plan for the discussion, including any slides you intend to show, to the professor and TA at least two nights before your discussion. The sign-up sheet for discussion slots will be posted on Canvas via an announcement. As there are more students than days of class, some days will feature two discussion moderators.
Text and Website [top]
The main reference used throughout the course will be "Reinforcement Learning: An Introduction", Second edition, in progress.
In addition, relevant research papers and book chapters will be assigned and, later in the semester, chosen by students based on their project topics.
Software Resources [top]
The Brown-UMBC Reinforcement Learning and Planning java code library: BURLAP
Ms. Pac-Man domain and RL environment in Java: [ZIP]
Simple Q-Network for playing Atari games: [ZIP]. The code is based on Andrej Karpathy's code described here.
MATLAB example: Q-learning for pendulum control: [LINK]
Reinforcement Learning with PyTorch: [LINK]
A collection of RL examples from WILDML: [LINK]
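For students who want a dependency-free starting point before exploring the packages above, the following sketch runs tabular Q-learning (the 9/13 topic) on a tiny deterministic chain environment. The environment, function name, and hyperparameters are invented for illustration and are not taken from any of the linked resources.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a deterministic chain.

    States are 0..n_states-1; actions are 0 (left) and 1 (right).
    Reward is 1.0 for reaching the last state, which ends the episode.
    """
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]   # Q[state][action]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            if rng.random() < epsilon:          # explore
                a = rng.randrange(2)
            else:                               # exploit (ties go right)
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update: bootstrap from the greedy successor value
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning_chain()
greedy = [0 if q[0] > q[1] else 1 for q in Q]
print(greedy)  # greedy action per state (1 = move right toward the goal)
```

After training, the greedy policy should walk right toward the rewarded end of the chain, with each state's value discounted by gamma per step of distance from the goal.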
Related Conferences and Journals [top]
Credits and Similar Courses [top]
This class is heavily inspired by a course on Reinforcement Learning taught at UT Austin by Peter Stone. Feel free to thank him if you enjoy it.
Academic Dishonesty Policy [top]
You are encouraged to form study groups and discuss the reading materials assigned for this class. You are allowed to discuss the reading response assignments with your colleagues. You are also allowed to discuss the programming assignments (e.g., in front of a white board). However, each student will be expected to write their own response and code. Sharing of code is not allowed!
Collaboration is expected for the final projects -- as soon as you can, you will form teams of two. If you absolutely insist on working alone, I won't stop you, but you'll be facing a larger workload. For the final project, you are allowed (and expected) to use various open-source libraries, published code, algorithms, datasets, etc. As long as you cite everything you use that was developed by someone else, you'll be fine.
IMPORTANT: Cheating, plagiarism, and other academic misconduct will not be tolerated and will be handled according to Tufts' policy on academic dishonesty. Under that policy, if I find any evidence of dishonesty, I am required to report it.
[Back to Department Homepage]