# COMP 135 Introduction to Machine Learning

## Instructor

## Teaching assistants

Minh Nguyen, Nguyen Duc Nguyen, and Duc Nguyen (please contact them through Piazza)

## Class times and location

MW 4:30-5:45pm, Robinson Hall 253

## Office hours:

W, Th 2:00-3:00pm, Halligan 234

## Description & Objective:

As we collect more and more data, we need computers to help us extract important information from it. Machine learning provides a range of methods that serve this purpose. The aim of machine learning is to "learn" from data: the learned model can either predict unobserved facts or provide high-level knowledge.

This introductory course will focus on basic machine learning concepts and models. At a high level, the following topics will be covered: machine learning formulations, learning objectives and evaluation, and supervised and unsupervised learning models.

At the end of the course, a successful student should be able to identify machine learning problems and their formulations from data, apply different learning models to these problems, and analyze model outputs.

We will use Piazza for class discussion (an access code is needed). The instructor and TAs will monitor posts on Piazza, so please ask questions there instead of by email. (You can send me an email if your Piazza message has not received a response within 24 hours; otherwise I may overlook your email.)

## Prerequisites:

COMP 15 and COMP/MATH 61, or consent of the instructor. Some mathematical background, such as calculus, linear algebra, and probability, will be very helpful.

## Course Work and Grading Policy

The course work is a combination of homework assignments (20%), programming projects (30%), a midterm exam (19%), a final exam (30%), and class participation (1%).
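As a rough illustration of how these weights combine, here is a minimal sketch that computes a weighted course grade. The weights come from the policy above; the function name `final_grade`, the assumption that every component is scored on a 0-100 scale, and the example scores are all hypothetical and not part of the official policy.

```python
# Component weights from the grading policy above (percent of the final grade).
WEIGHTS = {
    "homework": 20,
    "projects": 30,
    "midterm": 19,
    "final": 30,
    "participation": 1,
}


def final_grade(scores):
    """Return the weighted course grade on a 0-100 scale.

    `scores` maps each component name to a percentage score in [0, 100]
    (an assumption made for this sketch).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {
    "homework": 90,
    "projects": 85,
    "midterm": 80,
    "final": 88,
    "participation": 100,
}
print(final_grade(example))
```

Because the weights sum to 100, a student who scores 100 on every component gets a final grade of 100.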

## Policy for Late Submissions:

The due time of each homework and project will be listed in its description. If your submission is one day late, you will get 50% of the credit you would normally get. If your submission is two days late, you will get 25%. You will get no credit for a submission more than two days after the due time. In case of documented illness or family emergency, the due date can be postponed accordingly at the request of the student.
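The late penalty can be sketched as a small function. This is one possible reading of the policy, not an official rule: it assumes lateness is measured in hours and rounded up to whole days, so anything up to 24 hours late counts as "one day late". The function name `late_credit_fraction` is hypothetical.

```python
import math


def late_credit_fraction(hours_late):
    """Return the fraction of normal credit kept for a submission.

    Assumption for this sketch: lateness is rounded up to whole days,
    e.g. 5 hours late counts as one day late.
    """
    if hours_late <= 0:
        return 1.0  # on time: full credit
    days_late = math.ceil(hours_late / 24)
    if days_late == 1:
        return 0.5  # one day late: 50%
    if days_late == 2:
        return 0.25  # two days late: 25%
    return 0.0  # more than two days late: no credit


# A submission that would normally earn 80 points, turned in 30 hours late.
print(80 * late_credit_fraction(30))
```

Documented illness or family emergencies are handled by moving the due date, so they would change `hours_late` rather than the penalty itself.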

## Tentative List of Topics

- Supervised learning concepts: Bayesian decision theory; experimental evaluation; avoiding overfitting; PAC learning theory.
- Supervised learning models: Naive Bayes classifier; linear classifiers; nearest neighbors; support vector machines; neural networks; decision trees; aggregation methods (boosting and bagging).
- Unsupervised learning: clustering.
- Additional topics selected from: collaborative filtering.

Here is a tentative schedule.

## The textbook and other related books

Textbook (we will closely follow this book):

- [CIML] A Course in Machine Learning. Hal Daumé III. http://ciml.info

Other related books (the first three are free):

- [IML] Introduction to Machine Learning, Third Edition. Ethem Alpaydin. MIT Press, 2010. [Tufts access link]
- [ESML] The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. Springer, 2009.
- [DM] Data Mining, 4th Edition. Ian H. Witten, Eibe Frank, Mark A. Hall, and Christopher J. Pal. Morgan Kaufmann, 2016.
- [ML] Machine Learning. Tom M. Mitchell. McGraw-Hill, 1997.
- [PC] Pattern Classification, Second Edition. R. Duda, P. Hart, and D. Stork. John Wiley & Sons, 2001.
- [MLPP] Machine Learning: A Probabilistic Perspective. Kevin P. Murphy. MIT Press, 2012.
- [PRML] Pattern Recognition and Machine Learning. Christopher M. Bishop. Springer, 2006.

## Software

In this course you will be mostly writing your own machine learning code for assignments. You are encouraged to use scikit-learn.

## Academic Integrity Policy:

On homework assignments and projects: you must work out the details of each solution and code/write it out on your own. You may verbally discuss the problems and general ideas about their solutions with other students, but you CANNOT show your written or typed solutions to others or copy theirs. You may consult other textbooks or existing content on the web, but you CANNOT ask for answers through any question answering websites such as (but not limited to) Quora or Stack Overflow. If you find material that poses the same problem and provides a solution, you CANNOT check or copy the solution provided.

On exams: no collaboration is allowed.

In general, this course will strictly follow the Academic Integrity Policy of Tufts University. For any issues not covered above, please refer to the Academic Integrity Policy at Tufts.

## Accessibility:

Tufts and the instructor of COMP 135 in Spring 2018 strive to create a learning environment that is welcoming to students of all backgrounds. Please see the detailed accessibility policy at Tufts.