COMP150: Machine Learning for Graph Data Analytics
Instructor
Class times and location
TR 10:30-11:45 @ Collaborative Learning and Innovation Complex 316
Office hours:
W 2-3pm, F 2-3pm at Halligan 234
Overview
Graph and network data are ubiquitous and often large in scale. Graph data are generally characterized by the graph structure and by the data attached to graph nodes or edges. Machine learning is an important approach to automatically extracting information from graph data. However, graph data require specialized model designs, because most learning models (e.g., standard neural networks) accept only vectors as input. Models for graph data generally fall into two categories: 1) models that learn vector representations of graph data, and 2) models that take graphs directly as input.
In this course, we will start with an introduction to graph theory, linear algebra, and machine learning, and then cover the following topics in depth:
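To make the distinction between graph structure, attached node data, and the fixed-length vectors most models expect more concrete, here is a minimal sketch. The toy graph, its features, and all function names are hypothetical. The per-node vectors come from a hand-crafted, non-learned rule (average a node's features with its neighbors'), and a graph-level vector comes from averaging the node vectors. This loosely resembles the neighborhood aggregation used in graph neural networks, but nothing here is learned, and it is not any specific method covered in the course.

```python
# Hypothetical toy graph: 4 nodes, undirected edges, 2 features per node.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [0.0, 0.0]}


def build_adjacency(num_nodes, edges):
    """Build an adjacency list for an undirected graph (the graph structure)."""
    adj = {v: [] for v in range(num_nodes)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def node_vectors(adj, features):
    """Non-learned node vectors: average each node's features with its neighbors' features."""
    return {v: mean([features[v]] + [features[u] for u in adj[v]]) for v in adj}


def graph_vector(node_vecs):
    """Readout: average all node vectors into one fixed-length vector for the whole graph."""
    return mean(list(node_vecs.values()))


adj = build_adjacency(4, edges)
h = node_vectors(adj, features)  # one 2-dimensional vector per node
g = graph_vector(h)              # one 2-dimensional vector for the whole graph
print(h)
print(g)
```

Whatever the number of nodes or edges, `g` always has the same length as a single node's feature vector. That fixed length is what lets a vector-only model consume it. The models studied in this course learn such representations instead of fixing the rule by hand.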
- Node representation
- Graph representation and generative models
- Graph convolutional neural networks
- Learning to solve hard graph problems
- Graphs in chemistry
- Knowledge graphs
The course work consists of 3 projects and a final project.
Objectives
After this course, a successful student should be able to do the following when solving a learning problem involving graph data:
- identify the type of learning problem (e.g., whether node embeddings or graph embeddings are needed; what target to fit)
- choose the appropriate type of learning model (e.g., prepare the correct inputs and fitting targets for the model)
- train the relevant learning models with existing packages to solve the problem
Schedule
Week | Content | Assignment |
---|---|---|
week 1 (Sep 2) | Graph theory; Linear algebra | |
week 2 (Sep 9) | Graph Laplacian; Graph signal [tutorial] | Proj 1.1 out |
week 3 (Sep 16) | Node2vec [paper]; Discussion: embedding propagation | Proj 1.1 due |
week 4 (Sep 23) | Visualization [tSNE]; Other variants [papers] | Proj 1.2 out |
week 5 (Sep 30) | GCN [paper]; Discussion: GAT [paper] | Proj 1.2 due; Proj 2 out |
week 6 (Oct 7) | Other variants | |
week 7 (Oct 14, R only) | Graph classification and graph kernel | Proj 2 due |
week 8 (Oct 21) | Graph classification (cont.); Autoencoding intro [paper] | Final proj proposal due; Proj 3 out |
week 9 (Oct 28) | Graph encoding [paper] | |
week 10 (Nov 4) | Graph generation [papers] | Proj 3 due |
week 11 (Nov 11) | Chemical graph tutorial | |
week 12 (Nov 18) | Molecule graph; reaction graph | |
week 13 (Nov 25, T only) | Knowledge graph | |
week 14 (Dec 2) | Knowledge graph embedding; knowledge graph applications | Final proj due at exam |
Course Work and Grading Policy
- In-class quizzes (5%): there will be three to five in-class quizzes on unannounced dates. Their purpose is to encourage attendance and to collect feedback.
- Participation (5%):
- participation in class discussion (3%): the instructor will take note of students' questions and monitor class discussions.
- participation in Piazza discussion (2%): the top 10 Piazza contributors get 1%. Other students get credit in proportion to their contribution relative to the 10th-ranked contributor.
- Assignments (40%):
- Assignment 1 (13%): setting up the programming environment; implementing a simple node embedding algorithm
- Assignment 2 (13%): implementing a graph convolutional neural network
- Assignment 3 (14%): implementing a graph autoencoder
- Final project (50%):
- Project proposal (10%): Students are encouraged to form teams to work on a problem for the final project. A team can have at most two students. A team must first write a proposal that includes a problem description, the dataset, the plan, and a review of current methods.
- Project implementation and report (35%): The team must execute the plan for the proposed problem and write a report. The report should follow the format of a research paper.
- Project presentation (5%): The team needs to present the project to the entire class.
Prerequisites:
COMP 135 Introduction to Machine Learning; background knowledge in linear algebra.
Academic Integrity Policy:
On assignments: you must work out the details of each solution and code/write it out on your own. You may verbally discuss the problems and general ideas about their solutions with other students, but you CANNOT show or copy written or typed solutions. You may consult textbooks or existing content on the web, but you CANNOT ask for answers on any question-answering website, such as (but not limited to) Quora and StackOverflow. If you find material that poses the same problem and provides a solution, you CANNOT consult or copy that solution.
On the final project: each team must complete the project on its own. The two team members should do their best to balance the work between them. If any code comes from a third party, it must be wrapped in a function or package and labeled as third-party.
This course will strictly follow the Academic Integrity Policy of Tufts University. For any issues not covered above, please refer to the Academic Integrity Policy at Tufts.
Accessibility:
Tufts and the instructor of this course strive to create a learning environment that is welcoming to students of all backgrounds. Please see the detailed accessibility policy at Tufts.