I am a second-year Ph.D. student in the computer science department at Tufts University, advised by Dr. Fahad Dogar. My interests lie at the intersection of systems, networking, and machine learning. I completed my BSc in Electrical Engineering at LUMS in 2017. I have been excited about research since my undergraduate days, under the mentorship of Dr. Ihsan and Dr. Zartash.
Resource-adaptive DNN training
Distributed training of deep neural networks (DNNs) is a resource-hungry process. In shared compute clusters, other (training) jobs compete for resources (e.g., network bandwidth, GPUs), so the resources available to a particular DNN training job can vary over its training cycle, potentially increasing the time it takes to train the DNN.
We propose adapting the DNN architecture to the available resource capacity: in epochs of high load, a DNN model can downsize itself, and vice versa. Such an approach trades off some accuracy for timeliness.
This project involves addressing interesting ML challenges, such as amortizing the accuracy hit during a transition from one DNN to another, as well as systems challenges, such as forecasting future available resource capacity.
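The core adaptation step can be sketched as a simple selection problem. Below is a minimal, hypothetical illustration (the variant names, costs, and thresholds are my own, not from the project): given the fraction of cluster resources currently free, pick the largest model variant that fits.

```python
# Hypothetical sketch of resource-adaptive model selection.
# Variant names and relative costs are illustrative only.

def pick_variant(available_fraction, variants):
    """Return the largest variant whose cost fits the available capacity.

    variants: list of (name, relative_cost) sorted by ascending cost,
    where relative_cost is the fraction of cluster resources needed.
    """
    chosen = variants[0]  # fall back to the smallest model
    for name, cost in variants:
        if cost <= available_fraction:
            chosen = (name, cost)
    return chosen[0]

# During a high-load epoch only 30% of resources are free, so training
# temporarily downsizes; at full capacity it runs the full model.
variants = [("small", 0.25), ("medium", 0.5), ("large", 1.0)]
print(pick_variant(0.3, variants))  # small
print(pick_variant(1.0, variants))  # large
```

In a real system the interesting part is what happens at the transition (amortizing the accuracy hit when switching variants), which this sketch deliberately omits.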
Workload-adaptive flow scheduling in data centers
This project is about designing a learning-based scheduling policy (2D) that is robust to changes in workloads (job size distributions). 2D combines principles from existing scheduling policies with learning to remain tail-optimal in the face of changing workloads.
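To give a flavor of workload adaptivity (this is not the actual 2D policy, just a generic size-aware sketch): a scheduler can derive its priority-class cutoffs from percentiles of the recently observed job-size distribution, so that when the workload shifts, the cutoffs shift with it.

```python
# Illustrative only: a simple size-aware priority assignment whose
# thresholds are recomputed from observed job sizes, so they track
# changes in the workload's job size distribution.

def thresholds(sizes, percentiles=(0.5, 0.9)):
    """Compute size cutoffs from the empirical job-size distribution."""
    s = sorted(sizes)
    return [s[min(int(p * len(s)), len(s) - 1)] for p in percentiles]

def priority(job_size, cutoffs):
    """Smaller jobs get numerically lower (better) priority classes."""
    for cls, cut in enumerate(cutoffs):
        if job_size <= cut:
            return cls
    return len(cutoffs)

# If the workload becomes heavier-tailed, recomputing thresholds()
# over a fresh sample automatically raises the cutoffs.
sizes = [1, 2, 2, 3, 10, 50, 200, 5, 4, 8]
cuts = thresholds(sizes)
print(priority(2, cuts), priority(500, cuts))  # 0 2
```

A fixed-threshold policy tuned for one distribution can misclassify most jobs when the distribution changes; recomputing the cutoffs from observations is one simple way to stay robust.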
Reducing tail latency via request duplication
Duplication can help alleviate tail latency when a resource becomes a straggler. However, it can double the load on the system. In this work, we investigate making duplication safe, using prioritization and purging, and easy to implement through a high-level interface.
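The purging idea can be sketched in a few lines (assumed semantics, not the project's actual interface): issue the request to two replicas, take whichever responds first, and cancel the outstanding copy so duplicates do not contribute sustained load.

```python
import threading
import time

# Minimal sketch of duplication with purging (illustrative semantics):
# race two simulated replicas, keep the first response, purge the loser.

def replica(delay, cancel, result, idx):
    """Simulated server: respond after `delay` unless purged first."""
    if not cancel.wait(delay):  # woken early means the request was purged
        result.append(idx)

def duplicated_request(delays):
    cancel = threading.Event()
    result = []
    workers = [threading.Thread(target=replica, args=(d, cancel, result, i))
               for i, d in enumerate(delays)]
    for w in workers:
        w.start()
    while not result:           # wait for the first response
        time.sleep(0.001)
    cancel.set()                # purge the outstanding duplicate
    for w in workers:
        w.join()
    return result[0]

# The faster replica (index 1) wins; the straggler is purged, so only
# one response is ever produced.
print(duplicated_request([0.2, 0.02]))  # 1
```

Prioritization (running duplicates at lower priority so they only consume otherwise-idle capacity) is the other half of making this safe, and is omitted from this sketch.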