My Page

Ph.D. Student

Abdullah Bin Faisal

I am currently on the job market, seeking research-based positions in AI + systems. My interest is in building efficient systems grounded in human-centered design. I recently completed my Ph.D. in Computer Science at Tufts University, where I explored how to make modern AI systems more predictable and resource-efficient, developing infrastructure that powers large-scale AI applications and building tools that bring these capabilities into classrooms.

Publications

When will my ML Job finish? Toward providing Completion Time Estimates through Predictability-Centric Scheduling
Abdullah Bin Faisal, Noah Martin, Hafiz Mohsin, Swaminathan Lamelas, Fahad R. Dogar
USENIX OSDI 2024

Abstract. In this paper, we make a case for providing job completion time estimates to GPU cluster users, similar to providing the delivery date of a package or arrival time of a booked ride. Our analysis reveals that providing predictability can come at the expense of performance and fairness. Existing GPU schedulers optimize for extreme points in the trade-off space, making them either extremely unpredictable or impractical. To address this challenge, we present PCS, a new scheduling framework that aims to provide predictability while balancing other traditional objectives. The key idea behind PCS is to use Weighted-Fair-Queueing (WFQ) and find a suitable configuration of different WFQ parameters (e.g., queue weights) that meets specific goals for predictability. It uses a simulation-aided search strategy to efficiently discover WFQ configurations that lie around the Pareto front of the trade-off space between these objectives. We implement and evaluate PCS in the context of scheduling ML training workloads on GPUs. Our evaluation, on a small-scale GPU testbed and larger-scale simulations, shows that PCS can provide accurate completion time estimates while marginally compromising on performance and fairness.
```
@inproceedings{Faisal2024,
 author = {Abdullah Bin Faisal and Noah Martin and Hafiz Mohsin and Swaminathan Lamelas and Fahad R. Dogar},
 title = {When will my ML Job finish? Toward providing Completion Time Estimates through Predictability-Centric Scheduling},
 booktitle = {18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24)},
 year = {2024},
 publisher = {USENIX Association}
}
```
Workload Adaptive Flow Scheduling
Abdullah Bin Faisal, Hafiz Mohsin, Ihsan A. Qazi, Zartash A. Uzmi, Fahad R. Dogar
ACM CoNEXT 2018

Abstract. Existing flow scheduling schemes for data center networks optimize for a specific workload and performance metric. In this paper, we present 2D, a new scheduling policy that offers robustness across performance metrics and changing workloads - a ground existing scheduling policies are unable to cover. 2D combines basic scheduling building blocks of multiplexing and serialization in a principled way, ensuring tail optimal performance across workloads while also improving the average (and lower percentiles) completion times. To implement 2D for flow-level scheduling in a distributed setting, we break-up the scheduling decision into two parts: coarse time-scale decisions based on workload and load changes are made by a centralized controller while per-flow serialization decisions are made in a distributed fashion, involving the end-points and sequencer(s). Our testbed experiments show that, for realistic cloud workloads, 2D provides consistent gains at the tail and average flow completion times compared to basic scheduling techniques (e.g., FIFO and fair sharing) as well as heuristic-based schedulers (e.g., Aalo and Baraat).
```
@inproceedings{Faisal2018,
 author = {Abdullah Bin Faisal and Hafiz Mohsin and Ihsan Ayyub Qazi and Zartash Uzmi and Fahad R. Dogar},
 title = {Workload adaptive flow scheduling},
 booktitle = {Proceedings of the 14th International Conference on Emerging Networking EXperiments and Technologies},
 year = {2018},
 publisher = {Association for Computing Machinery}
}
```
Reducing tail latency using duplication: a multi-layered approach
Hafiz Mohsin, Abdullah Bin Faisal, M. Asim Jamshed, Peter Vondras, Ali Musa Iftikhar, Ihsan A. Qazi, Fahad R. Dogar
ACM CoNEXT 2019

Abstract. Duplication can be a powerful strategy for overcoming stragglers in cloud services, but is often used conservatively because of the risk of overloading the system. We call for making duplication a first-class concept in cloud systems, and make two contributions in this regard. First, we present duplicate-aware scheduling or DAS, an aggressive duplication policy that duplicates every job, but keeps the system safe by providing suitable support (prioritization and purging) at multiple layers of the cloud system. Second, we present the D-Stage abstraction, which supports DAS and other duplication policies across diverse layers of a cloud system (e.g., network, storage, etc.). The D-Stage abstraction decouples the duplication policy from the mechanism, and facilitates working with legacy layers of a system. Using this abstraction, we evaluate the benefits of DAS for two data parallel applications (HDFS, an in-memory workload generator) and a network function (Snort-based IDS cluster). Our experiments on the public cloud and Emulab show that DAS is safe to use, and the tail latency improvement holds across a wide range of workloads.
```
@inproceedings{Mohsin2019,
 author = {Hafiz Mohsin and Abdullah Bin Faisal and M. Asim Jamshed and Peter Vondras and Ali Musa Iftikhar and Ihsan Ayyub Qazi and Fahad R. Dogar},
 title = {Reducing tail latency using duplication: a multi-layered approach},
 booktitle = {Proceedings of the 15th International Conference on Emerging Networking EXperiments and Technologies},
 year = {2019},
 publisher = {Association for Computing Machinery}
}
```
Network resource management as a database problem
Hafiz Mohsin, Abdullah Bin Faisal, Fahad R. Dogar
ACM SoCC 2022

Abstract. Network resource management, or how bandwidth is allocated to flows, tenants, or applications, is a challenging problem. In this paper, we call for using the database abstraction for network resource management. A database provides simple constructs for supporting complex resource management tasks, such as transactions to support bandwidth reservations on multiple links, virtual tables for restricting the view of tenants in a cloud environment, and many others. To highlight the opportunities and challenges in this space, we present a research agenda around new abstractions and policy languages, the necessary data plane support, and potential for domain specific replication and sharding strategies for the resource management database.
```
@inproceedings{Mohsin2022,
 author = {Hafiz Mohsin and Abdullah Bin Faisal and Fahad R. Dogar},
 title = {Network resource management as a database problem},
 booktitle = {Proceedings of the 13th Symposium on Cloud Computing},
 year = {2022},
 publisher = {Association for Computing Machinery}
}
```

Under Review

LLMProxy: Reducing Cost to Access Large Language Models
Abdullah Bin Faisal*, Noah Martin*, Hiba Eltigani, Rukhshan Haroon, Swaminathan Lamelas, Fahad R. Dogar
* Equal contribution
Preprint

Abstract. In this paper, we make a case for a proxy for large language models which has explicit support for cost-saving optimizations. We design LLMProxy, which supports three key optimizations: model selection, context management, and caching. These optimizations present tradeoffs in terms of cost, inference time, and response quality, which applications can navigate through our high level, bidirectional interface. As a case study, we implement a WhatsApp-based Q&A service that uses LLMProxy to provide a rich set of features to the users. This service is deployed on a small scale (100+ users) leveraging the cloud; it has been operational for 15+ weeks and users have asked 1400+ questions so far. We report on the experiences of running this service as well as microbenchmark the specific benefits of the various cost-optimizations we present in this paper.
```
@misc{Martin2024,
 author = {Noah Martin and Abdullah Bin Faisal and Hiba Eltigani and Rukhshan Haroon and Swaminathan Lamelas and Fahad Dogar},
 title = {LLMProxy: Reducing Cost to Access Large Language Models},
 eprint = {2410.11857},
 year = {2024},
 archivePrefix = {arXiv}
}
```

Teaching Experience

Instructor – CS 135: Introduction to Machine Learning (Summer 2024)

Topics: Introductory concepts in ML (regression, neural networks)
Class size: 20-25 students

Guest Lecturer – CS 185: Computing for Developing Regions (Spring 2023, Fall 2019)

Topics: ML for societal impact, performance modeling of networked systems
Class size: 30-35 students

Guest Lecturer – CS 112: Computer Networks (Fall 2019)

Topics: Performance modeling of networked systems
Class size: 30-35 students

Teaching Assistant

CS 112: Computer Networks (Fall 2024)
CS 135: Introduction to Machine Learning (Spring 2024)
CS 150: Deep Neural Networks (Fall 2023)

Awards

Loevner Fellowship Award 2019-2020
For academic and research excellence
Dean's Fellowship Award 2018-2019
For academic and research excellence
Dean's Fellowship Award 2017-2018
For academic and research excellence

Contact

Email: abdullah@cs.tufts.edu

Office: Joyce Cummings Center Room 440K