Fresh Thinking Talk Speakers

Kostis Kaffes

Enabling Customizable and Performant Systems for Cloud Computing

Abstract

As the cloud grows and the post-Moore's-Law era sets in, it is imperative to keep scaling the performance and efficiency of modern systems. Achieving this requires operating system (OS) specialization, as the one-size-fits-all approach to fundamental OS operations, such as scheduling, is incompatible with today's diverse application landscape. In this talk, I will first present Syrup, a framework that lets everyday application developers easily specify custom scheduling policies and safely deploy them across different layers of the stack on top of existing operating systems like Linux, bringing the benefits of specialized scheduling to everyone. Then, I will discuss DBOS, a proposal for a radical redesign of the OS stack. DBOS argues for storing all system state in a transactional database. That way, fundamental operations such as scheduling and inter-process communication can be implemented in a few lines of SQL code, with access to dramatically better analytics and provenance information than existing systems provide.
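
To give a flavor of the DBOS approach, here is a minimal sketch of a scheduler written as one SQL statement. This is illustrative only, not code from the DBOS prototype; the tasks and workers tables and their columns are assumed for the example. It implements FIFO scheduling by assigning the oldest pending task to the least-loaded worker:

    -- Assumed schema: tasks(task_id, status, created_at, worker_id)
    --                 workers(worker_id, load)
    -- Assign the oldest pending task to the least-loaded worker.
    -- The outer status check keeps racing schedulers from
    -- re-assigning a task that was just claimed.
    UPDATE tasks
    SET status = 'RUNNING',
        worker_id = (SELECT worker_id FROM workers
                     ORDER BY load ASC LIMIT 1)
    WHERE status = 'PENDING'
      AND task_id = (SELECT task_id FROM tasks
                     WHERE status = 'PENDING'
                     ORDER BY created_at ASC LIMIT 1);

Swapping the ORDER BY clauses yields a different policy (say, priority- or locality-aware scheduling), which hints at the flexibility a database-centric design could offer.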

Bio

Kostis Kaffes is a software engineer in Google’s Systems Research Group. He is broadly interested in computer systems, cloud computing, and scheduling. He completed his Ph.D. in Electrical Engineering at Stanford University, advised by Christos Kozyrakis, focusing on end-host, rack-scale, and cluster-scale scheduling for microsecond-scale tail latency. Kostis will join Columbia University as an Assistant Professor in July 2023.




Neeraja Yadwadkar

Systems for ML: It’s all about the choices

Abstract

This talk focuses on the following fundamental question: what does Machine Learning (ML) need from Systems? I will use inference serving as an example to make the case that "Systems for ML" research is primarily about making the right choices. Today, many applications rely on inference from machine learning models, especially neural networks. For instance, applications at Facebook issue tens of trillions of inference queries per day with different privacy, performance, accuracy, and cost constraints. Unfortunately, existing distributed inference serving systems ignore ease of use and hence incur significant cost inefficiency, especially at large scales. They force developers to manually search through thousands of model-variants (versions of already-trained models that differ in hardware, resource footprint, latency, cost, privacy guarantees, and accuracy) to meet diverse application requirements. As requirements, query load, and the applications themselves evolve over time, developers must make these decisions dynamically, per inference query, to avoid the excessive costs of naive autoscaling. To sidestep this large and complex trade-off space, developers often fix one variant across queries and simply replicate it when load increases. However, given the diversity of variants and hardware platforms across the cloud-edge spectrum, such a limited view of the trade-off space incurs significant costs. For applications to use machine learning, we must automate the decisions that affect ease of use, privacy, performance, and cost efficiency for users and providers alike. We argue for managed distributed inference serving for a variety of models across the cloud-edge spectrum.
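
To make the per-query choice concrete, here is a toy sketch of the decision described above. The model_variants catalog, its columns, and the model name are hypothetical, not part of any existing serving system; the point is that each query's constraints select a different point in the trade-off space:

    -- Assumed catalog: model_variants(variant_id, model, hardware,
    --                  p99_latency_ms, accuracy, cost_per_1k_queries)
    -- Pick the cheapest variant that satisfies one query's
    -- latency and accuracy requirements.
    SELECT variant_id, hardware
    FROM model_variants
    WHERE model = 'image_classifier'   -- hypothetical model name
      AND p99_latency_ms <= 100        -- this query's latency SLO
      AND accuracy >= 0.95             -- this query's accuracy floor
    ORDER BY cost_per_1k_queries ASC
    LIMIT 1;

A managed inference-serving system would make this kind of decision automatically for every query as load and requirements evolve, instead of pinning a single variant and replicating it under load.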

Bio

Neeraja is an assistant professor in the Department of Electrical and Computer Engineering at UT Austin. She is a cloud computing systems researcher with a strong background in Machine Learning (ML). Most of her research straddles the boundary between Systems and ML: using and developing ML techniques for systems, and building systems for ML. Before joining UT Austin, she was a postdoctoral fellow in the Computer Science Department at Stanford University; before that, she received her Ph.D. in Computer Science from UC Berkeley. She earned her bachelor's degree in Computer Engineering from the Government College of Engineering, Pune, India.