Keynote Speaker

Esha Choukse

Principal Researcher, Azure Research — Systems (AzRS), Microsoft

From Models to Systems: Scaling Agentic and Multimodal AI through Cross-Stack Co-Optimization

Abstract

Recent advances in AI agents and multimodal foundation models are reshaping how intelligent systems perceive, reason, and act. Yet translating these capabilities into practical deployments, particularly in real-time, interactive, and resource-constrained environments, requires rethinking the traditional boundaries between machine learning models and computer systems. In this talk, I will present a cross-layer perspective on building next-generation AI systems, drawing on our recent work spanning scalable and reliable agentic workflows as well as real-time multimodal generation. Across these efforts, a recurring theme emerges: system performance is no longer determined solely by hardware efficiency or model quality in isolation, but by joint optimization across models, runtimes, compilers, and hardware platforms.

I argue that the next frontier in systems design lies in treating accuracy itself as a first-class systems knob—one that can be dynamically traded for latency, throughput, cost, and energy in principled ways. This shift requires new abstractions for reasoning about uncertainty, adaptive execution strategies, and scheduling mechanisms that co-optimize model behavior and system resources. I will highlight key challenges and opportunities at this intersection for the systems community.

Bio

Esha Choukse is a Principal Researcher in the Azure Research — Systems (AzRS) group at Microsoft. Her research focuses on efficient and sustainable AI across the computing stack, spanning AI platforms, hardware, and datacenter-scale infrastructure. She is a recipient of the ACM SIGMICRO Early Career Award for foundational contributions to hardware memory compression and to sustainable and efficient datacenter systems. Her papers have received three IEEE Micro Top Picks and an HPCA Best Paper Award. Several of her projects, including Splitwise and power stabilization in AI training datacenters, have had far-reaching impact on the research community and are deployed broadly across industry. Esha received her Ph.D. from The University of Texas at Austin in 2019 and has published extensively in leading venues including ISCA, ASPLOS, MICRO, HPCA, NSDI, and SC.

Program

13:30–13:45
Introductions
13:45–14:30
Invited Talk: Esha Choukse, Azure Research

Session 1: Multimodal Agents: From Perception to Action

14:30–14:50
Characterizing VLA Models: Identifying the Action Generation Bottleneck for Edge AI Architectures
Manoj Vishwanathan (Purdue University), Suvinay Subramanian (Google), Anand Raghunathan (Purdue University)
14:50–15:10
Hybrid Reflection for Efficient VLM Deployment in GUI Agents
Mingxuan Yang, Qingwen Li, Kai Lv, Hongwei Tang, Xuehai Hong (University of Chinese Academy of Sciences)
15:10–15:30
AI Coding Agents Need Better Compiler Remarks
Akash Deo (Northwestern University), Simone Campanoni (Northwestern University and Google), Tommy McMichen (Northwestern University)
15:30–16:00
☕ Coffee Break

Session 2: Reliable Agent Systems

16:00–16:20
Forge: Rethinking the Agent-System Interface Layer for Building Reliable LLM Agents
Kevin Song, Anand Jayarajan (University of Toronto), Yaoyao Ding (University of Toronto, Vector Institute), Qidong Su, Zhanda Zhu, Sihang Liu, Gennady Pekhimenko (University of Toronto and NVIDIA)
16:20–16:40
ACRFence: Preventing Semantic Rollback Attacks in Agent Checkpoint-Restore
Yusheng Zheng, Yiwei Yang (UC Santa Cruz), Wei Zhang (University of Connecticut), Andi Quinn (UC Santa Cruz)

Session 3: GenAI Systems and Hardware Co-Design

16:40–17:00
Unleashing GPU Resource Management with the LithOS Operating System
Patrick H. Coppock, Eliot H. Solomon, Vasilis Kypriotis, Todd C. Mowry, Dimitrios Skarlatos (Carnegie Mellon University)
17:20–17:40
AI-Driven Hardware-Aware Co-Design for Multi-Chip Inference
Suyeol Lee, Gyunghee Park (FuriosaAI)
17:40–18:00
AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization
Genghan Zhang (Stanford University), Shaowei Zhu (Amazon Web Services), Anjiang Wei (Stanford University), Zhenyu Song, Allen Nie (Amazon), Zhen Jia (Amazon Web Services), Nandita Vijaykumar (University of Toronto), Yida Wang (Amazon), Kunle Olukotun (Stanford University)

Post-Workshop

A post-workshop summary paper capturing the insights and challenges discussed during the event will be released publicly, with open access, on arXiv or in the ACM Digital Library.