Shreya Shankar

🚀 Exciting News: I'm co-teaching a course on 📊 AI Evals for engineers and product managers. We completed our first cohort with lots of great feedback (see testimonials) and are teaching our second and final cohort in July. Sign up now!

About Me

My dog Papaya 🐕 and me on a hike 🥾

I'm Shreya Shankar, a fifth-year PhD student at UC Berkeley in the EECS department. I am in the data systems and foundations group, advised by Dr. Aditya Parameswaran and supported by the NDSEG Fellowship. Go Bears! 🐻

As of Spring 2025, I am also a visiting student researcher in the Systems Research @ Google group. We are exploring cost optimization for LLM-powered data processing systems.

Prior to my PhD, I worked as an ML engineer in industry. I completed my BS and MS in computer science at Stanford. Go Trees! 🌲


🔬 Research Interests

I build agentic systems to help people work with data. I lead the DocETL project, a stack for semantic processing and analysis of unstructured data. My research falls into two main categories:

Semantic Data Processing Systems and Interfaces
I design and build systems that enable users to process unstructured documents using natural language. This includes work on declarative interfaces for document processing, optimization techniques for LLM query execution, and interactive tools for data processing with AI agents.
Operationalizing Machine Learning and AI
I develop frameworks and tools that help practitioners build reliable ML systems. This includes research on data quality validation, LLM evaluation frameworks that align with human preferences, and monitoring systems for ML applications in production.

I am fortunate that several of my research projects have been deployed in production at major tech companies and startups.


📝 Bio (for speaking engagements, etc.)

Shreya Shankar is a PhD student in computer science at UC Berkeley. Her research is at the intersection of AI and data processing, with a focus on interfaces, reliability, and optimization. She also co-teaches a hands-on LLM evaluation course for practitioners, with more than 1,000 participants to date.

Shreya is advised by Dr. Aditya Parameswaran. Her work appears in top data management and HCI venues like SIGMOD, VLDB, CIDR, CSCW and UIST, and she co-organizes the DEEM workshop at SIGMOD. She is supported by the NDSEG Fellowship. Prior to Berkeley, she worked as an ML engineer after completing her B.S. in computer science at Stanford University. In her free time, she enjoys roasting coffee and is actively trying to reduce her Twitter usage.

📰 News and Industry Impact

Companies That Like Our Work 👍

👨‍🏫 Mentorship

I am fortunate to work with many talented students at UC Berkeley. Below is a list of students I am currently mentoring or have mentored for a year or more.

Current Students

Nikhil Rao (high school student) - Working on cost optimization for the semantic reduce operator in DocETL. First-author workshop paper in progress.
Sasha Singh (UC Berkeley undergraduate) - Working on cost optimization for semantic join operations; co-mentored with Sepanta Zeighami.

Past Students

Quentin Romero Lauro (University of Pittsburgh undergraduate, REU at UC Berkeley) - Developed an interactive debugging tool for RAG pipelines. First-author paper under submission.
Rachel Lin (UC Berkeley master's student) - Developed interfaces for iterative dataset search with LLMs; co-mentored with Madelon Hulsebos. First-author paper under submission.
Reya Vir (UC Berkeley undergraduate) - Built a benchmark for synthesizing data quality constraints for LLM applications. Co-first-authored a publication at NAACL. Going on to pursue a PhD at Columbia University, with support from the NSF GRFP.
Ankush Garg (UC Berkeley master's student) - Built SCIPE, a debugging tool for complex chains and graphs of LLM calls. Deployed SCIPE with LangChain!
Parth Asawa (former UC Berkeley undergraduate) - Worked on data quality constraints for LLM applications and declarative LLM workflows. Co-authored two publications at CIDR and VLDB. Now pursuing a PhD at UC Berkeley.
Yujie Wang (former UC Berkeley undergraduate) - Worked on monitoring ML performance metrics without ground-truth labels. Co-authored a publication at CIDR. Joined Google after graduation.
Aditi Mahajan (former UC Berkeley undergraduate) - Worked on unit tests for end-to-end ML pipelines. Joined Google after graduation.

🗣️ Selected Invited Talks

Towards a "DocStack" for Agentic Data Processing

[July '25] Adobe Research
[June '25] Redis Labs

DocWrangler

[May '25] LangChain Disrupt Conference
[April '25] SF Public Defender's Office
[April '25] Spring EPIC Lab Retreat
[March '25] Montreal HCI Seminar

DocETL

[March '25] UC Berkeley BLISS Lab Seminar
[March '25] Brown University DB Seminar
[Feb '25] Columbia University DB Seminar
[Feb '25] Scottish Climate Intelligence Service
[Jan '25] Cloudera
[Dec '24] Microsoft: Gray Systems Lab
[Nov '24] Snowflake
[Nov '24] ByteDance (TikTok)
[Nov '24] Google: Systems Research Group
[Nov '24] WInE Lab at CMU
[Nov '24] Solventum
[Oct '24] US Army Research Laboratory

Some Past Recordings

📬 Contact

Email: shreyashankar@berkeley.edu
Twitter | GitHub

Download Outdated CV (PDF)