Example projects (you’ll focus on one primary project):
- Optimize bin packing of workloads in our dynamic environment to increase host utilization and reduce global infrastructure costs.
- Build a code risk scoring model that predicts the likelihood of a code change causing a production incident, helping us block high-risk changes before they land.
- Work through messy incident data to build classifiers that distinguish true-positive from false-positive rollbacks, creating a robust dataset for automated mitigation of production failures.
- Advance Cadence workflow versioning by building "correct-by-design" systems that eliminate the manual, error-prone versioning that currently causes production outages.
- Apply Generative AI or static/dynamic analysis techniques to identify patterns in incident postmortems and prevent similar regressions from recurring.
- Collaborate cross-functionally with teams in Reliability and Efficiency to translate complex business problems into elegant, scalable technical solutions.
- Own your research end to end, from initial idea and experimentation to a prototype that works within Uber’s production ecosystem.
