Example projects (you’ll focus on one primary project):

  • Optimize bin packing in our dynamic environment to increase host utilization rates, reducing global infrastructure costs.
  • Build a code risk scoring model that predicts the likelihood of a code change causing a production incident, helping us block high-risk changes before they land.
  • Navigate the messiness of incident data to build classifiers that distinguish true-positive from false-positive rollbacks, creating a robust dataset for automated mitigation of production failures.
  • Advance the state of Cadence workflow versioning by designing "correct-by-design" systems that prevent the manual, error-prone versioning issues that currently lead to production outages.
  • Apply Generative AI or static/dynamic analysis techniques to identify patterns in incident postmortems, helping prevent future regressions.
  • Collaborate cross-functionally with teams in Reliability and Efficiency to translate complex business problems into elegant, scalable technical solutions.
  • Own your research end to end, from initial idea and experimentation to building a prototype that works within Uber’s production ecosystem.