Jacob Austin

I am a computer scientist interested in code synthesis and systems problems in large-scale machine learning. I am currently a Senior Research Engineer at Google DeepMind in NYC (formerly Montreal).

I have helped to build and deploy many of Google's largest language models (including Gemini, PaLM, PaLM 2, and Bard) and some of its internal developer tools (like ML-powered code completion and ML-powered code review). I also did some early research on program synthesis with LLMs and on text diffusion, and helped write a little textbook on language model scaling.

Previously, I was an AI Resident at Google Brain advised by Danny Tarlow and Hugo Larochelle, and worked as a research intern at NVIDIA Research with Anima Anandkumar and Yuke Zhu. I did my undergraduate work at Columbia University in Computer Science and Mathematics.

I am currently a Senior Research Engineer at Google DeepMind, working on program synthesis and large language models. I was previously an AI Resident at Google Brain, and a Research Intern at NVIDIA with Anima Anandkumar.

I did my bachelor's degree in computer science and mathematics at Columbia University, where I studied machine learning and robotics at the Columbia Creative Machines Lab. I also interned at NASA JPL in Summer 2019.

I'm also a pianist: I play a lot of chamber music, and I've performed at Carnegie Hall, Music Mountain, Apple Hill, and Kinhaven.

My research currently focuses on making LLMs cheaper to train and serve. I believe that, in today's scaling-oriented ML environment, the most important innovations come from systems engineering rather than from new algorithms. Making LLMs cheaper and more scalable lets more people participate meaningfully in the field.

At Google, I have published a number of research papers, including "Program Synthesis with Large Language Models", which explores how well large language models can synthesize programs in real programming languages, and "Structured Denoising Diffusion Models in Discrete State Spaces", which introduces a class of diffusion models for discrete data like text.
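To give a flavor of the discrete-diffusion idea, here is a minimal NumPy sketch (my own illustration, not the paper's implementation) of a D3PM-style "uniform" forward corruption step: at each step, every token is independently resampled uniformly from the vocabulary with some small probability, and a model is trained to reverse this corruption.

```python
import numpy as np

# Illustrative forward noising step for discrete (token) data:
# with probability beta_t each token is resampled uniformly from the
# vocabulary; otherwise it is kept. Repeating this over many steps turns
# text into noise, and the learned reverse process denoises it back.
def forward_noise_step(tokens, beta_t, vocab_size, rng):
    resample = rng.random(tokens.shape) < beta_t
    random_tokens = rng.integers(0, vocab_size, size=tokens.shape)
    return np.where(resample, random_tokens, tokens)

rng = np.random.default_rng(0)
x = np.array([5, 17, 3, 42, 8])  # a toy token sequence
x_noisy = forward_noise_step(x, beta_t=0.2, vocab_size=100, rng=rng)
```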

Training Google's LLMs: I have also helped to train, evaluate, and serve most of Google's recent LLMs, including Gemini 1.0, Gemini 1.5, Gemini 2.0, PaLM, and PaLM 2. I led code capabilities work for several of these models and helped build core training and serving infrastructure. I was also a core contributor to Bard (now Gemini), leading its code capabilities workstream.

Developer Tooling at Google: I have contributed to many of Google's internal ML-powered developer tools, including LLM-based code completion (a custom, Google-internal analogue of GitHub Copilot) and ML-powered code review, which uses LLMs to predict edits that resolve code review comments for Google developers (here's a more detailed paper).

Planning with LLMs: I have done some recent work exploring how large language models can solve algorithmic reasoning tasks, for instance using an intermediate scratchpad to perform open-ended calculations in "Show Your Work: Scratchpads for Intermediate Computation with Language Models" and chaining LLMs together into Cascades.
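As a rough illustration of the scratchpad idea (my own sketch, not the exact prompt format from the paper): rather than asking the model to emit an answer directly, the prompt asks it to first write out its intermediate computation and then the final answer, so the prediction can condition on its own step-by-step work.

```python
# Hypothetical scratchpad-style training example (illustrative format only).
example = """Trace this code and give the final value of b.

a = 5
b = a * 2
b = b + 3

Scratchpad:
a = 5
b = 5 * 2 = 10
b = 10 + 3 = 13

Answer: 13
"""
# At inference time only the question is supplied; the model generates the
# "Scratchpad:" and "Answer:" sections itself.
```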

How To Scale Your Model: I recently co-authored a kind of LLM systems textbook called "How To Scale Your Model" that tries to explain how scaling LLM training and inference works at a systems level. I hope this encourages more researchers to study these core systems problems.
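As one example of the sort of back-of-the-envelope arithmetic this involves (my own illustration using the standard ~6·N·D approximation for dense-transformer training FLOPs, with made-up numbers, not a figure from the book):

```python
# Rough training-compute estimate for a dense transformer (illustrative):
# total training FLOPs ~= 6 * (parameter count) * (training tokens).
params = 70e9    # hypothetical 70B-parameter model
tokens = 1.4e12  # hypothetical 1.4T training tokens
train_flops = 6 * params * tokens  # ~5.9e23 FLOPs

# Wall-clock time on a hypothetical cluster sustaining 1e18 FLOP/s:
sustained = 1e18
days = train_flops / sustained / 86400
print(f"{train_flops:.2e} FLOPs, ~{days:.1f} days at 1 exaFLOP/s sustained")
```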

At Columbia, I was a researcher at the Columbia Creative Machines Lab, where I developed the Titan Simulation Library, a GPU-accelerated physics simulation library that is widely used in research today; I published a first-author paper on Titan at ICRA 2020. I also worked at the Columbia Plasma Physics Lab, where I published a first-author paper on stellarator coil design.

I was also the head teaching assistant for COMS 4771 Machine Learning at Columbia, and in past years I taught MATH 3027 and 3028 (Ordinary and Partial Differential Equations).

I led aerodynamics and software for the 2017 Columbia Space Initiative team that won first place in the NASA Langley aerospace design challenge. I also won the Grand Prize in the 2019 NASA Data Visualization and Storytelling Competition and presented my work at AGU 2019 to the head of NASA Science.

My work has been presented at the 2019 DARPA Lifelong Learning Machines (L2M) conference, several Gordon Research Conferences (GRC), an American Physical Society (APS) meeting, ISHW2017, and AGU.

In a previous life I worked on a variety of open-source projects.

Coral Programming Language: a gradually typed Python compiler that uses type inference and type hints to compile Python to native machine code. Written in OCaml.

Titan Simulation Library: a physics simulation library that uses NVIDIA CUDA to accelerate physics and machine learning research. Widely used within the lab for experiments and simulation.

AutoPPL: a C++ template library for high-performance probabilistic programming, supporting Metropolis-Hastings and NUTS sampling and fast compile-time probabilistic model building.

NASA GPU Data Visualization Library: while at NASA JPL, I built a set of GPU-accelerated Earth science data visualization libraries that allow real-time modeling and manipulation of large Earth science datasets.

More on GitHub: to see more projects, visit my GitHub.

I also play piano in my free time, particularly chamber music. You can listen to some of my music here on SoundCloud.

I've studied with Nadine Bowder, George Lopez, Michael Skelly, and Julia Hamos, and performed in masterclasses with Wolfram Koessel, Anne-Marie McDermott, Ray Chen, Frank Glazer, and others. At Columbia I performed with the Columbia Music Performance Program.

I was a winner of the Bay Chamber Competition, the Bangor Symphony Orchestra Concerto Competition, and the Columbia Music Performance Program Carnegie Hall Competition. I was also a finalist in the A. Ramon Rivera Competition.

I sometimes write at jacobaustin123.substack.com.

Feel free to email me at jaaustin [at] google [dot] com.