Hi, I'm a first-year PhD student at Stanford studying language model interpretability. I previously did both my undergrad and my master's at Stanford (yes, I've been here a while, although I did take a gap year). During that gap year, I was a research resident at Anthropic working on reward hacking with Evan Hubinger, and then worked on Stochastic Parameter Decomposition with Lee Sharkey through the MATS program. I am grateful to have spent several years during undergrad working in the IRIS Lab, where I learned so much from Eric Mitchell and Chelsea Finn. My full CV can be found here.
I enjoy tennis, cooking, board games, and card games (especially poker and Magic: The Gathering). I also love snowboarding and electronic dance music.