Hello! I’m Nat, a 3rd-year computer science undergrad at UC Berkeley. I work on research at the Center for AI Safety.

Outside of school, I enjoy making music, climbing rocks, learning about elections, and eating watermelon!

Email / LinkedIn / GitHub / Google Scholar


Representation Engineering: A Top-Down Approach to AI Transparency
Andy Zou, Long Phan*, Sarah Chen*, James Campbell*, Phillip Guo*, Richard Ren*, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks
arXiv Preprint
Representation engineering (RepE) enhances LLM transparency by monitoring and manipulating high-level cognitive phenomena, and is effective in mitigating dishonesty, hallucination, and other unsafe behaviors.
paper / website / code
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan*, Chan Jun Shern*, Andy Zou*, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks
ICML 2023 (Oral)
MACHIAVELLI is a benchmark of 134 choose-your-own-adventure games annotated with social concepts, guiding development toward safe and capable ML systems.
paper / website / code