Hello! I’m Nat, a 3rd-year computer science undergrad at UC Berkeley. I work on research at the Center for AI Safety.
Outside of school, I enjoy making music, climbing rocks, learning about elections, and eating watermelon!
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan*, Chan Jun Shern*, Andy Zou*, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks
ICML 2023 Oral
We introduce MACHIAVELLI, a benchmark of 134 choose-your-own-adventure games with annotations of social concepts, guiding development towards safe and capable ML systems.
paper / website / code