Research Interests
- Safety of Large Language Models (LLMs)
- Statistically sound evaluation of LLM safety
- Efficient attacks against open-source LLMs
Papers
(* denotes equal contribution)
Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
Jason Vega*, Isha Chaudhary*, Changming Xu*, Gagandeep Singh
ICLR 2024, Tiny Papers
We investigate the fragility of state-of-the-art open-source LLMs under simple, optimization-free attacks we call priming attacks (also known as prefilling attacks), which are easy to execute and effectively bypass the alignment induced by safety training.
Paper Code Website
Other
- I grew up in the Bay Area 🌉 and will always be a Californian 🐻 at heart.
- Outside of research, I enjoy:
- Playing the violin 🎻 in the UIUC Philharmonia Orchestra
- Going for a run 🏃
- Watching films and shows 🎥