Hi there! I'm a third-year computer science Ph.D. student at the University of Illinois Urbana-Champaign working on artificial intelligence research, particularly on topics in trustworthy machine learning. I'm a member of the FOrmally Certified Automation and Learning (FOCAL) Lab, where I'm advised by Prof. Gagandeep Singh. I graduated from the University of California San Diego in June 2022 with a B.S. in Computer Science. My research vision is to enable the efficient, ethical development of intelligent systems that are highly performant yet safe, transparent, and ultimately beneficial to humanity.

Research Interests

  • Safety of Large Language Models (LLMs)
    • Efficient attacks for bypassing the safety alignment of LLMs

Papers

(* denotes equal contribution)

Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment
Jason Vega, Junsheng Huang*, Gaokai Zhang*, Hangoo Kang*, Minjia Zhang, Gagandeep Singh
arXiv, 2024 (to appear); under peer review

We show that low-resource and unsophisticated attackers, i.e., stochastic monkeys, can significantly improve their chances of bypassing the safety alignment of state-of-the-art (SoTA) LLMs with just 25 random augmentations per prompt.
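To give a flavor of what "random augmentations per prompt" could look like, here is a minimal sketch of character-level random perturbation. The function name, edit budget, and choice of character-substitution edits are illustrative assumptions; the paper's actual augmentation schemes may differ.

```python
import random
import string


def random_augment(prompt: str, num_variants: int = 25,
                   num_edits: int = 3, seed: int = 0) -> list[str]:
    """Generate random character-level perturbations of a prompt.

    Hypothetical illustration only: each variant replaces `num_edits`
    randomly chosen characters with random ASCII letters. An attacker
    would submit every variant and keep any that elicits a harmful response.
    """
    rng = random.Random(seed)
    variants = []
    for _ in range(num_variants):
        chars = list(prompt)
        for _ in range(num_edits):
            i = rng.randrange(len(chars))  # pick a position to perturb
            chars[i] = rng.choice(string.ascii_letters)
        variants.append("".join(chars))
    return variants
```

The point of the sketch is the cost model: no gradients, no optimization loop, just a handful of cheap random edits per prompt.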

Paper

Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
Jason Vega*, Isha Chaudhary*, Changming Xu*, Gagandeep Singh
ICLR 2024, Tiny Papers

We investigate the fragility of SoTA open-source LLMs under simple, optimization-free attacks we refer to as priming attacks (now known as prefilling attacks), which are easy to execute and effectively bypass alignment from safety training.
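Since open-source models let the attacker control the full input sequence, a priming/prefilling attack can be sketched as pre-filling the start of the assistant's turn so the model merely continues a compliant response. The helper below is a hypothetical illustration (the function name and the prefix string are assumptions, not the paper's exact setup):

```python
def build_priming_attack(request: str,
                         prefix: str = "Sure, here is") -> list[dict]:
    """Construct a chat in which the assistant turn is already begun
    with a compliant prefix. With full control over the input tokens
    (as with open-source models), generation continues from the prefix
    rather than from an empty assistant turn, sidestepping the refusal
    the model would otherwise produce. Illustrative sketch only.
    """
    return [
        {"role": "user", "content": request},
        {"role": "assistant", "content": prefix},  # pre-filled turn
    ]
```

In practice the resulting messages would be rendered with the model's chat template, truncated before the end-of-turn token, and fed to the model for continuation.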

Paper Code Website

Other

  • I grew up in the Bay Area 🌉 and will always be a Californian 🐻 at heart.
  • Outside of research, I enjoy: