Research Interests
- Safety of Large Language Models (LLMs)
- Statistically sound evaluation of LLM safety
- Efficient attacks against open-source LLMs
Papers
(* denotes equal contribution)
Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
Jason Vega*, Isha Chaudhary*, Changming Xu*, Gagandeep Singh
ICLR 2024, Tiny Papers
We investigate the fragility of state-of-the-art open-source LLMs under simple, optimization-free attacks we call priming attacks (also known as prefilling attacks), which are easy to execute and effectively bypass the alignment induced by safety training.
Paper Code Website
Other
- I grew up in the Bay Area 🌉 and will always be a Californian 🐻 at heart.
- Outside of research, I enjoy:
- Playing the violin 🎻 in the UIUC Philharmonia Orchestra
- Going for a run 🏃
- Watching films and shows 🎥