23 Jul, 2024
This is a link post for work I did at Confirm Labs on highly effective white-box adversarial attacks on large language models.