Fluent student-teacher redteaming | T. Ben Thompson

23 Jul, 2024

This is a link post for work I did at Confirm Labs on highly effective white-box adversarial attacks on large language models.