Fluent student-teacher redteaming

This is a link post for work I did at Confirm Labs on highly effective white-box adversarial attacks on large language models.
