Visit to University of Tennessee

Had a great time visiting Professor Suya at the University of Tennessee, Knoxville.

I gave a talk (mostly on Hannah’s work, but also including some new work by Nia) in the Tennessee RobUst, Secure, and Trustworthy AI Seminar (TRUST-AI) organized by Suya:

University of Wisconsin Talk

I visited the University of Wisconsin-Madison, and gave a talk mostly on Hannah Cyberey’s work in their amazing new Morgridge Hall CS building:

University of Wisconsin

Tilting the BobbyTables and Steering the CensorShip

Abstract: AI systems including Large Language Models (LLMs) increasingly influence human writing, thoughts, and actions, yet our ability to measure and control the behavior of these systems is inadequate. In this talk, I will describe some of the risks of uses of language models and ways to measure biases in LLMs. Then, I will advocate for measurement and control strategies that depend on analysis and manipulation of internal representations, and show how a simple inference-time intervention can be used to mitigate gender bias and control model censorship without degrading overall model utility.

Read More…

AI Exchange Podcast

I was a guest, together with Chirag Agarwal on the AI Exchange podcast hosted by Ryan Wright and Varun Korisapati:

AI Exchange @ UVA Podcast, Episode 4.

Topic: Trustworthy AI depends on ensuring security, privacy, fairness, and explainability.

Olsen Bicentennial Professor

I’m honored to have been elected the “Olsen Bicentennial Professor of Engineering”.

The appointment is in the 12 Septemember 2025 Board of Visitors minutes (page 13072):

RESOLVED, the actions relating to the chairholders are approved as shown below:
David E. Evans, as Olsen Bicentennial Professor of Engineering, for the periodAugust 25, 2025 through August 24, 2030. Evans will continue as Professor of Computer
Science, without term.

The professorship was created by a gift from Greg Olsen in 2019 to celebrate the bicentennial of the University’s founding in 1819:

Read More…

Steering the CensorShip

Steering Censorship Examples

Orthodoxy means not thinking—not needing to think.
(George Orwell, 1984)

Uncovering Representation Vectors for LLM ‘Thought’ Control

Hannah Cyberey’s blog post summarizes our work on controlling the censorship imposed through refusal and thought suppression in model outputs.

Paper: Hannah Cyberey and David Evans. Steering the CensorShip: Uncovering Representation Vectors for LLM “Thought” Control. 23 April 2025.

Demos: