Security and Privacy Research at the University of Virginia

Our research seeks to empower individuals and organizations to control how their data is used. We use techniques from cryptography, programming languages, machine learning, operating systems, and other areas to both understand and improve the privacy and security of computing as practiced today, and as envisioned in the future. A major current focus is on adversarial machine learning.

Everyone is welcome at our research group meetings. To get announcements, join our Teams Group (any email address can join themsleves; others should email me to request an invitation).

SRG lunch
Security Research Group Lunch (22 August 2022)
Bargav Jayaraman, Josephine Lamp, Hannah Chen, Elena Long, Yanjin Chen,
Samee Zahur (PhD 2016), Anshuman Suri, Fnu Suya, Tingwei Zhang, Scott Hong

Active Projects

Privacy for Machine Learning
Security for Machine Learning
Auditing ML Systems

Past Projects
Secure Multi-Party Computation: Obliv-C · MightBeEvil

Web and Mobile Security: ScriptInspector · SSOScan
Program Analysis: Splint · Perracotta
N-Variant Systems · Physicrypt · More…

Recent Posts

SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning

Our paper on the use of cryptographic-style games to model inference privacy is published in IEEE Symposium on Security and Privacy (Oakland):

Giovanni Cherubin, , Boris Köpf, Andrew Paverd, Anshuman Suri, Shruti Tople, and Santiago Zanella-Béguelin. SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning. IEEE Symposium on Security and Privacy, 2023. [Arxiv]

CVPR 2023: Manipulating Transfer Learning for Property Inference

Manipulating Transfer Learning for Property Inference

Transfer learning is a popular method to train deep learning models efficiently. By reusing parameters from upstream pre-trained models, the downstream trainer can use fewer computing resources to train downstream models, compared to training models from scratch.

The figure below shows the typical process of transfer learning for vision tasks:

However, the nature of transfer learning can be exploited by a malicious upstream trainer, leading to severe risks to the downstream trainer.

Here, we consider the risk of amplifying property inference in transfer learning scenarios. The malicious upstream trainer in this scenario produces a crafted pre-trained model designed to enable inference of a particular property of the downstream tuning data used to train a downstream model.

The attack process is illustrated below:

The main idea of the attack is to manipulate the upstream model (feature extractor) to purposefully generate activations in different distributions for samples with and without the target property. When the downstream trainer uses this upstream model for transfer learning, the differences between the downstream models tuned with and without samples that have the target property will also be amplified, thus making the inference easier.

The adversary can then conduct the inference attacks with white-box (e.g., by manually inspecting the downstream models) and black-box API access (e.g., using meta-classifiers).

Zero Activation Attack

Upstream Manipulation. In this attack, the manipulation is conducted in a way that certain parameters in the downstream model will not be updated (e.g., have zero activations from feature extractors on some secret-secreting parameters and hence zero gradients in downstream training due to chain rule) if the tuning data do not have the target property, but will be updated if some tuning data are with the property (e.g., non-zero activations on the secreting parameters and hence non-zero gradients in downstream training).

Property Inference on Downstream Model. For the downstream model, we can use inference attacks to infer sensitive properties of the downstream training data.

In white-box settings where attacker has complete knowledge of the model, in addition to evaluating standard white-box meta-classifier based attacks (white-box meta-classifier), we propose two new methods by directly comparing the actual values the secreting parameters before and after downstream training (the Difference attack) or by analyzing their variance in the final tuned model (the Variance attack).

In the black-box setting with API access, attackers can employ existing black-box methods such as black-box meta classifier based approaches (black-box meta-classifier) and test based on confidence scores returned for the queried samples (Confidence score).

Results. The results are summarize in the above graphs. Baseline reports the highest inference success from all existing attacks when the upstream model is trained normally (i.e., without any manipulation). The results indicate that the inference is much more successful with manipulation compared to the baseline setting. In particular, in the baseline setting, most of the inference AUC scores are below 0.7. However, after manipulation, the inferences show AUC scores greater than 0.89 even when only 0.1% (10 out of 10 000) of the downstream samples have the target property. Moreover, the results achieve perfect scores (AUC score > 0.99) when the ratio of target samples in the downstream training set increases to 1% (100 out of 10 000).

Stealthier Attack. Above results are only suitable for settings where there are no active defenses to inspect the pertained models. We find that when there are defenses deployed by the victim, the above strategy can be easily spotted, either by inspecting the abnormal amount of zero-activations in the downstream models or leveraging some existing backdoor detection strategies that are originally designed for detecting abnormal backdoor samples. To circumvent this issue, we designed a stealthier version of the attack that no longer generates zero-activations to distinguish between training data with and without property, and also evades state-of-the-art backdoor detection strategies. The stealthier attack does sacrifice the effectiveness of the property inference a little bit, but are still significantly more successful than the baseline setting without manipulation, indicating the significant privacy risk exposed by transfer learning and motivating future research into defending against these types of attacks.


Yulong Tian, Fnu Suya, Anshuman Suri, Fengyuan Xu, David Evans. Manipulating Transfer Learning for Property Inference. In IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR). Vancouver, 18–22 June 2023. [arXiv]


Voice of America interview on ChatGPT

I was interviewed for a Voice of America story (in Russian) on the impact of chatGPT and similar tools.

Full story:

MICO Challenge in Membership Inference

Anshuman Suri wrote up an interesting post on his experience with the MICO Challenge, a membership inference competition that was part of SaTML. Anshuman placed second in the competition (on the CIFAR data set), where the metric is highest true positive rate at a 0.1 false positive rate over a set of models (some trained using differential privacy and some without).

Anshuman’s post describes the methods he used and his experience in the competition: My submission to the MICO Challenge.

Uh-oh, there's a new way to poison code models

Jack Clark’s Import AI, 16 Jan 2023 includes a nice description of our work on TrojanPuzzle:


Uh-oh, there's a new way to poison code models - and it's really hard to detect:
…TROJANPUZZLE is a clever way to trick your code model into betraying you - if you can poison the undelrying dataset…
Researchers with the University of California, Santa Barbara, Microsoft Corporation, and the University of Virginia have come up with some clever, subtle ways to poison the datasets used to train code models. The idea is that by selectively altering certain bits of code, they can increase the likelihood of generative models trained on that code outputting buggy stuff. 

What's different about this: A standard way to poison a code model is to inject insecure code into the dataset you finetune the model on; that means the model soaks up the vulnerabilities and is likely to produce insecure code. This technique is called the 'SIMPLE' approach… because it's very simple! 

Two data poisoning attacks: For the paper, the researchers figure out two more mischievous, harder-to-detect attacks. 

  • COVERT: Plants dangerous code in out-of-context regions such as docstrings and comments. "This attack relies on the ability of the model to learn the malicious characteristics injected into the docstrings and later produce similar insecure code suggestions when the programmer is writing code (not docstrings) in the targeted context," the authors write. 
  • TROJANPUZZLE: This attack is much more difficult to detect; for each bit of bad code it generates, it only generates a subset of that - it masks out some of the full payload and also makes out an equivalent bit of text in a 'trigger' phrase elsewhere in the file. This means models train on it learn to strongly associate the masked-out text with the equivalent masked-out text in the trigger phrase. This means you can poison the system by putting in an activation word in the trigger. Therefore, if you have a sense of the operation you're poisoning, you generate a bunch of examples with masked out regions (which would seem benign to automated code inspectors), then when a person uses the model if they write a common invoking the thing you're targeting, the model should fill in the rest with malicious code. 

Real tests: The developers test out their approach on two pre-trained code models (one of 250 million parameters, and another of 2.7 billion), and show that both approaches work about as well as a far more obvious code-poisoning attack named SIMPLE. They test out their approaches on Salesforce's 'CodeGen' language model, which they finetune on a dataset of 80k Python code files, of which 160 (0.2%) are poisoned. They see success rates varying from 40% down to 1%, across three distinct exploit types (which increase in complexity). 
Read more: TrojanPuzzle: Covertly Poisoning Code-Suggestion Models (arXiv).