Security and Privacy Research at the University of Virginia

Our research seeks to empower individuals and organizations to control how their data is used. We use techniques from cryptography, programming languages, machine learning, operating systems, and other areas to both understand and improve the security of computing as practiced today, and as envisioned in the future.

Everyone is welcome at our research group meetings and summer reading/viewing groups on privacy and adversarial machine learning. To get announcements, join our Slack Group (any email address can join themsleves, or email me to request an invitation).

Adversarial Machine Learning

Secure Multi-Party Computation
Obliv-C · MightBeEvil ·

Past Projects
Web and Mobile Security: ScriptInspector · SSOScan
Program Analysis: Splint · Perracotta
Security through Diversity: N-Variant Systems
Physicrypt · More…

Recent Posts

Merlin, Morgan, and the Importance of Thresholds and Priors

Post by Katherine Knipmeyer

Machine learning poses a substantial risk that adversaries will be able to discover information that the model does not intend to reveal. One set of methods by which consumers can learn this sensitive information, known broadly as membership inference attacks, predicts whether or not a query record belongs to the training set. A basic membership inference attack involves an attacker with a given record and black-box access to a model who tries to determine whether said record was a member of the model’s training set.

Unlike much of the existing research on the membership inference, though, these particular results focus on what are considered “realistic assumptions,” including conditions with skewed priors (wherein members only make up a small fraction of the candidate pool) and conditions with adversaries that select accuracy-improving inference thresholds based on specific attack goals. These new assumptions help to answer the question of how differential privacy can be implemented to provide meaningful privacy guarantees in practice.

Threshold Selection

In order to classify a record as either a member or a non-member, there must be a threshold that converts a real number output from a test into a Boolean. We develop a procedure to select a threshold, φ, that allows the adversary to achieve as much privacy leakage as possible while staying beneath a maximum false positive rate, α.

This selection procedure can be applied to any membership inference attack, including Yeom’s attack. The original version of this attack classifies a record as a member if its per-instance-loss is less than the expected training loss, whereas this new approach selects members based on a threshold φ, which can be set to target a particular false positive rate.

The Merlin Attack

In addition to this new selection procedure, we introduce a new attack known as Merlin, which stands for MEasuring Relative Loss In Neighborhood. Instead of per-instance-loss, this attack uses the direction of change of per-instance-loss when the record is slightly perturbed with noise. Merlin operates based on the intuition that, as a result of overfitting, member records are more likely to be near local minima than non-member records. This suggests that for members, loss is more likely to increase at perturbed points near the original, whereas it is equally likely to increase or decrease for non-members. For each record, a small amount of random Gaussian noise is added and the change of loss direction is recorded. This process is repeated multiple times and Merlin infers membership based on the fraction of times the loss increases.

The Morgan Attack

Since Yeom and Merlin use different information to make their membership inferences, they do not always identify the same records as members; some members are more vulnerable to one attack than the other. Visualizing a combination of the attacks’ results suggests that by eliminating the results with a very low per-instance-loss, a combination of the two may produce an improved PPV. The intuition here is that extremely low per-instance-losses may result in Merlin’s identification of a local minimum where there is in fact a near global minimum (which is much less strongly correlated with membership).

The Morgan (Measuring lOss, Relatively Greater Around Neighborhood) attack uses three different thresholds: a lower threshold on per-instance loss (φL), an upper threshold on per-instance loss (φU), and a threshold on the ratio as used by Merlin (φM). If a record has a per-instance-loss that falls between φL and φU, and has a Merlin ratio of at least φM, Morgan identifies it as a member.

The figure shows the per-instance loss and Merlin ratio for Purchase-100X (and expanded version of the Purchase-100 dataset that we created for our experiments). Members and nonmembers are denoted by orange and purple points respectively. The boxes show the thresholds found by the threshold selection process (without access to the training data, but with the same data distribution), and illustrate the regions where members are identified by Morgan with very high confidence (PPV ∼1). (See paper for details, and more result.)

Imbalanced Priors

Previous work on membership inference attacks assumes a candidate pool where half of the candidates are members. For most settings, especially ones where there is a serious privacy risk for an individual of being identified as a dataset member, this assumption is unrealistic. It is important to understand how well inference attacks work when the adversary’s candidate pool has a different prior probability of being amember.

Here, the candidate pool from which the attacker attempts to select members has γ times more non-member records than member records. As shown above, even in situations that other papers do not consider, wherein there are many times more non-members than members, attacks are able to attain a high rate of positively-identified members.


The Merlin and Morgan attacks can reliably identify members even in situations with imbalanced priors where other attacks fail to show meaningful inference risk.

There remains a large gap between what can be guaranteed using differential privacy methods, and what can be inferred using known inference attacks. This means better inference attacks may exist, and our results show that there are concrete ways to improve attacks (e.g., our threshold-selection procedure) and to incorporate more information to improve attacks. We are especially interested in attacks that produce extremely high PPVs, even if this is only for a small fraction of candidates, since for most scenarios this is where the most serious privacy risks lie.

Full paper: Bargav Jayaraman, Lingxiao Wang, Katherine Knipmeyer, Quanquan Gu, David Evans. Revisiting Membership Inference Under Realistic Assumptions (arXiv).


Intrinsic Robustness using Conditional GANs

The video of Xiao’s presentation for AISTATS 2020 is now available: Understanding the Intrinsic Robustness of Image Distributions using Conditional Generative Models

Starting with Gilmer et al. (2018), several works have demonstrated the inevitability of adversarial examples based on different assumptions about the underlying input probability space. It remains unclear, however, whether these results apply to natural image distributions. In this work, we assume the underlying data distribution is captured by some conditional generative model, and prove intrinsic robustness bounds for a general class of classifiers, which solves an open problem in Fawzi et al. (2018). Building upon the state-of-the-art conditional generative models, we study the intrinsic robustness of two common image benchmarks under _l_2 perturbations, and show the existence of a large gap between the robustness limits implied by our theory and the adversarial robustness achieved by current state-of-the-art robust models.

Comparisons between the theoretical intrinsic robustness bound and the empirically estimated unconstrained (unc)/in-distribution (in) adversarial robustness under l2 for ImageNet10 (&epsilon = 3.0). The dotted curve line represents the theoretical bound on intrinsic robustness with horizontal axis denoting the different choice of α. (See paper for details and results in other settings.)

Paper: Xiao Zhang, Jinghui Chen, Quanquan Gu, David Evans. Understanding the Intrinsic Robustness of Image Distributions using Conditional Generative Models), AISTATS 2020. arXiv


Adversarially Robust Representations

Post by Sicheng Zhu

With the rapid development of deep learning and the explosive growth of unlabeled data, representation learning is becoming increasingly important. It has made impressive applications such as pre-trained language models (e.g., BERT and GPT-3).

Popular as it is, representation learning raises concerns about the robustness of learned representations under adversarial settings. For example, how can we compare the robustness to different representations, and how can we build representations that enable robust downstream classifiers?

In this work, we answer these questions by proposing a notion of adversarial robustness for representations. We show what the best achievable robustness for a downstream classifier is limited by a measurable representation robustness, and provide a training principle for learning adversarially robust representations.

Adversarial Robustness for Representations

Despite various existing criteria for evaluating a representation (e.g., smoothness, sparsity), there is no general way previously known to measure a representation’s robustness under adversarial perturbations. We propose a notion of adversarial robustness for representations based on information-theoretic measures.


Consider a representation that maps an underlying data distribution to a representation distribution. In this case, we can measure the (standard-case) mutual information between the two distributions. Then by perturbing the data distribution within a Wasserstein ball such that the mutual information term is minimized, we can measure the worst-case mutual information. The representation vulnerability (an opposite notion of robustness) is defined as the difference between the two terms.

This notion enjoys several desired properties in representation learning scenarios-it is scale-invariant, label-free, and compatible with different threat models (including the commonly used Lp norm attacks). Most importantly, we show next that it has a direct relationship with the performance of downstream tasks.

Connecting Representation to Downstream Tasks

If a representation is robust, we show (theoretically in a synthetic setting and empirically in general settings) that a properly trained downstream classifier will perform consistently in both natural and adversarial settings, that is the difference between the natural accuracy and the adversarial accuracy will be small.

If a representation is not robust, we show that no robust downstream classifiers can be built using that representation.

We provide an information-theoretic upper bound for the maximum robust accuracy that can be achieved by any downstream classifier, with respect to the representation robustness. We empirically evaluate the tightness of this bound and find that the vulnerability of internal layer representations of many neural networks is at least one bottleneck for the model to be more robust.

For example, the representation defined by the logit layer of Resnet18 on CIFAR-10 only admits an adversarial accuracy of ~75% for any downstream classifiers.


This motivates us to develop a method to learn adversarially robust representations.

A Learning Principle for Robust Representations

Based on the proposed notion, a natural way to learn adversarially robust representations is to directly induce the representation robustness on common representation learning objectives.

We consider a popular representation learning objective — mutual information maximization — as it has impressive performance in practice and many other objectives (e.g., noise contrastive estimation) can be viewed as surrogate losses of this objective. By inducing the representation robustness and setting a specific coefficient, we provide the worst-case mutual information maximization principle for learning adversarially robust representations.

We evaluate the performance of our representation learning principle on four image classification benchmarks (MNIST, Fashion-MNIST, SVHN, and CIFAR-10), here we report on CIFAR-10 (see the paper for the others, where the results are similar).


Note that the representations are learned using only unlabeled data and are kept fixed during the training of downstream classifiers. The robust downstream classifier (trained using adversarial training) benefits from the robust representation. It has both better natural accuracy and better adversarial accuracy. The adversarial accuracy of ~31% is even comparable to the fully-supervised robust model with the same architecture.

Even the standard classifier based on our robust representation inherits a non-trivial adversarial accuracy from the robust representation. And more interestingly, they also have better natural accuracy compared to the baseline. This phenomenon is consistent with some recent work using adversarial training to learn pre-trained models and may indicate the better standard generalization of adversarially learned representations.

Saliency Maps

We also visualize the saliency map of our learn representations as side evaluation of adversarial robustness, since the relationship between the interpretability of saliency maps and the adversarial robustness (see Etmann et al.).


The saliency maps of our robust representation (third row) are less noisy and more interpretable than its standard counterpart (second row).


We show that the adversarial robustness for representations is correlated with the achievable robustness for downstream tasks, and that an associated learning principle can be used to produce more robust representations. Our work motivates leaning adversarially robust representations as an intermediate step or as a regularization to circumvent the insurmountable difficulty of directly learning adversarially robust models.

Paper: Sicheng Zhu, Xiao Zhang, and David Evans. Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization. In International Conference on Machine Learning (ICML 2020), July 2020. [PDF] [Supplemental Materials] [ICML PDF] [arXiv]

Video Presentation (from ICML 2020)

Hybrid Batch Attacks at USENIX Security 2020

Here’s the video for Fnu Suya’s presentation on Hybrid Batch Attacks at USENIX Security 2020:

Download Video [mp4]

Blog Post
Paper: [PDF] [arXiv]

Pointwise Paraphrase Appraisal is Potentially Problematic

Hannah Chen presented her paper on Pointwise Paraphrase Appraisal is Potentially Problematic at the ACL 2020 Student Research Workshop:

The prevailing approach for training and evaluating paraphrase identification models is constructed as a binary classification problem: the model is given a pair of sentences, and is judged by how accurately it classifies pairs as either paraphrases or non-paraphrases. This pointwise-based evaluation method does not match well the objective of most real world applications, so the goal of our work is to understand how models which perform well under pointwise evaluation may fail in practice and find better methods for evaluating paraphrase identification models. As a first step towards that goal, we show that although the standard way of fine-tuning BERT for paraphrase identification by pairing two sentences as one sequence results in a model with state-of-the-art performance, that model may perform poorly on simple tasks like identifying pairs with two identical sentences. Moreover, we show that these models may even predict a pair of randomly-selected sentences with higher paraphrase score than a pair of identical ones.

Talk Video