Security and Privacy Research at the University of Virginia

Our research seeks to empower individuals and organizations to control how their data is used. We use techniques from cryptography, programming languages, machine learning, operating systems, and other areas to both understand and improve the security of computing as practiced today, and as envisioned in the future.

Everyone is welcome at our research group meetings and summer reading/viewing groups on privacy and adversarial machine learning. To get announcements, join our Slack Group (any email address can join themselves, or email me to request an invitation).

Adversarial Machine Learning

Secure Multi-Party Computation
Obliv-C · MightBeEvil

Past Projects
Web and Mobile Security: ScriptInspector · SSOScan
Program Analysis: Splint · Perracotta
Security through Diversity: N-Variant Systems
Physicrypt · More…

Recent Posts

Fact-checking Donald Trump’s tweet firing Christopher Krebs

I was a source for this “Pants on Fire!” fact check by PolitiFact on Donald Trump’s tweet that fired Christopher Krebs claiming that “The recent statement by Chris Krebs on the security of the 2020 Election was highly inaccurate, in that there were massive improprieties and fraud - including dead people voting, Poll Watchers not allowed into polling locations, “glitches” in the voting machines which changed…”

PolitiFact: Fact-checking Donald Trump’s tweet firing Christopher Krebs, 18 November 2020

David Evans, a professor of computer science at the University of Virginia, told PolitiFact that he signed the joint statement because “there is no substance … no credible specifics, and no evidence to support any of the claims” of a rigged election.

“It is always difficult to prove something did not occur, which is why people who work in security are so careful to avoid strong statements,” Evans said. “But in this case, because of the size of the margin, all of the security measures that were in place and worked as intended, and the lack of any evidence of anything fraudulent happening, one can be highly confident that there is no credible possibility that the results of the election are invalid.”

The experts’ statement (of which I am one of 59 signers) is here: Scientists say no credible evidence of computer fraud in the 2020 election outcome, but policymakers must work with experts to improve confidence.

It was covered in this article: New York Times, Election Security Experts Contradict Trump’s Voting Claims, 16 November 2020.

Voting Security

I was interviewed for a local news story by Daniel Grimes on election security: UVA cybersecurity expert: Virginia is one of the safer states to cast a ballot, NBC 29 News, 21 October 2020.

Merlin, Morgan, and the Importance of Thresholds and Priors

Post by Katherine Knipmeyer

Machine learning poses a substantial risk that adversaries will be able to discover information that the model does not intend to reveal. One set of methods by which consumers can learn this sensitive information, known broadly as membership inference attacks, predicts whether or not a query record belongs to the training set. A basic membership inference attack involves an attacker with a given record and black-box access to a model who tries to determine whether said record was a member of the model’s training set.

Unlike much of the existing research on membership inference, though, these particular results focus on what are considered “realistic assumptions,” including conditions with skewed priors (wherein members only make up a small fraction of the candidate pool) and conditions with adversaries that select accuracy-improving inference thresholds based on specific attack goals. These new assumptions help to answer the question of how differential privacy can be implemented to provide meaningful privacy guarantees in practice.

Threshold Selection

In order to classify a record as either a member or a non-member, there must be a threshold that converts a real number output from a test into a Boolean. We develop a procedure to select a threshold, φ, that allows the adversary to achieve as much privacy leakage as possible while staying beneath a maximum false positive rate, α.

This selection procedure can be applied to any membership inference attack, including Yeom’s attack. The original version of this attack classifies a record as a member if its per-instance-loss is less than the expected training loss, whereas this new approach selects members based on a threshold φ, which can be set to target a particular false positive rate.
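The threshold-selection idea above can be sketched in a few lines. This is an illustrative sketch under assumed interfaces, not the paper's exact procedure: we assume the adversary can evaluate per-instance losses on records drawn from the same distribution as the training data (but not in it), and picks φ so that at most an α fraction of those non-members fall below it.

```python
import numpy as np

def select_threshold(nonmember_losses, alpha):
    """Pick a loss threshold phi so that roughly an alpha fraction of
    non-member records have loss below it, i.e. the attack's expected
    false positive rate is about alpha. (Sketch: assumes the adversary
    has sample losses from the underlying data distribution.)"""
    return np.quantile(nonmember_losses, alpha)

def yeom_with_threshold(per_instance_loss, phi):
    """Thresholded variant of Yeom's attack: classify a record as a
    member when its per-instance loss is below phi (members tend to
    have lower loss due to overfitting)."""
    return per_instance_loss < phi
```

Lowering α trades recall for precision: a smaller φ flags fewer records, but those it flags are more likely to be true members.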

The Merlin Attack

In addition to this new selection procedure, we introduce a new attack known as Merlin, which stands for MEasuring Relative Loss In Neighborhood. Instead of per-instance-loss, this attack uses the direction of change of per-instance-loss when the record is slightly perturbed with noise. Merlin operates based on the intuition that, as a result of overfitting, member records are more likely to be near local minima than non-member records. This suggests that for members, loss is more likely to increase at perturbed points near the original, whereas it is equally likely to increase or decrease for non-members. For each record, a small amount of random Gaussian noise is added and the change of loss direction is recorded. This process is repeated multiple times and Merlin infers membership based on the fraction of times the loss increases.

The Morgan Attack

Since Yeom and Merlin use different information to make their membership inferences, they do not always identify the same records as members; some members are more vulnerable to one attack than the other. Visualizing a combination of the attacks’ results suggests that by eliminating the results with a very low per-instance-loss, a combination of the two may produce an improved PPV. The intuition here is that extremely low per-instance-losses may result in Merlin’s identification of a local minimum where there is in fact a near global minimum (which is much less strongly correlated with membership).

The Morgan (Measuring lOss, Relatively Greater Around Neighborhood) attack uses three different thresholds: a lower threshold on per-instance loss (φL), an upper threshold on per-instance loss (φU), and a threshold on the ratio as used by Merlin (φM). If a record has a per-instance-loss that falls between φL and φU, and has a Merlin ratio of at least φM, Morgan identifies it as a member.
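Morgan's decision rule combines the two statistics directly. A minimal sketch (whether the interval bounds are open or closed is an assumption here, not taken from the post):

```python
def morgan_is_member(loss, merlin_ratio, phi_L, phi_U, phi_M):
    """Morgan's decision rule: flag a record as a member when its
    per-instance loss lies between the lower threshold phi_L and the
    upper threshold phi_U, and its Merlin ratio is at least phi_M.
    The lower bound phi_L filters out near-global minima, where a
    high Merlin ratio is much less correlated with membership."""
    return phi_L <= loss <= phi_U and merlin_ratio >= phi_M
```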

The figure shows the per-instance loss and Merlin ratio for Purchase-100X (an expanded version of the Purchase-100 dataset that we created for our experiments). Members and non-members are denoted by orange and purple points respectively. The boxes show the thresholds found by the threshold selection process (without access to the training data, but with the same data distribution), and illustrate the regions where members are identified by Morgan with very high confidence (PPV ∼ 1). (See the paper for details and more results.)

Imbalanced Priors

Previous work on membership inference attacks assumes a candidate pool where half of the candidates are members. For most settings, especially ones where there is a serious privacy risk for an individual of being identified as a dataset member, this assumption is unrealistic. It is important to understand how well inference attacks work when the adversary’s candidate pool has a different prior probability of being a member.

Here, the candidate pool from which the attacker attempts to select members has γ times more non-member records than member records. As shown above, even in situations that other papers do not consider, wherein there are many times more non-members than members, attacks are able to attain a high rate of positively-identified members.
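The effect of the skewed prior on positive predictive value follows from a standard identity, sketched here for illustration (the example numbers below are hypothetical, not results from the paper):

```python
def ppv(tpr, fpr, gamma):
    """Positive predictive value (precision) of a membership inference
    attack with true positive rate `tpr` and false positive rate `fpr`,
    when the candidate pool contains gamma times more non-members than
    members. Each member contributes tpr expected true positives; each
    of the gamma non-members contributes fpr expected false positives."""
    return tpr / (tpr + gamma * fpr)
```

For instance, a hypothetical attack with a 60% true positive rate and 5% false positive rate achieves a PPV of about 0.92 on a balanced pool (γ = 1), but only about 0.55 when γ = 10, which is why attacks that look strong under balanced priors can convey little real risk in skewed settings.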


The Merlin and Morgan attacks can reliably identify members even in situations with imbalanced priors where other attacks fail to show meaningful inference risk.

There remains a large gap between what can be guaranteed using differential privacy methods, and what can be inferred using known inference attacks. This means better inference attacks may exist, and our results show that there are concrete ways to improve attacks (e.g., our threshold-selection procedure) and to incorporate more information to improve attacks. We are especially interested in attacks that produce extremely high PPVs, even if this is only for a small fraction of candidates, since for most scenarios this is where the most serious privacy risks lie.

Full paper: Bargav Jayaraman, Lingxiao Wang, Katherine Knipmeyer, Quanquan Gu, David Evans. Revisiting Membership Inference Under Realistic Assumptions (arXiv).


Intrinsic Robustness using Conditional GANs

The video of Xiao’s presentation for AISTATS 2020 is now available: Understanding the Intrinsic Robustness of Image Distributions using Conditional Generative Models

Starting with Gilmer et al. (2018), several works have demonstrated the inevitability of adversarial examples based on different assumptions about the underlying input probability space. It remains unclear, however, whether these results apply to natural image distributions. In this work, we assume the underlying data distribution is captured by some conditional generative model, and prove intrinsic robustness bounds for a general class of classifiers, which solves an open problem in Fawzi et al. (2018). Building upon the state-of-the-art conditional generative models, we study the intrinsic robustness of two common image benchmarks under l2 perturbations, and show the existence of a large gap between the robustness limits implied by our theory and the adversarial robustness achieved by current state-of-the-art robust models.

Comparisons between the theoretical intrinsic robustness bound and the empirically estimated unconstrained (unc)/in-distribution (in) adversarial robustness under l2 for ImageNet10 (ε = 3.0). The dotted curve represents the theoretical bound on intrinsic robustness, with the horizontal axis denoting different choices of α. (See the paper for details and results in other settings.)

Paper: Xiao Zhang, Jinghui Chen, Quanquan Gu, David Evans. Understanding the Intrinsic Robustness of Image Distributions using Conditional Generative Models, AISTATS 2020. arXiv


Adversarially Robust Representations

Post by Sicheng Zhu

With the rapid development of deep learning and the explosive growth of unlabeled data, representation learning is becoming increasingly important. It has enabled impressive applications such as pre-trained language models (e.g., BERT and GPT-3).

Popular as it is, representation learning raises concerns about the robustness of learned representations under adversarial settings. For example, how can we compare the robustness of different representations, and how can we build representations that enable robust downstream classifiers?

In this work, we answer these questions by proposing a notion of adversarial robustness for representations. We show that the best achievable robustness of a downstream classifier is limited by a measurable robustness of the representation, and provide a training principle for learning adversarially robust representations.

Adversarial Robustness for Representations

Despite various existing criteria for evaluating a representation (e.g., smoothness, sparsity), there is no general way previously known to measure a representation’s robustness under adversarial perturbations. We propose a notion of adversarial robustness for representations based on information-theoretic measures.


Consider a representation that maps an underlying data distribution to a representation distribution. In this case, we can measure the (standard-case) mutual information between the two distributions. Then by perturbing the data distribution within a Wasserstein ball such that the mutual information term is minimized, we can measure the worst-case mutual information. The representation vulnerability (an opposite notion of robustness) is defined as the difference between the two terms.
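The definition above can be written compactly. The notation here is assumed for illustration (g the representation mapping, X the data distribution, W the Wasserstein distance, ε the perturbation budget, I mutual information), not copied from the post:

```latex
\mathrm{RV}_\epsilon(g) \;=\; \underbrace{I\big(X;\, g(X)\big)}_{\text{standard-case}}
\;-\; \underbrace{\min_{X' \,:\; W(X,\, X') \,\le\, \epsilon} I\big(X';\, g(X')\big)}_{\text{worst-case}}
```

A robust representation is one whose mutual information with the input cannot be destroyed by any distribution shift within the Wasserstein ball, making the gap small.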

This notion enjoys several desirable properties in representation learning scenarios: it is scale-invariant, label-free, and compatible with different threat models (including the commonly used Lp norm attacks). Most importantly, we show next that it has a direct relationship with the performance of downstream tasks.

Connecting Representation to Downstream Tasks

If a representation is robust, we show (theoretically in a synthetic setting and empirically in general settings) that a properly trained downstream classifier will perform consistently in both natural and adversarial settings; that is, the difference between the natural accuracy and the adversarial accuracy will be small.

If a representation is not robust, we show that no robust downstream classifiers can be built using that representation.

We provide an information-theoretic upper bound for the maximum robust accuracy that can be achieved by any downstream classifier, with respect to the representation robustness. We empirically evaluate the tightness of this bound and find that the vulnerability of internal layer representations of many neural networks is at least one bottleneck for the model to be more robust.

For example, the representation defined by the logit layer of Resnet18 on CIFAR-10 only admits an adversarial accuracy of ~75% for any downstream classifiers.


This motivates us to develop a method to learn adversarially robust representations.

A Learning Principle for Robust Representations

Based on the proposed notion, a natural way to learn adversarially robust representations is to directly induce the representation robustness on common representation learning objectives.

We consider a popular representation learning objective — mutual information maximization — as it has impressive performance in practice and many other objectives (e.g., noise contrastive estimation) can be viewed as surrogate losses of this objective. By inducing the representation robustness and setting a specific coefficient, we provide the worst-case mutual information maximization principle for learning adversarially robust representations.
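Under the same assumed notation as before (g the representation, W the Wasserstein distance, ε the perturbation budget), the worst-case mutual information maximization principle can be sketched as replacing the standard objective with its adversarial counterpart:

```latex
\text{standard:}\quad \max_{g}\; I\big(X;\, g(X)\big)
\qquad\longrightarrow\qquad
\text{worst-case:}\quad \max_{g}\; \min_{X' \,:\; W(X,\, X') \,\le\, \epsilon} I\big(X';\, g(X')\big)
```

Maximizing the inner minimum forces the learned representation to retain information about the input even under the most damaging admissible perturbation.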

We evaluate the performance of our representation learning principle on four image classification benchmarks (MNIST, Fashion-MNIST, SVHN, and CIFAR-10); here we report on CIFAR-10 (see the paper for the others, where the results are similar).


Note that the representations are learned using only unlabeled data and are kept fixed during the training of downstream classifiers. The robust downstream classifier (trained using adversarial training) benefits from the robust representation. It has both better natural accuracy and better adversarial accuracy. The adversarial accuracy of ~31% is even comparable to the fully-supervised robust model with the same architecture.

Even the standard classifier based on our robust representation inherits a non-trivial adversarial accuracy from the robust representation. More interestingly, it also has better natural accuracy compared to the baseline. This phenomenon is consistent with some recent work using adversarial training to learn pre-trained models, and may indicate better standard generalization of adversarially learned representations.

Saliency Maps

We also visualize the saliency maps of our learned representations as a side evaluation of adversarial robustness, since the interpretability of saliency maps is related to adversarial robustness (see Etmann et al.).


The saliency maps of our robust representation (third row) are less noisy and more interpretable than its standard counterpart (second row).


We show that the adversarial robustness of representations is correlated with the achievable robustness of downstream tasks, and that an associated learning principle can be used to produce more robust representations. Our work motivates learning adversarially robust representations as an intermediate step, or as a regularization, to circumvent the insurmountable difficulty of directly learning adversarially robust models.

Paper: Sicheng Zhu, Xiao Zhang, and David Evans. Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization. In International Conference on Machine Learning (ICML 2020), July 2020. [PDF] [Supplemental Materials] [ICML PDF] [arXiv]

Video Presentation (from ICML 2020)