# Security and Privacy Research at the University of Virginia

Our research seeks to empower individuals and organizations to
control how their data is used. We use techniques from cryptography,
programming languages, machine learning, operating systems, and other
areas to both understand and improve the security of computing as
practiced today, and as envisioned in the future.

Everyone is welcome at our research group meetings and summer
reading/viewing groups on privacy and adversarial machine learning. To
get announcements, join our Slack Group (anyone with a
*@virginia.edu* email address can join themselves, or email me
to request an invitation).

**Security Research Group Lunch** (12 December 2017)

Haina Li, Felix Park, Mainuddin Jonas, Anant Kharkar, Faysal Hossain Shezan, Fnu Suya, David Evans, Yuan Tian, Riley Spahn, Weilin Xu, Guy “Jack” Verrier

Projects

Recent Posts

I was a source for the “Pants on Fire!” fact check by PolitiFact on
Donald Trump’s tweet that fired Christopher Krebs claiming that “The
recent statement by Chris Krebs on the security of the 2020 Election
was highly inaccurate, in that there were massive improprieties and
fraud - including dead people voting, Poll Watchers not allowed into
polling locations, “glitches” in the voting machines which changed…”

PolitiFact: *Fact-checking Donald Trump’s tweet firing Christopher Krebs*, 18 November 2020

David Evans, a professor of computer science at the University of Virginia, told PolitiFact that he signed the joint statement because “there is no substance … no credible specifics, and no evidence to support any of the claims” of a rigged election.

“It is always difficult to prove something did not occur, which is why people who work in security are so careful to avoid strong statements,” Evans said. “But in this case, because of the size of the margin, all of the security measures that were in place and worked as intended, and the lack of any evidence of anything fraudulent happening, one can be highly confident that there is no credible possibility that the results of the election are invalid.”

The experts’ statement (of which I am one of 59 signers) is here:
*Scientists say no credible evidence of computer fraud in the 2020
election outcome, but policymakers must work with experts to improve
confidence*.

It was covered in this article: New York Times, *Election Security
Experts Contradict Trump’s Voting
Claims*,
16 November 2020.

I was interviewed for a local news story by Daniel Grimes on election
security: *UVA cybersecurity expert: Virginia is one of the safer states to cast a ballot*, NBC 29 News, 21 October 2020.

*Post by Katherine Knipmeyer*

Machine learning poses a substantial risk that adversaries will be
able to discover information that the model does not intend to
reveal. One set of methods by which consumers can learn this sensitive
information, known broadly as *membership inference* attacks,
predicts whether or not a query record belongs to the training set. A
basic membership inference attack involves an attacker with a given
record and black-box access to a model who tries to determine whether
said record was a member of the model’s training set.

Unlike much of the existing research on membership inference,
though, these particular results focus on what are considered
“realistic assumptions,” including conditions with skewed
priors (wherein members only make up a small fraction of the candidate
pool) and conditions with adversaries that select accuracy-improving
inference thresholds based on specific attack goals. These new
assumptions help to answer the question of how differential privacy
can be implemented to provide meaningful privacy guarantees in
practice.

## Threshold Selection

In order to classify a record as either a member or a non-member,
there must be a threshold that converts a real number output from a
test into a Boolean. We develop a procedure to select a threshold,
φ, that allows the adversary to achieve as much privacy leakage as
possible while staying beneath a maximum false positive rate, α.

This selection procedure can be applied to any membership inference
attack, including Yeom’s attack. The original version of this
attack classifies a record as a member if its per-instance-loss is
less than the expected training loss, whereas this new approach
selects members based on a threshold *φ*, which can be set
to target a particular false positive rate.
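
As a rough illustration of the idea (not the code from the paper's repository), the sketch below picks a loss threshold from a set of records the adversary knows are non-members. The function names and the choice to take a simple empirical quantile are assumptions made for this example; it flags a record as a member when its loss is strictly below *φ*, so that at most a fraction α of the known non-members are flagged.

```python
def select_loss_threshold(nonmember_losses, alpha):
    """Pick phi so that 'loss < phi' flags at most a fraction alpha
    of the known non-member records (hypothetical helper)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    ordered = sorted(nonmember_losses)
    if not ordered:
        raise ValueError("need at least one non-member loss")
    # Number of false positives we can tolerate on this sample.
    k = int(alpha * len(ordered))
    if k >= len(ordered):
        return float("inf")  # every record may be flagged
    # At most k non-member losses are strictly below ordered[k].
    return ordered[k]


def predict_members(losses, phi):
    """Loss-threshold attack: lower loss suggests membership."""
    return [loss < phi for loss in losses]


# Toy per-instance losses of records known to be non-members.
holdout_losses = [0.9, 0.5, 0.7, 0.3, 0.1, 0.8, 0.6, 0.4, 0.2, 1.0]
phi = select_loss_threshold(holdout_losses, alpha=0.2)
false_positives = sum(predict_members(holdout_losses, phi))
print(phi, false_positives / len(holdout_losses))
```

The false positive rate is only controlled on the sample used to pick the threshold; how well it transfers to fresh records depends on how representative that sample is.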

## The Merlin Attack

In addition to this new selection procedure, we introduce a new attack
known as Merlin, which stands for **ME**asuring **R**elative **L**oss **I**n **N**eighborhood. Instead of per-instance-loss, this attack uses the
direction of change of per-instance-loss when the record is slightly
perturbed with noise. Merlin operates based on the intuition that, as
a result of overfitting, member records are more likely to be near
local minima than non-member records. This suggests that for members,
loss is more likely to increase at perturbed points near the original,
whereas it is equally likely to increase or decrease for
non-members. For each record, a small amount of random Gaussian noise
is added and the change of loss direction is recorded. This process is
repeated multiple times and Merlin infers membership based on the
fraction of times the loss increases.
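
The procedure described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: `loss_fn` stands in for black-box access to the model's per-instance loss, and the noise scale `sigma`, the number of `trials`, and the two toy loss functions are all hypothetical choices made for the example.

```python
import random


def merlin_ratio(loss_fn, record, label, sigma=0.01, trials=100, rng=None):
    """Fraction of Gaussian perturbations of `record` whose loss is
    higher than the loss at the original record."""
    rng = rng or random.Random(0)
    base_loss = loss_fn(record, label)
    increases = 0
    for _ in range(trials):
        # Perturb every feature with independent Gaussian noise.
        noisy = [x + rng.gauss(0.0, sigma) for x in record]
        if loss_fn(noisy, label) > base_loss:
            increases += 1
    return increases / trials


# Toy stand-ins for a model's loss surface (assumptions for illustration).
def bowl_loss(x, y):
    # The origin is a local minimum, as we might expect near a member.
    return sum(v * v for v in x)


def slope_loss(x, y):
    # Locally linear: loss is about as likely to go up as down.
    return x[0]


print(merlin_ratio(bowl_loss, [0.0, 0.0], label=0))
print(merlin_ratio(slope_loss, [0.0, 0.0], label=0))
```

A record sitting at a local minimum gets a ratio close to 1, while a record on a locally linear part of the loss surface gets a ratio near one half, which matches the intuition behind the attack.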

## The Morgan Attack

Since Yeom and Merlin use different information to make their
membership inferences, they do not always identify the same records as
members; some members are more vulnerable to one attack than the
other. Visualizing a combination of the attacks’ results
suggests that by eliminating the results with a very low
per-instance-loss, a combination of the two may produce an improved
PPV. The intuition here is that extremely low per-instance-losses may
result in Merlin’s identification of a local minimum where there
is in fact a near global minimum (which is much less strongly
correlated with membership).

The Morgan (**M**easuring l**O**ss, **R**elatively
**G**reater **A**round **N**eighborhood) attack uses three
different thresholds: a lower threshold on per-instance loss (*φ*_{L}),
an upper threshold on per-instance loss (*φ*_{U}),
and a threshold on the ratio as used by Merlin (*φ*_{M}). If a
record has a per-instance-loss that falls between *φ*_{L} and *φ*_{U}, and has a Merlin ratio of at least *φ*_{M}, Morgan identifies it as a member.
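
The decision rule itself is simple to write down. The sketch below assumes the per-instance loss and Merlin ratio have already been computed for each record, and treats the loss bounds as inclusive, which the description above does not specify; the function name and the threshold values in the usage lines are hypothetical.

```python
def morgan_is_member(loss, merlin_ratio, phi_l, phi_u, phi_m):
    """Flag a record as a member when its per-instance loss lies in
    [phi_l, phi_u] and its Merlin ratio is at least phi_m."""
    return phi_l <= loss <= phi_u and merlin_ratio >= phi_m


# Illustrative thresholds only; in practice they come from the
# threshold selection process.
thresholds = dict(phi_l=0.001, phi_u=0.05, phi_m=0.9)
print(morgan_is_member(0.01, 0.95, **thresholds))     # in the box
print(morgan_is_member(0.0001, 0.95, **thresholds))   # loss too low
print(morgan_is_member(0.01, 0.60, **thresholds))     # ratio too low
```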

The figure shows the per-instance loss and Merlin ratio for
Purchase-100X (an expanded version of the Purchase-100 dataset that
we created for our experiments). Members and nonmembers are denoted
by orange and purple points respectively. The boxes show the
thresholds found by the threshold selection process (without access to
the training data, but with the same data distribution), and
illustrate the regions where members are identified by Morgan with
very high confidence (PPV ∼1). (See paper for details, and more results.)

## Imbalanced Priors

Previous work on membership inference attacks assumes a candidate pool
where half of the candidates are members. For most settings,
especially ones where there is a serious privacy risk for an
individual of being identified as a dataset member, this assumption is
unrealistic. It is important to understand how well inference attacks
work when the adversary’s candidate pool has a different prior
probability of being a member.

Here, the candidate pool from which the attacker attempts to select
members has *γ* times more non-member records than member
records. As shown above, even in situations that other papers do not
consider, wherein there are many times more non-members than members,
attacks are able to attain a high rate of positively-identified
members.
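
To see why the prior matters so much, it helps to work out the positive predictive value (PPV) directly. With *γ* non-members for every member, an attack with true positive rate TPR and false positive rate FPR flags members and non-members in the proportion TPR : *γ*·FPR, so PPV = TPR / (TPR + *γ*·FPR). The sketch below just evaluates this standard identity; the function name and the example rates are made up for illustration and are not results from the paper.

```python
def ppv_under_prior(tpr, fpr, gamma):
    """PPV of an attack when the candidate pool holds `gamma`
    non-members for every member."""
    flagged = tpr + gamma * fpr
    if flagged == 0:
        raise ValueError("attack flags no records; PPV is undefined")
    return tpr / flagged


# Hypothetical attack: flags 10% of members and 1% of non-members.
for gamma in (1, 10, 100):
    print(gamma, round(ppv_under_prior(0.10, 0.01, gamma), 3))
```

The same attack looks much weaker as *γ* grows, which is why an attack needs a very low false positive rate to keep a high PPV when members are rare.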

## Conclusion

The Merlin and Morgan attacks can reliably identify members even in
situations with imbalanced priors where other attacks fail to show
meaningful inference risk.

There remains a large gap between what can be guaranteed using
differential privacy methods, and what can be inferred using known
inference attacks. This means better inference attacks may exist, and
our results show that there are concrete ways to improve attacks
(e.g., our threshold-selection procedure) and to incorporate more
information to improve attacks. We are especially interested in
attacks that produce extremely high PPVs, even if this is only for a
small fraction of candidates, since for most scenarios this is where
the most serious privacy risks lie.

**Full paper:** Bargav Jayaraman, Lingxiao Wang, Katherine Knipmeyer,
Quanquan Gu, David Evans. *Revisiting Membership Inference Under
Realistic Assumptions* (arXiv).

**Code:** *https://github.com/bargavj/EvaluatingDPML*

The video of Xiao’s presentation for AISTATS 2020 is now available:
*Understanding the Intrinsic Robustness of Image Distributions using Conditional Generative Models*

Starting with Gilmer et al. (2018), several works have demonstrated
the inevitability of adversarial examples based on different
assumptions about the underlying input probability space. It remains
unclear, however, whether these results apply to natural image
distributions. In this work, we assume the underlying data
distribution is captured by some conditional generative model, and
prove intrinsic robustness bounds for a general class of classifiers,
which solves an open problem in Fawzi et al. (2018). Building upon the
state-of-the-art conditional generative models, we study the intrinsic
robustness of two common image benchmarks under *l*_{2}
perturbations, and show the existence of a large gap between the
robustness limits implied by our theory and the adversarial robustness
achieved by current state-of-the-art robust models.

Comparisons between the theoretical intrinsic robustness bound and the
empirically estimated unconstrained (unc)/in-distribution (in)
adversarial robustness under *l*_{2} for ImageNet10
(ε = 3.0). The dotted curve represents the theoretical
bound on intrinsic robustness, with the horizontal axis denoting
different choices of α. (See paper for details and results in
other settings.)

**Paper:** Xiao Zhang, Jinghui Chen, Quanquan Gu, David Evans. *Understanding the Intrinsic Robustness of Image Distributions using Conditional Generative Models*, AISTATS 2020. arXiv

**Code:** *https://github.com/xiaozhanguva/Intrinsic-Rob*

*Post by Sicheng Zhu*

With the rapid development of deep learning and the explosive growth
of unlabeled data, representation
learning is becoming increasingly
important. It has enabled impressive applications such as pre-trained
language models (e.g., BERT and
GPT-3).

Popular as it is, representation learning raises concerns about the
robustness of learned representations under adversarial settings. For
example, *how can we compare the robustness of different
representations*, and *how can we build representations that enable
robust downstream classifiers*?

In this work, we answer these questions by proposing a notion of
*adversarial robustness for representations*. We show that the best
achievable robustness for a downstream classifier is limited by a
measurable representation robustness, and provide a training principle
for learning adversarially robust representations.

# Adversarial Robustness for Representations

Despite various existing criteria for evaluating a representation
(e.g., smoothness, sparsity), there is no general way previously known
to measure a representation’s robustness under adversarial
perturbations. We propose a notion of adversarial robustness for
representations based on information-theoretic measures.

Consider a representation that maps an underlying data distribution to
a representation distribution. In this case, we can measure the
(standard-case) mutual information between the two distributions. Then
by perturbing the data distribution within a Wasserstein ball such
that the mutual information term is minimized, we can measure the
worst-case mutual information. The representation vulnerability (an
opposite notion of robustness) is defined as the difference between
the two terms.
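
Written out, the definition described above looks roughly as follows. The notation is an assumption made for this sketch: *g* is the representation, *X* is a random variable with the underlying data distribution, *X'* ranges over perturbed distributions, *W* is the Wasserstein distance used to define the ball, ε is its radius, and *I* is mutual information. See the paper for the precise formulation, including which Wasserstein distance is used.

```latex
\mathrm{RV}_{\epsilon}(g)
  \;=\;
  \underbrace{I\bigl(X;\, g(X)\bigr)}_{\text{standard-case}}
  \;-\;
  \underbrace{\inf_{X' \,:\, W(X,\, X') \le \epsilon} I\bigl(X';\, g(X')\bigr)}_{\text{worst-case}}
```

Since the unperturbed distribution is itself inside the ball, the worst-case term can never exceed the standard-case term, so this quantity is non-negative, and a smaller value means a more robust representation.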

This notion enjoys several desired properties in representation
learning scenarios: it is scale-invariant, label-free, and compatible
with different threat models (including the commonly used
*L*_{p} norm attacks). Most importantly, we show next that it
has a direct relationship with the performance of downstream tasks.

# Connecting Representation to Downstream Tasks

If a representation is robust, we show (theoretically in a synthetic
setting and empirically in general settings) that a properly trained
downstream classifier will perform consistently in both natural and
adversarial settings; that is, the difference between the natural
accuracy and the adversarial accuracy will be small.

If a representation is not robust, we show that no robust downstream
classifiers can be built using that representation.

We provide an information-theoretic upper bound for the maximum robust
accuracy that can be achieved by any downstream classifier, with
respect to the representation robustness. We empirically evaluate the
tightness of this bound and find that the vulnerability of internal
layer representations of many neural networks is at least one
bottleneck for the model to be more robust.

For example, the representation defined by the logit layer of ResNet18
on CIFAR-10 only admits an adversarial accuracy of ~75% for any
downstream classifier.

This motivates us to develop a method to learn adversarially robust
representations.

# A Learning Principle for Robust Representations

Based on the proposed notion, a natural way to learn adversarially
robust representations is to directly induce the representation
robustness on common representation learning objectives.

We consider a popular representation learning objective — mutual information
maximization — as it has impressive performance in practice
and many other objectives (e.g., noise contrastive estimation) can be
viewed as surrogate losses of this objective. By inducing the
representation robustness and setting a specific coefficient, we
provide the worst-case mutual information maximization principle for
learning adversarially robust representations.

We evaluate the performance of our representation learning principle
on four image classification benchmarks (MNIST, Fashion-MNIST, SVHN,
and CIFAR-10); here we report on CIFAR-10 (see the paper for the
others, where the results are similar).

Note that the representations are learned using only unlabeled data
and are kept fixed during the training of downstream classifiers. The
robust downstream classifier (trained using adversarial training)
benefits from the robust representation. It has both better natural
accuracy and better adversarial accuracy. The adversarial accuracy of
~31% is even comparable to the fully-supervised robust model with the
same architecture.

Even the standard classifier based on our robust representation
inherits a non-trivial adversarial accuracy from the robust
representation. More interestingly, it also has better natural
accuracy compared to the baseline. This phenomenon is consistent with
some recent work using
adversarial training to learn pre-trained models and may indicate the
better standard generalization of adversarially learned
representations.

## Saliency Maps

We also visualize the saliency maps of our learned representations as
a side evaluation of adversarial robustness, since the
interpretability of saliency maps is related to adversarial
robustness (see Etmann et al.).

The saliency maps of our robust representation (third row) are less
noisy and more interpretable than its standard counterpart (second
row).

# Conclusions

We show that the adversarial robustness for representations is
correlated with the achievable robustness for downstream tasks, and
that an associated learning principle can be used to produce more
robust representations. Our work motivates learning adversarially
robust representations as an intermediate step or as a regularization
to circumvent the insurmountable difficulty of directly learning
adversarially robust models.

**Paper:** Sicheng Zhu, Xiao Zhang, and David Evans.
*Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization*. In
International Conference on Machine Learning (ICML 2020), July 2020.
[PDF] [Supplemental Materials]
[ICML PDF] [arXiv]

Video Presentation (from ICML 2020)