Security and Privacy Research at the University of Virginia

Our research seeks to empower individuals and organizations to control how their data is used. We use techniques from cryptography, programming languages, machine learning, operating systems, and other areas to both understand and improve the security of computing as practiced today, and as envisioned in the future.

Everyone is welcome at our research group meetings and summer reading/viewing groups on privacy and adversarial machine learning. To get announcements, join our Slack Group (anyone with a @virginia.edu email address can join themselves, or email me to request an invitation).

Projects
Adversarial Machine Learning
EvadeML

Secure Multi-Party Computation
Obliv-C · MightBeEvil · SecureComputation.org

Past Projects
Web and Mobile Security: ScriptInspector · SSOScan
Program Analysis: Splint · Perracotta
Security through Diversity: N-Variant Systems
Physicrypt · More…

Recent Posts

Pointwise Paraphrase Appraisal is Potentially Problematic

Hannah Chen presented her paper, Pointwise Paraphrase Appraisal is Potentially Problematic, at the ACL 2020 Student Research Workshop:

The prevailing approach for training and evaluating paraphrase identification models is constructed as a binary classification problem: the model is given a pair of sentences, and is judged by how accurately it classifies pairs as either paraphrases or non-paraphrases. This pointwise-based evaluation method does not match well the objective of most real world applications, so the goal of our work is to understand how models which perform well under pointwise evaluation may fail in practice and find better methods for evaluating paraphrase identification models. As a first step towards that goal, we show that although the standard way of fine-tuning BERT for paraphrase identification by pairing two sentences as one sequence results in a model with state-of-the-art performance, that model may perform poorly on simple tasks like identifying pairs with two identical sentences. Moreover, we show that these models may even predict a pair of randomly-selected sentences with higher paraphrase score than a pair of identical ones.
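To make the failure mode concrete, here is a minimal sketch of pointwise paraphrase scoring with a BERT sequence-pair classifier, using the HuggingFace transformers library. The checkpoint name and example sentences are illustrative assumptions, not the models or data used in the paper; a real test would load a checkpoint fine-tuned on a paraphrase corpus such as QQP.

```python
# Sketch: scoring sentence pairs with a BERT-based paraphrase classifier.
# The checkpoint name below is a placeholder; the paper fine-tunes BERT on a
# paraphrase corpus, which is not reproduced here.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # assumption: swap in a paraphrase fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.eval()

def paraphrase_score(sent_a: str, sent_b: str) -> float:
    """Return the model's probability that (sent_a, sent_b) are paraphrases."""
    # The standard fine-tuning setup pairs the two sentences as one sequence:
    # [CLS] sent_a [SEP] sent_b [SEP]
    inputs = tokenizer(sent_a, sent_b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Sanity checks motivated by the paper's argument: an identical pair should
# score near 1.0, and higher than a random pair -- the paper shows this can fail.
print(paraphrase_score("The cat sat on the mat.", "The cat sat on the mat."))
print(paraphrase_score("The cat sat on the mat.", "Stock prices fell sharply on Monday."))
```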

Paper: https://arxiv.org/abs/2005.11996
Talk Video


De-Naming the Blog

This blog was started in January 2008, a bit over eight years after I started as a professor at UVA and initiated the research group. It was named after Thomas Jefferson’s cipher wheel, which has long been (and remains) one of my favorite ways to introduce cryptography.

Figuring out how to honor our history, including Jefferson’s founding of the University, and appreciate his ideals and enormous contributions, while confronting the reality of Jefferson as a slave owner and abuser, will be a challenge and responsibility for people above my administrative rank. But I’ve come to see that it is harmful to have a blog named after Jefferson, so I have removed the Jefferson’s Wheel name from this research group blog.

For now, we’re going with a very generic “uvasrg” name, but hopefully we will come up with something more interesting eventually. (I’ve reluctantly rejected “Hamilton’s Dual”, alluding, of course, to Margaret Hamilton and William Rowan Hamilton, and the Lagrangian Dual and Dual Execution, not to any historical rival of Jefferson’s.)


Oakland Test-of-Time Awards

I chaired the committee that selected Test-of-Time Awards for papers from the 1995-2006 IEEE Symposia on Security and Privacy, which were presented at the Opening Session of the 41st IEEE Symposium on Security and Privacy.


NeurIPS 2019

Here's a video of Xiao Zhang's presentation at NeurIPS 2019:
https://slideslive.com/38921718/track-2-session-1 (starting at 26:50)

See this post for info on the paper.

Here are a few pictures from NeurIPS 2019 (by Sicheng Zhu and Mohammad Mahmoody):







USENIX Security 2020: Hybrid Batch Attacks

Finding Black-box Adversarial Examples with Limited Queries

Black-box attacks generate adversarial examples (AEs) against deep neural networks with only API access to the victim model.

Existing black-box attacks can be grouped into two main categories:

  • Transfer Attacks use white-box attacks on local models to find candidate adversarial examples that transfer to the target model.

  • Optimization Attacks use queries to the target model and apply optimization techniques to search for adversarial examples.

Hybrid Attack

We propose a hybrid attack that combines transfer and optimization attacks (a sketch of the combined loop follows the list):

  1. Transfer Attack → Optimization Attack: use the candidate adversarial examples found against the local models in transfer attacks as starting points for optimization attacks.

  2. Optimization Attack → Transfer Attack: use intermediate query results from the optimization attacks to fine-tune the local models used in transfer attacks.
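Here is a minimal sketch of how the two directions fit together in one attack loop. The helper names (`pgd_attack`, `optimization_attack`, `fine_tune`, `target_model_query`) are hypothetical placeholders for a white-box attack on the local models, a query-based attack such as AutoZOOM, local-model training, and API access to the victim model; they are not the interfaces of the released code.

```python
# Sketch of the hybrid attack loop. All callables are injected placeholders,
# standing in for a white-box attack on local models, a query-based black-box
# attack (e.g., AutoZOOM), local-model training, and the victim-model API.

def hybrid_attack(seed, target_label, local_models, target_model_query,
                  pgd_attack, optimization_attack, fine_tune):
    # (1) Transfer -> Optimization: attack the local models with a white-box
    # method and use the resulting candidate as the starting point.
    candidate = pgd_attack(local_models, seed, target_label)

    # If the candidate already transfers, no optimization queries are needed.
    if target_model_query(candidate) == target_label:
        return candidate, []

    # Otherwise, run the query-based optimization attack from the candidate
    # instead of from the original seed.
    adv_example, query_history = optimization_attack(
        target_model_query, start=candidate, target=target_label)

    # (2) Optimization -> Transfer: the queried inputs and the target model's
    # responses are byproducts that can be used to fine-tune the local models.
    fine_tune(local_models, query_history)

    return adv_example, query_history
```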

We validate the effectiveness of the hybrid attack over the baseline on three benchmark datasets: MNIST, CIFAR10, and ImageNet. In this post, we only show results with AutoZOOM as the selected optimization attack; results for other attacks can be found in the paper.

Local Adversarial Examples are Useful (Transfer → Optimization)

Below, we compare the performance of the AutoZOOM attack when it starts from 1) the local adversarial examples, and 2) the original seed points. Here, we report results for targeted attacks on normal (i.e., non-robust) models:

Local AEs can substantially boost the performance of optimization attacks, but when the same attack is used against robust models, the improvement is small:

This ineffectiveness appears to stem from differences in the attack space of normal and robust models. Therefore, to improve effectiveness against robust target models, we use robust local models to produce the transfer candidates for starting the optimization attacks. The figure below compares the impact of normal and robust local models when attacking the robust target model:

Tuning with Byproducts Doesn’t Help Much (Optimization → Transfer)

Below, we compare the performance of the AutoZOOM attack on the normal MNIST model when the local models are 1) fine-tuned during the attack process, and 2) kept static:

Tuning local models using byproducts from the optimization attack improves query efficiency. However, for more complex datasets (e.g., CIFAR10), we observe degradation in attack performance from fine-tuning (see Table 6 in the paper).
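For the fine-tuning direction, a minimal PyTorch sketch of what "tuning with byproducts" can look like is below: every input queried during the optimization attack, paired with the label the target model returned for it, becomes a training example for the local model. The function name and hyperparameters are illustrative assumptions, not the settings used in the paper.

```python
# Sketch of tuning a local model on optimization-attack byproducts. Each query
# sent to the target model, paired with the label the target returned, becomes
# a labeled training example for the local model.
import torch
import torch.nn.functional as F

def tune_local_model(local_model, query_inputs, target_labels, lr=1e-4, epochs=1):
    """query_inputs: inputs queried during the optimization attack (tensor);
    target_labels: the target model's predicted labels for those inputs."""
    optimizer = torch.optim.Adam(local_model.parameters(), lr=lr)
    local_model.train()
    for _ in range(epochs):
        logits = local_model(query_inputs)
        loss = F.cross_entropy(logits, target_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    local_model.eval()
    return local_model
```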

Batch Attacks

We consider a batch attack scenario: adversaries have a limited number of queries and want to maximize the number of adversarial examples found within that limit. This is a more realistic way to evaluate attacks for most adversarial purposes than just looking at the average cost to attack each seed in a large pool of seeds.

The number of queries required for attacking a specific seed varies greatly across seeds:

Based on this observation, we propose a two-phase strategy to prioritize easy seeds for the hybrid attack (a code sketch follows the list):

  1. In the first phase, the likely-to-transfer seeds are prioritized based on the number of PGD steps taken to attack the local models. The candidate adversarial example for each seed is attempted in order to find all the direct transfers.

  2. In the second phase, the remaining seeds are prioritized based on their target loss value with respect to the target model.
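A minimal sketch of the two-phase ordering is below, assuming the per-seed PGD step counts, transfer candidates, and target-model losses have already been computed; the variable and helper names are illustrative, not taken from the released code.

```python
# Sketch of two-phase seed prioritization for the hybrid attack.
# pgd_steps[s]: PGD steps needed to attack seed s on the local models
# candidates[s]: the local (transfer) candidate adversarial example for seed s
# target_loss[s]: the target model's loss on candidates[s]
# target_labels[s]: the attack's target label for seed s

def two_phase_order(seeds, pgd_steps, candidates, target_model_query,
                    target_labels, target_loss):
    # Phase 1: try candidates from seeds that were easiest to attack locally
    # (fewest PGD steps) first, collecting the direct transfers cheaply.
    phase1 = sorted(seeds, key=lambda s: pgd_steps[s])
    direct_transfers, remaining = [], []
    for s in phase1:
        if target_model_query(candidates[s]) == target_labels[s]:
            direct_transfers.append(s)
        else:
            remaining.append(s)

    # Phase 2: attack the remaining seeds with the optimization attack, ordered
    # by the target model's loss on their candidates (lower loss = likely easier).
    phase2 = sorted(remaining, key=lambda s: target_loss[s])
    return direct_transfers, phase2
```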

To validate the effectiveness of the two-phase strategy, we compare it to two other seed prioritization strategies (a small illustrative comparison under a fixed query budget follows the list):

  • Retroactive Optimal: a non-realizable attack that assumes the adversary already knows the exact number of queries needed to attack each seed (before the attack starts) and can prioritize seeds by their actual query cost. This provides a lower bound on the query cost for an optimal strategy.

  • Random: a baseline strategy where seeds are prioritized in random order (this is the strategy implicitly assumed in most works, where average costs are reported).
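To illustrate why ordering matters in the batch setting, here is a small, purely synthetic simulation comparing the retroactive optimal and random orderings under a fixed query budget. The cost distribution, budget, and outputs are made up for illustration and are not results from the paper.

```python
# Sketch: comparing seed orderings under a fixed query budget, given each
# seed's (retroactively known) query cost. All numbers are synthetic.
import numpy as np

rng = np.random.default_rng(0)
query_costs = rng.lognormal(mean=6, sigma=1.5, size=1000).astype(int)  # per-seed cost
budget = 50_000

def aes_found(order, costs, budget):
    """Adversarial examples found within the query budget, attacking seeds in order."""
    spent, found = 0, 0
    for i in order:
        if spent + costs[i] > budget:
            break
        spent += costs[i]
        found += 1
    return found

retroactive_optimal = np.argsort(query_costs)      # cheapest seeds first
random_order = rng.permutation(len(query_costs))   # baseline ordering

print("retroactive optimal:", aes_found(retroactive_optimal, query_costs, budget))
print("random baseline:   ", aes_found(random_order, query_costs, budget))
```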

Results for the AutoZOOM attack on a normal ImageNet model are shown below:

Our two-phase strategy performs close to the retroactive optimal strategy and significantly outperforms the random baseline: with the same query limit, the two-phase strategy finds significantly more adversarial examples than the random baseline, and comes close to the retroactive optimal case. (See the paper for more experimental results and variations on the prioritization strategy.)

Main Takeaways

  • Transfer → Optimization: local adversarial examples can generally be used to boost optimization attacks. One caveat is that, against robust target models, the hybrid attack is more effective with robust local models.

  • Optimization → Transfer: fine-tuning local models is only helpful for small-scale datasets (e.g., MNIST) and fails to generalize to more complex datasets. It is an open question whether we can make the fine-tuning process work for complex datasets.

  • Prioritizing seeds with the two-phase strategy can significantly improve the hybrid attack's query efficiency in the batch attack scenario.

Our results make the case that it is important to evaluate both attacks and defenses with a more realistic adversary model than just looking at the average cost to attack a seed over a large pool of seeds. When an adversary only needs to find a small number of adversarial examples, and has access to a large pool of potential seeds to attack (of equal value to the adversary), the effective cost of a successful attack can be orders of magnitude lower than what would be projected assuming an adversary who cannot prioritize seeds to attack.

Paper

Fnu Suya, Jianfeng Chi, David Evans and Yuan Tian. Hybrid Batch Attacks: Finding Black-box Adversarial Examples with Limited Queries. In USENIX Security 2020. Boston, August 2020. [PDF] [arXiv]

Code

https://github.com/suyeecav/Hybrid-Attack

In this repository, we provide the source code to reproduce the results in the paper. In addition, we believe our hybrid attack framework can (potentially) help boost the performance of new optimization attacks, so the repository also includes tutorials for incorporating new optimization attacks into the hybrid attack framework.