Security and Privacy Research at the University of Virginia

Our research seeks to empower individuals and organizations to control how their data is used. We use techniques from cryptography, programming languages, machine learning, operating systems, and other areas to both understand and improve the privacy and security of computing as practiced today, and as envisioned in the future. A major current focus is on adversarial machine learning.

Everyone is welcome at our research group meetings. To get announcements, join our Slack Group (anyone with a @virginia.edu email address can join directly, or email me to request an invitation).

Recent Posts

CrySP Talk: When Models Learn Too Much

I gave a talk on When Models Learn Too Much at the University of Waterloo (virtually) in the CrySP Speaker Series on Privacy (29 March 2021):

Abstract

Statistical machine learning uses training data to produce models that capture patterns in that data. When models are trained on private data, such as medical records or personal emails, there is a risk that those models not only learn the hoped-for patterns, but will also learn and expose sensitive information about their training data. Several different types of inference attacks on machine learning models have been found, and methods have been proposed to mitigate the risks of exposing sensitive aspects of training data. Differential privacy provides formal guarantees bounding certain types of inference risk, but, at least with state-of-the-art methods, providing substantive differential privacy guarantees requires adding so much noise to the training process for complex models that the resulting models are useless. Experimental evidence, however, suggests that inference attacks have limited power, and in many cases a very small amount of privacy noise seems to be enough to defuse inference attacks. In this talk, I will give an overview of a variety of different inference risks for machine learning models, talk about strategies for evaluating model inference risks, and report on some experiments by our research group to better understand the power of inference attacks in more realistic settings and explore some of the broader connections between privacy, fairness, and adversarial robustness.
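
To make the kinds of attacks discussed in the talk concrete, here is a minimal illustrative sketch of a loss-threshold membership inference attack (in the spirit of Yeom et al.’s threshold attack, one of the simplest inference attacks). It is not the specific attacks or evaluation protocol covered in the talk; the scikit-learn-style `predict_proba` interface and the mean-training-loss threshold are assumptions made only for this example.

```python
import numpy as np

def per_example_loss(model, X, y):
    """Cross-entropy loss of the model on each example; assumes a
    (hypothetical) model exposing a scikit-learn style predict_proba,
    and integer class labels in y."""
    probs = model.predict_proba(X)
    return -np.log(probs[np.arange(len(y)), y] + 1e-12)

def membership_advantage(model, X_train, y_train, X_test, y_test):
    """Guess 'member' whenever an example's loss is below the mean
    training loss, and report the attack's advantage:
    true positive rate minus false positive rate."""
    train_loss = per_example_loss(model, X_train, y_train)
    test_loss = per_example_loss(model, X_test, y_test)
    threshold = train_loss.mean()            # simple attacker-chosen threshold
    tpr = np.mean(train_loss <= threshold)   # members correctly flagged
    fpr = np.mean(test_loss <= threshold)    # non-members wrongly flagged
    return tpr - fpr
```

An advantage near zero means this simple attack distinguishes members from non-members barely better than random guessing; differential privacy bounds how large such an advantage can ever be.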


Improved Estimation of Concentration (ICLR 2021)

Our paper on Improved Estimation of Concentration Under ℓp-Norm Distance Metrics Using Half Spaces (Jack Prescott, Xiao Zhang, and David Evans) will be presented at ICLR 2021.

Abstract: Concentration of measure has been argued to be the fundamental cause of adversarial vulnerability. Mahloujifar et al. (2019) presented an empirical way to measure the concentration of a data distribution using samples, and employed it to find lower bounds on intrinsic robustness for several benchmark datasets. However, it remains unclear whether these lower bounds are tight enough to provide a useful approximation for the intrinsic robustness of a dataset. To gain a deeper understanding of the concentration of measure phenomenon, we first extend the Gaussian Isoperimetric Inequality to non-spherical Gaussian measures and arbitrary ℓp-norms (p ≥ 2). We leverage these theoretical insights to design a method that uses half-spaces to estimate the concentration of any empirical dataset under ℓp-norm distance metrics. Our proposed algorithm is more efficient than Mahloujifar et al. (2019)’s, and experiments on synthetic datasets and image benchmarks demonstrate that it is able to find much tighter intrinsic robustness bounds. These tighter estimates provide further evidence that rules out intrinsic dataset concentration as a possible explanation for the adversarial vulnerability of state-of-the-art classifiers.

Here’s Jack’s video summary of the work:

Paper: arXiv, Open Review
Code: https://github.com/jackbprescott/EMC_HalfSpaces
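
To convey the core idea, here is a rough sketch (in Python, not the released code linked above) of how half-spaces make empirical concentration estimation tractable: for a half-space {x : w·x ≤ b}, its ε-expansion under the ℓp metric is {x : w·x ≤ b + ε‖w‖q}, where q is the dual exponent (1/p + 1/q = 1), so everything reduces to projections and quantiles. The random-direction search below is a simplification of the algorithm in the paper.

```python
import numpy as np

def expansion_mass(X, w, alpha, eps, p=2.0):
    """Choose b so the half-space {x : w.x <= b} covers roughly an alpha
    fraction of the samples, then return the empirical mass of its
    eps-expansion under the l_p distance (assumes p >= 2, as in the paper)."""
    q = 1.0 if np.isinf(p) else p / (p - 1.0)   # dual exponent
    proj = X @ w
    b = np.quantile(proj, alpha)
    return np.mean(proj <= b + eps * np.linalg.norm(w, ord=q))

def estimate_concentration(X, alpha, eps, p=2.0, n_directions=1000, seed=0):
    """Approximate the concentration at level (alpha, eps) by the smallest
    eps-expansion mass found over randomly sampled half-space normals."""
    rng = np.random.default_rng(seed)
    return min(expansion_mass(X, rng.standard_normal(X.shape[1]), alpha, eps, p)
               for _ in range(n_directions))
```

The paper’s extension of the Gaussian Isoperimetric Inequality is what motivates restricting the search to half-spaces; the released code implements a more careful optimization than this random-direction search.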


Virginia Consumer Data Protection Act

Josephine Lamp presented on the new data privacy law that is pending in Virginia (it still needs a few steps, including the expected signing by the governor, but is likely to go into effect on January 1, 2023): Slides (PDF)

This article provides a summary of the law: Virginia Passes Consumer Privacy Law; Other States May Follow, National Law Review, 17 February 2021.

The law itself is here: SB 1392: Consumer Data Protection Act


Algorithmic Accountability and the Law

Brink News (a publication of The Atlantic) published an essay I co-authored with Tom Nachbar (UVA Law School) on how the law views algorithmic accountability and what measures the law permits for adjusting algorithms to counter inequity:

Algorithms Are Running Foul of Anti-Discrimination Law
Tom Nachbar and David Evans
Brink, 7 December 2020



Computing systems that are found to discriminate on prohibited bases, such as race or sex, are no longer surprising. We’ve seen hiring systems that discriminate against women, image systems that are prone to cropping out dark-colored faces, and credit scoring systems that discriminate against minorities.

Anyone considering deploying an algorithm that impacts humans needs to understand the potential for such algorithms to discriminate. But what to do about it is much less clear.

The Difficulty of Mandating Fairness

There are no simple ways to ensure that an algorithm doesn’t discriminate, and many of the proposed fixes run the risk of violating anti-discrimination law. In particular, approaches that seek to optimize computing systems for various notions of fairness, especially those concerned with the distribution of outcomes along legally protected criteria such as race or sex, are in considerable tension with U.S. anti-discrimination law.

Although many arguments about discriminatory algorithms are premised on unfair outcomes, such notions have limited relevance under U.S. law.

For the most part, U.S. law lacks a notion of fairness.

Legal rules generally call upon more specific notions than fairness, even if they are connected to fairness. Thus, in the context of illegal employment discrimination (which we will use as our motivating example), instead of mandating fairness, U.S. law generally prohibits conduct that discriminates on the basis of protected characteristics, like race and sex.

Process and Intent Matter

Moreover, the law does not generally regulate behavior based on outcomes; what matters is the intent and process that led to the outcome. In the case of U.S. employment discrimination law, those rules of intent and process are contained in two types of protections against discrimination: disparate treatment and disparate impact.

An employer is liable for disparate treatment when there is either explicit or intentional discrimination. Disparate treatment protections prohibit the use of overt racial classifications, but they also provide liability for hidden but intentional discrimination, such as cases where a victim can show that they are a member of a racial minority, were qualified, and were rejected, and the employer cannot provide any nondiscriminatory justification for the decision, as in McDonnell Douglas Corp. v. Green.

Under disparate impact, an employer following a process that has a statistically observable negative impact on a protected group is not necessarily liable. Instead, the disparate outcomes transfer the burden to the employer to show that the decision-making process is justified based on valid criteria, as in Griggs v. Duke Power Co.

If you think those two approaches sound confusingly similar, you’re not alone. Disparate impact liability frequently mirrors disparate treatment liability in that the disparate outcome itself is not enough to establish a violation.

The role of disparate outcomes is to shift the burden to the employer to provide a non-discriminatory reason for its decision-making process. What matters legally is not so much the outcome, as the intention and process behind it.


The Law Doesn’t Care Whether Decisions Are Made by Algorithms or Humans

When outcomes are based on the output of some algorithm, the employer still needs to justify that the decisions it makes are based on valid criteria. The law doesn’t care whether decisions are made by algorithms or by human decision-makers; what matters is the reason for the decision. It is up to the humans responsible to explain that reasoning.

Although many have argued for increased algorithmic transparency, even the most transparent algorithms cannot really explain why they made the decisions they did. This presents a major challenge for discrimination law because, in discrimination law, the “why” matters.

Algorithmically generated explanations can help but, by themselves, cannot answer the legal “why” question. Even an interpretable model that appears to have no discriminatory intent is not necessarily non-discriminatory. The rules it has learned could have been influenced by selecting training data that disadvantages a particular group, and the features it uses could be determined in ways that are inherently discriminatory.

To satisfy discrimination law, it is the process and the intent that matter, and explanations an algorithm itself produces are insufficient to establish that intent. Indeed, explanations of the intent of algorithms should be viewed with the same skepticism that we have when humans attempt to describe their own decision-making processes. Just because an explanation is provided does not mean it should be believed.

Optimizing an Algorithm for Fairness May Be Discriminatory

This is a particular problem for methods used by systems designers, who frequently seek to optimize for particular outcomes. An optimization approach does not fit well with legal requirements. When algorithm designers focus on fairness as a property to be optimized, they ignore the legal requirements of anti-discrimination.

Discrimination law does not operate through optimization, because discrimination (or anti-discrimination) is not something to be optimized. Anti-discrimination is a side constraint on a decision-making process, not its principal goal (which might be, for example, to find good employees).

Systems should be designed to optimize for their principal goal, with the constraint of avoiding discrimination (in intent or process) while doing so. Attempts to produce outcomes that seem less discriminatory might themselves constitute illegal discrimination. The 2009 U.S. Supreme Court case of Ricci v. DeStefano provides a prime example of that tension. In the case, the New Haven Fire Department used an examination to determine which firefighters should be promoted to lieutenant. When that test produced a result that was racially skewed compared to the population of firefighters, the city (in part because it was concerned about disparate impact liability) invalidated the results of the test. White firefighters sued, claiming the city’s response in rejecting the test was itself disparate treatment, since the motivation for rejecting the test was to produce a different racial outcome, and the Supreme Court agreed.

Algorithms Alone Cannot Save Us

Although reducing racial disparity is a laudable goal, the law substantially limits the discretion of both employers and system designers in engineering for equitable outcomes. Racially disparate outcomes may seem unfair, and they might even be evidence of underlying illegal discrimination, but the law neither deputizes systems designers to operationalize their notions of what are racially fair outcomes, nor immunizes them for acts of discrimination undertaken in order to correct racial disparities.

Correcting for past racial disparities will require a more sophisticated and deep-seated approach than simply altering algorithms to produce outcomes optimized toward some fairness criterion.

Algorithms alone are neither the source nor the solution to our problems. Solving them will require fundamental change, and the real question is whether we as a society — not just our algorithms — are prepared to do that work.


Microsoft Security Data Science Colloquium: Inference Privacy in Theory and Practice

Here are the slides for my talk at the Microsoft Security Data Science Colloquium:
When Models Learn Too Much: Inference Privacy in Theory and Practice [PDF]

The talk is mostly about Bargav Jayaraman’s work (with Katherine Knipmeyer, Lingxiao Wang, and Quanquan Gu) on evaluating privacy: