Inference Privacy | Security Research Group

Do Membership Inference Attacks Work on Large Language Models?

5 March 2024 Anshuman Suri, adversarial machine learning, privacy-preserving machine learning, distribution inference, inference privacy, LLMs, Michael Duan, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, Hannaneh Hajishirzi

MIMIR logo. Image credit: GPT-4 + DALL-E


Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model’s training data. Despite extensive research on traditional machine learning models, there has been limited work studying MIA on the pre-training data of large language models (LLMs).

We perform a large-scale evaluation of MIAs over a suite of language models (LMs) trained on the Pile, ranging from 160M to 12B parameters. We find that MIAs barely outperform random guessing for most settings across varying LLM sizes and domains. Our further analyses reveal that this poor performance can be attributed to (1) the combination of a large dataset and few training iterations, and (2) an inherently fuzzy boundary between members and non-members.
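To make "barely outperform random guessing" concrete, here is a minimal sketch of a loss-threshold attack scored by AUC-ROC, where 0.5 corresponds to random guessing. The attack choice, the helper names (`auc_roc`, `loss_attack_scores`), and the synthetic Gaussian losses are all assumptions for illustration; none of this is taken from the paper or its code.

```python
import random


def auc_roc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member.

    Ties count as half a win, so an uninformative attack lands near 0.5.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def loss_attack_scores(losses):
    """Loss-threshold attack: a lower loss suggests membership, so score = -loss."""
    return [-loss for loss in losses]


# Synthetic per-example losses (hypothetical numbers, not measured on any model).
# The two distributions overlap heavily, mimicking a fuzzy member/non-member boundary.
rng = random.Random(0)
member_losses = [rng.gauss(2.95, 0.5) for _ in range(1000)]
nonmember_losses = [rng.gauss(3.00, 0.5) for _ in range(1000)]

auc = auc_roc(
    loss_attack_scores(member_losses),
    loss_attack_scores(nonmember_losses),
)
print(f"AUC-ROC of the loss attack: {auc:.3f}")
```

When the member and non-member loss distributions overlap this much, the AUC stays close to 0.5 no matter which threshold the attacker picks.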

SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning

26 May 2023 Ahmed Salem, Anshuman Suri, Giovanni Cherubin, Boris Köpf, Andrew Paverd, Shruti Tople, Santiago Zanella-Béguelin, adversarial machine learning, privacy-preserving machine learning, membership inference, inference privacy

Our paper on using cryptographic-style games to model inference privacy has been published at the IEEE Symposium on Security and Privacy (Oakland):

Ahmed Salem, Giovanni Cherubin, David Evans, Boris Köpf, Andrew Paverd, Anshuman Suri, Shruti Tople, and Santiago Zanella-Béguelin. SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning. IEEE Symposium on Security and Privacy, 2023. [Arxiv]

Tired of diverse definitions of machine learning privacy risks? Curious about game-based definitions? In our paper, we present privacy games as a tool for describing and analyzing privacy risks in machine learning. Join us on May 22nd, 11 AM @IEEESSP '23 https://t.co/NbRuTmHyd2 pic.twitter.com/CIzsT7UY4b
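As a rough illustration of what a game-based definition looks like, the sketch below plays a toy membership inference game: a challenger flips a secret bit, hands the adversary either a training record or a fresh record, and the adversary tries to guess the bit. The toy "model" (a set that memorizes its training data), the two adversaries, and all function names are assumptions made up for this example; they are not the formal games defined in the paper.

```python
import random

DOMAIN = 10**9  # size of the toy record space (an arbitrary choice)


def play_round(train_set, model, adversary, rng):
    """One round of the game; returns True if the adversary guesses the bit."""
    b = rng.randint(0, 1)  # challenger's secret bit
    if b == 1:
        z = rng.choice(train_set)  # challenge record is a training member
    else:
        z = rng.randrange(DOMAIN)  # challenge record is freshly sampled
    return adversary(model, z, rng) == b


def advantage(adversary, rounds=2000, seed=0):
    """Empirical advantage 2 * Pr[guess == b] - 1 (0 = guessing, 1 = perfect)."""
    rng = random.Random(seed)
    train_set = [rng.randrange(DOMAIN) for _ in range(100)]
    model = frozenset(train_set)  # toy model that fully memorizes its data
    wins = sum(play_round(train_set, model, adversary, rng) for _ in range(rounds))
    return 2 * wins / rounds - 1


def memorization_adversary(model, z, rng):
    """Guesses 'member' exactly when the toy model has memorized the record."""
    return 1 if z in model else 0


def guessing_adversary(model, z, rng):
    """Ignores the model and flips a coin."""
    return rng.randint(0, 1)


print("memorization adversary:", advantage(memorization_adversary))
print("guessing adversary:", advantage(guessing_adversary))
```

Writing the risk as a game makes the adversary's knowledge and goal explicit: changing what the challenger reveals, or what the adversary must guess, yields a different privacy notion within the same framework.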


CVPR 2023: Manipulating Transfer Learning for Property Inference

2 May 2023 Yulong Tian, Anshuman Suri, Fnu Suya, adversarial machine learning, privacy-preserving machine learning, distribution inference, inference privacy, transfer learning

Manipulating Transfer Learning for Property Inference

Transfer learning is a popular method to train deep learning models efficiently. By reusing parameters from upstream pre-trained models, the downstream trainer can use fewer computing resources to train downstream models, compared to training models from scratch.

The figure below shows the typical process of transfer learning for vision tasks:

However, the nature of transfer learning can be exploited by a malicious upstream trainer, leading to severe risks to the downstream trainer.


BIML: What Machine Learnt Models Reveal

19 July 2022 BIML, privacy-preserving machine learning, distribution inference, inference privacy, Gary McGraw

I gave a talk in the Berryville Institute of Machine Learning in the Barn series on What Machine Learnt Models Reveal, which is now available as an edited video:

David Evans, a professor of computer science researching security and privacy at the University of Virginia, talks about data leakage risk in ML systems and different approaches used to attack and secure models and datasets. Juxtaposing adversarial risks that target records and those aimed at attributes, David shows that differential privacy cannot capture all inference risks, and calls for more research based on privacy experiments aimed at both datasets and distributions.


Microsoft Research Summit: Surprising (and unsurprising) Inference Risks in Machine Learning

21 October 2021 adversarial machine learning, privacy, Bargav Jayaraman, Anshuman Suri, Katherine Knipmeyer, inference privacy, privacy-preserving machine learning, Microsoft

Here are the slides for my talk at the Practical and Theoretical Privacy of Machine Learning Training Pipelines Workshop at the Microsoft Research Summit (21 October 2021):
Surprising (and Unsurprising) Inference Risks in Machine Learning [PDF]

The work by Bargav Jayaraman (with Katherine Knipmeyer, Lingxiao Wang, and Quanquan Gu) on improving membership inference attacks that I talked about is described in more detail here:

Bargav Jayaraman, Lingxiao Wang, Katherine Knipmeyer, Quanquan Gu, David Evans. Revisiting Membership Inference Under Realistic Assumptions (PETS 2021).
[Blog] [Code: https://github.com/bargavj/EvaluatingDPML]


UVA News Article

28 September 2021 adversarial machine learning, privacy, Bargav Jayaraman, Xiao Zhang, Jack Prescott, Anshuman Suri, Katherine Knipmeyer, inference privacy, privacy-preserving machine learning

UVA News has an article by Audra Book on our research on security and privacy of machine learning (with some very nice quotes from several students in the group, and me saying something positive about the NSA!): Computer science professor David Evans and his team conduct experiments to understand security and privacy risks associated with machine learning, 8 September 2021.

David Evans, professor of computer science in the University of Virginia School of Engineering and Applied Science, is leading research to understand how machine learning models can be compromised.


ICLR DPML 2021: Inference Risks for Machine Learning

7 May 2021 adversarial machine learning, privacy, Bargav Jayaraman, Anshuman Suri, Katherine Knipmeyer, inference privacy, privacy-preserving machine learning

I gave an invited talk at the Distributed and Private Machine Learning (DPML) workshop at ICLR 2021 on Inference Risks for Machine Learning.

The talk mostly covers work by Bargav Jayaraman on evaluating privacy in machine learning and connecting attribute inference and imputation, and recent work by Anshuman Suri on property inference.

Codaspy 2021 Keynote: When Models Learn Too Much

26 April 2021 adversarial machine learning, privacy, Bargav Jayaraman, Anshuman Suri, Katherine Knipmeyer, inference privacy, privacy-preserving machine learning

Here are the slides for my talk at the 11th ACM Conference on Data and Application Security and Privacy:
When Models Learn Too Much [PDF]

The talk includes Bargav Jayaraman’s work (with Katherine Knipmeyer, Lingxiao Wang, and Quanquan Gu) on evaluating privacy in machine learning, as well as more recent work by Anshuman Suri on property inference attacks, and Bargav on attribute inference and imputation:

Merlin, Morgan, and the Importance of Thresholds and Priors

Evaluating Differentially Private Machine Learning in Practice

“When models learn too much.” Dr. David Evans @UdacityDave of University of Virginia gave a keynote talk on different inference risks for machine learning models this morning at #codaspy21 pic.twitter.com/KVgFoUA6sa

