Do Membership Inference Attacks Work on Large Language Models?

MIMIR logo. Image credit: GPT-4 + DALL-E Paper Code Data Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model’s training data. Despite extensive research on traditional machine learning models, there has been limited work studying MIA on the pre-training data of large language models (LLMs). We perform a large-scale evaluation of MIAs over a suite of language models (LMs) trained on the Pile, ranging from 160M to 12B parameters.


All Posts by Category or Tags.