Authors: Pasquale Coscia¹; Stefano Ferrari¹; Vincenzo Piuri¹ and Ayse Salman²
Affiliations: ¹Department of Computer Science, Università degli Studi di Milano, via Celoria 18, Milano, Italy; ²Department of Computer Engineering, Maltepe University, 34857 Maltepe, Istanbul, Turkey
Keyword(s):
Membership Inference Attack, Generative Models, Fréchet Coefficient.
Abstract:
Synthetic data are widely employed across diverse fields, including computer vision, robotics, and cybersecurity. However, generative models are prone to unintentionally revealing sensitive information from their training datasets, primarily due to overfitting. In this context, membership inference attacks (MIAs) have emerged as a significant privacy threat. These attacks employ binary classifiers to verify whether a specific data sample was part of the model’s training set, thereby discriminating between member and non-member samples. Despite their growing relevance, the interpretation of MIA outcomes can be misleading without a detailed understanding of the data domains involved during both model development and evaluation. To bridge this gap, we perform an analysis focused on a particular category (i.e., vehicles) to assess the effectiveness of MIAs under scenarios with limited overlap in data distribution. First, we introduce a data selection strategy, based on the Fréchet Coefficient, to filter and curate the evaluation datasets; we then execute membership inference attacks under varying degrees of distributional overlap. Our findings indicate that MIAs are highly effective when the training and evaluation data distributions are well aligned, but their accuracy drops significantly under distribution shifts or when domain knowledge is limited. These results highlight the limitations of current MIA methodologies in reliably assessing privacy risks in generative modeling contexts.
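As an illustration of the selection step, the sketch below computes the Fréchet distance between Gaussian fits of two embedding sets, a standard proxy for distributional overlap. The abstract does not specify the exact Fréchet Coefficient formulation or the feature extractor used by the authors, so both are assumptions here; the distance is shown only as a plausible basis for such a filtering criterion.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussian fits of two feature sets.

    feats_a, feats_b: (n_samples, dim) arrays of embeddings from some
    pretrained feature extractor (hypothetical choice; the abstract
    does not name a backbone).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):   # sqrtm can return tiny imaginary
        covmean = covmean.real     # parts from numerical noise
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

Under this reading, candidate evaluation subsets would be retained or discarded depending on whether their distance to the training distribution falls within a chosen overlap band, yielding evaluation sets with controlled degrees of distributional overlap.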
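The attack itself is described only as a binary member/non-member classifier, so the following is a minimal sketch assuming a scalar per-sample signal (e.g., a reconstruction error or loss value) queried from the target generative model; the feature choice, classifier, and evaluation split are illustrative, not the authors' method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def mia_accuracy(member_scores, nonmember_scores, seed=0):
    """Train and evaluate a binary membership classifier.

    member_scores / nonmember_scores: 1-D arrays of per-sample signals
    from the target model (hypothetical attack feature; the abstract
    does not fix one).
    """
    X = np.concatenate([member_scores, nonmember_scores]).reshape(-1, 1)
    y = np.concatenate([np.ones(len(member_scores)),
                        np.zeros(len(nonmember_scores))])
    # Held-out split so attack accuracy is not measured on fitted data.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    clf = LogisticRegression().fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))
```

Repeating this measurement over evaluation sets curated at different Fréchet distances from the training data would expose the accuracy drop under distribution shift that the abstract reports.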