Review of Membership Inference Attacks Against Machine Learning Models

How to attack a model

This paper introduces the concept of membership inference attacks against machine learning models.

Membership inference is the task of determining whether or not a given data record was part of the model's training dataset.

This can be dangerous because the training data may be sensitive. The paper investigates the problem in a black-box scenario, in which the adversary can only supply inputs to the model and observe its outputs. In a white-box scenario, the attack could instead be carried out by examining the norm of the gradient that the network produces for the given record.
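As a rough illustration of this white-box signal (my own sketch, not part of the paper's black-box attack), the snippet below computes the per-record gradient norm for a hypothetical logistic-regression target model whose parameters `w` and `b` the adversary is assumed to know; records seen during training tend to produce smaller gradient norms.

```python
import numpy as np

def loss_gradient_norm(w, b, x, y_true):
    """Norm of the cross-entropy loss gradient (w.r.t. the weights) for a
    single record, given a softmax/logistic-regression target model."""
    logits = x @ w + b
    logits -= logits.max()                          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax prediction

    grad_logits = probs.copy()
    grad_logits[y_true] -= 1.0                      # d(loss)/d(logits) = probs - one_hot

    grad_w = np.outer(x, grad_logits)               # chain rule to the weight matrix
    return np.linalg.norm(grad_w)
```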

Training the attack model

The adversary tries to predict whether a record was included in the training set by training attack models, one for each class of the target output. Assume we have a record x with target model output y = f_{target}(x), and we want to determine whether x is in the target training set or not. The attack model takes (y, Y) as input, where Y is the true label of x and y = f_{target}(x), and predicts "in" or "out".
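As a minimal sketch of the attack phase, assuming the per-class attack models are already trained (scikit-learn-style classifiers; `attack_models` and `f_target` are placeholder names):

```python
import numpy as np

def infer_membership(attack_models, f_target, x, Y):
    """Guess whether record x (with true label Y) was in the target model's
    training set.

    attack_models: dict mapping each class label to a binary classifier
                   that outputs 1 for "in" and 0 for "out".
    f_target:      black-box access to the target model, returning the
                   prediction (probability) vector y for input x.
    """
    y = f_target(x)  # prediction vector of the target model
    # The (y, Y) input is realized by selecting the attack model for class Y
    # and feeding it the prediction vector y.
    return attack_models[Y].predict(y.reshape(1, -1))[0]
```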

The question is: how do we train this model?

To train the attack models, we train multiple shadow models. Shadow models are intended to behave similarly to the target model, but for a shadow model we know the ground truth, i.e. we know whether a record was included in its training set. Shadow models must be trained similarly to the target model, so the same algorithm or the same ML service should be used. For each shadow model we provide training data and test data, and we add (y, Y, in) for training records (D_shadow_train) and (y, Y, out) for test records (D_shadow_test) to the attack model's training set, where Y is the true label of x, whether x belongs to D_shadow_train or D_shadow_test, and y is the shadow model's output, y = f_shadow(x). We then train the attack model to perform this binary classification.
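A minimal sketch of this construction, assuming scikit-learn-style shadow models with `predict_proba`; the `shadow_splits` structure and the choice of an MLP as the attack model are my own placeholders:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def build_attack_dataset(shadow_models, shadow_splits):
    """Turn the shadow models' known train/test splits into the attack
    model's training set of (prediction vector, true label, in/out) rows.

    shadow_splits: list of ((X_train, Y_train), (X_test, Y_test)) tuples,
                   one per shadow model.
    """
    features, classes, labels = [], [], []
    for model, ((X_in, Y_in), (X_out, Y_out)) in zip(shadow_models, shadow_splits):
        for X, Y, member in ((X_in, Y_in, 1), (X_out, Y_out, 0)):
            features.append(model.predict_proba(X))    # y = f_shadow(x)
            classes.append(Y)                          # true labels Y
            labels.append(np.full(len(X), member))     # 1 = "in", 0 = "out"
    return np.vstack(features), np.concatenate(classes), np.concatenate(labels)

def train_attack_models(features, classes, labels):
    """Train one binary in/out classifier per output class."""
    attack_models = {}
    for c in np.unique(classes):
        idx = classes == c
        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
        clf.fit(features[idx], labels[idx])
        attack_models[c] = clf
    return attack_models
```

The resulting `attack_models` dictionary is what the inference sketch above queries.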

We also need data to train the shadow models. This data can be noisy real data, it can be generated by searching for records that the target model classifies with high confidence, or it can come from statistical synthesis, e.g. when we have knowledge of the marginal distributions of the features.

For each output class c, the paper proposes a hill-climbing algorithm to find inputs that the target model classifies as class c with high confidence. Briefly, it starts from a random record and at each iteration proposes a new record by changing k features of the current record (k is initialized to k_max and halved each time until it reaches k_min), until the probability of class c exceeds that of the previously accepted record. If this probability is also larger than a threshold and is the maximum entry of y, the record is accepted with probability y_c, the probability the model assigns to class c. If after rej_max iterations we do not find a record with higher confidence than the previously accepted one, the search is terminated and restarted. Note that this method may not work well for images because of the difficulty of exploring such a high-dimensional space.
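A rough sketch of one possible reading of this search loop (parameter names follow the description above; features are assumed numeric in [0, 1], k is halved after rej_max consecutive rejections, and the caller restarts the search when it gives up):

```python
import numpy as np

def synthesize(f_target, c, n_features, k_max=128, k_min=4,
               conf_min=0.8, rej_max=10, iter_max=1000, rng=None):
    """Hill-climb toward a record that the target model assigns to class c
    with high confidence. Returns the record, or None if the search fails."""
    rng = rng or np.random.default_rng()
    x_best = rng.random(n_features)       # start from a random record
    best_conf, k, rejections = 0.0, k_max, 0
    x = x_best.copy()

    for _ in range(iter_max):
        y = f_target(x)
        if y[c] > best_conf:                        # proposal improves confidence
            if y[c] > conf_min and y.argmax() == c and rng.random() < y[c]:
                return x                            # accept as a synthetic record
            x_best, best_conf, rejections = x.copy(), y[c], 0
        else:
            rejections += 1
            if rejections > rej_max:                # too many rejections: narrow the search
                k = max(k_min, k // 2)
                rejections = 0
        # Propose the next record by re-randomizing k features of the best one.
        x = x_best.copy()
        idx = rng.choice(n_features, size=min(k, n_features), replace=False)
        x[idx] = rng.random(len(idx))
    return None                                     # give up; the caller restarts the search
```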

Results highlights

Success of membership inference is directly related to the (1) generalizability of the target model and (2) diversity of its training data. If the model overfits and does not generalize well to inputs beyond its training data, or if the training data is not representative, the model leaks information about its training inputs.

Effect of the number of classes and training data per class

The results show that more classes contribute to information leakage. Briefly, models with more output classes need to remember more about their training data, and thus they leak more information. In general, the more data in the training dataset is associated with a given class, the lower the attack precision for that class.

Effect of overfitting

Taking the train-test accuracy gap as a measure of overfitting, a bigger gap indicates more information leakage. Overfitting is not the only cause of information leakage, but it contributes to it, so the paper frames the leakage of sensitive information about the training data as another form of overfitting. Different machine learning models, due to their different structures, remember different amounts of information about their training datasets. This leads to different amounts of information leakage even if the models are overfitted to the same degree.

How to defend against attacks

Regularization techniques such as dropout can help counter overfitting. Differentially private models are secure against this type of attack. ML-as-a-service platforms need to explicitly warn customers about this risk and provide more visibility into the model and into the methods that can be used to reduce this leakage.

Mitigation strategies and evaluations

- Restrict the prediction vector to the top k classes in order to leak less information. It turns out that restricting it to only the most likely label does not foil the attack, because the attack can still exploit the mislabeling behavior of the target model: members and non-members of the training dataset are mislabeled differently (assigned to different wrong classes).

- Coarsen the precision of the prediction vector.

- Increase the entropy of the prediction vector: for a logit vector z, output the probabilities e^{z_i / t} / \sum_j e^{z_j / t}, where a higher temperature t increases the entropy and leaks less information (see the sketch after this list).

- Use regularization.

Overall, the attack turns out to be robust against these mitigation strategies.
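A minimal sketch of how these transforms could be applied server-side to a prediction vector before it is returned to the client (this assumes the server works directly from the probability vector; parameter names are placeholders):

```python
import numpy as np

def soften(y, temperature=1.0, top_k=None, decimals=None):
    """Apply the mitigation transforms to a prediction (probability) vector y:
    temperature-scaled softmax, restriction to the top-k classes, and
    coarsening of the reported precision."""
    # Increase entropy with a temperature-scaled softmax over the log-probabilities.
    logits = np.log(y + 1e-12) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Keep only the k most likely classes, zeroing out the rest.
    if top_k is not None:
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()

    # Coarsen precision by rounding the reported probabilities.
    if decimals is not None:
        probs = np.round(probs, decimals)
    return probs
```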

Critical thoughts and technical suggestions

Actor-critic design for training

During training, the model should try to satisfy two properties: increasing training accuracy (which leads to overfitting if pushed too far) and preserving privacy (which is in tension with the first). In other words, we must maximize accuracy while minimizing information leakage. Similar to GAN training, we could design a training process that satisfies both properties, so that the model fits the data well while also preserving privacy.
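A very rough sketch of what one step of such a min-max training scheme could look like; this is my own illustration rather than anything from the paper, written against PyTorch with placeholder modules `target` (the classifier) and `attack` (a critic that outputs a single membership logit), plus a reference batch of non-member records `x_out`:

```python
import torch
import torch.nn.functional as F

def adversarial_privacy_step(target, attack, x_in, y_in, x_out,
                             opt_target, opt_attack, lam=1.0):
    """One min-max step: the attack model (critic) learns to separate members
    from non-members, while the target model (actor) learns to classify well
    and to make the critic fail."""
    # Critic step: train the attack model on the current prediction vectors.
    with torch.no_grad():
        p_in = F.softmax(target(x_in), dim=-1)
        p_out = F.softmax(target(x_out), dim=-1)
    member_logits = attack(torch.cat([p_in, p_out])).squeeze(-1)
    member_labels = torch.cat([torch.ones(len(x_in)), torch.zeros(len(x_out))])
    attack_loss = F.binary_cross_entropy_with_logits(member_logits, member_labels)
    opt_attack.zero_grad()
    attack_loss.backward()
    opt_attack.step()

    # Actor step: task accuracy minus a privacy penalty.
    logits = target(x_in)
    task_loss = F.cross_entropy(logits, y_in)
    # The critic's confidence that training records are members is the leakage
    # signal; minimizing -leak_loss pushes the target to lower that confidence.
    leak_loss = F.binary_cross_entropy_with_logits(
        attack(F.softmax(logits, dim=-1)).squeeze(-1), torch.ones(len(x_in)))
    total_loss = task_loss - lam * leak_loss
    opt_target.zero_grad()
    total_loss.backward()
    opt_target.step()
    return task_loss.item(), attack_loss.item()
```

The coefficient lam trades classification accuracy against membership leakage, which is exactly the tension described above.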