Risk Analysis

These analyses measure how well the anonymization prevents an attacker from determining whether a record belonged to the original dataset (membership inference) or from correctly matching anonymized records to individuals (re-identification).

Membership Inference Attack

This attack tests whether an individual’s presence in the original dataset can be inferred from the anonymized data. A sample size, attack set size, and prior knowledge threshold are configured so that the evaluation runs efficiently.

Summary of Results

A significant drop in True Positives (TP) for anonymized data shows that the anonymization reduces the likelihood of accurately identifying members.

False Positives (FP) tend to increase in anonymized data, which is desirable from a privacy perspective as it reduces the precision of the attack.

As the Hamming threshold increases (from 0 to 5), both TP and FP generally increase, but the growing FP count further undermines the attacker’s accuracy.

F1 Score and True Positive Rate (TPR) are notably lower in anonymized data, while False Positive Rate (FPR) and False Discovery Rate (FDR) are higher, which indicates effective anonymization.
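
The effect of the Hamming threshold can be illustrated with a minimal sketch. It assumes (this is not confirmed by the article) that the attacker guesses "member" whenever a candidate record lies within the threshold of at least one released record. The names `hamming`, `membership_attack`, `released`, and `attack_set` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Record = Sequence[object]


def hamming(a: Record, b: Record) -> int:
    """Count the attribute positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def membership_attack(
    released: List[Record],
    attack_set: List[Tuple[Record, bool]],
    threshold: int,
) -> Dict[str, int]:
    """Guess 'member' if any released record is within `threshold` (assumed rule).

    `attack_set` pairs each candidate record with its true membership flag,
    so the guesses can be scored as TP, FP, TN, or FN.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for record, is_member in attack_set:
        guess = any(hamming(record, r) <= threshold for r in released)
        if guess and is_member:
            counts["tp"] += 1
        elif guess and not is_member:
            counts["fp"] += 1
        elif not guess and is_member:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Toy data, invented for illustration only.
released = [("30-39", "F", "A"), ("40-49", "M", "B")]
attack_set = [
    (("30-39", "F", "A"), True),   # exact match with a released record
    (("40-49", "M", "C"), True),   # one attribute away from a released record
    (("50-59", "F", "A"), False),  # non-member, one attribute away
    (("60-69", "M", "D"), False),  # non-member, two attributes away
]

for t in (0, 1):
    print(t, membership_attack(released, attack_set, t))
```

On this toy data, raising the threshold from 0 to 1 adds one true positive and one false positive, which mirrors the trend described above.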

Re-Identification Risk Attack

This attack goes beyond membership and tries to re-identify specific individuals based on anonymized data.

Summary of Results

For anonymized datasets, True Positives are reduced to near zero, even at low Hamming distances.

False Positives are high, and the combination of high FDR and low precision makes accurate re-identification unlikely.

Across all tested Hamming thresholds, anonymized data yields consistently low TPR and F1 scores, confirming robust privacy protection.
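
A re-identification attempt can be sketched as nearest-record linkage. This is an assumed attack model, not a description of the product's implementation: the attacker links each known individual to the closest released record and only claims a match when the Hamming distance is within the threshold. `reidentification_attack` and the toy records are hypothetical; ties are broken by dictionary order.

```python
from typing import Dict, Sequence

Record = Sequence[object]


def hamming(a: Record, b: Record) -> int:
    """Count the attribute positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def reidentification_attack(
    released: Dict[str, Record],
    known: Dict[str, Record],
    threshold: int,
) -> Dict[str, int]:
    """Link each known individual to the nearest released record (assumed rule).

    The keys of `released` are ground-truth identities used only for scoring;
    the attacker sees the record values, not the keys.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0}
    for person_id, record in known.items():
        best_id, best_dist = None, None
        for released_id, released_record in released.items():
            d = hamming(record, released_record)
            if best_dist is None or d < best_dist:
                best_id, best_dist = released_id, d
        if best_dist is not None and best_dist <= threshold:
            # A match is claimed; it is correct only if identities agree.
            counts["tp" if best_id == person_id else "fp"] += 1
        else:
            # No match is claimed, so this individual is missed.
            counts["fn"] += 1
    return counts


# Toy data, invented for illustration only.
released = {
    "p1": ("30-39", "F", "A"),
    "p2": ("40-49", "M", "B"),
    "p3": ("60-79", "M", "Z"),
}
known = {
    "p1": ("30-39", "F", "A"),
    "p2": ("30-39", "F", "B"),
    "p3": ("70-79", "M", "Z"),
}

for t in (0, 1):
    print(t, reidentification_attack(released, known, t))
```

In this sketch a wrong link counts as a false positive, so generalized or perturbed values that pull a person's nearest match toward someone else's record raise FDR and lower precision.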

Evaluating Anonymized Data Quality

Metrics such as F1 Score, True Positive Rate (TPR), False Positive Rate (FPR), and False Discovery Rate (FDR) are used to evaluate the quality of anonymization. These metrics range from 0 to 1.
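
For reference, these metrics follow the standard confusion-matrix definitions. The sketch below computes them from raw attack counts; the function name `attack_metrics` is hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something the article specifies.

```python
from typing import Dict


def attack_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard attack metrics from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Assumed convention: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    return {
        "precision": ratio(tp, tp + fp),       # correct claims / all claims
        "tpr": ratio(tp, tp + fn),             # true positive rate (recall)
        "fpr": ratio(fp, fp + tn),             # false positive rate
        "fdr": ratio(fp, tp + fp),             # false discovery rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn)  # harmonic mean of precision and TPR
    }


metrics = attack_metrics(tp=2, fp=1, tn=1, fn=0)
print(metrics)
```

Whenever the attack makes at least one positive claim, FDR equals 1 minus precision, which is why high FDR and low precision are mentioned together in the attack summaries.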

Strong privacy protection in Anonymized Data:

Low F1 Score and True Positive Rate (TPR): Indicates a strong anonymization level, with few correct re-identifications.

High False Discovery Rate (FDR) and False Positive Rate (FPR): Suggests many false alarms, which dilute the accuracy of any re-identification attempt and thereby enhance privacy.

Poor privacy protection in Anonymized Data:

High F1 Score and TPR: Suggests that the anonymization process may not be robust enough, as a significant number of correct re-identifications are occurring.

Low FDR and FPR: Indicates that the anonymization is not effective enough to confuse the re-identification attempts, leading to potential privacy breaches.

Effective anonymization reduces the performance of both Membership Inference and Re-Identification Risk Attacks, as demonstrated by lower F1 scores and TPR in anonymized data. If your anonymized data shows contrary results, consider adjusting the anonymization parameters to enhance data privacy.

Privacy Risk Assessment

A precision-prioritized framework is used to classify risk levels based on:

Precision

AUC (Area Under the Curve) approximation

Hamming distance tolerance

Precision	AUC > 0.75	AUC 0.60–0.75	AUC 0.50–0.60	AUC ≤ 0.50
> 0.75	High Risk	High Risk	Medium Risk	Medium Risk
0.60–0.75	High Risk	High Risk	Medium Risk	Low Risk
0.50–0.60	Medium Risk	Medium Risk	Low Risk	No Risk
≤ 0.50	Low Risk	Low Risk	No Risk	No Risk
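
The table can be read as a two-dimensional lookup. The sketch below encodes it directly; `classify_risk` is a hypothetical helper, and because the table's band edges overlap (0.60 and 0.75 each appear in two bands), it assumes that a value exactly on an edge falls into the lower band, consistent with the "> 0.75" and "≤ 0.50" labels.

```python
# Rows: precision band, columns: AUC band, both ordered from highest to lowest.
RISK_MATRIX = [
    ["High Risk", "High Risk", "Medium Risk", "Medium Risk"],  # precision > 0.75
    ["High Risk", "High Risk", "Medium Risk", "Low Risk"],     # 0.60-0.75
    ["Medium Risk", "Medium Risk", "Low Risk", "No Risk"],     # 0.50-0.60
    ["Low Risk", "Low Risk", "No Risk", "No Risk"],            # <= 0.50
]


def band(value: float) -> int:
    """Map a score in [0, 1] to a band index (0 = highest band).

    Assumption: edge values (0.75, 0.60, 0.50) belong to the lower band.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if value > 0.75:
        return 0
    if value > 0.60:
        return 1
    if value > 0.50:
        return 2
    return 3


def classify_risk(precision: float, auc: float) -> str:
    """Look up the risk level for a precision and AUC pair."""
    return RISK_MATRIX[band(precision)][band(auc)]


print(classify_risk(0.70, 0.45))
```

For example, a precision of 0.70 with an AUC of 0.45 lands in the second row and last column of the table.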

Based on analysis results, well-anonymized datasets consistently fall into the “Low Risk” or “No Risk” categories.

Takeaways

The anonymization process demonstrates strong protection by consistently reducing the ability of attacks to correctly identify individuals or confirm membership.

If your anonymized data shows unexpectedly high TPR or F1, or unexpectedly low FDR and FPR, you may want to adjust your anonymization parameters (for example, reducing epsilon or increasing k) to improve privacy.

Need help interpreting your risk analysis? Contact our support team.

We are continuously enhancing our anonymity verification tools. Stay tuned for improvements within the VEIL.AI Native App.
