Dataset Variables

You have created your dataset. Next, review and, where necessary, update the privacy class of each variable. This is a critical step in the anonymization and synthetic data generation process. This page helps you select the appropriate privacy class for each variable in your dataset.

Automatic Variable Classification

By default, every variable is set to SA (Sensitive Attribute).

If a variable contains only unique values, it is automatically assigned to the ID class when the dataset is created. This helps prevent accidentally exposing direct identifiers in downstream steps.
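The default rule described above can be sketched in a few lines of Python. This is only an illustration of the stated behaviour, not the product's actual implementation: the function name `default_privacy_classes` and the sample column names are hypothetical, and the sketch ignores data types and missing values.

```python
def default_privacy_classes(columns):
    """Map each column name to 'ID' if all of its values are unique, else 'SA'.

    `columns` is a plain dict of column name -> list of values.
    Hypothetical helper: it mirrors the rule in the text, nothing more.
    """
    classes = {}
    for name, values in columns.items():
        # A column counts as "only unique values" when no value repeats.
        all_unique = len(values) > 0 and len(set(values)) == len(values)
        classes[name] = "ID" if all_unique else "SA"
    return classes


# Sample data invented for this example.
columns = {
    "patient_id": [101, 102, 103, 104],
    "diagnosis": ["flu", "asthma", "flu", "diabetes"],
}
print(default_privacy_classes(columns))
# {'patient_id': 'ID', 'diagnosis': 'SA'}
```

A column that merely happens to have no repeated values (for example, a measurement in a small dataset) would also be flagged as ID under this rule, which is one reason to review the automatic assignments.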

Understanding Privacy Classes

Each variable in your dataset can be assigned to one of the following privacy classes, which determine how it is handled during anonymization or synthesis:

ID Variables (Identifiers): These are direct identifiers such as names and social security numbers. They are removed from the dataset to protect individual privacy.

SA Variables (Sensitive Attributes): These variables include data that is considered sensitive, such as medical conditions or financial information. They undergo the standard anonymization process to prevent disclosure of personal information.

NSA Variables (Non-Sensitive Attributes): These are variables that do not reveal sensitive information about individuals and thus remain unchanged.

QI Variables (Quasi-Identifiers): These are attributes that do not uniquely identify an individual by themselves but could potentially re-identify individuals when combined with other data. Such variables are subject to thorough anonymization to ensure privacy.
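To see why quasi-identifiers matter, the following sketch counts how many records become unique once several attributes are combined. It is a generic illustration, not part of the product: the function name `unique_combinations`, the field names, and the sample records are all invented for this example.

```python
from collections import Counter


def unique_combinations(records, quasi_identifiers):
    """Count records whose combination of quasi-identifier values is unique."""
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1)


# Invented sample records.
records = [
    {"zip": "1010", "birth_year": 1980, "sex": "F"},
    {"zip": "1010", "birth_year": 1980, "sex": "M"},
    {"zip": "1010", "birth_year": 1975, "sex": "F"},
    {"zip": "1020", "birth_year": 1980, "sex": "F"},
]

print(unique_combinations(records, ["zip"]))                       # 1
print(unique_combinations(records, ["zip", "birth_year", "sex"]))  # 4
```

On its own, the zip code singles out one record; combined with birth year and sex, every record in this small sample is unique.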

Supported Data Type Combinations

Each data type is compatible with specific privacy classes. Assigning an unsupported combination will result in an error.

Data Type	ID	SA	QI	NSA
`int64`	✅	✅	✅	✅
`float64`	❌	✅	❌	✅
`boolean`	❌	✅	✅	✅
`date`	❌	✅	❌	✅
`string`	✅	✅	✅	✅
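The table can be encoded as a lookup that is checked before a class is assigned. This is a minimal sketch that only transcribes the table above; the names `SUPPORTED` and `check_combination` are hypothetical, and the error message is not the one the product shows.

```python
# Allowed privacy classes per data type, transcribed from the table above.
SUPPORTED = {
    "int64": {"ID", "SA", "QI", "NSA"},
    "float64": {"SA", "NSA"},
    "boolean": {"SA", "QI", "NSA"},
    "date": {"SA", "NSA"},
    "string": {"ID", "SA", "QI", "NSA"},
}


def check_combination(data_type, privacy_class):
    """Raise ValueError if the data type / privacy class pair is unsupported."""
    allowed = SUPPORTED.get(data_type)
    if allowed is None:
        raise ValueError(f"Unknown data type: {data_type!r}")
    if privacy_class not in allowed:
        raise ValueError(
            f"{privacy_class} is not supported for {data_type}; "
            f"allowed classes: {', '.join(sorted(allowed))}"
        )


check_combination("int64", "QI")  # passes silently

try:
    check_combination("float64", "QI")
except ValueError as error:
    print(error)
# QI is not supported for float64; allowed classes: NSA, SA
```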

How to Set Privacy Classes for Variables

Begin by reviewing the list of variables included in your dataset. Understand the nature of each variable and its potential impact on individual privacy.

Type the name of the variable into the 'Variables' field. Make sure the spelling matches the variable name exactly as it is listed in the dataset. To assign several variables to the same class, enter their names separated by commas.

Under 'Privacy Class', click the dropdown menu and select the appropriate privacy class for the variable(s) you provided.

Click the 'Update' button to apply the new privacy class to the variable(s). The change should now be reflected in the dataset’s settings; if it is not, click the refresh button on the side.

Repeat these steps for variables that belong to a different privacy class, if needed.
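The steps above can be pictured as the following sketch, which splits a comma-separated 'Variables' entry, checks each name against the dataset, and applies the selected class. It is an assumption about how such an update could work, not the product's code: `update_privacy_class` and the variable names are hypothetical.

```python
def update_privacy_class(current, variables_field, privacy_class):
    """Return a copy of `current` with the listed variables set to `privacy_class`.

    `current` maps variable name -> privacy class.
    `variables_field` is the comma-separated text typed into the form.
    """
    # Split on commas and drop surrounding whitespace and empty entries.
    names = [name.strip() for name in variables_field.split(",") if name.strip()]

    # Names must match the dataset exactly, so reject anything unknown.
    unknown = [name for name in names if name not in current]
    if unknown:
        raise KeyError(f"Unknown variable(s): {', '.join(unknown)}")

    updated = dict(current)
    for name in names:
        updated[name] = privacy_class
    return updated


# Invented example: two variables moved to the QI class in one update.
classes = {"patient_id": "ID", "age": "SA", "zip_code": "SA", "diagnosis": "SA"}
classes = update_privacy_class(classes, "age, zip_code", "QI")
print(classes)
# {'patient_id': 'ID', 'age': 'QI', 'zip_code': 'QI', 'diagnosis': 'SA'}
```

A misspelled name such as `"zipcode"` raises an error here instead of being silently ignored, which mirrors the advice to match the spelling used in the dataset.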

Now you are all set to anonymize or synthesize your data.
