Synthesize dataset

Synthesizing your dataset is a crucial step to extend the utility of your data while ensuring the privacy of the information it contains. Follow these steps to effectively synthesize your dataset using the VEIL.AI Anonymization Engine.

Step-by-Step Guide to Data Synthesis

Select the Dataset: Begin by selecting the dataset you wish to synthesize from the dropdown list of available datasets. This dataset will already have default parameters associated with it.

Set Parameters:

Review and Adjust Default Parameters: Adjust the epsilon (ε) and k values:

Epsilon (ε): Controls the trade-off between data privacy and accuracy. A lower epsilon value increases privacy by adding more noise, while a higher epsilon value maintains greater accuracy but offers less privacy.
k: Ensures that each individual in the dataset cannot be distinguished from at least k−1 others with similar attributes.
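
To make the two parameters more concrete, the following is a minimal Python sketch, not the engine's actual implementation. It assumes the textbook Laplace mechanism to show why a lower epsilon means more noise, and a simple group-size check to show what the k guarantee means. The function names, column names, and sample rows are hypothetical and chosen only for illustration.

```python
from collections import Counter


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b in the textbook Laplace mechanism: b = sensitivity / epsilon.

    Assumption: shown only to illustrate the privacy/accuracy trade-off;
    the engine may use a different mechanism internally.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def smallest_group(rows, quasi_identifiers) -> int:
    """Size of the smallest group of rows that share the same quasi-identifier values."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values(), default=0)


def is_k_anonymous(rows, quasi_identifiers, k: int) -> bool:
    """True if every row is indistinguishable from at least k-1 other rows."""
    return smallest_group(rows, quasi_identifiers) >= k


# Hypothetical sample data: two rows in one group, three in another.
rows = [
    {"age_band": "30-39", "region": "North", "diagnosis": "A"},
    {"age_band": "30-39", "region": "North", "diagnosis": "B"},
    {"age_band": "40-49", "region": "South", "diagnosis": "A"},
    {"age_band": "40-49", "region": "South", "diagnosis": "C"},
    {"age_band": "40-49", "region": "South", "diagnosis": "B"},
]
quasi_ids = ["age_band", "region"]

print(laplace_scale(1.0, 0.1))             # lower epsilon -> larger noise scale
print(laplace_scale(1.0, 1.0))             # higher epsilon -> smaller noise scale
print(is_k_anonymous(rows, quasi_ids, 2))  # smallest group has 2 rows
print(is_k_anonymous(rows, quasi_ids, 3))  # fails: one group has only 2 rows
```

In this sketch, reducing epsilon from 1.0 to 0.1 makes the noise scale ten times larger, and the sample rows satisfy k = 2 but not k = 3.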

Provide Additional Parameters:

n: Define the size of the synthetic dataset. Currently, n can be at most 2 times the size of the original dataset. Please contact us if this limit on n prevents you from completing your work.
Result ID: Enter a unique result ID for the synthetic dataset for tracking and reference purposes.
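
If you prepare these values outside the application, you may want to check them before entering them. The sketch below is one way to do so in Python, under stated assumptions: the function and argument names are hypothetical and are not part of the engine's interface, and the list of existing result IDs is something you would maintain yourself.

```python
def validate_synthesis_request(original_size, n, result_id, existing_ids=()):
    """Check n and the result ID against the limits described above.

    Hypothetical helper: the names and the uniqueness check are assumptions
    made for illustration, not the engine's own validation.
    """
    max_n = 2 * original_size  # n may be at most 2 times the original size
    if n < 1:
        raise ValueError("n must be at least 1")
    if n > max_n:
        raise ValueError(f"n={n} exceeds the current limit of {max_n}")
    if not result_id.strip():
        raise ValueError("result ID must not be empty")
    if result_id in existing_ids:
        raise ValueError(f"result ID '{result_id}' is already in use")


# A dataset with 1,000 rows allows n up to 2,000.
validate_synthesis_request(1000, 2000, "synth-run-01")

try:
    validate_synthesis_request(1000, 2001, "synth-run-02")
except ValueError as err:
    print(err)
```

The second call fails because 2,001 is more than twice the original 1,000 rows.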

Synthesize the Dataset: Once you are satisfied with the settings, click the 'Synthesize' button to start the synthesis process. Depending on the size of the dataset and the selected parameters, this might take some time.

The synthesis, risk analysis, and quality analysis are all run as background jobs. This allows you to continue working while the processes complete. You can monitor the progress of these tasks under the 'Tasks' section of the application.
