diff --git a/docs/assets/quickstart/config_synthesizer.png b/docs/assets/quickstart/config_synthesizer.png deleted file mode 100644 index f4bd1dae..00000000 Binary files a/docs/assets/quickstart/config_synthesizer.png and /dev/null differ diff --git a/docs/assets/quickstart/create_synthetic_data.png b/docs/assets/quickstart/create_synthetic_data.png deleted file mode 100644 index d94d7cf4..00000000 Binary files a/docs/assets/quickstart/create_synthetic_data.png and /dev/null differ diff --git a/docs/assets/quickstart/data_catalog_add_dataset.png b/docs/assets/quickstart/data_catalog_add_dataset.png deleted file mode 100644 index f59d6feb..00000000 Binary files a/docs/assets/quickstart/data_catalog_add_dataset.png and /dev/null differ diff --git a/docs/assets/quickstart/data_catalog_create_connetors.png b/docs/assets/quickstart/data_catalog_create_connetors.png deleted file mode 100644 index 7d3fab81..00000000 Binary files a/docs/assets/quickstart/data_catalog_create_connetors.png and /dev/null differ diff --git a/docs/assets/quickstart/data_catalog_dataset_config.png b/docs/assets/quickstart/data_catalog_dataset_config.png deleted file mode 100644 index 1c18d044..00000000 Binary files a/docs/assets/quickstart/data_catalog_dataset_config.png and /dev/null differ diff --git a/docs/assets/quickstart/data_catalog_list.png b/docs/assets/quickstart/data_catalog_list.png deleted file mode 100644 index 00368642..00000000 Binary files a/docs/assets/quickstart/data_catalog_list.png and /dev/null differ diff --git a/docs/assets/quickstart/generate_samples.png b/docs/assets/quickstart/generate_samples.png deleted file mode 100644 index 6b500a2f..00000000 Binary files a/docs/assets/quickstart/generate_samples.png and /dev/null differ diff --git a/docs/assets/quickstart/masking_options.png b/docs/assets/quickstart/masking_options.png deleted file mode 100644 index c6f8f9e9..00000000 Binary files a/docs/assets/quickstart/masking_options.png and /dev/null differ diff --git 
a/docs/assets/quickstart/synthesizer_list.png b/docs/assets/quickstart/synthesizer_list.png deleted file mode 100644 index ca4f71c9..00000000 Binary files a/docs/assets/quickstart/synthesizer_list.png and /dev/null differ diff --git a/docs/assets/quickstart/synthetic_data/create_synthetic_data.webp b/docs/assets/quickstart/synthetic_data/create_synthetic_data.webp new file mode 100644 index 00000000..e2dfd90e Binary files /dev/null and b/docs/assets/quickstart/synthetic_data/create_synthetic_data.webp differ diff --git a/docs/assets/quickstart/synthetic_data/generated_synthetic_sample.webp b/docs/assets/quickstart/synthetic_data/generated_synthetic_sample.webp new file mode 100644 index 00000000..5b7285c8 Binary files /dev/null and b/docs/assets/quickstart/synthetic_data/generated_synthetic_sample.webp differ diff --git a/docs/assets/quickstart/synthetic_data/set_generation.webp b/docs/assets/quickstart/synthetic_data/set_generation.webp new file mode 100644 index 00000000..fde3c117 Binary files /dev/null and b/docs/assets/quickstart/synthetic_data/set_generation.webp differ diff --git a/docs/assets/quickstart/synthetic_data/synthetic_data_columns_sel.webp b/docs/assets/quickstart/synthetic_data/synthetic_data_columns_sel.webp new file mode 100644 index 00000000..dd52e9b1 Binary files /dev/null and b/docs/assets/quickstart/synthetic_data/synthetic_data_columns_sel.webp differ diff --git a/docs/assets/quickstart/synthetic_data/synthetic_data_configuration..png b/docs/assets/quickstart/synthetic_data/synthetic_data_configuration..png new file mode 100644 index 00000000..e69de29b diff --git a/docs/assets/quickstart/synthetic_data/synthetic_data_configuration.webp b/docs/assets/quickstart/synthetic_data/synthetic_data_configuration.webp new file mode 100644 index 00000000..0e8dd6bc Binary files /dev/null and b/docs/assets/quickstart/synthetic_data/synthetic_data_configuration.webp differ diff --git a/docs/assets/quickstart/synthetic_data/synthetic_data_overview.webp 
b/docs/assets/quickstart/synthetic_data/synthetic_data_overview.webp new file mode 100644 index 00000000..44506a91 Binary files /dev/null and b/docs/assets/quickstart/synthetic_data/synthetic_data_overview.webp differ diff --git a/docs/assets/quickstart/synthetic_data/trained_synthetic_data.webp b/docs/assets/quickstart/synthetic_data/trained_synthetic_data.webp new file mode 100644 index 00000000..63c76dd5 Binary files /dev/null and b/docs/assets/quickstart/synthetic_data/trained_synthetic_data.webp differ diff --git a/docs/assets/quickstart/synthetic_metadata.png b/docs/assets/quickstart/synthetic_metadata.png deleted file mode 100644 index 157bedfc..00000000 Binary files a/docs/assets/quickstart/synthetic_metadata.png and /dev/null differ diff --git a/docs/get-started/create_syntheticdata_generator.md b/docs/get-started/create_syntheticdata_generator.md index 3c007ac3..421d0c7b 100644 --- a/docs/get-started/create_syntheticdata_generator.md +++ b/docs/get-started/create_syntheticdata_generator.md @@ -1,50 +1,68 @@ # How to create your first Synthetic Data generator -To generate your first synthetic data, you need to start by creating a Synthesizer by accessing the **"Synthetic Data"** section on the **Home** section and clicking on **"Create Synthetic Data"**. +:fontawesome-brands-youtube:{ .youtube } +Check this quickstart video on how to create your first Synthetic Data generator. -
-![Create Synthetic Data](../assets/quickstart/create_synthetic_data.png){: style="height:550px;width:1000px"} -
+To generate your first synthetic data, you need to have a Dataset already available in your Data Catalog. +Check this tutorial to see how you can add your first dataset to Fabric’s Data Catalog. -You'll be asked to select the dataset you wish to generate synthetic data from and verify the columns you'd like to include in the synthesis process, validating their variable and data types. +With your first dataset created, you can now start creating your Synthetic Data generator. You can either +select **"Synthetic Data"** from the left side menu, or select **"Create Synthetic Data"** in your project Home, +as shown in the image below. -
-![Verify Metadata](../assets/quickstart/synthetic_metadata.png){: style="height:550px;width:1000px"} -
+
+ Create Synthetic Data +
-If you wish to anonymize some columns in the data, you can do so in the **"Anonymize Columns"** section. The features that may correspond to potential PII will be identified and a suitable masking method is automatically suggested for each. However, you'll be able to select the most appropriate method by browsing the available strategies in the drop-down list. +You'll be asked to select the dataset you wish to generate synthetic data from and verify the columns you'd like to +include in the synthesis process, validating their *Variable* and *Data Types*. -
-![Anonymization](../assets/quickstart/masking_options.png){: style="height:550px;width:1000px"} -
+!!! Tip "Data types are relevant for synthetic data quality" + Data Types should be reviewed and aligned with the objectives for the synthetic data, as they can strongly impact the quality + of the generated data. For example, take a column called "Name": while in some situations it would make sense + to consider it a String, in a dataset where "Name" refers to the name of the product purchased, it might be more + beneficial to set it as a Category. -Finally, you can give your Synthesizer a descriptive name and set specific configurations such as the **Target** (in case your dataset is used for supervised tasks), **Privacy Level** (which defines the trade-off between fidelity and privacy), and whether to enable **Conditional Sampling**, in case you wish to control the generation of new synthetic samples according to specific conditions (useful for data augmentation and de-bias purposes). +
+ Configure Metadata +
-
-![Synthesizer Configuration](../assets/quickstart/config_synthesizer.png){: style="height:400px;width:1000px"} -
+Finally, the last step of the process is the **Synthetic Data** specific configuration. For this particular case, we +only need to define a *Display Name*, and we can finish the process by clicking the **"Save"** button, as shown in the image below. -
+ Save Synthetic Data configurations +
-
-![Synthesizer List](../assets/quickstart/synthesizer_list.png){: style="height:600px;width:1200px"} -
+Your **Synthetic Data** generator is now training and listed under **"Synthetic Data"**. While the model is being trained, the *Status* will be +🟡. As soon as the training completes successfully, it will transition to 🟢, as shown in the image below. -Once the Synthesizer has finished training, you're ready to start generating your first synthetic dataset. From the list of available Synthesizers, you can click on the one you've just created to open its details. You'll be able to check several properties of your Synthesizer and even download a PDF report with a comphreensive overview of your Synthetic Data Quality Metrics. To generate a new synthetic data sample, you'll just need to access the **"Go to Generation" or "Generation"** tabs. +
+ Synthetic data generator trained successfully +
-
-![Sample Generation Tab](../assets/quickstart/go_generation.png){: style="height:600px;width:1200px"} -
+Once the Synthetic Data generator has finished training, you're ready to start generating your first synthetic dataset. +You can start by exploring an overview of the model configurations and even download a PDF report with a comprehensive overview of your +Synthetic Data Quality Metrics. Next, you can generate synthetic data samples by accessing the *Generation* tab or clicking *"Go to Generation"*. -You can then define the number of new synthetic records to generate, and your sample history will be shown below. You'll be able to **"Compare"** your synthetic data against the original data, and add the synthetic data to the Data Catalog. +
+ Synthetic data generator overview +
-
-![Generate New Samples](../assets/quickstart/generate_samples.png){: style="height:600px;width:1200px"} -
+In this section, you are able to generate as many synthetic samples as you want. +To do so, define the number of rows to generate and click *"Generate"*, as depicted in the image below. -*Note:* -If you have a previously created Synthesizer already, you can directly generate new samples from the **Home** section, by accessing the **"Generate"** tab and choosing your desired Synthesizer. The widget will directly lead you to the generation section shown above. +
+ Generate synthetic data records +
-
-![Home Generate Widget](../assets/quickstart/generate_from_home.png){: style="height:600px;width:1200px"} -
+A new line will be shown in your *"Sample History"*, and as soon as the sample generation is completed you will be able to +*"Compare"* your synthetic data with the original data, add it as a Dataset with *"Add to Data Catalog"* and, last but not least, +download it as a file with *"Download csv"*. +
+ Synthetic data generator trained +
+ +**Congrats!** 🚀 You have now successfully created your first **Synthetic Data** generator with Fabric. +Get ready for your journey of improved quality data for AI.