docs: add quickstart section (#70)
* docs: add fabric overview

* docs: add image fabric flow

* docs: fix relative links

* docs: add quickstart section

* docs: update structure for Fabric quickstart section

* fix(linting): code formatting

* docs: Update upload CSV file quickstart.

* docs: update quickstart index

* docs: update create your first synthetic data quickstart guide.

* fix(linting): code formatting

* docs: fix sdk references

* docs: add new create lab quickstart.

* fix(linting): code formatting

* docs: update quickstart tutorial for first pipeline.

* fix(linting): code formatting

* docs: Add quickstart lab guide to index

---------

Co-authored-by: Fabiana Clemente <[email protected]>
Co-authored-by: Azory YData Bot <[email protected]>
3 people authored Nov 30, 2023
1 parent 5e2b897 commit 74952d8
Showing 56 changed files with 315 additions and 1 deletion.
Binary file added docs/assets/overview/fabric_data_centric_flow.png
Binary file added docs/assets/quickstart/create_lab/create_lab.webp
Binary file added docs/assets/quickstart/create_lab/open_lab.webp
Binary file added docs/assets/quickstart/create_pipeline.png
Binary file added docs/assets/quickstart/first_pipeline.png
Binary file added docs/assets/quickstart/generate_from_home.png
Binary file added docs/assets/quickstart/go_generation.png
Binary file added docs/assets/quickstart/lab_creation.png
Binary file added docs/assets/quickstart/lab_section.png
Binary file added docs/assets/quickstart/labs_list.png
Remaining binary asset files not shown.
52 changes: 52 additions & 0 deletions docs/get-started/create_lab.md
@@ -0,0 +1,52 @@
# How to create your first Lab environment

Labs are code environments that allow for more flexible development of data-driven solutions, combining Fabric capabilities
with well-loved tools such as scikit-learn, numpy and pandas.
To create your first **Lab**, you can use the **“Create Lab”** button on Fabric’s home, or you can open the Labs
module from the left side menu and click the **“Create Lab”** button.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_lab/create_lab.webp" alt="Select create a lab from Home" style="width: 75%;">
</div>

Next, a menu with different IDEs will be shown. For this quickstart, select *Jupyter Lab*. As labs are development environments,
you will also be asked which language you would like your environment to support: *R* or *Python*. Select Python.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_lab/select_ide.webp" alt="Select an IDE" style="width: 50%;">
<img src="/assets/quickstart/create_lab/select_language.webp" alt="Python or R" style="width: 50%;">
</div>

Bundles are environments with pre-installed packages. Select the YData bundle, so we can leverage other Fabric features
such as Data Profiling, Synthetic Data and Pipelines.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_lab/select_bundle.webp" alt="Select the bundle for development" style="width: 75%;">
</div>

As a last step, you will be asked to configure the infrastructure resources for this new environment and to give it
a *Display Name*. We will keep the defaults,
but you have the flexibility to select GPU acceleration or more computational resources for your developments.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_lab/select_infrastructure.webp" alt="Select the computational resources" style="width: 75%;">
</div>

Finally, your Lab will be created and added to the "Labs" list, as per the image below. The status of the lab will be
🟡 while it is being prepared. This process takes a few minutes, as the infrastructure is being allocated to your development environment.
As soon as the status changes to 🟢, you can open your lab by clicking the button as shown below:

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_lab/open_lab.webp" alt="Open your lab environment" style="width: 75%;">
</div>

Create a new notebook in JupyterLab and give it a name. You are now ready to start your developments!

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_lab/notebook_creation.webp" alt="Create a new notebook" style="width: 50%;">
<img src="/assets/quickstart/create_lab/notebook_created.webp" alt="Notebook created" style="width: 50%;">
</div>
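
As a first cell in the new notebook, it can be useful to confirm that the tools mentioned in this guide are importable in the environment you selected. The following is a minimal sketch using only the Python standard library; the helper name `check_packages` is hypothetical and not part of Fabric.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Packages mentioned in this guide; note that scikit-learn is imported as "sklearn".
status = check_packages(["numpy", "pandas", "sklearn"])
for name, available in status.items():
    print(f"{name}: {'available' if available else 'missing'}")
```

If a package is reported as missing, it is likely that a different bundle was selected when the Lab was created.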

**Congrats!** 🚀 You have now successfully created your first **Lab**, a code environment where you can benefit from the most
advanced Fabric features as well as compose complex data workflows.
Get ready for your journey of improved quality data for AI.
113 changes: 113 additions & 0 deletions docs/get-started/create_pipeline.md
@@ -0,0 +1,113 @@
# How to create your first Pipeline

:fontawesome-brands-youtube:{ .youtube }
Check this quickstart video on <a href="https://youtu.be/_zZBt2nWiH8"><u>how to create your first Pipeline</u></a>.

The best way to get started with Pipelines is to use the interactive Pipeline editor available in the Labs with Jupyter Lab set as IDE.
If you don't have a **Lab** yet, or you don't know how to create one, check our <a href="create_lab"><u>quickstart guide on how to create your first lab</u></a>.

Open an already existing lab.

A Pipeline comprises one or more nodes that are connected (or not!) with each other to define execution dependencies. Each pipeline node
should be implemented as a component that manages a single task, such as reading the data, profiling the data, training a model,
or even publishing a model to production environments.
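
To make the idea of execution dependencies concrete, the sketch below models a pipeline as a mapping from each node to the nodes it depends on, and derives a valid run order with the standard library's `graphlib`. It only illustrates the concept and is not how Fabric's orchestration engine is implemented; the node names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical node names; each key lists the nodes it depends on.
pipeline = {
    "read_dataset": set(),
    "profile_dataset": {"read_dataset"},
    "publish_report": {"profile_dataset"},
}


def execution_order(nodes):
    """Return one valid run order in which every node follows its dependencies."""
    return list(TopologicalSorter(nodes).static_order())


order = execution_order(pipeline)
print(order)
```

Nodes without any dependency between them could run in any relative order, which is why unconnected nodes are allowed in a pipeline.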

In this tutorial we will build a simple and generic pipeline that uses a **Dataset** from Fabric's **Data Catalog** and profiles it to check its quality.
The notebook templates are already available. To find them, open the *"Academy"* folder as per the image below.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_pipeline/academy_folder.webp" alt="Academy folder" style="width: 75%;">
</div>

Make sure to copy all the files in the folder "3 - Pipelines/quickstart" to the root folder of your lab, as per the image below.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_pipeline/copy_files.webp" alt="Select your pipeline editor" style="width: 75%;">
</div>

Now that we have our notebooks, we need to make a small change in the notebook "1. Read dataset". Go back to your **Data Catalog**, and for one of the datasets
in your Catalog list, select the three vertical dots and click **"Explore in Labs"** as shown in the image below.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_pipeline/explore_in_labs.webp" alt="Explore the dataset in the labs" style="width: 70%;">
</div>

The following screen will be shown. Click the copy button.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_pipeline/code_snippet.webp" alt="Dataset code snippet" style="width: 35%;">
</div>

Now that we have copied the code, let's get back to our **"1. Read data.ipynb"** notebook, and replace the first code cell with the new code. This will allow us to use a
dataset from the Data Catalog in our pipeline.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_pipeline/og_code.webp" alt="Dataset code snippet" style="width: 50%;">
<img src="/assets/quickstart/create_pipeline/replaced_code.webp" alt="Dataset code snippet" style="width: 50%;">
</div>

With our notebooks ready, we can now configure our **Pipeline**.
For this quickstart we will be leveraging an already existing pipeline - double-click the file *my_first_pipeline.pipeline*. You should see a pipeline
as depicted in the images below.
To create a new Pipeline, you can open the lab launcher tab and select **"Pipeline Editor"**.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_pipeline/open_pipeline.webp" alt="Open pipeline" style="width: 40%;">
<img src="/assets/quickstart/create_pipeline/my_first_pipeline.webp" alt="My first pipeline" style="width: 60%;">
</div>

Before running the pipeline, we need to check the properties and configurations of each component/step. Right-click each one of the steps, select *"Open Properties"*, and a
menu will appear on your right side. Make sure that you have *"YData - CPU"* selected as the **Runtime Image** as shown below.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_pipeline/open_properties.webp" alt="Open pipeline" style="width: 50%;">
<img src="/assets/quickstart/create_pipeline/runtime_image.webp" alt="My first pipeline" style="width: 50%;">
</div>

We are now ready to create and run our first pipeline. In the top left corner of the pipeline editor, the run button
will be available for you to click.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_pipeline/run_pipeline.webp" alt="Select your pipeline editor" style="width: 75%;">
</div>

Accept the default values shown in the run dialog and start the run.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_pipeline/pipeline_default_dialog.webp" alt="Pipeline configuration confirm dialog" style="width: 30%;">
</div>

If the following message is shown, it means that you have created a run of your first pipeline.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_pipeline/pipeline_creation_success.webp" alt="Select your pipeline editor" style="width: 60%;">
</div>

Now that you have created your first pipeline, you can select the **Pipeline** from Fabric's left side menu.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_pipeline/pipelines_menu.webp" alt="Select Fabric Pipelines" style="width: 70%;">
</div>

Your most recent pipeline will be listed, as shown in the image below.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_pipeline/my_pipeline_record.webp" alt="My first pipeline listed" style="width: 70%;">
</div>

To check the run of your pipeline, jump into the **"Run"** tab. You will be able to see your first pipeline running!

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_pipeline/my_first_pipeline_run.webp" alt="My first pipeline listed" style="width: 70%;">
</div>

By clicking the record you will be able to see the progress of the run step by step, and visualize the outputs of each
step by clicking on it and selecting the **Visualizations** tab.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/create_pipeline/pipeline_progress.webp" alt="My first pipeline listed" style="width: 70%;">
</div>

**Congrats!** 🚀 You have now successfully created your first **Pipeline**, so you can benefit from Fabric's
orchestration engine to create scalable, versionable and comparable data workflows.
Get ready for your journey of improved quality data for AI.
68 changes: 68 additions & 0 deletions docs/get-started/create_syntheticdata_generator.md
@@ -0,0 +1,68 @@
# How to create your first Synthetic Data generator

:fontawesome-brands-youtube:{ .youtube }
Check this quickstart video on <a href="https://youtu.be/GsfggG9PhgE?si=ixlCaesd3cLFOCZm"><u>how to create your first Synthetic Data generator</u></a>.

To generate your first synthetic data, you need to have a Dataset already available in your Data Catalog.
Check this tutorial to see how you can <a href="upload_csv"><u>add your first dataset to Fabric’s Data Catalog</u></a>.

With your first dataset created, you are now able to start the creation of your Synthetic Data generator. You can either
select **"Synthetic Data"** from your left side menu, or you can select **"Create Synthetic Data"** in your project Home
as shown in the image below.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/synthetic_data/create_synthetic_data.webp" alt="Create Synthetic Data" style="width: 75%;">
</div>

You'll be asked to select the dataset you wish to generate synthetic data from and verify the columns you'd like to
include in the synthesis process, validating their *Variable* and *Data Types*.

!!! Tip "Data types are relevant for synthetic data quality"
    Data Types should be revisited and aligned with the objectives for the synthetic data, as they can highly impact the quality
    of the generated data. For example, let's say we have a column that is a "Name". While in some situations it would make sense
    to consider it a String, in a dataset where "Name" refers to the name of the product purchased, it might be more
    beneficial to set it as a Category.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/synthetic_data/synthetic_data_columns_sel.webp" alt="Configure Metadata" style="width: 75%;">
</div>
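
One way to reason about the String versus Category choice described in the tip is to look at how often values repeat in a column. The sketch below is only an illustrative heuristic: the function `suggest_data_type`, the `max_unique_ratio` threshold and the sample values are all assumptions made for this example, not Fabric settings or behaviour.

```python
def suggest_data_type(values, max_unique_ratio=0.5):
    """Suggest "Category" when values repeat often, otherwise "String".

    max_unique_ratio is an illustrative threshold, not a Fabric setting.
    """
    non_empty = [v for v in values if v not in ("", None)]
    if not non_empty:
        return "String"
    unique_ratio = len(set(non_empty)) / len(non_empty)
    return "Category" if unique_ratio <= max_unique_ratio else "String"


# Made-up sample values for two columns that could both be called "Name".
product_names = ["laptop", "phone", "laptop", "tablet", "phone", "laptop"]
customer_names = ["Ana Silva", "John Doe", "Mei Chen", "Omar Ali"]

print(suggest_data_type(product_names))
print(suggest_data_type(customer_names))
```

A heuristic like this can only hint at a sensible choice; the final decision should follow the objectives for the synthetic data.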

Finally, the last step of the process covers the **Synthetic Data** specific configurations. For this particular case we
only need to define a *Display Name*, and we can finish the process by clicking the **"Save"** button as per the image below.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/synthetic_data/synthetic_data_configuration.webp" alt="Save Synthetic Data configurations" style="width: 75%;">
</div>

Your **Synthetic Data** generator is now training and listed under **"Synthetic Data"**. While the model is being trained, the *Status* will be
🟡. As soon as the training is completed successfully, it will transition to 🟢 as per the image below.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/synthetic_data/trained_synthetic_data.webp" alt="Synthetic data generator trained successfully" style="width: 75%;">
</div>

Once the Synthetic Data generator has finished training, you're ready to start generating your first synthetic dataset.
You can start by exploring an overview of the model configurations and even download a PDF report with a comprehensive overview of your
Synthetic Data Quality Metrics. Next, you can generate synthetic data samples by accessing the *Generation* tab or clicking *"Go to Generation"*.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/synthetic_data/synthetic_data_overview.webp" alt="Synthetic data generator overview" style="width: 75%;">
</div>

In this section, you are able to generate as many synthetic samples as you want.
For that you need to define the number of rows to generate and click *"Generate"*, as depicted in the image below.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/synthetic_data/set_generation.webp" alt="Generate synthetic data records" style="width: 75%;">
</div>

A new line will be shown in your *"Sample History"*, and as soon as the sample generation is completed you will be able to
*"Compare"* your synthetic data with the original data, add it as a Dataset with *"Add to Data Catalog"* and, last but not least,
download it as a file with *"Download csv"*.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/synthetic_data/generated_synthetic_sample.webp" alt="Synthetic data generator trained" style="width: 75%;">
</div>
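
If you download the sample as a CSV, you may also want a quick check of your own alongside the *"Compare"* view. The sketch below computes the total variation distance between the value shares of one categorical column in the original and synthetic data. It is a generic statistic chosen for illustration, not the metric Fabric reports, and the sample values are made up.

```python
from collections import Counter


def category_shares(values):
    """Return the share of each distinct value in a column."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def total_variation_distance(original, synthetic):
    """Half the sum of absolute share differences: 0 is identical, 1 is disjoint."""
    p, q = category_shares(original), category_shares(synthetic)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


# Made-up column values for illustration only.
original = ["A", "A", "B", "B"]
synthetic = ["A", "A", "A", "B"]
print(total_variation_distance(original, synthetic))
```

Values closer to 0 indicate that the synthetic column reproduces the original category proportions more closely.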

**Congrats!** 🚀 You have now successfully created your first **Synthetic Data** generator with Fabric.
Get ready for your journey of improved quality data for AI.
10 changes: 10 additions & 0 deletions docs/get-started/index.md
@@ -0,0 +1,10 @@
# Get started with Fabric

The Get started section is here to help you if you are not yet familiar with YData Fabric or if you just want to learn more about
data quality, data preparation workflows and how you can start leveraging synthetic data.
<a href="fabric_community"><u>Mention to YData Fabric Community</u></a>

### 📚 <a href="upload_csv"><u>Create your first Dataset with the Data Catalog</u></a>
### ⚙️ <a href="create_syntheticdata_generator"><u>Create your first Synthetic Data generator</u></a>
### 🧪 <a href="create_lab"><u>Create your first Lab</u></a>
### 🌀 <a href="create_pipeline"><u>Create your first data Pipeline</u></a>
60 changes: 60 additions & 0 deletions docs/get-started/upload_csv.md
@@ -0,0 +1,60 @@
# How to create your first Dataset from a CSV file

:fontawesome-brands-youtube:{ .youtube }
Check this quickstart video on <a href="https://youtu.be/1zYreRKsNGE"><u>how to create your first Dataset from a CSV file</u></a>.

To create your first dataset in the **Data Catalog**, you can start by clicking on **"Add Dataset"** from the Home section.
Alternatively, go to the **Data Catalog** (on the left side menu) and click **“Add Dataset”**.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/upload_csv/welcome_add_dataset.png" alt="Add dataset from Home" style="width: 75%;">
</div>

After that, the modal below will be shown. You will need to select a connector. To upload a CSV file, we need to select **“Upload CSV”**.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/upload_csv/data_catalog_connectors.png" alt="Select connectors to storage" style="width: 45%;">
</div>

Once you've selected the **“Upload CSV”** connector, a new screen will appear, enabling you to upload your file and designate a name for your connector.
This file upload connector will subsequently empower you to create one or more datasets from the same file at a later stage.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/upload_csv/loading_area.png" alt="Upload file area" style="width: 45%;">
<img src="/assets/quickstart/upload_csv/load_csv_file.png" alt="Upload CSV file" style="width: 45%;">
</div>

With the *Connector* created, you'll be able to add a dataset and specify its properties:

- **Name:** The name of your dataset.
- **Separator:** This is an important parameter to make sure that we can parse your CSV correctly. The default value is “,”.
- **Data Type:** Whether your dataset contains tabular or time-series (i.e., containing temporal dependency) data.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/upload_csv/add_dataset_details.png" alt="Upload file area" style="width: 45%;">
</div>
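
Because the **Separator** has to match the file, it can help to check which character your CSV actually uses before uploading it. The following is a minimal sketch that picks the candidate character appearing the same number of times on every line. The function `guess_separator` and the sample column names are hypothetical, and the heuristic ignores quoted fields that contain the separator.

```python
def guess_separator(sample, candidates=(",", ";", "\t", "|")):
    """Guess the separator as the candidate with a constant, highest per-line count.

    Falls back to "," (the default mentioned above) when nothing matches.
    """
    lines = [line for line in sample.splitlines() if line.strip()]
    best, best_count = ",", 0
    for sep in candidates:
        counts = {line.count(sep) for line in lines}
        if len(counts) == 1:  # same number of occurrences on every line
            count = counts.pop()
            if count > best_count:
                best, best_count = sep, count
    return best


# Made-up sample; in practice, read the first few lines of your own file.
sample = "age;workclass;education\n39;State-gov;Bachelors\n50;Private;HS-grad\n"
print(repr(guess_separator(sample)))
```

Checking only the first few lines of a file is usually enough to choose the value for the **Separator** field.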

Your created Connector *(“Census File”)* and Dataset *(“Census”)* will be added to the Data Catalog.
As soon as the status is green, you can navigate your Dataset. Click **Open Dataset** as per the image below.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/upload_csv/open_dataset.png" alt="Upload file area" style="width: 75%;">
</div>

Within the **Dataset** details, you can gain valuable insights through our automated data quality profiling.
This includes comprehensive metadata and an overview of your data, encompassing details like row count, identification
of duplicates, and insights into the overall quality of your dataset.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/upload_csv/dataset_overview.png" alt="Upload file area" style="width: 75%;">
</div>
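
To get a feel for what such an overview summarises, the sketch below computes a few of the same basic figures (row count, duplicate rows, empty cells) for a small CSV using only the Python standard library. It is an illustration of the concepts under simple assumptions, not Fabric's profiling implementation; the function `basic_profile` and the sample data are hypothetical.

```python
import csv
import io


def basic_profile(csv_text, separator=","):
    """Compute simple overview figures for CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=separator)
    header = next(reader)
    rows = [tuple(row) for row in reader if row]  # skip blank lines
    duplicate_rows = len(rows) - len(set(rows))
    missing_cells = sum(cell.strip() == "" for row in rows for cell in row)
    return {
        "columns": len(header),
        "rows": len(rows),
        "duplicate_rows": duplicate_rows,
        "missing_cells": missing_cells,
    }


# Made-up sample with one repeated row and one empty cell.
sample = "age,education\n39,Bachelors\n50,HS-grad\n39,Bachelors\n28,\n"
print(basic_profile(sample))
```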

You can also explore the profile of your data further through visualizations, with both univariate
and multivariate analysis of your data.

<div style="display: flex; justify-content: center;align-items: center;">
<img src="/assets/quickstart/upload_csv/dataset_profiling.png" alt="Upload file area" style="width: 75%;">
</div>

**Congrats!** 🚀 You have now successfully created your first **Connector** and **Dataset** in Fabric’s Data Catalog.
Get ready for your journey of improved quality data for AI.
4 changes: 4 additions & 0 deletions docs/stylesheets/extra.css
@@ -40,3 +40,7 @@
   --md-footer-fg-color--light: hsla(0, 0%, 100%, 0.7);
   --md-footer-fg-color--lighter: hsla(0, 0%, 100%, 0.3);
 }
+
+.youtube {
+  color: #EE0F0F;
+}
9 changes: 8 additions & 1 deletion mkdocs.yml
@@ -6,7 +6,14 @@ dev_addr: 0.0.0.0:1235
 site_dir: "static/docs"
 nav:
   - Welcome: 'index.md'
-  - Get started with Fabric: "get-started/fabric_community.md"
+  - Get started with Fabric:
+    - "get-started/index.md"
+    - Quickstart:
+      - How to create your first Dataset from a CSV file: "get-started/upload_csv.md"
+      - How to create your first Synthetic Data generator: "get-started/create_syntheticdata_generator.md"
+      - How to create your first Lab: "get-started/create_lab.md"
+      - How to create your first Pipeline: "get-started/create_pipeline.md"
+    - Fabric Community: "get-started/fabric_community.md"
   - SDK:
     - Overview: "sdk/index.md"
     - Installation: 'sdk/installation.md'