Skip to content

Commit

Permalink
Merge branch '2.0' into eurostat_2.0
Browse files Browse the repository at this point in the history
  • Loading branch information
Rafalz13 authored Oct 4, 2024
2 parents bf84d34 + 573f988 commit 8b1cdce
Show file tree
Hide file tree
Showing 26 changed files with 1,420 additions and 101 deletions.
3 changes: 2 additions & 1 deletion .prettierignore
Original file line number Diff line number Diff line change
@@ -1 +1,2 @@
docs/references/sources/sql.md
docs/references/sources/database.md
docs/references/sources/api.md
52 changes: 35 additions & 17 deletions docs/references/orchestration/prefect/flows.md
Original file line number Diff line number Diff line change
@@ -1,24 +1,12 @@
::: viadot.orchestration.prefect.flows.cloud_for_customers_to_adls

::: viadot.orchestration.prefect.flows.cloud_for_customers_to_databricks

::: viadot.orchestration.prefect.flows.exchange_rates_to_adls

::: viadot.orchestration.prefect.flows.exchange_rates_to_databricks

::: viadot.orchestration.prefect.flows.sap_to_redshift_spectrum

::: viadot.orchestration.prefect.flows.sharepoint_to_adls
::: viadot.orchestration.prefect.flows.azure_sql_to_adls

::: viadot.orchestration.prefect.flows.sharepoint_to_databricks

::: viadot.orchestration.prefect.flows.sharepoint_to_redshift_spectrum
::: viadot.orchestration.prefect.flows.bigquery_to_adls

::: viadot.orchestration.prefect.flows.sharepoint_to_s3
::: viadot.orchestration.prefect.flows.cloud_for_customers_to_adls

::: viadot.orchestration.prefect.flows.transform
::: viadot.orchestration.prefect.flows.cloud_for_customers_to_databricks

::: viadot.orchestration.prefect.flows.transform_and_catalog
::: viadot.orchestration.prefect.flows.customer_gauge_to_adls

::: viadot.orchestration.prefect.flows.duckdb_to_parquet

Expand All @@ -28,18 +16,48 @@

::: viadot.orchestration.prefect.flows.epicor_to_parquet

::: viadot.orchestration.prefect.flows.exchange_rates_to_adls

::: viadot.orchestration.prefect.flows.exchange_rates_to_databricks

::: viadot.orchestration.prefect.flows.exchange_rates_api_to_redshift_spectrum

::: viadot.orchestration.prefect.flows.genesys_to_adls

::: viadot.orchestration.prefect.flows.hubspot_to_adls

::: viadot.orchestration.prefect.flows.mediatool_to_adls

::: viadot.orchestration.prefect.flows.mindful_to_adls

::: viadot.orchestration.prefect.flows.outlook_to_adls

::: viadot.orchestration.prefect.flows.salesforce_to_adls

::: viadot.orchestration.prefect.flows.sap_bw_to_adls

::: viadot.orchestration.prefect.flows.sap_to_parquet

::: viadot.orchestration.prefect.flows.sap_to_redshift_spectrum

::: viadot.orchestration.prefect.flows.sftp_to_adls

::: viadot.orchestration.prefect.flows.sharepoint_to_adls

::: viadot.orchestration.prefect.flows.sharepoint_to_databricks

::: viadot.orchestration.prefect.flows.sharepoint_to_redshift_spectrum

::: viadot.orchestration.prefect.flows.sharepoint_to_s3

::: viadot.orchestration.prefect.flows.sql_server_to_minio

::: viadot.orchestration.prefect.flows.sql_server_to_parquet

::: viadot.orchestration.prefect.flows.supermetrics_to_adls

::: viadot.orchestration.prefect.flows.transform

::: viadot.orchestration.prefect.flows.transform_and_catalog

::: viadot.orchestration.prefect.flows.vid_club_to_adls
48 changes: 35 additions & 13 deletions docs/references/orchestration/prefect/tasks.md
Original file line number Diff line number Diff line change
@@ -1,45 +1,67 @@
::: viadot.orchestration.prefect.tasks.azure_sql_to_df

::: viadot.orchestration.prefect.tasks.adls_upload

::: viadot.orchestration.prefect.tasks.df_to_adls
::: viadot.orchestration.prefect.tasks.bcp

::: viadot.orchestration.prefect.tasks.cloud_for_customers_to_df
::: viadot.orchestration.prefect.tasks.clone_repo

::: viadot.orchestration.prefect.tasks.df_to_databricks
::: viadot.orchestration.prefect.tasks.bigquery_to_df

::: viadot.orchestration.prefect.tasks.dbt_task
::: viadot.orchestration.prefect.tasks.cloud_for_customers_to_df

::: viadot.orchestration.prefect.tasks.exchange_rates_to_df
::: viadot.orchestration.prefect.tasks.create_sql_server_table

::: viadot.orchestration.prefect.tasks.clone_repo
::: viadot.orchestration.prefect.tasks.customer_gauge_to_df

::: viadot.orchestration.prefect.tasks.luma_ingest_task
::: viadot.orchestration.prefect.tasks.dbt_task

::: viadot.orchestration.prefect.tasks.df_to_redshift_spectrum
::: viadot.orchestration.prefect.tasks.df_to_adls

::: viadot.orchestration.prefect.tasks.s3_upload_file
::: viadot.orchestration.prefect.tasks.df_to_databricks

::: viadot.orchestration.prefect.tasks.sharepoint_download_file
::: viadot.orchestration.prefect.tasks.df_to_minio

::: viadot.orchestration.prefect.tasks.sharepoint_to_df
::: viadot.orchestration.prefect.tasks.df_to_redshift_spectrum

::: viadot.orchestration.prefect.tasks.duckdb_query

::: viadot.orchestration.prefect.tasks.epicor_to_df

::: viadot.orchestration.prefect.tasks.exchange_rates_to_df

::: viadot.orchestration.prefect.tasks.genesys_to_df

::: viadot.orchestration.prefect.tasks.hubspot_to_df

::: viadot.orchestration.prefect.tasks.luma_ingest_task

::: viadot.orchestration.prefect.tasks.mediatool_to_df

::: viadot.orchestration.prefect.tasks.mindful_to_df

::: viadot.orchestration.prefect.tasks.outlook_to_df

::: viadot.orchestration.prefect.tasks.s3_upload_file

::: viadot.orchestration.prefect.tasks.salesforce_to_df

::: viadot.orchestration.prefect.tasks.sap_bw_to_df

::: viadot.orchestration.prefect.tasks.sap_rfc_to_df

::: viadot.orchestration.prefect.tasks.bcp
::: viadot.orchestration.prefect.tasks.sftp_list

::: viadot.orchestration.prefect.tasks.create_sql_server_table
::: viadot.orchestration.prefect.tasks.sftp_to_df

::: viadot.orchestration.prefect.tasks.sharepoint_download_file

::: viadot.orchestration.prefect.tasks.sharepoint_to_df

::: viadot.orchestration.prefect.tasks.sql_server_query

::: viadot.orchestration.prefect.tasks.sql_server_to_df

::: viadot.orchestration.prefect.tasks.vid_club_to_df

::: viadot.orchestration.prefect.tasks.supermetrics_to_df
28 changes: 19 additions & 9 deletions docs/references/sources/api.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,33 @@
# API Sources
# API sources

::: viadot.sources.uk_carbon_intensity.UKCarbonIntensity
::: viadot.sources.bigquery.BigQuery

::: viadot.sources.cloud_for_customers.CloudForCustomers

::: viadot.sources.exchange_rates.ExchangeRates
::: viadot.sources.customer_gauge.CustomerGauge

::: viadot.sources.cloud_for_customers.CloudForCustomers
::: viadot.sources.epicor.Epicor

::: viadot.sources.sharepoint.Sharepoint
::: viadot.sources.exchange_rates.ExchangeRates

::: viadot.sources.genesys.Genesys

::: viadot.sources.outlook.Outlook

::: viadot.sources.hubspot.Hubspot

::: viadot.sources.epicor.Epicor
::: viadot.sources.mediatool.Mediatool

::: viadot.sources.mindful.Mindful

::: viadot.sources.minio.MinIO
::: viadot.sources._minio.MinIO

::: viadot.sources.outlook.Outlook

::: viadot.sources.salesforce.Salesforce

::: viadot.sources.sharepoint.Sharepoint

::: viadot.sources.supermetrics.Supermetrics

::: viadot.sources.uk_carbon_intensity.UKCarbonIntensity

::: viadot.sources.vid_club.VidClub
Original file line number Diff line number Diff line change
@@ -1,25 +1,29 @@
# SQL Sources
# Database sources

::: viadot.sources.base.Source
::: viadot.sources.azure_data_lake.AzureDataLake

::: viadot.sources.base.SQL
::: viadot.sources.azure_sql.AzureSQL

::: viadot.sources.azure_data_lake.AzureDataLake
::: viadot.sources.databricks.Databricks

::: viadot.sources._duckdb.DuckDB

::: viadot.sources.redshift_spectrum.RedshiftSpectrum

::: viadot.sources.s3.S3

::: viadot.sources.sqlite.SQLite
::: viadot.sources.sap_bw.SAPBW

::: viadot.sources.sql_server.SQLServer
::: viadot.sources.sap_rfc.SAPRFC

::: viadot.sources.databricks.Databricks
::: viadot.sources.sap_rfc.SAPRFCV2

::: viadot.sources._trino.Trino
::: viadot.sources.base.Source

::: viadot.sources._duckdb.DuckDB
::: viadot.sources.base.SQL

::: viadot.sources.sap_rfc.SAPRFC
::: viadot.sources.sqlite.SQLite

::: viadot.sources.sap_rfc.SAPRFCV2
::: viadot.sources.sql_server.SQLServer

::: viadot.sources._trino.Trino
3 changes: 3 additions & 0 deletions docs/references/sources/other.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# Other sources

::: viadot.sources.sftp.Sftp
5 changes: 3 additions & 2 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,9 @@ nav:

- References:
- Sources:
- SQL Sources: references/sources/sql.md
- API Sources: references/sources/api.md
- Database: references/sources/database.md
- API: references/sources/api.md
- Other: references/sources/other.md
- Orchestration:
- Prefect:
- Tasks: references/orchestration/prefect/tasks.md
Expand Down
4 changes: 4 additions & 0 deletions src/viadot/orchestration/prefect/flows/__init__.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
"""Import flows."""

from .azure_sql_to_adls import azure_sql_to_adls
from .bigquery_to_adls import bigquery_to_adls
from .cloud_for_customers_to_adls import cloud_for_customers_to_adls
from .cloud_for_customers_to_databricks import cloud_for_customers_to_databricks
Expand Down Expand Up @@ -31,9 +32,11 @@
from .supermetrics_to_adls import supermetrics_to_adls
from .transform import transform
from .transform_and_catalog import transform_and_catalog
from .vid_club_to_adls import vid_club_to_adls


__all__ = [
"azure_sql_to_adls",
"bigquery_to_adls",
"cloud_for_customers_to_adls",
"cloud_for_customers_to_databricks",
Expand Down Expand Up @@ -65,4 +68,5 @@
"transform",
"transform_and_catalog",
"eurostat_to_adls",
"vid_club_to_adls",
]
77 changes: 77 additions & 0 deletions src/viadot/orchestration/prefect/flows/azure_sql_to_adls.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
"""Flows for downloading data from Azure SQL and uploading it to Azure ADLS."""

from typing import Any

from prefect import flow
from prefect.task_runners import ConcurrentTaskRunner

from viadot.orchestration.prefect.tasks import azure_sql_to_df, df_to_adls


@flow(
name="Azure SQL extraction to ADLS",
description="Extract data from Azure SQL"
+ " and load it into Azure Data Lake Storage.",
retries=1,
retry_delay_seconds=60,
task_runner=ConcurrentTaskRunner,
log_prints=True,
)
def azure_sql_to_adls(
query: str | None = None,
credentials_secret: str | None = None,
validate_df_dict: dict[str, Any] | None = None,
convert_bytes: bool = False,
remove_special_characters: bool | None = None,
columns_to_clean: list[str] | None = None,
adls_config_key: str | None = None,
adls_azure_key_vault_secret: str | None = None,
adls_path: str | None = None,
adls_path_overwrite: bool = False,
) -> None:
r"""Download data from Azure SQL to a CSV file and uploading it to ADLS.
Args:
query (str): Query to perform on a database. Defaults to None.
credentials_secret (str, optional): The name of the Azure Key Vault
secret containing a dictionary with database credentials.
Defaults to None.
validate_df_dict (Dict[str], optional): A dictionary with optional list of
tests to verify the output dataframe. If defined, triggers the `validate_df`
task from task_utils. Defaults to None.
convert_bytes (bool). A boolean value to trigger method df_converts_bytes_to_int
It is used to convert bytes data type into int, as pulling data with bytes
can lead to malformed data in data frame.
Defaults to False.
remove_special_characters (str, optional): Call a function that remove
special characters like escape symbols. Defaults to None.
columns_to_clean (List(str), optional): Select columns to clean, used with
remove_special_characters. If None whole data frame will be processed.
Defaults to None.
adls_config_key (Optional[str], optional): The key in the viadot config holding
relevant credentials. Defaults to None.
adls_azure_key_vault_secret (Optional[str], optional): The name of the Azure Key
Vault secret containing a dictionary with ACCOUNT_NAME and Service Principal
credentials (TENANT_ID, CLIENT_ID, CLIENT_SECRET) for the Azure Data Lake.
Defaults to None.
adls_path (Optional[str], optional): Azure Data Lake destination file path (with
file name). Defaults to None.
adls_path_overwrite (bool, optional): Whether to overwrite the file in ADLS.
Defaults to True.
"""
data_frame = azure_sql_to_df(
query=query,
credentials_secret=credentials_secret,
validate_df_dict=validate_df_dict,
convert_bytes=convert_bytes,
remove_special_characters=remove_special_characters,
columns_to_clean=columns_to_clean,
)

return df_to_adls(
df=data_frame,
path=adls_path,
credentials_secret=adls_azure_key_vault_secret,
config_key=adls_config_key,
overwrite=adls_path_overwrite,
)
Loading

0 comments on commit 8b1cdce

Please sign in to comment.