-
Notifications
You must be signed in to change notification settings - Fork 40
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch '2.0' into eurostat_2.0
- Loading branch information
Showing
26 changed files
with
1,420 additions
and
101 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,2 @@ | ||
docs/references/sources/sql.md | ||
docs/references/sources/database.md | ||
docs/references/sources/api.md |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,45 +1,67 @@ | ||
::: viadot.orchestration.prefect.tasks.azure_sql_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.adls_upload | ||
|
||
::: viadot.orchestration.prefect.tasks.df_to_adls | ||
::: viadot.orchestration.prefect.tasks.bcp | ||
|
||
::: viadot.orchestration.prefect.tasks.cloud_for_customers_to_df | ||
::: viadot.orchestration.prefect.tasks.clone_repo | ||
|
||
::: viadot.orchestration.prefect.tasks.df_to_databricks | ||
::: viadot.orchestration.prefect.tasks.bigquery_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.dbt_task | ||
::: viadot.orchestration.prefect.tasks.cloud_for_customers_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.exchange_rates_to_df | ||
::: viadot.orchestration.prefect.tasks.create_sql_server_table | ||
|
||
::: viadot.orchestration.prefect.tasks.clone_repo | ||
::: viadot.orchestration.prefect.tasks.customer_gauge_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.luma_ingest_task | ||
::: viadot.orchestration.prefect.tasks.dbt_task | ||
|
||
::: viadot.orchestration.prefect.tasks.df_to_redshift_spectrum | ||
::: viadot.orchestration.prefect.tasks.df_to_adls | ||
|
||
::: viadot.orchestration.prefect.tasks.s3_upload_file | ||
::: viadot.orchestration.prefect.tasks.df_to_databricks | ||
|
||
::: viadot.orchestration.prefect.tasks.sharepoint_download_file | ||
::: viadot.orchestration.prefect.tasks.df_to_minio | ||
|
||
::: viadot.orchestration.prefect.tasks.sharepoint_to_df | ||
::: viadot.orchestration.prefect.tasks.df_to_redshift_spectrum | ||
|
||
::: viadot.orchestration.prefect.tasks.duckdb_query | ||
|
||
::: viadot.orchestration.prefect.tasks.epicor_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.exchange_rates_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.genesys_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.hubspot_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.luma_ingest_task | ||
|
||
::: viadot.orchestration.prefect.tasks.mediatool_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.mindful_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.outlook_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.s3_upload_file | ||
|
||
::: viadot.orchestration.prefect.tasks.salesforce_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.sap_bw_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.sap_rfc_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.bcp | ||
::: viadot.orchestration.prefect.tasks.sftp_list | ||
|
||
::: viadot.orchestration.prefect.tasks.create_sql_server_table | ||
::: viadot.orchestration.prefect.tasks.sftp_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.sharepoint_download_file | ||
|
||
::: viadot.orchestration.prefect.tasks.sharepoint_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.sql_server_query | ||
|
||
::: viadot.orchestration.prefect.tasks.sql_server_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.vid_club_to_df | ||
|
||
::: viadot.orchestration.prefect.tasks.supermetrics_to_df |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,23 +1,33 @@ | ||
# API Sources | ||
# API sources | ||
|
||
::: viadot.sources.uk_carbon_intensity.UKCarbonIntensity | ||
::: viadot.sources.bigquery.BigQuery | ||
|
||
::: viadot.sources.cloud_for_customers.CloudForCustomers | ||
|
||
::: viadot.sources.exchange_rates.ExchangeRates | ||
::: viadot.sources.customer_gauge.CustomerGauge | ||
|
||
::: viadot.sources.cloud_for_customers.CloudForCustomers | ||
::: viadot.sources.epicor.Epicor | ||
|
||
::: viadot.sources.sharepoint.Sharepoint | ||
::: viadot.sources.exchange_rates.ExchangeRates | ||
|
||
::: viadot.sources.genesys.Genesys | ||
|
||
::: viadot.sources.outlook.Outlook | ||
|
||
::: viadot.sources.hubspot.Hubspot | ||
|
||
::: viadot.sources.epicor.Epicor | ||
::: viadot.sources.mediatool.Mediatool | ||
|
||
::: viadot.sources.mindful.Mindful | ||
|
||
::: viadot.sources.minio.MinIO | ||
::: viadot.sources._minio.MinIO | ||
|
||
::: viadot.sources.outlook.Outlook | ||
|
||
::: viadot.sources.salesforce.Salesforce | ||
|
||
::: viadot.sources.sharepoint.Sharepoint | ||
|
||
::: viadot.sources.supermetrics.Supermetrics | ||
|
||
::: viadot.sources.uk_carbon_intensity.UKCarbonIntensity | ||
|
||
::: viadot.sources.vid_club.VidClub |
26 changes: 15 additions & 11 deletions
26
docs/references/sources/sql.md → docs/references/sources/database.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,25 +1,29 @@ | ||
# SQL Sources | ||
# Database sources | ||
|
||
::: viadot.sources.base.Source | ||
::: viadot.sources.azure_data_lake.AzureDataLake | ||
|
||
::: viadot.sources.base.SQL | ||
::: viadot.sources.azure_sql.AzureSQL | ||
|
||
::: viadot.sources.azure_data_lake.AzureDataLake | ||
::: viadot.sources.databricks.Databricks | ||
|
||
::: viadot.sources._duckdb.DuckDB | ||
|
||
::: viadot.sources.redshift_spectrum.RedshiftSpectrum | ||
|
||
::: viadot.sources.s3.S3 | ||
|
||
::: viadot.sources.sqlite.SQLite | ||
::: viadot.sources.sap_bw.SAPBW | ||
|
||
::: viadot.sources.sql_server.SQLServer | ||
::: viadot.sources.sap_rfc.SAPRFC | ||
|
||
::: viadot.sources.databricks.Databricks | ||
::: viadot.sources.sap_rfc.SAPRFCV2 | ||
|
||
::: viadot.sources._trino.Trino | ||
::: viadot.sources.base.Source | ||
|
||
::: viadot.sources._duckdb.DuckDB | ||
::: viadot.sources.base.SQL | ||
|
||
::: viadot.sources.sap_rfc.SAPRFC | ||
::: viadot.sources.sqlite.SQLite | ||
|
||
::: viadot.sources.sap_rfc.SAPRFCV2 | ||
::: viadot.sources.sql_server.SQLServer | ||
|
||
::: viadot.sources._trino.Trino |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Other sources | ||
|
||
::: viadot.sources.sftp.Sftp |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
77 changes: 77 additions & 0 deletions
77
src/viadot/orchestration/prefect/flows/azure_sql_to_adls.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
"""Flows for downloading data from Azure SQL and uploading it to Azure ADLS.""" | ||
|
||
from typing import Any | ||
|
||
from prefect import flow | ||
from prefect.task_runners import ConcurrentTaskRunner | ||
|
||
from viadot.orchestration.prefect.tasks import azure_sql_to_df, df_to_adls | ||
|
||
|
||
@flow( | ||
name="Azure SQL extraction to ADLS", | ||
description="Extract data from Azure SQL" | ||
+ " and load it into Azure Data Lake Storage.", | ||
retries=1, | ||
retry_delay_seconds=60, | ||
task_runner=ConcurrentTaskRunner, | ||
log_prints=True, | ||
) | ||
def azure_sql_to_adls( | ||
query: str | None = None, | ||
credentials_secret: str | None = None, | ||
validate_df_dict: dict[str, Any] | None = None, | ||
convert_bytes: bool = False, | ||
remove_special_characters: bool | None = None, | ||
columns_to_clean: list[str] | None = None, | ||
adls_config_key: str | None = None, | ||
adls_azure_key_vault_secret: str | None = None, | ||
adls_path: str | None = None, | ||
adls_path_overwrite: bool = False, | ||
) -> None: | ||
r"""Download data from Azure SQL to a CSV file and uploading it to ADLS. | ||
Args: | ||
query (str): Query to perform on a database. Defaults to None. | ||
credentials_secret (str, optional): The name of the Azure Key Vault | ||
secret containing a dictionary with database credentials. | ||
Defaults to None. | ||
validate_df_dict (Dict[str], optional): A dictionary with optional list of | ||
tests to verify the output dataframe. If defined, triggers the `validate_df` | ||
task from task_utils. Defaults to None. | ||
convert_bytes (bool). A boolean value to trigger method df_converts_bytes_to_int | ||
It is used to convert bytes data type into int, as pulling data with bytes | ||
can lead to malformed data in data frame. | ||
Defaults to False. | ||
remove_special_characters (str, optional): Call a function that remove | ||
special characters like escape symbols. Defaults to None. | ||
columns_to_clean (List(str), optional): Select columns to clean, used with | ||
remove_special_characters. If None whole data frame will be processed. | ||
Defaults to None. | ||
adls_config_key (Optional[str], optional): The key in the viadot config holding | ||
relevant credentials. Defaults to None. | ||
adls_azure_key_vault_secret (Optional[str], optional): The name of the Azure Key | ||
Vault secret containing a dictionary with ACCOUNT_NAME and Service Principal | ||
credentials (TENANT_ID, CLIENT_ID, CLIENT_SECRET) for the Azure Data Lake. | ||
Defaults to None. | ||
adls_path (Optional[str], optional): Azure Data Lake destination file path (with | ||
file name). Defaults to None. | ||
adls_path_overwrite (bool, optional): Whether to overwrite the file in ADLS. | ||
Defaults to True. | ||
""" | ||
data_frame = azure_sql_to_df( | ||
query=query, | ||
credentials_secret=credentials_secret, | ||
validate_df_dict=validate_df_dict, | ||
convert_bytes=convert_bytes, | ||
remove_special_characters=remove_special_characters, | ||
columns_to_clean=columns_to_clean, | ||
) | ||
|
||
return df_to_adls( | ||
df=data_frame, | ||
path=adls_path, | ||
credentials_secret=adls_azure_key_vault_secret, | ||
config_key=adls_config_key, | ||
overwrite=adls_path_overwrite, | ||
) |
Oops, something went wrong.