Sharepoint - multiple files logic applied to the source class #942

Merged

@Rafalz13 Rafalz13 commented Jul 5, 2024

Summary

The file-handling logic was moved from the prefect-viadot task and flow into the Sharepoint source class. The code was refactored, and the source class now covers all the options for handling both single and multiple files.
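
To illustrate the multiple-files case, here is a minimal sketch of how per-file DataFrames might be checked and combined on the source side. It is not the code from this PR: `validate_and_reorder_columns` is a hypothetical stand-in for the `validate_and_reorder_dfs_columns` helper mentioned in the commits, `combine_files` is an invented name, and the real signatures and behaviour may differ.

```python
import pandas as pd


def validate_and_reorder_columns(dfs: list[pd.DataFrame]) -> list[pd.DataFrame]:
    """Check that every frame has the first frame's columns and align their order.

    Hypothetical helper for illustration only; not viadot's implementation.
    """
    if not dfs:
        return []
    reference = list(dfs[0].columns)
    aligned = []
    for position, df in enumerate(dfs):
        if set(df.columns) != set(reference):
            raise ValueError(
                f"DataFrame {position} has columns {list(df.columns)}, "
                f"expected {reference}"
            )
        # Selecting by the reference list reorders columns without changing rows.
        aligned.append(df[reference])
    return aligned


def combine_files(dfs: list[pd.DataFrame]) -> pd.DataFrame:
    """Stack per-file frames into one frame once their columns agree."""
    return pd.concat(validate_and_reorder_columns(dfs), ignore_index=True)


# Two "files" with the same columns in a different order (values kept as strings).
first = pd.DataFrame({"id": ["1", "2"], "name": ["a", "b"]})
second = pd.DataFrame({"name": ["c"], "id": ["3"]})
combined = combine_files([first, second])
```

Raising on mismatched columns, rather than silently filling gaps, is one possible design choice here; it surfaces schema drift between files early.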

Importance

To keep all Sharepoint-related data extraction logic on the viadot source side.

Checklist

This PR:

  • follows the guidelines laid out in CONTRIBUTING.md
  • links relevant issue(s)
  • adds/updates tests (if appropriate)
  • adds/updates docstrings (if appropriate)
  • adds an entry in CHANGELOG.md

@Rafalz13 Rafalz13 self-assigned this Jul 18, 2024
@Rafalz13 Rafalz13 changed the base branch from 2.0 to 2.0-new-repository-structure August 6, 2024 12:48
@marcinpurtak marcinpurtak merged commit e08fc0a into 2.0-new-repository-structure Aug 6, 2024
1 check passed
@marcinpurtak marcinpurtak deleted the sharepoint_multiple_files_logic branch August 6, 2024 13:40
marcinpurtak added a commit that referenced this pull request Aug 13, 2024
* ⬆️ Relax sql-metadata version requirement (#940)

* ⬆️ Relax sql-metadata version requirement

* 📌 Update lockfiles

* ✨ Added `validate_and_reorder_dfs_columns` to utils

* ♻️ Added new version of Sharepoint source class with additional functions

* ✅ added tests for `validate_and_reorder_dfs_columns` function

* ✅ Created `sharepoint_mock` function and changed function name to `_download_file_stream`

* 📝 Updated docstring for Sharepoint source class and functions

* 🚧 Modified `validate_and_reorder_dfs_columns`

* 🐛 Added `na_values` to `_load_and_parse` function

* 🐛 Added tests for Sharepoint functions

* 🐛 Added **kwargs to handle_multiple_files function

* 🚧 Added `dtypes=str` instead of functions

* ✅  Removed tests for not existing functions

* ✅ Added missing tests

* ✅ Added missing tests to sharepoint class methods

---------

Co-authored-by: Michał Zawadzki <[email protected]>
Co-authored-by: Marcin Purtak <[email protected]>
trymzet added a commit that referenced this pull request Aug 22, 2024
* 🐛 Fixed bug in `viadot-lite.Dockerfile`

* 🔖 Upgraded version to `2.0.0-alpha.1`

* 👷 Updated `docker-publish.yml`

* 🚚 Moved `orchiestration` folder into `src/viadot`

* 🚚 Renamed path from `prefect-viadot-test` to `prefect-test`

* 🔖 Bumped version to `2.0.0-alpha.2`

* ♻️ Synchronized `prefect-viadot` with `orchiestration/prefect`

* 🐛 Fixed import in `test_git.py`

* 🧱 Updated `docker-compose.yml`

* 🚚 Moved `prefect_viadot` to `src/viadot/orchestration`

* 🚚 Changes imports in prefect-viadot

* ⬆️ Added prefect-viadot dependencies to viadot

* ⬆️ Upgraded `prefect` dependency

* 🔧 Updated `Dockerfile`

* ⬆️ Upgraded dependencies

* 🔥 Deprecated `datahub.py`

* ➕ Added `viadot-azure` and `viadot-aws` dependencies

* 🧱 Added `viadot-azure.Dockerfile`

* 🐛 Added import error handling to all optional sources

* 🐛 Fixed adls import

* 🧱 Added `viadot-aws.Dockerfile`

* 🐛 Fixed import errors in `prefect-viadot`

* ✅ Added prefect-viadot test and refactored viadot tests

* 🙈 Updated .gitignore file

* ➕ Added new dev dependencies

* 🧱 Removed unneeded packages from `viadot-azure.Dockerfile`

* ➕ Added dependencies to `pyproject.toml`

* ⬆️ Upgraded `viadot-azure` packages

* 🐛 Fixed imports in viadot integration tests

* 🧱 Refactored `viadot-azure.Dockerfile`

* ⬆️ Upgraded aws dependencies in `pyproject.toml`

* ⬆️ Upgraded dependencies

* 🧱 Added viadot-lite image

* ♻️ Refactored viadot-aws image

* 🧱 Updated `docker-compose.yml`

* ➕ Added docs dependencies

* 🎨 Fixed rye formatting

* ➖ Removed duplicated dependencies

* 🐛 Fixed mkdocs config bug

* 🧱 Moved images into one multistage `Dockerfile` (#932)

* 🧱 Created multi-stage build of docker images

* 🔥 Removed old Dockerfiles

* 👷 Updated `docker-publish.yml`

* 🧱 Removed no longer needed `.lock` files

* 🧱 Added `rye` into docker container

* 🧱 Left rye inside Docker image

* 🔖 Bumped version to `2.0.0-alpha.3`

* ⬇️ Downgraded `requests` package

* 🔖 Bumped to `2.0.0-alpha.4` version

* 🧱 Upgraded images in `docker-compose.yml`

* Add documentation for viadot 2.0 with new repository structure (#929)

* 📝 Created new directory structure for references tab

* 📝 Added `Getting Started` section in docs

* 📝 Added `User Guide` section

* 📝 Refactored docs structure

* 📝 Added new user guide

* ✨ Added script to synchronize `.lock` files

* 📝 Added `Managing dependencies` section in docs

* 📝 Fixed typos in docs

* 📝 Improved tutorial about adding source and flows

* 📝 Removed `Managing dependencies` section

* 📝 Added flow and task references

* 📝 Updated link in documentation

* 📝 Updated docs in `user_guide/config_key.md`

Co-authored-by: Michał Zawadzki <[email protected]>

* 📝 Updated docs in `user_guide/adding_source.md`

Co-authored-by: Michał Zawadzki <[email protected]>

* 📝 Updated docs in `user_guide/adding_prefect_flow.md`

Co-authored-by: Michał Zawadzki <[email protected]>

* 📝 Updated docs in `user_guide/adding_prefect_flow.md`

Co-authored-by: Michał Zawadzki <[email protected]>

* 📝 Updated docs in `user_guide/adding_prefect_flow.md`

Co-authored-by: Michał Zawadzki <[email protected]>

* 📝 Added description about `to_df()` in `adding_source.md`

* 📝 Improved docs in `adding_prefect_flow.md`

* 📝 Removed badges from `index.md`

* 📝 Added `Advanced Usage` section

* 📝 Moved docs about Rye into `CONTRIBUTING.md`

* 📝 Moved docker tutorial section from docs to `CONTRIBUTING.md`

* 📝 Updated `CONTRIBUTING.md`

Co-authored-by: Michał Zawadzki <[email protected]>

* 📝 Removed Rye description from `CONTRIBUTING.md`

---------

Co-authored-by: Michał Zawadzki <[email protected]>

* ✨ Added new param to `sharepoint_to_redshift_spectrum`

* ✨ Added new param to `sharepoint.py`

* ✨ Added `basename_template` to MinIO source

* ✨ Added `SQLServer` source and tasks for it

* ✨ Added handling for `DatabaseCredentials` and `Secret` in get_credentials

* ✨ Added `df_to_minio` task for prefect

* Added `sql_server_to_minio` flow for prefect

* ✅ Added tests sql_server_to_minio

* 📝 Updated changelog with `sql_server_to_minio` and related functions

* 🐛 Added missing package to Dockerfile

* ⬆️ Upgraded `prefect` version to `2.19.7`

* 🔖 Bumped viadot version to `2.0.0-alpha.5`

* ✅ Added tests

* 🎨 Updated credentials options

* 🔧 Updated docker setup

* 🎨 Updated data type

* 🎨 Added contextlib for MinIO

* 📝 Updated `requirements.lock` files

* 📝 Updates SQL Server docs

* 🎨 Added whitespaces

* ⬇️ Downgraded dependencies

* 🔖 Bumped viadot to version `2.0.0-alpha.6`

* 📝 updated CHANGELOG.md

* ✨ updated Outlook connector version 1.

* ✨ updated Outlook connector version 2.

* 📝 updated docstrings.

* ✅ added outlook test file.

* 👔 updated some files to align with the rebase.

* 📝 updated CHANGELOG.md

* ✨ added Hubspot connector version 1.

* ✅ added hubspot test file.

* 📝 updated docstrings.

* ✅ updated local lock file.

* 🔊 updated logger in source.

* 👔 updated some files to align with the rebase.

* 👔 updated some more files to align with the rebase.

* 📝 updated CHANGELOG.

* ✨ added Mindful to __init__ files.

* ✨ created new Mindful connector.

* 🎨 updated mindful flow and task connector.

* ✅ added mindful test file.

* 📝 updated mindful docstrings.

* ⚡️ added sep parameter in adls task.

* 🔊 updated logs.

* 📝 updated docstrings.

* 🔊 updated logger in source.

* 👔 updated some files to align with the rebase.

* 📝 update CHANGELOG.md and __init__ files.

* ✨ added Genesys file structure version 1.

* 📝 updated rebased files.

* ✨ added Genesys file structure version 2.

* ✨ added Genesys file structure version 3.

* 📝 adding some extra log information.

* ✨ added Genesys file structure version 4.

* ✅ added genesys test files.

* ✅ updated genesys test file.

* 🔊 updated logger in source.

* 👔 updated some files to align with the rebase.

* 📝 updated docstring.

* 🎨 implemented flake8 and pylint tests.

* 💄 added prints to source level.

* 📝 updated variable names.

* Duckdb connectors (#945)

* 🚚 Changed tasks utils location

* ✨ Created DuckDB connectors

* ✨ Created BCP task

* 🎨 Formatted code with black

* ✅ Added tests

* 📝 Updated changelog with duckdb connectors

* 🔥 Removed irrelevant docstring

* 🔥 Removed irrelevant code

* 🎨 Cleaned up the code

* 🎨 Cleaned up the code

* 📝 Updated docstring

* ✅ Updated DuckDB test

* 🔥 removed else statement

* ⏪ Reverted change from previous commit

---------

Co-authored-by: angelika233 <[email protected]>

* Delete .python_history

* ✅ updated test file.

* 🎨 updated code performance.

* ✅ updated test file.

* c4c code checker passed and tests coverage passed

* 🎨 updated code performance.

* ✅ updated test file.

* 🎨 updated code performance.

* ✅ updated test file.

* flows_tasks_for c4c

* ✅ updated test file to reach 80% coverage.

* ✏️  corrected a typo.

* ✅ updated test file to reach 80% coverage.

* ✅ updated test file.

* ✏️ fixed a typo.

* ✏️  fixed another typo.

* ✨ Added sap_to_parquet flow (#947)

* ✨ Added sap_to_parquet flow and tests

* ⚡️Change  parameters names

* 🎨 Changed credentials

* 🎨 Change creds

* 📝 Updated changelog

* 🎨 Formatted code with black

* 📝 Improved docstring

* 📝 Update docstring

* ✅ Updated test

* ✏️ Fixed typo in sql server source

* 📝 Added info about typo to changelog

* ✅ updated test file to reach 80% coverage.

* ✅ updated test file.

* ✅ updated test file.

* ✅ updated test file to reach 80% coverage.

* 🦺 added `return` in flow file.

* 🦺 added `return` in flow file.

* 🦺 added `return` in flow file.

* 🦺 added `return` in flow file.

* ✅ added test integration file.

* ✅ added test integration file.

* ✅ added test integration file.

* 📝 updated credential typo.

* ✅ added test integration file.

* ➕ Added `duckdb` to dependencies

* ➕ Added `prefect-aws` dependency

* 🚀 Release 2.0.0-beta.1

* cloud for customer improvement

* recover gitignore

* removing useless files

* docker initial

* rollback gitignore

* update ignore

* rollback gitignore

* remove useless file

* Sharepoint orchestration code refactor (#950)

* ✨ Moved sharepoint tasks from prefect_viadot repo

* ✨ Moved sharepoint_to_redshift_spectrum flow from prefect_viadot repo

* 🔥 Cleaned up init for prefect tasks

* Added `viadot.orchestration.prefect`

* Sharepoint  - multiple files logic applied to the source class (#942)

* ⬆️ Relax sql-metadata version requirement (#940)

* ⬆️ Relax sql-metadata version requirement

* 📌 Update lockfiles

* ✨ Added `validate_and_reorder_dfs_columns` to utils

* ♻️ Added new version of Sharepoint source class with additional functions

* ✅ added tests for `validate_and_reorder_dfs_columns` function

* ✅ Created `sharepoint_mock` function and changed function name to `_download_file_stream`

* 📝 Updated docstring for Sharepoint source class and functions

* 🚧 Modified `validate_and_reorder_dfs_columns`

* 🐛 Added `na_values` to `_load_and_parse` function

* 🐛 Added tests for Sharepoint functions

* 🐛 Added **kwargs to handle_multiple_files function

* 🚧 Added `dtypes=str` instead of functions

* ✅  Removed tests for not existing functions

* ✅ Added missing tests

* ✅ Added missing tests to sharepoint class methods

---------

Co-authored-by: Michał Zawadzki <[email protected]>
Co-authored-by: Marcin Purtak <[email protected]>

* ✨ Added O365 (#969)

* ✨ Added O365

* 🚧 Moved `O365` to dependencies

* Orchestration last changes (#953)

* 🚚 Move manual actions to a subfolder

* 🐛 Fix the incorrect test dir structure & duplication

* 👷 Add CD workflow

* 📌 Update MSSQL driver version & pin mssql-tools

* 📝 Update container instructions

* 👷 Add linter rules

* 🐛 Fix typo

* ♻️ Do not install Databricks by default

* ♻️ Use standard file name

* ♻️ Readd commented out test

* ♻️ Refactor some weird stuff

* ♻️ Fix typo

* ♻️ Remove duplicated docstrings

* ♻️ Remove clutter

* 🎨 Linting

* 🚨 Docs & linting

* 📝 Improve the user guide

* 📝 Further docs improvements

* 📝 Further docs improvements

* 📝 Docs some more

* 📝 Docs - final touches

* 📝 Improve example

* 🚚 Move to correct path

* 🔥 Remove duplicate tests

* ♻️ Update tests

* 📝 Minor improvement

* 🚨 Lint utils

* 🔒️ Remove the insecure `credentials` param

* 🚨 More linting

* 📝 Add SAP RFC installation instructions

* 🚨 More linting

* ⬆️ Bump pyarrow
Fixes #970

* 📌 Update lockfiles

* ✅ Fix all unit tests

* 🔥 Remove dead code

* 🚨 Lint tests

* ✅ Skip broken tests

* 🚨 Lint all remaining tests

* 🚨 Fix remaining linter warnings

* 📌 Update lock files

* ♻️ Minor fixes

* 🧑‍💻 Also publish `latest` tags for all images
For use eg. in the docker-compose file.

* 🐛 Fix typo

* ✨ Add GitHub release step

* 📝 Document the new release process

* 📌 Bump version

* ♻️ Add last changes from other branches

* ♻️ Update some sources' test configuration to match rest of lib

* 📝 Add more docs on contributing

* 📝 Update a link

* 🐛 Update lock files, removing optional deps

* ⬆️ Update dependencies

* 🚨 Linting

* 🐛 Add TOML support to coverage

* ✅ Fix `_cast_df()` test failing on datetimes in pandas 2.0

* ⬆️ Run CI on Python 3.12

* ➖ Remove unused `pytest-cov`

* ⬆️ Upgrade Python version so Rye CI action uses 3.12

* ⬆️ Upgrade Python to 3.12 in the images

* 📝 Improve container env docs

* ⬇️ Rollback `pyarrow` to v10.x

Also roll back Python to 3.10 as this `pyarrow` version is not compatible with Python 3.12.

* ♻️ Use a `skip_test_on_missing_extra()` utils to simplify life

* 🧑‍💻 Install dev dependencies in local containers

* 🐛 Fix for broken `numpy` version

* 🚧 RedshiftSpectrum source unit tests - WIP

---------

Co-authored-by: Diego-H-S <[email protected]>
Co-authored-by: Michał Zawadzki <[email protected]>
Co-authored-by: angelika233 <[email protected]>
Co-authored-by: Angelika Tarnawa <[email protected]>
Co-authored-by: fdelgadodyvenia <[email protected]>
Co-authored-by: Natalia Walczak <[email protected]>
Co-authored-by: Diego <[email protected]>
Co-authored-by: Fabio Delgado <[email protected]>
Co-authored-by: Rafał Ziemianek <[email protected]>
Co-authored-by: Marcin Purtak <[email protected]>