Sharepoint - multiple files logic applied to the source class #942
Merged: marcinpurtak merged 17 commits into `2.0-new-repository-structure` from `sharepoint_multiple_files_logic` on Aug 6, 2024
marcinpurtak approved these changes on Jul 11, 2024
trymzet approved these changes on Jul 12, 2024
marcinpurtak approved these changes on Aug 6, 2024
marcinpurtak added a commit that referenced this pull request on Aug 13, 2024
* ⬆️ Relax sql-metadata version requirement (#940)
* ⬆️ Relax sql-metadata version requirement
* 📌 Update lockfiles
* ✨ Added `validate_and_reorder_dfs_columns` to utils
* ♻️ Added new version of Sharepoint source class with additional functions
* ✅ added tests for `validate_and_reorder_dfs_columns` function
* ✅ Created `sharepoint_mock` function and changed function name to `_download_file_stream`
* 📝 Updated docstring for Sharepoint source class and functions
* ⬆️ Relax sql-metadata version requirement (#940)
* ⬆️ Relax sql-metadata version requirement
* 📌 Update lockfiles
* 🚧 Modified `validate_and_reorder_dfs_columns`
* 🐛 Added `na_values` to `_load_and_parse` function
* 🐛 Added tests for Sharepoint functions
* 🐛 Added **kwargs to handle_multiple_files function
* 🚧 Added `dtypes=str` instead of functions
* ✅ Removed tests for not existing functions
* ✅ Added missing tests
* ✅ Added missing tests to sharepoint class methods
---------
Co-authored-by: Michał Zawadzki <[email protected]>
Co-authored-by: Marcin Purtak <[email protected]>
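The commit list mentions a `validate_and_reorder_dfs_columns` utility used before combining files. A minimal sketch of what such a helper could look like, assuming it receives a list of pandas DataFrames, treats the first DataFrame's column order as the reference, and raises `ValueError` when the column sets differ. The signature and error behavior are assumptions for illustration, not viadot's actual implementation.

```python
from __future__ import annotations

import pandas as pd


def validate_and_reorder_dfs_columns(dfs: list[pd.DataFrame]) -> list[pd.DataFrame]:
    """Hypothetical sketch: check that all DataFrames share the same columns
    and reorder each one to match the column order of the first DataFrame."""
    if not dfs:
        raise ValueError("Expected at least one DataFrame.")

    reference_columns = list(dfs[0].columns)
    reference_set = set(reference_columns)

    reordered = []
    for index, df in enumerate(dfs):
        if set(df.columns) != reference_set:
            msg = (
                f"DataFrame at index {index} has columns {sorted(map(str, df.columns))}, "
                f"expected {sorted(map(str, reference_columns))}."
            )
            raise ValueError(msg)
        # Selecting by the reference list returns the columns in that order.
        reordered.append(df[reference_columns])
    return reordered


# Usage: two "files" with the same columns in a different order.
df_a = pd.DataFrame({"id": [1], "name": ["x"]})
df_b = pd.DataFrame({"name": ["y"], "id": [2]})

aligned = validate_and_reorder_dfs_columns([df_a, df_b])
combined = pd.concat(aligned, ignore_index=True)
```

Using the first DataFrame as the reference keeps the sketch simple; the real helper may resolve ordering or mismatches differently.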
trymzet added a commit that referenced this pull request on Aug 22, 2024
Summary
The multiple-files logic was moved from the prefect-viadot task and flow into the Sharepoint source class. The code was refactored, and all options for handling single and multiple files are now implemented in the source class.
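A minimal sketch of how a source class could handle both a single file and a folder of files, assuming a folder URL ends with `/`, only CSV files are parsed, and every file is read with `dtype=str` and optional `na_values` (both mentioned in the commit list). `load_and_parse`, `to_df`, `list_files`, `download`, `SUPPORTED_EXTENSIONS` and the in-memory storage are hypothetical stand-ins; this is not viadot's actual `Sharepoint` API, which talks to SharePoint rather than to injected callables.

```python
from __future__ import annotations

import io
import os
from collections.abc import Callable

import pandas as pd

# Hypothetical: only CSV is parsed here; Excel would need an extra engine.
SUPPORTED_EXTENSIONS = (".csv",)


def load_and_parse(content: bytes, extension: str, na_values: list[str] | None = None) -> pd.DataFrame:
    """Parse one downloaded file, reading every column as a string."""
    if extension not in SUPPORTED_EXTENSIONS:
        msg = f"Unsupported file extension: {extension}"
        raise ValueError(msg)
    return pd.read_csv(io.BytesIO(content), dtype=str, na_values=na_values)


def to_df(
    url: str,
    list_files: Callable[[str], list[str]],
    download: Callable[[str], bytes],
    na_values: list[str] | None = None,
) -> pd.DataFrame:
    """Load a single file, or every supported file in a folder, into one DataFrame."""
    if url.endswith("/"):
        # Assumption: a trailing slash marks a folder URL; skip unsupported files.
        file_urls = [
            f for f in list_files(url)
            if os.path.splitext(f)[1].lower() in SUPPORTED_EXTENSIONS
        ]
    else:
        file_urls = [url]
    if not file_urls:
        msg = f"No supported files found at {url}"
        raise ValueError(msg)

    dfs = [
        load_and_parse(download(f), os.path.splitext(f)[1].lower(), na_values=na_values)
        for f in file_urls
    ]

    # All files must share the same columns; align them to the first file's order.
    columns = list(dfs[0].columns)
    for df in dfs[1:]:
        if set(df.columns) != set(columns):
            msg = "Files in the folder do not share the same columns."
            raise ValueError(msg)
    return pd.concat([df[columns] for df in dfs], ignore_index=True)


# Usage with an in-memory stand-in for SharePoint (no network access needed).
FAKE_STORAGE = {
    "site/folder/a.csv": b"id,name\n1,x\n2,missing\n",
    "site/folder/b.csv": b"name,id\ny,3\n",
    "site/folder/notes.txt": b"not a data file",
}


def fake_list_files(folder_url: str) -> list[str]:
    return sorted(key for key in FAKE_STORAGE if key.startswith(folder_url))


def fake_download(file_url: str) -> bytes:
    return FAKE_STORAGE[file_url]


df = to_df("site/folder/", fake_list_files, fake_download, na_values=["missing"])
```

Injecting `list_files` and `download` keeps the parsing and concatenation logic testable offline; a real source would implement them with its SharePoint client.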
Importance
This keeps all related data-extraction logic on the viadot source side.
Checklist
This PR:
CONTRIBUTING.md
CHANGELOG.md