Better Azure blob storage support #1842
Signed-off-by: Thomas Newton <[email protected]>
4129cd9 to 83cfe5b
cc @fiedlerNr9
Codecov Report
Additional details and impacted files
@@             Coverage Diff              @@
##           master    #1842        +/-   ##
===========================================
+ Coverage   18.73%   54.74%    +36.01%
===========================================
  Files         332      293        -39
  Lines       31398    22092      -9306
  Branches     3082     2167       -915
===========================================
+ Hits         5882    12094      +6212
+ Misses      25432     9849     -15583
- Partials       84      149        +65
aea52be to 2d4f54b
kwargs["anon"] = False
return fsspec.filesystem(protocol, **kwargs)  # type: ignore

# Preserve old behavior of returning None for file systems that don't have an explicit anonymous option.
From the type hint on get_filesystem_for_path it looks like we assume get_filesystem never returns None, so I thought it was probably best to delete this. If anyone thinks it's important I'm happy to add it back.
@wild-endeavor I think you wanted to delete this already? So this seems like a good idea
9a110b4 to b689b3c
de3a74b to 5dd00ed
Hmm... it looks like a couple of my new tests are failing on the Windows builds.
Asserting on the path was not really the primary purpose of this test so I will probably make the assertion just on the
Tested on a live cluster and it works as expected, also with workload identities 👍
There are 3 test suites that timed out after 6 hours. It looks like they all failed in the
just some nits
Congrats on merging your first pull request! 🎉
Signed-off-by: Thomas Newton <[email protected]> Signed-off-by: Future Outlier <[email protected]>
TL;DR
Support Azure blob storage for storing metadata and raw data, including structured datasets, without interrupting other standard Azure authentication. Also ensure storage_options are set consistently for all uses of fsspec.
Type
It's debatable. The original GitHub issue was a bug report about fixing Azure authentication, but it could be argued that Azure blob storage support is a new feature.
Are all requirements met?
AzureBlobFilesystem.
Complete description
Added a new component to DataConfig for Azure, similar to the ones for S3 and GCS. This allows configuring flytekit's storage account without disrupting other Azure auth that may be used by Flyte task implementations. This is done by setting the environment variables FLYTE_AZURE_STORAGE_ACCOUNT_NAME, FLYTE_AZURE_STORAGE_ACCOUNT_KEY, FLYTE_AZURE_TENANT_ID, FLYTE_AZURE_CLIENT_ID and FLYTE_AZURE_CLIENT_SECRET. You can set these using a pod template so that authentication to your Flyte storage account works across your whole Flyte deployment, irrespective of what authentication each task uses.
Use DataConfig to construct fsspec storage options in data_persistence.py. We reuse the same function for fetching storage_options elsewhere, including for encode and decode of structured datasets. This ensures local, S3, GCS and Azure work wherever fsspec is used.
You should specify configmap.core.propeller.rawoutput-prefix to be in the format abfs://<container-name>/<path-within-container>.
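As a rough illustration of the idea described above, the sketch below shows how the FLYTE_AZURE_* environment variables could be turned into fsspec storage options. It is not the code from this PR: the helper name azure_storage_options and the mapping table are hypothetical, and the option names on the right-hand side are assumed to be the keyword arguments accepted by adlfs's AzureBlobFileSystem.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variable names are taken from the PR description. The option
# names are an assumption: they are meant to match the keyword arguments of
# adlfs's AzureBlobFileSystem.
_ENV_TO_OPTION = {
    "FLYTE_AZURE_STORAGE_ACCOUNT_NAME": "account_name",
    "FLYTE_AZURE_STORAGE_ACCOUNT_KEY": "account_key",
    "FLYTE_AZURE_TENANT_ID": "tenant_id",
    "FLYTE_AZURE_CLIENT_ID": "client_id",
    "FLYTE_AZURE_CLIENT_SECRET": "client_secret",
}


def azure_storage_options(
    env: Optional[Mapping[str, str]] = None,
    anonymous: bool = False,
) -> Dict[str, object]:
    """Hypothetical helper: build storage options from FLYTE_AZURE_* variables.

    Only variables that are set and non-empty are included, so any other
    Azure authentication configured for a task is left untouched.
    """
    env = os.environ if env is None else env
    options: Dict[str, object] = {
        option: env[name]
        for name, option in _ENV_TO_OPTION.items()
        if env.get(name)
    }
    # Always state explicitly whether anonymous access is requested.
    options["anon"] = anonymous
    return options
```

With adlfs installed, the resulting dictionary could be passed on as fsspec.filesystem("abfs", **azure_storage_options()). Keeping the mapping in a single function is what lets every fsspec call site share the same options.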
Tracking Issue
Related discussion was on flyteorg/flyte#3962, but technically that issue was about workload identity, which was mostly fixed by #1813.
Follow-up issue
NA