Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement azure blob storage toolkit #53668

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

subodh1810
Copy link
Contributor

@subodh1810 subodh1810 commented Feb 13, 2025

@subodh1810 subodh1810 self-assigned this Feb 13, 2025
@subodh1810 subodh1810 requested a review from a team as a code owner February 13, 2025 14:34
Copy link

vercel bot commented Feb 13, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Feb 13, 2025 6:02pm

@octavia-squidington-iii octavia-squidington-iii added the CDK Connector Development Kit label Feb 13, 2025
implementation("com.azure:azure-storage-blob:12.29.0")

testImplementation("io.mockk:mockk:1.13.16")
testImplementation("org.junit.jupiter:junit-jupiter-api:5.11.3")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: consider making the version a constant just to make it easy to update. See

as an example. 5.11.4 is the latest.

// If you need to filter by prefix:
// val pager = containerClient.listBlobs(ListBlobsOptions().setPrefix(prefix), null)

for (blobItem in pager) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: more Kotin way would be to do something like: pager.map { it.name }.forEach { name -> ... }

}
} else {
// Sort by the “index” if you want strictly in ascending order:
// but we stored them in arrival order, which may be good enough
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should sort them by ascending index.

The index should correspond to the order the part occurs in the file.

Order of arrival might vary. Specifically, if you're using the current object load paradigm from ObjectStorageStreamLoader that we went over yesterday, order of arrival is not guaranteed to match file order. (This will change in the future, but the client should still guard against mis-ordered calls.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*Assuming that the order here determines how Azure assembles them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh the comment was old, I change the structure to use ConcurrentSkipListMap which sorts the entries on the basis of keys which is the index. So this is sorted. removed the comment

api project(':airbyte-cdk:bulk:toolkits:bulk-cdk-toolkit-load-object-storage')

testImplementation("io.mockk:mockk:1.13.16")
testImplementation("org.junit.jupiter:junit-jupiter-api:5.11.4")
Copy link
Contributor

@jdpgrailsdev jdpgrailsdev Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: consider making the version a constant just to make it easy to update. See

as an example.

Copy link
Contributor

@jdpgrailsdev jdpgrailsdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants