feat(ingest/mongodb): support AWS DocumentDB for MongoDB #9201

Merged: 4 commits into datahub-project:master on Nov 14, 2023

Conversation

TonyOuyangGit
Contributor

Amazon DocumentDB is a fully managed, native JSON document database with MongoDB compatibility, so it should work with DataHub's MongoDB ingestion. However, according to the DocumentDB doc Supported MongoDB Operations, the $bsonSize operator is not supported by DocumentDB. The ingestion code uses this operator to filter out documents larger than max_document_size, which defaults to 16MB. As far as I can tell from my research, there is no simple way to reproduce what $bsonSize does in an aggregation.

To support ingestion from DocumentDB, this PR introduces a new config, hostingEnvironment, and omits the $bsonSize operation when the ingestion targets DocumentDB. In that case the max_document_size config is ignored, and documents are ingested up to the default maximum of 16MB.
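For reviewers, here is a minimal sketch of the idea, not the actual diff. The names `HostingEnvironment`, `build_size_filter_stages`, and `DEFAULT_MAX_DOCUMENT_SIZE` are hypothetical and chosen for illustration, and the 16MB value is assumed to be 16 * 1024 * 1024 bytes. It only builds the aggregation stages as plain dicts, so it runs without a database connection.

```python
from enum import Enum
from typing import Any, Dict, List


class HostingEnvironment(Enum):
    """Hypothetical enum mirroring the new hostingEnvironment config."""

    SELF_HOSTED = "SELF_HOSTED"
    AWS_DOCUMENTDB = "AWS_DOCUMENTDB"


# Assumption: 16MB expressed in bytes; the real default may differ slightly.
DEFAULT_MAX_DOCUMENT_SIZE = 16 * 1024 * 1024


def build_size_filter_stages(
    hosting_environment: HostingEnvironment,
    max_document_size: int = DEFAULT_MAX_DOCUMENT_SIZE,
) -> List[Dict[str, Any]]:
    """Return the aggregation stages that enforce max_document_size.

    DocumentDB does not support $bsonSize, so no stage is emitted for it
    and max_document_size is effectively ignored.
    """
    if hosting_environment is HostingEnvironment.AWS_DOCUMENTDB:
        return []
    # Keep only documents whose BSON size is below the configured limit.
    return [
        {"$match": {"$expr": {"$lt": [{"$bsonSize": "$$ROOT"}, max_document_size]}}}
    ]


if __name__ == "__main__":
    print(build_size_filter_stages(HostingEnvironment.SELF_HOSTED))
    print(build_size_filter_stages(HostingEnvironment.AWS_DOCUMENTDB))
```

The resulting list could be passed as the leading stages of a pymongo `collection.aggregate(...)` call; for DocumentDB the list is empty, so the rest of the pipeline runs unfiltered.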

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@TonyOuyangGit TonyOuyangGit marked this pull request as draft November 7, 2023 23:09
@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Nov 7, 2023
@TonyOuyangGit TonyOuyangGit marked this pull request as ready for review November 8, 2023 16:53
@maggiehays maggiehays added community-contribution PR or Issue raised by member(s) of DataHub Community product PR or Issue related to the DataHub UI/UX and removed product PR or Issue related to the DataHub UI/UX labels Nov 13, 2023
Collaborator

@hsheth2 hsheth2 left a comment

LGTM

@hsheth2 hsheth2 changed the title feat(mongodb): Improve MongoDB ingestion to support AWS DocumentDB feat(ingest/mongodb): Improve MongoDB ingestion to support AWS DocumentDB Nov 14, 2023
@hsheth2 hsheth2 changed the title feat(ingest/mongodb): Improve MongoDB ingestion to support AWS DocumentDB feat(ingest/mongodb): support AWS DocumentDB for MongoDB Nov 14, 2023
@hsheth2 hsheth2 merged commit cfeecd7 into datahub-project:master Nov 14, 2023
51 checks passed
@TonyOuyangGit TonyOuyangGit deleted the feat-improve-mongodb branch February 14, 2024 19:26
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata
3 participants