fix(ingest/redshift): add path_spec support for copy command #10596
base: master
@treff7es looks like there are merge conflicts on this one
787fa9d to 4d9b871
Walkthrough
The updates enhance the lineage reporting capabilities of the Redshift ingestion process by implementing better error handling and detailed metrics tracking. Key improvements include refined logic for handling S3 paths and the introduction of counters to monitor path specification matches and mismatches. These changes bolster the overall robustness and clarity of the data handling, ensuring more accurate lineage processing and reporting.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Lineage
participant Report
User->>Lineage: Request lineage data
Lineage->>Lineage: Validate S3 path
alt Path matches
Lineage->>Lineage: Increment match counter
else Path mismatch
Lineage->>Lineage: Increment mismatch counter
end
Lineage->>Report: Update report with metrics
Report->>User: Return lineage report
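The match/mismatch flow in the diagram can be sketched roughly as follows. This is an illustration only, not the code from this PR: the counter names are taken from the review comments, but `LineageReport`, `resolve_s3_path`, the glob-style patterns, and the plain list standing in for `LossyList` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class LineageReport:
    # Hypothetical report object; field names mirror those in the review,
    # and a plain list stands in for DataHub's LossyList.
    num_s3_lineage_path_spec_match: int = 0
    num_s3_lineage_path_spec_mismatch: int = 0
    s3_lineage_path_spec_mismatch: List[str] = field(default_factory=list)


def resolve_s3_path(
    path: str, patterns: List[str], report: LineageReport
) -> Optional[str]:
    """Return the path if any glob pattern matches it, recording the outcome."""
    for pattern in patterns:
        # Assumption: path specs are treated as simple case-sensitive globs.
        if fnmatchcase(path, pattern):
            report.num_s3_lineage_path_spec_match += 1
            return path
    # No pattern matched: count the mismatch and remember the offending path.
    report.num_s3_lineage_path_spec_mismatch += 1
    report.s3_lineage_path_spec_mismatch.append(path)
    return None
```

Keeping the mismatched paths alongside the counter makes it possible to see which COPY sources fell outside the configured patterns, rather than only how many did.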
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- metadata-ingestion/src/datahub/ingestion/source/redshift/lineage.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/redshift/report.py (3 hunks)
Additional comments not posted (5)
metadata-ingestion/src/datahub/ingestion/source/redshift/report.py (2)
40-40: LGTM! The addition of s3_lineage_path_spec_mismatch with LossyList is appropriate for tracking S3 lineage path mismatches.
50-52: LGTM! The introduction of num_lineage_processed_temp_tables, num_s3_lineage_path_spec_mismatch, and num_s3_lineage_path_spec_match enhances the tracking of lineage processing and S3 path specifications.
metadata-ingestion/src/datahub/ingestion/source/redshift/lineage.py (3)
272-279: LGTM! The updates to _get_s3_path improve the tracking of path specification matches and mismatches, enhancing lineage reporting.
381-396: LGTM! The updates to _get_sources improve error handling and ensure proper incrementing of the num_lineage_dropped_not_support_copy_path counter.
572-572: LGTM! The update to _get_target_lineage ensures proper incrementing of the num_lineage_dropped_not_support_copy_path counter.
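As a rough illustration of the dropped-lineage counter mentioned in the comments above, here is a minimal sketch. It assumes the counter tracks COPY sources whose path is not a usable S3 location, which the counter name suggests but this thread does not confirm; `DropReport` and `collect_copy_sources` are hypothetical names, not functions from lineage.py.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass
class DropReport:
    # Counter name taken from the review comments; the class itself is hypothetical.
    num_lineage_dropped_not_support_copy_path: int = 0


def collect_copy_sources(paths: List[str], report: DropReport) -> List[str]:
    """Keep S3 paths as lineage sources and count everything else as dropped."""
    sources: List[str] = []
    for path in paths:
        parsed = urlparse(path)
        # Assumption: only s3://<bucket>/... paths are supported COPY sources.
        if parsed.scheme != "s3" or not parsed.netloc:
            report.num_lineage_dropped_not_support_copy_path += 1
            continue
        sources.append(path)
    return sources
```

Skipping the unsupported path and incrementing a counter, instead of raising, lets the rest of the lineage run continue while still surfacing how many entries were dropped.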
@treff7es CI is failing - seems like a small thing