
fix(ingestion/prefect-plugin): fixed the unit tests #10643

Merged

Conversation

dushayntAW
Contributor

@dushayntAW dushayntAW commented Jun 4, 2024

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added, a usage guide has been added for it.
  • For any breaking change, potential downtime, deprecation, or big change, an entry has been made in Updating DataHub

Summary by CodeRabbit

  • New Features

    • Introduced Prefect integration in DataHub, allowing users to manage Prefect flow and task metadata, flow run details, and lineage information.
    • Added a Prefect flow example for an ETL process, including data extraction, transformation, and interaction with Snowflake datasets.
  • Documentation

    • Added new documentation on Prefect integration, including instructions on metadata event emission, block registration, and troubleshooting.
  • Chores

    • Updated build and test workflows to include Prefect plugin configurations and artifact handling.
    • Adjusted documentation sidebar entries to include the new Prefect documentation link.


coderabbitai bot commented Jul 1, 2024

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The updates introduce a GitHub Actions workflow for the Prefect plugin, enhance artifact handling, and integrate Prefect with DataHub. Concurrently, Gradle and TypeScript changes support the new Prefect capabilities by including Prefect in the metadata ingestion modules, ensuring proper build processes, and updating documentation to feature the Prefect integration and corresponding examples.

Changes

Files/Paths Change Summary
.github/workflows/build-and-test.yml Added exclusion for metadata-ingestion-modules:prefect-plugin tasks and updated artifact upload path.
.github/workflows/prefect-plugin.yml Introduced new workflow for Prefect plugin, including testing, artifact uploads, and coverage reporting.
.github/workflows/test-results.yml Included "Prefect Plugin" in workflows triggering "Test Results".
.../docs-website/build.gradle Added dependency on prefect-plugin:buildWheel task in yarnGenerate task.
.../docs-website/generateDocsDir.ts Updated copy_python_wheels function to handle Prefect-plugin wheels.
.../docs-website/sidebars.js Added doc entry for Prefect and adjusted other documentation links.
docs/lineage/prefect.md Integrated Prefect with DataHub for flow and task metadata management and lineage information.
.../prefect-plugin/src/.../flow.py Introduced example Prefect flow for an ETL process demonstrating integration with DataHub.

Sequence Diagrams

sequenceDiagram
    participant Developer
    participant GitHub Actions
    participant DataHub
    Developer ->> GitHub Actions: Pushes to master/PR/release
    GitHub Actions ->> GitHub Actions: Trigger Prefect Plugin workflow
    GitHub Actions ->> Python: Set up environment and dependencies
    GitHub Actions ->> Gradle: Build metadata ingestion modules
    GitHub Actions ->> Python: Run Prefect flow
    Python ->> DataHub: Emit metadata events
    GitHub Actions -->> Developer: Upload artifacts and report coverage

Poem

🎀
A rabbit hops through code so bright,
Prefect flows now take their flight.
DataHub sings, as tasks emit,
Metadata winds the kit.
In workflows new and tasks so neat,
The project’s heartbeat finds its beat.
🐰✨


Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
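For instance, the status message mentioned earlier in this comment could be suppressed with a minimal configuration file. This is only a sketch based on the `reviews.review_status` key referenced above; consult the CodeRabbit configuration documentation for the full schema.

```
# .coderabbit.yaml
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
reviews:
  review_status: false  # hide the "Review skipped" status message
```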

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 36

Outside diff range and nitpick comments (13)
.github/workflows/test-results.yml (1)

Line range hint 19-19: Fix potential issues with read and quoting.

The read command should use the -r option to prevent backslash mangling. Additionally, the variables should be quoted to prevent globbing and word splitting.

- read name url <<< "$artifact"
+ IFS=$'\t' read -r name url <<< "$artifact"
- gh api $url > "$name.zip"
+ gh api "$url" > "$name.zip"
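The safe pattern can be sketched in a self-contained snippet. The `artifact` value below is a hypothetical tab-separated name/URL pair, not data from this workflow:

```shell
#!/usr/bin/env bash
# Hypothetical artifact entry: a name and a URL separated by a tab.
artifact=$'coverage-report\thttps://api.github.com/repos/example/artifacts/1'

# -r stops `read` from treating backslashes as escapes (SC2162); setting IFS
# to a tab for this one command splits only on the tab; quoting "$artifact"
# prevents word splitting.
IFS=$'\t' read -r name url <<< "$artifact"

# Always quote expansions when they feed another command (SC2086):
echo "fetching $name from $url"
```

Note that the `IFS=$'\t'` prefix scopes the field separator to the `read` call only, so the rest of the script keeps the default splitting behavior.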
metadata-ingestion/developing.md (11)

Line range hint 67-67: Rephrase the `pip install` reference.

The phrase "run the pip install" reads awkwardly; referencing the command directly is clearer.

- If you've already run the pip install, but running `datahub` in your command line doesn't work, then there is likely an issue with your PATH setup and Python.
+ If you've already run `pip install`, but running `datahub` in your command line doesn't work, then there is likely an issue with your PATH setup and Python.

Line range hint 112-112: Avoid wordiness.

Consider replacing "a variety of" with a more concise phrase.

- The sources pull metadata from a variety of data systems, while the sinks are primarily for moving this metadata into DataHub.
+ The sources pull metadata from various data systems, while the sinks are primarily for moving this metadata into DataHub.

Line range hint 116-116: Avoid redundancy: "CLI interface".

The phrase "CLI interface" is redundant. Use "CLI".

- The CLI interface is defined in [entrypoints.py](./src/datahub/entrypoints.py) and in the [cli](./src/datahub/cli) directory.
+ The CLI is defined in [entrypoints.py](./src/datahub/entrypoints.py) and in the [cli](./src/datahub/cli) directory.

Line range hint 117-117: Use a hyphen for compound adjectives.

Use a hyphen for "high level".

- The high level interfaces are defined in the [API directory](./src/datahub/ingestion/api).
+ The high-level interfaces are defined in the [API directory](./src/datahub/ingestion/api).

Line range hint 147-147: Avoid repetition: "probably".

The word "probably" is repeated in nearby sentences. Consider replacing it for variety.

- You probably should not be defining a `__hash__` method yourself. Using `@dataclass(frozen=True)` is a good way to get a hashable class.
+ You should not be defining a `__hash__` method yourself. Using `@dataclass(frozen=True)` is a good way to get a hashable class.
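The `@dataclass(frozen=True)` suggestion can be illustrated with a small stdlib-only sketch. `DatasetRef` is a hypothetical class name for illustration, not one from this PR:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRef:  # hypothetical example class, not from this PR
    platform: str
    name: str


# frozen=True makes instances immutable and auto-generates __eq__ and
# __hash__ from the fields, so no hand-written __hash__ is needed.
a = DatasetRef("snowflake", "mydb.schema.table")
b = DatasetRef("snowflake", "mydb.schema.table")

assert a == b
assert hash(a) == hash(b)
assert len({a, b}) == 1  # equal values collapse to one set entry
```

Because equality and hashing are derived from the same fields, the two stay consistent automatically, which is the usual bug a hand-rolled `__hash__` introduces.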

Line range hint 154-154: Add a comma after "package".

A comma is needed after "package".

- are not required by the "core" package but instead can be optionally installed using Python "extras".
+ are not required by the "core" package, but instead can be optionally installed using Python "extras".

Line range hint 162-162: Avoid wordiness.

Consider a shorter alternative to avoid wordiness.

- In order to ensure that the configs are consistent and easy to use, we have a few guidelines:
+ To ensure that the configs are consistent and easy to use, we have a few guidelines:

Line range hint 168-168: Add a comma after "For example".

A comma is needed after "For example".

- For example `client_id` or `tenant_id` over a bare `id` and `access_secret` over a bare `secret`.
+ For example, `client_id` or `tenant_id` over a bare `id` and `access_secret` over a bare `secret`.

Line range hint 38-38: Add blank lines around heading.

Headings should be surrounded by blank lines.

- ### (Optional) Set up your Python environment for developing on Prefect Plugin
+ 
+ ### (Optional) Set up your Python environment for developing on Prefect Plugin
+ 

Line range hint 37-37: Add blank lines around fenced code blocks.

Fenced code blocks should be surrounded by blank lines.

- ```shell
+ 
+ ```shell

Line range hint 109-109: Add alt text for images.

Images should have alternate text.

- <img width="70%"  src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/datahub-metadata-ingestion-framework.png"/>
+ <img width="70%"  src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/datahub-metadata-ingestion-framework.png" alt="DataHub Metadata Ingestion Framework"/>
docs-website/generateDocsDir.ts (1)

Line range hint 352-367: Remove unnecessary else clause.

The else clause can be omitted because previous branches break early.

- } else {
-   // Find first h1 header and use it as the title.
-   const headers = contents.content.match(/^# (.+)$/gm);
- 
-   if (!headers) {
-     throw new Error(
-       `${filepath} must have at least one h1 header for setting the title`
-     );
-   }
- 
-   if (headers.length > 1 && contents.content.indexOf("```") < 0) {
-     throw new Error(`too many h1 headers in ${filepath}`);
-   }
-   title = headers[0].slice(2).trim();
- }
+ // Find first h1 header and use it as the title.
+ const headers = contents.content.match(/^# (.+)$/gm);
+ 
+ if (!headers) {
+   throw new Error(
+     `${filepath} must have at least one h1 header for setting the title`
+   );
+ }
+ 
+ if (headers.length > 1 && contents.content.indexOf("```") < 0) {
+   throw new Error(`too many h1 headers in ${filepath}`);
+ }
+ title = headers[0].slice(2).trim();
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 8f40229 and b796f59.

Files selected for processing (24)
  • .github/workflows/build-and-test.yml (2 hunks)
  • .github/workflows/prefect-plugin.yml (1 hunks)
  • .github/workflows/test-results.yml (1 hunks)
  • docs-website/build.gradle (1 hunks)
  • docs-website/generateDocsDir.ts (1 hunks)
  • docs-website/sidebars.js (2 hunks)
  • docs/lineage/prefect.md (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/.gitignore (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/README.md (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/build.gradle (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/pyproject.toml (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/scripts/release.sh (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/setup.cfg (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/setup.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/init.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/datahub_emitter.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/entities.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/example/flow.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/example/save_block.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/tests/integration/integration_test_dummy.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/tests/unit/test_block_standards.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/tests/unit/test_datahub_emitter.py (1 hunks)
  • metadata-ingestion/developing.md (2 hunks)
  • settings.gradle (1 hunks)
Files not reviewed due to errors (1)
  • docs-website/sidebars.js (no review received)
Files skipped from review due to trivial changes (5)
  • metadata-ingestion-modules/prefect-plugin/pyproject.toml
  • metadata-ingestion-modules/prefect-plugin/setup.cfg
  • metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/init.py
  • metadata-ingestion-modules/prefect-plugin/tests/integration/integration_test_dummy.py
  • settings.gradle
Additional context used
GitHub Check: actionlint
.github/workflows/test-results.yml

[failure] 19-19:
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2162:info:5:91: read without -r will mangle backslashes [shellcheck]

Raw Output:
.github/workflows/test-results.yml:19:9: shellcheck reported issue in this script: SC2162:info:5:91: read without -r will mangle backslashes [shellcheck]


[failure] 19-19:
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2162:info:7:13: read without -r will mangle backslashes [shellcheck]

Raw Output:
.github/workflows/test-results.yml:19:9: shellcheck reported issue in this script: SC2162:info:7:13: read without -r will mangle backslashes [shellcheck]


[failure] 19-19:
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:8:10: Double quote to prevent globbing and word splitting [shellcheck]

Raw Output:
.github/workflows/test-results.yml:19:9: shellcheck reported issue in this script: SC2086:info:8:10: Double quote to prevent globbing and word splitting [shellcheck]

LanguageTool
docs/lineage/prefect.md

[misspelling] ~15-~15: This word is normally spelled with a hyphen.
Context: ...ther Prefect Cloud (recommended) or the self hosted Prefect server. 2. Refer [Cloud Quickst...

(EN_COMPOUNDS_SELF_HOSTED)


[misspelling] ~17-~17: This word is normally spelled with a hyphen.
Context: ...refect.io/latest/guides/host/) to setup self hosted Prefect server. 4. Make sure the Prefec...

(EN_COMPOUNDS_SELF_HOSTED)


[uncategorized] ~40-~40: A comma is probably missing here.
Context: .../concepts/blocks/#saving-blocks). While saving you can provide below configurations. D...

(MISSING_COMMA_AFTER_INTRODUCTORY_PHRASE)


[uncategorized] ~45-~45: Possible missing comma found.
Context: ...belong to. For more detail and possible values refer [here](https://datahubproject.io/...

(AI_HYDRA_LEO_MISSING_COMMA)


[typographical] ~46-~46: Consider adding a comma here.
Context: ...y this recipe belong to. For more detail please refer [here](https://datahubproject.io/...

(PLEASE_COMMA)


[uncategorized] ~100-~100: A comma may be missing after the conjunctive/linking adverb ‘Otherwise’.
Context: ...sks, user compulsory need to emit flow. Otherwise nothing will get emit. ## Concept mapp...

(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)

metadata-ingestion-modules/prefect-plugin/README.md

[style] ~25-~25: Consider a shorter alternative to avoid wordiness.
Context: ... Getting Started ### Setup DataHub UI In order to use 'prefect-datahub' collection, you'l...

(IN_ORDER_TO_PREMIUM)


[uncategorized] ~30-~30: Possible missing preposition found.
Context: ...cessful deployment of DataHub will lead creation of DataHub GMS service running on 'http...

(AI_HYDRA_LEO_MISSING_TO)


[uncategorized] ~36-~36: A comma is probably missing here.
Context: .../concepts/blocks/#saving-blocks). While saving you can provide below configurations. D...

(MISSING_COMMA_AFTER_INTRODUCTORY_PHRASE)


[uncategorized] ~36-~36: You might be missing the article “the” here.
Context: ...g-blocks). While saving you can provide below configurations. Default value will get ...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~36-~36: A determiner appears to be missing. Consider inserting it.
Context: ...g you can provide below configurations. Default value will get set if not provided whil...

(AI_EN_LECTOR_MISSING_DETERMINER)


[uncategorized] ~41-~41: Possible missing comma found.
Context: ...belong to. For more detail and possible values refer [here](https://datahubproject.io/...

(AI_HYDRA_LEO_MISSING_COMMA)


[typographical] ~42-~42: Consider adding a comma here.
Context: ...y this recipe belong to. For more detail please refer [here](https://datahubproject.io/...

(PLEASE_COMMA)


[uncategorized] ~81-~81: This verb may not be in the correct form. Consider using a different form for this context.
Context: ...lows to help you emit metadata event as show below! ```python import asyncio from ...

(AI_EN_LECTOR_REPLACEMENT_VERB_FORM)


[uncategorized] ~124-~124: A comma may be missing after the conjunctive/linking adverb ‘Otherwise’.
Context: ...sks, user compulsory need to emit flow. Otherwise nothing will get emit. ## Resources F...

(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)


[style] ~149-~149: The phrase ‘feel free to’ is used quite frequently. Consider using a less frequent alternative to set your writing apart from others and make it sound more professional.
Context: ... community](https://prefect.io/slack). Feel free to star or watch [datahub](https://githu...

(FEEL_FREE_TO_STYLE_ME)


[style] ~150-~150: Using many exclamation marks might seem excessive (in this case: 8 exclamation marks for a text that’s 3263 characters long)
Context: ...datahub-project/datahub) for updates too! ### Contributing If you'd like to hel...

(EN_EXCESSIVE_EXCLAMATION)


[style] ~154-~154: Consider using a different verb for a more formal wording.
Context: ...ng If you'd like to help contribute to fix an issue or add a feature to `prefect-d...

(FIX_RESOLVE)

metadata-ingestion/developing.md

[grammar] ~67-~67: Possibly, a comma is missing after an introductory clause, the verb inflection is not correct, or a question mark is missing. Did you mean “pip, install” or “pip installs”?
Context: ...ll If you've already run the pip install, but running datahub in your command ...

(IF_DT_NN_VBZ)


[style] ~112-~112: The phrase “a variety of” may be wordy. To make your writing clearer, consider replacing it.
Context: ...objects. The sources pull metadata from a variety of data systems, while the sinks are prima...

(A_VARIETY_OF)


[style] ~116-~116: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ...ta into DataHub. ## Code layout - The CLI interface is defined in [entrypoints.py](./src/da...

(ACRONYM_TAUTOLOGY)


[uncategorized] ~117-~117: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...li](./src/datahub/cli) directory. - The high level interfaces are defined in the [API dire...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[style] ~147-~147: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...er method over a @staticmethod. - You probably should not be defining a __hash__ met...

(REP_PROBABLY)


[uncategorized] ~154-~154: Possible missing comma found.
Context: ...s are not required by the "core" package but instead can be optionally installed usi...

(AI_HYDRA_LEO_MISSING_COMMA)


[style] ~162-~162: Consider a shorter alternative to avoid wordiness.
Context: ...l.io/) to define the ingestion configs. In order to ensure that the configs are consistent ...

(IN_ORDER_TO_PREMIUM)


[typographical] ~168-~168: After the expression ‘for example’ a comma is usually used.
Context: ...ternative isn’t descriptive enough. For example client_id or tenant_id over a bare ...

(COMMA_FOR_EXAMPLE)

Markdownlint
docs/lineage/prefect.md

58-58: Expected: 0 or 2; Actual: 1
Trailing spaces

(MD009, no-trailing-spaces)


114-114: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)


124-124: Expected: 1; Actual: 0; Above
Headings should be surrounded by blank lines

(MD022, blanks-around-headings)


134-134: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

(MD022, blanks-around-headings)


19-19: null
Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


21-21: null
Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


123-123: null
Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


18-18: null
Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


22-22: null
Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


44-44: null
Bare URL used

(MD034, no-bare-urls)


120-120: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


66-66: Expected: fenced; Actual: indented
Code block style

(MD046, code-block-style)

metadata-ingestion-modules/prefect-plugin/README.md

26-26: Expected: 0 or 2; Actual: 1
Trailing spaces

(MD009, no-trailing-spaces)


63-63: Expected: 0 or 2; Actual: 1
Trailing spaces

(MD009, no-trailing-spaces)


21-21: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)


34-34: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)


30-30: null
Bare URL used

(MD034, no-bare-urls)


40-40: null
Bare URL used

(MD034, no-bare-urls)


7-7: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


9-9: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


11-11: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


14-14: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


71-71: Expected: fenced; Actual: indented
Code block style

(MD046, code-block-style)

metadata-ingestion/developing.md

165-165: Expected: h3; Actual: h4
Heading levels should only increment by one level at a time

(MD001, heading-increment)


38-38: Expected: 1; Actual: 0; Above
Headings should be surrounded by blank lines

(MD022, blanks-around-headings)


37-37: null
Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


109-109: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


256-256: null
Files should end with a single newline character

(MD047, single-trailing-newline)

Biome
docs-website/generateDocsDir.ts

[error] 352-367: This else clause can be omitted because previous branches break early.

Unsafe fix: Omit the else clause.

(lint/style/noUselessElse)


[error] 356-367: This else clause can be omitted because previous branches break early.

(lint/style/noUselessElse)


[error] 365-367: This else clause can be omitted because previous branches break early.

(lint/style/noUselessElse)

Ruff
metadata-ingestion-modules/prefect-plugin/tests/unit/test_datahub_emitter.py

550-550: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
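The Ruff SIM118 fix is behavior-preserving: `in` on a dict already tests key membership, so `.keys()` is redundant. A quick stdlib check (the dictionary contents are illustrative only):

```python
d = {"flow_id": "abc", "task_id": "xyz"}

# Both forms test key membership; the shorter one avoids building a view.
assert ("flow_id" in d) == ("flow_id" in d.keys())
assert ("run_id" in d) == ("run_id" in d.keys())

# The shorter form also reads better in loops and comprehensions:
present = [k for k in ("flow_id", "run_id") if k in d]
assert present == ["flow_id"]
```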

Additional comments not posted (40)
metadata-ingestion-modules/prefect-plugin/scripts/release.sh (1)

4-8: Ensure proper handling of build and install commands.

The script conditionally runs gradlew build and gradlew install. Ensure that these commands are correctly handling failures and that the conditions are mutually exclusive.

Verification successful

Ensure proper handling of build and install commands.

The script conditionally runs gradlew build and gradlew install. The gradlew file is accessible at ../../jailuser/git/gradlew. The conditions for RELEASE_SKIP_TEST and RELEASE_SKIP_INSTALL are mutually exclusive, and the script uses set -euxo pipefail for robust error handling.

  • Update the script to use the correct path: ../../jailuser/git/gradlew.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the build and install commands are correctly handling failures and that the conditions are mutually exclusive.

# Test: Check the exit status of the commands.
../../gradlew build || { echo "Build failed"; exit 1; }
../../gradlew install || { echo "Install failed"; exit 1; }

Length of output: 154



Script:

#!/bin/bash
# Locate the `gradlew` file to ensure it exists in the expected directory.
fd gradlew

# Check the conditions and error handling in the release.sh script.
cat metadata-ingestion-modules/prefect-plugin/scripts/release.sh

Length of output: 1225



Script:

#!/bin/bash
# Verify the correct relative path to `gradlew` from the release.sh script location.
fd gradlew --exec realpath --relative-to=metadata-ingestion-modules/prefect-plugin/scripts/

# Check the mutual exclusivity of the conditions in the release.sh script.
ast-grep --lang bash --pattern $'if [[ ! ${RELEASE_SKIP_TEST:-} ]]; then\n\t$_\nelif [[ ! ${RELEASE_SKIP_INSTALL:-} ]]; then\n\t$_\nfi' metadata-ingestion-modules/prefect-plugin/scripts/release.sh

Length of output: 634



Script:

#!/bin/bash
# Locate the `gradlew` file and determine its relative path from the script's location.
find ../../ -name gradlew -exec realpath --relative-to=metadata-ingestion-modules/prefect-plugin/scripts/ {} +

# Verify the mutual exclusivity of conditions in the release.sh script.
grep -E 'if \[\[ ! \${RELEASE_SKIP_TEST:-} \]\]; then|elif \[\[ ! \${RELEASE_SKIP_INSTALL:-} \]\]; then' metadata-ingestion-modules/prefect-plugin/scripts/release.sh

Length of output: 411



Script:

#!/bin/bash
# Locate the `gradlew` file and display the paths.
find ../../ -name gradlew

Length of output: 235



Script:

#!/bin/bash
# Verify if the `gradlew` file is accessible at the intended path.
if [ -f ../../jailuser/git/gradlew ]; then
  echo "gradlew is accessible at ../../jailuser/git/gradlew"
else
  echo "gradlew is not accessible at ../../jailuser/git/gradlew"
fi

Length of output: 152

metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/entities.py (3)

9-13: LGTM!

The _Entity class is well-defined and follows best practices for abstract base classes.


16-30: LGTM!

The Dataset class is well-implemented and makes good use of attr.s for attribute management.


33-46: LGTM!

The Urn class is well-implemented and makes good use of attr.s for attribute management and custom validation.

metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/example/flow.py (2)

17-20: LGTM!

The extract function is simple and well-implemented.


49-55: LGTM!

The etl function is well-implemented and follows best practices for flow functions.

metadata-ingestion-modules/prefect-plugin/.gitignore (1)

1-143: LGTM!

The .gitignore file is well-structured and includes common patterns for Python projects.

.github/workflows/prefect-plugin.yml (1)

1-86: LGTM!

The workflow file is well-structured and follows best practices for GitHub Actions workflows.

metadata-ingestion-modules/prefect-plugin/setup.py (7)

1-2: Optimize imports.

Combine the os and pathlib imports for better readability.

import os
import pathlib

16-24: Clarify and document dependency groups.

Add comments to explain the purpose of each dependency group.

# Common dependencies for all environments.
rest_common = {"requests", "requests_file"}

# Base dependencies required for the package.
base_requirements = {
    # For python 3.7 and importlib-metadata>=5.0.0, build failed with attribute error
    "importlib-metadata>=4.4.0,<5.0.0; python_version < '3.8'",
    # Actual dependencies.
    "prefect >= 2.0.0",
    *rest_common,
    f"acryl-datahub == {package_metadata['__version__']}",
}

46-69: Update dev_requirements for clarity.

Group related dependencies and add comments for better readability.

# Development dependencies.
dev_requirements = {
    *base_requirements,
    *mypy_stubs,
    # Code formatting and linting tools.
    "black==22.12.0",
    "coverage>=5.1",
    "flake8>=3.8.3",
    "flake8-tidy-imports>=4.3.0",
    "isort>=5.7.0",
    "mypy>=1.4.0",
    # Pydantic version compatibility with mypy.
    "pydantic>=1.10",
    # Testing tools.
    "pytest>=6.2.2",
    "pytest-asyncio>=0.16.0",
    "pytest-cov>=2.8.1",
    "tox",
    # Miscellaneous tools.
    "deepdiff",
    "requests-mock",
    "freezegun",
    "jsonpickle",
    "build",
    "twine",
    "packaging",
}

76-121: Add keywords metadata for better discoverability.

Include relevant keywords in the setuptools.setup call to improve package discoverability.

setuptools.setup(
    # Package metadata.
    name=package_metadata["__package_name__"],
    version=package_metadata["__version__"],
    url="https://datahubproject.io/",
    project_urls={
        "Documentation": "https://datahubproject.io/docs/",
        "Source": "https://github.com/datahub-project/datahub",
        "Changelog": "https://github.com/datahub-project/datahub/releases",
    },
    license="Apache License 2.0",
    description="Datahub prefect block to capture executions and send to Datahub",
    long_description=get_long_description(),
    long_description_content_type="text/markdown",
    classifiers=[
        "Development Status :: 5 - Production/Stable",
        "Programming Language :: Python",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3 :: Only",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Intended Audience :: Developers",
        "Intended Audience :: Information Technology",
        "Intended Audience :: System Administrators",
        "License :: OSI Approved",
        "License :: OSI Approved :: Apache Software License",
        "Operating System :: Unix",
        "Operating System :: POSIX :: Linux",
        "Environment :: Console",
        "Environment :: MacOS X",
        "Topic :: Software Development",
    ],
    # Package info.
    zip_safe=False,
    python_requires=">=3.7",
    package_dir={"": "src"},
    packages=setuptools.find_namespace_packages(where="./src"),
    entry_points=entry_points,
    # Dependencies.
    install_requires=list(base_requirements),
    extras_require={
        "dev": list(dev_requirements),
    },
    keywords="datahub prefect metadata integration",
)

71-73: Check entry points for correctness.

Ensure the entry point definition correctly maps to the DatahubEmitter class.

Verification successful

Entry point definition is correct.

The DatahubEmitter class is correctly defined in the prefect_datahub module as specified in the entry point.

  • File: metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/datahub_emitter.py
  • Class Definition: class DatahubEmitter(Block)
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the entry point definition correctly maps to the `DatahubEmitter` class.

# Test: Search for `DatahubEmitter` class definition. Expect: Definition in the specified module.
rg --type python --pattern "class DatahubEmitter" ./src/prefect_datahub/prefect_datahub.py

Length of output: 504



Script:

#!/bin/bash
# Description: Verify that the entry point definition correctly maps to the `DatahubEmitter` class.

# Test: Search for `DatahubEmitter` class definition. Expect: Definition in the specified module.
rg "class DatahubEmitter" ./src/prefect_datahub/prefect_datahub.py

Length of output: 149



Script:

#!/bin/bash
# Description: Verify that the entry point definition correctly maps to the `DatahubEmitter` class.

# Test: Search for `DatahubEmitter` class definition across the entire repository.
rg "class DatahubEmitter"

Length of output: 278


28-44: Verify dependency versions for mypy_stubs.

Ensure that the specified versions for mypy_stubs are compatible with the rest of the dependencies.

Verification successful

Verify dependency versions for mypy_stubs.

The specified versions for mypy_stubs appear to be compatible with the rest of the dependencies listed in base_requirements and framework_common. No conflicts were found based on the available version constraints.

  • base_requirements and framework_common contain dependencies with specific versions that do not conflict with the mypy_stubs versions.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the specified versions for `mypy_stubs` are compatible with the rest of the dependencies.

# Test: Check for dependency conflicts. Expect: No conflicts.
pip install types-dataclasses sqlalchemy-stubs types-pkg_resources types-six types-python-dateutil types-requests types-toml types-PyYAML types-freezegun types-cachetools types-click==0.1.12 types-tabulate types-pytz

Length of output: 289



Script:

#!/bin/bash
# Description: List all dependencies and their versions in the setup.py file to manually verify compatibility.

# Locate the setup.py file in the metadata-ingestion-modules/prefect-plugin directory
fd setup.py -a -t f -x cat {} | grep -E "install_requires|mypy_stubs"

Length of output: 404



Script:

#!/bin/bash
# Description: Search for the definitions of base_requirements and framework_common to list all dependencies and their versions.

# Locate the setup.py file in the metadata-ingestion-modules/prefect-plugin directory and search for base_requirements and framework_common definitions
fd setup.py -a -t f -x cat '{}' | grep -A 10 -E 'base_requirements|framework_common'

Length of output: 5934


6-8: Ensure __init__.py contains necessary metadata.

Verify that ./src/prefect_datahub/__init__.py contains the __package_name__ and __version__ keys used later in the setup.

Verification successful

Ensure __init__.py contains necessary metadata.

The metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/__init__.py file contains the __package_name__ and __version__ keys, confirming that the necessary metadata is present.

  • __package_name__ = "prefect-datahub"
  • __version__ = "1!0.0.0.dev0"
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that `__package_name__` and `__version__` are defined in `__init__.py`.

# Test: Search for the definitions. Expect: Definitions of `__package_name__` and `__version__`.
rg --type python --pattern "__package_name__|__version__" ./src/prefect_datahub/__init__.py

Length of output: 505



Script:

#!/bin/bash
# Description: Verify that `__package_name__` and `__version__` are defined in `__init__.py`.

# Test: Search for the definitions. Expect: Definitions of `__package_name__` and `__version__`.
rg '__package_name__|__version__' ./src/prefect_datahub/__init__.py

Length of output: 143



Script:

#!/bin/bash
# Description: Locate `__init__.py` and verify that `__package_name__` and `__version__` are defined.

# Step 1: Locate `__init__.py` in the repository.
fd --type f --name '__init__.py'

# Step 2: Once located, check for the definitions of `__package_name__` and `__version__`.
fd --type f --name '__init__.py' --exec rg '__package_name__|__version__' {}

Length of output: 613



Script:

#!/bin/bash
# Description: Locate `__init__.py` and verify that `__package_name__` and `__version__` are defined.

# Step 1: Locate `__init__.py` in the repository.
fd '__init__.py'

# Step 2: Once located, check for the definitions of `__package_name__` and `__version__'.
fd '__init__.py' --exec rg '__package_name__|__version__' {}

Length of output: 10052

docs-website/build.gradle (1)

89-89: LGTM! Ensure the build task is necessary.

The addition of the prefect-plugin:buildWheel task looks good.

Verify that this task is necessary for the documentation build process.

Verification successful

The buildWheel task for the prefect-plugin is necessary.

The buildWheel task is consistently referenced across multiple Gradle files, indicating its necessity in the build process.

  • metadata-ingestion-modules/prefect-plugin/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the `prefect-plugin:buildWheel` task is necessary for the documentation build process.

# Test: Search for references to `buildWheel` tasks. Expect: References in relevant build scripts.
rg --type gradle --pattern "buildWheel"

Length of output: 451



Script:

#!/bin/bash
# Description: Verify that the `prefect-plugin:buildWheel` task is necessary for the documentation build process.

# Test: Search for references to `buildWheel` tasks. Expect: References in relevant build scripts.
rg "buildWheel" --type gradle

Length of output: 813

.github/workflows/build-and-test.yml (1)

89-90: Ensure exclusions are necessary.

The exclusion of the prefect-plugin:build and prefect-plugin:check tasks looks good.

Verify that these exclusions are necessary for the build process.

Verification successful

Exclusions are necessary.

The exclusions for prefect-plugin:build and prefect-plugin:check tasks are correctly specified and necessary for the build process.

  • The metadata-ingestion-modules/prefect-plugin/build.gradle file includes references to both build and check tasks, confirming their presence in the build process.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the exclusions for `prefect-plugin:build` and `prefect-plugin:check` tasks are necessary for the build process.

# Test: Search for references to `build` and `check` tasks. Expect: References in relevant build scripts.
rg --type gradle --pattern "build|check"

Length of output: 454



Script:

#!/bin/bash
# Description: Verify that the exclusions for `prefect-plugin:build` and `prefect-plugin:check` tasks are necessary for the build process.

# Test: Search for references to `build` and `check` tasks. Expect: References in relevant build scripts.
rg "build|check" --type gradle

Length of output: 19935

metadata-ingestion-modules/prefect-plugin/build.gradle (13)

1-3: LGTM! Ensure the base plugin is necessary.

The inclusion of the base plugin looks good.

Verify that the base plugin is necessary for the build process.

Verification successful

The base plugin is necessary for the build process.

The base plugin is referenced in multiple build.gradle files across different modules, indicating its necessity for the build configuration.

  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle
  • entity-registry/custom-test-model/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the `base` plugin is necessary for the build process.

# Test: Search for references to the `base` plugin. Expect: References in relevant build scripts.
rg --type gradle --pattern "id 'base'"

Length of output: 458



Script:

#!/bin/bash
# Description: Verify that the `base` plugin is necessary for the build process.

# Test: Search for references to the `base` plugin. Expect: References in relevant build scripts.
rg "id 'base'" --type gradle

Length of output: 348


30-41: Ensure package installation task is necessary.

The task to install the package looks good.

Verify that this task is necessary for the build process.

Verification successful

The installPackage task is necessary for the build process.

The installPackage task is referenced in multiple build scripts, confirming its necessity.

  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle
  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the package installation task is necessary for the build process.

# Test: Search for references to the package installation task. Expect: References in relevant build scripts.
rg --type gradle --pattern "installPackage"

Length of output: 455



Script:

#!/bin/bash
# Description: Verify that the package installation task is necessary for the build process.

# Test: Search for references to the package installation task. Expect: References in relevant build scripts.
rg "installPackage" --type gradle

Length of output: 1360


14-15: LGTM! Ensure the pip_install_command definition is correct.

The definition for the pip_install_command looks good.

Verify that this definition is correct for the environment.

Verification successful

LGTM! Ensure the pip_install_command definition is correct.

The definition for the pip_install_command looks good. It is referenced multiple times in the metadata-ingestion-modules/prefect-plugin/build.gradle file, confirming its usage.

  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the definition for the `pip_install_command` is correct for the environment.

# Test: Search for references to `pip_install_command`. Expect: References in relevant build scripts.
rg --type gradle --pattern "pip_install_command"

Length of output: 460



Script:

#!/bin/bash
# Description: Verify that the definition for the `pip_install_command` is correct for the environment.

# Test: Search for references to `pip_install_command`. Expect: References in relevant build scripts.
rg "pip_install_command" --type gradle

Length of output: 1467


10-12: LGTM! Ensure the extra_pip_requirements property is necessary.

The definition for the extra_pip_requirements property looks good.

Verify that this property is necessary for the build process.

Verification successful

The extra_pip_requirements property is necessary for the build process.

The property is referenced across multiple build scripts and is used in pip installation commands, indicating its importance in managing pip dependencies.

  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle
  • datahub-web-react/src/app/ingest/source/builder/NameSourceStep.tsx
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the `extra_pip_requirements` property is necessary for the build process.

# Test: Search for references to `extra_pip_requirements`. Expect: References in relevant build scripts.
rg --type gradle --pattern "extra_pip_requirements"

Length of output: 463



Script:

#!/bin/bash
# Description: Verify that the `extra_pip_requirements` property is necessary for the build process.

# Test: Search for references to `extra_pip_requirements` across all files.
rg "extra_pip_requirements"

Length of output: 2217


20-27: Ensure environment setup task is necessary.

The task to set up the environment looks good.

Verify that this task is necessary for the build process.

Verification successful

Environment setup task is necessary.

The environmentSetup task is a prerequisite for multiple tasks across various modules, confirming its necessity for the build process.

  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
  • metadata-ingestion-modules/prefect-plugin/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the environment setup task is necessary for the build process.

# Test: Search for references to the environment setup task. Expect: References in relevant build scripts.
rg --type gradle --pattern "environmentSetup"

Length of output: 457



Script:

#!/bin/bash
# Description: Verify that the environment setup task is necessary for the build process.

# Test: Search for references to the environment setup task. Expect: References in relevant build scripts.
rg "environmentSetup" --type gradle

Length of output: 1702


46-54: Ensure development installation task is necessary.

The task to install development dependencies looks good.

Verify that this task is necessary for the build process.

Verification successful

The installDev task is necessary for the build process.

The installDev task is referenced in multiple build.gradle files and is a dependency for various other tasks across the project.

  • smoke-test/build.gradle
  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle

These references confirm its necessity within the build process.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the development installation task is necessary for the build process.

# Test: Search for references to the development installation task. Expect: References in relevant build scripts.
rg --type gradle --pattern "installDev"

Length of output: 451



Script:

#!/bin/bash
# Description: Verify that the development installation task is necessary for the build process.

# Test: Search for references to the development installation task. Expect: References in relevant build scripts.
rg "installDev" --type gradle

Length of output: 4020


5-8: LGTM! Ensure the Python executable and virtual environment name are correct.

The definitions for python_executable and venv_name look good.

Verify that these definitions are correct for the environment.

Verification successful

Verification successful for python_executable and venv_name.

The definitions for python_executable and venv_name are consistent and correctly used across the relevant build scripts.

  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • smoke-test/build.gradle
  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the definitions for `python_executable` and `venv_name` are correct for the environment.

# Test: Search for references to `python_executable` and `venv_name`. Expect: References in relevant build scripts.
rg --type gradle --pattern "python_executable|venv_name"

Length of output: 470



Script:

#!/bin/bash
# Description: Verify that the definitions for `python_executable` and `venv_name` are correct for the environment.

# Test: Search for references to `python_executable` and `venv_name`. Expect: References in relevant build scripts.
rg 'python_executable|venv_name' --type gradle

Length of output: 12093


16-18: Ensure Python version check is necessary.

The task to check the Python version looks good.

Verify that this task is necessary for the build process.

Verification successful

Python version check task is necessary for the build process.

The checkPythonVersion task is referenced as a dependency for the environmentSetup task in multiple build.gradle files, indicating its necessity for the build process.

  • metadata-ingestion-modules/dagster-plugin/build.gradle
  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the Python version check task is necessary for the build process.

# Test: Search for references to the Python version check task. Expect: References in relevant build scripts.
rg --type gradle --pattern "checkPythonVersion"

Length of output: 459



Script:

#!/bin/bash
# Description: Verify that the Python version check task is necessary for the build process.

# Test: Search for references to the Python version check task. Expect: References in relevant build scripts.
rg "checkPythonVersion" --type gradle

Length of output: 851


73-80: Ensure development test installation task is necessary.

The task to install development test dependencies looks good.

Verify that this task is necessary for the build process.

Verification successful

Verified: The installDevTest task is necessary for the build process.

The task is referenced by multiple other tasks, confirming its necessity.

  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the development test installation task is necessary for the build process.

# Test: Search for references to the development test installation task. Expect: References in relevant build scripts.
rg --type gradle --pattern "installDevTest"

Length of output: 455



Script:

#!/bin/bash
# Description: Verify that the development test installation task is necessary for the build process.

# Test: Search for references to the development test installation task in Gradle files.
rg "installDevTest" --type gradle

Length of output: 1697


56-63: Ensure linting task is necessary.

The task to lint the code looks good.

Verify that this task is necessary for the build process.

Verification successful

The linting task is necessary for the build process.

The lint task is referenced as a dependency of the check task in the metadata-ingestion-modules/prefect-plugin/build.gradle file, confirming its necessity.

  • metadata-ingestion-modules/prefect-plugin/build.gradle: check.dependsOn lint
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the linting task is necessary for the build process.

# Test: Search for references to the linting task. Expect: References in relevant build scripts.
rg --type gradle --pattern "lint"

Length of output: 445



Script:

#!/bin/bash
# Description: Verify that the linting task is necessary for the build process.

# Test: Search for references to the linting task. Expect: References in relevant build scripts.
rg "lint" --type gradle

Length of output: 1714


96-103: Ensure quick test task is necessary.

The task to run quick tests looks good.

Verify that this task is necessary for the build process.

Verification successful

The testQuick task is necessary for the build process.

The testQuick task is referenced in multiple build scripts, indicating its importance in the build process. Specifically, it is a dependency for the check task and the testFull task.

  • Files and references:
    • metadata-ingestion-modules/dagster-plugin/build.gradle
      • Line: task testQuick(type: Exec, dependsOn: installDevTest)
      • Line: task testFull(type: Exec, dependsOn: [testQuick, installDevTest])
      • Line: check.dependsOn testQuick
    • metadata-ingestion-modules/prefect-plugin/build.gradle
      • Line: task testQuick(type: Exec, dependsOn: installDevTest)
      • Line: task testFull(type: Exec, dependsOn: [testQuick, installDevTest])
      • Line: check.dependsOn testQuick
    • metadata-ingestion/build.gradle
      • Line: task testQuick(type: Exec, dependsOn: [installDev, ':metadata-models:generateJsonSchema'])
      • Line: check.dependsOn testQuick
    • metadata-ingestion-modules/airflow-plugin/build.gradle
      • Line: task testQuick(type: Exec, dependsOn: installTest)
      • Line: check.dependsOn testQuick
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the quick test task is necessary for the build process.

# Test: Search for references to the quick test task. Expect: References in relevant build scripts.
rg --type gradle --pattern "testQuick"

Length of output: 450



Script:

#!/bin/bash
# Description: Verify that the quick test task is necessary for the build process.

# Test: Search for references to the quick test task. Expect: References in relevant build scripts.
rg "testQuick" --type gradle

Length of output: 1030


82-94: Ensure single test task is necessary.

The task to run a single test looks good.

Verify that this task is necessary for the build process.

Verification successful

Ensure single test task is necessary.

The task testSingle is indeed referenced in the documentation (metadata-ingestion/developing.md), indicating that it is used for running specific test files. This confirms its necessity for the build process.

  • metadata-ingestion/developing.md: References to testSingle for running specific test files.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the single test task is necessary for the build process.

# Test: Search for references to the single test task. Expect: References in relevant build scripts.
rg --type gradle --pattern "testSingle"

Length of output: 451



Script:

#!/bin/bash
# Description: Verify that the single test task is necessary for the build process.

# Test: Search for references to the single test task. Expect: References in relevant build scripts.
rg "testSingle" --type gradle

Length of output: 312



Script:

#!/bin/bash
# Description: Verify if the single test task is invoked or referenced elsewhere in the codebase.

# Test: Search for invocations or references to the single test task.
rg "testSingle"

Length of output: 705


64-71: Ensure lint fixing task is necessary.

The task to fix linting issues looks good.

Verify that this task is necessary for the build process.

Verification successful

Lint fixing task is necessary

The lintFix task is referenced in multiple build scripts and is part of the dependency chain for running tests (installTest and testQuick). This confirms its necessity in the build process.

  • metadata-ingestion-modules/airflow-plugin/build.gradle: installTest should run after lintFix.
  • metadata-ingestion-modules/airflow-plugin/build.gradle: testQuick depends on installTest.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the lint fixing task is necessary for the build process.

# Test: Search for references to the lint fixing task. Expect: References in relevant build scripts.
rg --type gradle --pattern "lintFix"

Length of output: 448



Script:

#!/bin/bash
# Description: Verify that the lint fixing task is necessary for the build process.

# Test: Search for references to the lint fixing task. Expect: References in relevant build scripts.
rg "lintFix" --type gradle

Length of output: 513



Script:

#!/bin/bash
# Description: Verify the role of the `installTest` task in the build process.

# Test: Search for references to the `installTest` task. Expect: References in relevant build scripts.
rg "installTest" --type gradle

Length of output: 426

metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/datahub_emitter.py (10)

104-114: LGTM!

The method correctly converts a list of _Entity objects to a list of DatasetUrn objects.


116-145: LGTM!

The method fetches the workspace name and handles errors gracefully.


174-189: LGTM!

The method correctly emits the browse path.


190-254: LGTM!

The method correctly generates a DataJob entity and handles various scenarios.


256-324: LGTM!

The method correctly generates a DataFlow entity and handles various scenarios.


326-394: LGTM!

The method correctly emits task metadata and handles upstream dependencies.


395-458: LGTM!

The method correctly emits flow run metadata and handles various scenarios.


459-552: LGTM!

The method correctly emits task run metadata and handles various scenarios.


554-612: LGTM!

The method correctly stores the current running task metadata.


613-659: LGTM!

The method correctly emits the current running flow metadata.

f"(prefect,{flow_run_ctx.flow.name},PROD),{task_run_ctx.task.task_key})"
)

assert expected_datajob_urn in datahub_emitter._datajobs_to_emit.keys()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimize dictionary key check.

Use key in dict instead of key in dict.keys().

-  assert expected_datajob_urn in datahub_emitter._datajobs_to_emit.keys()
+  assert expected_datajob_urn in datahub_emitter._datajobs_to_emit
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-  assert expected_datajob_urn in datahub_emitter._datajobs_to_emit.keys()
+  assert expected_datajob_urn in datahub_emitter._datajobs_to_emit
Tools
Ruff

550-550: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
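For illustration (not part of the suggested change itself), the two membership tests are interchangeable; `key in dict` simply avoids constructing a keys view. A minimal self-contained check with a plain dict:

```python
# The dict-keys view and the dict itself agree on membership;
# `key in d` is the idiomatic (SIM118-compliant) spelling.
d = {"urn:li:dataJob:(abc)": object()}

assert ("urn:li:dataJob:(abc)" in d) == ("urn:li:dataJob:(abc)" in d.keys())
assert ("missing" in d) == ("missing" in d.keys())
```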

Comment on lines 1 to 3
import asyncio

from prefect_datahub.datahub_emitter import DatahubEmitter

Consider making the configuration values configurable.

Hardcoding values like datahub_rest_url, env, platform_instance, and token can reduce flexibility. Consider accepting these as parameters or using environment variables.

import os

datahub_rest_url = os.getenv("DATAHUB_REST_URL", "http://localhost:8080")
env = os.getenv("ENV", "DEV")
platform_instance = os.getenv("PLATFORM_INSTANCE", "local_prefect")
token = os.getenv("TOKEN", None)  # generate auth token in the datahub and provide here if gms endpoint is secure

datahub_emitter = DatahubEmitter(
    datahub_rest_url=datahub_rest_url,
    env=env,
    platform_instance=platform_instance,
    token=token,
)
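A runnable sketch of the fallback behaviour this pattern gives you; the `gms.internal` hostname is invented for the demo:

```python
import os

# Ensure the variable is unset so the default kicks in.
os.environ.pop("DATAHUB_REST_URL", None)
url = os.getenv("DATAHUB_REST_URL", "http://localhost:8080")
print(url)  # default value

# Once the environment provides a value, it wins over the default.
os.environ["DATAHUB_REST_URL"] = "http://gms.internal:8080"
url = os.getenv("DATAHUB_REST_URL", "http://localhost:8080")
print(url)  # environment value
```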

Comment on lines 6 to 14
async def save_datahub_emitter():
    datahub_emitter = DatahubEmitter(
        datahub_rest_url="http://localhost:8080",
        env="DEV",
        platform_instance="local_prefect",
        token=None,  # generate auth token in the datahub and provide here if gms endpoint is secure
    )

    await datahub_emitter.save("BLOCK-ID", overwrite=True)

Add error handling for the asynchronous operation.

The function save_datahub_emitter lacks error handling for the await datahub_emitter.save call. Consider adding a try-except block to handle potential exceptions.

async def save_datahub_emitter():
    try:
        datahub_emitter = DatahubEmitter(
            datahub_rest_url="http://localhost:8080",
            env="DEV",
            platform_instance="local_prefect",
            token=None,  # generate auth token in the datahub and provide here if gms endpoint is secure
        )
        await datahub_emitter.save("BLOCK-ID", overwrite=True)
    except Exception as e:
        print(f"Failed to save DatahubEmitter: {e}")
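To see the shape of the suggested handling in isolation, here is a self-contained sketch with a stand-in `save` coroutine (the real `DatahubEmitter.save` talks to a live GMS endpoint, so it is mocked here):

```python
import asyncio

# Stand-in for DatahubEmitter.save; raising simulates an unreachable endpoint.
async def save(block_id: str, fail: bool = False) -> str:
    if fail:
        raise ConnectionError("GMS endpoint unreachable")
    return f"saved {block_id}"

async def save_with_handling(block_id: str, fail: bool = False) -> str:
    # The try-except keeps a transient failure from crashing the caller.
    try:
        return await save(block_id, fail=fail)
    except Exception as e:
        return f"Failed to save DatahubEmitter: {e}"

print(asyncio.run(save_with_handling("BLOCK-ID")))
print(asyncio.run(save_with_handling("BLOCK-ID", fail=True)))
```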

Comment on lines 10 to 14
async def load_datahub_emitter():
    datahub_emitter = DatahubEmitter()
    emitter = datahub_emitter.load("BLOCK-ID")
    print(emitter)
    return emitter

Consider making the block ID a parameter.

The function is well-implemented, but making the block ID a parameter would improve flexibility.

-async def load_datahub_emitter():
-    emitter = datahub_emitter.load("BLOCK-ID")
+async def load_datahub_emitter(block_id: str):
+    emitter = datahub_emitter.load(block_id)

Committable suggestion was skipped due to low confidence.

Comment on lines +23 to +46
@task(name="Transform", description="Transform the data")
def transform(
    data: str, datahub_emitter: DatahubEmitter
) -> Tuple[List[str], DatahubEmitter]:
    data_list_str = data.split(" ")
    datahub_emitter.add_task(
        inputs=[
            Dataset(
                platform="snowflake",
                name="mydb.schema.tableA",
                env=datahub_emitter.env,
                platform_instance=datahub_emitter.platform_instance,
            )
        ],
        outputs=[
            Dataset(
                platform="snowflake",
                name="mydb.schema.tableB",
                env=datahub_emitter.env,
                platform_instance=datahub_emitter.platform_instance,
            )
        ],
    )
    return data_list_str, datahub_emitter

Consider making the task details parameters.

The function is well-implemented, but making the task details parameters would improve flexibility.

-    datahub_emitter.add_task(
-        inputs=[
-            Dataset(
-                platform="snowflake",
-                name="mydb.schema.tableA",
-                env=datahub_emitter.env,
-                platform_instance=datahub_emitter.platform_instance,
-            )
-        ],
-        outputs=[
-            Dataset(
-                platform="snowflake",
-                name="mydb.schema.tableB",
-                env=datahub_emitter.env,
-                platform_instance=datahub_emitter.platform_instance,
-            )
-        ],
-    )
+    datahub_emitter.add_task(inputs=inputs, outputs=outputs)

Committable suggestion was skipped due to low confidence.
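As a standalone illustration of the parameterized shape, here is a sketch with a minimal stand-in `Dataset` dataclass (the real plugin class carries more fields such as `env` and `platform_instance`):

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Dataset:
    # Stand-in for the plugin's Dataset entity; only what the demo needs.
    platform: str
    name: str

def add_task(inputs: List[Dataset], outputs: List[Dataset]) -> Dict[str, List[str]]:
    # Parameterized version of the hardcoded call: callers choose the
    # lineage endpoints instead of baking tableA/tableB into the task body.
    return {
        "inputs": [d.name for d in inputs],
        "outputs": [d.name for d in outputs],
    }

lineage = add_task(
    inputs=[Dataset("snowflake", "mydb.schema.tableA")],
    outputs=[Dataset("snowflake", "mydb.schema.tableB")],
)
print(lineage)
```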

Config | Type | Default | Description
--- | --- | --- | ---
datahub_rest_url | `str` | *http://localhost:8080* | DataHub GMS REST URL
env | `str` | *PROD* | The environment that all assets produced by this orchestrator belong to. For more detail and possible values refer [here](https://datahubproject.io/docs/graphql/enums/#fabrictype).

Add a comma after "For more detail".

A comma is needed after "For more detail".

- For more detail and possible values refer [here]
+ For more detail and possible values, refer [here]
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
env | `str` | *PROD* | The environment that all assets produced by this orchestrator belong to. For more detail and possible values refer [here](https://datahubproject.io/docs/graphql/enums/#fabrictype).
env | `str` | *PROD* | The environment that all assets produced by this orchestrator belong to. For more detail and possible values, refer [here](https://datahubproject.io/docs/graphql/enums/#fabrictype).
Tools
LanguageTool

[uncategorized] ~41-~41: Possible missing comma found.
Context: ...belong to. For more detail and possible values refer [here](https://datahubproject.io/...

(AI_HYDRA_LEO_MISSING_COMMA)
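As a sanity check on the two settings in the table, here is a small sketch that validates them before use. `FABRIC_TYPES` is an illustrative subset of DataHub's `FabricType` values, and `validate_config` is a hypothetical helper — the authoritative list lives in the linked enum docs:

```python
from urllib.parse import urlparse

# Illustrative subset of DataHub FabricType values; see the GraphQL enum docs
# linked in the table for the full, authoritative list.
FABRIC_TYPES = {"PROD", "DEV", "QA", "TEST", "STG", "EI", "UAT"}


def validate_config(datahub_rest_url: str = "http://localhost:8080",
                    env: str = "PROD") -> dict:
    parsed = urlparse(datahub_rest_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a plausible GMS REST URL: {datahub_rest_url!r}")
    if env.upper() not in FABRIC_TYPES:
        raise ValueError(f"unknown environment: {env!r}")
    return {"datahub_rest_url": datahub_rest_url, "env": env.upper()}


# Defaults from the table above.
print(validate_config())
```

Failing fast on a bad URL or environment name is cheaper than debugging a silent emission failure later.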


### Load the saved block in prefect workflows

After installing `prefect-datahub` and [saving the configution](#saving-configurations-to-a-block), you can easily use it within your prefect workflows to help you emit metadata event as show below!

Correct verb form: "shown"

The correct verb form is "shown".

- as show below!
+ as shown below!
Committable suggestion


Suggested change
After installing `prefect-datahub` and [saving the configution](#saving-configurations-to-a-block), you can easily use it within your prefect workflows to help you emit metadata event as show below!
After installing `prefect-datahub` and [saving the configution](#saving-configurations-to-a-block), you can easily use it within your prefect workflows to help you emit metadata event as shown below!
Tools
LanguageTool

[uncategorized] ~81-~81: This verb may not be in the correct form. Consider using a different form for this context.
Context: ...lows to help you emit metadata event as show below! ```python import asyncio from ...

(AI_EN_LECTOR_REPLACEMENT_VERB_FORM)
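The flow shape the reviewed docs describe — load an emitter, accumulate task lineage during the run, emit once at the end — can be sketched without the Prefect or prefect-datahub dependencies. `FakeEmitter` below is a hypothetical stand-in for a `DatahubEmitter` loaded from a saved block:

```python
from typing import List


class FakeEmitter:
    """Stand-in for a DatahubEmitter obtained via DatahubEmitter.load(...)."""

    def __init__(self):
        self.events = []

    def add_task(self, **kwargs) -> None:
        # Record per-task lineage as the flow runs.
        self.events.append(("task", kwargs))

    def emit_flow(self) -> None:
        # Flush accumulated metadata at the end of the flow.
        self.events.append(("flow", {}))


def extract() -> str:
    return "hello datahub"


def transform(data: str, emitter: FakeEmitter) -> List[str]:
    emitter.add_task(inputs=["tableA"], outputs=["tableB"])
    return data.split(" ")


def etl(emitter: FakeEmitter) -> List[str]:
    # In a real workflow this would be decorated with @flow and the emitter
    # would come from the saved block.
    words = transform(extract(), emitter)
    emitter.emit_flow()
    return words


emitter = FakeEmitter()
print(etl(emitter))  # → ['hello', 'datahub']
```

The one-emit-at-the-end shape keeps metadata traffic off the hot path of each task.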

Comment on lines +11 to +13
def get_long_description():
    root = os.path.dirname(__file__)
    return pathlib.Path(os.path.join(root, "README.md")).read_text()

Use pathlib consistently.

For consistency, use pathlib for path operations instead of os.path.

def get_long_description():
    root = pathlib.Path(__file__).parent
    return (root / "README.md").read_text()
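A quick check that the suggested pathlib form reads the same bytes as the `os.path` original, using a temporary directory as a stand-in project root:

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    readme = pathlib.Path(tmp) / "README.md"
    readme.write_text("# prefect-datahub\n")

    # os.path version, as in the current setup.py
    via_os = pathlib.Path(os.path.join(tmp, "README.md")).read_text()
    # pure-pathlib version, as suggested
    via_pathlib = (pathlib.Path(tmp) / "README.md").read_text()

    assert via_os == via_pathlib
    print(via_os.strip())  # → # prefect-datahub
```

Both forms are equivalent here; the pathlib version just avoids mixing two path APIs in one expression.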

@@ -241,4 +253,4 @@ tox -- --update-golden-files

# Update golden files for a specific environment.
tox -e py310-airflow26 -- --update-golden-files
```
```

Add a single newline character at the end of the file.

Files should end with a single newline character.

- 
+ 

Committable suggestion was skipped due to low confidence.

Tools
Markdownlint

256-256: null
Files should end with a single newline character

(MD047, single-trailing-newline)

Comment on lines +106 to +109
task testFull(type: Exec, dependsOn: [testQuick, installDevTest]) {
    commandLine 'bash', '-x', '-c',
        "source ${venv_name}/bin/activate && pytest -m 'not slow_integration' -vv --continue-on-collection-errors --junit-xml=junit.full.xml"
}

Tip

Codebase Verification

Ensure full test task is necessary.

The testFull task is referenced in the metadata-ingestion/developing.md file but is not used in the build process itself.

  • metadata-ingestion/developing.md: ../gradlew :metadata-ingestion:testFull

This indicates that the testFull task is used during the development process but not required for the build process.

Analysis chain

Ensure full test task is necessary.

The task to run full tests looks good.

Verify that this task is necessary for the build process.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the full test task is necessary for the build process.

# Test: Search for references to the full test task. Expect: References in relevant build scripts.
rg --type gradle --pattern "testFull"

Length of output: 449



Script:

#!/bin/bash
# Description: Verify that the full test task is necessary for the build process.

# Test: Search for references to the full test task. Expect: References in relevant build scripts.
rg "testFull" --type gradle

Length of output: 362



Script:

#!/bin/bash
# Description: Verify that the full test task is necessary for the build process.

# Test: Search for references to the full test task in all files within the repository.
rg "testFull"

Length of output: 421

@david-leifker david-leifker removed the devops PR or Issue related to DataHub backend & deployment label Jul 31, 2024
Fix build
@treff7es treff7es merged commit 6204cba into datahub-project:master Aug 29, 2024
72 of 75 checks passed
Labels
docs Issues and Improvements to docs ingestion PR or Issue related to the ingestion of metadata