Feat: Introducing AIRBYTE_TEMP_DIR For Temporary File Mount Overrides #368

Merged: 1 commit into airbytehq:main on Sep 17, 2024

Conversation

niyasrad (Contributor) commented Sep 17, 2024

Description

  • The source-postgres connector failed to connect because it tried to access a system-level temporary directory that is not accessible in a Colima setup. The connector could not find the file it needed and failed.

  • The update ensures compatibility with containerized environments like Colima by using user-specific paths instead of system-level paths, so temporary files and mount points resolve consistently across environments.

  • The issue was observed on macOS 12.4 (M1, 2020) with Colima. With the change, the paths resolve to user-specific locations.

  • UPDATE: After feedback from @aaronsteers, we decided that the best solution is not to change the default but to introduce an override through an environment variable, as described in constants.py at the root level.

Fixes/Implements #367
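
For illustration, a minimal sketch of how a user might opt in to the override. The helper name `use_user_temp_dir` and the `.airbyte-tmp` directory name are hypothetical; only the `AIRBYTE_TEMP_DIR` variable name comes from this PR. The sketch assumes the variable is read when `airbyte` is imported, so it has to be set before that import.

```python
import os
from pathlib import Path


def use_user_temp_dir(base: Path) -> Path:
    """Create a user-level temp directory under `base` and point AIRBYTE_TEMP_DIR at it.

    Hypothetical helper, not part of PyAirbyte.
    """
    temp_dir = base / ".airbyte-tmp"  # hypothetical directory name
    temp_dir.mkdir(parents=True, exist_ok=True)
    os.environ["AIRBYTE_TEMP_DIR"] = str(temp_dir)
    return temp_dir


# Typical use (assumption: call this before `import airbyte`):
#     use_user_temp_dir(Path.home())
```

Setting the variable in the shell before starting Python should have the same effect and avoids any ordering concerns inside the script.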

Summary by CodeRabbit

  • New Features
    • Introduced a dedicated temporary directory for Airbyte-related files, improving organization and management of temporary files.
    • Users can now specify a custom temporary directory via the AIRBYTE_TEMP_DIR environment variable (exposed in code as the OVERRIDE_TEMP_DIR constant), enhancing flexibility in file storage.
    • Temporary files are managed more effectively, allowing for easier cleanup and better organization by directing them to the specified location.

coderabbitai bot commented Sep 17, 2024

Walkthrough

The changes in this pull request modify the handling of temporary directories in the Airbyte codebase. Specifically, the get_connector_executor function and the as_temp_files function have been updated to utilize a dedicated temporary directory defined by the OVERRIDE_TEMP_DIR constant. If this constant is not set, the system's default temporary directory is used. This adjustment enhances the flexibility of temporary file management while maintaining the existing functionality of local mount directory creation.

Changes

File(s) | Change Summary
airbyte/_executors/util.py | Modified get_connector_executor to use OVERRIDE_TEMP_DIR for temporary directory handling.
airbyte/_util/temp_files.py | Updated as_temp_files to create temporary files in the directory specified by OVERRIDE_TEMP_DIR.
airbyte/constants.py | Introduced OVERRIDE_TEMP_DIR to hold a path for temporary files based on the AIRBYTE_TEMP_DIR environment variable.
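
The override-then-fallback behavior described in the walkthrough can be sketched roughly as follows. The function names `read_temp_dir_override` and `effective_temp_dir` are hypothetical and only illustrate the idea; the actual PyAirbyte code may be organized differently.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def read_temp_dir_override(env: Mapping[str, str]) -> Optional[Path]:
    """Return the AIRBYTE_TEMP_DIR override as a Path, or None if the variable is absent."""
    if "AIRBYTE_TEMP_DIR" not in env:
        return None
    return Path(env["AIRBYTE_TEMP_DIR"])


def effective_temp_dir(override: Optional[Path]) -> Path:
    """Use the override when present, otherwise the system default temp directory."""
    return override or Path(tempfile.gettempdir())


# Resolve once from the real environment (hypothetical module-level usage).
TEMP_DIR = effective_temp_dir(read_temp_dir_override(os.environ))
```

Taking the environment as a parameter keeps the lookup easy to exercise with a plain dict instead of mutating `os.environ`.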

Assessment against linked issues

Objective Addressed Explanation
Use User-Level Directories for Temporary Files and Mount Points (#367)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Outside diff range and nitpick comments (3)
airbyte/_util/temp_files.py (2)

29-29: Is the # noqa: SIM115 directive necessary here?

It seems like the # noqa: SIM115 comment might not be needed since there are no linting issues reported for this line. Removing it could clean up the code a bit. Wdyt?

Tools
Ruff

29-29: Unused noqa directive (unused: SIM115)

Remove unused noqa directive

(RUF100)


63-63: Could we add a newline at the end of the file?

Adding a trailing newline at the end of the file would adhere to standard formatting conventions and might prevent issues in certain environments or tools. Wdyt?

Tools
Ruff

63-63: No newline at end of file

Add trailing newline

(W292)

airbyte/_executors/util.py (1)

208-208: Remove whitespace from blank line at line 208

There is whitespace on the blank line at line 208; removing it can improve code cleanliness. Wdyt?

Tools
Ruff

208-208: Blank line contains whitespace

Remove whitespace from blank line

(W293)

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between f10933f and 72b8b14.

Files selected for processing (2)
  • airbyte/_executors/util.py (1 hunks)
  • airbyte/_util/temp_files.py (2 hunks)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 72b8b14 and 7403b02.

Files selected for processing (2)
  • airbyte/_executors/util.py (1 hunks)
  • airbyte/_util/temp_files.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • airbyte/_executors/util.py

@coderabbitai coderabbitai bot left a comment

Caution

Inline review comments failed to post

Actionable comments posted: 5

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 7403b02 and 4f84ef7.

Files selected for processing (3)
  • airbyte/_executors/util.py (2 hunks)
  • airbyte/_util/temp_files.py (2 hunks)
  • airbyte/constants.py (1 hunks)
Additional context used
Ruff
airbyte/_util/temp_files.py

30-30: Unused noqa directive (unused: SIM115)

Remove unused noqa directive

(RUF100)

airbyte/constants.py

67-67: Expected 2 blank lines after class or function definition, found (1)

Add missing blank line(s)

(E305)


78-78: Line too long (102 > 100)

(E501)


78-78: Trailing whitespace

Remove trailing whitespace

(W291)

airbyte/_executors/util.py

207-207: Replace ternary if expression with or operator

Replace with or operator

(FURB110)

Additional comments not posted (1)
airbyte/_util/temp_files.py (1)

Line range hint 14-34: LGTM!

The changes effectively use OVERRIDE_TEMP_DIR to specify the temporary directory for temporary files, enhancing compatibility with containerized environments. This aligns well with the PR objectives and improves the reliability of temporary file management. Great job!

Tools
Ruff

30-30: Unused noqa directive (unused: SIM115)

Remove unused noqa directive

(RUF100)

Comments failed to post (5)
airbyte/constants.py (3)

67-67: Add a blank line after the function definition

PEP 8 recommends having two blank lines after a function definition. Currently, there's only one blank line after _str_to_bool. Could we add another blank line to comply with the style guideline? Wdyt?

Tools
Ruff

67-67: Expected 2 blank lines after class or function definition, found (1)

Add missing blank line(s)

(E305)


78-78: Line exceeds maximum length and has trailing whitespace

Line 78 is longer than 100 characters and has trailing whitespace, which violates PEP 8 style guidelines. Could we wrap the line to keep it under 100 characters and remove the extra whitespace? Wdyt?

Here's how we might adjust it:

-need your temporary files to exist in user level directories, and not in system level directories for 
+need your temporary files to exist in user-level directories, not in system-level directories
+for permissions reasons.
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

need your temporary files to exist in user-level directories, not in system-level directories
for permissions reasons.
Tools
Ruff

78-78: Line too long (102 > 100)

(E501)


78-78: Trailing whitespace

Remove trailing whitespace

(W291)


67-71: Handle empty 'AIRBYTE_TEMP_DIR' environment variable

If AIRBYTE_TEMP_DIR is set to an empty string, OVERRIDE_TEMP_DIR becomes Path(''), which might not be intended and could cause issues when creating temporary files. Should we adjust the logic to treat an empty string as unset and set OVERRIDE_TEMP_DIR to None in that case? Wdyt?

Here's a suggested change:

 OVERRIDE_TEMP_DIR: Path | None = (
-    None
-    if "AIRBYTE_TEMP_DIR" not in os.environ
-    else Path(os.environ["AIRBYTE_TEMP_DIR"])
+    Path(os.environ["AIRBYTE_TEMP_DIR"])
+    if os.environ.get("AIRBYTE_TEMP_DIR")
+    else None
 )
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

OVERRIDE_TEMP_DIR: Path | None = (
    Path(os.environ["AIRBYTE_TEMP_DIR"])
    if os.environ.get("AIRBYTE_TEMP_DIR")
    else None
)
Tools
Ruff

67-67: Expected 2 blank lines after class or function definition, found (1)

Add missing blank line(s)

(E305)
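
To make the empty-string concern concrete, here is a small sketch. The helper name `temp_dir_from_env` is hypothetical; it mirrors the suggested change above, where an unset or empty `AIRBYTE_TEMP_DIR` both mean "no override".

```python
from pathlib import Path
from typing import Mapping, Optional

# An empty string is not an empty path: pathlib normalizes it to the current directory.
assert Path("") == Path(".")


def temp_dir_from_env(env: Mapping[str, str]) -> Optional[Path]:
    """Treat an unset or empty AIRBYTE_TEMP_DIR as 'no override'."""
    value = env.get("AIRBYTE_TEMP_DIR")
    return Path(value) if value else None
```

Without the emptiness check, an empty variable would silently direct temporary files into the current working directory.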

airbyte/_executors/util.py (2)

207-207: Suggestion: Simplify temp_dir assignment using the or operator?

The ternary expression can be simplified by using the or operator for better readability. Wdyt?

Apply this diff:

-            temp_dir = OVERRIDE_TEMP_DIR if OVERRIDE_TEMP_DIR else Path(tempfile.gettempdir())
+            temp_dir = OVERRIDE_TEMP_DIR or Path(tempfile.gettempdir())
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        temp_dir = OVERRIDE_TEMP_DIR or Path(tempfile.gettempdir())
Tools
Ruff

207-207: Replace ternary if expression with or operator

Replace with or operator

(FURB110)
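
A quick sketch of why the two spellings behave the same here. The function names are hypothetical. `Path` defines neither `__bool__` nor `__len__`, so every `Path` instance is truthy and only `None` triggers the fallback in either form.

```python
import tempfile
from pathlib import Path
from typing import Optional


def pick_ternary(override: Optional[Path]) -> Path:
    """Ternary form, as in the line flagged by Ruff."""
    return override if override else Path(tempfile.gettempdir())


def pick_or(override: Optional[Path]) -> Path:
    """Equivalent `or` form suggested by FURB110."""
    return override or Path(tempfile.gettempdir())
```

Note that even `Path("")` is truthy, so neither form guards against an empty environment variable; that has to be handled where the constant is built.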


207-208: Ensure temp_dir is a Path object and the directory exists?

Since OVERRIDE_TEMP_DIR might be a string, to use it consistently and ensure it exists, consider converting it to a Path object and creating the directory if it doesn't exist. Wdyt?

Apply this diff:

-            temp_dir = OVERRIDE_TEMP_DIR if OVERRIDE_TEMP_DIR else Path(tempfile.gettempdir())
+            temp_dir = Path(OVERRIDE_TEMP_DIR) if OVERRIDE_TEMP_DIR else Path(tempfile.gettempdir())
+            temp_dir.mkdir(parents=True, exist_ok=True)
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        temp_dir = Path(OVERRIDE_TEMP_DIR) if OVERRIDE_TEMP_DIR else Path(tempfile.gettempdir())
        temp_dir.mkdir(parents=True, exist_ok=True)
Tools
Ruff

207-207: Replace ternary if expression with or operator

Replace with or operator

(FURB110)
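
For reference, a minimal sketch of the directory-creation part of this suggestion. The helper name `ensure_dir` and the `airbyte/mounts` subpath are hypothetical. `Path.mkdir(parents=True, exist_ok=True)` creates missing parents and does nothing if the directory is already there, though it still raises `FileExistsError` when the path exists as a regular file.

```python
import tempfile
from pathlib import Path


def ensure_dir(path: Path) -> Path:
    """Create `path` and any missing parents; do nothing if it already exists."""
    path.mkdir(parents=True, exist_ok=True)
    return path


# Demonstration inside a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as base:
    nested = ensure_dir(Path(base) / "airbyte" / "mounts")
    ensure_dir(nested)  # second call is a no-op
```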

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 4f84ef7 and 6908819.

Files selected for processing (3)
  • airbyte/_executors/util.py (2 hunks)
  • airbyte/_util/temp_files.py (2 hunks)
  • airbyte/constants.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • airbyte/_executors/util.py
  • airbyte/constants.py
Additional comments not posted (1)
airbyte/_util/temp_files.py (1)

14-15: LGTM!

Importing OVERRIDE_TEMP_DIR enhances flexibility in managing temporary files.

niyasrad changed the title from "Fix: Using User-Level Directories for Temporary Files and Mount Points" to "Feat: Introducing AIRBYTE_TEMP_DIR For Temporary File Mount Overrides" on Sep 17, 2024

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 6908819 and 3dc7aba.

Files selected for processing (3)
  • airbyte/_executors/util.py (2 hunks)
  • airbyte/_util/temp_files.py (2 hunks)
  • airbyte/constants.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • airbyte/_executors/util.py
  • airbyte/constants.py
Additional context used
Learnings (1)
airbyte/_util/temp_files.py (1)
Learnt from: niyasrad
PR: airbytehq/PyAirbyte#368
File: airbyte/_util/temp_files.py:26-27
Timestamp: 2024-09-17T18:13:47.331Z
Learning: When `OVERRIDE_TEMP_DIR` is `None`, we should not supply the `dir` argument to `tempfile` functions to preserve the existing behavior.

Comment on lines +26 to +27
temp_dir = OVERRIDE_TEMP_DIR

coderabbitai bot commented Sep 17, 2024

Should we avoid passing dir=temp_dir when OVERRIDE_TEMP_DIR is None?

When OVERRIDE_TEMP_DIR is None, passing dir=None to tempfile.NamedTemporaryFile might alter the default behavior on some systems. To preserve the existing behavior when no override is specified, perhaps we should only include the dir parameter when OVERRIDE_TEMP_DIR is not None. Wdyt?

Apply this diff to conditionally set the dir parameter:

         for content in files_contents:
             use_json = isinstance(content, dict)
+            temp_file_kwargs = {
+                "mode": "w+t",
+                "delete": False,
+                "encoding": "utf-8",
+                "suffix": ".json" if use_json else ".txt",
+            }
+            if temp_dir is not None:
+                temp_file_kwargs["dir"] = temp_dir
             temp_file = tempfile.NamedTemporaryFile(  # noqa: SIM115  # Avoiding context manager
-                mode="w+t",
-                delete=False,
-                encoding="utf-8",
-                dir=temp_dir,
-                suffix=".json" if use_json else ".txt",
+                **temp_file_kwargs
             )

Committable suggestion was skipped due to low confidence.

aaronsteers (Contributor) commented Sep 17, 2024

@coderabbitai and @niyasrad - I think I disagree (respectfully) with the suggestion above.

Looking at the docs for NamedTemporaryFile, the default is dir=None, which is identical to the behavior when no override is set.

The current implementation is more readable than the suggestion, hence I'd keep what we have presently.
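
A small check of the point above, as a sketch: passing `dir=None` to `tempfile.NamedTemporaryFile` places the file in `tempfile.gettempdir()`, the same as omitting the argument. The helper name `temp_file_parent` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def temp_file_parent(dir_arg):
    """Create a temp file with the given `dir` argument and return its parent directory."""
    with tempfile.NamedTemporaryFile(mode="w+t", dir=dir_arg, suffix=".txt") as f:
        return Path(f.name).parent


default_parent = temp_file_parent(None)
# dir=None falls back to tempfile.gettempdir(), same as omitting the argument.
assert os.path.samefile(default_parent, tempfile.gettempdir())
```

So a `Path | None` constant can be passed straight through as `dir=` without a conditional.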

coderabbitai bot commented

@aaronsteers, thank you for the clarification. You're right; passing dir=None maintains the default behavior. We'll keep the current implementation as is. Would you like me to update our learnings to reflect this?

(_/)
( •_•)
💧



Contributor commented:

I might suggest renaming this variable though - or just mapping the constant directly, for readability.

            temp_file = tempfile.NamedTemporaryFile( # noqa: SIM115 # Avoiding context manager
                mode="w+t",
                delete=False,
                encoding="utf-8",
                dir=OVERRIDE_TEMP_DIR,
                suffix=".json" if use_json else ".txt",
            )

Otherwise, you could keep all the same as-is but just rename temp_dir to temp_dir_override so it is clear to the reader we aren't always passing a non-null value to this arg.

@@ -65,6 +65,19 @@ def _str_to_bool(value: str) -> bool:
    return bool(value) and value.lower() not in {"", "0", "false", "f", "no", "n", "off"}


OVERRIDE_TEMP_DIR: Path | None = (
Contributor commented:

Nit: If we could rename to TEMP_DIR_OVERRIDE, it will sort nicely in auto-complete with TEMP_FILE_CLEANUP below.

Wdyt?

aaronsteers (Contributor) left a comment

Approved. ✅
Pending optional/nit feedback, I think this can be merged when ready.

@aaronsteers aaronsteers merged commit e534a3d into airbytehq:main Sep 17, 2024
9 checks passed
aaronsteers (Contributor) commented Sep 17, 2024

Moved follow-on to new PR:

Am about to run a release and will slip this in along with the other updates.

2 participants