-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Introducing AIRBYTE_TEMP_DIR
For Temporary File Mount Overrides
#368
Conversation
WalkthroughWalkthroughThe changes in this pull request modify the handling of temporary directories in the Airbyte codebase. Specifically, the Changes
Assessment against linked issues
What do you think about these updates? Do they meet your expectations? Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Outside diff range and nitpick comments (3)
airbyte/_util/temp_files.py (2)
29-29
: Is the# noqa: SIM115
directive necessary here?It seems like the
# noqa: SIM115
comment might not be needed since there are no linting issues reported for this line. Removing it could clean up the code a bit. Wdyt?Tools
Ruff
29-29: Unused
noqa
directive (unused:SIM115
)Remove unused
noqa
directive(RUF100)
63-63
: Could we add a newline at the end of the file?Adding a trailing newline at the end of the file would adhere to standard formatting conventions and might prevent issues in certain environments or tools. Wdyt?
Tools
Ruff
63-63: No newline at end of file
Add trailing newline
(W292)
airbyte/_executors/util.py (1)
208-208
: Remove whitespace from blank line at line 208There is whitespace on the blank line at line 208; removing it can improve code cleanliness. Wdyt?
Tools
Ruff
208-208: Blank line contains whitespace
Remove whitespace from blank line
(W293)
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- airbyte/_executors/util.py (1 hunks)
- airbyte/_util/temp_files.py (2 hunks)
Additional context used
Ruff
airbyte/_util/temp_files.py
29-29: Unused
noqa
directive (unused:SIM115
)Remove unused
noqa
directive(RUF100)
63-63: No newline at end of file
Add trailing newline
(W292)
airbyte/_executors/util.py
208-208: Blank line contains whitespace
Remove whitespace from blank line
(W293)
72b8b14
to
7403b02
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- airbyte/_executors/util.py (1 hunks)
- airbyte/_util/temp_files.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- airbyte/_executors/util.py
Additional context used
Ruff
airbyte/_util/temp_files.py
29-29: Unused
noqa
directive (unused:SIM115
)Remove unused
noqa
directive(RUF100)
4f84ef7
to
6908819
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post
Actionable comments posted: 5
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- airbyte/_executors/util.py (2 hunks)
- airbyte/_util/temp_files.py (2 hunks)
- airbyte/constants.py (1 hunks)
Additional context used
Ruff
airbyte/_util/temp_files.py
30-30: Unused
noqa
directive (unused:SIM115
)Remove unused
noqa
directive(RUF100)
airbyte/constants.py
67-67: Expected 2 blank lines after class or function definition, found (1)
Add missing blank line(s)
(E305)
78-78: Line too long (102 > 100)
(E501)
78-78: Trailing whitespace
Remove trailing whitespace
(W291)
airbyte/_executors/util.py
207-207: Replace ternary
if
expression withor
operatorReplace with
or
operator(FURB110)
Additional comments not posted (1)
airbyte/_util/temp_files.py (1)
Line range hint
14-34
: LGTM!The changes effectively use
OVERRIDE_TEMP_DIR
to specify the temporary directory for temporary files, enhancing compatibility with containerized environments. This aligns well with the PR objectives and improves the reliability of temporary file management. Great job!Tools
Ruff
30-30: Unused
noqa
directive (unused:SIM115
)Remove unused
noqa
directive(RUF100)
Comments failed to post (5)
airbyte/constants.py (3)
67-67: Add a blank line after the function definition
PEP 8 recommends having two blank lines after a function definition. Currently, there's only one blank line after
_str_to_bool
. Could we add another blank line to comply with the style guideline? Wdyt?Tools
Ruff
67-67: Expected 2 blank lines after class or function definition, found (1)
Add missing blank line(s)
(E305)
78-78: Line exceeds maximum length and has trailing whitespace
Line 78 is longer than 100 characters and has trailing whitespace, which violates PEP 8 style guidelines. Could we wrap the line to keep it under 100 characters and remove the extra whitespace? Wdyt?
Here's how we might adjust it:
-need your temporary files to exist in user level directories, and not in system level directories for +need your temporary files to exist in user-level directories, not in system-level directories +for permissions reasons.Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.need your temporary files to exist in user-level directories, not in system-level directories for permissions reasons.
Tools
Ruff
78-78: Line too long (102 > 100)
(E501)
78-78: Trailing whitespace
Remove trailing whitespace
(W291)
67-71: Handle empty 'AIRBYTE_TEMP_DIR' environment variable
If
AIRBYTE_TEMP_DIR
is set to an empty string,OVERRIDE_TEMP_DIR
becomesPath('')
, which might not be intended and could cause issues when creating temporary files. Should we adjust the logic to treat an empty string as unset and setOVERRIDE_TEMP_DIR
toNone
in that case? Wdyt?Here's a suggested change:
OVERRIDE_TEMP_DIR: Path | None = ( - None - if "AIRBYTE_TEMP_DIR" not in os.environ - else Path(os.environ["AIRBYTE_TEMP_DIR"]) + Path(os.environ["AIRBYTE_TEMP_DIR"]) + if os.environ.get("AIRBYTE_TEMP_DIR") + else None )Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.OVERRIDE_TEMP_DIR: Path | None = ( Path(os.environ["AIRBYTE_TEMP_DIR"]) if os.environ.get("AIRBYTE_TEMP_DIR") else None )
Tools
Ruff
67-67: Expected 2 blank lines after class or function definition, found (1)
Add missing blank line(s)
(E305)
airbyte/_executors/util.py (2)
207-207: Suggestion: Simplify
temp_dir
assignment using theor
operator?The ternary expression can be simplified by using the
or
operator for better readability. Wdyt?Apply this diff:
- temp_dir = OVERRIDE_TEMP_DIR if OVERRIDE_TEMP_DIR else Path(tempfile.gettempdir()) + temp_dir = OVERRIDE_TEMP_DIR or Path(tempfile.gettempdir())Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.temp_dir = OVERRIDE_TEMP_DIR or Path(tempfile.gettempdir())
Tools
Ruff
207-207: Replace ternary
if
expression withor
operatorReplace with
or
operator(FURB110)
207-208: Ensure
temp_dir
is aPath
object and the directory exists?Since
OVERRIDE_TEMP_DIR
might be a string, to use it consistently and ensure it exists, consider converting it to aPath
object and creating the directory if it doesn't exist. Wdyt?Apply this diff:
- temp_dir = OVERRIDE_TEMP_DIR if OVERRIDE_TEMP_DIR else Path(tempfile.gettempdir()) + temp_dir = Path(OVERRIDE_TEMP_DIR) if OVERRIDE_TEMP_DIR else Path(tempfile.gettempdir()) + temp_dir.mkdir(parents=True, exist_ok=True)Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.temp_dir = Path(OVERRIDE_TEMP_DIR) if OVERRIDE_TEMP_DIR else Path(tempfile.gettempdir()) temp_dir.mkdir(parents=True, exist_ok=True)
Tools
Ruff
207-207: Replace ternary
if
expression withor
operatorReplace with
or
operator(FURB110)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- airbyte/_executors/util.py (2 hunks)
- airbyte/_util/temp_files.py (2 hunks)
- airbyte/constants.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
- airbyte/_executors/util.py
- airbyte/constants.py
Additional context used
Ruff
airbyte/_util/temp_files.py
30-30: Unused
noqa
directive (unused:SIM115
)Remove unused
noqa
directive(RUF100)
Additional comments not posted (1)
airbyte/_util/temp_files.py (1)
14-15
: LGTM!Importing
OVERRIDE_TEMP_DIR
enhances flexibility in managing temporary files.
AIRBYTE_TEMP_DIR
For Temporary File Mount Overrides
6908819
to
3dc7aba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- airbyte/_executors/util.py (2 hunks)
- airbyte/_util/temp_files.py (2 hunks)
- airbyte/constants.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
- airbyte/_executors/util.py
- airbyte/constants.py
Additional context used
Learnings (1)
airbyte/_util/temp_files.py (1)
Learnt from: niyasrad PR: airbytehq/PyAirbyte#368 File: airbyte/_util/temp_files.py:26-27 Timestamp: 2024-09-17T18:13:47.331Z Learning: When `OVERRIDE_TEMP_DIR` is `None`, we should not supply the `dir` argument to `tempfile` functions to preserve the existing behavior.
Ruff
airbyte/_util/temp_files.py
30-30: Unused
noqa
directive (unused:SIM115
)Remove unused
noqa
directive(RUF100)
temp_dir = OVERRIDE_TEMP_DIR | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we avoid passing dir=temp_dir
when OVERRIDE_TEMP_DIR
is None
?
When OVERRIDE_TEMP_DIR
is None
, passing dir=None
to tempfile.NamedTemporaryFile
might alter the default behavior on some systems. To preserve the existing behavior when no override is specified, perhaps we should only include the dir
parameter when OVERRIDE_TEMP_DIR
is not None
. Wdyt?
Apply this diff to conditionally set the dir
parameter:
for content in files_contents:
use_json = isinstance(content, dict)
+ temp_file_kwargs = {
+ "mode": "w+t",
+ "delete": False,
+ "encoding": "utf-8",
+ "suffix": ".json" if use_json else ".txt",
+ }
+ if temp_dir is not None:
+ temp_file_kwargs["dir"] = temp_dir
temp_file = tempfile.NamedTemporaryFile( # noqa: SIM115 # Avoiding context manager
- mode="w+t",
- delete=False,
- encoding="utf-8",
- dir=temp_dir,
- suffix=".json" if use_json else ".txt",
+ **temp_file_kwargs
)
Committable suggestion was skipped due to low confidence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderabbitai and @niyasrad - I think I disagree (respectfully) with the suggestion above.
Looking at the docs for NamedTemporaryFile, the default is dir=None
, which is identical to the behavior when no override is set.
The current implementation is more readable than the suggestion, hence I'd keep what we have presently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aaronsteers, thank you for the clarification. You're right; passing dir=None
maintains the default behavior. We'll keep the current implementation as is. Would you like me to update our learnings to reflect this?
(_/)
( •_•)
💧
Learnings used
Learnt from: niyasrad
PR: airbytehq/PyAirbyte#368
File: airbyte/_util/temp_files.py:26-27
Timestamp: 2024-09-17T18:13:47.331Z
Learning: When `OVERRIDE_TEMP_DIR` is `None`, we should not supply the `dir` argument to `tempfile` functions to preserve the existing behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might suggest renaming this variable though - or just mapping the constant directly, for readability.
temp_file = tempfile.NamedTemporaryFile( # noqa: SIM115 # Avoiding context manager
mode="w+t",
delete=False,
encoding="utf-8",
dir=OVERRIDE_TEMP_DIR,
suffix=".json" if use_json else ".txt",
)
Otherwise, you could keep all the same as-is but just rename temp_dir
to temp_dir_override
so it is clear to the reader we aren't always passing a non-null value to this arg.
@@ -65,6 +65,19 @@ def _str_to_bool(value: str) -> bool: | |||
return bool(value) and value.lower() not in {"", "0", "false", "f", "no", "n", "off"} | |||
|
|||
|
|||
OVERRIDE_TEMP_DIR: Path | None = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: If we could rename to TEMP_DIR_OVERRIDE
, it will sort nicely in auto-complete with TEMP_FILE_CLEANUP
below.
Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved. ✅
Pending optional/nit feedback, I think this can be merged when ready.
Moved follow-on to new PR: Am about to run a release and will slip this in along with the other updates. |
Description
The source-postgres connector failed to connect due to an error in accessing a system-level temporary directory, which is not accessible in the Colima setup. This caused the connector to be unable to find the required file, resulting in a failure.
The update ensures compatibility with containerized environments like Colima by using user-specific paths instead of system-level paths. This change also improves consistency and reliability in accessing temporary files and mount points across different environments.
The issue was observed on MacOS 12.4 (M1, 2020) with Colima. The paths now correctly resolve in user-specific locations, ensuring compatibility and reliability.
UPDATE: After gaining feedback from @aaronsteers , it was decided that the best solution to this problem would not be to change the default, but to introduce an override through ENV variables. i.e. described very well with the
constants.py
in the root level.Fixes/Implements #367
Summary by CodeRabbit
Summary by CodeRabbit
OVERRIDE_TEMP_DIR
variable, enhancing flexibility in file storage.