Use basenamePrefix instead of suffix when creating temporary file in overwrite save mode #1165

Merged

Conversation

@norberttech norberttech commented Aug 4, 2024

Change Log

Added

Fixed

  • Use basenamePrefix instead of suffix when creating temporary file in overwrite save mode

Changed

Removed

Deprecated

Security


Description

Previously, the FilesystemStreams mechanism added a ._flow_tmp suffix to the destination path when it created a temporary file. This causes a problem: when the destination is a file path such as file.csv, the suffix prevents the Path object from detecting the extension correctly, because the temporary path becomes file.csv._flow_tmp.

Let's say we want to write to /var/files/file.csv using df->saveMode(overwrite()).

Flow will first create a temporary file:

  • /var/files/file.csv._flow_tmp

The temporary file is a safety mechanism: it prevents an unfinished transformation pipeline from overwriting the existing file.
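The idea behind that safety mechanism can be sketched with plain PHP filesystem functions. This is only an illustration under assumptions: `overwrite_safely()` is a hypothetical helper, not part of Flow, and the real FilesystemStreams implementation may work differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not Flow's API: writes $contents to a temporary
 * sibling file and replaces $destination only after the write succeeded.
 */
function overwrite_safely(string $destination, string $contents): void
{
    // Temporary path built with the suffix scheme described above.
    $temporary = $destination . '._flow_tmp';

    if (file_put_contents($temporary, $contents) === false) {
        // The existing destination is left untouched when the write fails.
        throw new RuntimeException("Could not write to {$temporary}");
    }

    // Replace the destination only once the temporary file is complete.
    if (!rename($temporary, $destination)) {
        throw new RuntimeException("Could not move {$temporary} to {$destination}");
    }
}
```

If the pipeline fails halfway through, only the temporary file is affected and the previous contents of the destination survive.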

To keep the extension detectable, Flow now adds a basenamePrefix() instead of a file suffix, which creates something like this:

  • /var/files/._flow_php_tmp.file.csv

Why is detecting the extension important? There are Loaders, JsonLoader for example, that close open streams by extension.
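The effect on extension detection can be reproduced with PHP's native `pathinfo()`. The helper functions below are hypothetical and exist only for illustration; Flow's Path object has its own implementation, which this sketch does not claim to mirror.

```php
<?php

declare(strict_types=1);

// Hypothetical helpers for illustration only, not Flow's API.

function temporary_path_with_suffix(string $path): string
{
    // Old scheme: append the marker after the file name.
    return $path . '._flow_tmp';
}

function temporary_path_with_prefix(string $path): string
{
    // New scheme: put the marker in front of the base name.
    return dirname($path) . '/._flow_php_tmp.' . basename($path);
}

function extension_of(string $path): string
{
    // pathinfo() treats everything after the last dot as the extension.
    return pathinfo($path, PATHINFO_EXTENSION);
}

$destination = '/var/files/file.csv';

echo extension_of(temporary_path_with_suffix($destination)), PHP_EOL; // _flow_tmp
echo extension_of(temporary_path_with_prefix($destination)), PHP_EOL; // csv
```

With the suffix, the marker itself is reported as the extension; with the prefix, the original csv extension stays at the end of the path.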


github-actions bot commented Aug 4, 2024

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak        | mode             | rstdev          |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 3.953mb +0.03%  | 510.822ms +0.15% | ±2.36% +322.18% |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 4.085mb +0.03%  | 1.069s +0.47%    | ±0.20% -50.33%  |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 28.545mb +0.00% | 425.211ms -1.89% | ±0.84% +61.84%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 3.713mb +0.04%  | 33.547ms -0.25%  | ±1.16% +42.03%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 3.659mb +0.04%  | 434.047ms -1.67% | ±0.46% -48.60%  |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 116.054mb +0.00% | 58.147ms -3.71% | ±0.20% -89.44% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode            | rstdev          |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.175mb +0.00%  | 83.434ms -2.50% | ±0.57% -46.90%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 102.497mb +0.00% | 51.699ms -3.99% | ±0.56% -71.84%  |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 123.824mb +0.00% | 1.218s -1.08%   | ±0.21% -45.60%  |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 16.969mb +0.01%  | 42.730ms -4.33% | ±1.09% +130.69% |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev           |
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 52.638mb +0.00%  | 410.872ms +5.46% | ±1.74% +333.93%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 12.904mb +0.00%  | 77.612ms -1.69%  | ±0.49% -85.25%   |
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 86.812mb +0.00%  | 3.186ms -9.10%   | ±0.08% -97.40%   |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 102.413mb +0.00% | 186.796ms -1.64% | ±1.30% -37.35%   |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 85.133mb +0.00%  | 18.588ms -1.21%  | ±0.32% -78.82%   |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 88.052mb +0.00%  | 1.568ms -12.10%  | ±2.65% +48.39%   |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 88.052mb +0.00%  | 1.628ms -11.71%  | ±1.54% -46.54%   |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 85.164mb +0.00%  | 2.535ms -4.89%   | ±1.22% -61.17%   |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 85.693mb +0.00%  | 15.073ms -1.44%  | ±2.10% +151.80%  |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 85.693mb +0.00%  | 14.783ms -2.23%  | ±1.00% +434.44%  |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 83.596mb +0.00%  | 1.594μs -11.15%  | ±3.01% +12.77%   |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 83.596mb +0.00%  | 0.300μs -25.00%  | ±0.00% -100.00%  |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 92.947mb +0.00%  | 12.058ms -0.20%  | ±0.85% -52.43%   |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 122.318mb +0.00% | 59.771ms -3.37%  | ±1.17% +11.05%   |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 86.212mb +0.00%  | 1.201ms -6.64%   | ±2.73% +244.14%  |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 89.561mb +0.00%  | 60.037ms -5.60%  | ±2.08% +109.94%  |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 88.314mb +0.00%  | 3.763ms -6.99%   | ±1.09% -18.93%   |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 83.743mb +0.00%  | 38.820ms -3.50%  | ±0.88% -10.47%   |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 83.744mb +0.00%  | 39.158ms -0.29%  | ±1.08% +103.43%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 83.743mb +0.00%  | 39.481ms -0.15%  | ±1.05% +184.57%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 86.038mb +0.00%  | 7.317ms -1.20%   | ±0.11% -91.10%   |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 83.596mb +0.00%  | 29.251ms +0.51%  | ±1.82% +2563.89% |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 83.596mb +0.00%  | 13.066μs -5.40%  | ±1.56% -18.53%   |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 83.596mb +0.00%  | 15.824μs -4.00%  | ±1.18% -61.41%   |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 102.414mb +0.00% | 190.480ms -1.31% | ±0.84% -23.59%   |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 106.829mb +0.00% | 459.518ms -0.81% | ±0.47% +192.06%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.187mb +0.00%  | 233.463ms -2.39% | ±2.97% +131.11%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.025mb +0.00%  | 50.642ms -5.09%  | ±0.23% -92.35%   |
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+

@norberttech norberttech merged commit a0234b3 into flow-php:1.x Aug 4, 2024
35 checks passed