Copy/pasting my comments under the PR above for better visibility:
I am worried about removing files based on name-matching. In other words, the user could have a file whose name matches the pattern purely by coincidence. I think two things can mitigate that:
1. Save per-locale files in a directory. This doesn't eliminate the problem, as the user may still create a problematic file in that directory, which would, again, be removed.
2. Like above, but also add metadata. The directory could contain a metadata file that lists all the files that represent chunks of an array. Instead of matching files by name, we can read that metadata and delete files based on the names stored there.
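The second option can be sketched as follows. This is only a minimal illustration, not Arkouda's implementation: the manifest name `manifest.json`, the helper functions, and the `df_LOCALE0000`-style file names are all hypothetical and chosen for the example.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

MANIFEST_NAME = "manifest.json"  # hypothetical name, not an Arkouda convention


def write_chunks(directory: Path, chunks: dict[str, bytes]) -> None:
    """Write each chunk file and record its name in the manifest."""
    directory.mkdir(parents=True, exist_ok=True)
    for name, payload in chunks.items():
        (directory / name).write_bytes(payload)
    manifest = {"files": sorted(chunks)}
    (directory / MANIFEST_NAME).write_text(json.dumps(manifest))


def remove_chunks(directory: Path) -> list[str]:
    """Delete only the files listed in the manifest, then the manifest itself."""
    manifest_path = directory / MANIFEST_NAME
    listed = json.loads(manifest_path.read_text())["files"]
    for name in listed:
        (directory / name).unlink(missing_ok=True)
    manifest_path.unlink()
    return listed


def demo() -> tuple[list[str], list[str]]:
    """Return (removed file names, file names left in the directory)."""
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp) / "df"
        write_chunks(directory, {"df_LOCALE0000": b"a", "df_LOCALE0001": b"b"})
        # A user file whose name only looks like a chunk; it is not in the manifest.
        (directory / "df_LOCALE0002").write_bytes(b"user data")
        removed = remove_chunks(directory)
        survivors = sorted(p.name for p in directory.iterdir())
        return removed, survivors
```

Because deletion is driven by the names stored in the manifest, the look-alike user file survives even though a name-based pattern would have matched it.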
to result in df.metadata, df2.metadata and d.metadata files (I am not wedded to extensions; I use md in #3915, but that means "markdown"). These metadata files can then store information about the actual data files, which may or may not be in the same path as the metadata. This also allows storing the metadata and the actual data in different file systems if need be. In that world, you wouldn't need to glob as in
This example illustrates a weakness of using prefixes to identify Arkouda data.
The following code executes correctly.
However, notice the addition of the `_` to the prefix in read_parquet. Removing the `_` results in the similarly named files being identified together:
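The collision can be reproduced without a server. The sketch below is not the code from this issue; it assumes, for illustration only, that two datasets were written with the prefixes `df` and `df2`, that per-locale files are named `<prefix>_LOCALE<NNNN>`, and that files are selected by a `prefix*` wildcard.

```python
from fnmatch import fnmatch

# Hypothetical per-locale file names for two datasets, "df" and "df2".
files = [
    "df_LOCALE0000",
    "df_LOCALE0001",
    "df2_LOCALE0000",
    "df2_LOCALE0001",
]


def match_prefix(prefix, names):
    """Return the names selected by a 'prefix*' wildcard."""
    return [name for name in names if fnmatch(name, prefix + "*")]


# Without the trailing underscore, "df*" also picks up the "df2" files.
loose = match_prefix("df", files)

# With the trailing underscore, only the "df" dataset is selected.
strict = match_prefix("df_", files)
```

Here `loose` contains all four files while `strict` contains only the two `df_` files, which is the difference the trailing `_` makes.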