Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge changes in Rc/v2.1.0 to main branch #7
base: main
Are you sure you want to change the base?
Merge changes in Rc/v2.1.0 to main branch #7
Changes from 33 commits
532aaf5
ff5d202
432e835
9ec6f69
bda40b4
b155b1a
f7601bf
6570461
488d679
1f256bc
088feb5
30619ab
8298ec2
9cb51ac
01de803
b111f9a
a480124
8722c94
8bcefd3
5b719dd
62545e9
2bbf397
a02950e
51ba3ce
6e4a18a
89413a7
4665a23
19ccc3d
d164c10
912957f
52d058b
c06a26a
b8d3bc3
abe10d6
cf48a94
6dc6e0e
a14c500
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable is not used anywhere. I think it can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I've removed the unused
file_ext
variable since it's now handled in theget_file_paths()
function.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of new code here and I haven't reviewed it closely. The difference that strikes me the most is that the old
toupper
case-insensitive match behavior is gone. I imagine this can have an impact under Windows. Sinceload_data
has been rewritten to list files through this function, we need a good reason to deviate from the old behavior.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current code follows the case-sensitive behavior of readRDS() and haven::read_sas() to avoid ambiguity and risk of matching the wrong file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This piece of code belongs to the
create_data_list
currently on themain
branch:It is there to warn against an edge-case scenario in which a folder contains two files that share the same name but differ in case. That is not a problem under linux, but we still want to warn users against that situation, because running the same code with the same data files under case-insensitive windows file systems could lead to the loading of different files.
This check is no longer in the rewritten
dv.loader
and it should be, unless the team decides otherwise.My suggestion here would be to take the original logic of
create_data_list
and adapt it minimally to follow the old filename-matching logic, so that we don't throw away useful behavior on a rewrite.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for raising this important point about case-sensitivity. I agree we should discuss with the team to determine if we want to to make any changes to the current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whole paths would be better, because
file_paths
can point to different folders with files that share the same name.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of base file names as list names is maintained to ensure backward compatibility with existing code that relies on the legacy
load_data()
function. For users who need the full file paths, this information is stored in the metadata attributes of each data frame in the returned list.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users of legacy
load_data()
retain the old behavior because of this statement that happens at the end of that function:Users of
load_data_files()
would benefit from seeing the exact path they provided as names of the output list.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See discussion in internal chat about possibility of removing both the paths and the extensions as well as preventing repeat entries in the resulting list. The
at the end of
load_data()
should make that function immune to implementing this change.