Disable apparent-size in du command of get_size_on_disk #6702
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@            Coverage Diff             @@
##             main    #6702      +/-   ##
==========================================
+ Coverage   77.99%   77.99%   +0.01%
==========================================
  Files         563      563
  Lines       41761    41762       +1
==========================================
+ Hits        32567    32570       +3
+ Misses       9194     9192       -2
Thanks, I assume the different behavior can be tracked down to the following change in coreutils; see https://fossies.org/linux/coreutils/ChangeLog
It seems that --apparent-size gives a more realistic size of the file you created, rather than the block size, which can depend on the file system. But I'm not sure how to combine the options to make the output consistent between different du versions.
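To make the distinction concrete, here is a minimal Python sketch (not code from this PR) that reads both quantities for a single one-byte file via os.stat: st_size is the apparent size, which is what du --apparent-size reports, while st_blocks * 512 is the allocated space, which is what plain du counts. It assumes a POSIX system, where st_blocks exists and is counted in 512-byte units; the helper name apparent_and_allocated is hypothetical.

```python
import os
import tempfile


def apparent_and_allocated(path: str) -> tuple[int, int]:
    """Return (apparent size, allocated size) in bytes for one file.

    Hypothetical helper. Apparent size is st_size (the data bytes);
    allocated size is st_blocks * 512, since POSIX st_blocks counts
    512-byte units. st_blocks is not available on Windows.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "small.txt")
    with open(path, "w") as fh:
        fh.write("a")  # exactly one byte of content
    apparent, allocated = apparent_and_allocated(path)
    # apparent is always 1 here; allocated depends on the file system
    print(apparent, allocated)
```

The apparent value is fixed by the content, whereas the allocated value depends on the file system's block size and on features such as inline storage of small files or compression.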
Thanks for figuring this out!
Just one ask for an explanatory comment in the code (i.e. basically what you wrote in the PR description).
Interesting, thanks for digging this up, @unkcpz!
Yeah, the apparent size is more realistic in that sense, as it only captures the actual data. Nonetheless, on the file system, even a file with less content than a block will occupy the entire block, so I feel like it's not really that useful in the end (the other option,
If the tests can't be stabilized, perhaps we then need to modify the tests, not the implementation...
From the users' point of view, I think using just the block size, as in the current implementation, is better. The actual data size may not matter much; what matters more is the size transferred over the internet (in buffers, where I think the default is usually 4 KB) or the size occupied on disk, both of which are mostly computed in blocks. Thus I lean toward the changes in this PR.
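The block-based accounting described above can be sketched as rounding each file's size up to the next multiple of the block size. This is a simplification that assumes a fixed 4096-byte block (a common default) and ignores sparse files, inline data, compression and metadata; block_rounded_size is a hypothetical helper, not part of AiiDA.

```python
def block_rounded_size(apparent_size: int, block_size: int = 4096) -> int:
    """Round a file's apparent size up to a whole number of blocks.

    Hypothetical helper assuming every file occupies ceil(size / block)
    full blocks; real file systems can deviate from this.
    """
    if apparent_size < 0 or block_size <= 0:
        raise ValueError("size must be >= 0 and block_size must be > 0")
    # Ceiling division via negated floor division, then scale back to bytes.
    return -(-apparent_size // block_size) * block_size


# Three small files: the block-rounded total is far above the data size.
sizes = [1, 100, 5000]
print(sum(sizes), sum(block_rounded_size(s) for s in sizes))  # 5101 16384
```

This is also why switching from apparent size to allocated blocks makes the sizes reported for small test files noticeably larger.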
Just a minor request on the docstring, otherwise all good for me.
@danielhollas I'd say the actual problem was the implementation :D But this whole file size business is pretty annoying. We spent quite some time discussing this in the office already. Anything else to add from your side?
Looks great, thank you for the extensive comment! Just one quibble about the double meaning of "block size", but not a blocker.
Ah, I see that the tests are already failing for the last commit on main, so I think this should just go in.
In #6584 the get_size_on_disk method was added to RemoteData. It calls the du utility as the default way to obtain the size of an associated file/directory, using the -s (summary) and --bytes options. As discovered in #6696, however, the relevant tests started failing when the ubuntu-latest environment of GHA was updated to point to Ubuntu 24.04 rather than 22.04.

As it turns out, --bytes does not only give the output in bytes, but is actually equivalent to --apparent-size --block-size=1. I'm not sure what exactly has changed between the different Ubuntu versions, whether it's the behavior of the --apparent-size flag of du or the way the apparent size is determined by the file system. Nonetheless, using the apparent size makes the output fragile, as seen from the failing tests. Thus, I have now changed the command from du -s --bytes to du -s --block-size=1. This also makes the obtained file sizes for the different test cases larger (as expected), so these are updated as well.
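The switch described above can be reproduced with a small Python sketch that shells out to du in the same way. It assumes GNU coreutils du (the long options --bytes, --apparent-size and --block-size are GNU-specific) and uses a hypothetical helper du_size; it is not the actual get_size_on_disk implementation from the PR.

```python
import os
import subprocess
import tempfile


def du_size(path: str, *size_flags: str) -> int:
    """Run `du -s <size_flags> <path>` and return the reported size.

    Hypothetical helper; assumes GNU du, whose summary output has the
    form "<size>\t<path>", so the first whitespace-separated field is
    the size.
    """
    result = subprocess.run(
        ["du", "-s", *size_flags, path],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.split(maxsplit=1)[0])


with tempfile.TemporaryDirectory() as tmp:
    # Two small files of 10 bytes each.
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("x" * 10)

    old = du_size(tmp, "--bytes")  # previous command: apparent size in bytes
    spelled_out = du_size(tmp, "--apparent-size", "--block-size=1")
    new = du_size(tmp, "--block-size=1")  # new command: allocated blocks in bytes
    print(old, spelled_out, new)
```

On GNU du the first two values should agree, since --bytes is shorthand for --apparent-size --block-size=1, while the third reports allocated blocks and is typically larger for small files.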