Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added multithreading support for mkcomposefs #269

Merged
merged 2 commits into from
Apr 10, 2024

Conversation

divineaugustine
Copy link
Contributor

@divineaugustine divineaugustine commented Mar 25, 2024

This resolves #249 and results in 10x improvement for digest calculation and file copy.

Please refer the below document for details about the issue and the change. I would like to know your thoughts.
threading.md

@divineaugustine divineaugustine marked this pull request as ready for review March 25, 2024 12:16
Copy link
Contributor

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for working on this! This may take some time to review as multithreading and C is tricky.

Can you please run clang-format? Basically let's get past the superficial bits before we can dig into the threading.

libcomposefs/lcfs-writer.c Outdated Show resolved Hide resolved
Copy link
Contributor

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking requested-changes to start just for formatting/style

@divineaugustine
Copy link
Contributor Author

divineaugustine commented Mar 25, 2024

Can you please run clang-format? Basically let's get past the superficial bits before we can dig into the threading.

Thank you for checking. I have run clang-format on my changes via vscode formatting. Is that what you meant?
FYI: I did rebase and force push to fix DCO error.

@divineaugustine
Copy link
Contributor Author

Sorry for the inconvenience one header file was missed out from formatting. I have corrected that as well. Hopefully this solves all the formatting issues from my changeset.

@alexlarsson
Copy link
Collaborator

I'll have a look at the code, but can you please rebase this and squash the fixups so the end result is a minimal set of independent changes. Say one for the liibrary changes, and one of the mkcomposefs use of it.

tools/mkcomposefs.c Outdated Show resolved Hide resolved
tools/mkcomposefs.c Outdated Show resolved Hide resolved
tools/mkcomposefs.c Outdated Show resolved Hide resolved
tools/mkcomposefs.c Outdated Show resolved Hide resolved
tools/mkcomposefs.c Outdated Show resolved Hide resolved
tools/mkcomposefs.c Outdated Show resolved Hide resolved
tools/mkcomposefs.c Outdated Show resolved Hide resolved
tools/mkcomposefs.c Outdated Show resolved Hide resolved
tools/mkcomposefs.c Outdated Show resolved Hide resolved
tools/mkcomposefs.c Outdated Show resolved Hide resolved
tools/mkcomposefs.c Outdated Show resolved Hide resolved
tools/mkcomposefs.c Outdated Show resolved Hide resolved
@alexlarsson
Copy link
Collaborator

So, lots of comments. Make sure you read them all first, because some will make the other ones perhaps not needed. In particular, the propsed change to lcfs_node_set_from_content() will make most of the library API changes unnecessary.

@divineaugustine
Copy link
Contributor Author

divineaugustine commented Apr 3, 2024

So, lots of comments. Make sure you read them all first, because some will make the other ones perhaps not needed. In particular, the propsed change to lcfs_node_set_from_content() will make most of the library API changes unnecessary.

Thank you for the comments. I have addressed those and also squashed the commits to only 3 separate ones as you suggested. As i don´t know who should mark the comments as resolved, i just added reply to those.

@divineaugustine
Copy link
Contributor Author

Various nitpicks, but overall this looks good to me.

Thank you for reviewing it. I have addressed those and pushed the changes.

tools/mkcomposefs.c Outdated Show resolved Hide resolved
tools/mkcomposefs.c Outdated Show resolved Hide resolved
tools/mkcomposefs.c Outdated Show resolved Hide resolved
@alexlarsson
Copy link
Collaborator

Some more minor feedback, but I would like @cgwalters or @giuseppe to also review this.

@alexlarsson
Copy link
Collaborator

alexlarsson commented Apr 5, 2024

Also, if you're able to talk about it, I'd love to hear what you're using composefs for. Its good to know what your users are doing.

@divineaugustine
Copy link
Contributor Author

Also, if you're able to talk about it, I'd love to hear what you're using composefs for. Its good to know what your users are doing.

Can you give some details @https://github.com/r0l1 ? He is the right person to comment on this.

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a comment, could you please sign the commits with your real name?

tools/mkcomposefs.c Outdated Show resolved Hide resolved
@divineaugustine
Copy link
Contributor Author

left a comment, could you please sign the commits with your real name?

My real name is Divin and i have signed off the commits with it. Did you mean including initials Divin OA or even initials expanded Divin Ookken Athappan?

@giuseppe
Copy link
Member

giuseppe commented Apr 8, 2024

My real name is Divin and i have signed off the commits with it. Did you mean including initials Divin OA or even initials expanded Divin Ookken Athappan?

yes please use the expanded version

@cgwalters
Copy link
Contributor

cgwalters commented Apr 8, 2024 via email

@divineaugustine
Copy link
Contributor Author

divineaugustine commented Apr 8, 2024

Yes, separate PRs for things like this that are logically separate please.

I will create a separate PR for it. Can i also squash up the current commits in this PR to just 2 ? One for library and other for mkcomposefs. It was done before but then as part of review comments another set of 4 were added again.

@cgwalters
Copy link
Contributor

Yes, I think one or two commits here is best.

@cgwalters
Copy link
Contributor

I'm assuming your goal is faster mkcomposefs overall (on a server side build system, right?). Using multiple threads is one way to do that. However, another way is caching - ostree does this thing where the build system can retain a cache of (device, inode number) ➡️ checksum; in theory we could offer the same.

The downside is of course that if something happens to mutate a file in the cache, you get incorrect checksums. But this is pretty easily mitigated, and actually a great thing with composefs is such a failure should be detected at runtime.

(These are complementary approaches to be clear; threading also makes sense)

@divineaugustine
Copy link
Contributor Author

divineaugustine commented Apr 9, 2024

Yes, I think one or two commits here is best.

Squashed the commits to just 2 (one for library and one for mkcomposefs) and siged it off with my full name. Added an issue #274 for copy_file_range

@r0l1
Copy link

r0l1 commented Apr 9, 2024

@alexlarsson we are using composefs for our custom OS and as docker container management layer. We have a small go initramfs including composefs and created a simple A/B OS which powers our devices at Wahtari and nLine. The devices are interlinked with a simple mesh like end-to-end encrypted network and this enables us to push updates (composefs store diff). I have some ideas how to improve the composefs store management and would like to discuss those ideas soon.

I really appreciate your work and follow your ideas since Glick2.

@cgwalters yes, this PR is for an internal speedup of the OS build process. Thanks for pointing out the cache idea.

@alexlarsson
Copy link
Collaborator

@r0l1 That sounds cool! Nice that it works for you.

@divineaugustine
Copy link
Contributor Author

I'm assuming your goal is faster mkcomposefs overall (on a server side build system, right?). Using multiple threads is one way to do that. However, another way is caching - ostree does this thing where the build system can retain a cache of (device, inode number) ➡️ checksum; in theory we could offer the same.

The downside is of course that if something happens to mutate a file in the cache, you get incorrect checksums. But this is pretty easily mitigated, and actually a great thing with composefs is such a failure should be detected at runtime.

(These are complementary approaches to be clear; threading also makes sense)

Thank you for these alternative-approaches. My objective was to make mkcomposefs run fast in any machine as part of our build process. It was simple and straight forward for me to add threads in mkcomposefs which does not require any facilitation from the end user. I am new to linux tooling and I would require some consultations here to fully understand these alternative approaches.

@alexlarsson
Copy link
Collaborator

I'm assuming your goal is faster mkcomposefs overall (on a server side build system, right?). Using multiple threads is one way to do that. However, another way is caching - ostree does this thing where the build system can retain a cache of (device, inode number) ➡️ checksum; in theory we could offer the same.

This only works though if you image compose produces hardlinked copies of files you previously committed (otherwise the device/inode of the new files will not match something old in the cache). This happens regularly in ostree because of how it is set up, but that isn't necessarily always true, and it causes some level of extra pain to enforce when you build using ostree. (With things like rofiles-fuse, etc.)

Not saying it is bad, but its not always applicable.

But anyway, if you do use a setup like that where you end up with hardlinked files, then the best way to store and quickly access the cached digest is probably to just enable fs-verity on these files. In other words, we should probably have lcfs_node_set_from_content() start by asking the kernel for the fs-verity digest in case it is set, rather than computing it ourselves.

@cgwalters
Copy link
Contributor

In other words, we should probably have lcfs_node_set_from_content() start by asking the kernel for the fs-verity digest in case it is set, rather than computing it ourselves.

Yes agreed, this is an obvious step.

@alexlarsson
Copy link
Collaborator

I think this looks good enough now. I looked through it again and did some local tests, seems to work well, and the public library APIs are sane.

@alexlarsson alexlarsson merged commit 4a68a42 into containers:main Apr 10, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mkcomposefs: multi-threading support
5 participants