[performance+memory] Beating git in index-pack
(as used for clones and fetches)
#46
memory mode: in-memory, resolve-bases, resolve-deltas, resolve-deltas-and-bases
This was more of a test: I wanted to see if it makes any difference to keep the decompressed data in memory to speed up downstream operations. It looks like this actually reduces performance, at least while the pack is also streamed to disk at the same time. The virtual memory system probably caches it entirely. When not streaming the pack to disk, in-memory operation appears to yield a mild speedup, but when a temporary file may be written, the speedup is entirely gone. Thus it seems that keeping decompressed bytes in memory doesn't do any good.
For the actual performance tests on a 96 core machine, have a look at this comment. TL;DR: the time is dominated by creating an index by streaming the pack, and pack resolution is then done in about 10 seconds, or 14.6GB/s (of decoded objects).
The ARM git provided with macOS Big Sur changes everything.
With 3 threads (default):
Git is at least twice as fast when reading/streaming the pack. In our case this is limited by the deflate performance of millions of small streams, and there are still some improvements that we can make use of.
With 8 threads (as available cores):
Clearly contention reduces speed. This effect is not visible at all when verifying pack entries, making me think the amount of work
About the Arrival of ZLibNG and ARM64 performance
As there have been plenty of updates sprinkled across multiple discussions, let me sum up the current state here and provide some hopefully reproducible runs on an M1 MacBook Air using a 1.3GB Linux kernel pack on commit 2b5d891. Recently a major improvement was made which brings
Before with
…and after with
The indexing phase is now twice as fast and roughly on par with
However, as opposed to
…and here is the default with 3 threads…
It shows that the ARM version of git does something truly outstanding: its 3 threads are easily outperforming
Here are the fastest runs on an M1 with the latest version.
And the same with git.
Nothing effectively changed, but the new measurements contain insights into memory usage.
git index-pack streams a pack and creates an index from it. The difficulty arises from having to decompress every entry in the pack stream, which can be composed of many small objects. These are placed in some sort of index to accelerate the next stage, which is all about resolving the deltas in order to produce a SHA1. Per pack entry, the SHA1, pack offset and CRC32 are written into the index file to complete the operation.
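To illustrate why this streaming pass is hard to speed up, here is a minimal Python sketch. It works on a toy "pack", which is assumed to be nothing but back-to-back zlib streams of blobs. Real pack entries also carry a type/size header and may be deltas, which this sketch ignores. The names `index_toy_pack` and `make_toy_pack` are made up for the example and are neither git nor gitoxide APIs. The point is that the start of the next entry is only known once the current one has been inflated to its end.

```python
import hashlib
import zlib


def make_toy_pack(objects):
    """Concatenate one zlib stream per object (hypothetical toy format)."""
    return b"".join(zlib.compress(obj) for obj in objects)


def index_toy_pack(pack: bytes):
    """Yield (sha1_hex, offset, crc32) for each zlib stream in the toy pack."""
    offset = 0
    while offset < len(pack):
        inflater = zlib.decompressobj()
        # Slicing copies the remainder each time; fine for a sketch only.
        data = inflater.decompress(pack[offset:]) + inflater.flush()
        # Whatever zlib did not need belongs to the following entries.
        consumed = len(pack) - offset - len(inflater.unused_data)
        compressed = pack[offset:offset + consumed]
        # Git object ids hash "<type> <size>\0" followed by the content.
        header = b"blob %d\x00" % len(data)
        sha1 = hashlib.sha1(header + data).hexdigest()
        # Here the checksum covers the compressed bytes of the entry.
        yield sha1, offset, zlib.crc32(compressed)
        offset += consumed


if __name__ == "__main__":
    toy = make_toy_pack([b"hello\n", b"world"])
    for sha1, offset, crc in index_toy_pack(toy):
        print(sha1, offset, f"{crc:08x}")
```

Each iteration depends on the `consumed` value of the previous one, which is the reason a pass like this is sequential by nature.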
The indexing phase is inherently single-threaded with little potential for improvement, whereas the resolving phase is fully multithreaded and entirely lock-free. The first phase could be improved by writing the pack file in parallel - right now it happens after reading it (the pack file is used later for lookups so that not everything has to be held in memory). However, IO doesn't appear to be the bottleneck at all.
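For readers unfamiliar with what resolving a delta involves, the following Python sketch applies a single delta in git's pack delta encoding to its base. It is an illustration only and not how gitoxide or git implement it. The function names are made up for the example, and the threading and memory management that matter for the numbers in this thread are left out entirely.

```python
def read_varint(buf: bytes, pos: int):
    """Read a little-endian base-128 size and return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Rebuild an object from its base and a delta instruction stream."""
    base_size, pos = read_varint(delta, 0)
    result_size, pos = read_varint(delta, pos)
    if base_size != len(base):
        raise ValueError("delta does not match base size")
    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:
            # Copy from base: bits 0-3 flag offset bytes, bits 4-6 size bytes.
            offset = size = 0
            for i in range(4):
                if cmd & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if cmd & (0x10 << i):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000  # a zero size encodes 64 KiB
            out += base[offset:offset + size]
        elif cmd:
            # Insert the next `cmd` literal bytes from the delta itself.
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("reserved delta opcode 0")
    if len(out) != result_size:
        raise ValueError("unexpected result size")
    return bytes(out)


if __name__ == "__main__":
    # Copy 6 bytes from offset 0 of the base, then insert b"there".
    delta = bytes([11, 11, 0x90, 6, 5]) + b"there"
    print(apply_delta(b"hello world", delta))
```

Once the result is rebuilt, it can be hashed to obtain the object's SHA1. A resolved object may in turn serve as the base for further deltas.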
Compared to gitoxide, git is considerably faster when creating the index, averaging 54MB/s of reading uncompressed bytes. gitoxide clocks in at about 50MB/s, and slows down considerably towards the end. Part of that slowdown might be attributed to this issue with resetting miniz_oxide's decompressor.
Luckily gitoxide is way faster when resolving deltas, which already gives it a good first place in the race, with some room for more if it manages to get as fast as git when decompressing and indexing objects.
The picture below shows the fastest git run I could produce, probably with everything being properly cached:
Without cache, it seems to look different:
The fastest gitoxide runs are pretty comparable in the amount of work done, as they also write out the pack and the index. The only difference is that they use the pack file directly instead of reading it from stdin; it is streamed nonetheless, and this is merely an oversight.
Memory consumption of git hovers consistently around 650MB (for the kernel pack), and is higher than the 580MB that gitoxide uses. However, gitoxide can temporarily use more memory as it keeps intermediate decompressed objects per thread, whose maximum sizes depend on the number of children and the base size. So I have seen this go up to 850MB for small fractions of time because of that.