parallelize doFileHistory() for regular files in FileHistoryCache#store() #3542
Also, the current way the directories are created (opengrok/opengrok-indexer/src/main/java/org/opengrok/indexer/history/FileHistoryCache.java, lines 537 to 555 in 610d908) is sub-optimal: it should really assemble the directories to be created into a set first and then go through the set and call mkdirs() for each item. As it is done now, isDirectory() is called more often than necessary. Of course, a more intelligent algorithm could be used: call mkdirs() on the longest paths first and drop those that are a strict prefix of another path. Alternatively, construct a tree structure representing the directory hierarchy, each node being a path component and the root node being the root directory (this would work fine on Unix systems; whether it would work on Windows in the indexer context is an open question), and once the tree is populated with all the directories to create, traverse the leaf nodes and call mkdirs() on each of them. A sketch of the set-based variant follows.
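A minimal sketch of the set-based variant, assuming the cache file paths are already known; the `createParentDirs` name and the `cacheFiles` parameter are hypothetical and not part of FileHistoryCache:

```java
import java.io.File;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CacheDirSketch {
    /**
     * Create the parent directories for a collection of cache files.
     * The parents are deduplicated in a set first, so isDirectory()/mkdirs()
     * is called at most once per distinct directory instead of once per file.
     */
    static void createParentDirs(List<File> cacheFiles) {
        Set<File> dirs = new HashSet<>();
        for (File f : cacheFiles) {
            File parent = f.getParentFile();
            if (parent != null) {
                dirs.add(parent);
            }
        }

        // Prune directories that are a strict prefix of another entry:
        // mkdirs() on the deeper path creates the shallower ones anyway.
        Set<File> leaves = new HashSet<>(dirs);
        for (File d : dirs) {
            for (File p = d.getParentFile(); p != null; p = p.getParentFile()) {
                leaves.remove(p);
            }
        }

        for (File dir : leaves) {
            if (!dir.isDirectory() && !dir.mkdirs()) {
                System.err.println("cannot create directory " + dir);
            }
        }
    }
}
```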
One observation made while working on the fix for #3243: when creating the history cache for a single repository with large history (e.g. Linux), the CPU is only lightly utilized, so this change should help boost indexer performance. Also, this change is almost necessary given that the XML serialization filtering seems to impose an additional non-trivial workload (#3585).
Playing with a proof-of-concept fix for #3243, I realized that regular files could be parallelized in the same way as renamed files, i.e. create the directories first and then use a thread pool to perform doFileHistory() for each file. This will cost more memory; the same assumption applies as for the proof-of-concept fix, namely that the history of individual files is reasonably big.
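A minimal sketch of the parallelization, assuming the cache directories have already been created up front; `storeFileHistories`, the `parallelism` parameter, and the `doFileHistory()` placeholder are illustrative and not the actual FileHistoryCache API:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ParallelHistorySketch {
    /**
     * Submit one per-file history task per regular file to a thread pool,
     * mirroring what is already done for renamed files.
     */
    static void storeFileHistories(List<String> files, int parallelism) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        for (String file : files) {
            pool.submit(() -> {
                try {
                    doFileHistory(file);
                } catch (Exception e) {
                    System.err.println("failed to store history for " + file + ": " + e);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    // Hypothetical stand-in for the per-file work: compute the history of a
    // single file and serialize it into its cache entry.
    private static void doFileHistory(String file) {
        // parse the file's history and write its cache entry
    }
}
```

A fixed-size pool bounds how many per-file histories are held in memory at the same time, which is what keeps the extra memory cost of this approach under control.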