Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix #1091] performance boost for big repos #1275

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

jung-kim
Copy link
Collaborator

@jung-kim jung-kim commented Feb 9, 2020

Few things are done here.

  • ignore 'rename' filewatch event as it can cause constant update and refresh loop
  • Prevent full gitlog history from server to client and load only what is needed
  • Prevent redundant ref and node calculations per each /gitlog api call

ignore 'rename' filewatch event as it can cause constant update and refresh loop

This one is bit tricky as some OS, git and node version behave differently and the fs.watch() doc itself does describe very inconsistent behavior.

On most platforms, 'rename' is emitted whenever a filename appears or disappears in the directory.

Here is a problem with that for some combination of OS, git and node versions:

  1. with /refs git api call, git operations would continuously delete and create some files such as .git/refs/remotes/origin/master
  2. Which triggers fs.watch() remote event
  3. Which emits git-directory-changed socket io event
  4. Which front end picks up and does many things but most importantly makes two api calls, /gitlog and /refs
    • /gitlog will seriously bog down entire performance both and back end and frontend as it is more compute intense operation to recalculate and draw
    • /refs will trigger everything back to step 1 and entire thing repeats

There is no way to test and prepare for all combinations. We should stick with latest doc and latest API.

So, adopted to latest API and ignoring rename event.

todo

  • fix tests

https://github.com/git/git page load with old code: https://recordit.co/O1lzQO62Z2

with old code

https://github.com/git/git page load with new code: https://recordit.co/bHmV8QiA0I

with new code


Below changes are abstracted out to #1277

Prevent full gitlog history from server to client and load only what is needed

Upon page load, client will make and API call to get list of git nodes. However, this call gets list of ALL nodes every time. i.e. if a pages is showing 300 nodes and user scrolls down asking for 25 more nodes, backend will return 325 nodes, not just the 25 nodes needed.

This is now fixed and backend will return only the nodes client needs.

Prevent redundant ref and node calculations per each /gitlog api call

One of the biggest performance hit for big repos was due to git refs calculations because ungit loads ALL refs, local and remote branches and tags of it's history.

This is needed to some degree to display them in the branch drop down and etc, but there were some inefficiencies on logic that thinking these refs had valid nodes and did unnecessary calculations.

@@ -71,9 +71,8 @@ exports.registerApi = (env) => {
});
}
}).then(() => {
const watcher = fs.watch(pathToWatch, options || {});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the file watcher events which I added. See #1263 (comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah.... why? API documentation does not mention error event.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The returned fs.FSWatcher does have the events: https://nodejs.org/api/fs.html#fs_class_fs_fswatcher
And it is important to listen for the error event otherwise the process would crash if the event isn't handled. This was a source of clicktest failures because we deleted the temp directory while the directory was still openend in ungit.

Copy link
Collaborator Author

@jung-kim jung-kim Feb 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I find this api design to be confusing... multiple places to register events and when you register change event listener, it would multiple sub event types? This is bit odd especially when fswatcher{} is not shared anywhere else.

It would be much more sensible if listener argument of fs.watch() has "error" events rather than expose it via entirely separate function

Yeah I will fix it up later

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@ylecuyer
Copy link
Contributor

ylecuyer commented Feb 9, 2020

Nice opportunities to speed up ungit for both repos 👍

Though could you split the PR in different branches (PRs) so that they can be tested individually?

@jung-kim
Copy link
Collaborator Author

Yeah you are right, let me break this one out to two:

one with ignore 'rename' filewatch event as it can cause constant update and refresh loop and Prevent redundant ref and node calculations per each /gitlog api call.

But inevitably Prevent full gitlog history from server to client and load only what is needed will be disproportionally as it impacts many places.

this.graph = graph;
this.sha1 = sha1;
this.isInited = false;
this.title = ko.observable();
this.parents = ko.observableArray();
this.commitTime = undefined; // commit time in string
this.date = undefined; // commit time in numeric format for sort
this.timestamp = undefined; // commit time in numeric format for sort
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the "numeric format"?
Is that ISO or yyyyMMddhhmm?

@ylecuyer
Copy link
Contributor

I'm running the branch before the split for some days now and I often had this error :

Screenshot_2020-02-12_10-42-12

Am I the only one with this glitch ?

I should try one branch and the other now that the PR has been split

@jung-kim
Copy link
Collaborator Author

This PR definitely changes and challenges how things are being calculated.

Definitely needs lots testing and incubations. Will keep my eyes out for these kind of problems

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants