Decrease input-lag for large queries #158

Merged: 41 commits from new_entrypoint into main, Jan 30, 2024
Conversation

@OskarDamkjaer (Collaborator) commented Dec 21, 2023

Overview

This ended up being quite a bit larger than I had anticipated/hoped for, but the main changes are:

  1. Use the workerpool package to create a pool of semantic analysis workers in both codemirror and the server itself. It cancels outdated, still-unfinished semantic analysis runs when new ones are triggered. I also debounced it for the language server so it only fires once the user has stopped typing (I chose 600ms somewhat arbitrarily; codemirror has a built-in debounce of 750ms for comparison). A sketch of the pattern follows this list.
  2. Detect slow parses in codemirror and switch to prismjs for tokenization. Prism is a syntax highlighting library that has cypher support and can be used as a crude parser via its tokenize method. In my testing it is roughly 20x faster than our parser.
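A minimal sketch of the point-1 pattern, for reference (not the PR's exact code; `lint-worker.js` and `validateSemantics` are the names visible in the diffs further down, the typing is an approximation):

```typescript
import workerpool from 'workerpool';
import debounce from 'lodash.debounce';

// workerpool tasks are thenable and expose .cancel(); typed structurally here
// so the sketch stays self-contained.
type LintTask = PromiseLike<unknown> & { cancel(): void };

interface LintWorker {
  validateSemantics(query: string): LintTask;
}

const pool = workerpool.pool('./lint-worker.js', { minWorkers: 2 });

let lastSemanticJob: LintTask | undefined;

async function runSemanticAnalysis(query: string) {
  // A newer keystroke makes the previous run irrelevant, so cancel it first;
  // workerpool then terminates and respawns that worker.
  lastSemanticJob?.cancel();

  const proxyWorker = (await pool.proxy()) as unknown as LintWorker;
  lastSemanticJob = proxyWorker.validateSemantics(query);
  return lastSemanticJob;
}

// Only fire once the user has stopped typing: 600ms in the language server
// (codemirror has its own built-in ~750ms debounce).
export const debouncedRunSemanticAnalysis = debounce(runSemanticAnalysis, 600);
```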

A few future tasks:
- [ ] Figure out a smarter way to bundle (see note on it below)
- [ ] Reduce duplication of the worker code
- [x] Improve react perf by debouncing the onChange there as well

Outdated description below

Since the semantic analysis takes so much longer than the normal errors, I split it into its own call.

I started by trying to get the same runSemanticAnalysis webworker implementation working everywhere with the web-worker node package, but it turned out to be really tricky to get it working in all combinations of environments (jest/web/plain node) and targets (esm/cjs). I ended up leaving the function more or less as it was; instead I use an ESM-only webworker in the react-codemirror package and Node's built-in worker_threads to get it working in the server!
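For reference, a rough sketch of the worker_threads variant mentioned here, under the assumption that the worker file posts its result back as a single message (the merged PR replaces this with workerpool; the file name is made up for illustration):

```typescript
// Earlier approach (superseded by workerpool): run the semantic analysis on a
// Node worker thread so the language server's main thread stays responsive.
import { Worker } from 'node:worker_threads';

export function runSemanticAnalysisInWorker(query: string): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./semantic-analysis-worker.js', {
      workerData: { query },
    });
    worker.once('message', resolve); // the worker posts back its analysis result
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```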

Future improvement - smarter debouncing

When the document is very big, it's easy to queue up several semantic analysis calls in the webworker (even when debounced) since they can take so long. I think it would make sense to turn the calls into a queue and, when one call finishes, discard all but the latest. I carded this here: https://trello.com/c/EvujJSED/219-dont-start-new-semantic-analysis-while-the-old-one-is-still-running-debounce-while-running
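A sketch of that idea (illustrative only, this is not what the PR implements):

```typescript
// While one analysis is running, remember only the most recent query and run it
// once the current call finishes, so everything queued up in between is discarded.
type Analyze = (query: string) => Promise<unknown>;

export function keepOnlyLatest(analyze: Analyze): (query: string) => void {
  let running = false;
  let pendingQuery: string | undefined;

  const drain = async (): Promise<void> => {
    while (pendingQuery !== undefined) {
      const query = pendingQuery;
      pendingQuery = undefined;
      await analyze(query); // calls made while this runs overwrite pendingQuery
    }
    running = false;
  };

  return (query) => {
    pendingQuery = query; // older, not-yet-started queries are dropped
    if (!running) {
      running = true;
      void drain();
    }
  };
}
```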

Bundling concerns

We should consider how we can split the bundle like this:
(screenshot of the bundle breakdown: CleanShot 2023-12-21 at 12 07 54)

Right now only the worker loads the semantic analysis, but it still ends up in the main bundle. My attempt at splitting it out failed because it was tricky to configure the worker to do the dynamic import. The other option is to not bundle at all, in which case the client application's own dead code elimination makes sure it doesn't end up in the final bundle - I think this would be easier. I've added a card: worker-import-meta-url
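For context, the worker-import-meta-url card refers to a pattern along these lines: bundlers such as Vite and webpack 5 recognise `new Worker(new URL('./worker.js', import.meta.url))` and split the worker, plus everything only it imports, into its own chunk. A hypothetical sketch (the message shape is made up; the PR itself talks to the worker via workerpool):

```typescript
// With this shape, the worker file (and everything only it imports, e.g. the
// semantic analysis) can be emitted as a separate chunk instead of the main bundle.
const lintWorker = new Worker(
  new URL('./lint-worker.js', import.meta.url),
  { type: 'module' },
);

lintWorker.postMessage({ type: 'validateSemantics', query: 'MATCH (n) RETURN n' });
lintWorker.onmessage = (event) => {
  console.log('semantic analysis result:', event.data);
};
```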

.vscode/settings.json (review thread: outdated, resolved)
@OskarDamkjaer OskarDamkjaer requested a review from ncordon January 4, 2024 16:47
@OskarDamkjaer OskarDamkjaer changed the title Move semantic analysis work to a worker Decrease input-lag for large queries Jan 10, 2024
@OskarDamkjaer OskarDamkjaer marked this pull request as draft January 10, 2024 15:39
}

const proxyWorker = (await pool.proxy()) as unknown as LintWorker;
lastSemanticJob = proxyWorker.validateSemantics(query);
Collaborator Author:
Could pass positions here to avoid doing the findposition in the main thread.

@OskarDamkjaer OskarDamkjaer marked this pull request as ready for review January 15, 2024 10:02
"sideEffects": false,
"scripts": {
"build": "concurrently 'npm:build-types' 'npm:build-esm' 'npm:build-commonjs'",
"build": "echo done",
Collaborator Author:
whoops

Collaborator:
Is this correct?

Collaborator Author:
No, this I need to correct. I only got workerpool working in codemirror by not bundling, but we still need to build for non-TypeScript consumers.

Comment on lines 93 to 98
page.on('worker', (worker) => {
console.log('Worker created: ' + worker.url());
worker.on('close', (worker) =>
console.log('Worker destroyed: ' + worker.url()),
);
});
Collaborator:
Why do we need this?

Collaborator Author:
We don't, I just used it to debug why the e2e tests were failing

Comment on lines +84 to +86
if (document.length === 0) {
this.config.setUseLightVersion?.(false);
}
Collaborator:
Are we assuming someone will always wipe out the whole editor to revert back to the antlr parsing?

Collaborator Author:
Yes. I originally considered a parse-time threshold (if prism parses under X ms, it's probably fine to stay with antlr), but that risks some really odd behaviour for outliers. Some queries might be fast with antlr and slow with prism for unexpected reasons, and we risk rapid flipping between parsers if the thresholds are poorly set.

Flipping back only when the document is empty is simple to understand, and in the worst-case scenario (paste a large query -> clear everything -> paste a large query again) we still avoid the main issue (delay between inputs), since we switch back to prism after the first parse.
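For reference, a sketch of the prism tokenization path described in the PR description (illustrative; the actual adapter that maps these tokens to codemirror highlighting is not shown, and the token typing is an approximation):

```typescript
import Prism from 'prismjs';
import 'prismjs/components/prism-cypher';

// Minimal structural view of Prism's token stream (Token.content can be nested).
type PrismToken = { type: string; content: PrismStream };
type PrismStream = string | PrismToken | PrismStream[];

// Flatten nested token content back into plain text.
function tokenText(content: PrismStream): string {
  if (typeof content === 'string') return content;
  if (Array.isArray(content)) return content.map(tokenText).join('');
  return tokenText(content.content);
}

// Tokenize a query with prism's cypher grammar instead of the antlr parser.
export function tokenizeWithPrism(query: string): Array<{ type: string; text: string }> {
  const tokens = Prism.tokenize(query, Prism.languages.cypher) as unknown as Array<
    string | PrismToken
  >;
  return tokens.map((token) =>
    typeof token === 'string'
      ? { type: 'plain', text: token }
      : { type: token.type, text: tokenText(token.content) },
  );
}
```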

import { LinterTask, LintWorker } from './lint-worker';

const pool = workerpool.pool(join(__dirname, 'lint-worker.js'), {
minWorkers: 2,
Collaborator:
Why do we need 2 workers? I think you explained this to me already.

I tried including a console.log of pool.stats() at the beginning of the linting here and at the beginning of the semantic linting, and I see at most 1 busy worker and 1 idle one:

{totalWorkers: 2, busyWorkers: 1, idleWorkers: 1, pendingTasks: 0, activeTasks: 1}

Collaborator Author:
The intention is to always have an idle worker available, so that we avoid the startup time of spawning a new thread. My reasoning is that when one thread is busy we can use the idle one without startup cost, and the old busy thread will be terminated and respawned so it's ready for the next call.

I've not measured the startup time / performance gains though.
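A small sketch of how the stats check above can be reproduced (same `lint-worker.js` / `validateSemantics` names as in the diffs; otherwise illustrative):

```typescript
import workerpool from 'workerpool';

const pool = workerpool.pool('./lint-worker.js', { minWorkers: 2 });

export function lintWithStats(query: string) {
  // With minWorkers: 2 there should be a second, idle worker available even while
  // a long semantic analysis keeps the other one busy, e.g.
  // { totalWorkers: 2, busyWorkers: 1, idleWorkers: 1, pendingTasks: 0, activeTasks: 1 }
  console.log(pool.stats());
  return pool.exec('validateSemantics', [query]);
}
```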

@@ -47,6 +49,7 @@
"watch": "tsc -b -w"
},
"devDependencies": {
"@types/lodash.debounce": "^4.0.9",
Collaborator:
Are the types needed here if we have the whole package in the dependencies?

Collaborator Author:
The lodash project doesn't ship any types; they are provided by the @types packages (from the DefinitelyTyped project), which add types to untyped libraries - that's why it's imported separately. It's in devDependencies because the types are only needed at build time.

@@ -29,7 +31,27 @@ export function doSemanticAnalysis(query: string): SemanticAnalysisResult {
const errors: SemanticAnalysisElement[] = semanticErrorsResult.$errors.data;
Collaborator:
I don't think the things coming back from the transpiled semantic analysis include the severity? So they should not be a SemanticAnalysisElement[]

Collaborator Author:
Ah right, thanks! Updated now 👍

},
"devDependencies": {
"@neo4j-ndl/base": "^1.10.1",
"@playwright/experimental-ct-react": "^1.39.0",
"@playwright/test": "^1.36.2",
"@types/lodash.debounce": "^4.0.9",
Collaborator:
Same comment, do we need the types package here if it's in the compile dependencies?

Comment on lines +193 to +196
private debouncedOnChange = this.props.onChange
? debounce(this.props.onChange, 200)
: undefined;

Collaborator:
I think the description of the PR is outdated, judging by this?

Collaborator Author:
Ah, yes - my bad

@OskarDamkjaer OskarDamkjaer merged commit 1e210cb into main Jan 30, 2024
4 checks passed
@OskarDamkjaer OskarDamkjaer deleted the new_entrypoint branch January 30, 2024 09:45
ncordon pushed a commit that referenced this pull request Feb 5, 2024
* proof of concept

* merge main

* works again

* fix overuse of main channel

* mellan

* tests work

* fix worker thread for node as well

* self review

* self review

* self review

* fix build

* tests

* fix e2e test

* merge semantic analsysis and syntax errors again

* review comments

* self review

* fix tests

* worker pool-ish for vscode

* smarter 'pool' mgmt

* kindaworks

* works

* cleanup new parser adapter

* restore pkg json

* workerpool

* worker pool for client as well

* cleanup workers

* fix errors

* fix tests

* fix todos

* fix build

* test-are-green

* works for vite

* fixlint

* self review

* re-add proper build

* fix input lag even when parse is fast

* add changeset

* fix e2e test

* pr comments