Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moves syntax errors to the semantic analysis worker #317

Merged
merged 8 commits into from
Jan 9, 2025

Conversation

ncordon
Copy link
Collaborator

@ncordon ncordon commented Dec 23, 2024

What

Up until now we were computing all of the syntax errors as a listener on parsing, but we want to move to a pattern of supporting both syntax and semantic errors in a separate thread (web worker).

Why

Because we want to be able to support both Cypher 5 and Cypher 25, plus minor versions for Cypher 25 and this gives us flexibility to just reload the web worker for the appropriate version, while auto-completing with the latest Cypher 25 grammar.

The semantic analysis packages both Cypher 5 and Cypher 25 knowledge. This would mean some auto-completions could potentially be too modern, but at least we would mark the errors appropriately to the user (right now for latest Cypher 5 or LTS and latest Cypher 25, in the future even for minor Cypher 25 versions if we need to).

Copy link

changeset-bot bot commented Dec 23, 2024

🦋 Changeset detected

Latest commit: 94d289a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 5 packages
Name Type
@neo4j-cypher/language-support Patch
@neo4j-cypher/language-server Patch
@neo4j-cypher/react-codemirror-playground Patch
@neo4j-cypher/react-codemirror Patch
@neo4j-cypher/schema-poller Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Comment on lines +188 to +190
collectedCommand.type !== 'cypher' && !isEmptyStatement
? errorListener.errors
: [];
Copy link
Collaborator Author

@ncordon ncordon Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only surface the syntax errors that are coming from console commands, let the web worker compute the errors for Cypher statements

Comment on lines -130 to -137
// options length is 0 should only happen when RULE_consoleCommand is hit and there are no other options
if (
ruleCandidates.find(
(ruleNumber) => ruleNumber === CypherParser.RULE_consoleCommand,
)
) {
return 'Console commands are unsupported in this environment.';
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this anymore

@ncordon ncordon force-pushed the errors-semantic-analysis-worker branch from f1700c6 to 09cb9b6 Compare January 6, 2025 15:03
Comment on lines +14 to +16
if (featureFlags.consoleCommands !== undefined) {
_internalFeatureFlags.consoleCommands = featureFlags.consoleCommands;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the internal feature flags (which are a global variable) only affected things that were run in the same thread than the editor (namely the syntax errors). Now we are moving the syntax errors down to the semantic analysis worker, so this change requires us to set that global variable there as well.

There's been talks for reshaping the way we package the language support into a class, but until then, this is the way forward.

@@ -46,7 +37,6 @@ export function cypher(config: CypherConfig) {
optionClass: completionStyles,
}),
cypherLinter(config),
semanticAnalysisLinter(config),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need a single linter because all of the errors live in the semantic analysis worker now.

@ncordon ncordon changed the title Moves errors to the semantic analysis worker Moves syntax errors to the semantic analysis worker Jan 6, 2025
Comment on lines +689 to +692
const query = `RETURN {\`something
foo

bar: "hello"}`;
Copy link
Collaborator Author

@ncordon ncordon Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error multiline unfinished backticked constructs has worsened and it would require some work in the mono-repo to fix it

Comment on lines +782 to +784
// TODO FIX ME
// Problem here is we were getting a better error before
test.fails('Syntax validation errors on an expected procedure name', () => {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error for this case has also worsened, so this would have to be addressed from the mono repo

Comment on lines 851 to 858
message: `Invalid input '"foo"': expected a graph pattern, a parameter, ')', ':', 'IS', 'WHERE' or '{'`,
offsets: {
end: 14,
end: 15,
start: 9,
},
range: {
end: {
character: 14,
character: 15,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the way we are computing the positions finish in the mono repo, some errors are spanning new characters now.

That is, when we receive a syntax error, we just know where the token starts, but not where it finishes. And the only approach we have in that case is to consider the token finishes at the next non space position.

@@ -60,7 +60,7 @@ export class CypherEditorPage {
}

async checkNoNotificationMessage(type: 'error' | 'warning') {
await this.page.waitForTimeout(1000);
await this.page.waitForTimeout(3000);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was giving flakiness

@@ -100,7 +100,7 @@ test('benchmarking & performance test session', async ({

await editorPage.checkErrorMessage(
'RETRN',
'Unexpected token. Did you mean RETURN?',
`Invalid input 'RETRN': expected a graph pattern, 'FOREACH', ',', 'ORDER BY', 'CALL', 'CREATE', 'LOAD CSV', 'DELETE', 'DETACH', 'FINISH', 'INSERT', 'LIMIT', 'MATCH', 'MERGE', 'NODETACH', 'OFFSET', 'OPTIONAL', 'REMOVE', 'RETURN', 'SET', 'SKIP', 'UNION', 'UNWIND', 'USE', 'USING', 'WHERE', 'WITH' or <EOF>`,
Copy link
Collaborator Author

@ncordon ncordon Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some errors like this one have significantly worsened, I kind of liked the old one better but it's not easy to fix this without resorting to parsing the error, getting the tokens, etc, which doesn't seem very robust.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, like we mentioned in the call yesterday, it would probably be nicest if we moved some of the syntax validation logic that caused the better messages to happen before to now be executed in the semantic analysis

Comment on lines +193 to +196
await expect(
editorPage.page.locator('.cm-deprecated-element').last(),
).toBeVisible({ timeout: 3000 });
await editorPage.checkWarningMessage('id', 'Function id is deprecated.');
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting changes 🤷

Comment on lines +207 to 214
await expect(
editorPage.page.locator('.cm-deprecated-element').last(),
).toBeVisible({ timeout: 3000 });

await editorPage.checkWarningMessage(
'apoc.create.uuids',
"Procedure apoc.create.uuids is deprecated.",
'Procedure apoc.create.uuids is deprecated.',
);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting changes 🤷 . Probably because this file was modified and it was reformatted

Copy link
Collaborator

@anderson4j anderson4j left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some spots where I got confused but didnt find any errors. Here might be a good place to confirm I got things correctly. We now get syntax errors from the transpiled semanticAnalysis.js right? (So change is - remove our own syntax error creation + hook in syntax errors from semanticAnalysis)

let token: Token | undefined = undefined;

const start = Position.create(
e.position.line - 1 + cmd.start.line - 1,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New method looks at e.range.start instead of e.position, but otherwise looks similar. Adjust here is (cmd.start.line - 2) though, while there it is only (cmd.start.line -1). Is this some intended change from switching to semanticDiagnostic over semanticElement?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried switching to ...line - 2 here too, and got line = -1 in some tests, so I guess intended. Still a bit confused why we needed -1x2 before and -1 now

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because lines and columns start at 1 for antlr (and I think for the semantic errors in the database), so we were returning those. I adjusted the ones in the transpiled semantic anallysis so we don't have to worry about those from our side.


const start = Position.create(
e.position.line - 1 + cmd.start.line - 1,
e.position.column - 1 + (e.position.line === 1 ? cmd.start.column : 0),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar thing here, we check if e.position.line === 1 here, while the new checks range.start.line === 0

const startOffset = e.position.offset + cmd.start.start;
const toExplore: ParseTree[] = [parseResult.ctx];

while (toExplore.length > 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing this to the new, it seems like we're now getting the end position from the semantic analysis, and thus don't need to search the parseTree[] for it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I've moved that down to the semantic analysis, because we have a proper parsing there (cypher 5 and cypher 25), whereas in the Javascript side we'll only maintain the Cypher 25 parser

@@ -100,7 +100,7 @@ test('benchmarking & performance test session', async ({

await editorPage.checkErrorMessage(
'RETRN',
'Unexpected token. Did you mean RETURN?',
`Invalid input 'RETRN': expected a graph pattern, 'FOREACH', ',', 'ORDER BY', 'CALL', 'CREATE', 'LOAD CSV', 'DELETE', 'DETACH', 'FINISH', 'INSERT', 'LIMIT', 'MATCH', 'MERGE', 'NODETACH', 'OFFSET', 'OPTIONAL', 'REMOVE', 'RETURN', 'SET', 'SKIP', 'UNION', 'UNWIND', 'USE', 'USING', 'WHERE', 'WITH' or <EOF>`,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, like we mentioned in the call yesterday, it would probably be nicest if we moved some of the syntax validation logic that caused the better messages to happen before to now be executed in the semantic analysis

offsets: {
end: 114,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we mentioned these changes in offsets yesterday, but I can't quite remember - why do they happen again?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've managed to fix a few of these. They were happening because syntax errors in the database just carry the start position.

Adding the end position for all of those is a really big task, so I've managed to rescue the tokens we lexed and finding the token that corresponds to this position. It works better for most of the cases I'd say.

Comment on lines -135 to -136
test('Misspelt keyword in the middle of the statement', () => {
const query = "MATCH (n:Person) WERE n.name = 'foo'";
Copy link
Collaborator Author

@ncordon ncordon Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was repeated

@ncordon ncordon merged commit 043d766 into main Jan 9, 2025
4 checks passed
@ncordon ncordon deleted the errors-semantic-analysis-worker branch January 9, 2025 10:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants