feat: better structured headings #134

Merged
merged 2 commits into from
Jun 4, 2024
56 changes: 35 additions & 21 deletions grammar.js
@@ -4,7 +4,9 @@
// - Rule Order: Tree-sitter will prefer the token that appears earlier in the
// grammar.
//
// https://tree-sitter.github.io/tree-sitter/creating-parsers
// https://github.com/nvim-treesitter/nvim-treesitter/wiki/Parser-Development
// - Visibility: Prefer JS regex (/\n/) over literals ('\n') unless it should be
// exposed to queries as an anonymous node.
// - Rules starting with underscore are hidden in the syntax tree.

/// <reference types="tree-sitter-cli/dsl" />
@@ -16,6 +18,11 @@ const _li_token = /[-•][ ]+/;
module.exports = grammar({
name: 'vimdoc',

conflicts: $ => [
[$._line_noli, $._column_heading],
[$._column_heading],
],

extras: () => [/[\t ]/],

// inline: ($) => [
@@ -135,14 +142,14 @@ module.exports = grammar({
'>',
choice(
alias(token.immediate(/[a-z0-9]+\n/), $.language),
token.immediate('\n')),
token.immediate(/\n/)),
Member Author

JS regexes are for parsing purposes; only use literals if you want to expose them to queries as anonymous nodes.

Member @justinmk Jun 3, 2024

regex has lower priority than literals (oh but that's "only for terminals")

Member Author

It's not about priority; it's what gets exposed to queries. "Hiding" stuff from queries is the primary way of keeping parser size down (and performance up).

Member

"Hiding" stuff from queries is the primary way of keeping parser size down (and performance up).

Nice. Would be useful to add that tip here:

// - Match Specificity: Tree-sitter will prefer a token that is specified in
// the grammar as a String instead of a RegExp.
// - Rule Order: Tree-sitter will prefer the token that appears earlier in the
// grammar.

Member Author @clason Jun 3, 2024

regex has lower priority than literals (oh but that's "only for terminals")

Oh, I see what you mean here. But here it's fine since the token.immediate takes care of it. (I tested it.)

alias(repeat1(alias($.line_code, $.line)), $.code),
// Codeblock ends if a line starts with non-whitespace.
// Terminating "<" is consumed in other rules.
)),

// Lines.
_blank: () => field('blank', '\n'),
_blank: () => field('blank', /\n/),
line: ($) => choice(
$.column_heading,
$.h1,
@@ -156,18 +163,18 @@ module.exports = grammar({
optional(token.immediate('<')), // Treat codeblock-terminating "<" as whitespace.
_li_token,
choice(
alias(seq(repeat1($._atom), '\n'), $.line),
alias(seq(repeat1($._atom), /\n/), $.line),
seq(alias(repeat1($._atom), $.line), $.codeblock),
),
repeat(alias($._line_noli, $.line)),
)),
// Codeblock lines: must be indented by at least 1 space/tab.
// Line content (incl. whitespace) is captured as a single atom.
line_code: () => choice('\n', /[\t ]+[^\n]+\n/),
line_code: () => choice(/\n/, /[\t ]+[^\n]+\n/),
_line_noli: ($) => seq(
choice($._atom_noli, $._uppercase_words),
repeat($._atom),
choice($.codeblock, '\n')
choice($.codeblock, /\n/)
),

// Modeline: must start with "vim:" (optionally preceded by whitespace)
@@ -177,31 +184,38 @@ module.exports = grammar({
// Intended for table column names per `:help help-writing`.
// TODO: children should be $.word (plaintext), not $.atom.
column_heading: ($) => seq(
field('name', seq(choice($._atom_noli, $._uppercase_words), repeat($._atom))),
'~',
token.immediate('\n'),
alias($._column_heading, $.heading),
alias('~', $.delimiter),
token.immediate(/\n/),
),
// aliasing a seq exposes every item separately: create hidden rule and alias that
_column_heading: $ => prec.dynamic(1, seq(
choice($._atom_noli, $._uppercase_words),
repeat($._atom)
)),

h1: ($) =>
seq(
token.immediate(field('delimiter', /============+[\t ]*\n/)),
repeat1($._atom),
'\n',
),
prec(1, seq(
Member @justinmk Jun 1, 2024

I believe TS considers declaration order as part of precedence. So possibly we could avoid prec(1,...) by declaring these headings before block. But doesn't need to block this for now.

Member Author

I could make it tighter, but some precedence is needed to resolve the conflict between heading and possible taglinks.

Contributor

I believe TS considers declaration order as part of precedence.

Only for terminals, e.g. string literals and regex patterns; and strings have higher priority than regex patterns by default.
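The terminal tie-breaking described here (and in the "Match Specificity" and "Rule Order" notes at the top of grammar.js) can be illustrated with a toy model. This is a sketch, not tree-sitter's lexer: pickToken and beats are hypothetical helpers, and the model covers only three of the criteria tree-sitter documents for conflicting tokens (longer match first, then a string over a RegExp of the same length, then the token declared earlier). It deliberately ignores context-aware lexing, explicit lexical precedence, and token.immediate, which is what the author says makes the /\n/ vs '\n' swap safe in this grammar.

```javascript
// Toy model only: NOT tree-sitter's real lexer. It reproduces three documented
// tie-breakers for conflicting tokens and ignores parse-state context,
// explicit lexical precedence, and token.immediate.

/**
 * Pick the token that wins at `pos`.
 * @param {{name: string, pattern: string | RegExp}[]} tokens - in declaration order
 * @param {string} input
 * @param {number} pos
 * @returns {{name: string, text: string} | null}
 */
function pickToken(tokens, input, pos) {
  let best = null;
  tokens.forEach((tok, order) => {
    const isString = typeof tok.pattern === 'string';
    let text = null;
    if (isString) {
      if (input.startsWith(tok.pattern, pos)) text = tok.pattern;
    } else {
      // Re-create the RegExp with the sticky flag so it only matches at `pos`.
      const flags = tok.pattern.flags.replace(/[gy]/g, '') + 'y';
      const re = new RegExp(tok.pattern.source, flags);
      re.lastIndex = pos;
      const m = re.exec(input);
      if (m) text = m[0];
    }
    if (!text) return; // no match, or an empty match
    const candidate = { name: tok.name, text, isString, order };
    if (best === null || beats(candidate, best)) best = candidate;
  });
  return best && { name: best.name, text: best.text };
}

// Hypothetical comparator encoding the three tie-breakers, in order.
function beats(a, b) {
  if (a.text.length !== b.text.length) return a.text.length > b.text.length; // 1. match length
  if (a.isString !== b.isString) return a.isString; // 2. string beats RegExp
  return a.order < b.order; // 3. earlier declaration wins
}

// Same length: the string literal wins even though it is declared second.
const nl = pickToken(
  [{ name: 'nl_regex', pattern: /\n/ }, { name: 'nl_string', pattern: '\n' }],
  'abc\n',
  3,
);
console.log(nl.name); // nl_string
```

In the real grammar these rules only apply among tokens that are valid in the current parse state, so a toy like this overstates how often the string-vs-regex preference actually matters.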

alias(token.immediate(/============+[\t ]*\n/), $.delimiter),
alias(repeat1($._atom), $.heading),
optional(seq($.tag, repeat($._atom))),
Comment on lines +200 to +201
Member @justinmk Jun 1, 2024

Everything before the first tag is heading (edit: oh, I see this is consistent with h3)? Should we name it text and use that as the pseudo-convention for exposing the "text content" of complex captures? I was experimenting with this but only used it for "fields" (see e.g. url).

Member Author

Depends. If we use a common node (not name!) for text nodes (that aggregate words?), then we need a field again to distinguish them.

Member Author

And yes, I tried to be consistent across headings (even column headings, which as usual is a massive headache).

/\n/,
)),

h2: ($) =>
seq(
token.immediate(field('delimiter', /------------+[\t ]*\n/)),
repeat1($._atom),
'\n',
),
prec(1, seq(
alias(token.immediate(/------------+[\t ]*\n/), $.delimiter),
alias(repeat1($._atom), $.heading),
optional(seq($.tag, repeat($._atom))),
/\n/,
)),

// Heading 3: UPPERCASE NAME, followed by optional *tags*.
h3: ($) =>
seq(
field('name', $.uppercase_name),
alias($.uppercase_name, $.heading),
optional(seq($.tag, repeat($._atom))),
'\n',
/\n/,
),

tag: ($) => _word($,
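To make the h1/h2 delimiter tokens concrete, the sketch below runs the two delimiter regexes from the diff against sample lines outside of tree-sitter. delimiterKind is a hypothetical helper, and the leading ^ anchor is an assumption of the sketch (in the grammar the patterns are wrapped in token.immediate and aliased to $.delimiter, and the delimiter is the first token of its rule). It shows that the newline belongs to the delimiter token and that only spaces or tabs may trail the run.

```javascript
// Delimiter patterns copied from the h1 and h2 rules in this PR. The leading `^`
// is an addition for this sketch: the delimiter is the first token of its rule,
// so a line-start anchor approximates where it is matched.
const H1_DELIMITER = /^============+[\t ]*\n/;
const H2_DELIMITER = /^------------+[\t ]*\n/;

// Hypothetical helper: classify a single line (including its trailing "\n").
function delimiterKind(line) {
  if (H1_DELIMITER.test(line)) return 'h1';
  if (H2_DELIMITER.test(line)) return 'h2';
  return null;
}

console.log(delimiterKind('='.repeat(78) + '\n')); // h1
console.log(delimiterKind('-'.repeat(78) + ' \t\n')); // h2 (trailing blanks are allowed)
console.log(delimiterKind('='.repeat(78) + ' x\n')); // null (nothing else may follow)
console.log(delimiterKind('===\n')); // null (the run is too short)
```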
16 changes: 11 additions & 5 deletions queries/vimdoc/highlights.scm
@@ -1,13 +1,19 @@
(h1) @markup.heading.1
(h1
(delimiter) @markup.heading.1
(heading) @markup.heading.1)

(h2) @markup.heading.2
(h2
(delimiter) @markup.heading.2
(heading) @markup.heading.2)

(h3) @markup.heading.3
(h3
(heading) @markup.heading.3)

(column_heading) @markup.heading.4
(column_heading
(heading) @markup.heading.4)

(column_heading
"~" @markup.heading.4.marker
(delimiter) @markup.heading.4.marker
(#set! conceal ""))

(tag