// Copyright (C) 2023 Bryan A. Jones.
//
// This file is part of the CodeChat Editor. The CodeChat Editor is free
// software: you can redistribute it and/or modify it under the terms of the GNU
// General Public License as published by the Free Software Foundation, either
// version 3 of the License, or (at your option) any later version.
//
// The CodeChat Editor is distributed in the hope that it will be useful, but
// WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
// details.
//
// You should have received a copy of the GNU General Public License along with
// the CodeChat Editor. If not, see
// [http://www.gnu.org/licenses](http://www.gnu.org/licenses).
//
// # `c.pest` - Pest parser definition for the C language
//
// ## Comments
doc_block = _{ inline_comment | block_comment }
// Per the
// [C standard, section 6.4.3](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf#page=65),
// "white-space consists of: (space, horizontal tab, new-line, vertical tab, and
// form-feed)." This rule omits newlines, since the rest of this parser uses
// newlines to delimit lines.
vertical_tab = { "\x0B" }
form_feed = { "\x0C" }
white_space = { (" " | "\t" | vertical_tab | form_feed)* }
// The
// [C standard, section 6.4.9](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf#page=65)
// defines inline and block comments.
//
// ### Inline comments
inline_comment_delims = _{ inline_comment_delim_0 }
inline_comment_delim_0 = { "//" }
inline_comment_delim_1 = { unused }
inline_comment_delim_2 = { unused }
inline_comment_char = _{ not_newline }
// ### Block comments
block_comment = { block_comment_0 }
block_comment_opening_delim_0 = { "/*" }
block_comment_opening_delim_1 = { unused }
block_comment_opening_delim_2 = { unused }
block_comment_closing_delim_0 = { "*/" }
block_comment_closing_delim_1 = { unused }
block_comment_closing_delim_2 = { unused }
// ## Code
//
// Per the
// [C standard, section 5.1.1.2](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf#page=24),
// if a line of code ends with a backslash, it continues on the next line. This
// is a logical line; treat it as a single line. Therefore, consider a
// backslash-newline (or anything that's not a newline) a part of the current
// logical line. Note that this parser doesn't apply this rule to comments
// (which, per the spec, it should) for several reasons:
//
// 1. A comment continued onto another line doesn't look like a comment; this
// would confuse most developers.
// 2. The backslash-newline in a comment creates a
// [hard line break](https://spec.commonmark.org/0.31.2/#hard-line-breaks)
// in Markdown, which means inserting a hard line break this way in an
// inline comment requires the next line to omit the inline comment
// delimiters. For example:
//
// ```C
// // This is a hard line break\
// followed by a comment which must not include the // inline comment
// // delimiter on the line after the line break, but which must
// // include them on following lines.
// ```
// 3. The CodeChat Editor web-to-code function produces incorrect results in
// this case, adding a comment delimiter when it shouldn't. To fix this, it
// would have to look for a backslash newline only in C/C++-like languages.
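//
// To illustrate a logical line in code, here is a minimal sketch (the macro
// name `SUM_OF_SQUARES` is hypothetical, chosen only for this example). The
// backslash-newline splices the two physical lines of the macro definition
// into one logical line, which is why `logical_line_char` accepts a
// backslash-newline as part of the current line:
//
// ```C
// #include <stdio.h>
//
// /* One logical line, written as two physical lines. */
// #define SUM_OF_SQUARES(a, b) \
//     ((a) * (a) + (b) * (b))
//
// int main(void) {
//     printf("%d\n", SUM_OF_SQUARES(3, 4));
//     return 0;
// }
// ```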
logical_line_char = _{ ("\\" ~ NEWLINE) | not_newline }
code_line_token = _{ logical_line_char }
// ## Dedenter
//
// This parser runs separately; it dedents block comments. There are several
// cases:
//
// - A single line: `/* comment */`. No special handling needed.
// - Multiple lines, in two styles.
// - Each line of the comment is not consistently whitespace-indented. No
// special handling needed. For example:
//
// ```C
// /* This is
//   not
//      consistently indented. */
// ```
//
// - Each line of the comment is consistently whitespace-indented; for
// example:
//
// ```C
// /* This is
//    consistently indented. */
// ```
//
// Consistently indented means the first non-whitespace character on a line
// aligns with, but never comes before, the comment's start. Another
// example:
//
// ```C
// /* This is
//    correct
//
//    indentation.
//  */
// ```
//
// Note that the third (blank) line doesn't have an indent; since that line
// consists only of whitespace, this is OK. Likewise, the last line
// (containing the closing comment delimiter of `*/`) consists only of
// whitespace after the comment delimiters are removed.
//
// - Each line of the comment is consistently asterisk-indented; for example:
//
// ```C
// /* This is
//  * correct
//  *
//  * indentation.
//  */
// ```
//
// Note that in this case, no whitespace-only lines are allowed. Instead,
// the special case is lines which have a newline immediately after the `*`.
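//
// The cases above show comments that begin in the first column. As a sketch of
// the same two styles when the comment itself is indented (assuming, as the
// `indent` rule suggests, that the leading whitespace before the opening
// delimiter is added to every line's expected indent; the function `f` is
// hypothetical):
//
// ```C
// void f(void) {
//     /* This whitespace-indented comment begins after four spaces, so each
//        later line begins with those four spaces plus three more.
//      */
//
//     /* This asterisk-indented comment begins after four spaces, so each
//      * later line begins with those four spaces followed by " * ".
//      */
// }
// ```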
//
// To implement this dedenting, the parser provides two paths for accepting the
// contents of a block comment. If neither path matches, this parser rejects
// the block (it cannot be dedented). The approach:
//
// 1. The space-indented path. This requires:
// 1. The first line ends with a newline. (`valid_first_line`)
// 2. Non-first lines with contents must be properly indented. If a
// non-first line ends in a newline, it must not be the last line.
// (`dedented_line`)
// 3. A whitespace-only line must not be the last line, unless it has
// exactly the indent needed to align the closing comment delimiter
// (`last_line`).
// 2. The asterisk-indented path. The requirements are the same as the
// space-indented path, though the proper indent includes an asterisk in the
// correct location.
dedenter = {
SOI ~ indent ~ valid_first_line ~ (valid_space_line+ | valid_star_line+) ~ DROP ~ EOI
}
// As input, provide this rule with the whitespace that precedes either a " " or
// a "\* ".
indent = _{ PUSH(" "*) ~ NEWLINE }
valid_first_line = { not_newline* ~ NEWLINE }
valid_space_line = _{ (space_dedent ~ dedented_line) | last_line | (white_space ~ not_newline_eoi ~ vis_newline) }
valid_star_line = _{ (star_dedent ~ dedented_line) | last_line | (PEEK ~ " *" ~ not_newline_eoi ~ vis_newline) }
space_dedent = _{ PEEK ~ " " }
dedented_line = { not_newline* ~ not_newline_eoi ~ newline_eoi }
last_line = { PEEK ~ " " ~ EOI }
vis_newline = { NEWLINE }
not_newline_eoi = _{ !(NEWLINE ~ EOI) }
star_dedent = _{ PEEK ~ " * " }
// CodeChat Editor lexer: c_cpp.