doctype: parse all child nodes #83
Comments
does the |
no, this does not help to diff the
    def walk_callback(node, is_compound):
        # both names are rebound below, so both must be declared nonlocal
        nonlocal last_node_to, walk_html_tree_test_result
        s = repr(node.text.decode("utf8"))
        if len(s) > 50:
            s = s[0:50] + "..."
        if not is_compound:
            # leaf node: take everything since the end of the previous leaf
            node_source = input_html[last_node_to:node.range.end_byte]
            last_node_to = node.range.end_byte
            walk_html_tree_test_result += node_source
            node_source = node_source.decode("utf8")
            if len(node_source) > 50:
                node_source = node_source[0:50] + "..."
            print(f" node kind_id={node.kind_id:2d} type={node.type:10s} is_named={str(node.is_named):5s} {s:25s} {repr(node_source)}")
        else:
            # compound node: printed with a "#" prefix
            print(f"# node kind_id={node.kind_id:2d} type={node.type:10s} is_named={str(node.is_named):5s} {s:25s}")

currently i use the workaround

    if node_type_id == 1 or node_type_id == 4:
        in_doctype_node = True
    elif node_type_id == 3 and in_doctype_node:
        in_doctype_node = False
        node_source = input_html[(last_node_to + 1):node.range.end_byte]
        node_source_space_before = b""
        last_node_to = node.range.start_byte - 1

low priority stuff... to compare:
input
result: compound nodes are prefixed with #

problem: the 'html' in '<!doctype html>' has no parse node, and the close tag '>' of '<!doctype html>' has the same node type as the close tag '>' of '<hr>'

note how ' html' spills into '>' with
node_source = input_html[last_node_to:node.range.end_byte]
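
A minimal sketch of that spill, using hand-written leaf ranges rather than real parser output. The tuples in `leaf_nodes` and both helper names are assumptions made for illustration. The first helper slices the same way as the line above. The second is one possible alternative that reports uncovered bytes as a separate entry.

```python
# Hypothetical leaf nodes for b"<!doctype html>" as (type, start_byte, end_byte).
# The 'html' token has no node of its own, as described above.
input_html = b"<!doctype html>"
leaf_nodes = [("<!", 0, 2), ("doctype", 2, 9), (">", 14, 15)]


def slice_by_previous_end(source, nodes):
    """Slice from the end of the previous leaf to the end of this leaf."""
    last_node_to = 0
    out = []
    for node_type, _start_byte, end_byte in nodes:
        out.append((node_type, source[last_node_to:end_byte]))
        last_node_to = end_byte
    return out


def slice_with_gaps(source, nodes):
    """Same walk, but bytes not covered by any leaf become a 'gap' entry."""
    last_node_to = 0
    out = []
    for node_type, start_byte, end_byte in nodes:
        if start_byte > last_node_to:
            out.append(("gap", source[last_node_to:start_byte]))
        out.append((node_type, source[start_byte:end_byte]))
        last_node_to = end_byte
    return out


for node_type, text in slice_by_previous_end(input_html, leaf_nodes):
    print(f"{node_type:8s} {text!r}")
```

Both helpers reproduce the input when their slices are concatenated. They differ only in whether ' html' is attributed to the '>' leaf or reported on its own.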
this is causing problems in a semantic stage using this parser, where i want to ...
either ignore the compound node '<!doctype html>' and process its child nodes '<!' and 'doctype' and 'html' and '>'
or process the compound node and ignore its child nodes

the cheap solution would be to use a different node type for the '>' of '<!doctype html>'
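
The two traversal options above could look roughly like this in a consumer. This is only a sketch: the `Node` class, the type names (`"document"`, `"doctype"`, `"doctype_keyword"`, `"element"`, `"tag_name"`) and both generator functions are hypothetical stand-ins, not the parser's actual API or grammar.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical stand-in for a parse node, not the parser's real class.
    type: str
    start_byte: int
    end_byte: int
    children: List["Node"] = field(default_factory=list)


def leaves(node):
    """Option 1: ignore compound nodes, yield only leaf nodes."""
    if not node.children:
        yield node
        return
    for child in node.children:
        yield from leaves(child)


def top_level(node, opaque_types):
    """Option 2: yield a compound node whole if its type is in opaque_types,
    and do not descend into its children."""
    if node.type in opaque_types or not node.children:
        yield node
        return
    for child in node.children:
        yield from top_level(child, opaque_types)


source = b"<!doctype html><hr>"
doctype = Node("doctype", 0, 15, [
    Node("<!", 0, 2), Node("doctype_keyword", 2, 9), Node(">", 14, 15),
])
hr = Node("element", 15, 19, [
    Node("<", 15, 16), Node("tag_name", 16, 18), Node(">", 18, 19),
])
root = Node("document", 0, 19, [doctype, hr])

print([n.type for n in leaves(root)])
print([n.type for n in top_level(root, {"doctype"})])
```

With option 2 the doctype text is taken from the compound node's own byte range, so the missing 'html' leaf does not matter. With option 1 the two '>' leaves have the same type and can only be told apart by tracking the parent, which is what the workaround does.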