"Expected valid tag name" when using special characters like < #76

Open
Tracked by #568
rhythm-section opened this issue Jun 4, 2020 · 18 comments · May be fixed by #692
Labels
assigned: Whether or not this bug has been assigned to some other issues as a subtask or pre-req
enhancement: New feature or request
Milestone

Comments

@rhythm-section

rhythm-section commented Jun 4, 2020

Hello @pngwn, thank you for this great project!

I am having an issue when using the < character inside a Markdown file, even when replacing it with &lt;. It seems the &lt; gets replaced by < again, resulting in the same error from the Svelte compiler ("Expected valid tag name"). I even tried to escape the character with \, without success.

Is there any way to escape special characters so the svelte compiler does not throw an error?

My current "solution" is to wrap < inside an inline code block, but I do not want to show that part as code. This is the Markdown file I am talking about: https://github.com/nymea/nymea-plugins/blob/rework-readmes/awattar/README.md. This is the version with the "fixed" inline code blocks. When those are removed, the error is thrown.

I am not sure if this is a bug or if I am missing something here. I just found this code section in MDSveX:

// in code nodes replace the character with the html entities
// maybe I'll need more of these

const entites = [
	[/</g, '&lt;'],
	[/>/g, '&gt;'],
	[/{/g, '&#123;'],
	[/}/g, '&#125;'],
];

So I guess the < character should only be replaced inside code blocks, as mentioned in the comment, but using &lt; leads to the same error because somewhere during preprocessing it gets replaced with < again.
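To make the quoted table concrete, here is a minimal sketch of how such a list of pattern/entity pairs could be applied to the text of a code node. The helper name `escapeCodeText` is hypothetical and not taken from mdsvex, and the table is spelled `entities` here purely for readability.

```javascript
// Pattern/replacement pairs, mirroring the table quoted above.
const entities = [
	[/</g, '&lt;'],
	[/>/g, '&gt;'],
	[/{/g, '&#123;'],
	[/}/g, '&#125;'],
];

// Hypothetical helper (not mdsvex's actual API): apply every pair in order.
function escapeCodeText(value) {
	return entities.reduce(
		(text, [pattern, entity]) => text.replace(pattern, entity),
		value
	);
}

console.log(escapeCodeText('if (a < b) { run(); }'));
// prints: if (a &lt; b) &#123; run(); &#125;
```

None of the replacement strings contain a character that a later pattern matches, so the order of the pairs does not change the result.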

@pngwn
Owner

pngwn commented Jun 4, 2020

Interesting. I would expect using a raw < to break, but mdsvex only explicitly replaces the above characters in fenced code (either inline or block). Using &lt; etc. should work. The markdown parser may be converting these entities behind the scenes. I'll take a look at this.

@pngwn pngwn added the bug Something isn't working label Jun 4, 2020
@pngwn
Owner

pngwn commented Jun 16, 2020

They are getting decoded by the markdown parser. This problem is a little more complex than I thought. I have a potential solution, but I'm going to see if there is a simpler way of solving the problem.

In other news, my investigations have uncovered another bug. The following does not work either:

 - 1 {"<"} 2

When smartypants is enabled (which it is by default), the quotes get converted to fancy quotes. (#83)
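For reference, mdsvex exposes a `smartypants` option, and a minimal sketch of switching it off is shown here. The object name `mdsvexOptions` is only illustrative, and this is not claimed to make the `{"<"}` case work on its own; it only stops the quote conversion described above.

```javascript
// Illustrative options object; the name is an assumption for this sketch.
// It would be passed to the preprocessor as mdsvex(mdsvexOptions)
// in a Svelte or bundler config.
const mdsvexOptions = {
	// smartypants is on by default; false leaves straight quotes untouched.
	smartypants: false,
};

console.log(mdsvexOptions);
```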

@pngwn
Owner

pngwn commented Jun 16, 2020

I can feel another custom node type coming on (for entities). Modifying the parser seems to be the "best" approach and should be less work than trying to selectively undo the entity decoding in the transform phase.

@rhythm-section
Author

Thank you for the investigation! The custom node type sounds good to me.

@TheComputerM
Contributor

Try to use {@html ...} as a workaround

@cesutherland

I'm running into this with KaTeX as well: #113

@pngwn
Owner

pngwn commented Aug 14, 2020

This will be partially addressed by the work discussed in #116. I can make > and } legal characters in the document without issue (as my HTML syntax will be very strict and I can escape plain-text variants of those characters). However, < and { will never be legal plain-text characters, as they mark the start of various states that the parser will enter. To a degree this issue is irresolvable because some characters just conflict with HTML and Svelte syntax in a way that cannot be correctly analysed. There are a few ways to support some cases, but I'll have to look into those at a later date.

@pngwn
Owner

pngwn commented Aug 14, 2020

#113 has some other information and a nice test case.

@pngwn pngwn added enhancement New feature or request and removed bug Something isn't working labels Apr 3, 2021
@wlach
Contributor

wlach commented Apr 4, 2021

> This will be partially addressed by the work discussed in #116. I can make > and } legal characters in the document without issue (as my HTML syntax will be very strict and I can escape plain-text variants of those characters). However, < and { will never be legal plain-text characters, as they mark the start of various states that the parser will enter. To a degree this issue is irresolvable because some characters just conflict with HTML and Svelte syntax in a way that cannot be correctly analysed. There are a few ways to support some cases, but I'll have to look into those at a later date.

I wonder if mdsvex should interpret < and { characters followed by a space literally. Well-formatted HTML/Svelte generally doesn't do this, and handling it specially would allow a number of obvious cases to work, such as this one: #113 (comment) (tl;dr: writing foo < bar to make some didactic point).
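The heuristic suggested here can be sketched as a small pre-pass over the source text. This is only an illustration under stated assumptions: the function name is hypothetical, it handles < but not {, and it makes no attempt to skip code blocks or script contents, which a real implementation would need to do.

```javascript
// Hypothetical pre-pass: escape a "<" that is followed by whitespace
// or by the end of the input, and leave every other "<" alone.
function escapeLooseLessThan(source) {
	return source.replace(/<(?=\s|$)/g, '&lt;');
}

console.log(escapeLooseLessThan('foo < bar'));     // foo &lt; bar
console.log(escapeLooseLessThan('<div>hi</div>')); // <div>hi</div>
```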

@pngwn
Owner

pngwn commented Apr 4, 2021

Yeah, I am considering this for < for this specific reason. It seems a reasonable tradeoff because otherwise writing very basic syntax will be very difficult. This is especially notable for mdsvex because users are typically developers of some description, and less-than and greater-than symbols will appear more often than in a typical document.

For curly braces, I'm less certain. It is quite common to have leading and trailing spaces for text expressions (example). I think block syntax requires there to be no space before the # in the current implementation but I can't quite recall as there is no spec.

Curly braces are just generally problematic. They are quite commonly used in custom markdown syntax for additional metadata/attributes, but they pose a bit of a problem because of their importance to Svelte. I'll take a look at some popular use cases and see if I can figure out a way to disambiguate them when I start work on yet another parser for mdsvex.

I have a new parser (the svelte-parse) that observes this rule and has a well-defined AST, although not a parsing spec. However, this will need to be rewritten, probably twice (don't ask). I will be starting the first rewrite soon; the second will have no user impact and will be purely internal, but it is more of a long-term goal. When I do that, it will also have a parsing spec.

@wighawag

Is there a workaround for now?

I tried the following in the playground and all of them fail:

5 &lt; 10
5 < 10
5 {"<"} 10
5 {<} 10
5 {@html <} 10
5 {@html &lt;} 10

@josephg

josephg commented May 25, 2021

It's awful, but double escaping seems to work:

5 &amp;lt; 10

It doesn't work in the playground, though. For some reason &amp; makes the playground hit another bug and error with "Document is not defined".
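A toy model may help show why double escaping behaves differently, given the earlier observation that the markdown parser decodes entities. The sketch below assumes exactly one decoding pass and uses a hypothetical `decodeOnce` function; it is not the parser's real code and only knows three named entities.

```javascript
// Toy stand-in for a single entity-decoding pass (an assumption, not parser code).
const named = { amp: '&', lt: '<', gt: '>' };

function decodeOnce(text) {
	// Each match is replaced once; the output is not scanned again.
	return text.replace(/&(amp|lt|gt);/g, (match, name) => named[name]);
}

console.log(decodeOnce('5 &lt; 10'));     // 5 < 10
console.log(decodeOnce('5 &amp;lt; 10')); // 5 &lt; 10
```

Under that assumption, the single-escaped form reaches the Svelte compiler as a raw <, while the double-escaped form arrives as &lt; and is only turned into < when the HTML is rendered.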

@josephg

josephg commented May 26, 2021

> I wonder if mdsvex should interpret < and { characters followed by a space literally. Well-formatted HTML/Svelte generally doesn't do this, and handling it specially would allow a number of obvious cases to work, such as this one: #113 (comment) (tl;dr: writing foo < bar to make some didactic point).

The CommonMark specification has a list of rules for what constitutes legal tags. Anything that isn't a valid tag is escaped. This example shows that a < followed by a space is not considered a valid tag name. E.g., < a> encodes to &lt; a&gt;. (As it does in this comment.)

CommonMark has a test suite of JSON content. We should get that test suite passing in mdsvex.
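The rule described above can be approximated in a few lines. This is a simplified sketch, not CommonMark's full raw-HTML grammar and not mdsvex code: it only inspects the single character after each <, and the function name is hypothetical. It also leaves > untouched, so its output for `< a>` differs from the full `&lt; a&gt;` encoding mentioned above.

```javascript
// CommonMark tag names start with an ASCII letter. Closing tags start with "/",
// and comments, declarations and processing instructions start with "!" or "?".
// Any "<" followed by something else cannot begin raw HTML, so escape it.
function escapeNonTagLessThan(source) {
	return source.replace(/<(?![A-Za-z\/!?])/g, '&lt;');
}

console.log(escapeNonTagLessThan('< a>'));      // &lt; a>
console.log(escapeNonTagLessThan('5 <10'));     // 5 &lt;10
console.log(escapeNonTagLessThan('<a>x</a>'));  // <a>x</a>
```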

@pngwn pngwn added this to mdsvex Oct 16, 2021
@pngwn pngwn moved this to Refine in mdsvex Oct 16, 2021
@pngwn pngwn added this to the 1.0 milestone Oct 16, 2021
@Madd0g

Madd0g commented Oct 26, 2021

> When smartypants is enabled (which it is by default), the quotes get converted to fancy quotes. (#83)

Yes, this is compounded in plugins as well. For example, I started using remark-directive, and mdsvex transforms quotes into smart quotes before the text gets to the directive parsing part, which messes up parameter passing.

I guess it would be cool to have more control over it. Maybe over when this conversion occurs in the mdsvex pipeline? Or maybe a way to choose the types of elements it operates on?

@pngwn
Owner

pngwn commented Oct 26, 2021

Mdsvex will never be CommonMark compliant, even less so in 1.0. That said, for 1.0, I'll be porting and modifying the CommonMark test cases and restricting HTML syntax to solve this issue.

In the current implementation there isn't anything that can be done about it.

I'm working on the 1.0 parser now, which will bring this under my control.

@MrVauxs

MrVauxs commented Jun 17, 2023

Hello, has this been addressed in a more reliable way than needing to double-escape the < character?

@pngwn pngwn removed the v1 label Feb 23, 2024
@pngwn pngwn mentioned this issue Feb 23, 2024
27 tasks
@pngwn pngwn added assigned Whether or not this bug has been assigned to some other issues as a subtask or pre-req and removed markdown labels Feb 23, 2024
@MarsRon

MarsRon commented Dec 4, 2024

Hi, this issue still persists on version 0.12.3.

@ckiee ckiee linked a pull request Jan 19, 2025 that will close this issue
@ckiee

ckiee commented Jan 19, 2025

Fixed in my proposed #692 (^:
