-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTML string containing underscores gets escaped and shown in output markdown #63
Comments
Hello @jiggneshhgohel I'm not quite sure how to help you there. We could introduce an option to disable the escaping, but I think the use cases for that are quite rare. On the other hand could you replace escaped key chars with the non-escaped versions whenever you need it. Wouldn't that be the easier solution? Let me know what you think! |
That's the most obvious option when there is no provision for desired feature in the library. But my expectancy of using a library like yours is to expect raw original markdown from the HTML I need to convert. My HTML didn't contained any escape characters thus I wasn't expecting it in my converted to markdown. Thanks. |
I think the issue is that the
to read as
The issue I'm having is with a string like
ReverseMarkdown converts it to
and then the marked library converts it to
which is correct from marked's perspective because it's processing only one of the escapes, but incorrect from the user's perspective because the original string had no escapes and the outputted markdown has escapes. (I'm using ReverseMarkdown because sometimes the input string contains actual markup, sometimes it's just plain text.) |
https://www.markdownguide.org/basic-syntax/#italic
So I think the actual problem is that reverse markdown should never escape underscores that are in the middle of a word. |
@anujbiyani Thanks for your input on that issue. I like the idea to leave underscored in words unmodified. It won't solve all issues though, because something like Would you like to open a PR? |
Sure, happy to try and tackle this! One question: why are |
@anujbiyani Good question! I did some digging in the past and I think initial the description mentions this case: https://daringfireball.net/projects/markdown/syntax#backslash The example sounds pretty reasonable to me so I wouldn't ditch the whole escaping. What about:
This way it should be save for the majority and doesn't change the default behavior all to drastically. But everyone is free to opt out of escaping if it fits their case. What do you think? |
Happy to do it via configuration flag to maintain backwards compatibility! Follow-up question, though: I believe the link you provided is referring to user-provided backslashes whereas the code I linked in ReverseMarkdown is escaping underscores and asterisks that don't have specifically two backslashes. I think that method should either:
I'm trying to understand what the right behavior should be for when the flag is enabled 😄 let me know what you think! |
You're right, the backslash escape is used for user provided escapes. But, if we complete a full life cycle (HTML - MD - HTML) and we would not escape the initial *, then it would be treated as if it were markdown and therefore the information would be lost. It's a little confusing, so let me give you an example: With escaping <div>I _emphasize_ that</div> # Original HTML (user input)
I \_emphasize\_ that # MD after conversion
<p>I _emphasize_ that</p> # After using random MD interpreter Without escaping (as proposed) <div>I _emphasize_ that</div> # Original HTML (user input)
I _emphasize_ that # MD after conversion
<p>I <em>emphasize</em> that</p> # Now there is an additional em tag Does this example clarify my problem with just skipping the whole escaping? In my opinion it is correct to treat the original HTML entities as user input and therefore escape them. |
Ohh that helps a ton, thanks! So now I agree that removing the escaping entirely isn't actually a good idea. What if we just go with option 1 from your post above and skip the disabling escaping via config?
Or did you mean that the configuration flag should apply to whether or not we escape in the middle of the word? Sorry for all the back-and-forth, I find escaping/not-escaping incredibly confusing from a keeping-it-straight-in-my-head perspective 😖 . |
Also, don't convert/parse tag attributes, for example:
results in:
notice the space |
Ruby 2.3.0
Rails 4.2.5
Rails Console output:
As can be seen when my HTML string contains words separated by underscore(s), the underscore(s) gets escaped which is correct. But is there any way we can hide those in the output string? I have a use-case wherein I want to convert the HTML string to its Markdown version and render the markdown version as it is.
The text was updated successfully, but these errors were encountered: