Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Textpow 1.3.1] warning on bad unicode syntax #1

Open
grosser opened this issue Aug 28, 2013 · 16 comments
Open

[Textpow 1.3.1] warning on bad unicode syntax #1

grosser opened this issue Aug 28, 2013 · 16 comments

Comments

@grosser
Copy link
Owner

grosser commented Aug 28, 2013

Introducing myself : I work in a French company, developing enhancements on our Redmine instance.
I am migrating our developments to the last version of Redmine using Rails 3.

Some Textpow syntax yaml files trigger an annoying warning :
Ignored utf8 regex error invalid multibyte escape: /($)[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*?\b/

I tried to see why, and here is what I found :

The two syntax triggering the warning are :
textpow-1.3.1/lib/textpow/syntax/source.smarty.syntax: match: ($)[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]?\b
textpow-1.3.1/lib/textpow/syntax/source.mel.syntax: match: ($)[a-zA-Z
\x7f-\xff][a-zA-Z0-9_\x7f-\xff]_?\b

What I understand is that these regexp contain an unicode range \x7f-\xff, but with rails 3 this unicode range is interpreted as real unicode.
The problem is that the range \x7f-\xff is outside utf8 legal values.

I know that you can specify to not interpret unicode character in full regular expressions, with adding a n at the end :
//no
But the syntax is different in the yam file (no /...../)
I did not find how to specify a range of ascii characters between 127 and 255.

The other question is why smarty and mel want such characters in variables names ?

Have you other ideas about this issue ?

@grosser
Copy link
Owner Author

grosser commented Aug 28, 2013

replacing them with [^:ascii:] could work on ruby 1.9+ and ree+onigumura not sure if that's exactly the same but should be pretty good...
would also solve the TODO * smart + mel + applescript have some invalid utf8 characters that are currently ignored (search for INVALID_UTF8))

@Utopism
Copy link

Utopism commented Aug 28, 2013

The syntax to replace are :
[a-zA-Z_\x7f-\xff]
meaning :

  • a $
  • a second character containing :
  1. a letter uppercase OR
  2. the '_' character OR
  3. a non printable ascii character whose code is in range [127, 255]

[a-zA-Z0-9_\x7f-\xff]
meaning :

  • 0 to n times :
  1. a letter uppercase OR
  2. the '_' character OR
  3. a digit OR
  4. a non printable ascii character whose code is in range [127, 255]

@grosser
Copy link
Owner Author

grosser commented Aug 28, 2013

aaah thought it meant all of unicode land :D

then not sure how to fix this, if you add the /n and match against a
unicode string kaboom :/

"Ü" =~ /\x7f-\xff/ -> boom

On Wed, Aug 28, 2013 at 2:41 PM, Ennder [email protected] wrote:

The syntax to replace are :

  1. [a-zA-Z_\x7f-\xff]
    meaning :
  • a $
  • a second character containing :
  • a letter uppercase OR
  • the '_' character OR
  • a non printable ascii character whose code is in range [127, 255]
  1. [a-zA-Z0-9_\x7f-\xff]
    meaning :
  • 0 to n times
  • a letter uppercase OR
  • the '_' character OR
  • a digit OR
  • a non printable ascii character whose code is in range [127, 255]


Reply to this email directly or view it on GitHubhttps://github.com//issues/1#issuecomment-23410870
.

@Utopism
Copy link

Utopism commented Aug 28, 2013

We should perhaps find in upstream if ascii characters [127, 255] are really accepted.

@grosser
Copy link
Owner Author

grosser commented Aug 28, 2013

might as well just kill the extra condition... take a small parsing bug vs
a global problem

On Wed, Aug 28, 2013 at 6:08 PM, Ennder [email protected] wrote:

We should perhaps find in upstream if ascii characters [127, 255] are
really accepted.


Reply to this email directly or view it on GitHubhttps://github.com//issues/1#issuecomment-23426808
.

@Utopism
Copy link

Utopism commented Aug 29, 2013

You're right, I don't think such variable names cases will happen often !

@Utopism
Copy link

Utopism commented Jan 21, 2014

Hi again,

With Rails 3.2 logs are now filled with these warnings :

/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: character class has '-' without escape: /\b([a-zA-Z-:]+)/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: invalid back reference: /\\(\n\d+|\k\w+|(?<!\|)\g\w+)/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: invalid subexp call: /\\(\n\d+|\k\w+|(?<!\|)\g\w+)/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: character class has '-' without escape: /(\.\.)\s[A-z][A-z0-9-_]+(::)\s*$/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: nested repeat operator ? and * was replaced with '*': /^\\documentclass(?!.*\{beamer\})|^<<(.?*)>>=$/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: nested repeat operator ? and * was replaced with '*': /^<<(.?*)>>=|\\begin\{.*\}/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: nested repeat operator ? and * was replaced with '*': /^@(.?*)$|\\end\{.*\}/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: regular expression has ']' without escape: /(?=(\[<)(?![^\[]+?[^>]]))/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: regular expression has ']' without escape: /(\[<)(?=.*?>])/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: regular expression has ']' without escape: /\[<|>]/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: character class has '-' without escape: /\b([a-zA-Z-_:]+)\s*(=)/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: character class has '-' without escape: /\b([a-zA-Z-_:]+)/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: regular expression has ']' without escape: /<|>|]]>|--/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: regular expression has ']' without escape: /<|>|]]>|--/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: regular expression has ']' without escape: /<|>|]]>|--/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: regular expression has ']' without escape: /<|>|]]>|--/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: character class has ']' without escape
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: character class has '-' without escape: /\b([a-zA-Z-:]+)/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: regular expression has ']' without escape: /[\[]]+/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: regular expression has ']' without escape: /[\[]]+/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: regular expression has ']' without escape: /[\[]]+?/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: character class has ']' without escape: /(\[)([^]]*)(\]) *(\[)(.*?)(\])/
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: character class has ']' without escape
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: character class has ']' without escape
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: character class has ']' without escape
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: character class has ']' without escape
/var/lib/gems/1.9.1/gems/textpow-1.3.1/lib/textpow/syntax.rb:120: warning: character class has ']' without escape

@grosser
Copy link
Owner Author

grosser commented Jan 22, 2014

hmm maybe print some more info around there and see what syntax/line this is coming from -> PR ?

@Utopism
Copy link

Utopism commented Jul 16, 2014

Hi, here are more informations :
gem ultraviolet V1.0.1
gem textpow V1.3.1

lines that cause warnings in the textpow syntax files :

  • lib/textpow/syntax/text.tex.latex.sweave.syntax
    L8 :
    firstLineMatch: ^\documentclass(?!.{beamer})|^<<(.?)>>=$
  • lib/textpow/syntax/text.tex.latex.sweave.syntax:foldingStartMarker
    L11 :
    foldingStartMarker: ^<<(.?)>>=|\begin{.}
  • lib/textpow/syntax/text.html.tt.syntax
  • lib/textpow/syntax/text.html.mt.syntax
    L94 :
    match: \b([a-zA-Z-:]+)\s*(=)
    Can be fixed by escaping the -
    match: \b([a-zA-Z-
    :]+)\s*(=)

All these cases can be foud with a grep -R "A-Z-" *

  • lib/textpow/syntax/text.html.strict.active4d.syntax
  • lib/textpow/syntax/text.html.asp.net.syntax

Some others warnings will be difficult to catch, but this is a beginning

@grosser
Copy link
Owner Author

grosser commented Jul 16, 2014

can you throw that into a PR ?
I kind of curious, what are you using textpow for ?

@Utopism
Copy link

Utopism commented Jul 16, 2014

  • lib/textpow/syntax/text.restructuredtext.syntax
    Can be fixed by escaping the -
    L28 :
    match: (..)\s[A-z][A-z0-9-_]+(::)\s*$
  • lib/textpow/syntax/text.bbcode.syntax
    [[]]+ can be replaced by
    [[]]+
    grep -R '[]]' *

@Utopism
Copy link

Utopism commented Jul 16, 2014

Texpow is used to syntax highlight source files in Redmine (plugin redmine_ultraviolet),

Il try to provide a PR.

@grosser
Copy link
Owner Author

grosser commented Jul 16, 2014

Feel free to add a link in the readme, always nice to have more real-life
examples :)

On Wed, Jul 16, 2014 at 9:47 AM, Ennder [email protected] wrote:

Texpow is used to syntax highlight source files in Redmine (plugin
redmine_ultraviolet),

Il try to provide a PR.


Reply to this email directly or view it on GitHub
#1 (comment).

@Utopism
Copy link

Utopism commented Jul 16, 2014

For the utf8 issue :

Regexp.new
When the lang parameter is n’ orN’ sets the regexp no encoding.

It's what is proposed here :
http://osdir.com/ml/ruby-talk/2009-04/msg00459.html

It seems to pass, don't know if this will lead to a regression

@grosser
Copy link
Owner Author

grosser commented Jul 16, 2014

if something breaks we'll add more tests, no warnings is a nice win :)

On Wed, Jul 16, 2014 at 10:36 AM, Ennder [email protected] wrote:

For the utf8 issue :

Regexp.new
When the lang parameter is n’ orN’ sets the regexp no encoding.

It's what is proposed here :
http://osdir.com/ml/ruby-talk/2009-04/msg00459.html

It seems to pass, don't know if this will lead to a regression


Reply to this email directly or view it on GitHub
#1 (comment).

Utopism referenced this issue in Smile-SA/textpow Jul 16, 2014
	modifié:         lib/textpow/syntax.rb
	modifié:         lib/textpow/syntax/source.camlp4.ocaml.syntax
	modifié:         lib/textpow/syntax/source.ocaml.syntax
	modifié:         lib/textpow/syntax/text.bbcode.syntax
	modifié:         lib/textpow/syntax/text.html.asp.net.syntax
	modifié:         lib/textpow/syntax/text.html.mt.syntax
	modifié:         lib/textpow/syntax/text.html.strict.active4d.syntax
	modifié:         lib/textpow/syntax/text.html.tt.syntax
	modifié:         lib/textpow/syntax/text.html.twiki.syntax
	modifié:         lib/textpow/syntax/text.html.xhtml.1-strict.syntax
	modifié:         lib/textpow/syntax/text.restructuredtext.syntax
	modifié:         lib/textpow/syntax/text.tex.latex.sweave.syntax
Utopism referenced this issue in Smile-SA/textpow Jul 16, 2014
Reference to Redmine added
	modifié:         README.rdoc
@Utopism
Copy link

Utopism commented Jul 16, 2014

Hi, I have done a PR, please check it, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants