Encoding::CompatibilityError: incompatible encoding regexp match #8

Open
orbanbotond opened this issue May 24, 2013 · 7 comments

@orbanbotond

Encoding::CompatibilityError: incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)
	from /Users/boti/.rvm/gems/ruby-1.9.3-p327@search_server/gems/chardet2-1.0.1/lib/UniversalDetector.rb:134:in `=~'
	from /Users/boti/.rvm/gems/ruby-1.9.3-p327@search_server/gems/chardet2-1.0.1/lib/UniversalDetector.rb:134:in `feed'
	from /Users/boti/.rvm/gems/ruby-1.9.3-p327@search_server/gems/chardet2-1.0.1/lib/UniversalDetector.rb:46:in `chardet'
	from (irb):12
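The message points at the likely cause: at `UniversalDetector.rb:134` a regexp whose encoding is ASCII-8BIT (raw bytes) is matched against a UTF-8 string containing non-ASCII characters, which Ruby refuses. The chardet2 source is not shown in this thread, so the pattern below is a hypothetical stand-in (`HIGH_BYTE`, `match_raw` and `match_as_binary` are illustrative names) that reproduces the same error and shows a common workaround: match against a copy relabelled as ASCII-8BIT. Whether handing chardet2 such a copy also yields a correct detection result is untested here. The magic comment keeps the non-ASCII literal valid on Ruby 1.9.3.

```ruby
# encoding: utf-8
# Hypothetical stand-in for a byte-oriented pattern; it is NOT taken from
# chardet2's UniversalDetector.rb, only built to fail the same way.
HIGH_BYTE = Regexp.new("[\x80-\xFF]".dup.force_encoding(Encoding::ASCII_8BIT))

# Matching a UTF-8 string that contains non-ASCII characters against an
# ASCII-8BIT regexp raises Encoding::CompatibilityError.
def match_raw(str)
  str =~ HIGH_BYTE
end

# Workaround: match against a copy relabelled as ASCII-8BIT.
# force_encoding changes only the label, not the underlying bytes.
def match_as_binary(str)
  str.dup.force_encoding(Encoding::ASCII_8BIT) =~ HIGH_BYTE
end

sample = "∀,∈,≠"
begin
  match_raw(sample)
rescue Encoding::CompatibilityError => e
  puts "raw match failed: #{e.message}"
end
puts "binary match at index #{match_as_binary(sample)}"
```

ASCII-only input does not trigger the error, which is why the failure only shows up once the text contains characters such as `∀`.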

@saneshark

Same issue when testing with:

UniversalDetector.chardet("∀,∈,≠,Ω,∑,∏,ɔ,⍴,€,ζ,π,ป่")

which should return UTF-8 as the encoding.
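For the narrower question of whether a string is UTF-8, no detector is needed: Ruby can check whether a string's bytes form valid UTF-8. This is a validation sketch, not detection. `valid_utf8?` is a hypothetical helper name, and pure ASCII or some Latin-1 byte sequences (for example the bytes of "Ã©") would also pass the check.

```ruby
# encoding: utf-8
# Hypothetical helper: true when the bytes form valid UTF-8, whatever
# encoding label the String currently carries. It validates; it does not detect.
def valid_utf8?(str)
  str.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

sample = "∀,∈,≠,Ω,∑,∏,ɔ,⍴,€,ζ,π,ป่"
p valid_utf8?(sample)                                          # true
p valid_utf8?(sample.dup.force_encoding(Encoding::ASCII_8BIT)) # true: same bytes, binary label
p valid_utf8?("caf\xE9")                                       # false: lone 0xE9 byte
```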

@mremond

mremond commented Feb 22, 2014

Did you solve your issue?

@orbanbotond
Author

Hi,

It is now a deprecated project, but the issue is still there: the library didn't return the correct encoding for me.

@mremond

mremond commented Feb 22, 2014

Thanks!
I guess I have to find an alternative way of detecting encoding then.

@orbanbotond
Author

Well, no... I tried three other libraries and then decided to specify the encoding manually.
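Specifying the encoding manually, as described above, comes down to two steps in Ruby: label the bytes with the encoding you already know they are in (`force_encoding`), then transcode them to UTF-8 (`encode`). The ISO-8859-1 input below is an assumed example, not data from this issue.

```ruby
# Assumed input: bytes known (out of band, not by detection) to be ISO-8859-1.
raw = "caf\xE9".dup.force_encoding(Encoding::ASCII_8BIT)

# Relabel the bytes with the known encoding; no bytes change here...
latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)
# ...then transcode to UTF-8, where 0xE9 becomes the two bytes 0xC3 0xA9.
text = latin1.encode(Encoding::UTF_8)

puts text.encoding # UTF-8
puts text.bytesize # 5
```

When the text comes from a file, the same intent can be stated at open time with an external:internal mode string, e.g. `File.open(path, "r:ISO-8859-1:UTF-8") { |f| f.read }`.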

@orbanbotond
Author

I think it was a hard case.

@saneshark

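I just patched rchardet, an older library. Alternatively, one could write the string to a temporary file and ask the system `file` utility for its encoding:

 # `file --mime-encoding` prints "string.tmp: utf-8"; awk keeps the second field.
 # force_encoding raises ArgumentError for names Ruby lacks, e.g. "UNKNOWN-8BIT".
 encoding = `file --mime-encoding string.tmp | awk '{print $2}'`.strip.upcase
 string.force_encoding(encoding)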
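The `file`-based idea above can be made self-contained with Ruby's standard `tempfile` and `open3` libraries. This is a sketch under assumptions: it needs the `file` utility on `PATH` (`Open3` raises `Errno::ENOENT` otherwise), `Tempfile.create` with a block needs Ruby 2.1 or later, and `ruby_encoding_for` and `detect_with_file` are hypothetical helper names. Passing `-b` drops the file name from `file`'s output, which replaces the `awk` step, and looking the result up with `Encoding.find` returns `nil` instead of raising when `file` prints a name Ruby does not know.

```ruby
require 'open3'
require 'tempfile'

# Map a charset name as printed by `file --mime-encoding` (e.g. "utf-8",
# "us-ascii", "binary") to a Ruby Encoding. Encoding.find is
# case-insensitive; unknown names such as "unknown-8bit" return nil.
def ruby_encoding_for(charset)
  Encoding.find(charset.strip)
rescue ArgumentError
  nil
end

# Write the bytes to a temporary file, ask `file` for its guess, and
# return the matching Ruby Encoding, or nil if it cannot be mapped.
def detect_with_file(bytes)
  Tempfile.create('detect') do |tmp|
    tmp.binmode
    tmp.write(bytes)
    tmp.flush
    out, status = Open3.capture2('file', '-b', '--mime-encoding', tmp.path)
    status.success? ? ruby_encoding_for(out) : nil
  end
end
```

Typical use would be `enc = detect_with_file(bytes)` followed by `bytes.force_encoding(enc) if enc`, leaving the string untouched when `file` gives no usable answer.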
