UTF-8 handling? #13

mikebaldry · 2017-02-21T09:27:43Z

When I try to pass a UTF-8 charlist, characters such as ł, which is <<197, 130>> when encoded as UTF-8, go in as the single code point 322 in the charlist.

iex(1)> 'hełło'             
[104, 101, 322, 322, 111]

This causes things to break. Sometimes I see :erlang.iolist_size([322]) fail because 322 is greater than 255, for example; other times it just fails to match (depending on the current parsing context, I guess).
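A minimal sketch of the failure, assuming only the Erlang standard library: an iolist may contain integers in the range 0..255 only, so the code point 322 is rejected while its two UTF-8 bytes are accepted. The variable names are illustrative.

```elixir
# A code point above 255 is not a valid iolist element, so this call raises.
result =
  try do
    :erlang.iolist_size([322])
  rescue
    ArgumentError -> :badarg
  end

# The two UTF-8 bytes of the same character are valid iolist elements.
byte_count = :erlang.iolist_size([197, 130])
```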

Am I doing something wrong? (I'm assuming I am!)

I've currently worked around this very crudely by stepping through the bytes and turning them into a normal list (so I get [197, 130] instead of [322]). Then, when the result comes back from apply, I turn anything in the state that is a string back into a binary by stepping through the list and appending to a <<>>.
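The workaround above could be sketched roughly as follows, using standard library conversions rather than stepping through the list by hand. The module and function names (`CharlistBytes`, `to_bytes/1`, `to_binary/1`) are hypothetical and not part of this library; the sketch assumes the input charlist contains only valid Unicode code points.

```elixir
defmodule CharlistBytes do
  # Hypothetical helper, not part of the library discussed in this issue.

  # Turn a charlist of code points into a list of UTF-8 bytes,
  # e.g. [322] becomes [197, 130].
  def to_bytes(charlist) do
    charlist
    |> List.to_string()
    |> :binary.bin_to_list()
  end

  # Turn a list of bytes from the result back into a binary (an Elixir string).
  def to_binary(bytes) do
    :erlang.list_to_binary(bytes)
  end
end

bytes = CharlistBytes.to_bytes([104, 101, 322, 322, 111])
string = CharlistBytes.to_binary(bytes)
```

Here `bytes` holds only values up to 255, so it is safe to pass anywhere an iolist is expected, and `string` is the original text as a UTF-8 binary.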

Great work on this BTW!
