A minimal implementation of part 2 : msgpack #6

eregon · 2014-04-21T09:32:01Z

Just enough to pass the given tests, while still allowing for extension to multiple extended types.
The Hash type is certainly the most interesting one to pack/unpack.
And using an Enumerator definitely feels like the best approach, since we never need to go back.
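To make the Hash point concrete, here is a minimal sketch (not the code from this PR) of unpacking a MessagePack fixmap from an Enumerator of bytes. It only supports positive fixint, fixstr and fixmap, and the method name unpack_value is hypothetical. Because every key and value is read strictly in stream order with bytes.next, nothing ever needs to look back:

```ruby
# Hypothetical decoder for a tiny subset of MessagePack, reading from an
# Enumerator of Integers (0..255) via external iteration (#next).
def unpack_value(bytes)
  type = bytes.next
  case type
  when 0x00..0x7f # positive fixint: the byte is the value itself
    type
  when 0x80..0x8f # fixmap: low 4 bits hold the number of key/value pairs
    hash = {}
    (type & 0x0f).times do
      key = unpack_value(bytes)       # keys and values are nested values,
      hash[key] = unpack_value(bytes) # so recursion consumes them in order
    end
    hash
  when 0xa0..0xbf # fixstr: low 5 bits hold the payload length in bytes
    Array.new(type & 0x1f) { bytes.next }.pack("C*").force_encoding(Encoding::UTF_8)
  else
    raise ArgumentError, format("unsupported type byte 0x%02x", type)
  end
end

# {"a" => 1, "b" => 2} encoded as a fixmap of two fixstr/fixint pairs
bytes = [0x82, 0xa1, 0x61, 0x01, 0xa1, 0x62, 0x02].each
unpack_value(bytes) == { "a" => 1, "b" => 2 } # => true
```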

practicingruby · 2014-04-21T11:44:38Z

This is very close to what I did as a rough sketch when writing the exercise, good work!

Enumerator was a huge help here; it'll be interesting to see whether someone submits a solution that does not use it, so we can compare.

practicingruby · 2014-04-30T14:38:26Z

One thing I noticed in #8 that also applies here: you're converting MessagePack fixed strings as if they were raw binary data, but they're specified to be in UTF-8. There's an example in #8 that shows the difference.

On a tangential note, I've never seen String#b before. Neat!

* UTF-8 is the chosen encoding for the 'str' type
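For anyone else who hasn't met String#b: it returns a copy of the string tagged with the ASCII-8BIT (binary) encoding, so it is measured in bytes rather than characters. A small sketch of the difference, assuming the file is read as UTF-8 (Ruby's default source encoding since 2.0):

```ruby
s   = "Ĕ"  # one character, stored as two bytes in UTF-8
bin = s.b  # binary copy; s itself is left untouched

s.length                             # => 1
bin.length                           # => 2
s.encoding == Encoding::UTF_8        # => true
bin.encoding == Encoding::ASCII_8BIT # => true

# Re-tagging the binary bytes as UTF-8 restores the character view
# without transcoding anything.
bin.force_encoding(Encoding::UTF_8) == s # => true
```

This is the gap the 'str' rule closes: the bytes are identical either way, and only the encoding tag decides whether Ruby sees one character or two bytes.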

eregon · 2014-04-30T15:08:08Z

Thanks, indeed I forgot to convert strings to UTF-8 in pack when they are not already in that encoding. But this conversion to UTF-8 is not always possible and might be lossy.
Note that I did already convert back to UTF-8 in unpack, though.
In my complete solution, much more care is taken with strings, and strings in any encoding can actually be transferred.
I wonder what the best way is to unpack a string from an Enumerator of bytes.
bytesize.times.map { bytes.next }.pack("C#{bytesize}") could work, but it creates an extra Array. The way I implemented it is clear, but it might resize the String many times.
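The two options can be compared side by side. This is only a sketch: both helper names are hypothetical, and the second one assumes Ruby 2.4 or later, where String.new accepts a capacity: keyword. Preallocating the buffer that way should avoid the repeated resizing, while appending an Integer to a binary String adds exactly that byte:

```ruby
# Variant 1: collect the bytes into a temporary Array, then pack them.
def read_str_via_pack(bytes, bytesize)
  bytesize.times.map { bytes.next }.pack("C#{bytesize}").force_encoding(Encoding::UTF_8)
end

# Variant 2: append each byte to a binary String preallocated to the final size.
def read_str_via_append(bytes, bytesize)
  str = String.new(encoding: Encoding::ASCII_8BIT, capacity: bytesize)
  bytesize.times { str << bytes.next } # Integer 0..255 appends that single byte
  str.force_encoding(Encoding::UTF_8)   # re-tag only; the bytes are unchanged
end

bytes = [196, 148, 104, 105].each
read_str_via_pack(bytes, 2)   # => "Ĕ"
read_str_via_append(bytes, 2) # => "hi"
```

Both variants consume the same shared Enumerator, so the second call picks up where the first one stopped.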

eregon · 2014-04-30T15:12:43Z

It is not correct to use the U directive for this problem, as suggested in #8.

"Ĕ".unpack("U*") # => [276]

276 is not representable as a single byte: it is actually the Unicode codepoint of the first (and only) character of that string, not one of its UTF-8 bytes.
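For comparison, the C directive (unsigned 8-bit integer) and String#bytes both give the byte view, which is what a msgpack str payload consists of:

```ruby
s = "Ĕ" # U+0114, encoded in UTF-8 as the two bytes 0xC4 0x94

s.unpack("U*") # => [276]       one codepoint per character
s.unpack("C*") # => [196, 148]  one unsigned byte per element
s.bytes        # => [196, 148]  same as unpack("C*")
```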

practicingruby · 2014-04-30T15:42:09Z

@eregon: Hmm... I didn't realize that's what it would do. I had assumed it would handle multibyte sequences rather than converting them to codepoints, but that's clearly not the case.

So what's the right way to solve this?

practicingruby · 2014-04-30T16:18:35Z

@eregon:

When going from MessagePack to Ruby, it appears that we can use force_encoding, which should be safe because MessagePack requires UTF-8 text. Can you confirm, or recommend a better solution?

>> "Ĕ".bytes
=> [196, 148]
>> x = "".b
=> ""
>> x << 196
=> "\xC4"
>> x << 148
=> "\xC4\x94"
>> x.encode("UTF-8")
Encoding::UndefinedConversionError: "\xC4" from ASCII-8BIT to UTF-8
    from (irb):44:in `encode'
    from (irb):44
    from /Users/seacreature/.rubies/ruby-2.1.1/bin/irb:11:in `<main>'
>> x.force_encoding("UTF-8")
=> "Ĕ"

eregon · 2014-04-30T19:57:38Z

Indeed, force_encoding is the way to set the encoding of a String, and the only way to read byte by byte correctly is to collect the bytes into a binary String. One could use String#valid_encoding? to check whether the result is actually valid UTF-8.
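Putting those two pieces together, a decoding step could look roughly like this. The helper name utf8_from_bytes is hypothetical, and raising ArgumentError on bad input is just one possible policy (replacing invalid bytes would be another):

```ruby
# Hypothetical helper: turn raw bytes (Integers 0..255) into a UTF-8 String,
# refusing byte sequences that are not valid UTF-8.
def utf8_from_bytes(byte_array)
  # pack("C*") builds a binary String; force_encoding only re-tags it
  str = byte_array.pack("C*").force_encoding(Encoding::UTF_8)
  raise ArgumentError, "invalid UTF-8 in str payload: #{str.inspect}" unless str.valid_encoding?
  str
end

utf8_from_bytes([196, 148]) # => "Ĕ"

begin
  utf8_from_bytes([196]) # a lone 0xC4 is a truncated two-byte sequence
rescue ArgumentError => e
  puts e.message
end
```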

When going to bytes, the obvious choice is String#bytes (especially since it returns an Array), but String#unpack('C*') should be equivalent. Of course, if the String is to be decoded as UTF-8, it should first be encoded to UTF-8 with encode('UTF-8').
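A sketch of the packing direction under that approach (the helper name utf8_bytes is hypothetical). It also shows the case mentioned earlier where conversion is not possible: binary data with high bytes has no defined UTF-8 mapping, so encode raises instead of guessing:

```ruby
# Hypothetical helper: the bytes to write for a msgpack str payload,
# after transcoding the String to UTF-8 (a no-op if it already is UTF-8).
def utf8_bytes(str)
  str.encode(Encoding::UTF_8).bytes
end

latin1 = "Été".encode(Encoding::ISO_8859_1)
latin1.bytes                          # => [201, 116, 233]
utf8_bytes(latin1)                    # => [195, 137, 116, 195, 169]
latin1.encode("UTF-8").unpack("C*")   # => [195, 137, 116, 195, 169]

begin
  utf8_bytes("\xFF".b) # ASCII-8BIT byte 0xFF has no UTF-8 equivalent
rescue Encoding::UndefinedConversionError => e
  puts e.message
end
```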

practicingruby · 2014-04-30T20:06:56Z

OK, I think this is how I'm going to proceed in my own solution, then. Thanks!

As for the rather annoying problem of getting the next N elements from an Enumerator, I haven't been able to come up with a better solution. I had hoped enum.next(N) existed in Ruby, but sadly it does not seem to. I wonder if it's worth suggesting.

eregon · 2014-04-30T20:12:09Z

Yeah, I also expected enum.next(N) to work. take(N) sort of works on an IO, since #each consumes the underlying stream, but I have no idea how to do that for a normal Enumerator. It is probably worth discussing if it has not been discussed already.

practicingruby · 2014-04-30T20:31:09Z

I also tried take(N) without luck. I think I understand the source of the problem. With an I/O object, an external counter is maintained: the position of the read pointer in the file. No such thing exists for arrays or other collections, as iterating over them is non-destructive.
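The difference is easy to see side by side. This sketch uses StringIO as a stand-in for a real file, on the assumption that it behaves the same way here (each_byte advances the read position as it yields):

```ruby
require "stringio"

# Array-backed Enumerator: every internal iteration (take, each, map...)
# starts again from the first element.
array_enum = [1, 3, 5, 7].each
array_enum.take(2) # => [1, 3]
array_enum.take(2) # => [1, 3]

# IO-backed Enumerator: iteration consumes the stream, so the read position
# acts as the external counter and take resumes from it.
io_enum = StringIO.new("abcd").each_byte
io_enum.take(2) # => [97, 98]
io_enum.take(2) # => [99, 100]

# External iteration with #next keeps its own position, independent of take.
array_enum.next # => 1
array_enum.next # => 3
```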

Here's a hack I built that works around the problem:

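class PersistentEnumerable
  def initialize(collection)
    @collection = collection
  end

  # Each call returns a fresh Enumerator whose position (counter) lives in
  # this closure, so successive take(N) calls resume where the last one stopped.
  def each
    counter = -1

    Enumerator.new do |y|
      loop do
        counter += 1

        # >= rather than ==: once exhausted, later calls stop immediately
        # instead of yielding nil forever
        break if counter >= @collection.length

        y.yield(@collection[counter])
      end
    end
  end
end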


pe = PersistentEnumerable.new([1,3,5,7,9,11])

enum = pe.each 

p enum.take(2) #=> [1, 3]
p enum.take(2) #=> [5, 7]

# create a new closure
enum = pe.each

p enum.take(2) #=> [1, 3]

Relying on storing state in closures is a little scary, but it works!

practicingruby · 2014-04-30T20:34:46Z

Note: my original PersistentEnumerable example had mistakes. It should work now (for some definition of "work" 😁).

eregon · 2014-05-01T10:31:16Z

Neat, quite similar to what I was thinking!
I would say relying on @collection.length not changing is more dangerous than closure state.
If you do not mind another level of Enumerator, it can be defined for every Enumerator as well:

class Enumerator
  def to_persistent
    enum = rewind
    Enumerator.new do |y|
      loop { y << enum.next }
    end
  end
end
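Applied to the original problem, to_persistent turns any byte Enumerator into something take(N) can consume piece by piece. A sketch, reusing the Enumerator#to_persistent definition above and reading a hypothetical stream made of a fixstr followed by a positive fixint:

```ruby
class Enumerator
  def to_persistent
    enum = rewind # rewind resets external iteration and returns self
    Enumerator.new do |y|
      loop { y << enum.next } # loop ends quietly on StopIteration
    end
  end
end

# fixstr header 0xa2 (length 2), the UTF-8 bytes of "Ĕ", then fixint 7
stream = [0xa2, 196, 148, 0x07].each.to_persistent

header = stream.take(1).first
length = header & 0x1f # low 5 bits of a fixstr header
str    = stream.take(length).pack("C*").force_encoding(Encoding::UTF_8)
rest   = stream.take(10) # stops early once the source is exhausted

str  # => "Ĕ"
rest # => [7]
```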

eregon added 3 commits April 21, 2014 11:19

implement pack

a8aa7c7

implement unpack

801deca

implement an extended type for Symbol

78ded0f

eregon mentioned this pull request Apr 21, 2014

A quite extensive implementation of msgpack #7

Closed

practicingruby added the solved label Apr 21, 2014

eregon changed the title ~~An minimal implementation of part 2 : msgpack~~ A minimal implementation of part 2 : msgpack Apr 30, 2014

encode strings to be dumped in UTF-8

fde8d5a

* UTF-8 is the chosen encoding for the 'str' type

practicingruby mentioned this pull request Apr 30, 2014

Part2 solutions (code only) #8

Open

eregon closed this Nov 24, 2019
