Switch wasm emission to a custom encoder #30

alexcrichton · 2019-01-25T19:18:32Z

This commit moves emission of the wasm module away from the
`parity-wasm` crate and onto custom code within this crate.
As with the switch to parsing with `wasmparser`, the motivation is twofold:

* First, we want the ability to record the binary offsets at which
  functions and instructions are located. This will eventually allow us
  to encode DWARF debug information.
* Second, this avoids a "lowering to a different IR" problem: we
  will be able to implement more efficient emission than if we go to
  `parity-wasm` first.

Ideally this would all be separated into an external crate and/or maybe
even share `wasmparser` types or something like that, but for now it
should be easy enough to keep it inline, and with the spec tests we
can have a pretty high degree of confidence that it's not full of bugs
at least.

Some other changes included here are:

* Functions are now serialized in parallel
* The handling of mapping a local id to an index is now done in a
  per-function fashion rather than through `IdsToIndices`. This way the
  maps can be built in parallel and then aggregated at the end into the
  one global map serially.
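
For readers who want a concrete picture of the "build per-function maps in parallel, aggregate serially" idea, here is a minimal sketch. It is not the code from this PR: `Func`, `LocalId`, `encode_function`, and `encode_all` are hypothetical names, the "encoding" is a placeholder, and it uses `std::thread::scope` from the standard library purely to stay dependency-free (one thread per function would be impractical in a real encoder, where a thread pool is the more likely choice).

```rust
use std::collections::HashMap;
use std::thread;

// Hypothetical stand-ins for the crate's real id types.
type LocalId = u32;
type FuncId = usize;

// Hypothetical function representation: just the locals it uses.
struct Func {
    locals: Vec<LocalId>,
}

// Encode one function and build its own local-id -> index map.
// Nothing here touches shared state, so calls can run concurrently.
fn encode_function(func: &Func) -> (Vec<u8>, HashMap<LocalId, u32>) {
    let mut map = HashMap::new();
    let mut bytes = Vec::new();
    for (idx, &id) in func.locals.iter().enumerate() {
        map.insert(id, idx as u32);
        bytes.push(idx as u8); // placeholder for real instruction bytes
    }
    (bytes, map)
}

// Run the per-function work in parallel, then merge the maps serially.
fn encode_all(funcs: &[Func]) -> (Vec<Vec<u8>>, HashMap<(FuncId, LocalId), u32>) {
    let results: Vec<(Vec<u8>, HashMap<LocalId, u32>)> = thread::scope(|s| {
        // Spawn every worker first so they actually overlap...
        let handles: Vec<_> = funcs
            .iter()
            .map(|f| s.spawn(move || encode_function(f)))
            .collect();
        // ...then join them in order, preserving function order.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    });

    // Serial aggregation into the one global map.
    let mut global = HashMap::new();
    let mut bodies = Vec::with_capacity(results.len());
    for (func_id, (bytes, map)) in results.into_iter().enumerate() {
        for (local, idx) in map {
            global.insert((func_id, local), idx);
        }
        bodies.push(bytes);
    }
    (bodies, global)
}
```

Because each worker owns its map, no locking is needed during encoding; the only serial step is the final merge.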

fitzgen

Very nice!

fitzgen · 2019-01-25T21:11:44Z

src/encode.rs

+        let mut done = false;
+        while !done {
+            let mut byte = (val as i8) & 0x7f;
+            val >>= 7;


So the leb128 crate has an existing writer that we could use here, and (at least the reader part of) that crate has had a good amount of optimization work done. Would be worth investigating at minimum.

https://github.com/gimli-rs/leb128

I actually initially tried to use that, but I ran into one snag. When this implementation encodes sections it reserves 5 bytes for the section length, encodes the section, and then goes back and fills in the size of the section. That may not be the best idea in general, but the leb128 crate didn't have a way (I think?) to encode an integer to the maximum width, generating unnecessary trailer bytes.

Do you think we should abandon this strategy entirely and just slosh around buffers? (I haven't benchmarked anything yet). Or do you think we should add that functionality to leb128 crate?

Yeah let's just work with this for now

Ok sure thing, I've filed an issue at gimli-rs/leb128#6 to track it for now, and I'll put a FIXME in the code to stop hardcoding.
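
The strategy discussed in this thread (reserve 5 bytes for the section length, encode the section, then go back and fill in the size) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `src/encode.rs`: `encode_u32_padded`, `emit_section`, and `decode_u32` are hypothetical helpers. It relies on the fact that a `u32` needs at most 5 LEB128 bytes and that LEB128 permits non-minimal encodings, where leading bytes keep their continuation bit set and the padding carries zero payload.

```rust
/// Encode `val` as unsigned LEB128 padded to exactly 5 bytes
/// (the maximum width of a u32), rather than the minimal width.
fn encode_u32_padded(val: u32) -> [u8; 5] {
    let mut out = [0u8; 5];
    let mut v = val;
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = (v & 0x7f) as u8;
        v >>= 7;
        if i < 4 {
            *byte |= 0x80; // continuation bit on all but the last byte
        }
    }
    out
}

/// Emit a section: id, 5-byte length placeholder, body, then back-patch
/// the placeholder once the body size is known.
fn emit_section(dst: &mut Vec<u8>, id: u8, body: impl FnOnce(&mut Vec<u8>)) {
    dst.push(id);
    let len_pos = dst.len();
    dst.extend_from_slice(&[0u8; 5]); // reserve space for the length
    let start = dst.len();
    body(dst);
    let len = (dst.len() - start) as u32;
    dst[len_pos..len_pos + 5].copy_from_slice(&encode_u32_padded(len));
}

/// Plain unsigned LEB128 decoder, used only to check the round trip.
/// Returns the value and the number of bytes consumed.
fn decode_u32(bytes: &[u8]) -> (u32, usize) {
    let mut result = 0u32;
    for (i, b) in bytes.iter().take(5).enumerate() {
        result |= ((b & 0x7f) as u32) << (7 * i);
        if b & 0x80 == 0 {
            return (result, i + 1);
        }
    }
    panic!("unterminated or overlong LEB128 value");
}
```

The trade-off is the one raised above: every section costs up to 4 extra bytes, in exchange for not having to encode the body into a temporary buffer first.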



alexcrichton force-pushed the manually-serialize branch from 8ec9613 to cf91ebd Compare January 25, 2019 19:43

fitzgen approved these changes Jan 25, 2019


alexcrichton added 2 commits January 28, 2019 08:43

Avoid collecting used locals twice

125d2c0

Turns out we forgot that `Used` was already collecting used locals, so there's no need to do it again in `emit_locals`!

alexcrichton force-pushed the manually-serialize branch from 206d6f9 to 125d2c0 Compare January 28, 2019 16:43

alexcrichton mentioned this pull request Jan 28, 2019

Support maximal-length encoded integers gimli-rs/leb128#6

Open

alexcrichton added 2 commits January 28, 2019 08:47

Fix a... spuriously (?) failing test

44d04ba

Use the leb128 crate where we can

e16a6c1

alexcrichton merged commit ce985f8 into rustwasm:master Jan 28, 2019

alexcrichton deleted the manually-serialize branch January 28, 2019 16:51


alexcrichton commented Jan 25, 2019

fitzgen left a comment

fitzgen Jan 25, 2019

fitzgen Jan 25, 2019

alexcrichton Jan 25, 2019

fitzgen Jan 26, 2019

alexcrichton Jan 28, 2019
