Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch wasm emission to a custom encoder #30

Merged
merged 4 commits into from
Jan 28, 2019

Conversation

alexcrichton
Copy link
Collaborator

This commit moves emission of the wasm module away from the
parity-wasm crate to instead using custom code within this crate.
Similar to parsing with wasmparser, this is motivated twofold:

  • First, we want the ability to record binary offsets of where functions
    and instructions are located. This allows us encode dwarf debug
    information eventually.

  • Second, this avoids a "lowering to a different IR" problem where we
    will be able to implement more efficient emission than if we go to
    parity-wasm first.

Ideally this would all be separated to an external crate and/or maybe
even sharing wasmparser types or something like that, but for now it
should be relatively easy enough to inline it and with the spec tests we
can have a pretty high degree of confidence it's not full of bugs at
least.

Some other changes included here are:

  • Functions are now serialized in parallel
  • The handling of mapping a local id to an index is now done in a
    per-function fashion rather than through IdsToIndices. This way the
    maps can be built in parallel and then aggregated at the end into the
    one global map serially.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

src/encode.rs Outdated
let mut done = false;
while !done {
let mut byte = (val as i8) & 0x7f;
val >>= 7;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the leb128 crate has an existing writer that we could use here, and (at least the reader part of) that crate has had a good amount of optimization work done. Would be worth investigating at minimum.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually initially tried to use that, but I ran into one snag. When this implementation encodes sections it reserves 5 bytes for the section length, encodes the section, and then goes back and fills in the size of the section. That may not be the best idea in general, but the leb128 crate didn't have a way (I think?) to encode an integer to the maximum width, generating unnecessary trailer bytes.

Do you think we should abandon this strategy entirely and just slosh around buffers? (I haven't benchmarked anything yet). Or do you think we should add that functionality to leb128 crate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah let's just work with this for now

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sure thing, I've filed an issue at gimli-rs/leb128#6 to track it for now, and I'll put a FIXME in the code to stop hardcoding.

This commit moves emission of the wasm module away from the
`parity-wasm` crate to instead using custom code within this crate.
Similar to parsing with `wasmparser`, this is motivated twofold:

* First, we want the ability to record binary offsets of where functions
  and instructions are located. This allows us encode dwarf debug
  information eventually.

* Second, this avoids a "lowering to a different IR" problem where we
  will be able to implement more efficient emission than if we go to
  parity-wasm first.

Ideally this would all be separated to an external crate and/or maybe
even sharing `wasmparser` types or something like that, but for now it
should be relatively easy enough to inline it and with the spec tests we
can have a pretty high degree of confidence it's not full of bugs at
least.

Some other changes included here are:

* Functions are now serialized in parallel
* The handling of mapping a local id to an index is now done in a
  per-function fashion rather than through `IdsToIndices`. This way the
  maps can be built in parallel and then aggregated at the end into the
  one global map serially.
Turns out we forgot that `Used` was already collecting used locals, so
there's no need to do it again in `emit_locals`!
@alexcrichton alexcrichton merged commit ce985f8 into rustwasm:master Jan 28, 2019
@alexcrichton alexcrichton deleted the manually-serialize branch January 28, 2019 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants