w2c_env and w2c_wasi: what are they? #2289

Closed
kiancross opened this issue Aug 31, 2023 · 6 comments

@kiancross

I have the following C file, which I have compiled into a .wasm module, and then translated to C using wasm2c:

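```c
#include <stdio.h>
#include <unistd.h>
#include <emscripten.h>

// Under emscripten, printf writes its output via WASI's fd_write.
int EMSCRIPTEN_KEEPALIVE unsafe_printf(void) {
  printf("Hello World\n");
  return 0;
}

// unlink ends up calling emscripten's __syscall_unlinkat (see the link errors below).
int EMSCRIPTEN_KEEPALIVE unsafe_unlink(void) {
  unlink("/tmp/thisshouldnotexistatall");
  return 0;
}
```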

When linking, I get the following errors:

```
ld: error: undefined symbol: w2c_env_0x5F_syscall_unlinkat
>>> referenced by unsafe.wasm.c
>>>               unsafe.wasm.o:(w2c_unsafe_unsafe_unlink)

ld: error: undefined symbol: w2c_wasi__snapshot__preview1_fd_write
>>> referenced by unsafe.wasm.c
>>>               unsafe.wasm.o:(w2c_unsafe_f7)
>>> referenced by unsafe.wasm.c
>>>               unsafe.wasm.o:(w2c_unsafe_f7)
clang-13: error: linker command failed with exit code 1 (use -v to see invocation)
gmake: *** [Makefile:24: programme] Error 1
```

It's my understanding that the w2c_wasi__snapshot__preview1_fd_write error is because I need to provide an implementation of the WASI API. What does the __snapshot__preview1 part of the symbol name mean?

And what about w2c_env_0x5F_syscall_unlinkat? Why is this not translated into the path_unlink_file WASI call?

Furthermore, the signature for the instantiate function is as follows:

```c
void wasm2c_unsafe_instantiate(w2c_unsafe*, struct w2c_env*, struct w2c_wasi__snapshot__preview1*);
```

What are struct w2c_env and struct w2c_wasi__snapshot__preview1? How are they initialised? Are they documented anywhere?

@keithw
Member

keithw commented Aug 31, 2023

It's my understanding that the w2c_wasi__snapshot__preview1_fd_write error is because I need to provide an implementation of the WASI API.

Yes, exactly. Here's an in-process implementation that you might find helpful to draw from: #2002
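As a rough illustration of what "providing an implementation" involves, here is a minimal sketch of a host-side fd_write that a wasm2c-translated module could link against. The function name and struct tag come from the link error above. The parameter list follows the WASI preview1 docs' wasm32 lowering (fd, address of a ciovec array, its length, address for the byte count), returning a WASI errno. The struct's fields are made up for this sketch: it assumes the host has stored a pointer to the module's linear memory and its size there (in a real embedding, taken from the module's exported memory after instantiation). wasm2c's generated headers use their own u32 typedef where this uses uint32_t.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical host-side state for the "wasi_snapshot_preview1" import.
   The module never looks inside; it only passes this pointer back. */
struct w2c_wasi__snapshot__preview1 {
  uint8_t* memory;      /* base of the module's linear memory (assumed) */
  uint64_t memory_size; /* its size in bytes (at most 4 GiB for wasm32) */
};

/* WASI preview1 errno values used below. */
enum { WASI_ESUCCESS = 0, WASI_EBADF = 8, WASI_EFAULT = 21 };

/* Bounds-checked little-endian u32 accessors for linear memory. */
static int load_u32(const struct w2c_wasi__snapshot__preview1* w,
                    uint32_t addr, uint32_t* out) {
  if ((uint64_t)addr + 4 > w->memory_size) return 0;
  const uint8_t* p = w->memory + addr;
  *out = (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 |
         (uint32_t)p[3] << 24;
  return 1;
}

static int store_u32(struct w2c_wasi__snapshot__preview1* w, uint32_t addr,
                     uint32_t value) {
  if ((uint64_t)addr + 4 > w->memory_size) return 0;
  uint8_t* p = w->memory + addr;
  p[0] = (uint8_t)value;
  p[1] = (uint8_t)(value >> 8);
  p[2] = (uint8_t)(value >> 16);
  p[3] = (uint8_t)(value >> 24);
  return 1;
}

/* fd_write(fd, iovs, iovs_len, nwritten) -> errno.
   Each ciovec is 8 bytes: a u32 buffer address followed by a u32 length. */
uint32_t w2c_wasi__snapshot__preview1_fd_write(
    struct w2c_wasi__snapshot__preview1* wasi, uint32_t fd, uint32_t iovs,
    uint32_t iovs_len, uint32_t nwritten_ptr) {
  FILE* out;
  if (fd == 1) out = stdout;
  else if (fd == 2) out = stderr;
  else return WASI_EBADF; /* this sketch only knows stdout and stderr */

  uint32_t total = 0;
  for (uint32_t i = 0; i < iovs_len; i++) {
    uint64_t entry = (uint64_t)iovs + (uint64_t)i * 8;
    uint32_t buf, len;
    if (entry + 8 > wasi->memory_size) return WASI_EFAULT;
    if (!load_u32(wasi, (uint32_t)entry, &buf) ||
        !load_u32(wasi, (uint32_t)entry + 4, &len))
      return WASI_EFAULT;
    if ((uint64_t)buf + len > wasi->memory_size) return WASI_EFAULT;
    total += (uint32_t)fwrite(wasi->memory + buf, 1, len, out);
  }
  if (!store_u32(wasi, nwritten_ptr, total)) return WASI_EFAULT;
  return WASI_ESUCCESS;
}
```

Every address coming from the module is an offset into its linear memory and is checked before use, since the host function runs outside the sandbox. Because memory.grow can resize (and move) the memory, a real implementation would probably look up the current base and size on each call rather than caching them as this sketch does.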

What does the __snapshot__preview1 aspect mean in the function call name?

wasi_snapshot_preview1 is the full name of the API that emscripten is producing code against (https://github.com/WebAssembly/WASI/blob/main/legacy/preview1/docs.md#-wasi_snapshot_preview1). It's the "preview1" version of WASI. (I understand the plan is to have a "preview2" version in the coming months.)
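The doubled underscores are consistent with wasm2c building each import symbol as w2c_, then the import's module name with every underscore doubled, then a single underscore and the field name. The sketch below reproduces only that part of the scheme; it is inferred from the symbols in the link errors in this thread rather than taken from wasm2c's source, and the function name import_symbol is hypothetical. Field names that need escaping (the leading underscore of __syscall_unlinkat came out as 0x5F) are not handled.

```c
#include <stddef.h>

/* Appends one character, keeping room for the final '\0'. */
static int put_char(char* out, size_t out_size, size_t* n, char c) {
  if (*n + 1 >= out_size) return -1;
  out[(*n)++] = c;
  return 0;
}

static int put_str(char* out, size_t out_size, size_t* n, const char* s) {
  for (; *s; s++)
    if (put_char(out, out_size, n, *s) != 0) return -1;
  return 0;
}

/* Builds "w2c_" + module (each '_' doubled) + "_" + field into out.
   Returns 0 on success, -1 if out is too small.
   Assumption: field is a plain identifier with no leading underscore. */
int import_symbol(const char* module, const char* field, char* out,
                  size_t out_size) {
  size_t n = 0;
  if (put_str(out, out_size, &n, "w2c_") != 0) return -1;
  for (const char* p = module; *p; p++) {
    /* Doubling presumably keeps the single '_' separator unambiguous:
       module "a_b" + field "c" -> w2c_a__b_c, module "a" + "b_c" -> w2c_a_b_c. */
    if (*p == '_' && put_char(out, out_size, &n, '_') != 0) return -1;
    if (put_char(out, out_size, &n, *p) != 0) return -1;
  }
  if (put_char(out, out_size, &n, '_') != 0) return -1;
  if (put_str(out, out_size, &n, field) != 0) return -1;
  out[n] = '\0';
  return 0;
}
```

For module "wasi_snapshot_preview1" and field "fd_write" this yields w2c_wasi__snapshot__preview1_fd_write, the symbol the linker was looking for; the same doubled module name appears in the struct tag struct w2c_wasi__snapshot__preview1.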

And what about w2c_env_0x5F_syscall_unlinkat? Why is this not translated into the path_unlink_file WASI call?

It looks like emscripten's current implementation of the unlink function is written in terms of emscripten's own filesystem API and not in terms of the WASI API, so it's showing up as a symbol in LLVM's global C namespace ("env"). I think you'd (probably) get different results compiling with wasi-sdk. Either way, though, the host will need to provide an implementation of this function.

What is struct w2c_env and struct w2c_wasi__snapshot__preview1? How are these initialised? Are they documented anywhere?

These are opaque instance pointers whose contents are up to the implementer of the WASI API (and in this case, also the "env" API that provides unlinkat). They represent the imported modules; the host has to provide these pointers when constructing an instance of your module, and then the module provides the same pointer back when calling a method from the corresponding API. It's basically like the this pointer when calling a method in C++.

Here's an example where the host defines the host API and the w2c_host structure, and then gives a pointer to the module so it can call functions from the host API: https://github.com/WebAssembly/wabt/blob/main/wasm2c/examples/rot13/main.c
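To make the "this pointer" analogy concrete, here is a self-contained C sketch that hand-writes a stand-in for what the generated code does: the instantiate function stores the host's pointers in the module instance, and each call to an import passes the stored pointer back. Everything below is simplified and hypothetical. The real w2c_unsafe has many more fields with different names, the real unlinkat import receives a u32 offset into linear memory rather than a C string, and the deny-by-default policy and the -1 return value are illustrative choices, not emscripten's syscall convention.

```c
#include <stdint.h>
#include <stdio.h>

/* ---- Host side: the host decides what these structs contain. ---- */
struct w2c_env {
  int allow_unlink;      /* hypothetical sandbox policy */
  unsigned denied_calls; /* how many unlink requests were refused */
};

struct w2c_wasi__snapshot__preview1 {
  int unused; /* WASI state (open fds, preopens, ...) would live here */
};

/* Host implementation of the "env" import (simplified signature). */
int32_t w2c_env_0x5F_syscall_unlinkat(struct w2c_env* env, int32_t dirfd,
                                      const char* path, int32_t flags) {
  (void)dirfd;
  (void)flags;
  if (!env->allow_unlink) {
    env->denied_calls++;
    return -1; /* illustrative error value */
  }
  printf("host would unlink %s here\n", path);
  return 0;
}

/* ---- Hand-written stand-in for the wasm2c output. ---- */
typedef struct w2c_unsafe {
  struct w2c_env* env_instance;
  struct w2c_wasi__snapshot__preview1* wasi_instance;
} w2c_unsafe;

void wasm2c_unsafe_instantiate(w2c_unsafe* instance, struct w2c_env* env,
                               struct w2c_wasi__snapshot__preview1* wasi) {
  /* The module only remembers the pointers; it never looks inside them. */
  instance->env_instance = env;
  instance->wasi_instance = wasi;
}

int32_t w2c_unsafe_unsafe_unlink(w2c_unsafe* instance) {
  /* The import call hands the stored pointer back, like `this` in C++.
     The dirfd and flags arguments are placeholders in this sketch. */
  w2c_env_0x5F_syscall_unlinkat(instance->env_instance, 0,
                                "/tmp/thisshouldnotexistatall", 0);
  return 0; /* like the original C, the result of unlink is ignored */
}
```

Because the pointer is per instance, two instances of the same module can be given different env structs (one allowed to unlink, one not) without any global state in the host.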

@kiancross
Author

Thanks a lot for the reply @keithw. Your answers clarify most aspects of my questions.

It looks like emscripten's current implementation of the unlink function is written in terms of emscripten's own filesystem API and not in terms of the WASI API, so it's showing up as a symbol in LLVM's global C namespace ("env"). I think you'd (probably) get different results compiling with wasi-sdk. Either way, though, the host will need to provide an implementation of this function.

Indeed, compiling with wasi-sdk fixed this (now using path_unlink_file).

Am I understanding the following process correctly:

  1. A WebAssembly compiler takes a C file and compiles it into a WebAssembly (object?) module.
  2. If this C code uses libc, there will be various external symbols (e.g., open, close, unlinkat).
  3. The wasi-sdk links against wasi-libc, which provides an implementation of these functions in terms of the WASI API, whereas the Emscripten libc provides an implementation of these functions as a mixture of WASI API and Emscripten API calls.
  4. The runtime (e.g., Node, Wasmtime, web browser, wasm2c) must provide an implementation of any of these API functions, if they are used.
  5. wasm2c: uvwasi support #2002 is a work-in-progress implementation of the WASI-API for the wasm2c runtime?

Out of interest, does WebAssembly differentiate (nominally or otherwise) between 'object' code and 'executable' modules?

It's my understanding that the w2c_wasi__snapshot__preview1_fd_write error is because I need to provide an implementation of the WASI API.

Yes, exactly. Here's an in-process implementation that you might find helpful to draw from: #2002

Thanks for the pointer. Does this imply that previous/current sandboxing, which has utilised wasm2c (e.g., on Firefox), relies on the sandboxed code not needing to interact with the system (i.e., being computation only, rather than making system calls)?

@keithw
Member

keithw commented Sep 4, 2023

Am I understanding the following process correctly:

1. A WebAssembly compiler takes a C file and compiles it into a WebAssembly (object?) module.

✅ (I think most compilers are using the LLVM wasm backend to generate the Wasm...)

2. If this C code uses `libc`, there will be various external symbols (e.g., `open`, `close`, `unlinkat`).

3. The `wasi-sdk` links against `wasi-libc`, which provides an implementation of these functions in terms of the WASI API, whereas the Emscripten `libc` provides an implementation of these functions as a mixture of WASI API and Emscripten API calls.

4. The runtime (e.g., Node, Wasmtime, web browser, wasm2c) must provide an implementation of any of these API functions, if they are used.

Here I would distinguish between the WebAssembly "runtime" (which for wasm2c is extremely minimal: https://github.com/WebAssembly/wabt/blob/main/wasm2c/README.md#symbols-that-must-be-defined-by-the-embedder) vs. the "host," who provides the imported host functions.

In wasm2c, the Wasm module is transformed into generated C code that expects to link with

  1. an implementation of the runtime API (we provide one implementation here: https://github.com/WebAssembly/wabt/blob/main/wasm2c/wasm-rt-impl.c)
  2. implementations of the imported functions, which may be provided either by
    a. another Wasm module (also run through wasm2c), or
    b. the host
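Case (a) can be pictured with the sketch below, where a hypothetical module "app" imports add from a hypothetical module "math", and both sides are hand-written stand-ins for wasm2c output. It assumes, as the w2c_env and w2c_wasi__snapshot__preview1 names in this thread suggest, that an import from module "math" is declared against struct w2c_math*, which is the same type as the exporting module's own instance. Under that assumption the exporter's instance pointer can be passed straight to the importer's instantiate function with no host glue; the exporter's own instantiate call is omitted for brevity.

```c
#include <stdint.h>

/* ---- Exporting module "math" (stand-in for its wasm2c output). ---- */
typedef struct w2c_math {
  uint32_t calls; /* some per-instance state */
} w2c_math;

uint32_t w2c_math_add(w2c_math* instance, uint32_t a, uint32_t b) {
  instance->calls++;
  return a + b; /* i32.add wraps modulo 2^32, as unsigned C addition does */
}

/* ---- Importing module "app" (stand-in for its wasm2c output). ----
   Its import of math.add is declared against struct w2c_math*,
   the same type as the exporter's instance. */
typedef struct w2c_app {
  struct w2c_math* math_instance;
} w2c_app;

void wasm2c_app_instantiate(w2c_app* instance, struct w2c_math* math) {
  instance->math_instance = math;
}

uint32_t w2c_app_run(w2c_app* instance) {
  /* Calls the other module's export directly, passing its instance back. */
  return w2c_math_add(instance->math_instance, 40, 2);
}
```

For case (b), the host would instead define struct w2c_math and w2c_math_add itself; the importing module's code is the same either way.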
5. [wasm2c: uvwasi support #2002](https://github.com/WebAssembly/wabt/pull/2002) is a work-in-progress implementation of the WASI-API for the wasm2c runtime?

Yeah, I'd say it's a WIP implementation of the WASI API as host functions that will work with Wasm modules run through wasm2c. You could also imagine implementing the WASI API as another Wasm module and running that through wasm2c (but ultimately there needs to be some host API that gets called if you want to interact with the real world).

Out of interest, does web assembly differentiate (nominally or otherwise) between 'object' code and 'executable' modules?

Yes, "objects" carry custom sections for linking/relocations that are used by the LLVM linker. (The linker doesn't have to parse the code section -- it just applies relocations.)
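One way to tell the two apart is to look for those custom sections. The sketch below walks the top-level sections of a Wasm binary and reports whether it has a custom section named "linking", which the LLVM object-file conventions (the WebAssembly tool-conventions Linking.md document) use for symbol and segment metadata; relocations live in separate "reloc.*" custom sections. A final linked module normally lacks these. The function names are hypothetical, and the parser only decodes the header, section ids, section sizes, and custom-section names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decodes an unsigned LEB128 value; returns bytes consumed, 0 on error. */
static size_t read_uleb32(const uint8_t* p, size_t avail, uint32_t* out) {
  uint32_t result = 0;
  for (size_t i = 0; i < avail && i < 5; i++) {
    result |= (uint32_t)(p[i] & 0x7f) << (7 * i);
    if ((p[i] & 0x80) == 0) {
      *out = result;
      return i + 1;
    }
  }
  return 0;
}

/* Returns 1 if the binary has a "linking" custom section (an object file),
   0 if it does not, and -1 if it is not a parseable Wasm binary. */
int wasm_is_object(const uint8_t* bytes, size_t len) {
  static const uint8_t header[8] = {0x00, 0x61, 0x73, 0x6d, 1, 0, 0, 0};
  if (len < 8 || memcmp(bytes, header, 8) != 0) return -1;
  size_t pos = 8;
  while (pos < len) {
    uint8_t id = bytes[pos++]; /* section id */
    uint32_t size;
    size_t n = read_uleb32(bytes + pos, len - pos, &size);
    if (n == 0) return -1;
    pos += n;
    if (size > len - pos) return -1;
    if (id == 0) { /* custom section: payload starts with a name */
      uint32_t name_len;
      size_t m = read_uleb32(bytes + pos, size, &name_len);
      if (m == 0 || name_len > size - m) return -1;
      if (name_len == 7 && memcmp(bytes + pos + m, "linking", 7) == 0)
        return 1;
    }
    pos += size; /* skip the payload; code bytes are never parsed */
  }
  return 0;
}
```

As with the linker, the code section is skipped rather than decoded; only section boundaries and custom-section names are needed.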

Thanks for the pointer. Does this imply that previous/current sandboxing, which has utilised wasm2c (e.g., on Firefox), relies on the sandboxed code not needing to interact with the system (i.e., being computation only, rather than making system calls)?

I'm not 100% sure -- #2002 is from the same team that's been responsible for the wasm2c work in Firefox, so my understanding is that they are starting to need support for some WASI calls. But I think you're basically right; a lot of libraries do not need to make (many) system calls, so you can get pretty far with sandboxing without needing much support on that end. (Even many libc functions don't ultimately end up making a syscall, including malloc and free.) They have a lot of information here: https://rlbox.dev/

@keithw
Member

keithw commented Sep 6, 2023

Closing as answered, but happy to keep helping if we can.

keithw closed this as completed Sep 6, 2023
@kiancross
Author

Thanks again for your reply, @keithw, and apologies for my slow responses; it takes me some time to do additional reading/process your answers, with the hope that I don't then ask silly questions!

I am currently working on performance benchmarking various compartmentalisation/sandboxing mechanisms, hence trying to understand the various stages in the pipeline from source code to sandboxed executable. At each stage, there are quite a few options, which causes a small combinatorial explosion. It may be that all configurations perform relatively similarly, but I suppose various runtimes (and possibly even different host implementations of APIs, such as WASI) will have different performance characteristics.

This leads to my next question: what is the difference between wasm2c and w2c2? Are they simply competing tools, or do they serve different purposes/have different aims? Are there any significantly differing design decisions? I am interested, as turbolent/w2c2#1 seems to suggest differences in performance (runtime performance, binary size etc.). I appreciate that this is another tool, so you may not know!

@keithw
Member

keithw commented Sep 6, 2023

I think the biggest difference is the intention behind each tool.

  • wasm2c is implementing the WebAssembly spec, including the "sandboxing" (e.g., bounds-checking of memory and table accesses, type-safety of indirect calls, clean trap on stack exhaustion, etc.), and it traps deterministically on out-of-bounds accesses even when the result of the read/get doesn't affect the result of the function (and so could otherwise be optimized out). There's a performance cost to some of this. wasm2c passes 100% of the current Wasm testsuite even when the output goes through an optimizing compiler[1].
  • as I understand it, w2c2 is trying to take a "hopefully well-behaved" WebAssembly module and sort of "un-translate" it back to the C it might have come from and then run it, using the protections of the operating system to isolate the (possibly misbehaving) program from other processes. My understanding is that w2c2 doesn't try to follow the WebAssembly spec when it comes to trapping OOB accesses, making OOB accesses deterministic, handling stack exhaustion, runtime checks for indirect call types, etc.; the expectation is that the program is already sandboxed by the OS and that it's okay for misbehaving programs to be nondeterministic.
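A rough C sketch of the difference in semantics: a spec-following load checks the address against the memory size and traps deterministically, while an unchecked load just dereferences and leaves out-of-bounds behavior to the OS (or to C's undefined behavior). The trap is modeled with setjmp/longjmp, and the type and function names are hypothetical. This shows the semantics, not wasm2c's code: wasm2c's bounds-checking strategy is configurable and, on some 64-bit platforms, can rely on guard pages instead of an explicit comparison per access.

```c
#include <setjmp.h>
#include <stdint.h>
#include <string.h>

typedef struct {
  uint8_t* data;
  uint64_t size; /* current size in bytes */
} linear_memory; /* hypothetical stand-in for a Wasm memory */

static jmp_buf trap_handler; /* where a trap unwinds to */

/* Spec-style i32.load: an out-of-bounds access always traps, even if the
   loaded value would never be used. */
uint32_t load_i32_checked(const linear_memory* mem, uint32_t addr) {
  if ((uint64_t)addr + sizeof(uint32_t) > mem->size)
    longjmp(trap_handler, 1); /* deterministic trap */
  uint32_t value;
  /* Host byte order; Wasm memory is little-endian, so this matches it
     only on little-endian hosts. */
  memcpy(&value, mem->data + addr, sizeof value);
  return value;
}

/* "Trust the module" load: no check. An out-of-bounds address is undefined
   behavior in C: it may read garbage, crash, or be optimized away. */
uint32_t load_i32_unchecked(const linear_memory* mem, uint32_t addr) {
  uint32_t value;
  memcpy(&value, mem->data + addr, sizeof value);
  return value;
}

/* Runs a checked load under the trap handler; returns 1 and sets *out on
   success, or 0 if the access trapped. */
int try_load(const linear_memory* mem, uint32_t addr, uint32_t* out) {
  if (setjmp(trap_handler) != 0) return 0; /* arrived here via a trap */
  *out = load_i32_checked(mem, addr);
  return 1;
}
```

The checked version costs a comparison on every access (unless guard pages are used), which is part of the performance cost mentioned above; in exchange, a misbehaving module fails the same way every time.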

So if you're looking for a tool that creates "sandboxed executables," I don't think w2c2 is trying to do that unless you consider the OS's process isolation to be enough sandboxing. (But to be fair, we should probably ask the w2c2 folks -- I don't want to speak for them!)

Beyond that, I think the remaining differences are more minor. I think w2c2 used to be a lot faster than wasm2c on transpiling large modules, but wasm2c made a big speed improvement in #2171 and I would be surprised if there were still big differences. (Although we'd love to know if there are!) It looks like both wasm2c and w2c2 have "parallel output" modes that can output, e.g., 256 .c files that can all be compiled in parallel; this makes it practical to transpile things like clang or other gigantic modules and then compile the result to machine code.

It looks like wasm2c supports a broader list of Wasm features (multi-value, multi-memory, memory64, reference types, exception-handling, SIMD, extended-const, tail calls coming in #2272); on the other hand, w2c2 has built-in support for much of WASI preview1 and uses libdwarf to include debugging information in the C output (traced back to the debugging information in the Wasm input), which is a great feature.

Footnotes

  1. As of the last update. https://github.com/WebAssembly/wabt/pull/2287 catches us up to the current testsuite.
