wzprof
, pronounced as you think it should, is a pprof based profiler for
WebAssembly built on top of Wazero.
It offers the ability to collect CPU and Memory profiles during the execution of
WebAssembly modules.
If you are interested in taking a deep-dive into how wzprof
is built,
you might enjoy reading:
👉 Performance in the spotlight: WebAssembly profiling for everyone
We have presented this project at international conferences, such as:
- Civo Navigate Europe 2024
slides: https://github.com/iamrajiv/civo-navigate-europe-2024
- KubeCon + CloudNativeCon India 2024
slides: https://github.com/iamrajiv/kubecon-cloudnativecon-india-2024
WebAssembly runtimes typically allow profiling guest code via an external
profiler such as perf
, but in many cases the recording and analysis of
profiles remains a difficult task, especially due to features like JIT
compilation.
pprof
is the de-facto standard profiling tool for Go programs, and offers
some of the simplest and quickest ways to gather insight into the performance
of an application.
wzprof
aims to combine the capabilities and user experience of pprof
with a wazero.Runtime,
enabling the profiling of any application compiled to WebAssembly.
wzprof
mimics the approach and workflow popularized by Go pprof, and extends
it to collect profiles of WebAssembly programs compiled from any programming
language. The profiles produced are designed to be compatible with pprof,
allowing developers to use the classic go tool pprof
workflow to analyize
application performance.
- CPU: calls sampling and on-CPU time.
- Memory: allocations (see below).
- DWARF support (demangling, source-level profiling).
- Integrated pprof server.
- Library and CLI interfaces.
You can either use wzprof
as a CLI or as a library if you use the Wazero
runtime libraries.
To install the latest version of wzprof
:
go install github.com/stealthrocket/wzprof/cmd/wzprof@latest
To use the library as code in a Go program:
go get github.com/stealthrocket/wzprof@latest
By default, wzprof will sample calls with a ratio of 1/19. Sampling is used to
limit the overhead of the profilers but the default rate might not be suitable
in some cases.
For example, if your processes are short running and you don't see anything in the
profile, you might want to disable the sampling. To do so, use -sample 1
.
In those examples we set the sample rate to 1 to capture all samples because the test programs complete quickly.
wzprof -sample 1 -memprofile /tmp/profile ./testdata/c/simple.wasm
wzprof -sample 1 -cpuprofile /tmp/profile ./testdata/c/crunch_numbers.wasm
go tool pprof -http :4000 /tmp/profile
Similarly to net/http/pprof
, wzprof
can expose a pprof-compatible http endpoint on behalf of the guest application:
wzprof -pprof-addr :8080 ...
go tool pprof -http :3030 'http://localhost:8080/debug/pprof/profile?seconds=5'
go tool pprof -http :3030 'http://localhost:8080/debug/pprof/heap'
wzprof
Go APIs depend on Wazero's experimental
package which makes no
guarantees of backward compatilbity!
The following code snippet demonstrates how to integrate the profilers to a Wazero runtime within a Go program:
sampleRate := 1.0
p := wzprof.ProfilingFor(wasmCode)
cpu := p.CPUProfiler()
mem := p.MemoryProfiler()
ctx := context.WithValue(context.Background(),
experimental.FunctionListenerFactoryKey{},
experimental.MultiFunctionListenerFactory(
wzprof.Sample(sampleRate, cpu),
wzprof.Sample(sampleRate, mem),
),
)
runtime := wazero.NewRuntime(ctx)
defer runtime.Close(ctx)
compiledModule, err := runtime.CompileModule(ctx, wasmCode)
if err != nil {
log.Fatal("compiling wasm module:", err)
}
err = p.Prepare(compiledModule)
if err != nil {
return fmt.Errorf("preparing wasm module: %w", err)
}
// The CPU profiler collects records of module execution between two time
// points, the program drives where the profiler is active by calling
// StartProfile/StopProfile.
cpu.StartProfile()
moduleInstance, err := runtime.InstantiateModule(ctx, compiledModule,
wazero.NewModuleConfig(),
)
if err != nil {
log.Fatal("instantiating wasm module:", err)
}
if err := moduleInstance.Close(ctx); err != nil {
log.Fatal("closing wasm module:", err)
}
cpuProfile := cpu.StopProfile(sampleRate, symbols)
memProfile := mem.NewProfile(sampleRate, symbols)
if err := wzprof.WriteProfile("cpu.pprof", cpuProfile); err != nil {
log.Fatal("writing CPU profile:", err)
}
if err := wzprof.WriteProfile("mem.pprof", memProfile); err != nil {
log.Fatal("writing memory profile:", err)
}
Note that the program must spearate the compilation and instantiation of WebAssembly modules in order to use the profilers, because the module must be compiled first in order to build the list of symbols from the DWARF sections.
Memory profiling works by tracing specific functions. Supported functions are:
malloc
calloc
realloc
free
runtime.mallocgc
runtime.alloc
Feel free to open a pull request to support more memory-allocating functions!
wzprof
has two CPU profilers: CPU samples and CPU time.
The CPU samples profiler gives a repesentation of the guest execution by counting the number of time it sees a unique stack trace.
The CPU time profiler measures the actual time spent on-CPU without taking into account the off-CPU time (e.g waiting for I/O). For this profiler, all the host-functions are considered off-CPU.
wzprof runs some heuristics to assess what the guest module is running to adapt the way it symbolizes and walks the stack. In all other cases, it defaults to inspecting the wasm stack and uses DWARF information if present in the module.
If the guest has been compiled by golang/go 1.21+, wzprof inspects the memory to walk the Go stack, which provides full call stacks, instead of the shortened versions you would get without this support.
In addition, wzprof parses pclntab to perform symbolization. This is the same mechanism the Go runtime itself uses to display meaningful stack traces when a panic occurs.
If the guest is CPython 3.11 and has been compiled with debug symbols (such as timecraft's), wzprof walks the Python interpreter call stack, not the C stack it would otherwise report. This provides more meaningful profiling information on the script being executed.
At the moment it does not support merging the C extension calls into the Python interpreter stack.
Note that a current limitation of the implementation is that unloading or reloading modules may result in an incorrect profile. If that's a problem for you please file an issue in the github tracker.
As a fallback, if DWARF sections are available, wzprof symbolizes the wasm stack
trace using the DWARF symbols stored in custom sections of the module. For this
to work, you need to make sure your compiler generates those sections. For
example, use -g
when compiling with clang:
clang code.c -o code.wasm -g -target wasm32
Warning When using clang with any optimization level other than
-O0
, it will automatically runwasm-opt
if that program is in yourPATH
. It makes DWARF information unusable by wzprof. Make sure clang can't findwasm-opt
during compilation. See llvm/llvm-project#55781.
Pull requests are welcome! Anything that is not a simple fix would probably benefit from being discussed in an issue first.
Remember to be respectful and open minded!