
Refine getting const offsets in wasm loader of fast-interp #4012

Merged: 6 commits, Jan 20, 2025

Conversation

wenyongh
Contributor

@wenyongh wenyongh commented Jan 9, 2025

Getting const offsets in the wasm loader of the fast interpreter may take a lot of
time, especially when there are many const opcodes in a wasm function,
since the loader creates a temporary const list and checks whether a const is
already in the list by traversing its elements one by one, and it also uses
a large Const struct as the list element.

This PR creates two const lists instead, whose elements are int64 and int32,
treating i64/f64 the same and i32/f32 the same. In the first scan, it just
appends elements to the lists; in the second scan, it first sorts the lists and
removes duplicated elements, and then binary searches them to resolve
duplicated consts. Experiments show that this improves performance a lot
when there are many consts.

@wenyongh wenyongh changed the title [test] Refine getting const offsets in wasm loader of fast-interp Refine getting const offsets in wasm loader of fast-interp Jan 10, 2025
Collaborator

@xujuntwt95329 xujuntwt95329 left a comment


LGTM

@loganek
Collaborator

loganek commented Jan 17, 2025

Thanks for the change. Would you provide a bit more details on the experiments you performed? What exactly did you benchmark and how much was the improvement?

@wenyongh
Contributor Author

> Thanks for the change. Would you provide a bit more details on the experiments you performed? What exactly did you benchmark and how much was the improvement?

I tested the standalone case tests/standalone/test-printf under WAMR, which has more than 30000 i64.const/i32.const opcodes:

cd <wamr_root>/product-mini/platforms/linux
mkdir build && cd build
cmake .. && make -j         # iwasm is built with fast-interp enabled
cd <wamr_root>/tests/standalone/test-printf
time iwasm --heap-size=102400 --stack-size=809600 test_printf_builtin.wasm > a.log

Or make some modifications to test only the wasm loader many times, e.g. apply the patch below (which loads the module 101 times):

diff --git a/product-mini/platforms/posix/main.c b/product-mini/platforms/posix/main.c
index af50223a..6326f270 100644
--- a/product-mini/platforms/posix/main.c
+++ b/product-mini/platforms/posix/main.c
@@ -921,6 +921,15 @@ main(int argc, char *argv[])
                                    module_destroyer_callback);
 #endif
 
+    for (uint32 i = 0; i < 100; i++) {
+        wasm_module = wasm_runtime_load(wasm_file_buf, wasm_file_size,
+                                          error_buf, sizeof(error_buf));
+        wasm_runtime_unload(wasm_module);
+        wasm_runtime_free(wasm_file_buf);
+        wasm_file_buf =
+              (uint8 *)bh_read_file_to_buffer(wasm_file, &wasm_file_size);
+    }
+
     /* load WASM module */
     if (!(wasm_module = wasm_runtime_load(wasm_file_buf, wasm_file_size,
                                           error_buf, sizeof(error_buf)))) {
@@ -928,6 +937,7 @@ main(int argc, char *argv[])
         goto fail2;
     }
 
+#if 0
 #if WASM_ENABLE_DYNAMIC_AOT_DEBUG != 0
     if (!wasm_runtime_set_module_name(wasm_module, wasm_file, error_buf,
                                       sizeof(error_buf))) {
@@ -1041,6 +1051,7 @@ fail4:
     wasm_runtime_deinstantiate(wasm_module_inst);
 
 fail3:
+#endif
     /* unload the module */
     wasm_runtime_unload(wasm_module);

And then

time iwasm --heap-size=102400 --stack-size=809600 test_printf_builtin.wasm

My experiment shows that the time of the 101 loads can be reduced from ~39s to ~0.5s. Note that the measurement also includes wasm_runtime_unload and bh_read_file_to_buffer, but their share of the time should be relatively small.

Collaborator

@lum1n0us lum1n0us left a comment


LGTM

@lum1n0us lum1n0us merged commit 831e4bb into bytecodealliance:main Jan 20, 2025
390 checks passed