version 1.2.1 and 1.3.0 issues #120
Comments
Hi, the v1.2.1 time statistics may include warm-up time: we run the model several times before the real inference. The v1.3.0 statistics may exclude warm-up time and contain only the real inference time. If you don't want to use warm-up, you can set the parameter -w to 0.
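To illustrate the difference described above, here is a minimal sketch of a timing loop whose statistics either include or exclude warm-up runs. It is not Bolt's benchmark code: the `benchmark` function, its `warmup`, `loops`, and `count_warmup` parameters, and the dummy workload are all hypothetical names chosen for illustration.

```python
import time


def benchmark(run_once, warmup=3, loops=1, count_warmup=False):
    """Call run_once (warmup + loops) times.

    Returns (total_seconds, timed_runs). Warm-up runs are always executed,
    but they only enter the statistics when count_warmup is True.
    """
    total = 0.0
    timed_runs = 0
    for i in range(warmup + loops):
        start = time.perf_counter()
        run_once()
        elapsed = time.perf_counter() - start
        # Runs with index < warmup are warm-up runs.
        if count_warmup or i >= warmup:
            total += elapsed
            timed_runs += 1
    return total, timed_runs


def dummy_inference():
    # Hypothetical stand-in for one inference call.
    sum(range(1000))


# Statistics that include warm-up cover more runs than loops alone.
_, runs_with_warmup = benchmark(dummy_inference, warmup=3, loops=1, count_warmup=True)
_, runs_without_warmup = benchmark(dummy_inference, warmup=3, loops=1, count_warmup=False)
print(runs_with_warmup, runs_without_warmup)
```

With `loops=1`, a report that counts warm-up runs would show per-operator totals accumulated over several executions, which is one possible explanation for statistics that look like the result of many runs. Setting `warmup=0` in this sketch corresponds to the idea of passing -w 0.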
Thank you for your answer. The warmup option was the answer to question 1.
Can you show me your command and the full log?
All logs and the other necessary information are summarized in the following link:
I've been debugging this issue a bit more, and I'm guessing it is related to MemoryReuseOptimizer: https://github.com/huawei-noah/bolt/blob/master/model_tools/include/OPOptimizers/MemoryReuseOptimizer.hpp
According to the X2bolt log, the reuse_position of the data to be reused was overwritten by other data.
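The kind of conflict suspected here can be sketched with a generic liveness-based slot allocator. This is not Bolt's MemoryReuseOptimizer: `assign_slots` and the intermediate operators `Sub`, `Pow`, and `Sqrt` are hypothetical, and only `ReduceMean` and `Mul_21` come from the issue. The point is that a tensor consumed late (here the ReduceMean result, read again by Mul_21) must keep its slot until its last consumer has run.

```python
def assign_slots(ops):
    """ops: list of (name, inputs, output) in execution order.

    Returns {tensor: slot}. A slot is released for reuse only after the
    last operator that reads its tensor has executed.
    """
    # Record the last step at which each tensor is read.
    last_use = {}
    for step, (_, inputs, _) in enumerate(ops):
        for tensor in inputs:
            last_use[tensor] = step

    slot_of, free, next_slot = {}, [], 0
    for step, (_, inputs, output) in enumerate(ops):
        # Allocate the output first so it never aliases an input of this op.
        if free:
            slot = free.pop()
        else:
            slot = next_slot
            next_slot += 1
        slot_of[output] = slot
        # Release inputs whose last consumer is this operator.
        for tensor in sorted(set(inputs)):
            if last_use[tensor] == step and tensor in slot_of:
                free.append(slot_of[tensor])
    return slot_of


# Hypothetical graph: "mean" is produced early and read again by Mul_21.
ops = [
    ("ReduceMean", ["x"], "mean"),
    ("Sub", ["x", "mean"], "centered"),
    ("Pow", ["centered"], "sq"),
    ("Sqrt", ["sq"], "std"),
    ("Mul_21", ["mean", "std"], "out"),
]
slots = assign_slots(ops)
print(slots)
```

In this sketch `std` reuses the slot of `centered`, which is safe, while `mean` keeps its own slot until `Mul_21` runs. If an allocator missed the late read by `Mul_21`, the slot of `mean` would be handed to another tensor and overwritten, which matches the symptom described above.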
xxx OT_Scale | -> output
So maybe you can swap the operand order of your mul: C = weight * input => C = input * weight
Sorry, I am a little late to reply. Maybe you can join Bolt's QQ group 833345709 or contact me on WeChat at cos_wave.
Thanks, this solved the problem.
Hello, thank you for your team’s awesome work!
I have some questions about using the bolt framework.
Here's my working environment:
The final latency results were almost the same, but their breakdowns (the time statistics reports) were different.
[ version 1.2.1 ]
[ version 1.3.0 ]
Both cases were run with the loops=1 option, but the time statistics of version 1.2.1 seem to be the result of running 10 times. Is this normal?
This network works with version 1.3.0 but not with 1.2.1, because the X2bolt of version 1.2.1 doesn't work properly.
[ X2bolt debug log (version 1.2.1) ]
[ X2bolt debug log (version 1.3.0) ]
As we can see, the X2bolt of version 1.2.1 cannot detect an input tensor of Mul_21, so I guess the benchmark program stops at
bolt/inference/engine/src/cnn.cpp
Line 696 in 4bdc81e
(or nearby, checked with debug options).
Mul_21 is executed after the whole left path of the graph image above, so I suspect it was difficult to reuse the result of the ReduceMean op. Of course, there is no problem with the latest version of X2bolt. Is there a way to solve this in the previous version as well?
[ version 1.2.1 ]
[ version 1.3.0 ]
I wonder whether this faster output of version 1.2.1 is a kind of reporting bug in version 1.2.1 or a possible result of the implementation difference.
Thank you for reading my long issue and I look forward to your answers.