src: gpu: intel: jit: conv: add reorder-based precomputed zero points #2267
What if we don't compute this partial summation on the library side but instead provide an API to request compensations and accept them as a per-channel array?
An important remark: this is not just a per-channel array but a full-fledged hidden buffer with DHW spatial dimensions. From an architectural standpoint, it would in fact be more appropriate to expose this buffer than to hide it, since the current approach tackles a graph-level problem with primitive-level tools. On the downside, users would then become responsible for allocating network nodes to handle the ahead-of-time ZP precalculation.
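To illustrate why the compensation carries spatial dimensions rather than being one value per channel, here is a minimal 1D sketch. It assumes a stride-1 convolution with zero padding and a single scalar source zero point; the function name `zp_compensation_1d` is hypothetical and is not part of oneDNN.

```python
def zp_compensation_1d(weights, src_zp, in_len, pad):
    """Compensation term for each output position of a stride-1 1D conv.

    Hypothetical helper for illustration only (not a oneDNN API).
    """
    k = len(weights)
    out_len = in_len + 2 * pad - k + 1
    comp = []
    for o in range(out_len):
        # Only taps that land on real input elements carry the zero point;
        # taps that land in the padding contribute a true zero.
        valid = [weights[t] for t in range(k) if 0 <= o - pad + t < in_len]
        comp.append(src_zp * sum(valid))
    return comp


comp = zp_compensation_1d(weights=[1, 2, 3], src_zp=10, in_len=4, pad=1)
```

Interior positions share one value, while positions whose window overlaps the padding get a different one, which is why the buffer needs spatial extents once padding is involved.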
@hidefromkgb What do you mean exactly by "expose"? Exposing the structure of the I assume we would still provide some API (e.g. a reorder) to compute a compensation buffer in this form. The user can't easily or efficiently precompute it, as it's not really trivial, so we have to expose it from the library in some form; say, for now it's done under reorder. Then I would say it's not really important for users to have any details on the buffer contents, as the flow is: 1) allocate an extra buffer, 2) call the reorder (or whatever) primitive, 3) pass this buffer to the convolution. There is no need to expose more details.
This can be implemented under the current design, right? The user can pass vector zero points as an additional reorder argument, and the resulting compensation buffer would then be computed accordingly. This should require only minor API changes at worst, such as an extra convolution attribute that the user sets to opt in to this feature.
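The three-step flow discussed here (allocate a buffer, fill it once, pass it to the convolution) could look roughly like the sketch below for vector (per-input-channel) zero points. This is a plain-Python model of the arithmetic under assumed conditions (1D, stride 1, zero padding, one output channel); `precompute_compensation`, `conv_with_compensation` and `reference_conv` are hypothetical names, not oneDNN primitives.

```python
def precompute_compensation(weights, src_zps, in_len, pad):
    """Steps 1 and 2: allocate the buffer and fill it once (hypothetical)."""
    k = len(weights[0])
    comp = [0] * (in_len + 2 * pad - k + 1)
    for o in range(len(comp)):
        for ic, zp in enumerate(src_zps):
            for t in range(k):
                if 0 <= o - pad + t < in_len:  # skip taps in the padding
                    comp[o] += zp * weights[ic][t]
    return comp


def conv_with_compensation(src_q, weights, comp, pad):
    """Step 3: convolve raw quantized data, then subtract the buffer."""
    in_len, k = len(src_q[0]), len(weights[0])
    out = []
    for o in range(len(comp)):
        acc = 0
        for ic in range(len(weights)):
            for t in range(k):
                i = o - pad + t
                if 0 <= i < in_len:
                    acc += weights[ic][t] * src_q[ic][i]
        out.append(acc - comp[o])
    return out


def reference_conv(src_q, weights, src_zps, pad):
    """Reference: subtract the zero point from every element first."""
    shifted = [[q - zp for q in row] for row, zp in zip(src_q, src_zps)]
    zero_comp = [0] * (len(src_q[0]) + 2 * pad - len(weights[0]) + 1)
    return conv_with_compensation(shifted, weights, zero_comp, pad)


weights = [[1, 2, 3], [-1, 0, 1]]        # weights[ic][tap]
src_zps = [10, 3]                        # one zero point per input channel
src_q = [[12, 9, 10, 15], [3, 4, 5, 2]]  # quantized source, src_q[ic][x]

comp = precompute_compensation(weights, src_zps, in_len=4, pad=1)
result = conv_with_compensation(src_q, weights, comp, pad=1)
```

The convolution only consumes `comp` as an opaque array here, which matches the point that the caller does not need to know how its contents are laid out.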
make test
make test perf-gpu
Description
This is a novel approach to calculating zero points ahead of time: the precalculated results are appended to the convolution weights buffers when the weights are either uploaded to the GPU or reordered on the GPU to fit the layout requested by the conv primitive.
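As a rough model of the idea, the sketch below has a "reorder" step emit the weights followed by the precomputed compensation in one flat buffer, which the convolution then splits again. The layout (weights first, compensation appended), the 1D stride-1 setting, and the names `reorder_weights_with_zp` and `conv_from_packed` are assumptions made for illustration; the actual layout used by the PR is not described here.

```python
def reorder_weights_with_zp(weights, src_zp, in_len, pad):
    """Return weights with the ZP compensation appended (assumed layout)."""
    k = len(weights)
    out_len = in_len + 2 * pad - k + 1
    comp = [
        src_zp * sum(weights[t] for t in range(k) if 0 <= o - pad + t < in_len)
        for o in range(out_len)
    ]
    return list(weights) + comp


def conv_from_packed(src_q, packed, k, pad):
    """Split the packed buffer, convolve raw data, subtract compensation."""
    weights, comp = packed[:k], packed[k:]
    out = []
    for o in range(len(comp)):
        acc = 0
        for t in range(k):
            i = o - pad + t
            if 0 <= i < len(src_q):  # padding taps contribute nothing
                acc += weights[t] * src_q[i]
        out.append(acc - comp[o])
    return out


packed = reorder_weights_with_zp([1, 2, 3], src_zp=10, in_len=4, pad=1)
result = conv_from_packed([12, 9, 10, 15], packed, k=3, pad=1)
```

Because the compensation travels with the weights, it is computed once per reorder rather than on every convolution execution.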
Perf results on YOLO-v8n:
Addresses MFDNN-12548.
General
Do tests (make test and make test_benchdnn_*) pass locally for each commit?