
src: gpu: intel: jit: conv: add reorder-based precomputed zero points #2267

Merged: 2 commits merged into main on Jan 10, 2025

Conversation

@hidefromkgb (Contributor) commented Dec 13, 2024

Description

This PR adds a novel approach to calculating zero points ahead of time: the precalculated results are appended to the convolution weights buffers when they are either uploaded to the GPU or reordered on the GPU to fit the layout requested by the conv primitive.
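
For readers skimming the diff, here is a hedged reference sketch of what the precomputed term amounts to for a scalar source zero point (plain C++, no dilation or groups, and a border convention chosen for illustration; the actual precompute is emitted as part of the GPU reorder, not this loop):

```cpp
// Hypothetical reference loop (not the PR's GPU kernel): for a scalar source
// zero point zp_src, an int8 convolution accumulates
//   dst = sum(src * w) - zp_src * sum(w over taps that land inside src),
// so the second term can be precomputed per (oc, od, oh, ow) and appended to
// the weights buffer. Near padded borders the set of valid taps shrinks,
// which is why the buffer carries full DHW spatials, not one value per oc.
#include <cstdint>
#include <vector>

std::vector<int32_t> precompute_zp_comp(
        const std::vector<int8_t> &wei, // OC x IC x KD x KH x KW
        int OC, int IC, int KD, int KH, int KW, int ID, int IH, int IW,
        int OD, int OH, int OW, int SD, int SH, int SW, int PD, int PH, int PW,
        int32_t zp_src) {
    std::vector<int32_t> comp((size_t)OC * OD * OH * OW, 0);
    for (int oc = 0; oc < OC; ++oc)
        for (int od = 0; od < OD; ++od)
            for (int oh = 0; oh < OH; ++oh)
                for (int ow = 0; ow < OW; ++ow) {
                    int32_t acc = 0;
                    for (int ic = 0; ic < IC; ++ic)
                        for (int kd = 0; kd < KD; ++kd)
                            for (int kh = 0; kh < KH; ++kh)
                                for (int kw = 0; kw < KW; ++kw) {
                                    // Skip taps that fall into the zero padding;
                                    // they contribute no correction under this
                                    // (assumed) border convention.
                                    int id = od * SD - PD + kd;
                                    int ih = oh * SH - PH + kh;
                                    int iw = ow * SW - PW + kw;
                                    if (id < 0 || id >= ID || ih < 0 || ih >= IH
                                            || iw < 0 || iw >= IW)
                                        continue;
                                    acc += wei[((((size_t)oc * IC + ic) * KD + kd)
                                                       * KH + kh) * KW + kw];
                                }
                    comp[(((size_t)oc * OD + od) * OH + oh) * OW + ow]
                            = zp_src * acc;
                }
    return comp; // appended to the weights buffer by the reorder
}
```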

Perf results on YOLO-v8n:

| Platform | FP16, ms/iter | INT8+ZP, ms/iter | INT8 no ZP, ms/iter |
|---|---|---|---|
| ATSM/DG2 | 3.3633792 | 2.6763831 | 2.4270376 |
| BMG | 1.46378 | 1.42166 | 1.2961 |

Addresses MFDNN-12548.

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

@hidefromkgb hidefromkgb requested review from a team as code owners December 13, 2024 19:37
@github-actions bot added the platform:cpu-x64, platform:gpu-intel, and platform:gpu-generic labels Dec 13, 2024
@dzarukin (Contributor) left a comment


What if we don't compute this partial summation on the library side but instead provide an API to request compensations and accept them as a per-channel array?

Review threads (resolved):
  • src/common/memory_desc.cpp
  • src/common/memory_desc.hpp
  • src/gpu/gpu_reorder_pd.hpp (3 threads, outdated)
  • src/gpu/gpu_utils.hpp (outdated)
@hidefromkgb (Contributor, Author) commented

> What if we don't compute this partial summation on the library side but instead provide an API to request compensations and accept them as a per-channel array?

An important remark: this is not just a per-channel array but a full-fledged hidden buffer with DHW spatials.
Still, it's definitely possible, in theory, to expose ZP precompute as a part of oneDNN API — but maybe not right now.
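
To make the shape argument concrete (a sketch of the usual int8 zero-point algebra, with the border convention assumed rather than taken from this PR): for a scalar source zero point $z_{\mathrm{src}}$, the term subtracted from the accumulator is

$$\mathrm{comp}(oc, od, oh, ow) = z_{\mathrm{src}} \sum_{ic} \sum_{(kd,kh,kw)\,\in\,V(od,oh,ow)} W(oc, ic, kd, kh, kw),$$

where $V(od,oh,ow)$ is the set of kernel taps whose input coordinates fall inside the source tensor. In the interior, $V$ is the whole kernel and the value depends only on $oc$; near padded borders $V$ shrinks, which is why the hidden buffer needs the full DHW output spatials.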

In fact, exposing this rather than hiding it would be more appropriate from an architectural standpoint, as the current approach is all about tackling a graph-level problem with primitive-level tools.
Furthermore, it'd lift the scalar-ZP-only requirement, a serious limitation of the proposed PR.

On the downside, though, the users would then become responsible for allocating network nodes to handle the ahead-of-time ZP precalculation.

@karturov @dzarukin What do you think of this?

@hidefromkgb hidefromkgb force-pushed the aguskov/reorder_zp_precomp branch 4 times, most recently from 2a37b6b to b74e8bb on December 17, 2024 06:17
Review threads (resolved):
  • src/common/memory_desc_wrapper.hpp
  • src/common/primitive_hashing.cpp
  • src/gpu/intel/jit/reorder/gen_reorder.cpp (outdated)
@echeresh (Contributor) commented

> In fact, exposing this rather than hiding it would be more appropriate from an architectural standpoint, as the current approach is all about tackling a graph-level problem with primitive-level tools.
> Furthermore, it'd lift the scalar-ZP-only requirement, a serious limitation of the proposed PR.
>
> On the downside, though, the users would then become responsible for allocating network nodes to handle the ahead-of-time ZP precalculation.

@hidefromkgb What do you mean exactly by "expose"? Exposing the structure of the memory_extra_desc_t and its contents? If so, what is the potential benefit?

I assume we would still provide some API (e.g., reorder) to compute a compensation buffer in this form. It can't be easily or efficiently pre-computed by the user, as it's not really trivial, so we have to expose it from the library in some form; right now it's done under reorder. Then I would say it's not really important for users to have any details on the buffer contents, since the flow is: 1) allocate an extra buffer, 2) call the reorder (or whatever) primitive, 3) pass this buffer to the convolution. There is no need to expose more details.
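
For illustration only, a minimal sketch of that three-step flow with the standard oneDNN v3 C++ API, under the assumption that the compensation simply rides inside the weights reordered into the layout the convolution requests (shapes, data types, and the scalar zero-point setup are placeholders; none of this reflects the internal memory_extra_desc_t plumbing):

```cpp
// A sketch only: standard oneDNN v3 C++ API with illustrative shapes. Under
// the design discussed here, the compensation is appended to the weights
// during the reorder into the conv-preferred layout, so the user never
// touches it directly.
#include "oneapi/dnnl/dnnl.hpp"
using namespace dnnl;

void int8_conv_with_src_zero_point() {
    engine eng(engine::kind::gpu, 0);
    stream strm(eng);

    memory::dims src_dims {1, 64, 56, 56}, wei_dims {64, 64, 3, 3},
            dst_dims {1, 64, 56, 56};
    memory::desc src_md(src_dims, memory::data_type::u8, memory::format_tag::nhwc);
    memory::desc dst_md(dst_dims, memory::data_type::u8, memory::format_tag::nhwc);
    // Let the convolution pick its preferred weights layout.
    memory::desc wei_any(wei_dims, memory::data_type::s8, memory::format_tag::any);

    primitive_attr attr;
    attr.set_zero_points_mask(DNNL_ARG_SRC, /* mask = */ 0); // scalar src zero point

    convolution_forward::primitive_desc conv_pd(eng, prop_kind::forward_inference,
            algorithm::convolution_direct, src_md, wei_any, dst_md,
            /* strides */ {1, 1}, /* padding_l */ {1, 1}, /* padding_r */ {1, 1},
            attr);

    // 1) "Allocate an extra buffer": weights in the layout the conv wants
    //    (the compensation rides along inside this buffer under this design).
    memory wei_user({wei_dims, memory::data_type::s8, memory::format_tag::oihw}, eng);
    memory wei_conv(conv_pd.weights_desc(), eng);

    // 2) Call reorder, which also precomputes the zero-point term.
    reorder::primitive_desc r_pd(eng, wei_user.get_desc(), eng, conv_pd.weights_desc());
    reorder(r_pd).execute(strm, {{DNNL_ARG_FROM, wei_user}, {DNNL_ARG_TO, wei_conv}});

    // 3) Pass the reordered weights (and the runtime zero point) to the conv.
    memory src(src_md, eng), dst(dst_md, eng);
    memory zp_src({{1}, memory::data_type::s32, memory::format_tag::x}, eng);
    convolution_forward(conv_pd).execute(strm,
            {{DNNL_ARG_SRC, src}, {DNNL_ARG_WEIGHTS, wei_conv},
                    {DNNL_ARG_DST, dst},
                    {DNNL_ARG_ATTR_ZERO_POINTS | DNNL_ARG_SRC, zp_src}});
    strm.wait();
}
```

The point being that the user-visible sequence stays create/reorder/execute, so nothing about where or how the compensation is stored needs to leak into the API.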

> Furthermore, it'd lift the scalar-ZP-only requirement, a serious limitation of the proposed PR.

This can be implemented under the current design, right? The user can pass vector zero points as an additional reorder argument, and the resulting compensation buffer would then be computed accordingly. This should require only minor API changes at worst: something like an extra convolution attribute for the user to opt in to this feature.
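
As a hedged aside on the math (same notation as the equation above), a per-channel source zero point only moves the zero point inside the input-channel sum, so the buffer shape stays the same and the reorder would merely need the zero-point vector as an extra input:

$$\mathrm{comp}(oc, od, oh, ow) = \sum_{ic} z_{\mathrm{src}}(ic) \sum_{(kd,kh,kw)\,\in\,V(od,oh,ow)} W(oc, ic, kd, kh, kw).$$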

@hidefromkgb hidefromkgb force-pushed the aguskov/reorder_zp_precomp branch 5 times, most recently from f43badc to 96023e3 on January 3, 2025 17:31
@hidefromkgb hidefromkgb force-pushed the aguskov/reorder_zp_precomp branch 4 times, most recently from e09deff to 2249844 on January 7, 2025 21:43
@hidefromkgb hidefromkgb force-pushed the aguskov/reorder_zp_precomp branch from 2249844 to 04e5d54 on January 7, 2025 23:39
Review threads (resolved, outdated):
  • src/common/memory_desc.hpp
  • src/common/memory_desc.cpp
  • src/gpu/gpu_reorder_pd.hpp
  • src/gpu/gpu_zero_points_conv.hpp
@hidefromkgb hidefromkgb force-pushed the aguskov/reorder_zp_precomp branch 3 times, most recently from 6cd8e9a to d2fb32b on January 9, 2025 07:30
@hidefromkgb hidefromkgb force-pushed the aguskov/reorder_zp_precomp branch from d2fb32b to b49f67b on January 9, 2025 21:23
@hidefromkgb hidefromkgb force-pushed the aguskov/reorder_zp_precomp branch from b49f67b to ce80081 on January 10, 2025 02:00
@hidefromkgb (Contributor, Author) commented

make test
disable benchdnn_all
enable benchdnn_nightly
enable benchdnn_conv
enable benchdnn_deconv
enable benchdnn_pool
enable benchdnn_reorder
enable benchdnn_rnn

@hidefromkgb (Contributor, Author) commented

make test perf-gpu
set primitive=conv

@hidefromkgb hidefromkgb merged commit 4d5ec0f into main Jan 10, 2025
16 of 19 checks passed
@hidefromkgb hidefromkgb deleted the aguskov/reorder_zp_precomp branch January 10, 2025 14:19