Numerics issue with vectorized conv2d #820
I've narrowed it down to the alignment of the loads: if the alignments are set/forced sufficiently low, then the numerics with vectorization are fine. I am a bit perplexed as to why the alignment needs to be as low as it does. Consider the 2 IRs in the zip file: they are basically identical, except that some loads have alignment 4 in one file and alignment 2 in the other. The case with alignment 2 gives numerically correct results; the one with alignment 4 does not. What I find confusing is that alignment 4 is surely enough here: none of the strides in any of the loads is less than 8.
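The stride argument above can be sketched numerically. This is a toy model (the element size, stride, and base offset below are illustrative choices, not values from the attached IR): if every load address is `base + i * stride` with the stride a multiple of 8 i16 elements (16 bytes), then the alignment of every load is determined by the base pointer alone, so a 4-byte-aligned base should make `align 4` safe.

```python
# Toy model of the alignment reasoning in the issue (illustrative values,
# not taken from the attached IR files).
ELEM_BYTES = 2      # i16
STRIDE_ELEMS = 8    # smallest stride mentioned in the issue

def load_addresses(base, n):
    """Byte addresses of n strided loads starting at `base`."""
    return [base + i * STRIDE_ELEMS * ELEM_BYTES for i in range(n)]

def alignment(addr):
    """Largest power of two dividing addr (addr > 0)."""
    return addr & -addr

# With a 4-byte-aligned base, the 16-byte stride preserves 4-byte
# alignment for every load address.
addrs = load_addresses(base=4, n=8)
assert all(alignment(a) >= 4 for a in addrs)
```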
Running the above 2 IRs through Compiler Explorer (http://xsjsda132:10240/z/6qc4cc), align 2:
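For illustration, the kind of difference between the two attached files can be sketched like this (a hypothetical fragment, not the actual IR from ll_files.zip; the pointer name and vector type are made up):

```llvm
; Hypothetical sketch of the only difference between the two files:
; the same vector load, differing only in the declared alignment.

; numerically incorrect variant:
%v = load <16 x i16>, ptr %p, align 4

; numerically correct variant:
%v = load <16 x i16>, ptr %p, align 2
```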
This PR switches all numerical convolution tests to use the objectFifo pipeline. With respect to the new tiling strategy:

1. A single **column** is currently used. Targeting multiple columns results in `error: 'aie.memtile_dma' op could not find and assign a valid BD id`. This will be investigated as follow-up work: #821
2. There is no longer interleaving of compute and L2->L1 data movement, which means #619 becomes low priority / obsolete.
3. L3->L2 and L2->L3 still use padding, but L2->L1 and L1->L2 use packing.
4. Channel-first convolution is completely unsupported; we expect high-level transforms to convert to channel-last before reaching our backend.
5. Vectorization is not currently enabled, due to issues with alignment. See follow-up task #820. This is functionally OK for now, as peano can scalarize code for all data types.
Enabling vectorization (see #789) for convolution results in numerical failure. The values are only slightly off (although they are definitely not correct; this is not a floating point rounding issue). Experiments with the input values suggest that the problem is that the input image data (i.e. not the kernel data) is being read incorrectly, with an incorrect offset inside the scf.for loop (not confirmed).
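The "slightly off, but not rounding" hypothesis can be illustrated with a toy 1-D stand-in (this is not the actual conv2d; `conv1d`, the image row, and the kernel are all made up for illustration): reading the input at a wrong offset produces outputs that are plausible and close to correct, rather than garbage.

```python
# Toy illustration of the hypothesis: an off-by-one read of the input
# image yields results that look "slightly off", not random garbage.
image = list(range(16))   # stand-in input row
kernel = [1, 2, 1]        # stand-in kernel

def conv1d(x, k, offset=0):
    """Valid convolution of x with k, reading x starting at `offset`."""
    n = len(x) - len(k) + 1 - offset
    return [sum(x[offset + i + j] * k[j] for j in range(len(k)))
            for i in range(n)]

good = conv1d(image, kernel)             # correct reads
bad = conv1d(image, kernel, offset=1)    # off-by-one read of the input

# Every bad[i] equals good[i + 1]: the outputs are shifted, so the
# error is systematic and small-looking, with no rounding involved.
```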
Some things I've tried:

- Setting optimization flags in LLVM to 0 (the default is 2) has no effect.
- Inverting the scf.for loop order has no effect.
- Using different versions of peano has no effect.
This task is to find the source of the error and, if it's a peano bug, create a reproducer for the team.
Attached are the `ll` and `opt.ll` files (vectorized and unvectorized): ll_files.zip (they're quite small; vectorized_input.opt.ll is only 94 lines). The MLIR IR looks basically identical except for the inner loop.
// With vectorization:
// Without vectorization:
In the vectorized case, the `vector.contract` gets lowered to an `aievec.matmul`, which in turn gets lowered to