Valgrind bug with fft_small nmod_poly_mul #1366

fredrik-johansson · 2023-04-15T17:46:01Z

build/fft_small/test/t-nmod_poly_mul fails when run with valgrind (sample output below).

Dan says that this is a bug in valgrind.

The question is: can we isolate the bug to a small piece of code so that it makes sense to report to the valgrind developers?

Is there a workaround?

fredrik@mordor:~/src/flint2$ valgrind build/fft_small/test/t-nmod_poly_mul 
==2953426== Memcheck, a memory error detector
==2953426== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2953426== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==2953426== Command: build/fft_small/test/t-nmod_poly_mul
==2953426== 
nmod_poly_mul....mul nbits:  2mulmid error at index 1771
zl=385, zh=1790, an=1306, bn=467
mod: 2
==2953426== 
==2953426== Process terminating with default action of signal 6 (SIGABRT)
==2953426==    at 0x5987A7C: __pthread_kill_implementation (pthread_kill.c:44)
==2953426==    by 0x5987A7C: __pthread_kill_internal (pthread_kill.c:78)
==2953426==    by 0x5987A7C: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==2953426==    by 0x5933475: raise (raise.c:26)
==2953426==    by 0x59197F2: abort (abort.c:79)
==2953426==    by 0x4A41620: flint_abort (exception.c:49)
==2953426==    by 0x10A70C: main (t-nmod_poly_mul.c:186)
==2953426== 
==2953426== HEAP SUMMARY:
==2953426==     in use at exit: 867,000 bytes in 48 blocks
==2953426==   total heap usage: 684 allocs, 636 frees, 18,013,664 bytes allocated
==2953426== 
==2953426== LEAK SUMMARY:
==2953426==    definitely lost: 14,320 bytes in 1 blocks
==2953426==    indirectly lost: 0 bytes in 0 blocks
==2953426==      possibly lost: 0 bytes in 0 blocks
==2953426==    still reachable: 852,680 bytes in 47 blocks
==2953426==         suppressed: 0 bytes in 0 blocks
==2953426== Rerun with --leak-check=full to see details of leaked memory
==2953426== 
==2953426== For lists of detected and suppressed errors, rerun with: -s
==2953426== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Aborted (core dumped)

The text was updated successfully, but these errors were encountered:

tthsqe12 · 2023-04-27T02:24:13Z

The issue is with -0.0. reduce_to_0n is composed of reduce_to_pm1no and reduce_pm1no_to_0n, and the latter really doesn't like the former returning -0.0 (0x800000...) Interesting that valgrind is returning a non-compliant -0.0. I have picture proof of valgrind's crimes

tthsqe12 · 2023-04-28T02:55:12Z

So, the following minimal example (without any compiler shenanigans getting in the way) shows that valgrind differs from the hardware. It looks like the hardware is doing the right thing according to the standard. Make sure you can reproduce the results on your comp. Also, as suspected, valgrind's nmadd(a,b,c) matches neg(msub(a,b,c)), which is as non-compliant as replacing sub(a,b) with neg(sub(b,a))

~/Desktop$ as mnwe.asm -o mnwe.o && ld mnwe.o -o mnwe && ./mnwe
a:
 0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000 8000000000000000
b:
 0000000000000000 0000000000000000 8000000000000000 8000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000
c:
 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000
fnmadd(a,b,c):
 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000
fmsub(a,b,c):
 0000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000

~/Desktop$ as mnwe.asm -o mnwe.o && ld mnwe.o -o mnwe && valgrind ./mnwe
==4026== Memcheck, a memory error detector
==4026== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4026== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==4026== Command: ./mnwe
==4026== 
a:
 0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000 8000000000000000
b:
 0000000000000000 0000000000000000 8000000000000000 8000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000
c:
 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000
fnmadd(a,b,c):
 8000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000
fmsub(a,b,c):
 0000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000

==4026== 
==4026== HEAP SUMMARY:
==4026==     in use at exit: 0 bytes in 0 blocks
==4026==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==4026== 
==4026== All heap blocks were freed -- no leaks are possible
==4026== 
==4026== For counts of detected and suppressed errors, rerun with: -v
==4026== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

// mnwe.asm
.intel_syntax noprefix
.global _start

stdout = 1
sys_write   = 0x0001 
sys_exit    = 0x003c

.text
print_rax_hex:
          bswap rax
          vmovq xmm0,rax
          vpand xmm1,xmm0,xmmword ptr [mask_0f]
         vpsrlq xmm0,xmm0,4
          vpand xmm0,xmm0,xmmword ptr [mask_0f]
     vpunpcklbw xmm0,xmm0,xmm1
         vpaddb xmm1,xmm0,xmmword ptr [mask_30]
       vpcmpgtb xmm0,xmm0,xmmword ptr [mask_09]
          vpand xmm0,xmm0,xmmword ptr [mask_27]
         vpaddb xmm0,xmm0,xmm1
        vmovdqu xmmword ptr [rdi],xmm0
            add rdi,16
            ret

print_zmmword_hex:
            mov ecx,8
.l1:
            mov al,' '
          stosb
            mov rax,qword ptr [rsi]
            add rsi,8
           call print_rax_hex
            sub ecx,1
            jnz .l1
            mov al,10
          stosb
            ret

print_the_stuff:
            lea rsi,[astring]
            mov ecx,astring_end-astring
      rep movsb
            lea rsi,[a]
           call print_zmmword_hex

            lea rsi,[bstring]
            mov ecx,bstring_end-bstring
      rep movsb
            lea rsi,[b]
           call print_zmmword_hex

            lea rsi,[cstring]
            mov ecx,cstring_end-cstring
      rep movsb
            lea rsi,[c]
           call print_zmmword_hex

            lea rsi,abc1string
            mov ecx,abc1string_end-abc1string
      rep movsb
            lea rsi,[abc1]
           call print_zmmword_hex

            lea rsi,abc2string
            mov ecx,abc2string_end-abc2string
      rep movsb
            lea rsi,[abc2]
           call print_zmmword_hex

            mov al,10
          stosb
            ret

_start:
        vmovupd ymm1,ymmword ptr [a]
        vmovupd ymm11,ymmword ptr [a+32]
        vmovupd ymm2,ymmword ptr [b]
        vmovupd ymm12,ymmword ptr [b+32]
        vmovupd ymm3,ymmword ptr [c]
        vmovupd ymm13,ymmword ptr [c+32]

        vmovapd ymm0,ymm2
   vfnmadd213pd ymm0,ymm1,ymm3
        vmovupd ymmword ptr [abc1],ymm0
        vmovapd ymm0,ymm12
   vfnmadd213pd ymm0,ymm11,ymm13
        vmovupd ymmword ptr [abc1+32],ymm0

        vmovapd ymm0,ymm2
    vfmsub213pd ymm0,ymm1,ymm3
        vmovupd ymmword ptr [abc2],ymm0
        vmovapd ymm0,ymm12
    vfmsub213pd ymm0,ymm11,ymm13
        vmovupd ymmword ptr [abc2+32],ymm0

            lea rdi,[message]
           call print_the_stuff

            lea rsi,[message]
            mov rdx,rdi
            sub rdx,rsi
            mov edi,stdout
            mov eax,sys_write
        syscall

            xor edi,edi
            mov eax,sys_exit
        syscall

.data
.align 32
a:
   .quad 0x0000000000000000,0x0000000000000000,0x0000000000000000,0x0000000000000000
   .quad 0x8000000000000000,0x8000000000000000,0x8000000000000000,0x8000000000000000
b:
   .quad 0x0000000000000000,0x0000000000000000,0x8000000000000000,0x8000000000000000
   .quad 0x0000000000000000,0x0000000000000000,0x8000000000000000,0x8000000000000000
c:
   .quad 0x0000000000000000,0x8000000000000000,0x0000000000000000,0x8000000000000000
   .quad 0x0000000000000000,0x8000000000000000,0x0000000000000000,0x8000000000000000
abc1:
   .quad 1,2,3,4,5,6,7,8
abc2:
   .quad 1,2,3,4,5,6,7,8
mask_0f:
   .quad 0x0f0f0f0f0f0f0f0f,0x0f0f0f0f0f0f0f0f
mask_30:
   .quad 0x3030303030303030,0x3030303030303030
mask_09:
   .quad 0x0909090909090909,0x0909090909090909
mask_27:
   .quad 0x2727272727272727,0x2727272727272727

astring:
   .string "a:\n"
astring_end:

bstring:
   .string "b:\n"
bstring_end:

cstring:
   .string "c:\n"
cstring_end:

abc1string:
   .string "fnmadd(a,b,c):\n"
abc1string_end:

abc2string:
   .string "fmsub(a,b,c):\n"
abc2string_end:

message:
.rept 1000
.byte 0
.endr

wbhart · 2023-04-28T12:21:05Z

Here is the output from my AMD Ryzen 7 4800HS:

wbhart@LAPTOP-JUAA3GPC:~$ as mnwe.asm -o mnwe.o && ld mnwe.o -o mnwe && ./mnwe
a:
0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000 8000000000000000
b:
0000000000000000 0000000000000000 8000000000000000 8000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000
c:
0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000
fnmadd(a,b,c):
0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000
fmsub(a,b,c):
0000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000

wbhart@LAPTOP-JUAA3GPC:~$ as mnwe.asm -o mnwe.o && ld mnwe.o -o mnwe && valgrind ./mnwe
==21118== Memcheck, a memory error detector
==21118== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==21118== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==21118== Command: ./mnwe
==21118==
==21118== error calling PR_SET_PTRACER, vgdb might block
a:
0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000 8000000000000000
b:
0000000000000000 0000000000000000 8000000000000000 8000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000
c:
0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000
fnmadd(a,b,c):
8000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000
fmsub(a,b,c):
0000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000

==21118==
==21118== HEAP SUMMARY:
==21118== in use at exit: 0 bytes in 0 blocks
==21118== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==21118==
==21118== All heap blocks were freed -- no leaks are possible
==21118==
==21118== For lists of detected and suppressed errors, rerun with: -s
==21118== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

fredrik-johansson · 2023-04-28T12:26:14Z

Also a Ryzen:

fredrik@mordor:~/src/test$ as mnwe.asm -o mnwe.o && ld mnwe.o -o mnwe && ./mnwe
a:
 0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000 8000000000000000
b:
 0000000000000000 0000000000000000 8000000000000000 8000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000
c:
 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000
fnmadd(a,b,c):
 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000
fmsub(a,b,c):
 0000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000

fredrik@mordor:~/src/test$ as mnwe.asm -o mnwe.o && ld mnwe.o -o mnwe && valgrind ./mnwe
==3106150== Memcheck, a memory error detector
==3106150== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3106150== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==3106150== Command: ./mnwe
==3106150== 
a:
 0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000 8000000000000000
b:
 0000000000000000 0000000000000000 8000000000000000 8000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000
c:
 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000
fnmadd(a,b,c):
 8000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000
fmsub(a,b,c):
 0000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000

==3106150== 
==3106150== HEAP SUMMARY:
==3106150==     in use at exit: 0 bytes in 0 blocks
==3106150==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==3106150== 
==3106150== All heap blocks were freed -- no leaks are possible
==3106150== 
==3106150== For lists of detected and suppressed errors, rerun with: -s
==3106150== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

albinahlback · 2024-10-14T04:27:15Z

Reported in Valgrind's Bugzilla.

albinahlback · 2024-10-14T14:43:48Z

Superseded by #2092.

albinahlback mentioned this issue Oct 14, 2024

Unsupported instructions and bugs with Valgrind #2092

Open

2 tasks

albinahlback closed this as completed Oct 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Valgrind bug with fft_small nmod_poly_mul #1366

Valgrind bug with fft_small nmod_poly_mul #1366

fredrik-johansson commented Apr 15, 2023

tthsqe12 commented Apr 27, 2023 •

edited

Loading

tthsqe12 commented Apr 28, 2023 •

edited

Loading

wbhart commented Apr 28, 2023

fredrik-johansson commented Apr 28, 2023

albinahlback commented Oct 14, 2024

albinahlback commented Oct 14, 2024

Valgrind bug with fft_small nmod_poly_mul #1366

Valgrind bug with fft_small nmod_poly_mul #1366

Comments

fredrik-johansson commented Apr 15, 2023

tthsqe12 commented Apr 27, 2023 • edited Loading

tthsqe12 commented Apr 28, 2023 • edited Loading

wbhart commented Apr 28, 2023

fredrik-johansson commented Apr 28, 2023

albinahlback commented Oct 14, 2024

albinahlback commented Oct 14, 2024

tthsqe12 commented Apr 27, 2023 •

edited

Loading

tthsqe12 commented Apr 28, 2023 •

edited

Loading