Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Valgrind bug with fft_small nmod_poly_mul #1366

Closed
fredrik-johansson opened this issue Apr 15, 2023 · 6 comments
Closed

Valgrind bug with fft_small nmod_poly_mul #1366

fredrik-johansson opened this issue Apr 15, 2023 · 6 comments

Comments

@fredrik-johansson
Copy link
Collaborator

build/fft_small/test/t-nmod_poly_mul fails when run with valgrind (sample output below).

Dan says that this is a bug in valgrind.

The question is: can we isolate the bug to a small piece of code so that it makes sense to report to the valgrind developers?

Is there a workaround?

fredrik@mordor:~/src/flint2$ valgrind build/fft_small/test/t-nmod_poly_mul 
==2953426== Memcheck, a memory error detector
==2953426== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2953426== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==2953426== Command: build/fft_small/test/t-nmod_poly_mul
==2953426== 
nmod_poly_mul....mul nbits:  2mulmid error at index 1771
zl=385, zh=1790, an=1306, bn=467
mod: 2
==2953426== 
==2953426== Process terminating with default action of signal 6 (SIGABRT)
==2953426==    at 0x5987A7C: __pthread_kill_implementation (pthread_kill.c:44)
==2953426==    by 0x5987A7C: __pthread_kill_internal (pthread_kill.c:78)
==2953426==    by 0x5987A7C: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==2953426==    by 0x5933475: raise (raise.c:26)
==2953426==    by 0x59197F2: abort (abort.c:79)
==2953426==    by 0x4A41620: flint_abort (exception.c:49)
==2953426==    by 0x10A70C: main (t-nmod_poly_mul.c:186)
==2953426== 
==2953426== HEAP SUMMARY:
==2953426==     in use at exit: 867,000 bytes in 48 blocks
==2953426==   total heap usage: 684 allocs, 636 frees, 18,013,664 bytes allocated
==2953426== 
==2953426== LEAK SUMMARY:
==2953426==    definitely lost: 14,320 bytes in 1 blocks
==2953426==    indirectly lost: 0 bytes in 0 blocks
==2953426==      possibly lost: 0 bytes in 0 blocks
==2953426==    still reachable: 852,680 bytes in 47 blocks
==2953426==         suppressed: 0 bytes in 0 blocks
==2953426== Rerun with --leak-check=full to see details of leaked memory
==2953426== 
==2953426== For lists of detected and suppressed errors, rerun with: -s
==2953426== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Aborted (core dumped)
@tthsqe12
Copy link
Contributor

tthsqe12 commented Apr 27, 2023

The issue is with -0.0. reduce_to_0n is composed of reduce_to_pm1no and reduce_pm1no_to_0n, and the latter really doesn't like the former returning -0.0 (0x800000...) Interesting that valgrind is returning a non-compliant -0.0. I have picture proof of valgrind's crimes

@tthsqe12
Copy link
Contributor

tthsqe12 commented Apr 28, 2023

So, the following minimal example (without any compiler shenanigans getting in the way) shows that valgrind differs from the hardware. It looks like the hardware is doing the right thing according to the standard. Make sure you can reproduce the results on your comp. Also, as suspected, valgrind's nmadd(a,b,c) matches neg(msub(a,b,c)), which is as non-compliant as replacing sub(a,b) with neg(sub(b,a))

~/Desktop$ as mnwe.asm -o mnwe.o && ld mnwe.o -o mnwe && ./mnwe
a:
 0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000 8000000000000000
b:
 0000000000000000 0000000000000000 8000000000000000 8000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000
c:
 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000
fnmadd(a,b,c):
 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000
fmsub(a,b,c):
 0000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000

~/Desktop$ as mnwe.asm -o mnwe.o && ld mnwe.o -o mnwe && valgrind ./mnwe
==4026== Memcheck, a memory error detector
==4026== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4026== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==4026== Command: ./mnwe
==4026== 
a:
 0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000 8000000000000000
b:
 0000000000000000 0000000000000000 8000000000000000 8000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000
c:
 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000
fnmadd(a,b,c):
 8000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000
fmsub(a,b,c):
 0000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000

==4026== 
==4026== HEAP SUMMARY:
==4026==     in use at exit: 0 bytes in 0 blocks
==4026==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==4026== 
==4026== All heap blocks were freed -- no leaks are possible
==4026== 
==4026== For counts of detected and suppressed errors, rerun with: -v
==4026== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
// mnwe.asm
.intel_syntax noprefix
.global _start

stdout = 1
sys_write   = 0x0001 
sys_exit    = 0x003c

.text
print_rax_hex:
          bswap rax
          vmovq xmm0,rax
          vpand xmm1,xmm0,xmmword ptr [mask_0f]
         vpsrlq xmm0,xmm0,4
          vpand xmm0,xmm0,xmmword ptr [mask_0f]
     vpunpcklbw xmm0,xmm0,xmm1
         vpaddb xmm1,xmm0,xmmword ptr [mask_30]
       vpcmpgtb xmm0,xmm0,xmmword ptr [mask_09]
          vpand xmm0,xmm0,xmmword ptr [mask_27]
         vpaddb xmm0,xmm0,xmm1
        vmovdqu xmmword ptr [rdi],xmm0
            add rdi,16
            ret

print_zmmword_hex:
            mov ecx,8
.l1:
            mov al,' '
          stosb
            mov rax,qword ptr [rsi]
            add rsi,8
           call print_rax_hex
            sub ecx,1
            jnz .l1
            mov al,10
          stosb
            ret

print_the_stuff:
            lea rsi,[astring]
            mov ecx,astring_end-astring
      rep movsb
            lea rsi,[a]
           call print_zmmword_hex

            lea rsi,[bstring]
            mov ecx,bstring_end-bstring
      rep movsb
            lea rsi,[b]
           call print_zmmword_hex

            lea rsi,[cstring]
            mov ecx,cstring_end-cstring
      rep movsb
            lea rsi,[c]
           call print_zmmword_hex

            lea rsi,abc1string
            mov ecx,abc1string_end-abc1string
      rep movsb
            lea rsi,[abc1]
           call print_zmmword_hex

            lea rsi,abc2string
            mov ecx,abc2string_end-abc2string
      rep movsb
            lea rsi,[abc2]
           call print_zmmword_hex

            mov al,10
          stosb
            ret

_start:
        vmovupd ymm1,ymmword ptr [a]
        vmovupd ymm11,ymmword ptr [a+32]
        vmovupd ymm2,ymmword ptr [b]
        vmovupd ymm12,ymmword ptr [b+32]
        vmovupd ymm3,ymmword ptr [c]
        vmovupd ymm13,ymmword ptr [c+32]

        vmovapd ymm0,ymm2
   vfnmadd213pd ymm0,ymm1,ymm3
        vmovupd ymmword ptr [abc1],ymm0
        vmovapd ymm0,ymm12
   vfnmadd213pd ymm0,ymm11,ymm13
        vmovupd ymmword ptr [abc1+32],ymm0

        vmovapd ymm0,ymm2
    vfmsub213pd ymm0,ymm1,ymm3
        vmovupd ymmword ptr [abc2],ymm0
        vmovapd ymm0,ymm12
    vfmsub213pd ymm0,ymm11,ymm13
        vmovupd ymmword ptr [abc2+32],ymm0

            lea rdi,[message]
           call print_the_stuff

            lea rsi,[message]
            mov rdx,rdi
            sub rdx,rsi
            mov edi,stdout
            mov eax,sys_write
        syscall

            xor edi,edi
            mov eax,sys_exit
        syscall

.data
.align 32
a:
   .quad 0x0000000000000000,0x0000000000000000,0x0000000000000000,0x0000000000000000
   .quad 0x8000000000000000,0x8000000000000000,0x8000000000000000,0x8000000000000000
b:
   .quad 0x0000000000000000,0x0000000000000000,0x8000000000000000,0x8000000000000000
   .quad 0x0000000000000000,0x0000000000000000,0x8000000000000000,0x8000000000000000
c:
   .quad 0x0000000000000000,0x8000000000000000,0x0000000000000000,0x8000000000000000
   .quad 0x0000000000000000,0x8000000000000000,0x0000000000000000,0x8000000000000000
abc1:
   .quad 1,2,3,4,5,6,7,8
abc2:
   .quad 1,2,3,4,5,6,7,8
mask_0f:
   .quad 0x0f0f0f0f0f0f0f0f,0x0f0f0f0f0f0f0f0f
mask_30:
   .quad 0x3030303030303030,0x3030303030303030
mask_09:
   .quad 0x0909090909090909,0x0909090909090909
mask_27:
   .quad 0x2727272727272727,0x2727272727272727

astring:
   .string "a:\n"
astring_end:

bstring:
   .string "b:\n"
bstring_end:

cstring:
   .string "c:\n"
cstring_end:

abc1string:
   .string "fnmadd(a,b,c):\n"
abc1string_end:

abc2string:
   .string "fmsub(a,b,c):\n"
abc2string_end:

message:
.rept 1000
.byte 0
.endr

@wbhart
Copy link
Collaborator

wbhart commented Apr 28, 2023

Here is the output from my AMD Ryzen 7 4800HS:

wbhart@LAPTOP-JUAA3GPC:~$ as mnwe.asm -o mnwe.o && ld mnwe.o -o mnwe && ./mnwe
a:
0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000 8000000000000000
b:
0000000000000000 0000000000000000 8000000000000000 8000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000
c:
0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000
fnmadd(a,b,c):
0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000
fmsub(a,b,c):
0000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000

wbhart@LAPTOP-JUAA3GPC:~$ as mnwe.asm -o mnwe.o && ld mnwe.o -o mnwe && valgrind ./mnwe
==21118== Memcheck, a memory error detector
==21118== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==21118== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==21118== Command: ./mnwe
==21118==
==21118== error calling PR_SET_PTRACER, vgdb might block
a:
0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000 8000000000000000
b:
0000000000000000 0000000000000000 8000000000000000 8000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000
c:
0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000
fnmadd(a,b,c):
8000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000
fmsub(a,b,c):
0000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000

==21118==
==21118== HEAP SUMMARY:
==21118== in use at exit: 0 bytes in 0 blocks
==21118== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==21118==
==21118== All heap blocks were freed -- no leaks are possible
==21118==
==21118== For lists of detected and suppressed errors, rerun with: -s
==21118== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@fredrik-johansson
Copy link
Collaborator Author

Also a Ryzen:

fredrik@mordor:~/src/test$ as mnwe.asm -o mnwe.o && ld mnwe.o -o mnwe && ./mnwe
a:
 0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000 8000000000000000
b:
 0000000000000000 0000000000000000 8000000000000000 8000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000
c:
 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000
fnmadd(a,b,c):
 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000
fmsub(a,b,c):
 0000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000

fredrik@mordor:~/src/test$ as mnwe.asm -o mnwe.o && ld mnwe.o -o mnwe && valgrind ./mnwe
==3106150== Memcheck, a memory error detector
==3106150== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3106150== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==3106150== Command: ./mnwe
==3106150== 
a:
 0000000000000000 0000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000 8000000000000000
b:
 0000000000000000 0000000000000000 8000000000000000 8000000000000000 0000000000000000 0000000000000000 8000000000000000 8000000000000000
c:
 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000
fnmadd(a,b,c):
 8000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 8000000000000000 8000000000000000
fmsub(a,b,c):
 0000000000000000 0000000000000000 8000000000000000 0000000000000000 8000000000000000 0000000000000000 0000000000000000 0000000000000000

==3106150== 
==3106150== HEAP SUMMARY:
==3106150==     in use at exit: 0 bytes in 0 blocks
==3106150==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==3106150== 
==3106150== All heap blocks were freed -- no leaks are possible
==3106150== 
==3106150== For lists of detected and suppressed errors, rerun with: -s
==3106150== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@albinahlback
Copy link
Collaborator

Reported in Valgrind's Bugzilla.

@albinahlback
Copy link
Collaborator

Superseded by #2092.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants