Missing optimization in zip_float #114959

Open
junaire opened this issue Nov 5, 2024 · 1 comment

Comments

@junaire
Member

junaire commented Nov 5, 2024

Source:

#include <immintrin.h>

void zip_float(const double *src, double *dst) {
    __m256d s0 = _mm256_broadcast_pd((__m128d*)src);
    __m256d s1 = _mm256_broadcast_pd((__m128d*)src + 2);
    __m256d s = _mm256_shuffle_pd(s0, s1, 0xc);
    s = _mm256_mul_pd(s, s);
    _mm256_store_pd(dst, s);
}
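
For readers less familiar with the intrinsics, here is a rough scalar sketch of what the function above appears to compute. The name `zip_float_scalar` is hypothetical and not part of the issue. It assumes `(__m128d*)src + 2` advances by two 16-byte vectors, which matches the `[rdi + 32]` loads in the listings, and that immediate `0xc` picks the low element of each source in the lower 128-bit lane and the high element in the upper lane.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of zip_float; for illustration only.
 * Unlike _mm256_store_pd, it has no 32-byte alignment requirement on dst. */
void zip_float_scalar(const double *src, double *dst) {
    assert(src != NULL && dst != NULL);

    /* s0 broadcasts src[0..1]; s1 broadcasts src[4..5] (byte offset 32). */
    const double a_lo = src[0], a_hi = src[1];
    const double b_lo = src[4], b_hi = src[5];

    /* _mm256_shuffle_pd(s0, s1, 0xc): imm bits 0..3 are 0,0,1,1, so the
     * result lanes are { s0[0], s1[0], s0[3], s1[3] }. */
    const double s[4] = { a_lo, b_lo, a_hi, b_hi };

    /* _mm256_mul_pd(s, s): square each lane, then store. */
    for (size_t i = 0; i < 4; ++i) {
        dst[i] = s[i] * s[i];
    }
}
```

Under these assumptions the output is the squares of `src[0]`, `src[4]`, `src[1]`, `src[5]`, in that order.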

LLVM:

zip_float:
        vmovupd xmm0, xmmword ptr [rdi]
        vmovupd xmm1, xmmword ptr [rdi + 32]
        vunpcklpd       xmm2, xmm0, xmm1
        vunpckhpd       xmm0, xmm0, xmm1
        vinsertf128     ymm0, ymm2, xmm0, 1
        vmulpd  ymm0, ymm0, ymm0
        vmovapd ymmword ptr [rsi], ymm0
        vzeroupper
        ret

GCC:

zip_float:
        vbroadcastf128  ymm0, XMMWORD PTR [rdi]
        vbroadcastf128  ymm1, XMMWORD PTR [rdi+32]
        vshufpd ymm0, ymm0, ymm1, 12
        vmulpd  ymm0, ymm0, ymm0
        vmovapd YMMWORD PTR [rsi], ymm0
        vzeroupper
        ret
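
The two listings should produce the same lanes before the multiply: LLVM uses five instructions (two loads, `vunpcklpd`, `vunpckhpd`, `vinsertf128`) where GCC uses three (two `vbroadcastf128` and one `vshufpd`). The sketch below models both sequences in plain C to make the equivalence easier to check. All function names are hypothetical, and the lane semantics are my reading of the instruction definitions rather than anything stated in the issue.

```c
#include <assert.h>

/* Model of 256-bit vshufpd: each imm bit selects the low (0) or high (1)
 * element within a 128-bit lane, alternating between sources a and b. */
void shufpd256_model(const double a[4], const double b[4],
                     unsigned imm, double out[4]) {
    assert(imm < 16u);
    out[0] = a[0 + ((imm >> 0) & 1u)];
    out[1] = b[0 + ((imm >> 1) & 1u)];
    out[2] = a[2 + ((imm >> 2) & 1u)];
    out[3] = b[2 + ((imm >> 3) & 1u)];
}

/* GCC-style sequence: two vbroadcastf128, then vshufpd with imm 12. */
void lanes_broadcast_shuffle(const double *src, double out[4]) {
    const double s0[4] = { src[0], src[1], src[0], src[1] };
    const double s1[4] = { src[4], src[5], src[4], src[5] };
    shufpd256_model(s0, s1, 0xc, out);
}

/* LLVM-style sequence: two 128-bit loads, vunpcklpd, vunpckhpd,
 * then vinsertf128 placing the high-unpack result in the upper lane. */
void lanes_unpack_insert(const double *src, double out[4]) {
    const double x0[2] = { src[0], src[1] };
    const double x1[2] = { src[4], src[5] };
    const double lo[2] = { x0[0], x1[0] };  /* vunpcklpd xmm2, xmm0, xmm1 */
    const double hi[2] = { x0[1], x1[1] };  /* vunpckhpd xmm0, xmm0, xmm1 */
    out[0] = lo[0];
    out[1] = lo[1];
    out[2] = hi[0];  /* vinsertf128 ymm0, ymm2, xmm0, 1 */
    out[3] = hi[1];
}
```

Both helpers yield `{ src[0], src[4], src[1], src[5] }`, which suggests the request is only about matching GCC's shorter sequence, not about a difference in results.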

Godbolt: https://godbolt.org/z/ffz1YEhPE
Tweeted by FFmpeg: https://x.com/FFmpeg/status/1853326818008514900

@llvmbot
Collaborator

llvmbot commented Nov 5, 2024

@llvm/issue-subscribers-backend-x86

Author: Jun Zhang (junaire)


@RKSimon self-assigned this Nov 5, 2024