
Better to/from slice operations #571

Merged 1 commit into bitshifter:main on Oct 21, 2024

Conversation

stepancheg (Contributor)

Check slice has enough capacity only once.

Godbolt shows:

  • machine code is smaller
  • only one branch for non-failing cases

@@ -483,6 +483,7 @@ impl {{ self_t }} {
#[inline]
#[must_use]
pub const fn from_slice(slice: &[{{ scalar_t }}]) -> Self {
assert!(slice.len() >= {{ dim }});
bitshifter (Owner)
Accessing a slice by index already checks that the index is in range. Are you suggesting that adding this assert makes the compiler elide the other checks? Is this one of the operations you were looking at generated code for?

stepancheg (Contributor, Author)

Yes! The range check is now performed only once. For this code, the compiler output before:

example::from_slice_slow::hd394bd4d83252b04:
        push    rax
        test    rdx, rdx
        je      .LBB2_4
        cmp     rdx, 1
        je      .LBB2_5
        cmp     rdx, 2
        jbe     .LBB2_3
        movss   xmm0, dword ptr [rsi]
        movss   dword ptr [rdi], xmm0
        movsd   xmm0, qword ptr [rsi + 4]
        movsd   qword ptr [rdi + 4], xmm0
        mov     rax, rdi
        pop     rcx
        ret
...

after:

example::from_slice_fast::h47cd007525b38d49:
        cmp     rdx, 2
        jbe     .LBB3_2
        movss   xmm0, dword ptr [rsi + 8]
        movsd   xmm1, qword ptr [rsi]
        movsd   qword ptr [rdi], xmm1
        movss   dword ptr [rdi + 8], xmm0
        mov     rax, rdi
        ret
...

bitshifter (Owner)

Sorry, I just noticed the godbolt link.

Godbolt might not tell the full story, though, as there are different implementations of glam vectors, using SIMD in some cases and plain floats in others. It's better to look at the generated assembly. You can use `cargo asm`; because most things are inlined, it's usually necessary to add public test functions that call glam functions and look at the asm for those.

stepancheg (Contributor, Author), Oct 21, 2024

This PR does not change the code for the SIMD versions, only the code for working with regular slices.

Even if the code is inlined, the same branches stay, because they are observable behavior:

  • before this PR, different indices produced different panic messages
  • on a write to a too-short slice, data was partially written before the panic

Anyway, here is the `cargo asm glam::test_xxx` output for Vec3A on my Apple laptop.

Code:

use glam::Vec3A;

pub fn test_write(a: Vec3A, target: &mut [f32]) {
    a.write_to_slice(target);
}

pub fn test_read(a: &[f32]) -> Vec3A {
    Vec3A::from_slice(a)
}

Before:

glam::test_write:
 stp     x29, x30, [sp, #-16]!
 mov     x29, sp
 cbz     x2, LBB98_4
 ldr     q0, [x0]
 str     s0, [x1]
 cmp     x2, #1
 b.eq    LBB98_5
 add     x8, x1, #4
 st1.s   { v0 }[1], [x8]
 cmp     x2, #2
 b.ls    LBB98_6
 add     x8, x1, #8
 st1.s   { v0 }[2], [x8]
 ldp     x29, x30, [sp], #16
 ret
LBB98_4:
Lloh662:
 adrp    x2, l___unnamed_75@PAGE
Lloh663:
 add     x2, x2, l___unnamed_75@PAGEOFF
 mov     x0, #0
 mov     x1, #0
 bl      __ZN4core9panicking18panic_bounds_check17h47e687fd4a18b35fE
LBB98_5:
Lloh664:
 adrp    x2, l___unnamed_76@PAGE
Lloh665:
 add     x2, x2, l___unnamed_76@PAGEOFF
 mov     w0, #1
 mov     w1, #1
 bl      __ZN4core9panicking18panic_bounds_check17h47e687fd4a18b35fE
LBB98_6:
Lloh666:
 adrp    x2, l___unnamed_77@PAGE
Lloh667:
 add     x2, x2, l___unnamed_77@PAGEOFF
 mov     w0, #2
 mov     w1, #2
 bl      __ZN4core9panicking18panic_bounds_check17h47e687fd4a18b35fE

glam::test_read:
 stp     x29, x30, [sp, #-16]!
 mov     x29, sp
 cbz     x1, LBB99_4
 cmp     x1, #1
 b.eq    LBB99_5
 cmp     x1, #2
 b.ls    LBB99_6
 ldr     s0, [x0]
 add     x9, x0, #4
 ld1.s   { v0 }[1], [x9]
 ldr     s1, [x0, #8]
 mov.s   v0[2], v1[0]
 mov.s   v0[3], v1[0]
 str     q0, [x8]
 ldp     x29, x30, [sp], #16
 ret
LBB99_4:
Lloh668:
 adrp    x2, l___unnamed_78@PAGE
Lloh669:
 add     x2, x2, l___unnamed_78@PAGEOFF
 mov     x0, #0
 bl      __ZN4core9panicking18panic_bounds_check17h47e687fd4a18b35fE
LBB99_5:
Lloh670:
 adrp    x2, l___unnamed_79@PAGE
Lloh671:
 add     x2, x2, l___unnamed_79@PAGEOFF
 mov     w0, #1
 bl      __ZN4core9panicking18panic_bounds_check17h47e687fd4a18b35fE
LBB99_6:
Lloh672:
 adrp    x2, l___unnamed_80@PAGE
Lloh673:
 add     x2, x2, l___unnamed_80@PAGEOFF
 mov     w0, #2
 mov     w1, #2
 bl      __ZN4core9panicking18panic_bounds_check17h47e687fd4a18b35fE

After:

glam::test_write:
 cmp     x2, #3
 b.ne    LBB98_2
 ldr     q0, [x0]
 add     x8, x1, #8
 st1.s   { v0 }[2], [x8]
 str     d0, [x1]
 ret
LBB98_2:
 stp     x29, x30, [sp, #-16]!
 mov     x29, sp
Lloh662:
 adrp    x8, l___unnamed_75@PAGE
Lloh663:
 add     x8, x8, l___unnamed_75@PAGEOFF
 mov     x0, x2
 mov     w1, #3
 mov     x2, x8
 bl      __ZN4core5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$15copy_from_slice17len_mismatch_fail17h7212f66e1542c208E

glam::test_read:
 cmp     x1, #2
 b.ls    LBB99_2
 ldr     d0, [x0]
 ldr     s1, [x0, #8]
 mov.s   v0[2], v1[0]
 mov.s   v0[3], v1[0]
 str     q0, [x8]
 ret
LBB99_2:
 stp     x29, x30, [sp, #-16]!
 mov     x29, sp
Lloh664:
 adrp    x0, l___unnamed_76@PAGE
Lloh665:
 add     x0, x0, l___unnamed_76@PAGEOFF
Lloh666:
 adrp    x2, l___unnamed_77@PAGE
Lloh667:
 add     x2, x2, l___unnamed_77@PAGEOFF
 mov     w1, #34
 bl      __ZN4core9panicking5panic17hae814ca9668aac2fE

bitshifter (Owner)

Thanks for checking. I was mostly wondering if to_array resulted in anything weird with the underlying SIMD types, since it will be dereferencing all the vector elements; seems fine though.

@bitshifter bitshifter merged commit 30c6b0d into bitshifter:main Oct 21, 2024
18 checks passed
stepancheg added a commit to stepancheg/glam-rs that referenced this pull request Nov 5, 2024
Fix regression introduced in bitshifter#571.

`VecN::write_to_slice` expects the slice length to be >= dim, not == dim.

`slice::copy_from_slice` expects both slices to have the same length.
bitshifter pushed a commit that referenced this pull request Nov 5, 2024
Fix regression introduced in #571.

`VecN::write_to_slice` expects the slice length to be >= dim, not == dim.

`slice::copy_from_slice` expects both slices to have the same length.