-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better to/from slice operations #571
Conversation
@@ -483,6 +483,7 @@ impl {{ self_t }} { | |||
#[inline] | |||
#[must_use] | |||
pub const fn from_slice(slice: &[{{ scalar_t }}]) -> Self { | |||
assert!(slice.len() >= {{ dim }}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accessing slice by index will also check that the index is in range, are you suggesting that adding this here makes the compiler elide the other changes, is this one of the operations that you were looking at generated code for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Range check is performed only once now. For this code compiler output before:
example::from_slice_slow::hd394bd4d83252b04:
push rax
test rdx, rdx
je .LBB2_4
cmp rdx, 1
je .LBB2_5
cmp rdx, 2
jbe .LBB2_3
movss xmm0, dword ptr [rsi]
movss dword ptr [rdi], xmm0
movsd xmm0, qword ptr [rsi + 4]
movsd qword ptr [rdi + 4], xmm0
mov rax, rdi
pop rcx
ret
...
after:
example::from_slice_fast::h47cd007525b38d49:
cmp rdx, 2
jbe .LBB3_2
movss xmm0, dword ptr [rsi + 8]
movsd xmm1, qword ptr [rsi]
movsd qword ptr [rdi], xmm1
movss dword ptr [rdi + 8], xmm0
mov rax, rdi
ret
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I just noticed the godbolt link.
Godbolt might not tell the full story though as there are different implementations of glam vectors, using simd in some cases and floats in others. It's better to look at the generated assembly. You can use cargo asm, because most things are inlined it's usually necessary to add public test functions that call glam functions and look at the asm for those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR does not change code for SIMD versions, only code for working with regular slice.
Even if code is inlined, the same branches stay, because it is observable behavior:
- different error messages before this PR for different indices
- on write to slice, data was partially written before panic
Anyway, looking at cargo asm glam::test_xxx
for Vec3A
on my Apple laptop.
Code:
pub fn test_write(a: Vec3A, target: &mut [f32]) {
a.write_to_slice(target);
}
pub fn test_read(a: &[f32]) -> Vec3A {
Vec3A::from_slice(a)
}
Before:
glam::test_write:
stp x29, x30, [sp, #-16]!
mov x29, sp
cbz x2, LBB98_4
ldr q0, [x0]
str s0, [x1]
cmp x2, #1
b.eq LBB98_5
add x8, x1, #4
st1.s {, v0, }[1], [x8]
cmp x2, #2
b.ls LBB98_6
add x8, x1, #8
st1.s {, v0, }[2], [x8]
ldp x29, x30, [sp], #16
ret
LBB98_4:
Lloh662:
adrp x2, l___unnamed_75@PAGE
Lloh663:
add x2, x2, l___unnamed_75@PAGEOFF
mov x0, #0
mov x1, #0
bl __ZN4core9panicking18panic_bounds_check17h47e687fd4a18b35fE
LBB98_5:
Lloh664:
adrp x2, l___unnamed_76@PAGE
Lloh665:
add x2, x2, l___unnamed_76@PAGEOFF
mov w0, #1
mov w1, #1
bl __ZN4core9panicking18panic_bounds_check17h47e687fd4a18b35fE
LBB98_6:
Lloh666:
adrp x2, l___unnamed_77@PAGE
Lloh667:
add x2, x2, l___unnamed_77@PAGEOFF
mov w0, #2
mov w1, #2
bl __ZN4core9panicking18panic_bounds_check17h47e687fd4a18b35fE
glam::test_read:
stp x29, x30, [sp, #-16]!
mov x29, sp
cbz x1, LBB99_4
cmp x1, #1
b.eq LBB99_5
cmp x1, #2
b.ls LBB99_6
ldr s0, [x0]
add x9, x0, #4
ld1.s {, v0, }[1], [x9]
ldr s1, [x0, #8]
mov.s v0[2], v1[0]
mov.s v0[3], v1[0]
str q0, [x8]
ldp x29, x30, [sp], #16
ret
LBB99_4:
Lloh668:
adrp x2, l___unnamed_78@PAGE
Lloh669:
add x2, x2, l___unnamed_78@PAGEOFF
mov x0, #0
bl __ZN4core9panicking18panic_bounds_check17h47e687fd4a18b35fE
LBB99_5:
Lloh670:
adrp x2, l___unnamed_79@PAGE
Lloh671:
add x2, x2, l___unnamed_79@PAGEOFF
mov w0, #1
bl __ZN4core9panicking18panic_bounds_check17h47e687fd4a18b35fE
LBB99_6:
Lloh672:
adrp x2, l___unnamed_80@PAGE
Lloh673:
add x2, x2, l___unnamed_80@PAGEOFF
mov w0, #2
mov w1, #2
bl __ZN4core9panicking18panic_bounds_check17h47e687fd4a18b35fE
After:
glam::test_write:
cmp x2, #3
b.ne LBB98_2
ldr q0, [x0]
add x8, x1, #8
st1.s {, v0, }[2], [x8]
str d0, [x1]
ret
LBB98_2:
stp x29, x30, [sp, #-16]!
mov x29, sp
Lloh662:
adrp x8, l___unnamed_75@PAGE
Lloh663:
add x8, x8, l___unnamed_75@PAGEOFF
mov x0, x2
mov w1, #3
mov x2, x8
bl __ZN4core5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$15copy_from_slice17len_mismatch_fail17h7212f66e1542c208E
glam::test_read:
cmp x1, #2
b.ls LBB99_2
ldr d0, [x0]
ldr s1, [x0, #8]
mov.s v0[2], v1[0]
mov.s v0[3], v1[0]
str q0, [x8]
ret
LBB99_2:
stp x29, x30, [sp, #-16]!
mov x29, sp
Lloh664:
adrp x0, l___unnamed_76@PAGE
Lloh665:
add x0, x0, l___unnamed_76@PAGEOFF
Lloh666:
adrp x2, l___unnamed_77@PAGE
Lloh667:
add x2, x2, l___unnamed_77@PAGEOFF
mov w1, #34
bl __ZN4core9panicking5panic17hae814ca9668aac2fE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking. I was mostly wondering if to_array
resulted in anything weird with underlying simd types since it will be dereferencing all the vector elements, seems fine though.
Fix regression introduced in bitshifter#571.
Fix regression introduced in bitshifter#571. `VecN::write_to_slice` expects the slice length to be >= dim, not == dim. `slice::copy_from_slice` expects both slices to have the same length.
Fix regression introduced in #571. `VecN::write_to_slice` expects the slice length to be >= dim, not == dim. `slice::copy_from_slice` expects both slices to have the same length.
Check slice has enough capacity only once.
Godbolt shows: