Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle negative values of incx in nrm2. #4186

Closed
steppi opened this issue Aug 7, 2023 · 4 comments · Fixed by #4190
Closed

Handle negative values of incx in nrm2. #4186

steppi opened this issue Aug 7, 2023 · 4 comments · Fixed by #4190

Comments

@steppi
Copy link
Contributor

steppi commented Aug 7, 2023

In scipy/scipy#16930, @M0reDr1nk noticed a discrepancy between reference LAPACK and OpenBLAS for the function dnrm2, in that reference LAPACK allows negative values of incx while OpenBLAS does not. Support for negative values of INCX was added in LAPACK 3.10 as part of adding Edward Anderson's Safe Scaling Algorithm. As far as I'm aware, support for negative indices was incidental to the addition of safe scaling.

I saw that there's a comment in gh-4130 stating that updating nrm2 to use safe scaling will be difficult due to the need to update the assembly kernels for a great number of architectures. I imagine it would be easier to add support for negative values of incx and am curious if there's interest in supporting negative values of incx before updating to use safe scaling, and if this could be done in a reasonable time-frame. I would be willing to help in this effort if that may be of use.

@martin-frbg
Copy link
Collaborator

Not sure if it makes sense to split the two subtopics, unless we can get away with just changing the starting x in the interface code (like e.g. dot does) and adapting the initial test in the assembly to only return early when the increment is zero. (Need to try this, seems even the reference did not update any tests to correspond to the new behaviour).
OTOH not failing on incx < 0 may be seen/used as an indication for actually being a new-style ?NRM2 routine ??

Anyway help is always welcome, in particular if you are somewhat familiar with assembly - I'm still something of a self-taught amateur there, more confident with small modifications to existing code than writing from scratch.

@steppi
Copy link
Contributor Author

steppi commented Aug 8, 2023

Not sure if it makes sense to split the two subtopics, unless we can get away with just changing the starting x in the interface code (like e.g. dot does) and adapting the initial test in the assembly to only return early when the increment is zero. (Need to try this, seems even the reference did not update any tests to correspond to the new behaviour).
OTOH not failing on incx < 0 may be seen/used as an indication for actually being a new-style ?NRM2 routine ?

That's what I was thinking too, that maybe it would mostly just need a change in the interface. I think I agree that it might not make sense to split the topics up though.

Anyway help is always welcome, in particular if you are somewhat familiar with assembly - I'm still something of a self-taught amateur there, more confident with small modifications to existing code than writing from scratch.

I'm also mostly a self taught amateur, and am currently only familiar with AT&T syntax x86-64, but am interested in learning more, and would like to reach a professional level. I'll look around the code-base a bit to try to get a sense of how the pieces fit together.

@martin-frbg
Copy link
Collaborator

First test looking good for x86_64, but not sure if I can do the whole nrm2 redesign in one go... the alternative would be to redo the generic C implementation and retire the assembly ones for now.

@steppi
Copy link
Contributor Author

steppi commented Aug 10, 2023

Thanks @martin-frbg. I’d like to start learning how to write hand optimized assembly kernels so I can help with the full update of nrm2 to be compatible with Lapack 3.10. Do you have any advice for reading materials?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants