Fixes ASR numpy > 2.x compatibility issues while replicating existing behavior #11447
base: main
Signed-off-by: andylamp <[email protected]>
Note: I don't seem to be able to request a review for this PR; therefore, based on the contributing guidelines, I am asking @titu1994 for one, as that username was the first one in the list ;).
Maybe @redoctopus, @jbalam-nv, or @okuchaiev could review?
Would be great to get this merged as
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
@oyilmaz-nvidia could you review this PR? It would be really nice if we could get support for numpy>2
Yeah, it'd be great if that's the case :). As I said, I have to apply this patch in order to get everything working properly when using
Any updates on this one? :)
Pinging again in this thread given no review has been given...
Tagging @redoctopus, @jbalam-nv, or @okuchaiev to request a review 🙏
What does this PR do ?
The PR attempts to restore functionality with recent numpy versions; numpy 2.0 removed sctypes, and this is breaking asr functionality.
Collection: asr
Changelog
numpy from version 2.0 onwards removed the sctypes functionality, which breaks asr functionality in various places (e.g. transcribe_speech, example notebooks etc).
Affects the _convert_samples_to_float32 function residing in both feature_loader.py and segment.py.
Usage
The usage is that ASR tasks that call the _convert_samples_to_float32 functions now succeed. While the PRs mentioned above attempted to address this, I believe that the functionality is not replicated accurately.
More concretely, in supported versions of numpy, sctypes lists a fixed set of concrete types for int and for float. If we use issubdtype to perform this check instead, the accepted set is wider; floating point is a case in point.
For integers, the set mostly covers the output of sctypes for the signed ones. However, any subclass of signedinteger or unsignedinteger will return true. Therefore, in this case we consider not only int{8,16,32,64} but also uint{8,16,32,64}, which is not the expected result when querying sctypes['int'] and can lead to unexpected behavior.
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.
Additional Information
The fix is backwards compatible with all supported numpy 1.x versions as well. Therefore, it should pose minimal risk with respect to integration.
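The difference described under Usage can be sketched as follows. This is a minimal illustration, not the actual patch: the names SIGNED_INT_DTYPES, FLOAT_DTYPES and convert_samples_to_float32 are hypothetical, and the two tuples are an assumption about what sctypes['int'] and sctypes['float'] contained on common platforms under numpy 1.x. The sketch uses only calls that exist in both numpy 1.x and 2.x.

```python
import numpy as np

# Assumption: explicit stand-ins for the removed np.sctypes['int'] and
# np.sctypes['float'] lists (signed integers only, concrete float types only).
SIGNED_INT_DTYPES = tuple(np.dtype(t) for t in (np.int8, np.int16, np.int32, np.int64))
FLOAT_DTYPES = tuple(np.dtype(t) for t in (np.float16, np.float32, np.float64, np.longdouble))


def convert_samples_to_float32(samples):
    """Hypothetical helper: convert audio samples to float32.

    Signed integers are scaled by 1 / 2**(bits - 1); floats are only cast.
    Any other dtype (including unsigned integers) is rejected.
    """
    float32_samples = samples.astype("float32")
    if samples.dtype in SIGNED_INT_DTYPES:
        bits = np.iinfo(samples.dtype).bits
        float32_samples *= 1.0 / 2 ** (bits - 1)
    elif samples.dtype in FLOAT_DTYPES:
        pass  # already floating point, the cast above is enough
    else:
        raise TypeError(f"Unsupported sample type: {samples.dtype}.")
    return float32_samples


# issubdtype against the abstract np.integer also accepts unsigned types,
# which the explicit signed-only tuple does not.
accepted_by_issubdtype = np.issubdtype(np.uint8, np.integer)
accepted_by_explicit_set = np.dtype(np.uint8) in SIGNED_INT_DTYPES
```

With this sketch, an int16 array is scaled into float32, while a uint8 array raises TypeError instead of being silently scaled as if it were signed, which is the behavior an issubdtype check against np.integer would allow.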