illegal instruction (core dumped) #22
Comments
Hello. May I ask which version of hnswlib you have installed?
Hi Tony, thanks for replying. My hnswlib version is 0.8.0.
Hello. I think giving it its own environment is a good idea. That is what I usually do.
Then install scimilarity via pip. I will test a few more Python versions and update the install instructions to recommend some environments.
Thanks for sharing your yaml script. I am trying it now and will let you know the outcome.
Thanks again, it worked out.
Hi Tony, it is very weird: it worked for a while, then stopped working again at the line: cq = CellQuery(model_path)
That is possible. Hnswlib is compiled on your machine on install and it compiles for available SIMD instructions, so if CPUs don't have the same SIMD it might crash. There are things to test this if you want.
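One way to test this, as a minimal sketch assuming an x86 Linux node where `/proc/cpuinfo` has a `flags` line: run the script below on the node where the install happened and on the node where it crashes, then compare the output. The list of flags and the helper names `parse_simd_flags` and `report` are my own choices for illustration, not part of hnswlib or SCimilarity.

```python
import platform
from pathlib import Path

# Assumption: these are the SIMD flags worth comparing between nodes.
SIMD_FLAGS = ("sse", "sse2", "sse4_1", "sse4_2", "avx", "avx2", "avx512f")


def parse_simd_flags(cpuinfo_text):
    """Return {flag: bool} from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return {name: name in flags for name in SIMD_FLAGS}
    # No 'flags' line found (for example, a non-x86 machine).
    return {}


def report(path="/proc/cpuinfo"):
    """Print the hostname and the SIMD flags this node advertises."""
    cpuinfo = Path(path)
    if not cpuinfo.exists():
        print(f"{path} not found; this check only applies to Linux")
        return {}
    result = parse_simd_flags(cpuinfo.read_text())
    print(platform.node(), result)
    return result


if __name__ == "__main__":
    report()
```

If the build node reports a flag as True (say `avx512f`) and the crashing node reports it as False, that is consistent with an illegal-instruction crash, though it does not prove it.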
To add to the previous comment, only three things used in SCimilarity can crash like a segfault: PyTorch, which doesn't usually crash silently, hnswlib, and TileDB. The latter two are C bindings, hence the segfaults. Older versions of TileDB and hnswlib might be incompatible with an index built with newer versions. As an additional complexity, hnswlib is compiled for the specific architecture it is installed on.
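To narrow down which of the three is crashing, one option is to exercise each library in its own subprocess so a hard crash does not take down the main interpreter. This is only a sketch: the `probe` helper and the snippets in `CHECKS` are illustrative assumptions, and a tiny hnswlib query may not hit the same code path as loading the real index through `CellQuery`.

```python
import signal
import subprocess
import sys


def probe(snippet):
    """Run a Python snippet in a child process and describe how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet], capture_output=True, text=True
    )
    rc = proc.returncode
    if rc == 0:
        return "ok"
    if rc < 0:
        # On POSIX a negative return code means the child died from a signal.
        try:
            return f"killed by {signal.Signals(-rc).name}"
        except ValueError:
            return f"killed by signal {-rc}"
    return f"python error (exit {rc})"


# Hypothetical smoke tests, one per library named above.
CHECKS = {
    "torch": "import torch; torch.ones(4).sum()",
    "tiledb": "import tiledb",
    "hnswlib": (
        "import numpy as np, hnswlib; "
        "idx = hnswlib.Index(space='cosine', dim=8); "
        "idx.init_index(max_elements=4); "
        "x = np.random.rand(4, 8).astype('float32'); "
        "idx.add_items(x); idx.knn_query(x, k=1)"
    ),
}

if __name__ == "__main__":
    for name, snippet in CHECKS.items():
        print(name, "->", probe(snippet))
```

A result of `killed by SIGILL` for one entry points at that library's compiled code on the current node, while `python error` usually just means an import or usage problem.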
If it does turn out to be a CPU architecture or C library issue, you could also check whether your compute cluster admins have grouped different hardware purchases into different Slurm queues (not uncommon). Your default queue might take any node, but there may be other queues restricted to nodes with certain memory, etc., which tend to align with CPU versions because each represents a bulk hardware purchase. I've seen node-specific crashes with NumPy before too (though not with SCimilarity).
Thank you, Tony, for the detailed explanation. I have arranged with our
admin to see if it is architecture related. Best, Shuye
Hi Tony, we have figured it out. It is a CPU architecture issue: it worked again on a node with a newer CPU. Best, Shuye
Hi SCimilarity Team:
I am trying to run the code in the tutorial on a Linux cluster node with 128GB RAM, and I got the following error:
model_path = "/cluster/projects/hardinggroup/Shuye/SCimilarity/model_v1.1"
Any idea?
Here is my base environment info (partial)
Python 3.12.2
pytorch-lightning 2.4.0
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127