Segmentation Fault Error in LidarCenterNet Import #264

Open
amanysh99 opened this issue Jan 3, 2025 · 9 comments

@amanysh99

When running train.py under python -m pdb, a segmentation fault occurs on importing LidarCenterNet from model.py. PyTorch and the other dependencies import successfully before the crash, which suggests a problem in the LidarCenterNet module or in its interaction with PyTorch/CUDA.

Steps to reproduce:
1. Run python -m pdb train.py with the specified arguments.
2. Observe the segmentation fault at the LidarCenterNet import.

Environment:
PyTorch version: 1.13.1+cu117
CUDA version: 12.0 (V12.0.140)
OS: Ubuntu (WSL2) 5.15.167.4-microsoft-standard

Expected behavior:
train.py runs without a segmentation fault.

Actual behavior:
Segmentation fault on importing LidarCenterNet.

The debugging output and the error snippet are attached.

Can anyone help?
Thanks in advance

@Kait0
Collaborator

Kait0 commented Jan 7, 2025

You probably have the wrong version of mmcv or torch-scatter installed. Both depend on PyTorch and CUDA, so if you change the PyTorch or CUDA version you need to change the mmcv and torch-scatter versions as well.

I don't know whether the code will work without changes if you upgrade PyTorch, mmcv, CUDA, etc.
If you don't actually need the newer versions, I would recommend using the same versions as in our repo.
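
One way to narrow down which dependency triggers the crash is to import each candidate in its own Python process, so a segmentation fault in one import does not take the checker down with it. The sketch below is only an illustration: the helper name probe_import is hypothetical, and the module names torch, mmcv, and torch_scatter are assumed from the packages mentioned above. On Linux, a child process killed by a signal reports a negative return code.

```python
import signal
import subprocess
import sys


def probe_import(module_name):
    """Import module_name in a child process and report how the child exited."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        return f"killed by {signal.Signals(-result.returncode).name}"
    # A positive code is an ordinary Python error such as ImportError.
    lines = result.stderr.strip().splitlines()
    return f"error: {lines[-1] if lines else 'unknown'}"


if __name__ == "__main__":
    # Assumed list of suspects; adjust to the modules model.py actually imports.
    for name in ("torch", "mmcv", "torch_scatter"):
        print(f"{name}: {probe_import(name)}")
```

A result of "killed by SIGSEGV" for one module while torch itself reports "ok" would point at a binary built against a different PyTorch/CUDA combination.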

@amanysh99
Author

Thank you, I look forward to trying this out. @Kait0

@amanysh99
Author

amanysh99 commented Jan 10, 2025

I'm experiencing a compatibility issue between CUDA and PyTorch. My GPU supports CUDA 12.5 (screenshot 1: nvidia-smi output). Following the installation guide (screenshot 2: CUDA Toolkit 12.5.0 instructions), I successfully installed CUDA 12.5 on WSL (Ubuntu) and verified its availability using python -c "import torch; print(torch.cuda.is_available())".
However, when installing PyTorch, I discovered the latest compatible version only supports CUDA 12.4 (screenshot 3). Attempting to downgrade to CUDA 12.4 resulted in errors (screenshot 4).
Please advise on resolving this compatibility issue or provide alternative solutions.

[Screenshots 1-4: nvidia-smi output, CUDA Toolkit 12.5.0 installation instructions, PyTorch install options, CUDA 12.4 downgrade errors]

@Kait0
Collaborator

Kait0 commented Jan 10, 2025

The CUDA version shown by nvidia-smi is the maximum version the driver supports (here, up to CUDA 12.5), so using older versions should be fine.
For most things you don't need a local installation of CUDA, because PyTorch comes with CUDA packaged. torch-scatter might need it, but you most likely won't use that library, so you can also skip installing it and comment out this line.
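
To check this on a given machine, the CUDA version a PyTorch wheel was built against can be read from its local version tag (for example, +cu117 means CUDA 11.7) and compared with the maximum version reported by nvidia-smi. This is a minimal sketch: both helper names are hypothetical, and the parsing assumes the usual +cuXYZ wheel naming. When torch imports cleanly, torch.version.cuda gives the same information directly.

```python
import re


def bundled_cuda_from_tag(torch_version):
    """Return (major, minor) of the CUDA build encoded in a version like '1.13.1+cu117'."""
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None  # CPU-only build or an unrecognised tag
    digits = match.group(1)
    # Assumed convention: the last digit is the minor version, the rest is the major.
    return int(digits[:-1]), int(digits[-1])


def driver_can_run(driver_max, bundled):
    """True if the driver's maximum supported CUDA version covers the bundled one."""
    return tuple(driver_max) >= tuple(bundled)


if __name__ == "__main__":
    # Values taken from this thread: a cu117 wheel and a driver reporting CUDA 12.5.
    bundled = bundled_cuda_from_tag("1.13.1+cu117")
    print(bundled, driver_can_run((12, 5), bundled))
```

With the values from this thread, the driver (12.5) covers the CUDA 11.7 runtime bundled with the wheel, which matches the point that older versions should work.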

@amanysh99
Author

amanysh99 commented Jan 10, 2025

I followed your suggestions, but unfortunately, I'm still encountering issues:
ImportError: cannot import name 'force_fp32' from 'mmcv.runner' (unknown location)
PyTorch version is: 1.11.0+cu102
MMCV version is: 1.5.3
AssertionError: MMCV==1.5.3 is used but incompatible. Please install mmcv>=2.0.0rc4, <2.2.0.

@Kait0
Collaborator

Kait0 commented Jan 10, 2025

I have never seen this issue before. You will need to debug it or search online.
MMCV is usually a pain to set up correctly, which is why I removed this dependency in our follow-up work, TransFuser++.

@amanysh99
Author

I'm currently debugging the train.py script and encountered an error when running the command python train.py --batch_size 5 --logdir ./log --root_dir ./data --parallel_training 1. The error message indicates an MMCV compatibility issue. Specifically, it seems that PyTorch 1.11.0+cu102 requires MMCV 2.x, but I have MMCV 1.5.3 installed. I'm investigating possible solutions, including downgrading PyTorch or upgrading MMCV.

@Kait0
Collaborator

Kait0 commented Jan 10, 2025

PyTorch does not require mmcv; mmcv depends on PyTorch.
Perhaps you have something else installed, such as mmdetection, that requires a different version of mmcv.
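
A quick way to see which installed packages might be pinning mmcv is to list the relevant distributions and their versions without importing them. This is only a sketch: the helper name is hypothetical, and the distribution names in CANDIDATES are guesses at what could be installed, not taken from the repo's requirements.

```python
from importlib.metadata import PackageNotFoundError, version

# Candidate distribution names (assumed); adjust to what the environment uses.
CANDIDATES = (
    "torch",
    "mmcv",
    "mmcv-full",
    "mmdet",
    "mmsegmentation",
    "mmengine",
    "torch-scatter",
)


def installed_versions(names=CANDIDATES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

If a package such as mmdet shows up with a version newer than the one the repo expects, its own mmcv version check would be a plausible source of the "Please install mmcv>=2.0.0rc4, <2.2.0" assertion, though that needs confirming locally.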

