Segmentation Fault relion_refine_mpi #1154
No ideas from the community? |
Hi @KrisJanssen I am seeing the same error message. Were you able to fix it? |
@rahelwoldeyes : unfortunately not, really hoping the devs or other docs with knowledge of cuda and MPI might be able to shed some light… |
To further support this issue, the error consistently appears on our cluster regardless of whether Relion is containerized or installed directly.
Describe your problem
Refine3D consistently crashes with a segmentation fault at the beginning of the first iteration, regardless of MPI or GPU use. This occurs for datasets larger than trivially small ones (~500 particles), indicating a potential memory management issue. The dataset appears to be read six times. The problem was introduced in commit fa8dce3, as commit 90d239e works correctly.
Environment:
Dataset:
Job options:
Error message:
Please cite the full error message as the example below.
|
I am also seeing this problem, it is still open I see. |
Yes, this is still an issue for me. It is good to know the problem is related to 2D extraction in Warp Linux #1181. |
interesting, your particles are also coming from Warp Linux then? Is there any workaround? I tried older versions of RELION but I can only go back so far because only the latest versions work on our latest debian system. p.s. I recognize your name from GRC, we chatted briefly! Cool to see familiar names popping up after that :) |
@rahelwoldeyes and @DrJesseHansen : not sure about the warp Linux thing: I can trigger the issue with any generic data processed in a generic Ubuntu docker container … |
@DrJesseHansen My particles are from Warp Linux. I haven't found a workaround yet. I'm unsure if downgrading to an older RELION-5 version is advisable, given the bug fixes and improvements in newer versions. @KrisJanssen Thanks for pointing out that this isn't a Warp Linux-specific problem.
@DrJesseHansen I remember! It's nice to see you here. The GRS/GRC is a great way to connect with people. |
I figured out how to somewhat get around this issue. It turns out there were somehow some bad particles in the dataset. I’m guessing this is a WARP issue, since I extracted the exact same particles in the RELION pipeline and had no issue. Anyway, I split my particles into 10 subsets and refined each: 3 failed with this same error, 7 ran okay. Sub-splitting can help you reduce "waste" when throwing out the bad particles. I have no idea what is wrong with those problematic particles causing the crash. Now when I run the jobs I am getting a different error! haha. This one is about "corrupted size vs previous size". issue #794 p.s. I am running relion 5 commit 6331fe. |
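The subset approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the particle rows are taken to be already loaded into a list (reading and writing the STAR file is left out), and `split_into_subsets` is a hypothetical helper name, not a RELION tool.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_subsets(items: Sequence[T], n_subsets: int) -> List[List[T]]:
    """Deal items into n_subsets interleaved groups of near-equal size."""
    if n_subsets < 1:
        raise ValueError("n_subsets must be at least 1")
    # Striding (i, i + n, i + 2n, ...) keeps each subset spread over the dataset.
    return [list(items[i::n_subsets]) for i in range(n_subsets)]


# Stand-in particle identifiers; real rows would come from the particle STAR file.
particles = [f"particle_{i:04d}" for i in range(25)]
subsets = split_into_subsets(particles, 10)
for index, subset in enumerate(subsets):
    print(f"subset {index}: {len(subset)} particles")
```

Each subset would then be written to its own STAR file and refined separately. A subset that crashes can be split again with the same helper, so that fewer good particles are discarded along with the bad ones.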
Hmm, could someone try the following: line 1534 of src/ml_optimiser.cpp in the new version is this:
    if (do_write_data && (mymodel.data_dim == 3 || mydata.is_tomo) )
And might need to be changed to this:
    if (do_write_data && !optimisationSet.isEmpty() && (mymodel.data_dim == 3 || mydata.is_tomo) )
Please recompile the code and test again. Not sure how this is related to particles from Warp, but let's give it a go... |
@scheres Thanks for the suggestion, but it didn't fix the problem for me. I get the same error message. How did it go for you @KrisJanssen @DrJesseHansen? |
Could you then carefully compare the star files you get from warp with those you get when extracting particles in relion? |
hi, comparing the star files is interesting. Optics tables are identical except Warp has AC at 0.07 and Relion at 0.1. The data tables though have some differences (see image below):
I should mention too that if I generate 3D subvols in Warp then use relion_reconstruct I actually do get a reasonable looking reconstruction, however when I do the same with 2D images and relion_tomo_reconstruct_particle (see command below) it outputs merged.mrc as an empty volume.
    relion_tomo_reconstruct_particle --i reextracted_bin8_2D_optimisation_set.star --theme classic --o Reconstruct/job001/ --b 40 --bin 8 --j 1 --j_out 1 --j_in 1 --sym C1
Jesse |
There was an issue with Euler angle conversion in WarpTools. It has been fixed with 2.0.0dev26 and I at least now get the same angles with RELION5 and WarpTools (warpem/warp#227). |
I still get the error though, with the right Euler angles. Does anyone have any ideas about a smart way to check which tomogram the offending particle is from? (I have >350 tomograms, and I don't want to run too many tests.) |
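One generic way to limit the number of test runs is to bisect over tomograms rather than test them one at a time. The sketch below is an assumption-laden illustration, not something taken from RELION or Warp: `crashes` is a hypothetical callable standing in for "run the job on the particles of these tomograms and report whether it segfaults", and the search assumes that a subset crashes whenever it contains a bad tomogram. It returns one offending tomogram per search, so it would need repeating if several are bad.

```python
from typing import Callable, List, Optional, Sequence


def find_failing_group(
    groups: Sequence[str],
    crashes: Callable[[List[str]], bool],
) -> Optional[str]:
    """Bisect over group names and return one group that triggers a crash."""
    candidates = list(groups)
    if not crashes(candidates):
        return None  # Nothing in this set reproduces the crash.
    while len(candidates) > 1:
        half = len(candidates) // 2
        left = candidates[:half]
        # If the left half runs cleanly, a bad group must be in the right half.
        candidates = left if crashes(left) else candidates[half:]
    return candidates[0]


# Simulated example: 350 tomogram names, one of which is marked as bad.
tomograms = [f"tomo_{i:03d}" for i in range(350)]
bad = {"tomo_217"}
runs = []


def simulated_crash(subset: List[str]) -> bool:
    runs.append(len(subset))  # Record each simulated test run.
    return any(name in bad for name in subset)


culprit = find_failing_group(tomograms, simulated_crash)
print(culprit, "found after", len(runs), "test runs")
```

The number of runs grows with the logarithm of the number of tomograms, so a few hundred tomograms need on the order of ten runs instead of hundreds.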
@rahelwoldeyes @DrJesseHansen @KrisJanssen I ran a script removing all particles that are found on only 0 or 1 tilts, and afterwards it worked for me. See also: warpem/warp#243 |
WOW This worked! Thank you! Amazing.... can I ask, how did you figure this out? |
I think _rlnTomoVisibleFrames means in which of your tilts is the particle seen. How I figured it out was a bit of a coincidence... I was just looking through the star file and saw that I had a particle (luckily in the second tomogram from the top) that had all zeroes in the _rlnTomoVisibleFrames list, whereas most particles had all ones, and thought this might be the issue. It worked for my first star file, but not for the second. So there I decided to also remove the particles only seen on one tilt, and that seems to do the trick. |
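For anyone who wants to apply the same filter, here is a minimal sketch of the rule described above (drop particles visible on only 0 or 1 tilts). It is not the script that was actually used. It assumes the `_rlnTomoVisibleFrames` value is a bracketed, comma-separated list of 0/1 flags such as `[1,1,0,1]`, and that the particle rows have already been parsed into dictionaries keyed by column name; the helper names and the example rows are made up for illustration.

```python
from typing import Dict, Iterable, List


def count_visible_tilts(visible_frames: str) -> int:
    """Count the entries equal to 1 in a value such as "[1,0,1,1]"."""
    entries = visible_frames.strip().strip("[]").split(",")
    return sum(1 for entry in entries if entry.strip() == "1")


def drop_sparse_particles(
    rows: Iterable[Dict[str, str]],
    min_tilts: int = 2,
    column: str = "rlnTomoVisibleFrames",  # assumed key, without the leading underscore
) -> List[Dict[str, str]]:
    """Keep only the rows whose particle is visible on at least min_tilts tilts."""
    return [row for row in rows if count_visible_tilts(row[column]) >= min_tilts]


# Made-up rows standing in for parsed particle STAR entries.
rows = [
    {"rlnTomoName": "tomo_001", "rlnTomoVisibleFrames": "[1,1,1]"},
    {"rlnTomoName": "tomo_002", "rlnTomoVisibleFrames": "[0,0,0]"},
    {"rlnTomoName": "tomo_002", "rlnTomoVisibleFrames": "[0,1,0]"},
]
kept = drop_sparse_particles(rows)
print(f"kept {len(kept)} of {len(rows)} particles")
```

Listing the tomogram names of the dropped rows is also a quick way to see which tomograms contributed the problematic particles.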
Wow, thank you @rkjensen and everyone! It works!! |
hi all, just want to update here. This does seem to allow the job to run, but the result is still suboptimal and I can't figure out why. When I extract 3D volumes rather than 2D I get the following outcome: I can do a relion_reconstruct and it gives a very reasonable looking volume, suggesting the data are fine. When I try to refine these 3D subvolumes I get an error I have not seen in 5 years #582. If I instead try extracting with 2D images I get the following outcome: relion_reconstruct_particle gives an empty volume, already not a good sign. When I do a 3D refine I see my map getting progressively worse: the reference turns fainter with each iteration, then after iteration 5 it's an empty volume. any thoughts? best Jesse |
update: I figured it out. When I run the 2D particles with a refine job it always gives the weird result where the density eventually disappears, but if I run a 3D classification with 1 class then it works. Not sure why this happens, but it did the trick. Now I can use the output particles and volume from 3D classification to continue with subsequent steps. |
Describe your problem
I created a docker image to benchmark running relion 4.0.1-commit-ex417f on multiple hosts in our organization.
For the benchmark, I use a standard dataset: ftp://ftp.mrc-lmb.cam.ac.uk/pub/scheres/relion_benchmark.tar.gz
The dockerfile is here: https://gist.github.com/KrisJanssen/7ff75ad91926e46daa767d71c48f7ced
So far, the resulting container ran fine on any system I threw it at, whether on-premises or on some of our Azure VMs.
Today, I wanted to test the same image and job on a new on-premise system, ultimately resulting in a segmentation fault.
Environment:
Dataset:
Job options:
Error message:
Please cite the full error message as the example below.
Starting the job:
Then finally, it all goes pear-shaped: