RuntimeError: CUDA driver error: invalid argument - Google Container Registry (GCR) #374

Open
AClab-sgarcia opened this issue Jul 16, 2024 · 0 comments

Hello,

I am also trying to run CellBender in a Singularity container on our HPC cluster, which uses the Slurm scheduler.

My job script looks like this:

export PATH=/opt/ohpc/pub/libs/singularity/3.7.1/bin:$PATH
export SINGULARITY_CACHEDIR=/fastdata/GPArkaitz_fastdata/sgarcia/singularity_cache
export SINGULARITY_TMPDIR=/fastdata/GPArkaitz_fastdata/sgarcia/singularity_tmp
export PATH=/opt/ohpc/pub/apps/cuda/cuda-11.4/bin:$PATH
export LD_LIBRARY_PATH=/opt/ohpc/pub/apps/cuda/cuda-11.4/lib64:$LD_LIBRARY_PATH
singularity exec --nv \
-B /vols/GPArkaitz_bigdata/sgarcia/sc_AC76/CellRangerCount:/data \
-B /vols/GPArkaitz_bigdata/sgarcia/sc_AC76_n9/CellBender:/output \
/fastdata/GPArkaitz_fastdata/sgarcia/cellbender_latest.sif \
cellbender remove-background --cuda \
--input /data/L01/outs/raw_feature_bc_matrix.h5 \
--output /output/L01_CellBender.h5 \
--debug

At first it appears to work, but then I get the following error:
RuntimeError: CUDA driver error: invalid argument

Here is the full output:

cellbender:remove-background: Command:
cellbender remove-background --cuda --input /data/L01/outs/raw_feature_bc_matrix.h5 --output /output/L01_CellBender.h5 --checkpoint /checkpoints/L01_ckpt.tar.gz --debug
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash db0e1542cb)
cellbender:remove-background: 2024-07-15 17:10:37
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from /data/L01/outs/raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Features in dataset: 36601 Gene Expression
cellbender:remove-background: Trimming features for inference.
cellbender:remove-background: 30327 features have nonzero counts.
cellbender:remove-background: Computing priors from the UMI curve
cellbender:remove-background: Beginning priors.get_cell_count_empty_count()
cellbender:remove-background: cell_count_low_limit is 2388.781800161101
cellbender:remove-background: cutoff = 238.8781800161101
cellbender:remove-background: new_empty_count_prior = 7.315533762309573
cellbender:remove-background: delta = 107.31553376230957
cellbender:remove-background: cutoff = 147.82204597502562
cellbender:remove-background: new_empty_count_prior = 7.315533762309573
cellbender:remove-background: delta = 0.0
cellbender:remove-background: cell_count_prior is 873.0
cellbender:remove-background: empty_count_prior is 7.315533762309573
cellbender:remove-background: End of priors.get_cell_count_empty_count()
cellbender:remove-background: In get_expected_cells_and_total_droplets(), found transition point at droplet 85002
cellbender:remove-background: Automatically computed priors: {'cell_counts': 873.0, 'empty_counts': 7.315533762309573, 'empty_count_upper_limit': 10.91349394304199, 'expected_cells': 42429, 'total_droplets': 93493, 'transition_point': 85002}
cellbender:remove-background: Heuristics for estimating priors resulted in 93493 total_droplets, which is typically too large. Recomputing with low_count_threshold = 11
cellbender:remove-background: Beginning priors.get_cell_count_empty_count()
cellbender:remove-background: cell_count_low_limit is 2925.0363556132925
cellbender:remove-background: cutoff = 292.50363556132925
cellbender:remove-background: new_empty_count_prior = 12.06127612044471
cellbender:remove-background: delta = 112.06127612044472
cellbender:remove-background: cutoff = 209.00070760755057
cellbender:remove-background: new_empty_count_prior = 12.06127612044471
cellbender:remove-background: delta = 0.0
cellbender:remove-background: cell_count_prior is 873.0
cellbender:remove-background: empty_count_prior is 12.06127612044471
cellbender:remove-background: End of priors.get_cell_count_empty_count()
cellbender:remove-background: In get_expected_cells_and_total_droplets(), found transition point at droplet 84927
cellbender:remove-background: Automatically computed priors: {'cell_counts': 873.0, 'empty_counts': 12.06127612044471, 'empty_count_upper_limit': 16.281019801788414, 'expected_cells': 42429, 'total_droplets': 85750, 'transition_point': 84927}
cellbender:remove-background: Heuristics for estimating priors resulted in 85750 total_droplets, which is typically too large. Recomputing with low_count_threshold = 16
cellbender:remove-background: Beginning priors.get_cell_count_empty_count()
cellbender:remove-background: cell_count_low_limit is 2952.711420396693
cellbender:remove-background: cutoff = 295.2711420396693
cellbender:remove-background: new_empty_count_prior = 21.97707797576339
cellbender:remove-background: delta = 121.9770779757634
cellbender:remove-background: cutoff = 282.8652861612204
cellbender:remove-background: new_empty_count_prior = 21.97707797576339
cellbender:remove-background: delta = 0.0
cellbender:remove-background: cell_count_prior is 875.0
cellbender:remove-background: empty_count_prior is 21.97707797576339
cellbender:remove-background: End of priors.get_cell_count_empty_count()
cellbender:remove-background: In get_expected_cells_and_total_droplets(), found transition point at droplet 84836
cellbender:remove-background: Automatically computed priors: {'cell_counts': 875.0, 'empty_counts': 21.97707797576339, 'empty_count_upper_limit': 267.735619713646, 'expected_cells': 42214, 'total_droplets': 85010, 'transition_point': 84836}
cellbender:remove-background: Heuristics for estimating priors resulted in 85010 total_droplets, which is typically too large. Recomputing with low_count_threshold = 268
cellbender:remove-background: Beginning priors.get_cell_count_empty_count()
cellbender:remove-background: cell_count_low_limit is 2962.578753906682
cellbender:remove-background: cutoff = 323
cellbender:remove-background: new_empty_count_prior = 295.8936206404833
cellbender:remove-background: delta = 395.8936206404833
cellbender:remove-background: cutoff = 1039.4896382710363
cellbender:remove-background: new_empty_count_prior = 982.4014172182525
cellbender:remove-background: delta = 686.5077965777692
cellbender:remove-background: cutoff = 1894.0736127047805
cellbender:remove-background: new_empty_count_prior = 1619.706112933681
cellbender:remove-background: delta = 637.3046957154285
cellbender:remove-background: cutoff = 2432.0386597905094
cellbender:remove-background: new_empty_count_prior = 1790.0520918436544
cellbender:remove-background: delta = 170.34597890997338
cellbender:remove-background: cutoff = 2556.731948306845
cellbender:remove-background: new_empty_count_prior = 1790.0520918436544
cellbender:remove-background: delta = 0.0
cellbender:remove-background: Heuristics for determining empty counts exceeded 5 iterations without converging
cellbender:remove-background: cell_count_prior is 5630.0
cellbender:remove-background: empty_count_prior is 1790.0520918436544
cellbender:remove-background: End of priors.get_cell_count_empty_count()
cellbender:remove-background: In get_expected_cells_and_total_droplets(), found transition point at droplet 2594
cellbender:remove-background: Automatically computed priors: {'cell_counts': 5630.0, 'empty_counts': 1790.0520918436544, 'empty_count_upper_limit': 2670.4439206767793, 'expected_cells': 1172, 'total_droplets': 4993, 'transition_point': 2594}
cellbender:remove-background: Prior on counts for cells is 5630
cellbender:remove-background: Prior on counts for empty droplets is 1790
cellbender:remove-background: Priors:
cell_counts: 5630.0
empty_counts: 1790.0520918436544
empty_count_upper_limit: 2670.4439206767793
expected_cells: 1172
total_droplets: 4993
transition_point: 2594
log_counts_crossover: 8.16284669493511
surely_empty_counts: 2186
d_std: 0.09623685530602859
d_empty_std: 0.01
cellbender:remove-background: Excluding 4314 features that are estimated to have <= 0.1 background counts in cells.
cellbender:remove-background: Including 26013 features in the analysis.
cellbender:remove-background: Trimming barcodes for inference.
cellbender:remove-background: Excluding barcodes with counts below 895
cellbender:remove-background: Using 1172 probable cell barcodes, plus an additional 3821 barcodes, and 34971 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 2186 UMI counts.
cellbender:remove-background: Priors:
cell_counts: 5630.0
empty_counts: 1790.0520918436544
empty_count_upper_limit: 2670.4439206767793
expected_cells: 1172
total_droplets: 4993
transition_point: 2594
log_counts_crossover: 8.16284669493511
surely_empty_counts: 2186
d_std: 0.09623685530602859
d_empty_std: 0.01
cell_logit: -1.4644811865116152
chi_ambient: tensor([2.5101e-07, 2.5101e-07, 2.2819e-08,  ..., 9.1276e-08, 4.5638e-08,
        7.9867e-07])
chi_bar: tensor([3.1331e-07, 1.9582e-07, 5.8745e-08,  ..., 6.8536e-08, 4.8954e-08,
        8.5180e-07])
cellbender:remove-background: Attempting to load checkpoint from /checkpoints/L01_ckpt.tar.gz
cellbender:remove-background: Attempting to unpack tarball "/checkpoints/L01_ckpt.tar.gz" to /tmp/tmpo7djzxhg
cellbender:remove-background: No saved checkpoint.
cellbender:remove-background: No tarball found
cellbender:remove-background: No checkpoint loaded.
cellbender:remove-background: Running inference...
cellbender:remove-background: 
Volatile GPU utilization: 3 %
GPU memory reserved: 0.276824064 GB
GPU memory allocated: 0.2166016 GB
Avg CPU load over past minute: 27.1 %
RAM in use: 62.8G (14.4 %)
Traceback (most recent call last):
  File "/opt/conda/bin/cellbender", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/cellbender/base_cli.py", line 123, in main
    cli_dict[args.tool].run(args)
  File "/opt/conda/lib/python3.7/site-packages/cellbender/remove_background/cli.py", line 185, in run
    return main(args)
  File "/opt/conda/lib/python3.7/site-packages/cellbender/remove_background/cli.py", line 230, in main
    posterior = run_remove_background(args)
  File "/opt/conda/lib/python3.7/site-packages/cellbender/remove_background/run.py", line 95, in run_remove_background
    inferred_model, _, _, _ = run_inference(dataset_obj=dataset_obj, args=args)
  File "/opt/conda/lib/python3.7/site-packages/cellbender/remove_background/run.py", line 750, in run_inference
    final_elbo_fail_fraction=args.final_elbo_fail_fraction)
  File "/opt/conda/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/opt/conda/lib/python3.7/site-packages/cellbender/remove_background/train.py", line 178, in run_training
    total_epoch_loss_train = train_epoch(svi, train_loader)
  File "/opt/conda/lib/python3.7/site-packages/cellbender/remove_background/train.py", line 60, in train_epoch
    epoch_loss += svi.step(x_cell_batch)
  File "/opt/conda/lib/python3.7/site-packages/pyro/infer/svi.py", line 145, in step
    loss = self.loss_and_grads(self.model, self.guide, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/pyro/infer/traceenum_elbo.py", line 451, in loss_and_grads
    for model_trace, guide_trace in self._get_traces(model, guide, args, kwargs):
  File "/opt/conda/lib/python3.7/site-packages/pyro/infer/traceenum_elbo.py", line 394, in _get_traces
    yield self._get_trace(model, guide, args, kwargs)
  File "/opt/conda/lib/python3.7/site-packages/pyro/infer/traceenum_elbo.py", line 340, in _get_trace
    "flat", self.max_plate_nesting, model, guide, args, kwargs
  File "/opt/conda/lib/python3.7/site-packages/pyro/infer/enum.py", line 75, in get_importance_trace
    model_trace.compute_log_prob()
  File "/opt/conda/lib/python3.7/site-packages/pyro/poutine/trace_struct.py", line 231, in compute_log_prob
    site["value"], *site["args"], **site["kwargs"]
  File "/opt/conda/lib/python3.7/site-packages/torch/distributions/gamma.py", line 77, in log_prob
    self.rate * value - torch.lgamma(self.concentration))
RuntimeError: CUDA driver error: invalid argument
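
The traceback ends in `torch.lgamma`, called from `torch/distributions/gamma.py`, so one way to narrow this down may be to call that operation directly on the GPU, outside CellBender. The following is only a minimal sketch and was not part of the original run: the function name `check_cuda_lgamma` and the report keys are made up for illustration, and it assumes the `python` inside the container is the same interpreter that CellBender uses.

```python
"""Minimal GPU sanity check; intended to be run inside the container with --nv."""


def check_cuda_lgamma():
    """Collect PyTorch/CUDA details and try the op that failed in the traceback.

    The function name and the report keys are illustrative, not a CellBender API.
    """
    report = {"torch_available": False}
    try:
        import torch
    except ImportError:
        return report  # PyTorch is not importable in this environment

    report["torch_available"] = True
    report["torch_version"] = torch.__version__
    report["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if not report["cuda_available"]:
        return report

    try:
        report["device_name"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
        # Same operation that raised inside Gamma.log_prob in the traceback.
        x = torch.tensor([0.5, 1.0, 2.5], device="cuda")
        report["lgamma_result"] = torch.lgamma(x).cpu().tolist()
        report["lgamma_ok"] = True
    except RuntimeError as err:
        report["lgamma_ok"] = False
        report["error"] = str(err)
    return report


if __name__ == "__main__":
    for key, value in check_cuda_lgamma().items():
        print("{}: {}".format(key, value))
```

Saved under a hypothetical name such as `check_gpu.py` in a bound directory, it could be run with the same `singularity exec --nv` prefix as above, replacing the `cellbender` command with `python /output/check_gpu.py`. If `lgamma_ok` comes back `False` with the same message, that would suggest the problem lies between the container's PyTorch/CUDA build and the host driver or GPU rather than in CellBender itself.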

After reading all the posted issues, I have not been able to solve the problem. Any chance you could help me, @edg1983, @sjfleming?
Thank you very much in advance.

Originally posted by @AClab-sgarcia in #127 (comment)
