
Memory allocation error during simulation_training_ZINB function execution #22

Open
newbieMars opened this issue Oct 3, 2024 · 2 comments


newbieMars commented Oct 3, 2024

Hello,

Thank you again for your excellent package! I have encountered an issue when running the simulation_training_ZINB function, and I would like to bring it to your attention.

Context:
I am executing the function with the following input dimensions:

  • sc_count: 17409 x 2175
  • sc_cluster: 2175 x 2
  • spatial_count: 377 x 48521
  • spatial_cluster: 48521 x 2
  • overlap_gene: 283
  • unique_cluster_label: 7

The function is called as follows:

simulation.params = simulation_training_ZINB(
    sc_count = sc_count,
    spatial_count = spatial_count,
    overlap_gene = overlap_gene,
    unique_cluster_label = unique_cluster_label,
    sc_cluster = sc_cluster,
    spatial_cluster = spatial_cluster,
    outputpath = outputpath,
    optimizer = "variational_inference",
    mcmc.check = FALSE,
    num.iter = 1000,
    num.chain = 4,
    num.core = 32,
    saveplot = FALSE
)

Output and Error:
The function begins running, and the following messages are displayed during execution:

[1] "Prepare data"
[1] "Start model fitting"
Chain 1: EXPERIMENTAL ALGORITHM...
...
Chain 1: Iteration: 1000 -4621088.399 0.002 0.003
Chain 1: Informational Message: The maximum number of iterations is reached! The algorithm may not have converged.
Chain 1: Drawing a sample of size 1000 from the approximate posterior...
Chain 1: COMPLETED.

However, shortly after this, the following error occurs:

Error in scan(csvfile, what = character(), sep = "\n", comment.char = "", :
  could not allocate memory (2048 Mb) in C function 'R_AllocStringBuffer'

Issue:
It seems the function runs out of memory when trying to allocate a 2048 Mb string buffer. I suspect this might be related to the large dataset size or the number of iterations specified. I would appreciate any guidance on whether this is an expected limitation, or if there are steps I could take to resolve it (e.g., adjusting parameters or environment settings).
I have already tried launching my script with an unlimited stack size on an HPC (inside a Singularity container), without success:

ulimit -s unlimited && Rscript 03_Script/17_gpsFISH_platform_effect_estimation/launch_reports_compilation.R
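
Note that ulimit -s only raises the stack limit, whereas R_AllocStringBuffer allocates from the heap, so this setting probably does not apply to the failure above. A generic way to check the limits the R process actually sees inside the container:

# Print the resource limits visible to the R process; the relevant entries
# are the memory/address-space limits, not 'stack size'.
system("ulimit -a")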

Could it be linked to temporary file writing by the rstan package?
By the way, I am able to execute the tutorial without any error on my laptop (in a Docker container); the problem only occurs on my data with more iterations and cores.
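
For reference, here is a minimal standalone sketch of the rstan mechanism I suspect is involved; the toy model and file path are my own and this is not gpsFISH's internal call. rstan::vb writes its approximate-posterior draws to a CSV and later reads them back:

library(rstan)

# Hypothetical toy model, only to show the CSV mechanics.
toy_code <- "
data { int<lower=1> N; vector[N] y; }
parameters { real mu; real<lower=0> sigma; }
model { y ~ normal(mu, sigma); }
"
toy_model <- stan_model(model_code = toy_code)

draws_csv <- file.path(tempdir(), "vb_draws.csv")
fit <- vb(
    toy_model,
    data = list(N = 100, y = rnorm(100)),
    iter = 1000,             # maximum ADVI iterations
    output_samples = 1000,   # draws from the approximate posterior
    sample_file = draws_csv  # one CSV row per draw, one column per parameter
)
file.size(draws_csv)

Since each CSV row holds every model parameter for one draw, row width grows with the parameter count, which for a model over 48,521 spatial cells can be enormous. scan(..., sep = "\n") reads one full row into a single string, so a row over 2 GB would be consistent with the R_AllocStringBuffer error above.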

Environment:

  • R version: 4.3.2 (2023-10-31)
  • Operating System: Ubuntu 22.04.1 LTS
  • Machine: x86_64
  • Memory available: 128 GB
  • Swap: 128 GB
  • nCPUs: 80

Matrix products: default

  • BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
  • LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0

Attached packages:

  • gpsFISH: v0.1.0
  • rstan: v2.32.5
  • StanHeaders: v2.32.5
  • deming: v1.4
  • viridis: v0.6.5
  • viridisLite: v0.4.2
  • ggpointdensity: v0.1.0
  • bayesplot: v1.11.1
  • boot: v1.3-28.1
  • ggplot2: v3.5.0

Thank you very much for your help, and please let me know if you need any additional information!

newbieMars (Author) commented

While the simulation_training_ZINB function is running, different kinds of temporary files are generated:

  • .rds
  • .stan
  • .csv
  • .cpp

Among these files, the CSV one has reached a size of 231 GB after 24 hours of computation and continues to grow.
Could this be the cause of the error message? Is the size of my spatial count object the problem? Should I down-sample it?
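
A careful way to peek at the CSV without reading whole lines (very long lines are exactly what makes scan() fail), assuming csv_path points at the temporary file:

# Read only the first 4 kB of the file; Stan CSVs start with '#' comment
# lines followed by a header row of parameter names.
csv_path <- "path/to/temporary.csv"  # placeholder path (assumption)
con <- file(csv_path, "rb")
cat(rawToChar(readBin(con, "raw", n = 4096)))
close(con)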

YidaZhang0628 (Collaborator) commented

Hi,

I apologize for the delay in getting back to you. I have been traveling in the past few weeks.

The size of the data you have is not huge. What seems fishy is the large CSV file. I don't remember seeing large CSV files when running simulation_training_ZINB. There are a few things we can check:

  1. When you run the code in the tutorial, do you see similar CSV files being generated? What is the size of that?
  2. What is the content of the CSV file?
  3. If you reduce the size of the data by subsetting the genes in the scRNA-seq data and the cells in the spatial transcriptomic data, do you still have the same problem? (See the sketch below this list.)
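
A minimal sketch of the subsetting in point 3, reusing the object names from the original post; the assumption that the first column of each cluster table holds cell IDs matching the count-matrix column names should be checked against the actual objects:

set.seed(1)

# Keep a random subset of spatial cells (e.g. 5,000 of 48,521) and the
# matching rows of the cluster table.
keep_cells <- sample(colnames(spatial_count), 5000)
spatial_count_sub   <- spatial_count[, keep_cells]
spatial_cluster_sub <- spatial_cluster[spatial_cluster[, 1] %in% keep_cells, ]

# Restrict the scRNA-seq matrix to the 283 overlapping genes.
sc_count_sub <- sc_count[rownames(sc_count) %in% overlap_gene, ]

If the error disappears on the subset, that would point to the number of model parameters (which scales with the number of cells) rather than the raw data size itself.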
