--useMtx is ignored if too.big == TRUE in runSeurat.R #262
Comments
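Realizing now that the --forceMtx flag does not take a text argument, and rather is true if specified, false if not. In any case, it would be great to have an equivalent --forceTSV flag |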
Hi Rogan,
thanks for your feedback. This is intentional, but I'm interested in your
feedback, my knowledge of R (as you can see from my R code) is somewhat
limited.
Apparently you have a very big matrix, right?
The problem with big matrices in R is that if they exceed the maximum number
of elements (2^31-1), R stops when the sparse matrix is converted to a
normal matrix. Most of the elements are zero, so big matrices work as long as
the non-zero count is low enough and they are kept sparse. For that, writing
as MTX is required (as far as I know - please correct me here, see below).
I wrote writeSparseTsvChunks() to be able to *write* very big matrices, but
then R can't read them. This was the idea behind the move towards .mtx.gz
files everywhere, both for h5ad and Seurat objects: to make sure that R can
always *read* the matrices.
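To put rough numbers on the limit described above, here is a small Python sketch. It is not part of cellbrowser; the matrix dimensions and the 5% non-zero fraction are made-up illustration values, and the limit is simply the 2^31-1 figure quoted in this comment.

```python
# Illustrative arithmetic only; dimensions and density are hypothetical.
R_MAX_ELEMENTS = 2**31 - 1  # limit quoted in this thread


def dense_elements(n_genes: int, n_cells: int) -> int:
    """Number of elements a dense genes x cells matrix would hold."""
    return n_genes * n_cells


def exceeds_limit(n_elements: int) -> bool:
    """True if the element count is above the 2^31-1 limit."""
    return n_elements > R_MAX_ELEMENTS


n_genes, n_cells = 30_000, 100_000        # hypothetical dataset
dense = dense_elements(n_genes, n_cells)  # element count of the dense form
nonzero = dense * 5 // 100                # assume 5% of entries are non-zero

print(exceeds_limit(dense))    # is the dense form over the limit?
print(exceeds_limit(nonzero))  # are the non-zero entries alone over it?
```

With these assumed numbers the dense form is over the limit while the non-zero entries alone are far below it, which is the reason for keeping the matrix sparse end to end.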
The MTX format seemed clean enough and easy to read. (BTW: what don't you
like about .mtx.gz ?)
cbImportSeurat produces an .Rscript file that you can edit and run
manually, you could try it now to force the .tsv.gz file and try to read
the result in R - does that work for you? Or do you not care if others can
read the resulting .tsv.gz files with R?
Or maybe I'm missing something and it's not too hard to read gigantic
.tsv.gz files and there is some trick in R to convert them into sparse
matrices in pieces when reading them?
On Wed, Feb 8, 2023 at 7:21 AM Rogan Grant wrote:
Realizing now that the --forceMtx flag does not take a text argument, and
rather is true if specified, false if not. In any case, it would be great
to have an equivalent --forceTSV flag
|
And, just in case: I'm not opposed to changing this, I just would like to
understand if it makes sense in some context for R to write matrices that
it can write, but not read.
|
Thank you for the quick response! I have personally converted this matrix to non-sparse in R in the course of certain function calls without issue, but the documentation agrees with you. I honestly don't know how much of a risk this poses in terms of the function failing for others. In any case I have no issue with .mtx files, but I can't get them to work at all with cbBuild. The cellbrowser.conf file still points to a single tsv file that does not exist, and manually supplying each individual file does not seem to work (next it asks for a barcodes.tsv, which is ignored if I specify directly for each assay). My ultimate solution (which worked very well) was to run mtx2tsv on each assay before deployment. |
Hi Rogan,
hmm... I have a few questions, sorry:
> The cellbrowser.conf file still points to a single tsv file that does not
> exist,
Sorry, I don't understand: do you mean that the auto-generated
cellbrowser.conf file does not point to the .mtx.gz file? That's probably a
bug. How about changing that filename manually in cellbrowser.conf, doesn't
that work?
> and manually supplying each individual file does not seem to work
Sorry, I don't know what you mean... it's sufficient to provide the
matrix.mtx.gz file, cbBuild will find the other files.
> (next it asks for a barcodes.tsv, which is ignored if I specify directly
> for each assay)
Sorry, I don't understand this sentence.
|
Sorry, I should have waited to give more concrete examples. My object has three assays (counts, data, and scale). As far as I can tell cbBuild does not handle this correctly if a .mtx file is used. If I run cbBuild without any conversion, I initially get the following error:
Full trace:
The initial cellbrowser.conf file is as follows:
If I modify the cellbrowser.conf
I run into a new error, where it seems cbBuild does not recognize the additional assays:
Full trace:
Finally, if I add additional fields to specify the naming structure, it seems to be ignored (but perhaps I am using the wrong arguments):
Same error:
Thank you for your help with this |
First of all, thank you for this incredibly useful package. We get a lot of use out of it.
For large matrices (where too.big = TRUE), I've run into an issue where you can't force --useMtx to be TRUE. This is because the first line of this chunk will always read TRUE in runSeurat.R:
Would it be possible to allow the user to force a tsv instead, such as changing (use.mtx || too.big) to (use.mtx || (.Platform$OS.type=="windows" && too.big))? I ask largely because cbBuild consistently fails for me with .mtx files, and I can't figure out precisely how to configure the cellbrowser.conf to fix this issue.
Thank you!
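To make the effect of the proposed change concrete, the two conditions can be compared case by case. This is a minimal sketch in Python rather than R; the function names are hypothetical, and is_windows stands in for .Platform$OS.type=="windows".

```python
from itertools import product


def current_rule(use_mtx: bool, too_big: bool, is_windows: bool) -> bool:
    """Mirrors (use.mtx || too.big); is_windows is unused by this rule."""
    return use_mtx or too_big


def proposed_rule(use_mtx: bool, too_big: bool, is_windows: bool) -> bool:
    """Mirrors (use.mtx || (.Platform$OS.type=="windows" && too.big))."""
    return use_mtx or (is_windows and too_big)


# Every (use_mtx, too_big, is_windows) combination where the rules disagree.
differences = [
    combo
    for combo in product([False, True], repeat=3)
    if current_rule(*combo) != proposed_rule(*combo)
]
print(differences)
```

The two rules differ in exactly one case: no --useMtx, a too-big matrix, and a non-Windows platform. That is the case in which the proposed condition would fall back to writing a tsv.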