Similarity values are not between 0 and 1 #28

hayfre · 2021-05-25T10:02:32Z

Hi there, I have been using scmap cell2cluster to annotate both human and mouse datasets. The cell type annotation results we get seem to make sense, but the similarity values are not in the expected range of 0 to 1. This seems to be a bug in scmap-cell.
When running a test in which the reference dataset cells are split into test and training data, the values are in the correct range for all 3 settings (cluster, cell, cell2cluster). However, when we apply it to our own query data, the problem occurs with cell (but not cluster) and then propagates to cell2cluster. We have seen this issue with 2 different datasets using 3 different reference datasets.
I would appreciate your help in addressing this issue!

For reference:
scmap version 1.8.0
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

sagnikbanerjee15 · 2021-07-06T12:25:34Z

Hello,

I am facing the exact same problem. @hayfre, please let me know if you have been able to solve it.

Thanks.

hayfre · 2021-07-06T13:39:08Z

Hi @sagnikbanerjee15,
No, I have unfortunately not had time to look into this further.

sagnikbanerjee15 · 2021-07-07T10:55:41Z

Hi @hayfre,

I think I have figured out the error. The tool does not seem to have a bug, but I found an inconsistency in the gene names of my training data. For some reason, the genes in one of the reference datasets were named with a concatenated string of the gene_id and the gene_name. The similarity scores were greater than 1 for this particular dataset. When I intentionally projected the same dataset onto itself, it returned values between 0 and 1.
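A minimal Python sketch of the kind of check that would catch this, assuming the concatenated names look like an Ensembl-style ID followed by `_`, `|` or `-` and then the gene symbol. The regex and the function names `strip_gene_ids` and `overlap_fraction` are hypothetical and not part of scmap; adjust the pattern to whatever your reference actually uses.

```python
import re

# Hypothetical naming scheme: Ensembl-style ID (optionally versioned),
# a separator, then the symbol, e.g. "ENSG00000141510_TP53".
CONCAT_PATTERN = re.compile(r"^(ENS[A-Z]*G\d+(?:\.\d+)?)[_|\-](.+)$")


def strip_gene_ids(names):
    """Return gene symbols, dropping the ID prefix where one is present."""
    cleaned = []
    for name in names:
        match = CONCAT_PATTERN.match(name)
        # Keep the symbol part if the name matches, otherwise leave it as-is.
        cleaned.append(match.group(2) if match else name)
    return cleaned


def overlap_fraction(ref_genes, query_genes):
    """Fraction of query gene names that also appear in the reference."""
    query = set(query_genes)
    if not query:
        return 0.0
    return len(query & set(ref_genes)) / len(query)
```

For example, `strip_gene_ids(["ENSG00000141510_TP53", "GAPDH"])` returns `["TP53", "GAPDH"]`. A very low `overlap_fraction` between reference and query names before any projection is a cheap warning sign that the two datasets use different naming conventions.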

Thank you.

LisaBast · 2021-07-07T15:07:00Z

Thanks for the hint about inconsistent gene names, @sagnikbanerjee15. I had been trying to solve this for some time and could finally get rid of values outside the [0,1] interval. In my case, the gene naming convention was the same in both sce objects, but the query data contained some genes that were not in the reference, and vice versa. By making sure that the reference and query sce objects contain only the genes present in both datasets, I was able to solve it.
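A minimal Python sketch of the "keep only shared genes" step, assuming each dataset is a genes × cells matrix stored as a list of rows aligned with a list of gene names. The function name `restrict_to_shared_genes` is hypothetical and not part of scmap. In R with SingleCellExperiment objects, the same idea is typically `common <- intersect(rownames(ref), rownames(query))` followed by subsetting both objects with `[common, ]`.

```python
def restrict_to_shared_genes(ref_genes, ref_rows, query_genes, query_rows):
    """Keep only genes present in both datasets, in one shared order.

    ref_rows / query_rows are genes x cells matrices stored as lists of
    rows, aligned with ref_genes / query_genes respectively.
    """
    if len(ref_genes) != len(ref_rows) or len(query_genes) != len(query_rows):
        raise ValueError("gene names and matrix rows are misaligned")
    if len(set(ref_genes)) != len(ref_genes) or len(set(query_genes)) != len(query_genes):
        raise ValueError("duplicate gene names; deduplicate before intersecting")

    ref_index = {gene: i for i, gene in enumerate(ref_genes)}
    query_index = {gene: i for i, gene in enumerate(query_genes)}

    # Follow the reference order so both outputs line up row by row.
    shared = [gene for gene in ref_genes if gene in query_index]
    ref_out = [ref_rows[ref_index[gene]] for gene in shared]
    query_out = [query_rows[query_index[gene]] for gene in shared]
    return shared, ref_out, query_out
```

With reference genes `["TP53", "GAPDH", "ACTB"]` and query genes `["ACTB", "TP53", "MYC"]`, the shared list is `["TP53", "ACTB"]`, and both returned matrices have their rows in that same order, so every feature index refers to the same gene in reference and query.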
