Similarity values are not between 0 and 1 #28

Open
hayfre opened this issue May 25, 2021 · 4 comments

Comments


hayfre commented May 25, 2021

Hi there, I have been using scmap cell2cluster to annotate both human and mouse data sets. The cell type annotation results that we get seem to make sense but the similarity values are not in the expected range of 0 to 1. This seems to be a bug in scmap-cell.
When running a test where the reference dataset cells are split into test and train data, the values are in the correct range for all 3 settings (cluster, cell, cell2cluster). However, when applying our own query data the problem occurs with cell (but not cluster) and is then propagated to cell2cluster. We have experienced this issue with 2 unique datasets using 3 different reference datasets.
I would appreciate your help to address this issue!

For reference:
scmap version 1.8.0
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

@sagnikbanerjee15

Hello,

I am facing the exact same problem. @hayfre please let me know if you have been able to solve it.

Thanks.


hayfre commented Jul 6, 2021

Hi @sagnikbanerjee15,
No, I have unfortunately not had time to look into this further.

@sagnikbanerjee15

Hi @hayfre,

I think I have figured out the error. The tool does not seem to have a bug; instead, I found an inconsistency in the gene names of my training data. For some reason, the genes in one of the reference datasets were denoted as a concatenation of the gene_id and the gene_name. The similarity scores were greater than 1 for this particular dataset. When I intentionally projected the same dataset onto itself, it returned values between 0 and 1.

Thank you.
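
A minimal sketch of how such a naming mismatch could be detected and repaired before projection. It is plain Python and does not use scmap. The `_` separator, the `ENS` prefix check, and the helper name `split_concatenated_names` are assumptions made for illustration, because the thread does not state the exact concatenation format.

```python
def split_concatenated_names(names, sep="_"):
    """Strip a leading Ensembl-style ID from names such as 'ENSG..._TP53'.

    Assumption: the ID and the symbol are joined by `sep`, and IDs start
    with 'ENS'. Names that do not match this pattern are returned unchanged.
    """
    cleaned = []
    for name in names:
        gene_id, found, symbol = name.partition(sep)
        if found and gene_id.startswith("ENS") and symbol:
            cleaned.append(symbol)
        else:
            cleaned.append(name)
    return cleaned


# Toy gene lists: the reference mixes conventions, the query uses symbols only.
reference_genes = ["ENSG00000141510_TP53", "ENSG00000146648_EGFR", "GAPDH"]
query_genes = ["TP53", "EGFR", "GAPDH", "ACTB"]

# Count shared names before and after harmonising the reference names.
raw_overlap = set(reference_genes) & set(query_genes)
fixed_overlap = set(split_concatenated_names(reference_genes)) & set(query_genes)
print(len(raw_overlap), len(fixed_overlap))
```

A small overlap before harmonising and a much larger one afterwards suggests that the two datasets use different naming conventions.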


LisaBast commented Jul 7, 2021

Thanks for the hint about inconsistent gene names, @sagnikbanerjee15. I had been trying to solve this for some time and could finally get rid of the values outside the [0,1] interval. In my case, the gene naming convention did not differ between the two sce objects, but the query data contained some genes that were not in the reference, and vice versa. I solved it by making sure that the sce objects for reference and query contain only the genes present in both data sets.
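
The gene-set restriction described above can be sketched as follows. This is plain Python on toy data, not scmap or SingleCellExperiment code. The function name `restrict_to_shared_genes` and the row-per-gene list layout are hypothetical, and the thread does not say which identifier column scmap matches on.

```python
def restrict_to_shared_genes(ref_genes, ref_rows, query_genes, query_rows):
    """Keep only genes present in both datasets, in the same order in each.

    Each `*_rows` list holds one expression row per gene, aligned with the
    corresponding `*_genes` list. Gene names are assumed to be unique.
    """
    shared = sorted(set(ref_genes) & set(query_genes))
    ref_index = {gene: i for i, gene in enumerate(ref_genes)}
    query_index = {gene: i for i, gene in enumerate(query_genes)}
    ref_sub = [ref_rows[ref_index[gene]] for gene in shared]
    query_sub = [query_rows[query_index[gene]] for gene in shared]
    return shared, ref_sub, query_sub


# Toy data: each dataset has one gene that the other lacks.
ref_genes = ["TP53", "EGFR", "GAPDH"]
ref_rows = [[1, 0], [2, 3], [5, 5]]
query_genes = ["GAPDH", "ACTB", "TP53"]
query_rows = [[4, 4, 4], [9, 9, 9], [0, 1, 0]]

shared, ref_sub, query_sub = restrict_to_shared_genes(
    ref_genes, ref_rows, query_genes, query_rows
)
print(shared)
```

With SingleCellExperiment objects in R, the analogous step would be to subset both objects to the intersection of their gene identifiers before building the index and projecting.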
