spatial_lda and recurrent cellular neighborhoods #71
I apologize for missing this issue. For some reason, I have not been receiving notifications for the issues raised in this channel. Does the issue still persist, and would you need help? Thank you.
Hello, yes, the issue still persists, and I think my first question is more important than the second one. Another issue I have come across concerns the coherence scores. For my dataset, which contains a few million cells and ~10 different cell types, spatial_lda gives almost identical coherence scores for different numbers of topics. I have tried various numbers of topics, ranging from 20 to 50, and all gave coherence scores that are identical up to the sixth decimal place. Do you have any insight on this? Does this mean that I am requesting too many topics?
@batukav, I will first address your query regarding coherence scores and respond to your earlier question subsequently. The coherence score remains constant in this application because the number of words (cell types) is limited (generally 5-15 cell types), contrasting with the standard implementation of Latent Dirichlet Allocation (LDA), where one might encounter millions of words. Due to this limitation, the current version of LDA in
Thank you for the explanation. I think I now understand the process better. Do you have any suggestions for picking the number of topics? If the analysis boils down to inspecting and merging the clusters, I suppose the overall results won't be very sensitive to the initial choice of the number of topics.
Dear Scimap developers,
Thank you very much for creating this great repo.
I would like to ask about defining recurrent cellular neighborhood (RCN) from histopathology data using spatial_lda method. Specifically, I'm trying to wrap my head around the spatial_lda method used in the publication The Spatial Landscape of Progression and Immunoediting in Primary Melanoma at Single-Cell Resolution by @ajitjohnson and his coworkers.
What I understand is that LDA is used to assign a distribution of "topics" to each cell. Then, to define the RCNs, these per-cell topic distributions (latent weights) are clustered using K-means. The resulting clusters are then manually grouped into "meta-clusters", which in turn correspond to the RCNs. Do I understand this approach correctly?
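To make the workflow as I understand it concrete, here is a minimal sketch in plain Python. It is not Scimap's implementation and does not use the Scimap API. The `topic_weights` values are made up, `kmeans` is a toy implementation written for this example, and the "merge clusters that share a dominant topic" rule is only a stand-in for the manual meta-clustering step.

```python
import random

def squared_euclidean(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=100, seed=0):
    """Toy Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical latent weights: one row per cell, one column per topic.
topic_weights = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.05, 0.90, 0.05],
]

labels, centroids = kmeans(topic_weights, k=2)

def dominant_topic(centroid):
    """Index of the topic with the largest mean weight in a cluster."""
    return max(range(len(centroid)), key=lambda t: centroid[t])

# Stand-in for manual meta-clustering: clusters sharing a dominant topic
# receive the same (hypothetical) RCN name.
meta_cluster = {
    c: f"RCN_topic{dominant_topic(centroids[c])}" for c in range(len(centroids))
}
rcn_per_cell = [meta_cluster[lab] for lab in labels]
print(rcn_per_cell)
```

With more clusters than in this toy case, several K-means clusters can map to the same RCN name, which is the merging step I am asking about.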
I am curious about two things: (1) whether it makes sense to use a different clustering algorithm, such as HDBSCAN (or UMAP + HDBSCAN), to group the latent weights, and (2) whether Euclidean distance is suitable for clustering the latent weights. Since the latent weights are probability distributions, would it make sense to use the Jensen-Shannon distance (or a similar measure) to cluster them instead? Did you experiment with any of these methods?
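For reference, this is what I mean by comparing the two distances on latent weights. It is a minimal sketch in plain Python: the three example cells are made up, and the Jensen-Shannon distance is the square root of the Jensen-Shannon divergence computed with base-2 logarithms, so it lies between 0 and 1.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2)."""
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative round-off

def euclidean_distance(p, q):
    """Ordinary Euclidean distance between two vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical topic distributions for three cells (each sums to 1).
cell_a = [0.98, 0.01, 0.01]
cell_b = [0.88, 0.11, 0.01]
cell_c = [0.50, 0.40, 0.10]

for name, other in [("a vs b", cell_b), ("a vs c", cell_c)]:
    print(
        name,
        "euclidean:", round(euclidean_distance(cell_a, other), 4),
        "jensen-shannon:", round(jensen_shannon_distance(cell_a, other), 4),
    )
```

If SciPy is available, `scipy.spatial.distance.jensenshannon` computes the same quantity. A pairwise matrix of such distances could then be passed to any clustering method that accepts precomputed distances.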
The ease of use that Scimap provides is very valuable, and I would like to use it for analyzing similar data. I hope this is the right place to discuss the above questions. Thank you.