Build reference for consensus cell type labels #973
CL:0002038 T follicular helper cell CL:0000624 CD4+ T-cells CD4+ T-cells CL:0000624 CD4-positive, alpha-beta T cell
CL:0000893 thymocyte CL:0000624 CD4+ T-cells CD4+ T-cells CL:0000084 T cell
CL:0000798 gamma-delta T cell CL:0000624 CD4+ T-cells CD4+ T-cells CL:0000084 T cell
CL:0000814 NK lymphocyte CL:0000624 CD4+ T-cells CD4+ T-cells CL:0000791 mature alpha-beta T cell
I have a second here between meetings, so I will return this because it impacts more than one line.
I almost certainly missed this in an earlier review, but this seems like the wrong term for "NK lymphocyte." This seems like a better fit to me: http://purl.obolibrary.org/obo/CL_0000623
Ah it looks like the ID is correct, but the name for the ID is wrong in the original file where we assigned IDs. The term from Panglao is "Natural killer T cells" so that should be assigned to CL:0000814, but the name should be "mature NK T cell", not "NK lymphocytes". I'll fix that here.
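A mismatch like this one can be caught by comparing hand-assigned names against the names stored in the ontology for the same IDs. The following is a minimal sketch, assuming the assigned IDs and names live in a data frame; `assigned` and `ontology_names` are hypothetical names, and `ontology_names` stands in for the ID-to-name vector of a loaded ontology object (for example `cl_ont$name`).

```r
# Hand-assigned IDs and names (hypothetical stand-in for the original mapping file)
assigned <- data.frame(
  ontology_id = c("CL:0000814", "CL:0000084"),
  assigned_name = c("NK lymphocyte", "T cell"),
  stringsAsFactors = FALSE
)

# Names keyed by ID; in practice this would come from the loaded ontology
ontology_names <- c(
  "CL:0000814" = "mature NK T cell",
  "CL:0000084" = "T cell"
)

# Look up the ontology name for each assigned ID
assigned$ontology_name <- unname(ontology_names[assigned$ontology_id])

# Keep only rows where the hand-assigned name disagrees with the ontology
mismatches <- assigned[assigned$assigned_name != assigned$ontology_name, ]
print(mismatches)
```

Only the `CL:0000814` row is reported here, which matches the fix described in this comment.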
This looks good! I went through the resulting consensus labels, and they seem sound. We should find out if there's a way to get versioned OBO files before we wrap this up entirely.
# Prep references --------------------------------------------------

# grab obo file
cl_ont <- ontologyIndex::get_ontology("http://purl.obolibrary.org/obo/cl-basic.obo")
Is there a more versioned option here? I can imagine the results would change over time if we don't lock it down more.
Yes, I updated this to use a specific release.
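For readers following along, pinning the ontology to a release could look roughly like the sketch below. It is not the code from this PR: the URL pattern is an assumption based on the OBO convention for versioned PURLs, the release date is a placeholder, and `versioned_cl_url()` and `load_cl_ontology()` are hypothetical helper names.

```r
# Build a release-pinned URL for the Cell Ontology OBO file.
# ASSUMPTION: the URL pattern follows the OBO versioned PURL convention.
versioned_cl_url <- function(release_date) {
  # Require an ISO date (YYYY-MM-DD) so typos fail early
  stopifnot(grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", release_date))
  sprintf(
    "http://purl.obolibrary.org/obo/cl/releases/%s/cl-basic.obo",
    release_date
  )
}

# Load the pinned ontology; requires the ontologyIndex package and network access
load_cl_ontology <- function(release_date) {
  ontologyIndex::get_ontology(versioned_cl_url(release_date))
}

# Placeholder release date, for illustration only
cl_url <- versioned_cl_url("2024-01-04")
print(cl_url)
```

Keeping the release date in one place makes it easy to record alongside the reference table, so the consensus labels can be regenerated against the same ontology version.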

# get ontologies and human readable name into data frame
blueprint_df <- data.frame(
  blueprint_ontology = blueprint_ref$label.ont,
Mostly a question for my understanding: when we use label.ont, is that specific for/tied to label.main?
No, it's tied to label.fine. From the celldex vignette:
Typically, each reference provides three levels of cell type annotation in its column metadata:
- label.main, broad annotation that defines the major cell types. This has few unique levels that allows for fast annotation but at low resolution.
- label.fine, fine-grained annotation that defines subtypes or states. This has more unique levels that results in slower annotation but at much higher resolution.
- label.ont, fine-grained annotation mapped to the standard vocabulary in the Cell Ontology. This enables synchronization of labels across references as well as dynamic adjustment of the resolution.
I manually checked a few terms to be sure that the names match up with label.fine.
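The manual check could also be done programmatically by counting how many distinct ontology IDs appear under each label. This is a sketch on a toy data frame that mimics the column metadata of a celldex reference; the rows are illustrative rather than taken from Blueprint, and `ids_per_label()` is a hypothetical helper.

```r
# Toy stand-in for the column metadata of a celldex reference (values illustrative)
blueprint_coldata <- data.frame(
  label.main = c("CD4+ T-cells", "CD4+ T-cells", "B-cells", "B-cells"),
  label.fine = c("CD4+ T-cells", "CD4+ T-cells", "naive B-cells", "Memory B-cells"),
  label.ont = c("CL:0000624", "CL:0000624", "CL:0000788", "CL:0000787"),
  stringsAsFactors = FALSE
)

# Count the distinct ontology IDs observed within each level of a label column
ids_per_label <- function(labels, ont_ids) {
  vapply(
    split(ont_ids, labels),
    function(ids) length(unique(ids)),
    integer(1)
  )
}

ids_per_fine <- ids_per_label(blueprint_coldata$label.fine, blueprint_coldata$label.ont)
ids_per_main <- ids_per_label(blueprint_coldata$label.main, blueprint_coldata$label.ont)

# If label.ont follows label.fine, every fine label has exactly one ID,
# while a main label can span several IDs
print(ids_per_fine)
print(ids_per_main)
```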
👍🏻
Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
Closes #951
What is the goal of this pull request?
Here I'm adding a table to the references folder that contains all possible consensus cell type labels. Ultimately, this table can be used to assign a consensus label to every cell in ScPCA samples based on the combination of labels from SingleR/Blueprint and CellAssign/PanglaoDB. Each row is a unique combination of cell types from Panglao and Blueprint, and the consensus label column holds the lowest common ancestor (LCA) for that pair of labels.
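To make the LCA idea concrete, here is a minimal base R sketch on a toy is_a hierarchy. The hierarchy, the term names, and the helpers `get_ancestors()` and `find_lca()` are all hypothetical; the script in this PR works on the real Cell Ontology and applies additional rules, so its logic may differ.

```r
# Toy is_a hierarchy (child -> parents); NOT the real Cell Ontology structure
parents <- list(
  cell = character(0),
  lymphocyte = "cell",
  t_cell = "lymphocyte",
  b_cell = "lymphocyte",
  cd4_t_cell = "t_cell",
  cd8_t_cell = "t_cell"
)

# All ancestors of a term, including the term itself
get_ancestors <- function(term, parents) {
  found <- term
  queue <- term
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    new_parents <- setdiff(parents[[current]], found)
    found <- c(found, new_parents)
    queue <- c(queue, new_parents)
  }
  found
}

# Lowest common ancestor(s): shared ancestors that are not themselves
# an ancestor of any other shared ancestor
find_lca <- function(term_a, term_b, parents) {
  shared <- intersect(get_ancestors(term_a, parents), get_ancestors(term_b, parents))
  is_lowest <- vapply(shared, function(candidate) {
    others <- setdiff(shared, candidate)
    !any(vapply(
      others,
      function(other) candidate %in% get_ancestors(other, parents),
      logical(1)
    ))
  }, logical(1))
  shared[is_lowest]
}

print(find_lca("cd4_t_cell", "cd8_t_cell", parents))
print(find_lca("cd4_t_cell", "b_cell", parents))
```

In this toy hierarchy the two T cell subtypes share `t_cell` as their LCA, while a T cell subtype paired with `b_cell` falls back to the broader `lymphocyte`. Because an ontology is a graph rather than a tree, `find_lca()` returns a vector and can yield more than one term.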
Note that I am only including the combinations that result in a consensus label that is NOT unknown. There are more than 7000 unique combinations in total, and only 301 of them result in a label under the rules we have put in place. I originally made a table with all combinations and labeled every combination without a consensus as "Unknown", but that table is too large to store in this repo because of the pre-commit file size limits. Let me know if we do want a table with every possible combination, including the unknowns. If so, we will have to figure out where to store it (probably on S3 in the results bucket).
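The filtering step can be pictured with a small sketch: build every label combination, attach a consensus where one exists, and keep only the labeled rows. The labels, the `consensus_lookup` table, and the column names below are hypothetical stand-ins, not the contents of the actual reference.

```r
# Hypothetical label sets from the two references
panglao_labels <- c("T cells", "B cells", "Neurons")
blueprint_labels <- c("CD4+ T-cells", "B-cells")

# Every possible combination of the two label sets
all_combos <- expand.grid(
  panglao_annotation = panglao_labels,
  blueprint_annotation = blueprint_labels,
  stringsAsFactors = FALSE
)

# Hypothetical consensus assignments; combinations not listed here have no consensus
consensus_lookup <- data.frame(
  panglao_annotation = c("T cells", "B cells"),
  blueprint_annotation = c("CD4+ T-cells", "B-cells"),
  consensus_annotation = c("T cell", "B cell"),
  stringsAsFactors = FALSE
)

# Left join on the shared columns; unmatched combinations get NA
labeled <- merge(all_combos, consensus_lookup, all.x = TRUE)

# Keep only combinations with a consensus label, dropping the "Unknown" rows
consensus_only <- labeled[!is.na(labeled$consensus_annotation), ]
print(consensus_only)
```

Any combination missing from the saved table can then be treated as "Unknown" at assignment time, so the full table never has to be stored.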
Briefly describe the general approach you took to achieve this goal.
I wrote a script that programmatically assigns the consensus labels based on the rules we put in place in Create reference for consensus cell type labels #951.
- [...] hematopoietic precursor cell.
- [...] neuron and epithelial cell when Blueprint is Epithelial cells.
- bone cell, lining cell, blood cell, progenitor cell, and supporting cell are all removed as possible consensus labels.

The script then saves a table with all combinations for which a consensus label was identified. This table includes columns for the Panglao ontology/annotation, Blueprint ontology/annotation, and consensus ontology/annotation. Again, I did not include every "Unknown" combination.
I updated documentation throughout, mostly to record the rules we are implementing to define the consensus labels and the process we used to create this reference. The main README is still not complete, but I imagine it will get filled in as we work on actually assigning the consensus labels.
If known, do you anticipate filing additional pull requests to complete this analysis module?
Yes! Time to assign labels next.
Provide directions for reviewers
Is there anything that you want to discuss further?
Here's the final list of cell type annotations that are used for consensus labels for reference:
Analysis module and review
- [...] README.md has been updated to reflect code changes in this pull request.

Reproducibility checklist
- [...] Dockerfile.
- [...] environment.yml file.
- [...] renv.lock file.