Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update getting_started & sce_file_contents with celltyping instructions #193

Merged
merged 23 commits into from
Nov 15, 2023

Conversation

sjspielman
Copy link
Member

Closes #165
Closes #192

This PR updates docs with more cell typing information. Note that this is more or less ready for review, but I'm opening as a Draft PR specifically because some of the changes (celldex/panglaodb versions) here depend on forthcoming updates as part of AlexsLemonade/scpca-nf#538. So, we should not move to final.final approval until that is done.

Most changes here are in getting_started.md. I have subsections for each possible annotation type, and I go show SCE then AnnData for each set of tasks one might do. Noting a different way to break this up could be to show all SCE things for a given annotation, and then all AnnData things.

You'll also notice these sentences:

<!-- TODO: this instead? -->
Additional information about <`CellAssign`/`SingleR`> annotation results is also available from the `SingleCellExperiment`'s metadata, as described in the {ref}`experiment metadata table<sce_file_contents:SingleCellExperiment experiment metadata>`.

We currently don't need these sentences, but we could use them instead if we don't want to show how to pull out reference information from the metadata, and instead point folks to that table. Let me know which approach we prefer - showing the code here, or having this sentence instead!

As indicated, I went ahead and took care of #192 since it was so quick. I also removed some periods from the ends of table contents per overall style since I noticed them while scrolling through.

@sjspielman sjspielman marked this pull request as draft November 9, 2023 14:38
Copy link
Member

@allyhawkins allyhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly looks good, but I did have some comments on overall organization that I left. Something I didn't mention in my comments is that we should include a few resources on cell type annotation at the end of the section. This would mirror the other sections.

Comment on lines 211 to 212
- Automated annotations from [`SingleR`](https://bioconductor.org/packages/release/bioc/html/SingleR.html), a reference-based method ([Looney _et al._ 2019](https://doi.org/10.1038/s41590-018-0276-y)).
- Automated annotations from [`CellAssign`](https://github.com/Irrationone/cellassign), a marker-gene based method ([Zhang _et al._ 2019](https://doi.org/10.1038/s41592-019-0529-1)).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Automated annotations from [`SingleR`](https://bioconductor.org/packages/release/bioc/html/SingleR.html), a reference-based method ([Looney _et al._ 2019](https://doi.org/10.1038/s41590-018-0276-y)).
- Automated annotations from [`CellAssign`](https://github.com/Irrationone/cellassign), a marker-gene based method ([Zhang _et al._ 2019](https://doi.org/10.1038/s41592-019-0529-1)).
- Annotations determined by using [`SingleR`](https://bioconductor.org/packages/release/bioc/html/SingleR.html), a reference-based method ([Looney _et al._ 2019](https://doi.org/10.1038/s41590-018-0276-y)).
- Annotations determined by using [`CellAssign`](https://github.com/Irrationone/cellassign), a marker-gene based method ([Zhang _et al._ 2019](https://doi.org/10.1038/s41592-019-0529-1)).

I don't know if this is much better, but we didn't take annotations from these methods, we performed annotation using those methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be a good middle ground - Annotations determined by <method>, an automated <thing-based> method


#### Submitter-provided annotations

To access submitter-provided annotations in the `SingleCellExperiment`, use the following command:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a note that not all libraries will contain submitter annotations. Although everything should include CellAssign/SingleR, it's much more likely that these will be missing.

docs/getting_started.md Outdated Show resolved Hide resolved
docs/getting_started.md Outdated Show resolved Hide resolved
docs/getting_started.md Outdated Show resolved Hide resolved
docs/getting_started.md Outdated Show resolved Hide resolved
@@ -203,6 +203,167 @@ See these resources for more information on clustering:
- [Quantifying clustering behavior in Orchestrating Single Cell Analysis](https://bioconductor.org/books/release/OSCA.advanced/clustering-redux.html#quantifying-clustering-behavior)


### Cell type annotation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I think we want to add a little bit more context to this section. I would describe the references used in this first section. I would also mention the supplemental cell type report and that it's a good resource for evaluating the provided cell type annotations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added 2 links for cell annotation, but we could add more! Perhaps from this list we have going here? https://github.com/AlexsLemonade/training-specific-template/blob/main/additional-resources/single-cell-resources.md#cell-type-annotation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@allyhawkins this comment may have fallen by the wayside, any other links here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you added is good!

@sjspielman
Copy link
Member Author

Thanks for the feedback, ready for another look! Note that I also updated the "intro" section here to show how to determine which cell types are present. This could use some feedback about wording.

Also...

think we should add a note that not all libraries will contain submitter annotations. Although everything should include CellAssign/SingleR, it's much more likely that these will be missing.

For this, I also popped a bullet into the intro section to say that we dont always expect submitter. What do you think of that approach vs text in the submitter section directly?

Copy link
Member

@allyhawkins allyhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Organization looks good! I just had a few comments about word choice. Also we may want to revisit the two additions to the metadata based on incoming scpca-nf changes.

docs/getting_started.md Outdated Show resolved Hide resolved
docs/getting_started.md Outdated Show resolved Hide resolved
docs/getting_started.md Outdated Show resolved Hide resolved
docs/getting_started.md Outdated Show resolved Hide resolved
docs/getting_started.md Outdated Show resolved Hide resolved
docs/getting_started.md Outdated Show resolved Hide resolved
docs/getting_started.md Show resolved Hide resolved
@@ -136,8 +136,11 @@ metadata(sce) # experiment metadata
| `singler_results` | If cell typing with `SingleR` was performed, the full result object returned by `SingleR` annotation. Only present for `processed` objects |
| `singler_reference` | If cell typing with `SingleR` was performed, the name of the [`celldex`](http://bioconductor.org/packages/release/data/experiment/html/celldex.html) reference dataset used for annotation. Only present for `processed` objects |
| `singler_reference_label` | If cell typing with `SingleR` was performed, the name of the label in the reference dataset used for annotation. Only present for `processed` objects |
| `singler_reference_version` | If cell typing with `SingleR` was performed with ontology labels, the [`celldex`](http://bioconductor.org/packages/release/data/experiment/html/celldex.html) package version used for the reference dataset |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to note that I'm going to file a PR today with adding this into the SCE object. I am adding both package and version so it will be celldex_1-12-1 or whatever the version is. So we might update this once we make a decision regarding how we are storing this information.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll circle back once that PR is set then (we can block this PR as needed, probably dont need to for this situation)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sjspielman sjspielman marked this pull request as ready for review November 15, 2023 17:29
@sjspielman
Copy link
Member Author

I have updated this with the final naming scheme and information we've used in add_celltypes_to_sce.R, but a couple comments about this:

  • We (accidentally? my bad during review..?) ended up with both singler_ref_<stuff> and singler_reference_<stuff> (same for cellassign) metadata variables. Do we want to backtrack and make them all say _reference_?
  • How's my phrasing? This is hard! I wonder if I rearranged text too much and should be more firm that we use celldex and PanglaoDB, since at this point that's all that will be on the Portal.
  • I feel like we need to include a table somewhere in these forward-facing docs about what organs go into each CellAssign reference. Where do you think this might go? Can be in a subsequent PR.

Copy link
Member

@allyhawkins allyhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that! I made the update in AlexsLemonade/scpca-nf#569. And then also made some small suggestions here.

docs/getting_started.md Outdated Show resolved Hide resolved
docs/sce_file_contents.md Outdated Show resolved Hide resolved
docs/sce_file_contents.md Outdated Show resolved Hide resolved
docs/sce_file_contents.md Outdated Show resolved Hide resolved
Copy link
Member

@allyhawkins allyhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@@ -203,6 +203,167 @@ See these resources for more information on clustering:
- [Quantifying clustering behavior in Orchestrating Single Cell Analysis](https://bioconductor.org/books/release/OSCA.advanced/clustering-redux.html#quantifying-clustering-behavior)


### Cell type annotation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you added is good!

@sjspielman sjspielman merged commit 325bfdb into development Nov 15, 2023
1 check passed
@sjspielman sjspielman deleted the sjspielman/165-celltype-getting-started branch November 15, 2023 20:04
@sjspielman sjspielman restored the sjspielman/165-celltype-getting-started branch November 15, 2023 20:15
@sjspielman sjspielman deleted the sjspielman/165-celltype-getting-started branch November 15, 2023 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants