-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Notebook for exploring consensus cell type labels across ScPCA samples #999
Notebook for exploring consensus cell type labels across ScPCA samples #999
Conversation
…plore-consensus-labels
@allyhawkins – I took an initial look at this, and I think we'd benefit from developing a strategy for a unified color palette. The same cell type gets different colors across projects of the same cancer type, which is a little bit of an obstacle for interpretation. |
If we want to plot the top 8 cell types + Unknown + all remaining cell types for each project then we have 27 unique cell types. To assign unique colors, I'm using the Here's an updated report: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is an exploratory notebook, I think this does a good job of illustrating the main points you wanted to convey. There are a few places where I think that you might actually have a few extra plots that are not needed, and one plot that I would modify to aid intrepretability (the second sina plot I think should be histograms, perhaps?).
As @jaclyn-taroni noted, I do think unifying the plot colors would be useful. I guess the question there is exactly how many colors you need to deal with; it seems that many of the sets are similar, which might be a good place to start. Maybe if there is a good set of 10-15 cell types you could use across all plots and then use the "Other" set more liberally? I don't think that the precise cell types are particularly important for the bar plots, so having more classified as other is probably not going to be a big deal? (Update: I see that you changed this and added a new color palette, so I am going to have to have another look soon.)
Other than that, I had a number of little style/code suggestions. Most of those could be considered optional.
|
||
# stacked bar chart showing the distribution of the top 9 cell types for each project, including Unknown | ||
celltype_distribution_plot <- function(label, | ||
results_df = all_results_df){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit strange to have a default value here which has not been defined. I would probably not include a default here.
results_df = all_results_df){ | |
results_df = all_results_df){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option here, because this function is really only used in one place, is to define the function inline. I know we don't usually do that, but in this case, where it is closely related to the only place the function is used (and you are likely to be modifying it and looking at the results in that location), I think it may actually be worthwhile.
|
||
|
||
# get color assignments | ||
# put grey for unknoqn and all remaining cells at the end of list since they will show up last |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# put grey for unknoqn and all remaining cells at the end of list since they will show up last | |
# put grey for unknown and all remaining cells at the end of list since they will show up last |
|
||
# filter to get only results for the specified project | ||
plot_df <- results_df |> | ||
dplyr::filter(project_label == {{label}}) |> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is {{}}
needed here? The input seems like it is a string here, so I would not expect it.
dplyr::filter(project_label == {{label}}) |> | |
dplyr::filter(project_label == {{label}}) |> |
colors <- c( | ||
palette.colors(palette = "Set1")[1:num_assigned_celltypes], # first get colors for cell types that are not unknown or all remaining | ||
"grey95" # level for unknown | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreeing with @jaclyn-taroni here: we probably want to set up some default colors so that the cell types are consistent. Though part of me wonders whether we really care about the specific cell types in these figures vs. just the proportion with consensus? I don't think the information is valueless, but I don't know whether these figures are the best way to represent it.
```{r packages} | ||
suppressPackageStartupMessages({ | ||
# load required packages | ||
library(ggplot2) | ||
}) | ||
|
||
# Set default ggplot theme | ||
theme_set( | ||
theme_classic() | ||
) | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you are using ggplot2
in your functions, I would load the package first. Just how my brain wants it.
```{r, fig.height=5} | ||
# pivot for plotting so above and below count is in the same column | ||
high_tumor_df <- high_tumor_df |> | ||
tidyr::pivot_longer( | ||
cols = c("all_unknown", "classified_cells"), | ||
names_to = "category", | ||
values_to = "number_of_samples" | ||
) | ||
|
||
ggplot(high_tumor_df, aes(x = project_label, y = number_of_samples, fill = category)) + | ||
geom_bar(position = "stack", stat= "identity") + | ||
theme(axis.text.x = element_text(angle = 90, hjust = 1, size = rel(0.9))) + | ||
labs( | ||
x = "", | ||
y = "Number of samples", | ||
fill = "" | ||
) + | ||
scale_fill_manual(values = c( "#FFCB05", "#00274C")) | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this plot is particularly needed. The table seems sufficient, especially with the previous sina plot.
analyses/cell-type-consensus/exploratory-notebooks/02-explore-consensus-results.Rmd
Outdated
Show resolved
Hide resolved
project_labels |> | ||
purrr::map(celltype_distribution_plot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I find the list text between plots annoying, perhaps:
project_labels |> | |
purrr::map(celltype_distribution_plot) | |
project_labels |> | |
purrr::map(celltype_distribution_plot) |> | |
patchwork::wrap_plots() |
analyses/cell-type-consensus/exploratory-notebooks/02-explore-consensus-results.Rmd
Outdated
Show resolved
Hide resolved
analyses/cell-type-consensus/exploratory-notebooks/02-explore-consensus-results.Rmd
Outdated
Show resolved
Hide resolved
This looks pretty good to me as far as the number of cell types you are showing and the exploratory nature of the notebook. |
Agreed |
Co-authored-by: Joshua Shapiro <[email protected]>
@jashapiro and @jaclyn-taroni I made some minor changes based on Josh's review, including removing the maize and blue stacked plot (😢), using histograms for the number of cells, and making the distribution plots a little prettier when they print. This should be ready for another review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I will just note that the html file you shared was not a complete render (the final table was missing, at least), so you might want to rerun the notebook and commit any changes before merging. I didn't check the version that was actually part of this commit, though.
purrr::map(\(label){ | ||
|
||
project_df <- plot_df |> | ||
dplyr::filter(project_label == {{label}}) |> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you answered whether the {{
was needed, but I don't think it is?
dplyr::filter(project_label == {{label}}) |> | |
dplyr::filter(project_label == label) |> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I had this here in the function because it was yelling at me, but it's not doing that anymore, so I'll remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for an exploratory notebook! I expect we might want to discuss polishing these figures eventually.
Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
Closes #970
What is the goal of this pull request?
Here I am adding a notebook that looks at the output from running
cell-type-consensus
inOpenScPCA-nf
. The output from this module is a TSV file for each project with all cells in all libraries and the assigned cell type annotations. The notebook added here reads in all the TSV files and creates a few summary plots to look at the assigned cell types across all samples.The main things we would like to know are:
Briefly describe the general approach you took to achieve this goal.
cell-type-consensus
module toresults
. Note that these results are currently instaging
and TSVs are on a project level. We do want to modify this workflow to have a single TSV for each sample, so when we do that we might also want to change things here (see Modify consensus module to output results at a sample level OpenScPCA-nf#114).If known, do you anticipate filing additional pull requests to complete this analysis module?
Yes
Results
What types of results does your code produce (e.g., table, figure)?
Here's a copy of the rendered report for easy reviewing:
02-explore-consensus-results.nb.html.zip
What is your summary of the results?
I think
CellAssign
might have been useful!Generally I see a higher proportion of cell types assigned to leukemia and brain samples than solid tumors, which I would expect. We also see that leukemias tend to be immune cell types and solid tumors have more fibroblasts and muscle cells.
Author checklists
Analysis module and review
README.md
has been updated to reflect code changes in this pull request.Reproducibility checklist
Dockerfile
.environment.yml
file.renv.lock
file.