warning: different learning-rate trying and gene decreased dramatically after cellbender removal #292
Comments
Hi @GLLMU, thanks for your questions. I will try to answer them here and provide a few notes.
You might be interested in trying this, just for your information: #242
It is abnormal to have to try so many different parameter settings, sorry about this!
This is a great question! And the answer is "no"! You do not need to use the same learning rate on different samples. (You should use the same FPR!) But the other parameters like total droplets or expected cells or even the learning rate can definitely be different for different samples. And you can then jointly analyze the datasets together downstream.
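As a rough sketch of what "same FPR, per-sample learning rate" could look like in practice, the snippet below prints one command per sample instead of running it. The paths, `--fpr 0.01`, `--epochs 150` and the learning-rate values are taken from this thread; the `build_cmd` helper and the dry-run `echo` are illustrative assumptions and are not part of CellBender.

```shell
#!/usr/bin/env bash
# Sketch: identical --fpr for every sample, learning rate chosen per sample.
# build_cmd is a hypothetical helper; it only prints the command (dry run).
FPR=0.01   # keep this the same across all samples

build_cmd() {
  local sample="$1" lr="$2"
  echo "cellbender remove-background" \
       "--input CellBender_Input/${sample}/raw_feature_bc_matrix.h5" \
       "--output CellBender_Output/${sample}_CellBender_output.h5" \
       "--fpr ${FPR}" \
       "--epochs 150" \
       "--learning-rate ${lr}"
}

# Per-sample learning rates (example values mentioned in this thread).
build_cmd Sample1 2.5e-5
build_cmd Sample2 1.25e-5
```

Once the printed commands look right, they can be pasted into a terminal or piped to `bash`.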
Actually, I wonder if you can try the following thing, just to see if it would work... CellBender v0.3.0 has methods for automatically finding the [...]
cellbender remove-background \
  --input CellBender_Input/Sample1/raw_feature_bc_matrix.h5 \
  --output CellBender_Output/Sample1_CellBender_output.h5 \
  --low-count-threshold 100 \
  --learning-rate 2.5e-5
cellbender remove-background \
  --input CellBender_Input/Sample2/raw_feature_bc_matrix.h5 \
  --output CellBender_Output/Sample2_CellBender_output.h5 \
  --low-count-threshold 100 \
  --learning-rate 2.5e-5
But, if that does not work great either, then what you've done for sample 1 seems quite reasonable. You decreased the learning rate until you got a really good looking learning curve.
cellbender remove-background \
  --input CellBender_Input/Sample2/raw_feature_bc_matrix.h5 \
  --output CellBender_Output/Sample2_CellBender_output.h5 \
  --expected-cells 10000 \
  --total-droplets-included 30000 \
  --learning-rate 2e-5
The gene warning is definitely something to consider. It looks like AY036118 really does decrease its counts quite a bit. 50% - 70% of the counts are being removed from cell-containing droplets (the [...]). It looks like it's being removed quite a bit from both datasets. (Actually the top three genes removed are the same in the two samples.) This is typically a good sign, because CellBender is independently coming to the conclusion that the same genes are high-noise in two separate datasets (which might make sense if they are similar types of samples or were prepared in similar ways).
If it were me, I would also try to get a feel for whether half of the counts of AY036118 being noise is "believable". One way to get a feel for this is to eyeball the columns [...]
I'd encourage you to do whatever other sanity checks you can think of, but if you're satisfied it's plausible, I would go ahead and use the results without worrying about the warning. The warning is there to encourage people to stop and double-check, but if the double-check looks okay, then the warning should be ignored.
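To make the "50% - 70% removed" figure concrete, here is a minimal sketch of the arithmetic. It assumes you have already summed a gene's counts over cell-containing droplets before and after CellBender; how those sums are extracted is left out. `fraction_removed` is a hypothetical helper, and the numbers in the call are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: fraction of a gene's counts removed = (raw - cleaned) / raw.
# fraction_removed is a hypothetical helper, not a CellBender command.
fraction_removed() {
  local raw="$1" cleaned="$2"
  awk -v r="$raw" -v c="$cleaned" 'BEGIN {
    if (r == 0) { print "NA"; exit }      # avoid dividing by zero
    printf "%.2f\n", (r - c) / r
  }'
}

# Made-up example: 1000 counts before, 400 counts after.
fraction_removed 1000 400   # prints 0.60
```

A value in the 0.5 to 0.7 range for the same gene in several independently processed samples would match what is described above.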
Hi @GLLMU, how did you generate these?
@sjfleming Thanks for the great tool. I really appreciate the amazing calculation.
When I used the tool on my 4 samples, I ran into some problems.
Q1: 3 different learning rates for sample 1
Cellranger count report for sample 1:
I used the cell number from the Cellranger count report for --expected-cells, and inferred the --total-droplets-included number from the UMI curve following this protocol: https://www.10xgenomics.com/resources/analysis-guides/background-removal-guidance-for-single-cell-gene-expression-datasets-using-third-party-tools
The script I used for CellBender:
cellbender remove-background \
  --input CellBender_Input/Sample1/raw_feature_bc_matrix.h5 \
  --output CellBender_Output/Sample1_CellBender_output.h5 \
  --expected-cells 18487 \
  --total-droplets-included 50000 \
  --fpr 0.01 \
  --epochs 150 \
  --learning-rate 1e-4  # tried different values
1) As others have discussed, I first used the default learning-rate 1e-4 and got the following result:
2) I then repeated the analysis with learning-rate 0.00005 and still got the same problem:
3) I repeated the analysis again with learning-rate 0.000025 and got a better result:
So, learning-rate 0.000025 seems to fit this sample.
Q2: learning-rate 0.000025 doesn't fit sample 2
Cellranger count report for sample2:
The script I used for CellBender:
cellbender remove-background \
  --input CellBender_Input/Sample2/raw_feature_bc_matrix.h5 \
  --output CellBender_Output/Sample2_CellBender_output.h5 \
  --expected-cells 13261 \
  --total-droplets-included 50000 \
  --fpr 0.01 \
  --epochs 150 \
  --learning-rate 1e-4  # tried different values
For the default learning-rate 1e-4, I got the following result with the same warning:
Then I directly tried learning-rate 0.000025 for sample 2; however, the result still has the same warning:
So, I got a worse result for sample 2. I am now running learning-rate 0.0000125 for sample 2 and waiting for the results.
However, the GPU doesn't work on my computer, so each trial took 12 ~ 48 hours per sample, which is really time consuming.
Do you think this is normal or abnormal? The cell numbers are around 18k for sample 1 and 13k for sample 2.
So, my question is: do I need to use the same learning-rate (e.g. 0.0000125, if the results are good) for all 4 samples if I want to integrate them in Seurat for further analysis? Rerunning everything would be really time consuming.
Q3: The same gene-decrease warning in all samples
The same single gene triggers a warning in all 4 samples:
WARNING: The expression of the highly-expressed gene AY036118 decreases quite markedly after CellBender. Check to ensure this makes sense!
for sample 1 with learning-rate 0.000025
for sample 2 with learning-rate 0.000025
Does this gene warning affect the analysis results? Do I need to change anything and rerun the samples? Or can I use the output for further analysis and ignore this warning?
So, these are my three main questions. I plan to use Scrublet and Seurat in the following analysis with the output of CellBender.
Could you please take a look at these problems when you have time? Thank you so much in advance.