Commit

updated readme
zhengzhenxian committed Jul 5, 2023
1 parent 24d2959 commit 7235db1
Showing 2 changed files with 29 additions and 23 deletions.
50 changes: 28 additions & 22 deletions README.md
@@ -71,6 +71,8 @@ ClairS is now available for early access to interested and experienced users. Yo

## Latest Updates

*v0.1.3 (Jul 5)* : Added ONT Dorado 4khz (`-p ont_r10_dorado_4khz`) and 5khz (`-p ont_r10_dorado_5khz`) models; check [here](#pre-trained-models) for more details. Renamed platform options `ont_r10` to `ont_r10_guppy` and `ont_r9` to `ont_r9_guppy`.

*v0.1.2 (May 17)* : Added HiFi Revio model, renamed HiFi Sequel II model from `hifi` to `hifi_sequel2`.

*v0.1.1 (Apr 30)* : 1. Added the "command line used" to VCF header. 2. Added `NAU`, `NCU`, `NGU`, and `NTU` tags (#reads supporting the four bases in normal) to the output. 3. Hybrid calling mode now outputs three VCFs: ClairS somatic variant calls, Clair3 normal germline variant calls, and Clair3 tumor germline variant calls. 4. Added the `--enable_clair3_germline_output` option to also output Clair3 normal germline variant calls and Clair3 tumor germline variant calls (even when hybrid calling mode is not enabled). Running time will increase by ~40%.
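
The renames listed in the v0.1.2 and v0.1.3 notes mean that scripts written against earlier releases may pass a platform name that is no longer accepted. A minimal sketch of a wrapper-side translation table follows; `RENAMED_PLATFORMS` and `migrate_platform` are hypothetical names used only for illustration and are not part of ClairS.

```python
# Hypothetical helper, not part of ClairS: map platform option names used
# before the renames in v0.1.2 / v0.1.3 to the names those notes introduce.
RENAMED_PLATFORMS = {
    "ont_r10": "ont_r10_guppy",  # renamed in v0.1.3
    "ont_r9": "ont_r9_guppy",    # renamed in v0.1.3
    "hifi": "hifi_sequel2",      # renamed in v0.1.2
}


def migrate_platform(name: str) -> str:
    """Return the current option name for a possibly outdated platform name.

    Names that were not renamed are returned unchanged.
    """
    return RENAMED_PLATFORMS.get(name, name)


if __name__ == "__main__":
    for old in ("ont_r10", "ont_r9", "hifi", "ilmn"):
        print(f"{old} -> {migrate_platform(old)}")
```

A pipeline script could call `migrate_platform` on its stored platform setting before building the `run_clairs` command line, so that older configuration files keep working.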
@@ -91,7 +93,7 @@ ClairS is now available for early access to interested and experienced users. Yo
After following [installation](#installation), you can run ClairS with one command:

```bash
./run_clairs -T tumor.bam -N normal.bam -R ref.fa -o output -t 8 -p ont_r10
./run_clairs -T tumor.bam -N normal.bam -R ref.fa -o output -t 8 -p ont_r10_guppy
## Final output file: output/output.vcf.gz
```

@@ -103,17 +105,21 @@ Check [Usage](#Usage) for more options.

ClairS's pileup and full-alignment models were both trained using GIAB samples and benchmarked on the HCC1395-HCC1395BL pair dataset. All models were trained with chr20 excluded (including only chr1-19, 21, 22).

| Platform | Model name | Chemistry /Instruments | Option (`-p/--platform`) | Reference | Aligner |
| :---------: | :--------------------: | :--------------------------: | :----------------------: | :-----------: | :------: |
| ONT | ont_r104_e81_sup_g5015 | R10.4/R10.4.1 | `ont_r10` | GRCh38_no_alt | Minimap2 |
| ONT <sup>1</sup> | r941_prom_sup_g5014 | R9.4.1 | `ont_r9` | GRCh38_no_alt | Minimap2 |
| Illumina | ilmn | NovaSeq/HiseqX | `ilmn` | GRCh38 | BWA-MEM |
| PacBio HiFi <sup>2</sup> | hifi_sequel2 | Sequel II with Chemistry 2.0 | `hifi_sequel2` | GRCh38_no_alt | Minimap2 |
| PacBio HIFI | hifi_revio | Revio with SMRTbell prep kit 3.0 | `hifi_revio` | GRCh38_no_alt | Minimap2 |
| Platform | Model name | Chemistry /Instruments | Basecaller | Option (`-p/--platform`) | Reference | Aligner |
| :---------: | :--------------------: | :--------------------------: | :----------------------: | :-----------: | :------: | ----------- |
| ONT<sup>1</sup> | r1041_e82_400bps_sup_v420 | R10.4.1, 5khz | Dorado | `ont_r10_dorado_5khz` | GRCh38_no_alt | Minimap2 |
| ONT<sup>1</sup> | r1041_e82_400bps_sup_v410 | R10.4.1, 4khz | Dorado | `ont_r10_dorado_4khz` | GRCh38_no_alt | Minimap2 |
| ONT | r104_e81_sup_g5015 | R10.4/R10.4.1 | Guppy5 | `ont_r10_guppy` | GRCh38_no_alt | Minimap2 |
| ONT <sup>2</sup> | r941_prom_sup_g5014 | R9.4.1 | Guppy5 | `ont_r9_guppy` | GRCh38_no_alt | Minimap2 |
| Illumina | ilmn | NovaSeq/HiseqX | - | `ilmn` | GRCh38 | BWA-MEM |
| PacBio HiFi <sup>3</sup> | hifi_sequel2 | Sequel II with Chemistry 2.0 | - | `hifi_sequel2` | GRCh38_no_alt | Minimap2 |
| PacBio HIFI | hifi_revio | Revio with SMRTbell prep kit 3.0 | - | `hifi_revio` | GRCh38_no_alt | Minimap2 |

**Caveats <sup>1</sup>**: Both the `ont_r10_dorado_4khz` and `ont_r10_dorado_5khz` models were trained on the HG003+HG004 synthetic dataset; a model pre-trained on multiple samples is coming soon.

**Caveats <sup>1</sup>**: Although the r9(`r941_prom_sup_g5014`) model was trained on synthetic samples with r9.4.1 real data, the minimal AF cutoff, minimal coverage, and post-calling filtering parameters for the r9 model are copied from the r10 model, and are not optimized due to lack of real r9 data on a cancer sample with known truths.
**Caveats <sup>2</sup>**: Although the r9(`r941_prom_sup_g5014`) model was trained on synthetic samples with r9.4.1 real data, the minimal AF cutoff, minimal coverage, and post-calling filtering parameters for the r9 model are copied from the r10 model, and are not optimized due to lack of real r9 data on a cancer sample with known truths.

**Caveats <sup>2</sup>**: The PacBio HiFi Sequel II model is experimental. It was trained but not tested with any real data with known truths. HG003 54x and HG004 52x were used, thus tumor depth coverage higher than 50x may suffer from lower recall rate. For testing, please downsample both tumor and normal to ~40x for the best performance of this experimental model.
**Caveats <sup>3</sup>**: The PacBio HiFi Sequel II model is experimental. It was trained but not tested with any real data with known truths. HG003 54x and HG004 52x were used, thus tumor depth coverage higher than 50x may suffer from lower recall rate. For testing, please downsample both tumor and normal to ~40x for the best performance of this experimental model.
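
The Sequel II caveat recommends downsampling both tumor and normal to ~40x. A minimal sketch of the arithmetic follows, assuming the current depth of each BAM has already been measured by some other means; the function names, file names, the 60x example depth, and the seed are illustrative assumptions. The command it builds uses `samtools view -s SEED.FRACTION`, where the integer part is the random seed and the fractional part is the fraction of reads to keep.

```python
from typing import List, Optional


def subsample_fraction(current_depth: float, target_depth: float = 40.0) -> float:
    """Fraction of reads to keep so that current_depth shrinks to target_depth."""
    if current_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    # Never try to keep more than all reads.
    return min(1.0, target_depth / current_depth)


def samtools_subsample_cmd(
    in_bam: str,
    out_bam: str,
    current_depth: float,
    target_depth: float = 40.0,
    seed: int = 42,
) -> Optional[List[str]]:
    """Build a `samtools view -s` command, or return None if no downsampling is needed."""
    frac = subsample_fraction(current_depth, target_depth)
    if frac >= 1.0:
        return None
    # Encode as SEED.FRACTION with four decimal digits, capped below 1.0.
    digits = min(9999, round(frac * 10000))
    return ["samtools", "view", "-b", "-s", f"{seed}.{digits:04d}", "-o", out_bam, in_bam]


if __name__ == "__main__":
    # Hypothetical example: a tumor BAM measured at 60x.
    cmd = samtools_subsample_cmd("tumor.bam", "tumor.40x.bam", current_depth=60.0)
    print(" ".join(cmd) if cmd else "no downsampling needed")
```

The downsampled BAM needs to be indexed again (for example with `samtools index`) before it is passed to `run_clairs`.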


------
@@ -137,8 +143,8 @@ docker run -it \
--normal_bam_fn ${INPUT_DIR}/normal.bam \ ## use your normal bam file name here
--ref_fn ${INPUT_DIR}/ref.fa \ ## use your reference file name here
--threads ${THREADS} \ ## maximum threads to be used
--platform ${PLATFORM} \ ## options: {ont_r10, ont_r9, ilmn, hifi_sequel2, hifi_revio}
--output_dir ${OUTPUT_DIR} ## output path prefix
--platform ${PLATFORM} \ ## options: {ont_r10_dorado_4khz, ont_r10_dorado_5khz, ont_r10_guppy, ont_r9_guppy, ilmn, hifi_sequel2, hifi_revio}
--output_dir ${OUTPUT_DIR} ## output path prefix
```

Check [Usage](#Usage) for more options.
@@ -169,8 +175,8 @@ singularity exec \
--normal_bam_fn ${INPUT_DIR}/normal.bam \ ## use your normal bam file name here
--ref_fn ${INPUT_DIR}/ref.fa \ ## use your reference file name here
--threads ${THREADS} \ ## maximum threads to be used
--platform ${PLATFORM} \ ## options: {ont_r10, ont_r9, ilmn, hifi_sequel2, hifi_revio}
--output_dir ${OUTPUT_DIR} \ ## output path prefix
--platform ${PLATFORM} \ ## options: {ont_r10_dorado_4khz, ont_r10_dorado_5khz, ont_r10_guppy, ont_r9_guppy, ilmn, hifi_sequel2, hifi_revio}
--output_dir ${OUTPUT_DIR} \ ## output path prefix
--conda_prefix /opt/conda/envs/clairs
```

@@ -237,8 +243,8 @@ docker run -it hkubal/clairs:latest /opt/bin/run_clairs --help
--normal_bam_fn ${INPUT_DIR}/normal.bam \ ## use your bam file name here
--ref_fn ${INPUT_DIR}/ref.fa \ ## use your reference file name here
--threads ${THREADS} \ ## maximum threads to be used
--platform ${PLATFORM} \ ## options: {ont_r10, ont_r9, ilmn, hifi_sequel2, hifi_revio}
--output_dir ${OUTPUT_DIR} ## output path prefix
--platform ${PLATFORM} \ ## options: {ont_r10_dorado_4khz, ont_r10_dorado_5khz, ont_r10_guppy, ont_r9_guppy, ilmn, hifi_sequel2, hifi_revio}
--output_dir ${OUTPUT_DIR} ## output path prefix

## Final output file: ${OUTPUT_DIR}/output.vcf.gz
```
@@ -253,7 +259,7 @@ docker run -it hkubal/clairs:latest /opt/bin/run_clairs --help
-R, --ref_fn FASTA Reference file input. The input file must be samtools indexed.
-o, --output_dir OUTPUT_DIR VCF output directory.
-t, --threads THREADS Max #threads to be used.
-p, --platform PLATFORM Select the sequencing platform of the input. Possible options {ont_r10, ont_r9, ilmn, hifi_sequel2, hifi_revio}.
-p, --platform PLATFORM Select the sequencing platform of the input. Possible options {ont_r10_dorado_4khz, ont_r10_dorado_5khz, ont_r10_guppy, ont_r9_guppy, ilmn, hifi_sequel2, hifi_revio}.
```
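
The option listing above restricts `-p/--platform` to seven names, and the `--enable_indel_calling` help text in this commit limits indel calling to `ont_r10_guppy` and `hifi_revio`. The sketch below shows one way a wrapper could enforce both constraints with `argparse`. It is an illustration under those assumptions, not the parser in `run_clairs`; `PLATFORMS`, `INDEL_PLATFORMS`, `build_parser`, and `parse` are hypothetical names.

```python
import argparse
from typing import List, Optional

# Platform names as listed for -p/--platform after this commit.
PLATFORMS = [
    "ont_r10_dorado_4khz",
    "ont_r10_dorado_5khz",
    "ont_r10_guppy",
    "ont_r9_guppy",
    "ilmn",
    "hifi_sequel2",
    "hifi_revio",
]

# Platforms named in the --enable_indel_calling help text.
INDEL_PLATFORMS = {"ont_r10_guppy", "hifi_revio"}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical wrapper parser; not the actual run_clairs parser."""
    parser = argparse.ArgumentParser(description="Illustrative platform check")
    parser.add_argument(
        "-p", "--platform", required=True, choices=PLATFORMS,
        help="Sequencing platform of the input.",
    )
    parser.add_argument(
        "--enable_indel_calling", action="store_true",
        help="EXPERIMENTAL: enable indel calling (restricted platforms only).",
    )
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Reject indel calling on platforms the help text does not list.
    if args.enable_indel_calling and args.platform not in INDEL_PLATFORMS:
        parser.error(
            f"--enable_indel_calling is not supported for platform '{args.platform}'"
        )
    return args


if __name__ == "__main__":
    print(parse(["-p", "ont_r10_guppy", "--enable_indel_calling"]))
```

With `choices`, an outdated name such as `ont_r10` is rejected with a usage error that lists the accepted values, which makes a rename easier to spot.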

**Miscellaneous parameters:**
@@ -299,25 +305,25 @@ docker run -it hkubal/clairs:latest /opt/bin/run_clairs --help
--normal_vcf_fn NORMAL_VCF_FN
EXPERIMENTAL: Path to normal VCF file. Setting this will skip germline variant calling on normal BAM file input.
--enable_indel_calling
EXPERIMENTAL: Enable Indel calling, only support ont r10 platform. The calling time would increase significantly. default: disabled.
EXPERIMENTAL: Enable Indel calling, only support `ont_r10_guppy` and `hifi_revio` platforms. The calling time would increase significantly. default: disabled.
```
#### Call SNVs in one or multiple chromosomes using the `-C/--ctg_name` parameter
```bash
./run_clairs -T tumor.bam -N normal.bam -R ref.fa -o output -t 8 -p ont_r10 -C chr21,chr22
./run_clairs -T tumor.bam -N normal.bam -R ref.fa -o output -t 8 -p ont_r10_guppy -C chr21,chr22
```
#### Call SNVs in one specific region using the `-r/--region` parameter
```bash
./run_clairs -T tumor.bam -N normal.bam -R ref.fa -o output -t 8 -p ont_r10 -r chr20:1000000-2000000
./run_clairs -T tumor.bam -N normal.bam -R ref.fa -o output -t 8 -p ont_r10_guppy -r chr20:1000000-2000000
```
#### Call SNVs at variant sites of interest (genotyping) using the `-G/--genotyping_mode_vcf_fn` parameter
```bash
./run_clairs -T tumor.bam -N normal.bam -R ref.fa -o output -t 8 -p ont_r10 -G input.vcf
./run_clairs -T tumor.bam -N normal.bam -R ref.fa -o output -t 8 -p ont_r10_guppy -G input.vcf
```
#### Call SNVs in the BED regions using the `-B/--bed_fn` parameter
@@ -333,7 +339,7 @@ echo -e "${CTG2}\t${START_POS_2}\t${END_POS_2}" >> input.bed
Then:
```bash
./run_clairs -T tumor.bam -N normal.bam -R ref.fa -o output -t 8 -p ont_r10 -B input.bed
./run_clairs -T tumor.bam -N normal.bam -R ref.fa -o output -t 8 -p ont_r10_guppy -B input.bed
```
------
2 changes: 1 addition & 1 deletion run_clairs
@@ -1579,7 +1579,7 @@ def somatic_parser():
ont_params.add_argument(
"--enable_indel_calling",
action='store_true',
help="EXPERIMENTAL: Enable Indel calling, only support ont r10 platform. The calling time would increase significantly. default: disabled."
help="EXPERIMENTAL: Enable Indel calling, only support `ont_r10_guppy` and `hifi_revio` platforms. The calling time would increase significantly. default: disabled."
)

optional_params.add_argument(
