Whole exome sequencing (WES) comparison of PDAC organoids against their corresponding tissue samples

kane9530/WES-PDAC

Whole exome sequencing of PDAC samples

In this project, I analysed 50 tumor and normal WES samples for an experimentalist collaborator at NUS. I ran the nf-core/sarek Nextflow pipeline, wrote custom bash scripts to tidy up the variant calling results, and carried out downstream analysis and visualisation of the variants (in MAF file format) in R. Results are stored in the results/ folder. The project writeup is stored in analysis/results_discussion.pdf.

Raw data

  • WES Batch 1: s3://claire-booney-052023-data

  • WES Batch 1 processed data: s3://booney-wes. The processed data for WES batch 1 is not kept on biodebian due to space constraints, so it should be retrieved from the S3 bucket.

  • WES Batch 2: s3://booney-wes-2

  • RNAseq: s3://booney-rnaseq
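As a minimal sketch of retrieving data from one of the buckets above, the helper below prints an `aws s3 sync` command rather than running it. It assumes the AWS CLI is installed and configured with credentials that can read these buckets. The function name and the local destination folder are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print (not run) the AWS CLI command that mirrors a bucket to a local folder.
# Assumption: the AWS CLI is installed and has read access to the bucket.
s3_sync_cmd() {
  local bucket="$1" dest="$2"
  printf 'aws s3 sync s3://%s %s\n' "$bucket" "$dest"
}

# Bucket name from the list above; the destination folder is a hypothetical example.
s3_sync_cmd booney-wes ./booney_wes_batch1_processed
```

Appending `--dryrun` to the printed command lists what would be transferred without downloading anything, which is a cheap check before pulling a full batch.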

Directory organisation

  1. analysis/ Rmd files used to analyse the maf files.

  2. nfcore/
    • config/
      • config.json files for nfcore
    • *.csv files Input csv files for nfcore

  3. scripts/ Custom bash scripts run after the nfcore pipeline to generate the vcf and maf files.

  4. results/
    • first_batch_wes/ Contains the key output files from running the Rmd files in analysis/, and the maf, vcf and vcfstats output from the first batch of 40 WES samples. The full results from the nfcore/sarek pipeline are stored in this S3 bucket: s3://booney-wes/. As of 10/01/24, HLA haplotyping with optitype had been run only on the first batch of samples.
    • second_batch_wes/ Same as the first_batch_wes folder but for the second batch of 23 WES samples. Full results are stored in this S3 bucket: s3://booney-wes-2/.
    • combined_batches_wes/ Results from combining the samples from both batches of WES analysis.

Other folders present in /media/gedac/kane/projects/booney_wes_clean include:

  • data/ Contains folders pointing to the raw data for both batches of WES analysis and an RNAseq analysis. For the RNAseq analysis, note that secondary analysis conducted by Novogene is also included.
  • references/ Reference files used to run the nfcore pipelines, such as preparatory files for the ascat tool and the dbNSFP database for variant annotation.

Input files

  1. dbNSFP 4.4a Resource. dbNSFP is a database developed for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) in the human genome. In nfcore/sarek, we used the Ensembl Variant Effect Predictor (VEP) tool with the dbNSFP plugin to annotate the identified variants. This is configured in the nfcore configuration files by setting vep_dbnsfp to true, the dbnsfp field to the path to the dbNSFP database, and the dbnsfp_tbi field to the path to its tabix index file.

  2. ASCAT resources

ASCAT resources were generated following the guide from section "How to generate ASCAT resources for exome or targeted sequencing" in the nfcore/sarek page.

  1. Twist Biosciences 2.0 exome bed file

Resource. The bed file was sorted by chromosomal coordinates with sort -k1,1V -k2,2n -k3,3n "input.bed" > output.bed, and each region was then padded by 50 bp on both ends, which extends each entry by a total of 100 bp. The file is supplied in the nfcore configuration files under the intervals field.
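The sorting and padding step can be sketched as a small bash function. The sort command is the one quoted above. The awk padding is an assumption about how the 50 bp padding could be done, not necessarily the tool that was actually used (bedtools slop is a common alternative). The function and file names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sort a BED file by chromosome (version order) and coordinates, then pad each
# region by PAD bp on both sides, clamping the start at 0.
# Assumptions: tab-separated BED with no header lines; ends are not clamped to
# chromosome lengths, and overlapping padded regions are not merged.
pad_bed() {
  local in_bed="$1" pad="${2:-50}"
  sort -k1,1V -k2,2n -k3,3n "$in_bed" |
    awk -v pad="$pad" 'BEGIN { FS = OFS = "\t" }
      { start = $2 - pad; if (start < 0) start = 0
        $2 = start; $3 = $3 + pad; print }'
}

# Usage (file names are placeholders):
#   pad_bed input.bed 50 > output.pad50.bed
```

Clamping the start at 0 avoids negative coordinates for regions near the beginning of a chromosome. Any extra BED columns after the third are passed through unchanged.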

  1. [first_batch|second_batch]_wes_input_full.csv Edit this file to provide the complete file paths to the original fastq files, then supply it via the --input parameter when calling the nfcore/sarek pipeline.

  2. [first_batch|second_batch]_nf_params_pad.json and rnaseq_nfcore_config.json Supply the relevant file via the -params-file option when calling the nfcore/sarek or nfcore/rnaseq pipeline. This is a Nextflow option, so it takes a single dash, unlike pipeline parameters such as --input. See the multiqc.html file for the exact command used.
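To show how the input files above fit together, here is a hedged sketch that writes a minimal params file and prints a launch command. Only the four field names (vep_dbnsfp, dbnsfp, dbnsfp_tbi, intervals) come from this README. Every path is a hypothetical placeholder, the real params files likely contain more fields, and the exact command used is the one recorded in multiqc.html. No `-profile` is shown because the README does not say which one was used.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a minimal params file with the fields named in this README.
# All paths are hypothetical placeholders, not the project's real locations.
write_params() {
  local out="$1"
  cat > "$out" <<'EOF'
{
  "vep_dbnsfp": true,
  "dbnsfp": "/path/to/dbNSFP4.4a.gz",
  "dbnsfp_tbi": "/path/to/dbNSFP4.4a.gz.tbi",
  "intervals": "/path/to/twist_exome.pad50.bed"
}
EOF
}

# Print the launch command rather than running it.
# -params-file is a Nextflow option (single dash); --input and --outdir are
# pipeline parameters (double dash).
sarek_cmd() {
  local input_csv="$1" params_json="$2" outdir="$3"
  printf 'nextflow run nf-core/sarek --input %s -params-file %s --outdir %s\n' \
    "$input_csv" "$params_json" "$outdir"
}

# Usage (the output directory is a placeholder):
#   write_params first_batch_nf_params_pad.json
#   sarek_cmd first_batch_wes_input_full.csv first_batch_nf_params_pad.json results_dir
```

Printing the command first makes it easy to compare against the one in multiqc.html before committing to a long pipeline run.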

Method summary

See analysis/results_discussion.pdf methods section.
