1. Analysis tracks of the pipeline
Verena Kutschera edited this page Mar 1, 2022
The GenErode pipeline can be run up to any step. Note that most steps depend on the output of earlier steps.
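Because most steps depend on earlier ones, running the pipeline up to a given step means running that step together with all of its prerequisites. The following minimal Python sketch resolves that set in a valid run order with the standard library's `graphlib`. The step names and dependency edges in `DEPENDS_ON` are hypothetical, loosely modelled on the steps listed on this page; they are not GenErode's actual rule names, configuration keys or dependency graph.

```python
from graphlib import TopologicalSorter

# Hypothetical step names and dependencies, for illustration only;
# they are NOT GenErode's actual rule names or configuration keys.
DEPENDS_ON = {
    "reference_indexing": [],
    "repeat_identification": [],
    "fastq_processing": [],
    "mapping": ["reference_indexing", "fastq_processing"],
    "bam_processing": ["mapping"],
    "genotyping": ["bam_processing"],
    "mlrho": ["bam_processing", "repeat_identification"],
    "vcf_processing": ["genotyping", "repeat_identification"],
    "vcf_merging": ["vcf_processing"],
    "pca": ["vcf_merging"],
}


def steps_up_to(target: str) -> list[str]:
    """Return `target` and all of its transitive prerequisites in a valid run order."""
    needed: set[str] = set()
    stack = [target]
    while stack:
        step = stack.pop()
        if step not in needed:
            needed.add(step)
            stack.extend(DEPENDS_ON[step])
    # Keep only the needed steps, then order them so prerequisites come first.
    graph = {s: [d for d in DEPENDS_ON[s] if d in needed] for s in needed}
    return list(TopologicalSorter(graph).static_order())


print(steps_up_to("genotyping"))
```

With this hypothetical map, `steps_up_to("genotyping")` returns genotyping plus BAM processing, mapping, FASTQ processing and reference indexing, but no VCF or PCA steps; the relative order of independent steps (here FASTQ processing and reference indexing) is not fixed. In a workflow manager such as Snakemake, this resolution is done by the workflow engine itself.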

Reference genome processing (required for the BAM file track and the VCF file track):
- Reference genome indexing
- Repeat element identification from the reference genome

BAM file track:
- FASTQ file processing:
  - Adapter trimming (modern samples)
  - Adapter trimming and merging of overlapping paired-end reads (historical samples)
- Optional data processing steps:
  - Mapping to the mitochondrial genomes of the target species and of potential contaminating species (the output of this step is not used downstream in the pipeline)
- Mapping of historical and/or modern samples to a reference genome
- BAM file processing:
  - Merge samples from different lanes per PCR/index
  - Remove duplicates
  - Merge BAM files per sample
  - Realign indels
  - Calculate average genome-wide depth
- Optional data processing steps:
  - Base quality rescaling (mapDamage2) for historical samples
  - Subsampling to a target depth
- Genotyping
- Optional data processing steps:
  - CpG site identification (three different methods)
- Optional data processing steps:
  - BED files with sex-chromosomal or autosomal contigs
- Downstream analyses:
  - mlRho (filtering for quality, depth and repeat elements by default; optional filtering for sex-chromosomal or autosomal contigs and CpG sites)

VCF file track:
- Optional data processing steps:
  - VCF file filtering for CpG sites
- VCF file processing per sample:
  - Filtering for quality, depth and allelic imbalance
  - Remove indels and SNPs near indels
  - Remove repeat regions
- VCF file merging and processing:
  - Merge VCF files from all samples
  - Filter to keep only biallelic SNPs
  - Remove sites with more than a given fraction of missing genotypes across all samples
  - Extract the samples of each dataset (historical samples, modern samples)
- Optional downstream analyses:
  - PCA
  - Runs of homozygosity (ROH)
  - snpEff (variant annotation)

GERP track:
- GERP score calculation from the reference genome and the genomes of additional outgroup species
- Calculation of relative mutational load per sample from the processed VCF files
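The site-level filters of the VCF merging and processing step (keep only biallelic SNPs, remove sites with more than a given fraction of missing genotypes across all samples, then extract the samples of each dataset) can be illustrated on a toy, in-memory genotype table. This is a minimal Python sketch under stated assumptions: the `Site` record, the `max_missing` parameter and the sample names are hypothetical, real VCF files would normally be handled with dedicated VCF tools, and this is not GenErode's implementation.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical variant record: REF, ALT alleles and one GT string per sample."""
    chrom: str
    pos: int
    ref: str
    alts: list[str]
    genotypes: dict[str, str]  # sample name -> GT, e.g. "0/1" or "./."


def is_biallelic_snp(site: Site) -> bool:
    # Exactly one ALT allele, and both REF and ALT are single bases (no indels).
    return len(site.alts) == 1 and len(site.ref) == 1 and len(site.alts[0]) == 1


def missing_fraction(site: Site) -> float:
    # A genotype counts as missing if any allele is ".", e.g. "./." or ".|.".
    calls = list(site.genotypes.values())
    if not calls:
        return 1.0  # no samples at all: treat the site as fully missing
    missing = sum(1 for gt in calls if "." in gt.replace("|", "/").split("/"))
    return missing / len(calls)


def filter_sites(sites: list[Site], max_missing: float) -> list[Site]:
    # Keep biallelic SNPs whose missing-genotype fraction is at most `max_missing`.
    return [s for s in sites if is_biallelic_snp(s) and missing_fraction(s) <= max_missing]


def extract_samples(sites: list[Site], samples: list[str]) -> list[Site]:
    # Subset every site to one dataset's samples (e.g. historical or modern).
    return [
        Site(s.chrom, s.pos, s.ref, s.alts, {n: s.genotypes[n] for n in samples})
        for s in sites
    ]


sites = [
    Site("chr1", 100, "A", ["G"], {"hist1": "0/1", "hist2": "./.", "mod1": "1/1"}),
    Site("chr1", 200, "C", ["T", "G"], {"hist1": "0/1", "hist2": "0/2", "mod1": "0/0"}),  # multiallelic
    Site("chr1", 300, "AT", ["A"], {"hist1": "0/1", "hist2": "0/0", "mod1": "0/0"}),  # indel
    Site("chr1", 400, "G", ["A"], {"hist1": "./.", "hist2": "./.", "mod1": "0/1"}),  # 2/3 missing
]
kept = filter_sites(sites, max_missing=0.5)
historical = extract_samples(kept, ["hist1", "hist2"])
print([s.pos for s in kept])  # → [100]
```

Applying the missingness filter to all samples before splitting them into datasets, in the order listed on this page, means that each site is judged on its genotypes across both historical and modern samples.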