1. Analysis tracks of the pipeline

The GenErode pipeline can be run up to any point, but most steps depend on the output of earlier steps.
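
Because most steps depend on earlier ones, running the pipeline "up to" a given point means running that step plus all of its prerequisites. The following is a minimal sketch of that idea, not the pipeline's own implementation. The step names are hypothetical, and the edges encode only the dependencies stated in this section, plus the assumption that the mutational load calculation needs the GERP scores.

```python
# Hypothetical step names; edges follow the dependencies described in this section.
DEPENDS_ON = {
    "data_processing": [],
    "bam_track": ["data_processing"],
    "vcf_track": ["data_processing"],
    "gerp_scores": [],
    # Assumption: relative mutational load needs processed VCF files and GERP scores.
    "mutational_load": ["vcf_track", "gerp_scores"],
}


def steps_needed(target, graph=DEPENDS_ON):
    """Return the target and all of its prerequisites in a runnable order."""
    ordered = []

    def visit(step):
        if step in ordered:
            return
        for dep in graph[step]:
            visit(dep)
        ordered.append(step)  # a step is added only after all of its dependencies

    visit(target)
    return ordered


print(steps_needed("mutational_load"))
# ['data_processing', 'vcf_track', 'gerp_scores', 'mutational_load']
```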

1) Data processing track

Required for the BAM file track and the VCF file track.

Reference genome indexing
Repeat element identification from reference genome
FASTQ file processing
- Adapter trimming (modern samples)
- Adapter trimming and merging of overlapping paired-end reads (historical samples)
Optional data processing steps:
- Mapping to mitochondrial genomes of the target species and of potential contaminating species (the output from this step is not used downstream in the pipeline)
Mapping of historical and/or modern samples to a reference genome
BAM file processing:
- Merge samples from different lanes per PCR/index
- Remove duplicates
- Merge BAM files per sample
- Realign indels
- Calculate average genome-wide depth
Optional data processing steps:
- Base quality rescaling (mapDamage2) for historical samples
- Subsampling to target depth
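
The depth calculation and the optional subsampling are linked: the fraction of reads to keep follows from the ratio of target depth to average depth. The sketch below illustrates that arithmetic only. The function names are hypothetical, and it assumes that per-site depths (including zero-coverage sites) are already available as numbers.

```python
def average_depth(per_site_depths):
    """Mean depth over all sites, counting sites with zero coverage."""
    depths = list(per_site_depths)
    if not depths:
        raise ValueError("no sites given")
    return sum(depths) / len(depths)


def subsample_fraction(current_depth, target_depth):
    """Fraction of reads to keep to reach target_depth (never more than 1.0)."""
    if current_depth <= 0:
        raise ValueError("current depth must be positive")
    return min(1.0, target_depth / current_depth)


depths = [12, 8, 0, 10, 10]
mean_depth = average_depth(depths)               # 8.0
fraction = subsample_fraction(mean_depth, 6.0)   # 0.75
```

A sample that is already below the target depth gets a fraction of 1.0, i.e. it is left unchanged.
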
Genotyping
Optional data processing steps:
- CpG site identification (three different methods)
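
This section does not describe the three CpG identification methods. As an illustration only, the sketch below assumes one plausible approach: scanning a reference sequence for CG dinucleotides and reporting them as BED-style intervals. The function name and the example contig are hypothetical.

```python
def cpg_intervals(contig, sequence):
    """Yield (contig, start, end) for each CG dinucleotide.

    Coordinates are 0-based and half-open, as in BED files.
    """
    seq = sequence.upper()  # treat soft-masked (lowercase) bases like any other
    pos = seq.find("CG")
    while pos != -1:
        yield (contig, pos, pos + 2)
        pos = seq.find("CG", pos + 1)


print(list(cpg_intervals("contig1", "ACGTTcgGCG")))
# [('contig1', 1, 3), ('contig1', 5, 7), ('contig1', 8, 10)]
```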

2) BAM file track

Optional data processing steps:
- BED files with sex chromosomal or autosomal contigs
Downstream analyses:
- mlRho (default filtering for quality, depth, repeat elements and optional filtering for sex chromosomal or autosomal contigs and CpG sites)

3) VCF file track

Optional data processing steps:
- VCF file filtering for CpG sites
VCF file processing per sample:
- Filtering for quality, depth, allelic imbalance
- Remove indels and SNPs near indels
- Remove repeat regions
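
The per-sample filters for quality, depth and allelic imbalance can be pictured as a single predicate applied to each genotype call. The sketch below is a simplified illustration, not the pipeline's actual filter. The `Call` record, the function name and all threshold values are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Call:
    qual: float      # variant quality
    depth: int       # read depth at the site
    ref_reads: int   # reads supporting the reference allele
    alt_reads: int   # reads supporting the alternative allele
    is_het: bool     # True for heterozygous genotypes


def passes_filters(call, min_qual=30.0, min_depth=3, max_depth=30,
                   min_alt_frac=0.2, max_alt_frac=0.8):
    """Return True if the call passes quality, depth and allelic balance checks.

    All thresholds are hypothetical defaults for illustration.
    """
    if call.qual < min_qual:
        return False
    if not (min_depth <= call.depth <= max_depth):
        return False
    if call.is_het:
        total = call.ref_reads + call.alt_reads
        if total == 0:
            return False
        alt_frac = call.alt_reads / total
        # A heterozygous call with very skewed allele support is treated as unreliable.
        if not (min_alt_frac <= alt_frac <= max_alt_frac):
            return False
    return True


balanced = Call(qual=50.0, depth=10, ref_reads=5, alt_reads=5, is_het=True)
skewed = Call(qual=50.0, depth=10, ref_reads=9, alt_reads=1, is_het=True)
print(passes_filters(balanced), passes_filters(skewed))  # True False
```
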
VCF file merging and processing:
- Merge VCF files from all samples
- Filter to keep only biallelic SNPs
- Remove sites with more than a certain fraction of missing genotypes across all samples
- Extract samples from each dataset (historical samples, modern samples)
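
The two site-level filters applied to the merged VCF file (biallelic SNPs only, limited missingness) can be sketched as follows. This is an illustrative example with hypothetical function names and a hypothetical missingness threshold. It assumes genotypes are given as VCF-style strings such as "0/1" or "./.".

```python
def is_biallelic_snp(ref, alts):
    """True if the site has exactly one alternative allele and both alleles are single bases."""
    return (len(alts) == 1 and len(ref) == 1
            and len(alts[0]) == 1 and alts[0] != "*")


def missing_fraction(genotypes):
    """Fraction of samples whose genotype is missing (contains '.')."""
    if not genotypes:
        raise ValueError("no genotypes given")
    missing = sum(1 for gt in genotypes if "." in gt)
    return missing / len(genotypes)


def keep_site(ref, alts, genotypes, max_missing=0.1):
    """Keep biallelic SNPs with at most max_missing missing genotypes."""
    return is_biallelic_snp(ref, alts) and missing_fraction(genotypes) <= max_missing


genotypes = ["0/1", "0/0", "./.", "1/1"]
print(keep_site("A", ["G"], genotypes, max_missing=0.25))  # True (25% missing)
print(keep_site("A", ["G"], genotypes, max_missing=0.1))   # False
print(keep_site("A", ["G", "T"], genotypes, max_missing=0.25))  # False (multiallelic)
```
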
Optional downstream analyses:
- PCA
- Runs of homozygosity (ROH)
- snpEff

4) GERP track

GERP score calculation from reference genome and genomes of additional outgroup species
Calculation of relative mutational load per sample from processed VCF files
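
This section does not give the formula for relative mutational load. The sketch below assumes one common definition: the sum of GERP scores over derived alleles, divided by the number of derived alleles, considering only sites above a minimum GERP score. The function name, the input layout and the threshold are hypothetical and may differ from what the pipeline does.

```python
def relative_load(sites, min_gerp=0.0):
    """Relative mutational load under the assumed definition.

    sites: iterable of (gerp_score, derived_allele_count), where the count is
    0, 1 (heterozygous) or 2 (homozygous derived) for one sample.
    """
    weighted_sum = 0.0
    n_derived = 0
    for gerp, count in sites:
        if count == 0 or gerp <= min_gerp:
            continue  # no derived allele, or site below the GERP threshold
        weighted_sum += gerp * count
        n_derived += count
    if n_derived == 0:
        return 0.0  # assumption: no qualifying derived alleles means zero load
    return weighted_sum / n_derived


sites = [(2.0, 2), (4.0, 1), (1.5, 0), (-1.0, 2)]
print(relative_load(sites))  # (2.0*2 + 4.0*1) / 3
```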