Home
Welcome to the GenErode wiki! This Snakemake pipeline analyzes whole-genome sequencing data from historical and modern samples to look for patterns of genome erosion (see Díez-del-Molino et al. 2018).
The pipeline takes raw FASTQ files as input, maps them to a reference genome assembly, performs variant calling, and runs several downstream analyses. All steps include quality filtering and quality checks. Optional steps include base quality rescaling of historical samples, CpG-site removal based on VCF files or the reference genome, subsampling a proportion of reads to achieve a similar average depth across samples, and mapping of historical samples to human and other vertebrate mitochondrial genomes to check for non-endogenous reads in the data.
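To illustrate the optional subsampling step mentioned above, here is a minimal sketch of how the proportion of reads to keep could be derived from a sample's average depth and a shared target depth. The function name, sample names, and depth values are hypothetical and are not taken from the GenErode code or configuration; the pipeline's own implementation may differ.

```python
# Hypothetical helper, not part of GenErode: compute the fraction of reads
# to keep so that a sample's average depth approaches a shared target depth.
def subsample_fraction(current_depth: float, target_depth: float) -> float:
    if current_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    # Samples already at or below the target are kept in full (fraction 1.0).
    return min(1.0, target_depth / current_depth)


# Made-up average depths per sample, for illustration only.
depths = {"modern_01": 30.0, "modern_02": 12.0, "historical_01": 6.0}
target = 10.0

fractions = {name: subsample_fraction(depth, target) for name, depth in depths.items()}

for name, fraction in fractions.items():
    print(f"{name}: keep {fraction:.2f} of reads")
```

Capping the fraction at 1.0 reflects that subsampling can only reduce depth: a sample below the target keeps all of its reads rather than being upsampled.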
Downstream analyses include estimates of genome-wide heterozygosity from BAM files in mlRho, estimates of runs of homozygosity (ROHs) from BCF files in plink, SNP annotation of BCF files in snpEff, and GERP score calculations followed by estimates of relative mutational load from the reference genome assembly and BCF files.
Email address: [email protected]
If you've used GenErode to produce results, please cite our bioRxiv article:
Kutschera VE, Kierczak M, van der Valk T, von Seth J, Dussex N, Lord E, Dehasque M, Stanton DWG, Emami P, Nystedt B, Dalén L, Díez-del-Molino D. GenErode: a bioinformatics pipeline to investigate genome erosion in endangered and extinct species. bioRxiv 2022. DOI: pending