
Mouse reference files from Mouse Genome Project VCFs

Keiran Raine edited this page Jul 22, 2016 · 2 revisions

Here we provide an example of how to generate a SNP panel for mouse using the Mouse Genomes Project VCF files.

Generating SnpPositions.tsv

A tool has been created to assist with this:

ascatSnpPanelFromVcfs.pl snps.vcf[.gz] > SnpPositions.tsv

  This script was created to generate a shared SNP panel for species having
  multiple strains which have become homozygous through in-breeding.

  The resulting output is likely to only be useful in experiments where crossing
  of strains has been performed to add heterozygous SNPs back into the population.

  The initial use has been against the Mouse Genome Project outputs found here:
    ftp://ftp-mouse.sanger.ac.uk/REL-*-SNPs_Indels/mgp.*.merged.snps_all.*.vcf.gz

  As not all VCF files are created equal you may need to make modifications for
  other sources.
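To make the idea in the help text concrete, here is a hypothetical illustration of one way to pick sites that could become heterozygous once inbred strains are crossed. It is not the logic of ascatSnpPanelFromVcfs.pl. The helper name het_candidate_sites, the selection rule (keep single-base SNPs where at least one strain is homozygous ALT) and the toy VCF are all assumptions made for this sketch. It also assumes GT is the first FORMAT subfield, as the VCF specification requires when GT is present, and it needs bash for the $'...' quoting.

```shell
# Hypothetical illustration only: NOT the algorithm used by
# ascatSnpPanelFromVcfs.pl, just one way to express the idea.
# Reads an uncompressed VCF on stdin and prints "CHROM<TAB>POS" for
# single-base SNPs where at least one strain is homozygous ALT (1/1 or 1|1),
# i.e. sites where crossing that strain with a reference-like strain
# could produce a heterozygote.
het_candidate_sites() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                                # skip meta and header lines
    length($4) != 1 || length($5) != 1 { next }  # single-base REF and ALT only
    {
      for (i = 10; i <= NF; i++) {               # sample (strain) columns
        split($i, fmt, ":")                      # GT is the first FORMAT subfield
        if (fmt[1] == "1/1" || fmt[1] == "1|1") { print $1, $2; next }
      }
    }'
}

# Toy VCF with two made-up strains. Expected: keep 1:100 and 2:400,
# drop 1:200 (no 1/1 genotype) and 1:300 (multi-allelic ALT).
demo_vcf=$(printf '%s\n' \
  '##fileformat=VCFv4.2' \
  $'#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSTRAIN_A\tSTRAIN_B' \
  $'1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t1/1' \
  $'1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/0' \
  $'1\t300\t.\tG\tA,C\t.\tPASS\t.\tGT\t1/1\t2/2' \
  $'2\t400\t.\tT\tC\t.\tPASS\t.\tGT:GQ\t1|1:99\t0/0:99')

printf '%s\n' "$demo_vcf" | het_candidate_sites
```

Against a real file you could stream it in with gzip -cd mgp.v5.merged.snps_all.dbSNP142.vcf.gz | het_candidate_sites, but the actual panel should still come from ascatSnpPanelFromVcfs.pl.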

This example is for mouse GRCm38, using data from the Mouse Genomes Project.

Download an appropriate SNP dataset. In this case we want the merged data to ensure we include SNPs from multiple strains:

$ wget ftp://ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/mgp.v5.merged.snps_all.dbSNP142.vcf.gz
  OR
$ curl -sSL ftp://ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/mgp.v5.merged.snps_all.dbSNP142.vcf.gz > mgp.v5.merged.snps_all.dbSNP142.vcf.gz
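Before processing, it may be worth checking that the download is intact and seeing which strains it contains. A minimal sketch, assuming bash; list_vcf_strains is a hypothetical helper, not part of ascatNgs. It uses gzip -t to test the archive, then prints columns 10 onwards of the #CHROM header line, which the VCF specification defines as the sample (here, strain) names.

```shell
# Hypothetical helper (not part of ascatNgs): verify a .vcf.gz download and
# list the sample/strain names from its #CHROM header line.
list_vcf_strains() {
  local vcf_gz="$1"
  # gzip -t tests integrity; bgzipped VCFs are valid multi-member gzip files
  gzip -t "$vcf_gz" || { echo "error: $vcf_gz is not a valid gzip file" >&2; return 1; }
  # Columns 10 onwards of the #CHROM line are the sample names; stop after it
  gzip -cd "$vcf_gz" | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}
```

Usage: list_vcf_strains mgp.v5.merged.snps_all.dbSNP142.vcf.gz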

Now run the script to create SnpPositions.tsv, using grep -v '^MT' to drop the mitochondrial (MT) positions:

$ ascatSnpPanelFromVcfs.pl mgp.v5.merged.snps_all.dbSNP142.vcf.gz | grep -v '^MT' > SnpPositions.tsv
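A quick sanity check on the result can catch a failed filter or an empty output. This is a sketch under assumptions: check_snp_panel is a hypothetical helper, and it assumes, as the grep '^MT' filter above implies, that each line of SnpPositions.tsv starts with the chromosome name in the first tab-separated column. Any header line would simply appear as its own entry in the counts.

```shell
# Hypothetical helper: fail if MT lines remain, otherwise print the number
# of lines per value of the first (chromosome) column.
check_snp_panel() {
  local tsv="$1"
  if grep -q '^MT' "$tsv"; then
    echo "error: mitochondrial (MT) lines remain in $tsv" >&2
    return 1
  fi
  # Count lines per chromosome (first tab-separated column)
  cut -f1 "$tsv" | sort | uniq -c
}
```

Usage: check_snp_panel SnpPositions.tsv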

Generating SnpGcCorrections.tsv

Please see the page Convert SnpPositions.tsv to SnpGcCorrections.tsv.