
Reference files from normal sample data

Keiran Raine edited this page Jun 3, 2016 · 1 revision

This page shows how reference files can be generated from normal sample data when no public SNP set is available.

The code has been tested on Illumina paired-end sequencing data from crossed mice, mapped with BWA-MEM. The result was compared against the Mouse Genomes Project VCF-based reference described here.

Identifying sample specific SNPs

Scan each sample file for high-confidence non-reference locations.

Please see ascatSnpPanelGeneration.pl -h for all options.

$ ascatSnpPanelGeneration.pl -ref genome.fa -hf sampleA.bam > sampleA-hets.tsv.0

(The output filename must end with a numeric suffix .N, such as the .0 used above.)

We recommend using at least 10 samples for this step.
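With several samples, the scan can be scripted. The following is a minimal dry-run sketch, not part of the toolset: the function name panel_cmds and the sample file names are hypothetical, and it only prints one command per BAM using the -ref and -hf options shown above. It assumes every output can take the same .0 suffix as in the example; this page does not say whether a different N is needed per sample.

```shell
# Hypothetical helper: print (do not run) one scan command per BAM file.
# Usage: panel_cmds <reference.fa> <sample1.bam> [sample2.bam ...]
panel_cmds() {
  ref=$1
  shift
  for bam in "$@"; do
    # Strip any directory and the .bam extension to build the output name.
    base=$(basename "$bam" .bam)
    printf 'ascatSnpPanelGeneration.pl -ref %s -hf %s > %s-hets.tsv.0\n' \
      "$ref" "$bam" "$base"
  done
}

panel_cmds genome.fa sampleA.bam sampleB.bam
```

Once the printed commands look right, they can be piped to sh to execute them.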

Merging samples into common SNPs

Once a set of outputs from the previous step has been generated, ascatSnpPanelMerge.pl determines the common HET/HOM loci to be used in the final panel:

  • A HET SNP is generated if it exists in >66% of samples.
  • A HOM SNP is generated if it exists in >33% of samples.
  • Locations with more than 2 alleles expressed across the panel are excluded.
  • Locations within 500bp of another potential SNP are excluded.
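To see what the percentage thresholds mean for a given panel size, the arithmetic can be sketched as below. This reads the >66% and >33% rules literally as stated on this page; how the script itself rounds or compares is an assumption, and min_samples is a hypothetical helper rather than part of the toolset.

```shell
# Hypothetical helper: smallest number of samples c such that
# c / total is strictly greater than pct percent.
# Usage: min_samples <total_samples> <percent>
min_samples() {
  total=$1
  pct=$2
  # Integer division floors total*pct/100; adding 1 gives the first
  # count that strictly exceeds the threshold.
  echo $(( total * pct / 100 + 1 ))
}

min_samples 10 66   # HET rule with 10 samples: prints 7
min_samples 10 33   # HOM rule with 10 samples: prints 4
```

Under this reading, a 10-sample panel would need a HET call in at least 7 samples, or a HOM call in at least 4, for a location to be reported.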

Basic usage is:

$ ascatSnpPanelMerge.pl genome.fa sampleA-hets.tsv.0 [sampleB-hets.tsv.0] > SnpPositions.tsv

Run with no options for additional information.

Generate SnpGcCorrections.tsv

Please see Convert SnpPositions.tsv to SnpGcCorrections.tsv