
Additional Output Files

ekherman edited this page May 28, 2020 · 3 revisions

Creating a summary

The --summary flag can be used to create a summary of any Illumina or Affymetrix format file when running the check_format or convert_file tools.

For each sample, the summary file reports:

  • Total number of variants
  • Number of variants with genotypes (i.e. not '--', '---', or 'NoCall')
  • Number of homozygous SNPs
  • Number of heterozygous SNPs
  • Number of indels
  • Number of consistent genotypes
  • Number of inconsistent genotypes
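To make the first few counts concrete, here is a minimal Python sketch of how calls might be tallied. It is not the tool's own code: the function names are hypothetical, and it assumes each call is a two-character string such as "AG". Only the missing-call codes are taken from the list above; anything that is not a plain A/C/G/T pair (indels, for instance) is lumped into "other".

```python
from collections import Counter

# Missing-call codes named in the list above.
MISSING_CODES = {"--", "---", "NoCall"}


def classify_genotype(call):
    """Classify one genotype call (hypothetical helper, not part of the tool)."""
    if call in MISSING_CODES:
        return "missing"
    # Assumption: a SNP call is exactly two characters drawn from A/C/G/T.
    if len(call) == 2 and set(call) <= set("ACGT"):
        return "homozygous" if call[0] == call[1] else "heterozygous"
    # Indels and anything else are not interpreted in this sketch.
    return "other"


def summarize(calls):
    """Count how many calls fall into each category for one sample."""
    return Counter(classify_genotype(call) for call in calls)
```

Calling summarize on one sample's list of calls returns a Counter keyed by category, which loosely mirrors the per-sample rows of the summary file.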

For Long format files, the summary file also reports:

  • Number of equivalent genotypes
  • Number of inequivalent genotypes

In Long format, genotype equivalency measures whether the alleles reported for a marker agree across formats. For example, a set of alleles may be correctly formatted as "Top", and therefore counted as consistent, yet not be equivalent to the alleles reported as "Forward" and "Design".

The default summary output is in a "pretty" format. To output a tab-formatted summary file, use the --tabular flag in conjunction with --summary.

A summary file cannot be generated when using the merge_files utility. Instead, run check_format with the --summary flag on the merged file.

Creating PLINK MAP and PED flat files

The option --plink can be used with check_format or convert_file to generate PED and MAP files for the input panel or converted panel, respectively.

The PED output file is minimal: it contains only the Individual ID and the genotypes (in A/C/G/T format), separated by whitespace. To use this file with PLINK, either flag the missing fields with --no-fid --no-parents --no-sex --no-pheno or add the missing information to the file.
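As an illustration of that layout, the following Python sketch reads one line of such a PED file. The helper name is hypothetical, and it assumes the alleles are written as two whitespace-separated tokens per marker, which is what PLINK expects in a PED file; check the actual output before relying on it.

```python
def parse_ped_line(line):
    """Split a minimal PED line into (individual_id, [(allele1, allele2), ...]).

    Assumes the line holds only the Individual ID followed by allele tokens,
    two per marker (an assumption about the output layout).
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty PED line")
    individual_id, alleles = fields[0], fields[1:]
    if len(alleles) % 2 != 0:
        raise ValueError("expected an even number of allele tokens")
    # Pair consecutive tokens: tokens 0-1 are marker 1, tokens 2-3 marker 2, ...
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return individual_id, genotypes
```

The number of genotype pairs on each line should equal the number of rows in the matching MAP file, which makes for a quick sanity check.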

The MAP file contains the chromosome code, the SNP identifier (the marker name), the genetic distance (set to 0), and the base pair position. All values are separated by whitespace. Chromosome codes are organism-specific; to use this file in PLINK, specify --cow or --chr-set 29 no-xy for bos_taurus, and --chr-set 18 no-xy for sus_scrofa.

For more information on these files, see the PLINK data format page.
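The sketch below shows one way to read a MAP line and to assemble a matching PLINK command line in Python. The helper names, the file paths, and the executable name "plink" are assumptions for illustration; the PLINK options themselves (--ped, --map, --make-bed, --out, and the flags mentioned above) are standard PLINK 1.9 options.

```python
# Species-specific chromosome flags, as given in the text above.
CHR_FLAGS = {
    "bos_taurus": ["--chr-set", "29", "no-xy"],
    "sus_scrofa": ["--chr-set", "18", "no-xy"],
}


def parse_map_line(line):
    """Parse one MAP line: chromosome, marker name, genetic distance, position."""
    chrom, marker, distance, position = line.split()
    return {
        "chrom": chrom,
        "marker": marker,
        "genetic_distance": float(distance),
        "position": int(position),
    }


def plink_args(ped_path, map_path, species, out_prefix):
    """Build an argument list that loads the minimal PED/MAP pair into PLINK.

    The executable name "plink" is an assumption about the local install.
    """
    return [
        "plink",
        "--ped", ped_path,
        "--map", map_path,
        # The PED file carries only the Individual ID and genotypes.
        "--no-fid", "--no-parents", "--no-sex", "--no-pheno",
        *CHR_FLAGS[species],
        "--make-bed",
        "--out", out_prefix,
    ]
```

The returned list can be passed to subprocess.run, for example subprocess.run(plink_args("panel.ped", "panel.map", "bos_taurus", "panel"), check=True), where the file names are placeholders.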
