
File Conversion

ekherman edited this page Jun 11, 2020 · 4 revisions

Converting a SNP panel file to a new format

The convert_file tool converts one or more input files to the format specified by the user. An input directory may be specified (by default, the current working directory is used), and one or more files within it may be given. The tool also runs check_format on the original file(s) and on the converted file(s) to confirm that conversion succeeded. It will not convert incorrectly formatted files.
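The check-convert-check flow described above can be sketched in Python. This illustrates the gating logic only and is not the tool's implementation: check_format and convert are hypothetical stand-ins passed in as callables, and check_format is assumed to return the detected format name, or None when the format cannot be determined.

```python
from typing import Callable, Optional


def convert_with_checks(
    path: str,
    target_format: str,
    check_format: Callable[[str], Optional[str]],  # hypothetical: returns format name or None
    convert: Callable[[str, str], str],  # hypothetical: returns the converted file's path
) -> str:
    """Convert `path` only if its format is recognized, then re-check the result."""
    # Pre-check: refuse to convert a file whose format cannot be determined.
    # (Simplification: files with a few inconsistent SNPs are not modelled here.)
    detected = check_format(path)
    if detected is None:
        raise ValueError(f"{path}: inconsistent formatting, not converted")
    # Convert, then confirm the new file is consistently in the target format.
    out_path = convert(path, target_format)
    if check_format(out_path) != target_format:
        raise RuntimeError(f"{out_path}: converted file failed the format check")
    return out_path
```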

usage: snp_conversion convert_file [-h] [--input-dir INPUT_DIR]
                                   [--file-list FILE_LIST]
                                   [--input-format {TOP,FWD,AB,LONG,DESIGN,PLUS,mixed,AFFY}]
                                   --output-format
                                   {TOP,FWD,AB,PLUS,DESIGN,LONG,AFFY-PLUS}
                                   [--output-name OUTPUT_NAME] [-t THREADS]
                                   [--conversion CONVERSION] --assembly
                                   ASSEMBLY --species SPECIES [-s] [--tabular]
                                   [-v] [--plink]

Required options: --output-format, --assembly, --species
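If you drive convert_file from a script, a small helper can assemble the command line from the options in the usage block and catch invalid choices before the tool runs. This is a minimal sketch, assuming the tool is on your PATH as snp_conversion; build_convert_command is a hypothetical helper, and it only covers a subset of the options.

```python
from typing import List, Optional

# Choices copied from the usage block above.
INPUT_FORMATS = {"TOP", "FWD", "AB", "LONG", "DESIGN", "PLUS", "mixed", "AFFY"}
OUTPUT_FORMATS = {"TOP", "FWD", "AB", "PLUS", "DESIGN", "LONG", "AFFY-PLUS"}


def build_convert_command(
    output_format: str,
    assembly: str,
    species: str,
    input_dir: Optional[str] = None,
    file_list: Optional[List[str]] = None,
    input_format: Optional[str] = None,
    output_name: Optional[str] = None,
    threads: Optional[int] = None,
    summary: bool = False,
    plink: bool = False,
) -> List[str]:
    """Return an argv list for `snp_conversion convert_file` (hypothetical helper)."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported --output-format: {output_format}")
    if input_format is not None and input_format not in INPUT_FORMATS:
        raise ValueError(f"unsupported --input-format: {input_format}")
    # The three required options.
    cmd = ["snp_conversion", "convert_file",
           "--output-format", output_format,
           "--assembly", assembly,
           "--species", species]
    # Optional arguments are added only when given, so the tool's defaults apply otherwise.
    if input_dir is not None:
        cmd += ["--input-dir", input_dir]
    if file_list:
        cmd += ["--file-list", ",".join(file_list)]  # comma-separated, no whitespace
    if input_format is not None:
        cmd += ["--input-format", input_format]
    if output_name is not None:
        cmd += ["--output-name", output_name]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if summary:
        cmd.append("-s")
    if plink:
        cmd.append("--plink")
    return cmd
```

The list can then be passed to `subprocess.run(cmd, check=True)`. The species and assembly strings you pass must be real values reported by the conversion_list tool; names such as "my_species" would only be placeholders.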

Input files

For file conversion, the file type combinations are restricted to the following:

Input file                                  Output file
Illumina Top, Forward, Design, Plus, Long   Illumina Top, Forward, Design, Plus, Long, AB
Affymetrix native*                          Affymetrix Plus

*Equivalent to Illumina TOP format

Note that AB format is not a valid input format for file conversion, because an AB file does not contain enough information to support conversion.
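The combinations in the table above can be encoded as a lookup for a quick pre-flight check. This sketch assumes the table's "Affymetrix native" row corresponds to --input-format AFFY and "Affymetrix Plus" to --output-format AFFY-PLUS. Because 'mixed' does not appear in the table, the sketch conservatively treats it as unsupported; is_supported is a hypothetical helper.

```python
# Illumina formats that may be converted, and the formats they may be converted to.
ILLUMINA_INPUTS = {"TOP", "FWD", "DESIGN", "PLUS", "LONG"}
ILLUMINA_OUTPUTS = ILLUMINA_INPUTS | {"AB"}  # AB is output-only


def is_supported(input_format: str, output_format: str) -> bool:
    """Return True if the table above allows input_format -> output_format."""
    if input_format in ILLUMINA_INPUTS:
        return output_format in ILLUMINA_OUTPUTS
    if input_format == "AFFY":  # assumed to mean "Affymetrix native"
        return output_format == "AFFY-PLUS"
    # AB (not enough information) and anything not in the table are rejected.
    return False
```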

For information on input files, see the Input Files page.

Genotype Conversion Files

The convert_file tool requires genotype conversion file information to be specified with the --conversion, --species, and --assembly options. See Genotype Conversion Files for information.

Options

  --input-dir INPUT_DIR
                        Directory containing input file(s) (default directory:
                        current working directory)
  --file-list FILE_LIST
                        [Optional] Comma-separated list of files in the input
                        directory to be converted (no whitespace)
  --input-format {TOP,FWD,AB,LONG,DESIGN,PLUS,mixed,AFFY}
                        Type of file(s) expected: 'TOP', 'FWD', 'AB', 'LONG',
                        'DESIGN', 'PLUS', 'mixed', or 'AFFY'. 'mixed' may not
                        be used when merging files.
  --output-format {TOP,FWD,AB,PLUS,DESIGN,LONG,AFFY-PLUS}
                        Type of file(s) to be created: 'TOP', 'FWD', 'AB',
                        'PLUS', 'DESIGN', 'LONG', 'AFFY-PLUS'. Only one type
                        of output can be specified at a time. PLUS converts
                        the data to the forward strand of the reference
                         genome. LONG refers to the long-format Illumina
                         file. AFFY-PLUS refers to an Affymetrix-format file
                         with PLUS alleles instead of the native Affymetrix
                         (FWD) format.
  --output-name OUTPUT_NAME
                        Output file designation. File will be named
                        [input_file].[output_name].txt (default = output)
  -t THREADS, --threads THREADS
                        [Optional] Number of threads to use (default = 2)
  --conversion CONVERSION
                        Directory containing genotype conversion key files
                        (default directory: variant_position_files)
  --assembly ASSEMBLY   Assembly name - use conversion_list tool for full list
                        of choices
  --species SPECIES     Species name (use conversion_list tool for all
                        available choices)
  -s, --summary         Summarize converted SNP file in *_summary.txt file
  --tabular             Output summary file in tabular format (default: False)
  -v, --verbose-logging
                        [Optional] Write output to both STDOUT and log file
  --plink               Creates PLINK flat files (PED and MAP) (default:
                        False)
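Because --file-list must be a comma-separated list with no whitespace, and the names refer to files in --input-dir (or the current working directory), a script can validate the list before calling the tool. This is a minimal sketch; resolve_file_list is a hypothetical helper, not part of snp_conversion.

```python
import os
from typing import List, Optional


def resolve_file_list(file_list: str, input_dir: Optional[str] = None) -> List[str]:
    """Split a --file-list value and return full paths, checking each file exists."""
    base = input_dir if input_dir is not None else os.getcwd()  # mirrors the documented default
    if any(ch.isspace() for ch in file_list):
        raise ValueError("--file-list must not contain whitespace")
    names = file_list.split(",")
    if "" in names:
        raise ValueError("--file-list contains an empty file name")
    paths = [os.path.join(base, name) for name in names]
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(f"not found in {base}: {missing}")
    return paths
```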

Program Output

The convert_file utility performs format checking on both the original and the converted file, so it reports the following messages, including one confirming successful conversion:

  • "File [filename] is correctly formatted in [format] format"
  • "File [filename] may be in [format] format with [x] inconsistent SNPs"
  • "File type for [filename] could not be determined: too many SNPs with inconsistent formatting"
  • "File [filename] was converted properly"

The utility will not convert a file that has inconsistent file formatting.
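When convert_file runs over many files, you may want to scan its output for these messages. The sketch below assumes the messages appear verbatim as listed above (without the quotation marks), possibly after a log prefix such as a timestamp; the exact wording may differ between versions of the tool, so treat the patterns as a starting point. classify_message is a hypothetical helper.

```python
import re
from typing import Dict, Optional, Tuple

# One pattern per message listed above; search() tolerates a log prefix.
PATTERNS = [
    ("ok", re.compile(r"File (?P<file>.+) is correctly formatted in (?P<fmt>\S+) format$")),
    ("inconsistent", re.compile(
        r"File (?P<file>.+) may be in (?P<fmt>\S+) format with (?P<n>\d+) inconsistent SNPs$")),
    ("undetermined", re.compile(
        r"File type for (?P<file>.+) could not be determined: "
        r"too many SNPs with inconsistent formatting$")),
    ("converted", re.compile(r"File (?P<file>.+) was converted properly$")),
]


def classify_message(line: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (kind, captured fields) for a recognized message, else None."""
    text = line.strip()
    for kind, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return kind, match.groupdict()
    return None
```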

The utility creates a converted file with the name [file_basename].[output suffix].txt. The output suffix is specified by the parameter --output-name, and by default is "output".
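The naming rule can be expressed as a short function. The usage text writes the prefix as [input_file] while this section writes [file_basename], so this sketch assumes the input's final extension is dropped (panel.txt becomes panel.output.txt); confirm that against a real run before relying on it. converted_name is a hypothetical helper.

```python
import os


def converted_name(input_path: str, output_name: str = "output") -> str:
    """Predict the converted file's path, assuming the input extension is dropped."""
    directory, filename = os.path.split(input_path)
    stem, _ext = os.path.splitext(filename)  # assumption: extension removed
    return os.path.join(directory, f"{stem}.{output_name}.txt")
```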

Additional Output Files

Summary files and PLINK flat files (PED and MAP) can be generated with convert_file using the -s/--summary and --plink options, respectively. For more information on these files, see Additional Output Files.
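As a quick sanity check on --plink output, a MAP file can be read with a few lines of Python. This relies only on the standard PLINK MAP layout (four whitespace-separated columns: chromosome, variant identifier, genetic distance, base-pair position). read_plink_map is a hypothetical helper, and the path you pass depends on how convert_file names its PLINK files.

```python
from typing import List, NamedTuple


class MapEntry(NamedTuple):
    chrom: str
    snp_id: str
    genetic_distance: float
    bp_position: int


def read_plink_map(path: str) -> List[MapEntry]:
    """Parse a four-column PLINK MAP file, skipping blank lines."""
    entries = []
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            fields = line.split()
            if not fields:
                continue
            if len(fields) != 4:
                raise ValueError(f"{path}:{lineno}: expected 4 columns, got {len(fields)}")
            chrom, snp_id, distance, position = fields
            entries.append(MapEntry(chrom, snp_id, float(distance), int(position)))
    return entries
```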