File Conversion
The convert_file tool converts one or more input files to the format specified by the user. An input directory may be specified (the default is the current working directory), and one or more files may be given. The tool also runs check_format on the original file(s) and on the converted file(s) to confirm that conversion succeeded. It will not convert incorrectly formatted files.
usage: snp_conversion convert_file [-h] [--input-dir INPUT_DIR]
                                   [--file-list FILE_LIST]
                                   [--input-format {TOP,FWD,AB,LONG,DESIGN,PLUS,mixed,AFFY}]
                                   --output-format {TOP,FWD,AB,PLUS,DESIGN,LONG,AFFY-PLUS}
                                   [--output-name OUTPUT_NAME] [-t THREADS]
                                   [--conversion CONVERSION]
                                   --assembly ASSEMBLY --species SPECIES
                                   [-s] [--tabular] [-v] [--plink]
Required options: --output-format, --assembly, --species
For file conversion, the file type combinations are restricted to the following:
Input File | Output File |
---|---|
Illumina Top, Forward, Design, Plus, Long | Illumina Top, Forward, Design, Plus, Long, AB |
Affymetrix native* | Affymetrix Plus |
*Equivalent to Illumina TOP format
Note that AB is not a valid input format for file conversion, because an AB file does not contain enough information to support conversion.
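The restrictions in the table above can be expressed as a small lookup. The following is a minimal sketch, not part of snp_conversion: the helper name `is_supported_conversion` is hypothetical, and it assumes the table's format names map to the option values TOP, FWD, DESIGN, PLUS, LONG, AB, AFFY, and AFFY-PLUS.

```python
# Allowed input -> output combinations, transcribed from the table above.
ILLUMINA_INPUTS = {"TOP", "FWD", "DESIGN", "PLUS", "LONG"}
ILLUMINA_OUTPUTS = {"TOP", "FWD", "DESIGN", "PLUS", "LONG", "AB"}


def is_supported_conversion(input_format: str, output_format: str) -> bool:
    """Return True if the table lists this input/output pairing.

    Hypothetical helper. The 'mixed' input format is not covered by the
    table, so this sketch reports it as unsupported; the tool itself may
    behave differently.
    """
    if input_format in ILLUMINA_INPUTS:
        return output_format in ILLUMINA_OUTPUTS
    if input_format == "AFFY":
        return output_format == "AFFY-PLUS"
    # AB (and anything else not in the table) is not a valid conversion input.
    return False
```

A check like this could be run before launching a long conversion so that an unsupported pairing fails early.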
For information on input files, see the Input Files page.
The convert_file tool requires genotype conversion file information to be specified with the --conversion, --species, and --assembly options. See Genotype Conversion Files for information.
--input-dir INPUT_DIR
Directory containing input file(s) (default directory:
current working directory)
--file-list FILE_LIST
[Optional] Comma-separated list of files in the input
directory to be converted (no whitespace)
--input-format {TOP,FWD,AB,LONG,DESIGN,PLUS,mixed,AFFY}
Type of file(s) expected: 'TOP', 'FWD', 'AB', 'LONG',
'DESIGN', 'PLUS', 'mixed', or 'AFFY'. 'mixed' may not
be used when merging files.
--output-format {TOP,FWD,AB,PLUS,DESIGN,LONG,AFFY-PLUS}
Type of file(s) to be created: 'TOP', 'FWD', 'AB',
'PLUS', 'DESIGN', 'LONG', 'AFFY-PLUS'. Only one type
of output can be specified at a time. PLUS converts
the data to the forward strand of the reference
genome. LONG refers to the long-format Illumina
file. AFFY-PLUS refers to an Affymetrix-format file
with PLUS alleles instead of the native Affymetrix
(FWD) format
--output-name OUTPUT_NAME
Output file designation. File will be named
[input_file].[output_name].txt (default = output)
-t THREADS, --threads THREADS
[Optional] Number of threads to use (default = 2)
--conversion CONVERSION
Directory containing genotype conversion key files
(default directory: variant_position_files)
--assembly ASSEMBLY Assembly name - use conversion_list tool for full list
of choices
--species SPECIES Species name (use conversion_list tool for all
available choices)
-s, --summary Summarize converted SNP file in *_summary.txt file
--tabular Output summary file in tabular format (default: False)
-v, --verbose-logging
[Optional] Write output to both STDOUT and log file
--plink Creates PLINK flat files (PED and MAP) (default:
False)
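When convert_file is driven from a script, the options above can be assembled into an argument list. This is a minimal sketch: the wrapper `build_convert_command` is hypothetical, it assumes the program is invoked as `snp_conversion` (as in the usage line), and the assembly and species values in the example call are placeholders, not real choices.

```python
from typing import List, Optional, Sequence


def build_convert_command(
    output_format: str,
    assembly: str,
    species: str,
    input_dir: Optional[str] = None,
    files: Sequence[str] = (),
    summary: bool = False,
    plink: bool = False,
) -> List[str]:
    """Build an argument list for `snp_conversion convert_file`.

    Hypothetical wrapper: only flags shown in the usage text are emitted.
    """
    # The three required options always come first.
    cmd = [
        "snp_conversion", "convert_file",
        "--output-format", output_format,
        "--assembly", assembly,
        "--species", species,
    ]
    if input_dir is not None:
        cmd += ["--input-dir", input_dir]
    if files:
        # --file-list takes a comma-separated list with no whitespace.
        cmd += ["--file-list", ",".join(files)]
    if summary:
        cmd.append("--summary")
    if plink:
        cmd.append("--plink")
    return cmd


# Placeholder values; use the conversion_list tool to find real names.
print(build_convert_command("PLUS", "EXAMPLE_ASSEMBLY", "example_species",
                            files=["a.txt", "b.txt"], plink=True))
```

The resulting list could be passed to `subprocess.run`; building it as a list rather than a single string avoids shell quoting issues with file names.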
The convert_file utility performs format checking on both the original and converted files, and therefore reports the following format-check messages, as well as a message indicating successful conversion:
- "File [filename] is correctly formatted in [format] format"
- "File [filename] may be in [format] format with [x] inconsistent SNPs"
- "File type for [filename] could not be determined: too many SNPs with inconsistent formatting"
- "File [filename] was converted properly"
The utility will not convert a file that has inconsistent file formatting.
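For scripts that scan the tool's log, the messages listed above can be matched with regular expressions. This is a rough sketch under the assumption that the tool emits the wording exactly as listed; the labels and the function `classify_message` are hypothetical, and the real punctuation or spacing may differ.

```python
import re

# Patterns transcribed from the messages listed above; the tool's exact
# wording is an assumption, so adjust these if the log differs.
MESSAGE_PATTERNS = [
    ("correct", re.compile(
        r"File (?P<file>.+) is correctly formatted in (?P<format>\S+) format")),
    ("inconsistent", re.compile(
        r"File (?P<file>.+) may be in (?P<format>\S+) format "
        r"with (?P<count>\d+) inconsistent SNPs")),
    ("undetermined", re.compile(
        r"File type for (?P<file>.+) could not be determined")),
    ("converted", re.compile(
        r"File (?P<file>.+) was converted properly")),
]


def classify_message(line: str):
    """Return (label, captured fields) for a recognized message, else None."""
    for label, pattern in MESSAGE_PATTERNS:
        match = pattern.search(line)
        if match:
            return label, match.groupdict()
    return None
```

A file that yields "undetermined" is one the utility will not convert, and an "inconsistent" result should be treated as suspect, so a wrapper script can use these labels to decide which inputs to inspect by hand.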
The utility creates a converted file with the name [file_basename].[output suffix].txt. The output suffix is specified by the --output-name parameter and defaults to "output".
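The naming rule can be sketched as follows. The function `converted_file_name` is hypothetical, and it assumes that the file basename is the input file name with its directory and final extension removed; the tool's own rule may differ.

```python
from pathlib import Path


def converted_file_name(input_file: str, output_name: str = "output") -> str:
    """Return the expected name: [file_basename].[output suffix].txt.

    Assumption: the basename is the input name without its directory and
    without its final extension.
    """
    return f"{Path(input_file).stem}.{output_name}.txt"
```

Under that assumption, an input named sample.txt converted with the default suffix would produce sample.output.txt.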
Summary files and PLINK flat files (PED and MAP) can be generated using convert_file with the -s/--summary and --plink options, respectively. For more information on these files, see Additional Output Files.