Look up annotations for subcellular location by Uniprot ID.

ajmaurais/protein_scrape

locScrape

Given a .tsv file with a column containing Uniprot protein IDs, locScrape scrapes annotations for subcellular location from Uniprot.org.
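To illustrate the expected input, the following Python sketch writes a small tab-separated table with one column of Uniprot IDs and reads that column back. The column names `ID` and `description` and the helpers `write_example_input` and `read_ids` are assumptions made for this example and are not part of locScrape. The accessions are well-known Uniprot entries used as placeholders.

```python
import csv
import io

# Hypothetical example rows: a Uniprot accession plus a free-text description.
EXAMPLE_ROWS = [
    {"ID": "P04637", "description": "Cellular tumor antigen p53"},
    {"ID": "P68871", "description": "Hemoglobin subunit beta"},
    {"ID": "P00533", "description": "Epidermal growth factor receptor"},
]


def write_example_input(handle, rows=EXAMPLE_ROWS, id_col="ID"):
    """Write rows to an open text handle as tab-separated values with a header."""
    # Put the ID column first, followed by any remaining columns.
    fieldnames = [id_col] + [key for key in rows[0] if key != id_col]
    writer = csv.DictWriter(
        handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


def read_ids(handle, id_col="ID"):
    """Return the values of the ID column from an open tab-separated handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row[id_col] for row in reader]


# Round trip through an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_example_input(buffer)
buffer.seek(0)
ids = read_ids(buffer)
```

Writing the same rows to a file such as `proteins.tsv` (a hypothetical name) would give an input whose ID column name can be passed to locScrape with `-i`.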

Installation

The simplest way to install locScrape is to download one of the precompiled binaries under the releases tab. Binaries are available for OSX and CentOS.

You can also clone this repository with the command:

git clone https://github.com/ajmaurais/locScrape

Usage

usage: locScrape [-h] [-i IDCOL] [--columns {sl,go,all}] [--locCol LOCCOL]
                  [--goCol GOCOL] [--allCol ALLCOL] [--nThread NTHREAD]
                  [-o OFNAME] [--inPlace]
                  input_file [input_file ...]

Get subcellular location annotations for a list of Uniprot protein IDs. A
column in input_file should contain Uniprot IDs. After locScrape runs,
columns will be added for Uniprot location annotations and GO cellular
component annotations.

positional arguments:
  input_file            .tsv or .csv files to process.

optional arguments:
  -h, --help            show this help message and exit

  -i IDCOL, --idCol IDCOL
                        Name of column containing Uniprot IDs.

  --columns {sl,go,all}
                        Which new columns should be added?
                        sl : Uniprot annotation for subcellular location
                        go : GO annotation for cellular component
                        all : both sl and go
                        Default is all.

  --locCol LOCCOL       Name of new column to add with subcellular location.

  --goCol GOCOL         Name of new column to add with GO cellular component
                        annotation.

  --allCol ALLCOL       Name of new column to add with GO and Uniprot
                        annotations combined.

  --nThread NTHREAD     Number of threads to use to look up Uniprot
                        annotations. Default is the number of logical cores on
                        your system.

  -o OFNAME, --ofname OFNAME
                        Name of output file. Default is <input_file>_loc.tsv.
                        If multiple input files are given, this argument is
                        ignored.

  --inPlace             Overwrite input files with output files. This option
                        overrides the --ofname option.
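
The options above can be combined as in the following Python sketch, which builds a locScrape argument list and predicts where the output would be written. It is not taken from the locScrape source: the helpers `build_command`, `format_command`, and `expected_outputs` are hypothetical, and only flags listed in the help text are used. One point is an explicit assumption: the help text gives the default output name as `<input_file>_loc.tsv` without saying whether the input extension is kept, so the sketch assumes it is stripped first.

```python
import os
import shlex


def build_command(input_files, id_col=None, columns=None, n_thread=None,
                  ofname=None, in_place=False):
    """Assemble a locScrape argument list from the documented options."""
    if columns is not None and columns not in ("sl", "go", "all"):
        raise ValueError("columns must be one of: sl, go, all")
    cmd = ["locScrape"]
    if id_col is not None:
        cmd += ["--idCol", id_col]
    if columns is not None:
        cmd += ["--columns", columns]
    if n_thread is not None:
        cmd += ["--nThread", str(n_thread)]
    if ofname is not None:
        cmd += ["--ofname", ofname]
    if in_place:
        cmd.append("--inPlace")
    # Positional input files go last.
    cmd += list(input_files)
    return cmd


def format_command(cmd):
    """Render an argument list as a shell-quoted string for display."""
    return shlex.join(cmd)


def expected_outputs(input_files, ofname=None, in_place=False):
    """Predict output paths from the rules stated in the help text."""
    input_files = list(input_files)
    if in_place:
        # --inPlace overrides --ofname: each input is overwritten.
        return input_files
    if ofname is not None and len(input_files) == 1:
        # --ofname only applies when a single input file is given.
        return [ofname]
    # ASSUMPTION: the input extension is dropped before "_loc.tsv" is added.
    return [os.path.splitext(path)[0] + "_loc.tsv" for path in input_files]


cmd = build_command(["proteins.tsv"], id_col="ID", columns="sl", n_thread=4)
command_line = format_command(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once locScrape is on the `PATH`. Leaving `n_thread` as `None` omits `--nThread`, so locScrape falls back to its documented default of one thread per logical core.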