locScrape

Given a .tsv file with a column containing Uniprot protein IDs, scrape annotations for subcellular location from Uniprot.org.

Installation

The simplest way to install locScrape is to download one of the precompiled binaries under the releases tab. Binaries are available for OSX and CentOS.

You can also clone this repository with the command:

git clone https://github.com/ajmaurais/locScrape

Usage

usage: locScrape [-h] [-i IDCOL] [--columns {sl,go,all}] [--locCol LOCCOL]
                  [--goCol GOCOL] [--allCol ALLCOL] [--nThread NTHREAD]
                  [-o OFNAME] [--inPlace]
                  input_file [input_file ...]

Get subcellular location annotations for a list of Uniprot protein IDs. A
column in input_file should contain Uniprot IDs. After locScrape runs,
columns will be added for Uniprot location annotations and GO cellular
component annotations.

positional arguments:
  input_file            .tsv or .csv files to process.

optional arguments:
  -h, --help            show this help message and exit

  -i IDCOL, --idCol IDCOL
                        Name of column containing Uniprot IDs.

  --columns {sl,go,all}
                        Which new columns should be added?
                        sl : Uniprot annotation for subcellular location
                        go : GO annotation for cellular component
                        all : both sl and go
                        Default is all.

  --locCol LOCCOL       Name of new column to add with subcellular location.

  --goCol GOCOL         Name of new column to add with GO cellular component
                        annotation.

  --allCol ALLCOL       Name of new column to add with GO and Uniprot
                        annotations combined.

  --nThread NTHREAD     Number of threads to use to lookup Uniprot
                        annotations. Default is the number of logical cores on
                        your system.

  -o OFNAME, --ofname OFNAME
                        Name of output file. Default is <input_file>_loc.tsv.
                        If multiple input files are given, this argument is
                        ignored.

  --inPlace             Overwrite input files with output files. This option
                        overrides the --ofname option.
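
The options above can be combined in a small driver script. The following is a minimal sketch, not part of locScrape itself: the helper names `write_example_tsv` and `build_command`, the column header `ID`, and the file name `proteins.tsv` are all hypothetical, and the two accessions are only sample input. It assumes the `locScrape` executable is on your `PATH` and uses only the flags listed in the help text.

```python
import csv
import shutil
import subprocess
import tempfile
from pathlib import Path


def write_example_tsv(path, ids, id_col="ID"):
    """Write a one-column .tsv file with Uniprot IDs under the header id_col.

    The header name "ID" is an assumption; pass whatever your file uses
    to locScrape via --idCol.
    """
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow([id_col])
        for uid in ids:
            writer.writerow([uid])


def build_command(input_files, id_col="ID", columns="all",
                  n_thread=None, ofname=None, in_place=False):
    """Build a locScrape argument list from the documented options."""
    cmd = ["locScrape", "--idCol", id_col, "--columns", columns]
    if n_thread is not None:
        cmd += ["--nThread", str(n_thread)]
    if in_place:
        # Per the help text, --inPlace overrides --ofname.
        cmd.append("--inPlace")
    elif ofname is not None and len(input_files) == 1:
        # Per the help text, --ofname is ignored for multiple input files,
        # so it is only passed when there is exactly one.
        cmd += ["--ofname", str(ofname)]
    cmd += [str(f) for f in input_files]
    return cmd


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    tsv = workdir / "proteins.tsv"
    write_example_tsv(tsv, ["P04637", "P38398"])  # sample accessions only

    cmd = build_command([tsv], columns="sl", n_thread=2)
    print(" ".join(cmd))

    # Only run the scrape if locScrape is actually installed.
    if shutil.which("locScrape"):
        subprocess.run(cmd, check=True)
```

Building the argument list as a Python list rather than a single shell string avoids quoting problems when file names contain spaces.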
