ncbi-acess

This script is a toolbox for automatically retrieving information from NCBI.

Dependencies

This script was built on Python 3.6.5+ and has only two dependencies:

Recommended reading

Usage

  • To retrieve GenBank records for nucleotide sequences:

python ncbi_seq_retrieve.py -in file_with_access_ids.txt -db nucleotide -ot gb

To retrieve the records in XML format instead, add the parameter -tf xml.
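A minimal sketch of the kind of request involved, assuming the script wraps NCBI's E-utilities `efetch` endpoint and passes `-db`, `-ot` and `-tf` through as its `db`, `rettype` and `retmode` parameters. That mapping is an assumption, not something this README confirms, and `read_ids` and `build_efetch_url` are hypothetical helpers, not functions of the script.

```python
from pathlib import Path
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_ids(path):
    """Read one accession ID per line, skipping blank lines (assumed file layout)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_efetch_url(ids, db="nucleotide", rettype="gb", retmode="text"):
    """Build an efetch URL; rettype/retmode stand in for -ot/-tf (assumption)."""
    params = {
        "db": db,
        "id": ",".join(ids),  # efetch accepts a comma-separated ID list
        "rettype": rettype,
        "retmode": retmode,
    }
    return EFETCH + "?" + urlencode(params)


# Roughly what "-db nucleotide -ot gb -tf xml" could translate to.
url = build_efetch_url(["NC_045512.2", "MN908947.3"], retmode="xml")
print(url)
```

Opening the URL (for example with `urllib.request.urlopen`) would return the records. NCBI limits clients without an API key to three requests per second, so sending all IDs in one request is the gentler choice.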

  • To retrieve CDS translated into amino acids from nucleotide sequences:

python ncbi_seq_retrieve.py -in file_with_access_ids.txt -db nucleotide -ot fasta_cds_aa

To retrieve the CDS untranslated (as nucleotides), replace fasta_cds_aa with fasta_cds_na.
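Both CDS output types produce multi-FASTA text with one record per coding sequence. If you want to post-process that output in Python, a small parser like the hypothetical `parse_fasta` below (not part of the script) splits it into header/sequence pairs; the sample records are made up.

```python
def parse_fasta(text):
    """Split multi-FASTA text into (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []  # start a new record
        else:
            chunks.append(line)  # sequence lines may wrap
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


sample = """>cds_1 [gene=geneA]
MKT
AYI
>cds_2 [gene=geneB]
MSSL
"""
for header, seq in parse_fasta(sample):
    print(header, len(seq))
```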

  • To retrieve nucleotide or amino acid sequences:

python ncbi_seq_retrieve.py -in file_with_access_ids.txt -db (nucleotide or protein) -ot fasta

To retrieve the sequences in XML format instead, add the parameter -tf xml.
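To my understanding, FASTA requested from NCBI's `efetch` in XML mode comes back as TinySeq XML (`TSeqSet` containing `TSeq` elements). Assuming that layout, the hypothetical helper below reads accession and sequence with the standard library; the sample is abridged and made up, and real records carry more fields.

```python
import xml.etree.ElementTree as ET


def parse_tinyseq(xml_text):
    """Return {accession.version: sequence} from TinySeq XML (assumed layout)."""
    root = ET.fromstring(xml_text)
    return {
        tseq.findtext("TSeq_accver"): tseq.findtext("TSeq_sequence")
        for tseq in root.iter("TSeq")
    }


sample = """<?xml version="1.0"?>
<TSeqSet>
  <TSeq>
    <TSeq_accver>EXAMPLE.1</TSeq_accver>
    <TSeq_sequence>ATGCGT</TSeq_sequence>
  </TSeq>
</TSeqSet>"""
print(parse_tinyseq(sample))
```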

  • To retrieve taxonomy information for NCBI accession IDs:

python ncbi_seq_retrieve.py -in file_with_access_ids.txt -db (nucleotide or protein) -ot gb -tx True
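A GenBank record carries taxonomy in two places: the lineage lines under `ORGANISM` and the `/db_xref="taxon:..."` qualifier of the `source` feature. This README does not say how `-tx` obtains taxonomy, so the hypothetical `parse_taxonomy` below is only a sketch of reading both from flat-file text; the record is abridged and made up.

```python
import re


def parse_taxonomy(gb_text):
    """Extract organism, lineage ranks and NCBI taxon ID from GenBank flat-file text."""
    organism, lineage_parts = None, []
    lines = gb_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("  ORGANISM"):
            organism = line[12:].strip()  # name starts at column 13
            # The lineage continues on the following lines indented by 12 spaces.
            for cont in lines[i + 1:]:
                if not cont.startswith(" " * 12):
                    break
                lineage_parts.append(cont.strip())
            break
    lineage = " ".join(lineage_parts).rstrip(".")
    ranks = [r.strip() for r in lineage.split(";") if r.strip()]
    match = re.search(r'/db_xref="taxon:(\d+)"', gb_text)
    taxon_id = int(match.group(1)) if match else None
    return organism, ranks, taxon_id


sample = """SOURCE      Example virus
  ORGANISM  Example virus
            Viruses; Riboviria;
            Examplevirales.
FEATURES             Location/Qualifiers
     source          1..30
                     /organism="Example virus"
                     /db_xref="taxon:12345"
"""
print(parse_taxonomy(sample))
```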

  • To retrieve taxonomy information for the host of each NCBI accession ID (ideal for viruses):

python ncbi_seq_retrieve.py -in file_with_access_ids.txt -db (nucleotide or protein) -ot gb -tx True -th True
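Virus records in GenBank often name the host in a `/host="..."` qualifier of the `source` feature, and that name could then be looked up in the NCBI Taxonomy database. Whether `-th` works exactly this way is an assumption; the hypothetical `parse_hosts` below only shows how such qualifiers, including values wrapped across lines, can be extracted. The sample is made up.

```python
import re


def parse_hosts(gb_text):
    """Return every /host qualifier value, with wrapped lines re-joined."""
    # [^"]* also matches newlines, so values that wrap across lines are captured.
    values = re.findall(r'/host="([^"]*)"', gb_text)
    # Collapse the continuation-line indentation into single spaces.
    return [" ".join(v.split()) for v in values]


sample = """FEATURES             Location/Qualifiers
     source          1..30
                     /organism="Example virus"
                     /host="Homo sapiens"
                     /db_xref="taxon:12345"
"""
print(parse_hosts(sample))
```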

Some considerations

If your file contains IDs of nucleotide sequences, you can't use it with the protein database, and vice versa. The help function displays a table showing which text formats are allowed for each output type and which output types are allowed for each database.
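Checks like these can be enforced before any request is sent. The sketch below encodes only the database/output-type combinations demonstrated in this README; the script's own help table is authoritative, and `ALLOWED_OUTPUTS` and `check_request` are hypothetical names.

```python
# Combinations shown in the Usage section above (not an exhaustive list).
ALLOWED_OUTPUTS = {
    "nucleotide": {"gb", "fasta", "fasta_cds_aa", "fasta_cds_na"},
    "protein": {"gb", "fasta"},
}


def check_request(db, output_type):
    """Raise ValueError if the db/output-type pair is not in ALLOWED_OUTPUTS."""
    if db not in ALLOWED_OUTPUTS:
        raise ValueError(f"unknown database: {db!r}")
    if output_type not in ALLOWED_OUTPUTS[db]:
        allowed = ", ".join(sorted(ALLOWED_OUTPUTS[db]))
        raise ValueError(
            f"-ot {output_type} is not valid for -db {db}; choose one of: {allowed}"
        )


check_request("nucleotide", "fasta_cds_aa")  # valid pair, returns None
try:
    check_request("protein", "fasta_cds_aa")
except ValueError as err:
    print(err)  # -ot fasta_cds_aa is not valid for -db protein; choose one of: fasta, gb
```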

Disclaimer

  • This script will continue to be developed to include other functions, such as retrieving sequence features.