First stable release (squashed 50 commits)
JoaoRodrigues committed Aug 21, 2015
0 parents commit ad55db8
Showing 26 changed files with 2,594 additions and 0 deletions.
20 changes: 20 additions & 0 deletions LICENSE
@@ -0,0 +1,20 @@
The MIT License (MIT)

Copyright (c) 2015 João Rodrigues

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
122 changes: 122 additions & 0 deletions README.md
@@ -0,0 +1,122 @@
pdb-tools
================================================
A set of utility scripts in Python to manipulate PDB files as streams. The name is generic, and
another set of scripts named 'pdb-tools' already exists; it performs a very different range of
operations on PDB files. You can find it [here](https://github.com/harmslab/pdbtools).

About
---------

Manipulating PDB files is a pain. Extracting chains, renumbering residues, splitting or merging
models or chains, modifying b-factors and occupancies, and extracting the sequence of a PDB file are
examples of operations that any decent parsing library can do, but doing so takes 1) scripting
knowledge, 2) time, and 3) almost surely a set of installed external dependencies.

The scripts in this repository simplify most of these tasks. They descend from a set of old
FORTRAN77 programs used in our lab at Utrecht, which had the particular advantage of working with
streams, i.e. the output of one script could be piped into another. Since FORTRAN77 is a pain too, I
rewrote the scripts in Python and added a few more.

Requests for new scripts will be considered, depending on the effort required and how generally
useful the script would be.

Features
------------
* Simple: one script, one job.
* Written in Python (stdlib only): no compilation, cross-platform, no external dependencies.
* Read data from file, or from the output of another script.
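
The "file or pipe" behaviour can be sketched roughly as follows. This is a minimal illustration,
loosely modelled on the input checks in the scripts; `open_pdb_stream` is a hypothetical helper
name and is not part of the repository.

```python
import sys

def open_pdb_stream(args, stdin=sys.stdin):
    """Return a readable handle: the named file if one is given, else piped stdin.

    `open_pdb_stream` is a hypothetical helper used only for illustration.
    """
    if args:
        # A file name was passed on the command line
        return open(args[0], 'r')
    if not stdin.isatty():
        # Nothing on the command line, but data is being piped in
        return stdin
    # Neither a file nor a pipe: nothing to read
    raise SystemExit('usage: script.py <pdb file>  (or pipe a PDB file into stdin)')
```

Checking `isatty()` is what lets a script tell an interactive terminal apart from a pipe, so the
same script can sit at the start or in the middle of a pipeline.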

Requirements
------------
* Python 2.7 (might work on earlier versions, not really tested.)

Installation
------------
Download the zip archive or clone the repository with git. Cloning is the recommended option,
since it makes getting updates very simple.

```bash
# To download
git clone https://github.com/JoaoRodrigues/pdb-tools

# To update
cd pdb-tools && git pull origin master
```

Usage
------------
All the scripts have a short description of their purpose and their usage. Just run them without any
arguments:
```bash
$ ./pdb_selchain.py

Extracts a chain from a PDB file.

usage: python pdb_selchain.py -<chain> <pdb file>
example: python pdb_selchain.py -A 1CTF.pdb

Author: Joao Rodrigues ([email protected])

This program is part of the PDB tools distributed with HADDOCK
or with the HADDOCK tutorial. The utilities in this package
can be used to quickly manipulate PDB files, with the benefit
of 'piping' several different commands. This is a rewrite of old
FORTRAN77 code that was taking too much effort to compile. RIP.
```

Examples
------------

* Downloading a structure
```bash
./pdb_fetch.py 1ctf > 1ctf.pdb
./pdb_fetch.py -biounit 1brs > 1brs_biounit.pdb
```

* Renumbering a structure
```bash
./pdb_reres.py -1 1ctf.pdb > 1ctf_renumbered.pdb
```

* Extracting a particular chain
```bash
./pdb_selchain.py -A 1brs_biounit.pdb > 1brs_A.pdb
```

* Downloading, extracting a chain, and extracting its amino acid sequence
```bash
./pdb_fetch.py 1brs | ./pdb_selchain.py -A | ./pdb_toseq.py > 1brs_A.fasta
```

* Getting general information on a PDB file
```bash
$ ./pdb_fetch.py 1brs | ./pdb_wc.py
No. atoms: 4640 (4640.0 per model)
No. residues: 588 (588.0 per model)
No. chains: 6 ( 6.0 per model)
No. models: 1
Hetero Atoms: Yes
Has seq. gaps: Yes
Double Occ.: Yes
Insertions: No
$ ./pdb_fetch.py -biounit 1brs | ./pdb_wc.py
No. atoms: 1559 (1559.0 per model)
No. residues: 195 (195.0 per model)
No. chains: 2 ( 2.0 per model)
No. models: 2
Hetero Atoms: Yes
Has seq. gaps: Yes
Double Occ.: Yes
Insertions: No
```

* Finding gaps in a PDB file
```bash
$ ./pdb_fetch.py -biounit 1brs | ./pdb_gap.py
D:THR63 < 4.88A > D:GLY66
```
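
Scripts such as `pdb_chain.py` and `pdb_b.py` rewrite fixed-width columns of `ATOM`/`HETATM`
records as the lines stream through. A minimal sketch of that idea, assuming the standard PDB
column layout (chain ID in column 22, b-factor in columns 61-66); `set_chain` and `set_bfactor`
are illustrative names, not functions from the scripts, and the padding to 80 columns is an
addition made here for safety:

```python
def set_chain(line, chain_id):
    """Return the line with its chain ID (column 22) replaced. Illustrative only."""
    if not line.startswith(('ATOM', 'HETATM')):
        return line  # leave non-coordinate records untouched
    line = line.rstrip('\n').ljust(80)  # pad so the slices below are always valid
    return line[:21] + chain_id[0] + line[22:] + '\n'

def set_bfactor(line, bfactor):
    """Return the line with its b-factor (columns 61-66) replaced. Illustrative only."""
    if not line.startswith(('ATOM', 'HETATM')):
        return line
    line = line.rstrip('\n').ljust(80)
    # Right-align the value in the 6-character field with two decimals
    return line[:60] + '{0:>6.2f}'.format(bfactor) + line[66:] + '\n'
```

Because each function takes one line and returns one line, they compose the same way the scripts
do in a shell pipeline, e.g. `set_bfactor(set_chain(line, 'B'), 10.0)`.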

License
---------
MIT. See LICENSE file.
105 changes: 105 additions & 0 deletions pdb_b.py
@@ -0,0 +1,105 @@
#!/usr/bin/env python

"""
Modifies the b-factor of a PDB file (default 10.0).
usage: python pdb_b.py -<bfactor> <pdb file>
example: python pdb_b.py -10.0 1CTF.pdb
Author: {0} ({1})
This program is part of the PDB tools distributed with HADDOCK
or with the HADDOCK tutorial. The utilities in this package
can be used to quickly manipulate PDB files, with the benefit
of 'piping' several different commands. This is a rewrite of old
FORTRAN77 code that was taking too much effort to compile. RIP.
"""

import os
import re
import sys

__author__ = "Joao Rodrigues"
__email__ = "[email protected]"

USAGE = __doc__.format(__author__, __email__)

def check_input(args):
    """Checks whether to read from stdin/file and validates user input/options."""

    if not len(args):
        # No bfactor, from pipe
        if not sys.stdin.isatty():
            pdbfh = sys.stdin
            bfactor = 10.0
        else:
            sys.stderr.write(USAGE)
            sys.exit(1)
    elif len(args) == 1:
        # bfactor & Pipe _or_ file & no bfactor
        if re.match(r'\-[0-9\.]+', args[0]):
            bfactor = float(args[0][1:])
            if not sys.stdin.isatty():
                pdbfh = sys.stdin
            else:
                sys.stderr.write(USAGE)
                sys.exit(1)
        else:
            if not os.path.isfile(args[0]):
                sys.stderr.write('File not found: ' + args[0] + '\n')
                sys.stderr.write(USAGE)
                sys.exit(1)
            pdbfh = open(args[0], 'r')
            bfactor = 10.0
    elif len(args) == 2:
        # bfactor & File
        if not re.match(r'\-[0-9\.]+', args[0]):
            sys.stderr.write('Invalid b-factor value: ' + args[0] + '\n')
            sys.stderr.write(USAGE)
            sys.exit(1)
        if not os.path.isfile(args[1]):
            sys.stderr.write('File not found: ' + args[1] + '\n')
            sys.stderr.write(USAGE)
            sys.exit(1)
        bfactor = float(args[0][1:])
        pdbfh = open(args[1], 'r')
    else:
        sys.stderr.write(USAGE)
        sys.exit(1)

    return (bfactor, pdbfh)

def _alter_bfactor(fhandle, bfactor):
    """Enclosing logic in a function to speed up a bit"""

    coord_re = re.compile('^(ATOM|HETATM)')
    # Right-align the value in the 6-character b-factor field (columns 61-66)
    bfactor = "{0:>6.2f}".format(bfactor)
    for line in fhandle:
        line = line.strip()
        if coord_re.match(line):
            yield line[:60] + bfactor + line[66:] + '\n'
        else:
            yield line + '\n'

if __name__ == '__main__':

    # Check Input
    bfactor, pdbfh = check_input(sys.argv[1:])

    # Do the job
    new_pdb = _alter_bfactor(pdbfh, bfactor)

    try:
        sys.stdout.write(''.join(new_pdb))
        sys.stdout.flush()
    except IOError:
        # This is here to catch Broken Pipes
        # for example to use 'head' or 'tail' without
        # the error message showing up
        pass

    # last line of the script
    # We can close it even if it is sys.stdin
    pdbfh.close()
    sys.exit(0)
106 changes: 106 additions & 0 deletions pdb_chain.py
@@ -0,0 +1,106 @@
#!/usr/bin/env python

"""
Sets the chain ID for a PDB file.
usage: python pdb_chain.py -<chain> <pdb file>
example: python pdb_chain.py -A 1CTF.pdb
Author: {0} ({1})
This program is part of the PDB tools distributed with HADDOCK
or with the HADDOCK tutorial. The utilities in this package
can be used to quickly manipulate PDB files, with the benefit
of 'piping' several different commands. This is a rewrite of old
FORTRAN77 code that was taking too much effort to compile. RIP.
"""

import os
import re
import sys

__author__ = "Joao Rodrigues"
__email__ = "[email protected]"

USAGE = __doc__.format(__author__, __email__)

def check_input(args):
    """Checks whether to read from stdin/file and validates user input/options."""

    if not len(args):
        # No chain, from pipe
        if not sys.stdin.isatty():
            pdbfh = sys.stdin
            chain = ' '
        else:
            sys.stderr.write(USAGE)
            sys.exit(1)
    elif len(args) == 1:
        # Chain & Pipe _or_ file & no chain
        if re.match(r'\-[A-Za-z0-9]', args[0]):
            chain = args[0][1:]
            if not sys.stdin.isatty():
                pdbfh = sys.stdin
            else:
                sys.stderr.write(USAGE)
                sys.exit(1)
        else:
            if not os.path.isfile(args[0]):
                sys.stderr.write('File not found: ' + args[0] + '\n')
                sys.stderr.write(USAGE)
                sys.exit(1)
            pdbfh = open(args[0], 'r')
            chain = ' '
    elif len(args) == 2:
        # Chain & File
        if not re.match(r'\-[A-Za-z0-9]', args[0]):
            sys.stderr.write('Invalid chain ID: ' + args[0] + '\n')
            sys.stderr.write(USAGE)
            sys.exit(1)

        if not os.path.isfile(args[1]):
            sys.stderr.write('File not found: ' + args[1] + '\n')
            sys.stderr.write(USAGE)
            sys.exit(1)
        chain = args[0][1:]
        pdbfh = open(args[1], 'r')
    else:
        sys.stderr.write(USAGE)
        sys.exit(1)

    return (chain, pdbfh)

def _alter_chain(fhandle, chain_id):
    """Enclosing logic in a function to speed up a bit"""

    coord_re = re.compile('^(ATOM|HETATM)')

    for line in fhandle:
        line = line.strip()
        if coord_re.match(line):
            # Chain ID is a single character in column 22
            yield line[:21] + chain_id[0] + line[22:] + '\n'
        else:
            yield line + '\n'

if __name__ == '__main__':
    # Check Input
    chain, pdbfh = check_input(sys.argv[1:])

    # Do the job
    new_pdb = _alter_chain(pdbfh, chain)

    try:
        sys.stdout.write(''.join(new_pdb))
        sys.stdout.flush()
    except IOError:
        # This is here to catch Broken Pipes
        # for example to use 'head' or 'tail' without
        # the error message showing up
        pass

    # last line of the script
    # We can close it even if it is sys.stdin
    pdbfh.close()
    sys.exit(0)