Specification on PDB input files
NucleicNet edited this page May 28, 2019
NucleicNet operates on protein structures supplied as PDB files. These input files should be placed in the "GridData" folder, as indicated in our bash script. To ensure uniform processing of PDB files, users are advised to check the validity of their input files against the following criteria:
- The file should only contain records starting with "ATOM" or "TER". The end of each chain is indicated by a "TER" record.
- The PDB file should only contain protein (i.e. no RNA/DNA, solvents, ligands, HETATM records, etc.).
- The protein chains should not contain non-standard amino acids.
- The file should not contain chemicals other than proteins.
- Each PDB file should contain only one model (cf. NMR ensembles). If multiple models are included in the same file, only the first one will be analysed.
- The file name can be any 4-character alphanumeric name starting with a digit, followed by the ".pdb" suffix (e.g. "03Aa.pdb" and "2357.pdb" are valid, but "t3f4.pdb" is not).
- The file name should not contain any non-alphanumeric characters other than the "." in ".pdb".
- Include the chain ID and keep all fields intact, as specified in the PDB format description (version 3.3): ftp://ftp.wwpdb.org/pub/pdb/doc/format_descriptions/Format_v33_Letter.pdf
- Both residue numbers and atom serial numbers need to be base-10 integers. (Some PDB files use base-16 numbering to accommodate very large structures. Such files are currently unsupported.)
- Protein residues should preferably be complete, with no missing atoms. Users may run PDBFixer (https://anaconda.org/omnia/pdbfixer) to fulfil this requirement, or simply remove the incomplete residues.
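Before running NucleicNet, most of the criteria above can be checked with a short script. The sketch below is not part of NucleicNet: the names `check_pdb_lines` and `check_pdb_file`, and the assumption that "standard" means the 20 canonical amino-acid residue names, are our own. It reads fields by their fixed PDB column positions and reports record types other than "ATOM"/"TER" (so HETATM, MODEL and END records are flagged), file names that break the naming rule, non-decimal atom or residue numbers, non-standard residues, missing chain IDs and truncated ATOM lines. It does not check whether residues are complete; PDBFixer is better suited for that.

```python
import re
from pathlib import Path
from typing import List

# Assumption: "standard" amino acids are the 20 canonical residue names.
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}

# Four alphanumeric characters, the first one a digit, then ".pdb".
FILENAME_RE = re.compile(r"[0-9][A-Za-z0-9]{3}\.pdb")
# A base-10 integer (residue numbers may be negative in some files).
DECIMAL_RE = re.compile(r"-?[0-9]+")


def check_pdb_lines(filename: str, lines: List[str]) -> List[str]:
    """Hypothetical helper: return the problems found (empty list if none)."""
    problems = []
    if not FILENAME_RE.fullmatch(filename):
        problems.append(f"bad file name: {filename!r}")
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are ignored rather than reported
        record = line[:6].strip()  # record name occupies columns 1-6
        if record == "TER":
            continue
        if record != "ATOM":
            problems.append(f"line {n}: unsupported record {record!r}")
            continue
        if len(line) < 54:  # an ATOM record must reach the z coordinate (cols 47-54)
            problems.append(f"line {n}: truncated ATOM record")
            continue
        serial = line[6:11].strip()     # atom serial number, columns 7-11
        res_name = line[17:20].strip()  # residue name, columns 18-20
        chain_id = line[21].strip()     # chain ID, column 22
        res_seq = line[22:26].strip()   # residue sequence number, columns 23-26
        if not DECIMAL_RE.fullmatch(serial):
            problems.append(f"line {n}: atom serial {serial!r} is not base-10")
        if not DECIMAL_RE.fullmatch(res_seq):
            problems.append(f"line {n}: residue number {res_seq!r} is not base-10")
        if res_name not in STANDARD_RESIDUES:
            problems.append(f"line {n}: non-standard residue {res_name!r}")
        if not chain_id:
            problems.append(f"line {n}: missing chain ID")
    return problems


def check_pdb_file(path: str) -> List[str]:
    """Hypothetical wrapper: read a PDB file from disk and check it."""
    p = Path(path)
    return check_pdb_lines(p.name, p.read_text().splitlines())
```

For example, calling `check_pdb_file` on each file in the "GridData" folder returns an empty list for files that pass these checks. Because the checker follows the criteria strictly, a trailing "END" line is also reported; whether NucleicNet itself tolerates such a line is not stated here.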