Parse PDB Entry with Bio.PDB.PDBParser Module

Q

How to parse a PDB entry with the get_structure() method of Bio.PDB.PDBParser?

✍: FYIcenter.com

A

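The get_structure() method of the Bio.PDB.PDBParser class parses a PDB (Protein Data Bank) format file and returns a Structure object that you can walk through model by model, chain by chain, residue by residue, and atom by atom.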

1. Download a PDB file in PDB format.

fyicenter$ curl http://files.rcsb.org/view/1fat.pdb > 1fat.pdb
fyicenter$ ls -l *.pdb
-rw-r--r--. 1 fyicenter staff 662580 Jan 2 09:52 1fat.pdb
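
If you prefer to fetch the file from Python instead of curl, the following is a minimal sketch using only the standard library. The helper names rcsb_pdb_url() and download_pdb() are made up for this example; the URL pattern is simply the one used in the curl command above.

```python
import urllib.request
from pathlib import Path


def rcsb_pdb_url(pdb_id):
    """Build the same URL as the curl command in step 1 (hypothetical helper)."""
    return f"http://files.rcsb.org/view/{pdb_id.lower()}.pdb"


def download_pdb(pdb_id, dest_dir="."):
    """Fetch one entry and return the local path. Requires network access."""
    target = Path(dest_dir) / f"{pdb_id.lower()}.pdb"
    urllib.request.urlretrieve(rcsb_pdb_url(pdb_id), target)
    return target


print(rcsb_pdb_url("1FAT"))

# To actually download the file (needs network access), call:
# download_pdb("1fat")
```

The download call is left commented out so that the snippet can be run without a network connection.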

2. Parse the file with the get_structure() function.

>>> from Bio.PDB.PDBParser import PDBParser
>>> parser = PDBParser()
>>> structure = parser.get_structure("1fat", "1fat.pdb")
  .../Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 7975.
  warnings.warn(
  .../Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 7991.
  warnings.warn(
  .../Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain C is discontinuous at line 8007.
  warnings.warn(
  .../Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain D is discontinuous at line 8023.
  warnings.warn(
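
The warnings above do not stop the parser; the structure is still built. They are reported when records for a chain show up again after another chain has started. If you do not want to see them, PDBParser accepts a QUIET=True argument. The sketch below shows the effect on a tiny made-up input (not taken from 1fat.pdb); atom_line() and count_construction_warnings() are helpers invented for this example.

```python
import io
import warnings

from Bio.PDB.PDBExceptions import PDBConstructionWarning
from Bio.PDB.PDBParser import PDBParser


def atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Format one fixed-column ATOM record (toy helper, not part of Biopython)."""
    return (
        f"ATOM  {serial:5d}  {name:<3s} {resname:>3s} {chain}{resseq:4d}{'':4s}"
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10s}{element:>2s}\n"
    )


# Toy data: chain A is interrupted by chain B and then continues.
TOY_PDB = (
    atom_line(1, "CA", "SER", "A", 1, 0.0, 0.0, 0.0, "C")
    + atom_line(2, "CA", "GLY", "B", 1, 5.0, 0.0, 0.0, "C")
    + atom_line(3, "CA", "ALA", "A", 2, 3.8, 0.0, 0.0, "C")
)


def count_construction_warnings(quiet):
    """Parse the toy data and count the PDBConstructionWarning messages."""
    parser = PDBParser(QUIET=quiet)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        parser.get_structure("toy", io.StringIO(TOY_PDB))
    return sum(issubclass(w.category, PDBConstructionWarning) for w in caught)


print("default parser:", count_construction_warnings(quiet=False))
print("QUIET=True    :", count_construction_warnings(quiet=True))
```

For the real file, the equivalent would be parser = PDBParser(QUIET=True) before calling get_structure("1fat", "1fat.pdb").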

3. Walk through the PDB structure.

>>> print(structure)
<Structure id=1fat>

>>> len(structure)
1

>>> model = structure[0]
>>> print(model) 
<Model id=0>

>>> len(model)
4

>>> model.child_dict
{'A': <Chain id=A>, 'B': <Chain id=B>, 'C': <Chain id=C>, 'D': <Chain id=D>}

>>> chain = model["A"]
>>> print(chain)
<Chain id=A>

>>> len(chain)
239

>>> residues = list(chain)
>>> residue = residues[0]
>>> print(residue)
<Residue SER het=  resseq=1 icode= >

>>> len(residue)
6

>>> atoms = list(residue)
>>> atoms
[<Atom N>, <Atom CA>, <Atom C>, <Atom O>, <Atom CB>, <Atom OG>]

>>> atom = atoms[0]
>>> print("Element: {}, Mass: {}, XYZ: {}".format(atom.element, atom.mass, atom.coord))
Element: N, Mass: 14.0067, XYZ: [22.898 12.385 31.874]
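
The "het", "resseq" and "icode" fields printed for a residue are the three parts of its id: hetero flag, sequence number and insertion code. The sketch below, again on made-up toy data rather than 1fat.pdb, shows how to read those ids and how to walk back up the hierarchy from an atom. atom_line() is a helper invented for this example.

```python
import io

from Bio.PDB.PDBParser import PDBParser


def atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Format one fixed-column ATOM record (toy helper, not part of Biopython)."""
    return (
        f"ATOM  {serial:5d}  {name:<3s} {resname:>3s} {chain}{resseq:4d}{'':4s}"
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10s}{element:>2s}\n"
    )


# Toy data: two atoms of one serine residue in chain A.
TOY_PDB = (
    atom_line(1, "N", "SER", "A", 1, 0.0, 0.0, 0.0, "N")
    + atom_line(2, "CA", "SER", "A", 1, 1.5, 0.0, 0.0, "C")
)

structure = PDBParser(QUIET=True).get_structure("toy", io.StringIO(TOY_PDB))

# chain[1] is a short form of chain[(" ", 1, " ")]
residue = structure[0]["A"][1]
atom = residue["CA"]

print(residue.get_resname())         # three-letter residue name
print(residue.get_id())              # (hetero flag, sequence number, insertion code)
print(atom.get_parent() is residue)  # every atom knows its residue
print(atom.get_full_id())            # ids from the structure down to the atom
```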

4. Access a single atom directly by its model id, chain id, residue number, and atom name.

>>> atom = structure[0]["A"][100]["CA"]
>>> print("Element: {}, Mass: {}, XYZ: {}".format(atom.element, atom.mass, atom.coord))
Element: C, Mass: 12.0107, XYZ: [ 28.073 -11.331  56.355]
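
Once you can pick out single atoms like this, a common next step is to measure the distance between two of them. Biopython's Atom class supports the minus operator for this: atom1 - atom2 gives the distance between the two atoms. The sketch below uses made-up coordinates chosen so the result is easy to check by hand; atom_line() is a helper invented for this example.

```python
import io

from Bio.PDB.PDBParser import PDBParser


def atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Format one fixed-column ATOM record (toy helper, not part of Biopython)."""
    return (
        f"ATOM  {serial:5d}  {name:<3s} {resname:>3s} {chain}{resseq:4d}{'':4s}"
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10s}{element:>2s}\n"
    )


# Toy data: two CA atoms placed 3 units apart in x and 4 units apart in y.
TOY_PDB = (
    atom_line(1, "CA", "SER", "A", 1, 0.0, 0.0, 0.0, "C")
    + atom_line(2, "CA", "ALA", "A", 2, 3.0, 4.0, 0.0, "C")
)

structure = PDBParser(QUIET=True).get_structure("toy", io.StringIO(TOY_PDB))

ca1 = structure[0]["A"][1]["CA"]
ca2 = structure[0]["A"][2]["CA"]

# Subtracting two Atom objects returns the distance between them.
distance = ca1 - ca2
print(f"CA-CA distance: {distance:.2f}")
```

With the real file, the same pattern applies to any two atoms, for example structure[0]["A"][100]["CA"] and structure[0]["A"][101]["CA"].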

5. Access all atoms in all residues on all chains and in all models.

>>> for model in structure:
...   for chain in model:
...     for residue in chain:
...       for atom in residue:
...         print("Element: {}, Mass: {}, XYZ: {}".format(atom.element, atom.mass, atom.coord))
...
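
On a real entry the loop above prints thousands of lines. If you only need a summary, the Structure object also offers get_residues() and get_atoms(), which iterate over all levels for you. The sketch below counts atoms per element on made-up toy data (not 1fat.pdb); atom_line() is a helper invented for this example.

```python
import io
from collections import Counter

from Bio.PDB.PDBParser import PDBParser


def atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Format one fixed-column ATOM record (toy helper, not part of Biopython)."""
    return (
        f"ATOM  {serial:5d}  {name:<3s} {resname:>3s} {chain}{resseq:4d}{'':4s}"
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10s}{element:>2s}\n"
    )


# Toy data: four backbone atoms in chain A and one atom in chain B.
TOY_PDB = (
    atom_line(1, "N", "SER", "A", 1, 0.0, 0.0, 0.0, "N")
    + atom_line(2, "CA", "SER", "A", 1, 1.5, 0.0, 0.0, "C")
    + atom_line(3, "C", "SER", "A", 1, 2.0, 1.4, 0.0, "C")
    + atom_line(4, "O", "SER", "A", 1, 1.2, 2.3, 0.0, "O")
    + atom_line(5, "CA", "GLY", "B", 1, 9.0, 0.0, 0.0, "C")
)

structure = PDBParser(QUIET=True).get_structure("toy", io.StringIO(TOY_PDB))

# get_atoms() and get_residues() flatten the nested loops of step 5.
element_counts = Counter(atom.element for atom in structure.get_atoms())
residue_count = sum(1 for _ in structure.get_residues())

print("Residues:", residue_count)
for element, count in sorted(element_counts.items()):
    print(f"{element}: {count}")
```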

 


2023-05-09