Use Bio.SearchIO Module to Parse BLAST XML Result

Q

How to Use Bio.SearchIO Module to Parse BLAST XML Result?

✍: FYIcenter.com

A

The Bio.SearchIO module parses sequence search results written in a variety of output formats, including BLAST XML.

1. Run the following code to search the "nt" database with the "blastn" program, using a DNA sequence as the query. The sequence is reverse-translated from the protein sequence AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA, taken from the "PF05371_seed.faa" file.

fyicenter$ python
>>> from Bio.Seq import Seq
>>> query = Seq(
...    "gcggaaccgaacgcggcgaccaactatgcgaccgaagcgatggatagcctgaaaacccag"
...   +"gcgattgatctgattagccagacctggccggtggtgaccaccgtggtggtggcgggcctg" 
...   +"gtgattcgcctgtttaaaaaatttagcagcaaagcg"
...   )

>>> from Bio.Blast import NCBIWWW
>>> result_handle = NCBIWWW.qblast("blastn", "nt", query)
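A quick offline check can confirm that the DNA query really encodes the protein named in step 1. The sketch below uses only the standard library, with a hand-built standard codon table; the helper name `translate` and the variable names are illustrative and are not part of Biopython. With Biopython, calling `query.translate()` on the `Seq` object should give the same protein.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a lowercase DNA string codon by codon (illustrative helper)."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


query_dna = (
    "gcggaaccgaacgcggcgaccaactatgcgaccgaagcgatggatagcctgaaaacccag"
    "gcgattgatctgattagccagacctggccggtggtgaccaccgtggtggtggcgggcctg"
    "gtgattcgcctgtttaaaaaatttagcagcaaagcg"
)
protein = "AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA"

print(len(query_dna))                   # 156
print(translate(query_dna) == protein)  # True
```

The length of 156 matches the query length that BLAST reports in step 4.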

2. Write the query result to a local file.

>>> with open("my_blast.xml", "w") as out_handle:
...   out_handle.write(result_handle.read())
...
138648
>>> result_handle.close()
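The result is saved to a file because a handle can be read only once: after the first read(), a second call returns an empty string. The sketch below illustrates this with an `io.StringIO` object standing in for `result_handle` and a placeholder XML string, both assumptions made so that the example runs without a network call. It also shows where the number echoed in the session comes from.

```python
import io
import os
import tempfile

# Stand-in for the handle returned by qblast(); the XML is a placeholder.
fake_xml = "<BlastOutput><BlastOutput_program>blastn</BlastOutput_program></BlastOutput>"
result_handle = io.StringIO(fake_xml)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_blast.xml")

    # write() returns the number of characters written; in the session
    # above, that return value is the number echoed after the "with" block.
    with open(path, "w") as out_handle:
        written = out_handle.write(result_handle.read())

    # The handle is now exhausted; only the file still holds the result.
    second_read = result_handle.read()
    result_handle.close()

    with open(path) as in_handle:
        saved = in_handle.read()

print(written == len(fake_xml))  # True
print(second_read == "")         # True
print(saved == fake_xml)         # True
```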

3. Read the file back and parse the query result with the Bio.SearchIO.parse() function, which returns an iterator of QueryResult objects, one per query.

>>> result_handle = open("my_blast.xml")

>>> from Bio import SearchIO
>>> blast_qresult = SearchIO.parse(result_handle, "blast-xml")

>>> results = list(blast_qresult)
>>> len(results)
1

>>> results[0]
QueryResult(id='No', 50 hits)

4. Print out the query result.

>>> print(results[0])
Program: blastn (2.13.0+)
  Query: No (156)
         definition line
 Target: nt
   Hits: ----  -----  ----------------------------------------------------------
            #  # HSP  ID + description
         ----  -----  ----------------------------------------------------------
            0      1  gi|9625381|ref|NC_001332.1|  Enterobacteria phage I2-2,...
            1      1  gi|14920|emb|X14336.1|  Filamentous Bacteriophage I2-2 ...
            2      2  gi|1844100906|gb|CP053326.1|  Salmonella enterica subsp...
            3      1  gi|2321361016|gb|CP107717.1|  Xanthomonas campestris pv...
            4      1  gi|2232737947|gb|CP075146.1|  Xanthomonas campestris pv...
            5      1  gi|2095686827|gb|CP066978.1|  Xanthomonas campestris pv...
            6      1  gi|1913269481|gb|CP062066.1|  Xanthomonas campestris st...
            7      1  gi|1864553229|gb|CP058243.1|  Xanthomonas campestris pv...
            8      1  gi|341934791|gb|CP002789.1|  Xanthomonas campestris pv....
            9      1  gi|1860091948|gb|CP054912.1|  Pantoea ananatis strain F...
           10      1  gi|1086024185|emb|LT629791.1|  Jiangella alkaliphila st...
           11      1  gi|2129627975|gb|CP086009.1|  Pantoea ananatis strain V...
           12      1  gi|1057948474|gb|CP015992.1|  Pseudomonas sp. TCU-HL1, ...
           13      1  gi|984699415|gb|CP014207.1|  Pantoea ananatis strain R1...
           14      1  gi|354986417|gb|CP003085.1|  Pantoea ananatis PA13, com...
           15      1  gi|1858692958|gb|CP054803.1|  Acinetobacter lwoffii str...
           16      1  gi|1712751337|gb|CP036319.1|  Crateriforma conspicua st...
           17      1  gi|2317627232|gb|CP083759.1|  Acinetobacter pseudolwoff...
           18      1  gi|2215523186|gb|CP094344.1|  Streptomyces sp. HP-A2021...
           19      1  gi|2086772708|gb|CP080636.1|  Acinetobacter lwoffii str...
           20      1  gi|1482407573|gb|CP032427.1|  Streptomyces griseorubigi...
           21      1  gi|1129998368|ref|XM_019858453.1|  PREDICTED: Hippocamp...
           22      1  gi|1085627832|emb|LT629688.1|  Auraticoccus monumenti s...
           23      1  gi|1052266331|emb|LT607411.1|  Micromonospora viridifac...
           24      1  gi|1033861152|gb|CP015876.1|  Pseudomonas putida SJTE-1...
           25      1  gi|952467264|gb|CP013129.1|  Streptomyces venezuelae st...
           26      1  gi|941153505|gb|CP007213.1|  Burkholderia plantarii str...
           27      1  gi|932864506|emb|LN881739.1|  Streptomyces venezuelae g...
           28      1  gi|2364306959|ref|XM_052380290.1|  PREDICTED: Dreissena...
           29      1  gi|2317440212|emb|OX346715.1|  Hemistola chrysoprasaria...
           ~~~
           47      1  gi|2089568073|dbj|AP024650.1|  Arthrobacter sp. StoSoil...
           48      1  gi|2070123745|gb|CP079095.1|  Methylococcus sp. Mc7 chr...
           49      1  gi|1893325938|gb|CP049017.1|  Xanthomonas theicola stra...

5. Review a single hit.

>>> result = results[0]
>>> hits = result.hits
>>> len(hits)
50

>>> hit = hits[0]
>>> hit
Hit(id='gi|9625381|ref|NC_001332.1|', query_id='No', 1 hsps)

>>> print(hit)
Query: No
       definition line
  Hit: gi|9625381|ref|NC_001332.1| (6744)
       Enterobacteria phage I2-2, complete genome
 HSPs: ----  --------  ---------  ------  ---------------  ---------------------
          #   E-value  Bit score    Span      Query range              Hit range
       ----  --------  ---------  ------  ---------------  ---------------------
          0   1.8e-11      83.34     128         [15:143]            [4760:4888]
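The ranges in this table read like Python slices: the start is 0-based and the end is exclusive, so the span is simply the end minus the start. The sketch below works through that arithmetic with the values printed above. The helper names are illustrative, and the 1-based conversion assumes the usual inclusive coordinates of BLAST's own reports.

```python
def span(start, end):
    """Length of a 0-based, end-exclusive range."""
    return end - start


def to_one_based(start, end):
    """Convert a 0-based, end-exclusive range to 1-based inclusive positions."""
    return start + 1, end


query_range = (15, 143)
hit_range = (4760, 4888)

print(span(*query_range))          # 128
print(span(*hit_range))            # 128
print(to_one_based(*query_range))  # (16, 143)
print(to_one_based(*hit_range))    # (4761, 4888)

# The same range slices the aligned part straight out of the query string.
query_dna = (
    "gcggaaccgaacgcggcgaccaactatgcgaccgaagcgatggatagcctgaaaacccag"
    "gcgattgatctgattagccagacctggccggtggtgaccaccgtggtggtggcgggcctg"
    "gtgattcgcctgtttaaaaaatttagcagcaaagcg"
)
aligned = query_dna[query_range[0]:query_range[1]]
print(len(aligned))          # 128
print(aligned[:9].upper())   # GCGACCAAC
print(aligned[-5:].upper())  # AAATT
```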

6. Review an HSP (high-scoring segment pair, that is, one alignment) in a hit.

>>> hsps = hit.hsps
>>> hsp = hsps[0]
>>> print(hsp)
      Query: No definition line
        Hit: gi|9625381|ref|NC_001332.1| Enterobacteria phage I2-2, complete ...
Query range: [15:143] (1)
  Hit range: [4760:4888] (1)
Quick stats: evalue 1.8e-11; bitscore 83.34
  Fragments: 1 (128 columns)
     Query - GCGACCAACTATGCGACCGAAGCGATGGATAGCCTGAAAACCCAGGCGATTGATCTGAT~~~AAATT
             || |||| ||| || || ||||| ||| | ||||||||||| ||||| | |||||| ||~~~|||||
       Hit - GCTACCAGCTACGCTACTGAAGCAATGAACAGCCTGAAAACTCAGGCAACTGATCTCAT~~~AAATT
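In the line between the two sequences, `|` marks a column where the query and the hit carry the same base. As a rough illustration, the sketch below rebuilds that line and counts the matches over only the first 59 columns shown above (the part before `~~~`), so the figure is not the identity of the full 128-column HSP. The variable names are illustrative.

```python
# First 59 columns of the alignment printed above (the part before "~~~").
query_aln = "GCGACCAACTATGCGACCGAAGCGATGGATAGCCTGAAAACCCAGGCGATTGATCTGAT"
hit_aln = "GCTACCAGCTACGCTACTGAAGCAATGAACAGCCTGAAAACTCAGGCAACTGATCTCAT"

# Rebuild the match line: "|" where the bases agree, a space where they differ.
match_line = "".join("|" if q == h else " " for q, h in zip(query_aln, hit_aln))
matches = match_line.count("|")

print(query_aln)
print(match_line)
print(hit_aln)
print(matches, len(query_aln))  # 47 59
```

For the complete alignment, the HSP object itself is the better source; Biopython documents attributes such as `ident_num` for BLAST XML results.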

 

2023-05-09