Use Bio.SearchIO Module to Parse BLAST XML Result


How to Use Bio.SearchIO Module to Parse BLAST XML Result?



The Bio.SearchIO module parses sequence search results from a variety of output formats, including BLAST XML.

1. Run the following code to search the "nt" database with the "blastn" program. The query is a DNA sequence that was reverse-translated from a protein sequence, AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA, taken from the "PF05371_seed.faa" file.

fyicenter$ python
>>> from Bio.Seq import Seq
>>> query = Seq(
...     "gcggaaccgaacgcggcgaccaactatgcgaccgaagcgatggatagcctgaaaacccag"
...     "gcgattgatctgattagccagacctggccggtggtgaccaccgtggtggtggcgggcctg"
...     "gtgattcgcctgtttaaaaaatttagcagcaaagcg"
... )

>>> from Bio.Blast import NCBIWWW
>>> result_handle = NCBIWWW.qblast("blastn", "nt", query)
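
Step 1 describes the query as a reverse translation of the protein sequence. One way to double-check this before sending the search is to translate the DNA back. The sketch below uses only the standard library and the standard genetic code. The names `CODON_TABLE` and `translate` are illustrative and not part of Biopython, whose `Seq.translate()` method offers the same kind of check.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# The query sequence from step 1.
query_dna = (
    "gcggaaccgaacgcggcgaccaactatgcgaccgaagcgatggatagcctgaaaacccag"
    "gcgattgatctgattagccagacctggccggtggtgaccaccgtggtggtggcgggcctg"
    "gtgattcgcctgtttaaaaaatttagcagcaaagcg"
)

protein = translate(query_dna)
print(len(query_dna), protein)
```

The query length of 156 bases matches the "Query: No (156)" line that appears when the result is printed in step 4.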

2. Write the query result to a local file.

>>> with open("my_blast.xml", "w") as out_handle:
...     out_handle.write(result_handle.read())
...
>>> result_handle.close()
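
Reading a handle consumes it, so saving the result to a file makes it possible to parse the same search again without repeating the online query. A minimal sketch of that save step as a reusable helper is shown below. The function name `save_search_result` is hypothetical, and an `io.StringIO` object with placeholder content stands in for the handle returned by `NCBIWWW.qblast()` so that the sketch runs without network access.

```python
import io
import os
import tempfile


def save_search_result(result_handle, path):
    """Write everything left in result_handle to path and close the handle.

    Returns the number of characters written.
    """
    with open(path, "w") as out_handle:
        written = out_handle.write(result_handle.read())
    result_handle.close()
    return written


# Placeholder content; a real handle would carry the full BLAST XML report.
fake_handle = io.StringIO("<BlastOutput></BlastOutput>")
path = os.path.join(tempfile.mkdtemp(), "my_blast.xml")

count = save_search_result(fake_handle, path)
with open(path) as saved:
    content = saved.read()
print(count, content)
```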

3. Read the file back and parse the query result with the Bio.SearchIO.parse() function.

>>> result_handle = open("my_blast.xml")

>>> from Bio import SearchIO
>>> blast_qresult = SearchIO.parse(result_handle, "blast-xml")

>>> results = list(blast_qresult)
>>> len(results)
1

>>> results[0]
QueryResult(id='No', 50 hits)
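
The value of `len(results)` reflects how the saved file is organized: in the legacy BLAST XML layout, each query is wrapped in an `<Iteration>` element and each hit in a `<Hit>` element. The sketch below counts both with the standard library. The `SAMPLE` string is a hand-trimmed, hypothetical fragment that keeps only a few tags (real files contain many more), and `count_hits_per_query` is an illustrative name.

```python
import io
import xml.etree.ElementTree as ET

# Hand-trimmed fragment for illustration only; not a complete BLAST report.
SAMPLE = """<BlastOutput>
  <BlastOutput_program>blastn</BlastOutput_program>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit><Hit_id>gi|9625381|ref|NC_001332.1|</Hit_id></Hit>
        <Hit><Hit_id>gi|14920|emb|X14336.1|</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def count_hits_per_query(handle):
    """Return one hit count per <Iteration> element, in file order."""
    counts = []
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "Iteration":
            counts.append(len(elem.findall("./Iteration_hits/Hit")))
            elem.clear()  # release the parsed subtree
    return counts


counts = count_hits_per_query(io.StringIO(SAMPLE))
print(counts)
```

For a single-query search like the one in this tutorial, the list is expected to have one entry. Bio.SearchIO remains the more convenient route, since it turns these elements into QueryResult, Hit and HSP objects.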

4. Print out the query result.

>>> print(results[0])
Program: blastn (2.13.0+)
  Query: No (156)
         definition line
 Target: nt
   Hits: ----  -----  ----------------------------------------------------------
            #  # HSP  ID + description
         ----  -----  ----------------------------------------------------------
            0      1  gi|9625381|ref|NC_001332.1|  Enterobacteria phage I2-2,...
            1      1  gi|14920|emb|X14336.1|  Filamentous Bacteriophage I2-2 ...
            2      2  gi|1844100906|gb|CP053326.1|  Salmonella enterica subsp...
            3      1  gi|2321361016|gb|CP107717.1|  Xanthomonas campestris pv...
            4      1  gi|2232737947|gb|CP075146.1|  Xanthomonas campestris pv...
            5      1  gi|2095686827|gb|CP066978.1|  Xanthomonas campestris pv...
            6      1  gi|1913269481|gb|CP062066.1|  Xanthomonas campestris st...
            7      1  gi|1864553229|gb|CP058243.1|  Xanthomonas campestris pv...
            8      1  gi|341934791|gb|CP002789.1|  Xanthomonas campestris pv....
            9      1  gi|1860091948|gb|CP054912.1|  Pantoea ananatis strain F...
           10      1  gi|1086024185|emb|LT629791.1|  Jiangella alkaliphila st...
           11      1  gi|2129627975|gb|CP086009.1|  Pantoea ananatis strain V...
           12      1  gi|1057948474|gb|CP015992.1|  Pseudomonas sp. TCU-HL1, ...
           13      1  gi|984699415|gb|CP014207.1|  Pantoea ananatis strain R1...
           14      1  gi|354986417|gb|CP003085.1|  Pantoea ananatis PA13, com...
           15      1  gi|1858692958|gb|CP054803.1|  Acinetobacter lwoffii str...
           16      1  gi|1712751337|gb|CP036319.1|  Crateriforma conspicua st...
           17      1  gi|2317627232|gb|CP083759.1|  Acinetobacter pseudolwoff...
           18      1  gi|2215523186|gb|CP094344.1|  Streptomyces sp. HP-A2021...
           19      1  gi|2086772708|gb|CP080636.1|  Acinetobacter lwoffii str...
           20      1  gi|1482407573|gb|CP032427.1|  Streptomyces griseorubigi...
           21      1  gi|1129998368|ref|XM_019858453.1|  PREDICTED: Hippocamp...
           22      1  gi|1085627832|emb|LT629688.1|  Auraticoccus monumenti s...
           23      1  gi|1052266331|emb|LT607411.1|  Micromonospora viridifac...
           24      1  gi|1033861152|gb|CP015876.1|  Pseudomonas putida SJTE-1...
           25      1  gi|952467264|gb|CP013129.1|  Streptomyces venezuelae st...
           26      1  gi|941153505|gb|CP007213.1|  Burkholderia plantarii str...
           27      1  gi|932864506|emb|LN881739.1|  Streptomyces venezuelae g...
           28      1  gi|2364306959|ref|XM_052380290.1|  PREDICTED: Dreissena...
           29      1  gi|2317440212|emb|OX346715.1|  Hemistola chrysoprasaria...
           ~~~
           47      1  gi|2089568073|dbj|AP024650.1|  Arthrobacter sp. StoSoil...
           48      1  gi|2070123745|gb|CP079095.1|  Methylococcus sp. Mc7 chr...
           49      1  gi|1893325938|gb|CP049017.1|  Xanthomonas theicola stra...
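
The hit IDs in the table pack several identifiers into one pipe-delimited string. A small sketch for pulling them apart is shown below, assuming the simple "tag|value" pairs seen above (other NCBI ID styles may need extra handling). The helper name `split_ncbi_id` is hypothetical.

```python
def split_ncbi_id(hit_id):
    """Split an ID such as 'gi|9625381|ref|NC_001332.1|' into tag/value pairs."""
    fields = hit_id.strip("|").split("|")
    # Fields alternate between a tag (gi, ref, gb, emb, dbj) and its value.
    return dict(zip(fields[0::2], fields[1::2]))


# IDs copied from the first three rows of the hit table.
for hit_id in (
    "gi|9625381|ref|NC_001332.1|",
    "gi|14920|emb|X14336.1|",
    "gi|1844100906|gb|CP053326.1|",
):
    print(split_ncbi_id(hit_id))
```

The same function can be applied to `hit.id` for each hit in `results[0]`.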

5. Review a single hit.

>>> result = results[0]
>>> hits = result.hits
>>> len(hits)
50

>>> hit = hits[0]
>>> hit
Hit(id='gi|9625381|ref|NC_001332.1|', query_id='No', 1 hsps)

>>> print(hit)
Query: No
       definition line
  Hit: gi|9625381|ref|NC_001332.1| (6744)
       Enterobacteria phage I2-2, complete genome
 HSPs: ----  --------  ---------  ------  ---------------  ---------------------
          #   E-value  Bit score    Span      Query range              Hit range
       ----  --------  ---------  ------  ---------------  ---------------------
          0   1.8e-11      83.34     128         [15:143]            [4760:4888]
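
The ranges printed in the table, such as [15:143], follow Python slice conventions: the start is zero-based and the end is exclusive, so the span is simply end minus start. BLAST reports themselves use one-based, inclusive positions. The sketch below converts between the two; the function names are illustrative.

```python
def span(start, end):
    """Length of a zero-based, end-exclusive range."""
    return end - start


def to_one_based(start, end):
    """Convert a zero-based, end-exclusive range to one-based, inclusive."""
    return start + 1, end


# Ranges taken from the hit table above.
query_range = (15, 143)
hit_range = (4760, 4888)

print(span(*query_range), to_one_based(*query_range))
print(span(*hit_range), to_one_based(*hit_range))
```

Both spans come out as 128, matching the "Span" column.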

6. Review an HSP (high-scoring segment pair, that is, an alignment) in a hit.

>>> hsps = hit.hsps
>>> hsp = hsps[0]
>>> print(hsp)
      Query: No definition line
        Hit: gi|9625381|ref|NC_001332.1| Enterobacteria phage I2-2, complete ...
Query range: [15:143] (1)
  Hit range: [4760:4888] (1)
Quick stats: evalue 1.8e-11; bitscore 83.34
  Fragments: 1 (128 columns)
             || |||| ||| || || ||||| ||| | ||||||||||| ||||| | |||||| ||~~~|||||
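
The values shown by print(hsp) can also be read as attributes, which is handy for filtering or exporting. The sketch below collects a few of them into a dictionary. It relies on the attribute names exposed by Bio.SearchIO HSP objects (`evalue`, `bitscore`, `query_start`, `query_end`, `hit_start`, `hit_end`). The helper `hsp_to_row` is hypothetical, and a `SimpleNamespace` stand-in carrying the numbers printed above is used so that the sketch runs without Biopython or a saved result file.

```python
from types import SimpleNamespace


def hsp_to_row(hsp):
    """Collect the headline numbers of an HSP-like object into a dict."""
    return {
        "evalue": hsp.evalue,
        "bitscore": hsp.bitscore,
        "query_range": (hsp.query_start, hsp.query_end),
        "hit_range": (hsp.hit_start, hsp.hit_end),
    }


# Stand-in for the real HSP; values copied from the printed output above.
stand_in = SimpleNamespace(
    evalue=1.8e-11,
    bitscore=83.34,
    query_start=15,
    query_end=143,
    hit_start=4760,
    hit_end=4888,
)

row = hsp_to_row(stand_in)
is_significant = row["evalue"] < 1e-5  # example threshold, chosen arbitrarily
print(row, is_significant)
```

With a parsed result, the same call would be `hsp_to_row(hsp)` using the `hsp` object from this step.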



2023-05-09