Use Bio.SearchIO Module to Parse BLAST XML Result
How to Use Bio.SearchIO Module to Parse BLAST XML Result?
✍: FYIcenter.com
The Bio.SearchIO module parses sequence search results
from several output formats, including BLAST XML.
1. Run the following code to search the "nt" database with the "blastn" program. The query is a DNA sequence reverse-translated from the protein sequence AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRLFKKFSSKA, taken from the "PF05371_seed.faa" file.
fyicenter$ python
>>> from Bio.Seq import Seq
>>> query = Seq(
... "gcggaaccgaacgcggcgaccaactatgcgaccgaagcgatggatagcctgaaaacccag"
... +"gcgattgatctgattagccagacctggccggtggtgaccaccgtggtggtggcgggcctg"
... +"gtgattcgcctgtttaaaaaatttagcagcaaagcg"
... )
>>> from Bio.Blast import NCBIWWW
>>> result_handle = NCBIWWW.qblast("blastn", "nt", query)
2. Write the query result to a local file.
>>> with open("my_blast.xml", "w") as out_handle:
...     out_handle.write(result_handle.read())
...
138648
>>> result_handle.close()
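Saving the result first is useful because a handle is exhausted after one read(), and a repeated search costs another round trip to NCBI. The save step can be sketched as a small reusable helper. The name save_handle is hypothetical (it is not part of Biopython), and a StringIO object with placeholder text stands in for the handle returned by qblast():

```python
import io
import os
import shutil
import tempfile


def save_handle(handle, path):
    """Copy a text handle to a file, then close the handle (hypothetical helper)."""
    try:
        with open(path, "w") as out_handle:
            shutil.copyfileobj(handle, out_handle)
    finally:
        handle.close()  # close the source handle even if the write fails
    return path


# Stand-in for the handle returned by NCBIWWW.qblast(); the text is a placeholder.
fake_handle = io.StringIO("<BlastOutput></BlastOutput>")
demo_path = os.path.join(tempfile.mkdtemp(), "demo_blast.xml")
save_handle(fake_handle, demo_path)
```

With a real search, the call would be save_handle(result_handle, "my_blast.xml").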
3. Read the file back and parse the query result with the Bio.SearchIO.parse() function.
>>> result_handle = open("my_blast.xml")
>>> from Bio import SearchIO
>>> blast_qresult = SearchIO.parse(result_handle, "blast-xml")
>>> results = list(blast_qresult)
>>> len(results)
1
>>> results[0]
QueryResult(id='No', 50 hits)
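SearchIO.parse() returns an iterator with one QueryResult per query, which is why the session wraps it in list(). When the file holds exactly one query, as it does here, Biopython also offers SearchIO.read(), which returns that single QueryResult directly and raises ValueError otherwise. A minimal sketch of the same check is shown below. The helper names only_result and load_single_query are hypothetical; the "with" block also closes the file, which the session above leaves open.

```python
def only_result(results):
    """Return the single item of an iterable; raise ValueError otherwise."""
    iterator = iter(results)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("no query results found") from None
    try:
        next(iterator)
    except StopIteration:
        return first
    raise ValueError("more than one query result found")


def load_single_query(path, fmt="blast-xml"):
    """Parse a search output file that holds one query (requires Biopython)."""
    from Bio import SearchIO  # lazy import: only_result() works without Biopython

    with open(path) as handle:  # the file is closed when the block ends
        return only_result(SearchIO.parse(handle, fmt))
```

With the file from step 2, load_single_query("my_blast.xml") should return the same object as results[0].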
4. Print out the query result.
>>> print(results[0])
Program: blastn (2.13.0+)
Query: No (156)
definition line
Target: nt
Hits: ---- ----- ----------------------------------------------------------
# # HSP ID + description
---- ----- ----------------------------------------------------------
0 1 gi|9625381|ref|NC_001332.1| Enterobacteria phage I2-2,...
1 1 gi|14920|emb|X14336.1| Filamentous Bacteriophage I2-2 ...
2 2 gi|1844100906|gb|CP053326.1| Salmonella enterica subsp...
3 1 gi|2321361016|gb|CP107717.1| Xanthomonas campestris pv...
4 1 gi|2232737947|gb|CP075146.1| Xanthomonas campestris pv...
5 1 gi|2095686827|gb|CP066978.1| Xanthomonas campestris pv...
6 1 gi|1913269481|gb|CP062066.1| Xanthomonas campestris st...
7 1 gi|1864553229|gb|CP058243.1| Xanthomonas campestris pv...
8 1 gi|341934791|gb|CP002789.1| Xanthomonas campestris pv....
9 1 gi|1860091948|gb|CP054912.1| Pantoea ananatis strain F...
10 1 gi|1086024185|emb|LT629791.1| Jiangella alkaliphila st...
11 1 gi|2129627975|gb|CP086009.1| Pantoea ananatis strain V...
12 1 gi|1057948474|gb|CP015992.1| Pseudomonas sp. TCU-HL1, ...
13 1 gi|984699415|gb|CP014207.1| Pantoea ananatis strain R1...
14 1 gi|354986417|gb|CP003085.1| Pantoea ananatis PA13, com...
15 1 gi|1858692958|gb|CP054803.1| Acinetobacter lwoffii str...
16 1 gi|1712751337|gb|CP036319.1| Crateriforma conspicua st...
17 1 gi|2317627232|gb|CP083759.1| Acinetobacter pseudolwoff...
18 1 gi|2215523186|gb|CP094344.1| Streptomyces sp. HP-A2021...
19 1 gi|2086772708|gb|CP080636.1| Acinetobacter lwoffii str...
20 1 gi|1482407573|gb|CP032427.1| Streptomyces griseorubigi...
21 1 gi|1129998368|ref|XM_019858453.1| PREDICTED: Hippocamp...
22 1 gi|1085627832|emb|LT629688.1| Auraticoccus monumenti s...
23 1 gi|1052266331|emb|LT607411.1| Micromonospora viridifac...
24 1 gi|1033861152|gb|CP015876.1| Pseudomonas putida SJTE-1...
25 1 gi|952467264|gb|CP013129.1| Streptomyces venezuelae st...
26 1 gi|941153505|gb|CP007213.1| Burkholderia plantarii str...
27 1 gi|932864506|emb|LN881739.1| Streptomyces venezuelae g...
28 1 gi|2364306959|ref|XM_052380290.1| PREDICTED: Dreissena...
29 1 gi|2317440212|emb|OX346715.1| Hemistola chrysoprasaria...
~~~
47 1 gi|2089568073|dbj|AP024650.1| Arthrobacter sp. StoSoil...
48 1 gi|2070123745|gb|CP079095.1| Methylococcus sp. Mc7 chr...
49 1 gi|1893325938|gb|CP049017.1| Xanthomonas theicola stra...
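The printed table shows only hit IDs and HSP counts. To rank hits by significance, iterate over the QueryResult, which yields Hit objects, and read the evalue attribute of each HSP. The sketch below assumes only those attributes. The function name top_hits and the 1e-5 cutoff are illustrative choices, and the stand-in objects hold made-up values so that the code runs without Biopython:

```python
from types import SimpleNamespace


def top_hits(qresult, max_evalue=1e-5):
    """Return (hit id, best E-value) pairs for hits at or below max_evalue."""
    rows = []
    for hit in qresult:  # iterating a QueryResult yields Hit objects
        best = min((hsp.evalue for hsp in hit.hsps), default=float("inf"))
        if best <= max_evalue:
            rows.append((hit.id, best))
    return sorted(rows, key=lambda row: row[1])  # most significant first


# Made-up stand-ins for Hit and HSP objects (not real BLAST output).
demo = [
    SimpleNamespace(id="hitA", hsps=[SimpleNamespace(evalue=1.8e-11)]),
    SimpleNamespace(id="hitB", hsps=[SimpleNamespace(evalue=0.5),
                                     SimpleNamespace(evalue=2e-7)]),
    SimpleNamespace(id="hitC", hsps=[SimpleNamespace(evalue=3.0)]),
]
print(top_hits(demo))
```

With the parsed data from step 3, the call would be top_hits(results[0]).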
5. Review a single hit.
>>> result = results[0]
>>> hits = result.hits
>>> len(hits)
50
>>> hit = hits[0]
>>> hit
Hit(id='gi|9625381|ref|NC_001332.1|', query_id='No', 1 hsps)
>>> print(hit)
Query: No
definition line
Hit: gi|9625381|ref|NC_001332.1| (6744)
Enterobacteria phage I2-2, complete genome
HSPs: ---- -------- --------- ------ --------------- ---------------------
# E-value Bit score Span Query range Hit range
---- -------- --------- ------ --------------- ---------------------
0 1.8e-11 83.34 128 [15:143] [4760:4888]
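A QueryResult can also be indexed directly, by position (result[0]) or by hit ID (result["gi|9625381|ref|NC_001332.1|"]), without going through result.hits. The fields shown in the printout are available as attributes of the Hit object. A minimal sketch that collects them into a dictionary follows. The function name describe_hit is hypothetical, and the stand-in object copies the values printed above so that the code runs without Biopython:

```python
from types import SimpleNamespace


def describe_hit(hit):
    """Collect the main fields of a Hit-like object into a dictionary."""
    return {
        "id": hit.id,
        "description": hit.description,
        "length": hit.seq_len,       # length of the full hit sequence
        "hsp_count": len(hit.hsps),  # number of aligned regions
    }


# Stand-in built from the values in the printout above.
demo_hit = SimpleNamespace(
    id="gi|9625381|ref|NC_001332.1|",
    description="Enterobacteria phage I2-2, complete genome",
    seq_len=6744,
    hsps=[object()],  # placeholder for the single HSP
)
print(describe_hit(demo_hit))
```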
6. Review an HSP (high-scoring pair, a single aligned region) in a hit.
>>> hsps = hit.hsps
>>> hsp = hsps[0]
>>> print(hsp)
Query: No definition line
Hit: gi|9625381|ref|NC_001332.1| Enterobacteria phage I2-2, complete ...
Query range: [15:143] (1)
Hit range: [4760:4888] (1)
Quick stats: evalue 1.8e-11; bitscore 83.34
Fragments: 1 (128 columns)
Query - GCGACCAACTATGCGACCGAAGCGATGGATAGCCTGAAAACCCAGGCGATTGATCTGAT~~~AAATT
|| |||| ||| || || ||||| ||| | ||||||||||| ||||| | |||||| ||~~~|||||
Hit - GCTACCAGCTACGCTACTGAAGCAATGAACAGCCTGAAAACTCAGGCAACTGATCTCAT~~~AAATT
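The ranges in this printout follow Python slice conventions: they are 0-based and exclude the end position, so [15:143] spans 143 - 15 = 128 columns. BLAST reports itself use 1-based, inclusive coordinates. A minimal sketch of the conversion, together with a summary of the HSP attributes, follows. The function names are hypothetical, and the stand-in object copies the rounded values printed above so that the code runs without Biopython:

```python
from types import SimpleNamespace


def to_blast_coords(start, end):
    """Convert a 0-based, end-exclusive range to 1-based, inclusive form."""
    return start + 1, end


def hsp_summary(hsp):
    """Collect coordinates and statistics of an HSP-like object."""
    return {
        "query": to_blast_coords(hsp.query_start, hsp.query_end),
        "hit": to_blast_coords(hsp.hit_start, hsp.hit_end),
        "span": hsp.aln_span,  # number of alignment columns
        "evalue": hsp.evalue,
        "bitscore": hsp.bitscore,
    }


# Stand-in built from the rounded values in the printout above.
demo_hsp = SimpleNamespace(
    query_start=15, query_end=143,
    hit_start=4760, hit_end=4888,
    aln_span=128, evalue=1.8e-11, bitscore=83.34,
)
print(hsp_summary(demo_hsp))
```

With the parsed data, the call would be hsp_summary(hsp). The sketch does not report the strand, which SearchIO stores separately from the start and end positions.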
2023-05-09, 735🔥, 0💬