Fetch Sequences from NCBI with Bio.Blast.NCBIWWW.qblast()
How to Fetch Sequences from NCBI with Bio.Blast.NCBIWWW.qblast()?
✍: FYIcenter.com
The function qblast() in the Bio.Blast.NCBIWWW module allows you
to call the online version of BLAST to fetch DNA or protein sequences
from https://blast.ncbi.nlm.nih.gov/Blast.cgi.
Currently the qblast() function works with only five online BLAST programs: blastn, blastp, blastx, tblastn and tblastx. Each program supports its own set of databases. For example, "nt" is a database available to the "blastn" program.
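As a quick reference, the five program names can be kept in a small lookup table. This is only a sketch: BLAST_PROGRAMS and describe_program() are hypothetical names introduced here, not part of Biopython, and the table records the usual query/database pairing for each program rather than the list of databases NCBI offers.

```python
# Hypothetical lookup table (not part of Biopython): the kind of query and
# database each of the five programs expects.
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("translated nucleotide", "protein"),
    "tblastn": ("protein", "translated nucleotide"),
    "tblastx": ("translated nucleotide", "translated nucleotide"),
}


def describe_program(program):
    """Return a one-line description, or raise ValueError for an unknown name."""
    try:
        query_type, db_type = BLAST_PROGRAMS[program]
    except KeyError:
        known = ", ".join(sorted(BLAST_PROGRAMS))
        raise ValueError(f"unknown program {program!r}; expected one of: {known}") from None
    return f"{program}: {query_type} query against a {db_type} database"
```

Calling describe_program("blastn") before a search is a cheap way to catch a mistyped program name locally instead of waiting for the server to reject it.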
1. Run the following code to search the "nt" database with blastn, using the sequence with GI number 8332116 as the query.
fyicenter$ python
>>> from Bio.Blast import NCBIWWW
>>> result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
2. Write the query result to a local file.
>>> with open("my_blast.xml", "w") as out_handle:
...     out_handle.write(result_handle.read())
...
138648
>>> result_handle.close()
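Steps 1 and 2 can also be combined in a script. The sketch below assumes the handle returned by qblast() yields text, as it does in the session above; save_blast_xml() and fetch_and_save() are hypothetical helper names, not Biopython functions.

```python
def save_blast_xml(result_handle, path):
    """Write the whole result to `path` and return the number of characters written."""
    with open(path, "w") as out_handle:
        return out_handle.write(result_handle.read())


def fetch_and_save(program, database, query, path):
    """Run an online BLAST search and store the XML result locally."""
    # Imported here so save_blast_xml() stays usable without Biopython installed.
    from Bio.Blast import NCBIWWW

    result_handle = NCBIWWW.qblast(program, database, query)
    try:
        return save_blast_xml(result_handle, path)
    finally:
        # Close the handle even if writing the file fails.
        result_handle.close()
```

For the search shown above, the call would be fetch_and_save("blastn", "nt", "8332116", "my_blast.xml"). Saving the result first means later parsing experiments do not need to repeat the online search.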
3. Open the local file, then parse it into a list of BLAST record objects.
>>> from Bio.Blast import NCBIXML
>>> result_handle = open("my_blast.xml")
>>> blast_records = NCBIXML.parse(result_handle)
>>> blast_records = list(blast_records)
>>> len(blast_records)
1
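To cross-check the parsed result without Biopython, the saved file can be inspected with the standard library. This is a swapped-in technique, not how NCBIXML works internally. It assumes the file is in the classic BLAST XML layout, where each query is an Iteration element and each matched database sequence is a Hit element. count_iterations_and_hits() is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def count_iterations_and_hits(source):
    """Return (number of Iteration elements, number of Hit elements).

    `source` may be a file name or an open file object.
    """
    root = ET.parse(source).getroot()
    # iter("Hit") matches the tag exactly, so Hit_id, Hit_def, etc. are not counted.
    iterations = sum(1 for _ in root.iter("Iteration"))
    hits = sum(1 for _ in root.iter("Hit"))
    return iterations, hits
```

Under those assumptions, count_iterations_and_hits("my_blast.xml") should report one iteration for a single-query search, matching the single record returned by NCBIXML.parse().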
4. Review properties in the BLAST record.
>>> blast_record = blast_records[0]
>>> blast_record.__dict__.keys()
dict_keys(['application', 'version', 'date', 'reference', 'query', 'query_letters', 'database', 'database_sequences', 'database_letters', 'database_name', 'posted_date', 'num_letters_in_database', 'num_sequences_in_database', 'ka_params', 'gapped', 'ka_params_gap', 'matrix', 'gap_penalties', 'sc_match', 'sc_mismatch', 'num_hits', 'num_sequences', 'num_good_extends', 'num_seqs_better_e', 'hsps_no_gap', 'hsps_prelim_gapped', 'hsps_prelim_gapped_attemped', 'hsps_gapped', 'query_id', 'query_length', 'database_length', 'effective_hsp_length', 'effective_query_length', 'effective_database_length', 'effective_search_space', 'effective_search_space_used', 'frameshift', 'threshold', 'window_size', 'dropoff_1st_pass', 'gap_x_dropoff', 'gap_x_dropoff_final', 'gap_trigger', 'blast_cutoff', 'descriptions', 'alignments', 'multiple_alignment', 'filter', 'expect'])
>>> blast_record.application
'BLASTN'
>>> blast_record.version
'2.13.0+'
>>> blast_record.database
'nt'
>>> blast_record.query_id
'BE037100.1'
>>> blast_record.query
"MP14H09 MP Mesembryanthemum crystallinum cDNA 5' similar to cold acclimation protein, mRNA sequence"
>>> blast_record.reference
'Stephen F. Altschul, Thomas L. Madden, ...'
5. Review HSPs (High-scoring Segment Pairs) in the alignments of the BLAST record.
>>> print(blast_record.multiple_alignment)
None
>>> alignments = blast_record.alignments
>>> len(alignments)
50
>>> alignment = alignments[0]
>>> print(alignment)
gi|1219041180|ref|XM_021875076.1| PREDICTED: Chenopodium quinoa cold-regulated 413 plasma membrane protein 2-like (LOC110697660), mRNA
Length = 1173
>>> hsps = alignment.hsps
>>> len(hsps)
1
>>> print(hsps[0])
Score 482 (435 bits), expectation 6.5e-117, alignment length 624
Query: 59 ACAGAAAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAACTGA...GTA 678
|| ||||||||| |||| | |||| || |||| |||| | ||||... ||
Sbjct: 278 ACCGAAAATGGGCAGAGGAGTGAATTATATGGCAATGACACCTGA...TTA 901
>>> print(alignments[49].hsps[0])
Score 355 (321 bits), expectation 3.6e-82, alignment length 601
Query: 56 TGAACAGAAAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAAC...CTG 655
|||| |||||||||||| ||| |||| |||| ||||||||...|||
Sbjct: 59 TGAAACGAAAATGGGGAGG---ATGGAGTATCTGGCTATGAAAAC...CTG 652
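Rather than printing HSPs one at a time, you can loop over all alignments and keep only those below an E-value cutoff. The sketch below relies on the alignments, hsps, title, length and expect attributes of the record objects. The function name significant_hsps() and the default cutoff of 1e-100 are illustrative choices, not Biopython defaults.

```python
def significant_hsps(blast_record, e_value_threshold=1e-100):
    """Yield (title, length, hsp) for every HSP whose E-value is below the cutoff."""
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < e_value_threshold:
                yield alignment.title, alignment.length, hsp


def print_significant_hsps(blast_record, e_value_threshold=1e-100):
    """Print a one-line summary for each HSP that passes the cutoff."""
    for title, length, hsp in significant_hsps(blast_record, e_value_threshold):
        print(f"{title} (length {length}): E-value {hsp.expect:g}")
```

With the record from this session, the first HSP shown above (expectation 6.5e-117) would pass a 1e-100 cutoff, while the HSP of alignments[49] (expectation 3.6e-82) would not.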
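Note that Biopython 1.80 on macOS may raise an error when calling the qblast() function. The traceback below shows that the failure is an SSL certificate verification error raised inside urlopen(), before any BLAST result is returned. You can put a debug statement in NCBIWWW.py at line 226 to investigate further.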
>>> import Bio
>>> Bio.__version__
'1.80'
>>> from Bio.Blast import NCBIWWW
>>> result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/Bio/Blast/NCBIWWW.py", line 226, in qblast
    handle = urlopen(request)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  ...
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1108)>
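Before editing NCBIWWW.py, it may help to check where Python looks for CA certificates. The sketch below uses only the standard library. It assumes qblast() reaches the network through urllib.request.urlopen(), as the traceback suggests, so an opener installed with install_opener() should be picked up. Both function names are hypothetical, and the CA bundle path is something you must supply yourself (for example, a bundle provided by your network administrator or by a package such as certifi). Whether this resolves the error depends on why verification fails on your machine.

```python
import ssl
import urllib.request


def describe_ca_locations():
    """Return the default CA file and directory paths Python's ssl module uses."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
    }


def install_opener_with_ca_bundle(cafile=None):
    """Install a global urllib opener that verifies HTTPS against `cafile`.

    With cafile=None the system default certificates are loaded instead.
    """
    context = ssl.create_default_context(cafile=cafile)
    opener = urllib.request.build_opener(urllib.request.HTTPSHandler(context=context))
    urllib.request.install_opener(opener)
    return opener
```

If describe_ca_locations() reports None for both cafile and capath, Python found no default certificates at the expected locations, which would be one possible explanation for the failure. Certificate verification stays enabled in this sketch; it only changes which CA bundle is trusted.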
2023-05-09