Fetch Sequences from NCBI with Bio.Blast.NCBIWWW.qblast()
How to Fetch Sequences from NCBI with Bio.Blast.NCBIWWW.qblast()?
✍: FYIcenter.com
The qblast() function in the Bio.Blast.NCBIWWW module lets you
run a search on the online version of BLAST at
https://blast.ncbi.nlm.nih.gov/Blast.cgi and retrieve the matching DNA or protein sequences.
Currently, the qblast() function works with only 5 online BLAST programs: blastn, blastp, blastx, tblastn and tblastx. Each program supports its own set of databases. For example, "nt" is a database available to the "blastn" program.
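Because only five program names are accepted, it can help to check the name before sending a request over the network. The helper below is a minimal sketch; check_program() and SUPPORTED_PROGRAMS are hypothetical names introduced for illustration and are not part of Biopython.

```python
# The five online programs that qblast() accepts.
SUPPORTED_PROGRAMS = ("blastn", "blastp", "blastx", "tblastn", "tblastx")


def check_program(program):
    """Return the program name unchanged, or raise ValueError if unsupported.

    Hypothetical helper for illustration; not part of Biopython.
    """
    if program not in SUPPORTED_PROGRAMS:
        allowed = ", ".join(SUPPORTED_PROGRAMS)
        raise ValueError(f"unsupported program {program!r}; expected one of: {allowed}")
    return program
```

For example, check_program("blastn") returns "blastn", while check_program("tblast") raises ValueError before any request is made.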
1. Try the following code to search the "nt" database with blastn, using the sequence with GI number 8332116 as the query.
fyicenter$ python
>>> from Bio.Blast import NCBIWWW
>>> result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
2. Write the query result to a local file.
>>> with open("my_blast.xml", "w") as out_handle:
...     out_handle.write(result_handle.read())
...
138648
>>> result_handle.close()
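Steps 1 and 2 can be folded into one helper, sketched below under a few assumptions. save_blast_result() and its search parameter are hypothetical names introduced here. The search hook only exists so the helper can be exercised offline with a canned reply; by default it falls back to NCBIWWW.qblast.

```python
import io
import os
import tempfile


def save_blast_result(query, out_path, search=None,
                      program="blastn", database="nt"):
    """Run a BLAST search and write the raw XML reply to out_path.

    `search` is a hypothetical hook: any callable taking
    (program, database, query) and returning a readable handle.
    When omitted, Bio.Blast.NCBIWWW.qblast is used (needs Biopython
    and network access).
    """
    if search is None:
        from Bio.Blast import NCBIWWW  # imported lazily on purpose
        search = NCBIWWW.qblast
    handle = search(program, database, query)
    try:
        xml_text = handle.read()
    finally:
        handle.close()  # always release the handle, even on a read error
    with open(out_path, "w") as out_handle:
        out_handle.write(xml_text)
    return len(xml_text)


def offline_demo():
    """Exercise the helper without network access, using a canned reply."""
    canned = "<BlastOutput/>"  # placeholder text, not a real BLAST result
    path = os.path.join(tempfile.mkdtemp(), "my_blast.xml")
    size = save_blast_result("8332116", path,
                             search=lambda *args: io.StringIO(canned))
    with open(path) as saved:
        return size, saved.read()
```

With Biopython installed and NCBI reachable, save_blast_result("8332116", "my_blast.xml") would perform the same search and save as the session above.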
3. Open the local file, parse it, and convert the result into a list of BLAST record objects.
>>> from Bio.Blast import NCBIXML
>>> result_handle = open("my_blast.xml")
>>> blast_records = NCBIXML.parse(result_handle)
>>> blast_records = list(blast_records)
>>> len(blast_records)
1
4. Review properties in the BLAST record.
>>> blast_record = blast_records[0]
>>> blast_record.__dict__.keys()
dict_keys(['application', 'version', 'date', 'reference', 'query', 'query_letters', 'database', 'database_sequences', 'database_letters', 'database_name', 'posted_date', 'num_letters_in_database', 'num_sequences_in_database', 'ka_params', 'gapped', 'ka_params_gap', 'matrix', 'gap_penalties', 'sc_match', 'sc_mismatch', 'num_hits', 'num_sequences', 'num_good_extends', 'num_seqs_better_e', 'hsps_no_gap', 'hsps_prelim_gapped', 'hsps_prelim_gapped_attemped', 'hsps_gapped', 'query_id', 'query_length', 'database_length', 'effective_hsp_length', 'effective_query_length', 'effective_database_length', 'effective_search_space', 'effective_search_space_used', 'frameshift', 'threshold', 'window_size', 'dropoff_1st_pass', 'gap_x_dropoff', 'gap_x_dropoff_final', 'gap_trigger', 'blast_cutoff', 'descriptions', 'alignments', 'multiple_alignment', 'filter', 'expect'])
>>> blast_record.application
'BLASTN'
>>> blast_record.version
'2.13.0+'
>>> blast_record.database
'nt'
>>> blast_record.query_id
'BE037100.1'
>>> blast_record.query
"MP14H09 MP Mesembryanthemum crystallinum cDNA 5' similar to cold acclimation protein, mRNA sequence"
>>> blast_record.reference
'Stephen F. Altschul, Thomas L. Madden, ...'
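When only a few of these properties matter, they can be collected into a dictionary. The sketch below assumes only the attribute names listed in the output above. describe_record() and FIELDS are hypothetical names, and the demo object is a stand-in built from the values printed in the session, not a real Biopython record.

```python
from types import SimpleNamespace

# Attribute names taken from the dict_keys() listing of the BLAST record.
FIELDS = ("application", "version", "database", "query_id", "query_length")


def describe_record(record):
    """Collect a few headline properties of a BLAST record into a dict.

    Hypothetical helper; missing attributes are reported as None.
    """
    summary = {name: getattr(record, name, None) for name in FIELDS}
    summary["alignment_count"] = len(getattr(record, "alignments", []))
    return summary


# Stand-in object for illustration; a real record comes from NCBIXML.parse().
demo_record = SimpleNamespace(
    application="BLASTN",
    version="2.13.0+",
    database="nt",
    query_id="BE037100.1",
    alignments=[object()] * 50,  # placeholders for 50 alignments
)
demo_summary = describe_record(demo_record)
```

In a live session, describe_record(blast_record) should work the same way on the record parsed in step 3.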
5. Review HSPs (High-Scoring Pairs) in the alignments of the BLAST record.
>>> print(blast_record.multiple_alignment)
None
>>> alignments = blast_record.alignments
>>> len(alignments)
50
>>> alignment = alignments[0]
>>> print(alignment)
gi|1219041180|ref|XM_021875076.1| PREDICTED: Chenopodium quinoa cold-regulated 413 plasma membrane protein 2-like (LOC110697660), mRNA
Length = 1173
>>> hsps = alignment.hsps
>>> len(hsps)
1
>>> print(hsps[0])
Score 482 (435 bits), expectation 6.5e-117, alignment length 624
Query:      59 ACAGAAAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAACTGA...GTA 678
               || ||||||||| |||| | |||| ||  |||| |||| | ||||... ||
Sbjct:     278 ACCGAAAATGGGCAGAGGAGTGAATTATATGGCAATGACACCTGA...TTA 901
>>> print(alignments[49].hsps[0])
Score 355 (321 bits), expectation 3.6e-82, alignment length 601
Query:      56 TGAACAGAAAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAAC...CTG 655
               ||||  ||||||||||||    ||| ||||  |||| ||||||||...|||
Sbjct:      59 TGAAACGAAAATGGGGAGG---ATGGAGTATCTGGCTATGAAAAC...CTG 652
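With 50 alignments in the record, a common next step is to keep only the HSPs below an E-value cutoff. The sketch below relies on the alignments, hsps, title, expect and align_length attributes of the record objects. hits_below() is a hypothetical helper, and the demo data are stand-ins that reuse the two E-values printed above with shortened placeholder titles.

```python
from types import SimpleNamespace


def hits_below(record, max_expect):
    """Return (title, expect, align_length) for HSPs with expect <= max_expect.

    Hypothetical helper; rows are sorted from most to least significant.
    """
    rows = []
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect <= max_expect:
                rows.append((alignment.title, hsp.expect, hsp.align_length))
    return sorted(rows, key=lambda row: row[1])


# Stand-in record for illustration; titles are placeholders.
demo_record = SimpleNamespace(alignments=[
    SimpleNamespace(
        title="first hit (placeholder title)",
        hsps=[SimpleNamespace(expect=6.5e-117, align_length=624)],
    ),
    SimpleNamespace(
        title="last hit (placeholder title)",
        hsps=[SimpleNamespace(expect=3.6e-82, align_length=601)],
    ),
])
strong_hits = hits_below(demo_record, 1e-100)
```

In a live session, hits_below(blast_record, 1e-100) would apply the same cutoff to the parsed record. The cutoff value itself is an arbitrary choice for the example.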
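Note that Biopython 1.80 on macOS may raise an error when calling the qblast() function. In the session below, the traceback points to the urlopen() call at line 226 of NCBIWWW.py and reports an SSL certificate verification failure. You can put a debug statement at that line to investigate further.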
>>> import Bio
>>> Bio.__version__
'1.80'
>>> from Bio.Blast import NCBIWWW
>>> result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/Bio/Blast/NCBIWWW.py", line 226, in qblast
    handle = urlopen(request)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
...
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1108)>
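Instead of editing NCBIWWW.py, the failure can be inspected from the calling side by catching urllib.error.URLError and looking at its reason attribute. The wrapper below is a minimal sketch. diagnose_search() is a hypothetical name, and it accepts any callable so that it can be tried without network access.

```python
import ssl
import urllib.error


def diagnose_search(search, *args, **kwargs):
    """Call `search` and return (handle, None) or (None, explanation).

    Hypothetical wrapper; `search` would normally be NCBIWWW.qblast.
    """
    try:
        return search(*args, **kwargs), None
    except urllib.error.URLError as err:
        reason = err.reason
        # Certificate problems surface as ssl.SSLCertVerificationError.
        if isinstance(reason, ssl.SSLCertVerificationError):
            return None, f"certificate verification failed: {reason}"
        return None, f"network error: {reason}"
```

A call such as diagnose_search(NCBIWWW.qblast, "blastn", "nt", "8332116") would then return the explanation text instead of raising, which makes it easier to tell a certificate problem apart from other connection failures.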
2023-05-09, 564🔥, 0💬