Fetch Sequences from NCBI with Bio.Blast.NCBIWWW.qblast()
How to Fetch Sequences from NCBI with Bio.Blast.NCBIWWW.qblast()?
✍: FYIcenter.com
The qblast() function in the Bio.Blast.NCBIWWW module lets you call the online version of BLAST at https://blast.ncbi.nlm.nih.gov/Blast.cgi to search for matching DNA or protein sequences.
Currently the qblast() function only works with 5 online BLAST programs: blastn, blastp, blastx, tblastn and tblastx. Each program supports a set of databases. For example, "nt" is a database under the "blastn" program.
1. Try the following code to run a blastn search against the "nt" database, using the GI number 8332116 as the query.
fyicenter$ python
>>> from Bio.Blast import NCBIWWW
>>> result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
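Because a qblast() call goes out to the NCBI server, it can help to catch a mistyped program name before sending anything. The following is a minimal sketch, assuming the five program names that qblast() accepts (blastn, blastp, blastx, tblastn, tblastx). The helper names check_program() and run_qblast() are hypothetical and not part of Biopython.

```python
# Program names accepted by qblast().
SUPPORTED_PROGRAMS = ("blastn", "blastp", "blastx", "tblastn", "tblastx")


def check_program(program):
    """Return the program name unchanged, or raise ValueError if unsupported."""
    if program not in SUPPORTED_PROGRAMS:
        raise ValueError(
            f"Unsupported program {program!r}; expected one of {SUPPORTED_PROGRAMS}"
        )
    return program


def run_qblast(program, database, query):
    """Hypothetical wrapper: validate the program name, then call qblast()."""
    check_program(program)
    # Imported here so check_program() can be used without Biopython installed.
    from Bio.Blast import NCBIWWW

    # This call contacts the NCBI server and may take a while to return.
    return NCBIWWW.qblast(program, database, query)
```

With this in place, run_qblast("blastn", "nt", "8332116") would behave like the call above, while run_qblast("tblast", "nt", "8332116") would fail immediately with a ValueError instead of contacting the server.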
2. Write the query result to a local file.
>>> with open("my_blast.xml", "w") as out_handle:
...     out_handle.write(result_handle.read())
...
138648
>>> result_handle.close()
3. Open the local file, parse it, and convert the result into a list of BLAST record objects.
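The save step can also be wrapped in a small function that closes the result handle even if the write fails. This is only a sketch: save_handle() is a hypothetical helper, and an io.StringIO object stands in for the qblast() result so that the example runs without a network call.

```python
import io
import os
import tempfile


def save_handle(handle, path):
    """Write the remaining text in handle to path; return the characters written."""
    try:
        with open(path, "w") as out_handle:
            return out_handle.write(handle.read())
    finally:
        # Close the source handle even if the write fails.
        handle.close()


# Stand-in for the qblast() result, not real BLAST output.
demo_handle = io.StringIO("<BlastOutput></BlastOutput>")
demo_path = os.path.join(tempfile.mkdtemp(), "demo_blast.xml")
written = save_handle(demo_handle, demo_path)
print(written, demo_handle.closed)
```

Saving the result before parsing is useful because the handle can only be read once, while the local file can be parsed as many times as needed.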
>>> from Bio.Blast import NCBIXML
>>> result_handle = open("my_blast.xml")
>>> blast_records = NCBIXML.parse(result_handle)
>>> blast_records = list(blast_records)
>>> len(blast_records)
1
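If you want to sanity-check the saved file without Biopython, the standard library can read the same XML. The sketch below uses xml.etree.ElementTree instead of NCBIXML, and assumes the usual BLAST XML element names (BlastOutput_program, Iteration, Hit). The function summarize_blast_xml() is hypothetical, and the sample document is hand-written, not real BLAST output.

```python
import io
import xml.etree.ElementTree as ET


def summarize_blast_xml(source):
    """Return (program, iteration count, hit count) for a BLAST XML document."""
    root = ET.parse(source).getroot()
    program = root.findtext("BlastOutput_program")
    iterations = root.findall("BlastOutput_iterations/Iteration")
    hits = root.findall("BlastOutput_iterations/Iteration/Iteration_hits/Hit")
    return program, len(iterations), len(hits)


# Tiny hand-written sample with hypothetical hit IDs.
SAMPLE = """<BlastOutput>
  <BlastOutput_program>blastn</BlastOutput_program>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit><Hit_id>hypothetical|1</Hit_id></Hit>
        <Hit><Hit_id>hypothetical|2</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

summary = summarize_blast_xml(io.StringIO(SAMPLE))
print(summary)
```

Each Iteration element corresponds to one query, which is why a single-query search parses into a list with one BLAST record.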
4. Review properties in the BLAST record.
>>> blast_record = blast_records[0]
>>> blast_record.__dict__.keys()
dict_keys(['application', 'version', 'date', 'reference', 'query', 'query_letters', 'database', 'database_sequences', 'database_letters', 'database_name', 'posted_date', 'num_letters_in_database', 'num_sequences_in_database', 'ka_params', 'gapped', 'ka_params_gap', 'matrix', 'gap_penalties', 'sc_match', 'sc_mismatch', 'num_hits', 'num_sequences', 'num_good_extends', 'num_seqs_better_e', 'hsps_no_gap', 'hsps_prelim_gapped', 'hsps_prelim_gapped_attemped', 'hsps_gapped', 'query_id', 'query_length', 'database_length', 'effective_hsp_length', 'effective_query_length', 'effective_database_length', 'effective_search_space', 'effective_search_space_used', 'frameshift', 'threshold', 'window_size', 'dropoff_1st_pass', 'gap_x_dropoff', 'gap_x_dropoff_final', 'gap_trigger', 'blast_cutoff', 'descriptions', 'alignments', 'multiple_alignment', 'filter', 'expect'])
>>> blast_record.application
'BLASTN'
>>> blast_record.version
'2.13.0+'
>>> blast_record.database
'nt'
>>> blast_record.query_id
'BE037100.1'
>>> blast_record.query
"MP14H09 MP Mesembryanthemum crystallinum cDNA 5' similar to cold acclimation protein, mRNA sequence"
>>> blast_record.reference
'Stephen F. Altschul, Thomas L. Madden, ...'
5. Review HSPs (High-Scoring Pairs) as alignments in the BLAST record.
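Rather than typing each attribute by hand, you can collect a few of them in a loop. This is a minimal sketch: describe_record() is a hypothetical helper, and a SimpleNamespace object with values copied from the session above stands in for a real BLAST record.

```python
from types import SimpleNamespace

# Attribute names taken from the dict_keys() listing above.
FIELDS = ("application", "version", "database", "query_id", "query")


def describe_record(record, fields=FIELDS):
    """Return 'name: value' lines for the chosen record attributes."""
    # getattr() with a default avoids an AttributeError on missing fields.
    return [f"{name}: {getattr(record, name, None)}" for name in fields]


# Stand-in object; a real record would come from NCBIXML.parse().
demo = SimpleNamespace(
    application="BLASTN", version="2.13.0+", database="nt", query_id="BE037100.1"
)
lines = describe_record(demo)
print("\n".join(lines))
```

The same function should work on a real record, since it only reads attributes by name.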
>>> print(blast_record.multiple_alignment)
None
>>> alignments = blast_record.alignments
>>> len(alignments)
50
>>> alignment = alignments[0]
>>> print(alignment)
gi|1219041180|ref|XM_021875076.1| PREDICTED: Chenopodium quinoa cold-regulated 413 plasma membrane protein 2-like (LOC110697660), mRNA
Length = 1173
>>> hsps = alignment.hsps
>>> len(hsps)
1
>>> print(hsps[0])
Score 482 (435 bits), expectation 6.5e-117, alignment length 624
Query: 59 ACAGAAAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAACTGA...GTA 678
|| ||||||||| |||| | |||| || |||| |||| | ||||... ||
Sbjct: 278 ACCGAAAATGGGCAGAGGAGTGAATTATATGGCAATGACACCTGA...TTA 901
>>> print(alignments[49].hsps[0])
Score 355 (321 bits), expectation 3.6e-82, alignment length 601
Query: 56 TGAACAGAAAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAAC...CTG 655
|||| |||||||||||| ||| |||| |||| ||||||||...|||
Sbjct: 59 TGAAACGAAAATGGGGAGG---ATGGAGTATCTGGCTATGAAAAC...CTG 652
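With 50 alignments in the record, a common next step is to keep only the HSPs below an E-value threshold. The sketch below loops over record.alignments and alignment.hsps and compares hsp.expect against a cutoff. The function significant_hsps() and the threshold of 1e-100 are illustrative assumptions, and the stand-in record reuses the two E-values from the session above with made-up titles.

```python
from types import SimpleNamespace


def significant_hsps(record, e_value_thresh=1e-100):
    """Yield (alignment title, HSP) pairs whose E-value is below the threshold."""
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < e_value_thresh:
                yield alignment.title, hsp


# Stand-in record; a real one would come from NCBIXML.parse().
demo_record = SimpleNamespace(
    alignments=[
        SimpleNamespace(title="first hit", hsps=[SimpleNamespace(expect=6.5e-117)]),
        SimpleNamespace(title="last hit", hsps=[SimpleNamespace(expect=3.6e-82)]),
    ]
)
kept = [title for title, _ in significant_hsps(demo_record)]
print(kept)
```

On a real record you would call significant_hsps(blast_record) and pick a threshold suited to your search.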
Note that with Biopython 1.80 on macOS, the qblast() call can fail with the SSL certificate error shown below. You can put a debug statement in NCBIWWW.py at line 226, where urlopen() is called, to figure out why.
>>> import Bio
>>> Bio.__version__
'1.80'
>>> from Bio.Blast import NCBIWWW
>>> result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/Bio/Blast/NCBIWWW.py", line 226, in qblast
    handle = urlopen(request)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  ...
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1108)>
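The CERTIFICATE_VERIFY_FAILED message means that Python could not validate the server's certificate chain. One possible first diagnostic, offered as a sketch and not as a confirmed fix for this error, is to print where the interpreter looks for trusted CA certificates. The helper name ca_search_paths() is hypothetical; ssl.get_default_verify_paths() is part of the standard library.

```python
import ssl


def ca_search_paths():
    """Return the locations where this Python looks for trusted CA certificates."""
    return ssl.get_default_verify_paths()._asdict()


# A value of None for cafile and capath may mean no CA bundle was found.
for name, value in ca_search_paths().items():
    print(f"{name}: {value}")
```

If the reported files or directories do not exist on your system, the Python installation probably has no CA bundle to verify against, which would be consistent with the error above.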
2023-05-09