Fetch Sequences from NCBI with Bio.Blast.NCBIWWW.qblast()

Q

How to Fetch Sequences from NCBI with Bio.Blast.NCBIWWW.qblast()?

✍: FYIcenter.com

A

The function qblast() in the Bio.Blast.NCBIWWW module lets you run a search with the online version of BLAST at https://blast.ncbi.nlm.nih.gov/Blast.cgi and fetch the matching DNA or protein sequences.

Currently the qblast() function only works with 5 BLAST online programs: blastn, blastp, blastx, tblastn and tblastx. Each program searches its own set of databases. For example, "nt" (the NCBI nucleotide collection) is a database for the "blastn" program.
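To make the program/database pairing concrete, here is a small sketch that records which molecule type each of the five programs expects for the query and for the database. The table follows the standard BLAST program definitions; the check_program() helper is hypothetical and not part of Biopython.

```python
# Query and database molecule types for the five programs qblast() accepts.
# The "translated" programs translate the nucleotide side(s) before comparing.
BLAST_PROGRAMS = {
    # program: (query type, database type)
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("nucleotide", "protein"),      # translated query
    "tblastn": ("protein", "nucleotide"),     # translated database
    "tblastx": ("nucleotide", "nucleotide"),  # both sides translated
}


def check_program(program: str, query_type: str) -> str:
    """Return the database type for `program`, or raise ValueError.

    Hypothetical helper for illustration; it is not part of Biopython.
    """
    try:
        expected_query, db_type = BLAST_PROGRAMS[program]
    except KeyError:
        raise ValueError(f"qblast() does not support program {program!r}") from None
    if query_type != expected_query:
        raise ValueError(f"{program} expects a {expected_query} query, got {query_type}")
    return db_type
```

For example, check_program("blastx", "nucleotide") returns "protein", which tells you to pick a protein database for that search.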

1. Try the following code to run a BLASTN search of the "nt" database, using the sequence with GI number 8332116 as the query.

fyicenter$ python
>>> from Bio.Blast import NCBIWWW
>>> result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
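The sequence argument does not have to be a GI number: qblast() also accepts the raw sequence or FASTA-formatted text, and it takes optional keyword arguments such as hitlist_size (maximum number of hits, 50 by default) and expect (E-value cutoff). The sketch below assumes you have your own sequence to submit; to_fasta() and run_blastn() are hypothetical helper names, and the default values passed to qblast() here are arbitrary examples.

```python
import textwrap


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format a raw sequence as FASTA text, wrapping lines at `width` characters."""
    # Drop all whitespace and upper-case the letters so pasted sequences work.
    cleaned = "".join(sequence.split()).upper()
    body = "\n".join(textwrap.wrap(cleaned, width))
    return f">{header}\n{body}\n"


def run_blastn(fasta_text: str, hitlist_size: int = 10, expect: float = 0.001):
    """Submit a BLASTN search of "nt" (needs network access and Biopython)."""
    from Bio.Blast import NCBIWWW  # imported lazily so to_fasta() works without Biopython

    return NCBIWWW.qblast(
        "blastn", "nt", fasta_text,
        hitlist_size=hitlist_size,  # maximum number of hits to return
        expect=expect,              # E-value cutoff for reported hits
    )
```

For example, handle = run_blastn(to_fasta("my_query", my_sequence)), where my_sequence is a string holding your own DNA sequence. Keeping the Biopython import inside run_blastn() lets the formatting helper be used on its own.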

2. Write the query result to a local file.

>>> with open("my_blast.xml", "w") as out_handle:
...   out_handle.write(result_handle.read())
...
138648
>>> result_handle.close()
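Reading, saving and closing the handle are easy to get out of order in an interactive session. A minimal sketch that wraps them in one hypothetical helper, assuming, as in the session above, that read() on the handle returns text:

```python
def save_blast_result(result_handle, path: str) -> int:
    """Write a qblast() result handle to `path` and close the handle.

    Returns the number of characters written, like file.write() does.
    """
    try:
        text = result_handle.read()
    finally:
        result_handle.close()  # release the handle even if read() fails
    with open(path, "w") as out_handle:
        return out_handle.write(text)
```

Calling save_blast_result(result_handle, "my_blast.xml") does the same work as the statements above and returns the same kind of character count.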

3. Open the local file, parse it, and convert the result into a list of BLAST record objects.

>>> from Bio.Blast import NCBIXML
>>> result_handle = open("my_blast.xml")
>>> blast_records = NCBIXML.parse(result_handle)
>>> blast_records = list(blast_records)
>>> len(blast_records)
1
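NCBIXML.parse() returns a lazy iterator that reads from the file handle as you iterate, so the list has to be built before the file is closed. The sketch below does that inside a with block; load_blast_records() is a hypothetical helper, and its optional parse argument exists only so the function can be exercised without Biopython. When the file holds exactly one query, NCBIXML.read(handle) returns that single record directly.

```python
def load_blast_records(path: str, parse=None) -> list:
    """Parse every BLAST record in an XML file into a list.

    `parse` defaults to Bio.Blast.NCBIXML.parse; it can be swapped for testing.
    """
    if parse is None:
        from Bio.Blast import NCBIXML  # requires Biopython
        parse = NCBIXML.parse
    with open(path) as handle:
        # list() drains the lazy parser while the file is still open.
        return list(parse(handle))
```

With Biopython installed, blast_records = load_blast_records("my_blast.xml") gives the same list as the session above, with the file closed afterwards.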

4. Review the properties of the BLAST record.

>>> blast_record = blast_records[0]
>>> blast_record.__dict__.keys()
dict_keys(['application', 'version', 'date', 'reference', 'query',
  'query_letters', 'database', 'database_sequences', 'database_letters',
  'database_name', 'posted_date', 'num_letters_in_database',
  'num_sequences_in_database', 'ka_params', 'gapped', 'ka_params_gap',
  'matrix', 'gap_penalties', 'sc_match', 'sc_mismatch', 'num_hits',
  'num_sequences', 'num_good_extends', 'num_seqs_better_e', 'hsps_no_gap',
  'hsps_prelim_gapped', 'hsps_prelim_gapped_attemped', 'hsps_gapped',
  'query_id', 'query_length', 'database_length', 'effective_hsp_length',
  'effective_query_length', 'effective_database_length',
  'effective_search_space', 'effective_search_space_used', 'frameshift',
  'threshold', 'window_size', 'dropoff_1st_pass', 'gap_x_dropoff',
  'gap_x_dropoff_final', 'gap_trigger', 'blast_cutoff', 'descriptions',
  'alignments', 'multiple_alignment', 'filter', 'expect'])

>>> blast_record.application
'BLASTN'
>>> blast_record.version
'2.13.0+'
>>> blast_record.database
'nt'

>>> blast_record.query_id
'BE037100.1'

>>> blast_record.query
"MP14H09 MP Mesembryanthemum crystallinum cDNA 5' similar to cold acclimation protein, mRNA sequence"

>>> blast_record.reference
'Stephen F. Altschul, Thomas L. Madden, ...'
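The header fields above can be collected into a one-line summary, which is handy when you run several searches. A sketch using only attributes listed in blast_record.__dict__.keys() above; describe_record() is a hypothetical name:

```python
def describe_record(record) -> str:
    """Build a one-line summary from the header fields of a BLAST record."""
    return (
        f"{record.application} {record.version} | db={record.database} | "
        f"query={record.query_id} ({record.query_letters} letters) | "
        f"hits={len(record.alignments)}"
    )
```

In the session above, print(describe_record(blast_record)) would print the program, version, database, query ID, query length and number of hits on one line.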

5. Review the alignments in the BLAST record and their HSPs (High-Scoring Pairs).

>>> print(blast_record.multiple_alignment)
None

>>> alignments = blast_record.alignments
>>> len(alignments)
50

>>> alignment = alignments[0]
>>> print(alignment)
gi|1219041180|ref|XM_021875076.1| PREDICTED: Chenopodium quinoa cold-regulated 413 plasma membrane protein 2-like (LOC110697660), mRNA
           Length = 1173

>>> hsps = alignment.hsps
>>> len(hsps)
1

>>> print(hsps[0])
Score 482 (435 bits), expectation 6.5e-117, alignment length 624
Query:      59 ACAGAAAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAACTGA...GTA 678
               || ||||||||| |||| | |||| ||  |||| |||| | ||||... ||
Sbjct:     278 ACCGAAAATGGGCAGAGGAGTGAATTATATGGCAATGACACCTGA...TTA 901

>>> print(alignments[49].hsps[0])
Score 355 (321 bits), expectation 3.6e-82, alignment length 601
Query:      56 TGAACAGAAAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAAC...CTG 655
               ||||  ||||||||||||    ||| ||||  |||| ||||||||...|||
Sbjct:      59 TGAAACGAAAATGGGGAGG---ATGGAGTATCTGGCTATGAAAAC...CTG 652
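The printed HSPs show score, bits, E-value and alignment length, but not percent identity or query coverage. The sketch below assumes the standard Bio.Blast.Record attributes (title, length and hsps on each alignment; expect, identities, align_length, query_start and query_end on each HSP). It walks every alignment, keeps HSPs under an E-value cutoff, and computes those two figures. The helper names and the default cutoff are hypothetical choices.

```python
def hits_below(record, e_cutoff=1e-50):
    """Yield (title, length, hsp) for every HSP whose E-value is below e_cutoff."""
    for alignment in record.alignments:  # one Alignment per database hit
        for hsp in alignment.hsps:       # a hit can have several HSPs
            if hsp.expect < e_cutoff:
                yield alignment.title, alignment.length, hsp


def hsp_stats(hsp, query_length):
    """Return percent identity and percent query coverage for one HSP."""
    covered = abs(hsp.query_end - hsp.query_start) + 1  # query bases spanned by the HSP
    return {
        "identity_pct": 100.0 * hsp.identities / hsp.align_length,
        "query_coverage_pct": 100.0 * covered / query_length,
    }
```

In the session above you would loop with for title, length, hsp in hits_below(blast_record): and pass blast_record.query_letters (or query_length) to hsp_stats() as the query length.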

Note that calling the qblast() function with Biopython 1.80 on macOS gives the SSL certificate error shown below. The traceback points to line 226 of NCBIWWW.py, where qblast() calls urlopen(), so you can put a debug statement there to investigate further.

>>> import Bio
>>> Bio.__version__
'1.80'

>>> from Bio.Blast import NCBIWWW
>>> result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/Bio/Blast/NCBIWWW.py", line 226, in qblast
    handle = urlopen(request)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  ...
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1108)>
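The traceback shows that qblast() calls urlopen(request) without passing an SSL context, so urlopen() falls back to the globally installed opener. With python.org builds of Python on macOS (the /Library/Frameworks path above suggests one), a common cause is that Python has no CA certificates until you run the "Install Certificates.command" script shipped with the installer. Another workaround, sketched below under that assumption, is to install an opener whose SSL context trusts a CA bundle you choose. install_https_opener() is a hypothetical helper; if the message comes from a proxy that re-signs HTTPS traffic, you would pass that proxy's CA file instead.

```python
import ssl
import urllib.request


def install_https_opener(cafile=None):
    """Make later urllib.request.urlopen() calls verify HTTPS with a chosen CA bundle.

    cafile=None uses the default CA locations known to this Python's OpenSSL.
    """
    context = ssl.create_default_context(cafile=cafile)
    opener = urllib.request.build_opener(urllib.request.HTTPSHandler(context=context))
    urllib.request.install_opener(opener)  # urlopen() without a context now uses this opener
    return opener
```

For example, with the third-party certifi package installed, call install_https_opener(certifi.where()) before NCBIWWW.qblast("blastn", "nt", "8332116").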

2023-05-09