Search History with Bio.Entrez for Subsequent Calls

Q

How to Use Search History with Bio.Entrez for Subsequent Calls?

✍: FYIcenter.com

A

If Bio.Entrez.esearch() finds a large number of matches, you can use the history feature to retrieve the matched records in batches with multiple Bio.Entrez.efetch() calls.

1. Turn on the history feature in the esearch() call with the usehistory="y" option.

fyicenter$ python 
>>> from Bio import Entrez
>>> Entrez.email = "A.N.Other@example.com"

>>> search_handle = Entrez.esearch(
...   db="nucleotide", term="Homo sapiens[orgn] AND CASP3", usehistory="y", idtype="acc"
... )
>>> search_results = Entrez.read(search_handle)
>>> search_handle.close()

2. Retrieve Count, WebEnv and QueryKey from the search result.

>>> count = int(search_results["Count"])
>>> count
71

>>> webenv = search_results["WebEnv"]
>>> webenv
'MCID_63dae995f26fc24c5a7eba33'

>>> query_key = search_results["QueryKey"]
>>> query_key
'1'
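
The three lookups above can also be wrapped in a small helper. This is only a sketch: history_params() is a hypothetical name, not part of Biopython, and the dictionary below stands in for a parsed esearch() result so the example runs without a network connection. It assumes WebEnv and QueryKey are present only when usehistory="y" was set.

```python
def history_params(search_results):
    """Return (count, webenv, query_key) from a parsed esearch() result.

    Hypothetical helper for illustration; not part of Bio.Entrez.
    """
    required = ("Count", "WebEnv", "QueryKey")
    missing = [key for key in required if key not in search_results]
    if missing:
        # Assumption: WebEnv/QueryKey are absent if usehistory="y" was not set.
        raise KeyError(
            "esearch result lacks %s; was usehistory='y' set?" % ", ".join(missing)
        )
    return (
        int(search_results["Count"]),  # Count arrives as a string
        search_results["WebEnv"],
        search_results["QueryKey"],
    )


# Stand-in for Entrez.read(search_handle), using the values shown above.
example = {"Count": "71", "WebEnv": "MCID_63dae995f26fc24c5a7eba33", "QueryKey": "1"}
count, webenv, query_key = history_params(example)
print(count, webenv, query_key)
```

Failing early with a clear message is easier to debug than a KeyError raised halfway through a download script.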

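3. Use Count, WebEnv and QueryKey in follow-up efetch() calls. Count sets the loop bounds, while WebEnv and QueryKey are passed to efetch() to identify the search result stored on the NCBI server.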

>>> batch_size = 3
>>> out_handle = open("homo_casp3.fasta", "w")
>>> for start in range(0, count, batch_size):
...   end = min(count, start + batch_size)
...   print("Going to download record %i to %i" % (start + 1, end))
...   fetch_handle = Entrez.efetch(
...     db="nucleotide",
...     rettype="fasta",
...     retmode="text",
...     retstart=start,
...     retmax=batch_size,
...     webenv=webenv,
...     query_key=query_key,
...     idtype="acc",
...   )
...   data = fetch_handle.read()
...   fetch_handle.close()
...   out_handle.write(data)
... 

Going to download record 1 to 3
24130
Going to download record 4 to 6
28575
...

>>> out_handle.close()
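
For use in a script, the batch loop above could be split into reusable pieces, roughly as sketched below. The names batch_windows(), download_in_batches() and make_entrez_fetcher() are hypothetical, not Biopython functions. The efetch() arguments are the same ones used in step 3, except that retmax is trimmed for the last batch. The demo at the end uses a fake fetcher so it runs offline.

```python
import io


def batch_windows(count, batch_size):
    """Yield (retstart, retmax) pairs that together cover `count` records."""
    for start in range(0, count, batch_size):
        yield start, min(batch_size, count - start)


def download_in_batches(fetch_batch, count, batch_size, out_handle):
    """Write the text returned by fetch_batch(retstart, retmax) for every batch."""
    for retstart, retmax in batch_windows(count, batch_size):
        out_handle.write(fetch_batch(retstart, retmax))


def make_entrez_fetcher(webenv, query_key):
    """Build a fetch_batch() callable backed by Entrez.efetch() (hypothetical adapter)."""
    from Bio import Entrez  # imported here so the offline demo needs no Biopython

    def fetch_batch(retstart, retmax):
        handle = Entrez.efetch(
            db="nucleotide", rettype="fasta", retmode="text",
            retstart=retstart, retmax=retmax,
            webenv=webenv, query_key=query_key, idtype="acc",
        )
        try:
            return handle.read()
        finally:
            handle.close()  # close the handle even if read() fails

    return fetch_batch


# Offline demo: a fake fetcher that returns one FASTA-like header per record.
def fake_fetch(retstart, retmax):
    return "".join(">rec%d\n" % (i + 1) for i in range(retstart, retstart + retmax))


buffer = io.StringIO()
download_in_batches(fake_fetch, count=7, batch_size=3, out_handle=buffer)
print(list(batch_windows(7, 3)))
```

With a real search result, the call would look like download_in_batches(make_entrez_fetcher(webenv, query_key), count, batch_size, out_handle), ideally inside a "with open(...) as out_handle:" block so the output file is closed even if one request fails.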

 


2023-09-10