Single Sequence Record in FASTA Format

Q

How to read a Single Sequence Record in FASTA Format?

✍: FYIcenter.com

A

If you want to store additional information alongside a DNA or protein sequence, you can use the SeqRecord class from the Bio.SeqRecord module, which has the following attributes:

  • seq – The sequence itself as a Seq object.
  • id – The primary ID used to identify the sequence.
  • name – A “common” name for the sequence.
  • description – A human-readable description of the sequence.
  • letter_annotations – A dictionary of per-letter annotations (for example, quality scores), where each value has the same length as the sequence.
  • annotations – A dictionary of additional information about the sequence.
  • features – A list of SeqFeature objects with more structured information about the features on the sequence.
  • dbxrefs – A list of database cross-references for the sequence.
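
The attributes listed above can be tried out by building a SeqRecord by hand. This is only a minimal sketch, assuming Biopython is installed; the sequence, ID, name and description are made-up values for illustration and do not come from any real database.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# All values below are hypothetical and used only for illustration.
record = SeqRecord(
    Seq("ATGGCCATTGTAATGGGCCGC"),
    id="example_1",
    name="example",
    description="hypothetical example sequence",
)

# annotations is a plain dictionary describing the whole sequence.
record.annotations["molecule_type"] = "DNA"

# letter_annotations values must have the same length as the sequence.
record.letter_annotations["phred_quality"] = [30] * len(record.seq)

print(record.id)
print(record.description)
print(record.annotations)
print(record.features)   # an empty list until SeqFeature objects are added
print(record.dbxrefs)    # an empty list until cross-references are added
```

Assigning a letter_annotations value whose length differs from the sequence length is rejected by Biopython, which is what keeps per-letter data aligned with the sequence.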

As an example, we can download a FASTA file that contains a single sequence record from the Biopython test suite.

fyicenter$ wget https://raw.githubusercontent.com/biopython/biopython/master/Tests/GenBank/NC_005816.fna

fyicenter$ ls -l NC_005816.fna
-rw-r--r--. 1 fyicenter staff  9853 Jan 27 23:55 NC_005816.fna

Then we can create a SeqRecord object with the SeqIO.read() function, which expects the file to contain exactly one record.

fyicenter$ python
>>> from Bio import SeqIO
>>> record = SeqIO.read("NC_005816.fna", "fasta")
>>> print(record)
ID: gi|45478711|ref|NC_005816.1|
Name: gi|45478711|ref|NC_005816.1|
Description: gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence
Number of features: 0
Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG')

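As you can see, a FASTA record carries only a header line and the sequence itself, so Biopython has very little to parse. It uses the first word of the header as both the ID and the name, keeps the whole header line as the description, and reports no features.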
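
To make the mapping from the header line to the ID and description concrete, here is a small sketch in plain Python. It is not Biopython's actual parser; split_fasta_header is a hypothetical helper written only to reproduce the ID and Description values shown in the output above.

```python
def split_fasta_header(header_line):
    """Split a FASTA header line into (id, description).

    Hypothetical helper for illustration; not part of Biopython.
    """
    text = header_line.strip()
    if text.startswith(">"):
        text = text[1:]
    # The first whitespace-delimited word is taken as the ID.
    parts = text.split(None, 1)
    seq_id = parts[0] if parts else ""
    # The description is the whole header line without the ">" marker.
    return seq_id, text


header = (">gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus "
          "str. 91001 plasmid pPCP1, complete sequence")
seq_id, description = split_fasta_header(header)
print("ID:", seq_id)
print("Description:", description)
```

Everything else in a SeqRecord, such as features or database cross-references, has no place in a FASTA header, which is why those attributes stay empty after reading this file.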

 

2023-04-04