Single Sequence Record in GenBank Format

Q

How to read a Single Sequence Record in GenBank Format?

✍: FYIcenter.com

A

The GenBank format for DNA or protein sequences carries more annotation than the FASTA format (source organism, references, a feature table) and has a richer structure. You can follow these steps to download an example GenBank file and load it into a Bio.SeqRecord object.

1. Download an example sequence record in GenBank format.

fyicenter$ wget https://raw.githubusercontent.com/biopython/biopython/master/Tests/GenBank/NC_005816.gb

fyicenter$ ls -l NC_005816.gb
-rw-r--r--. 1 fyicenter staff 31838 Jan 27 23:55 NC_005816.gb
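If wget is not installed, Python's standard library can fetch the same file. This is a minimal sketch, not part of the original steps: the helper name download_genbank() is hypothetical, and it simply wraps urllib.request.urlretrieve() with the URL from the wget command above.

```python
import urllib.request
from pathlib import Path

# URL copied from the wget command above.
GENBANK_URL = (
    "https://raw.githubusercontent.com/biopython/biopython/"
    "master/Tests/GenBank/NC_005816.gb"
)


def download_genbank(url: str = GENBANK_URL, dest: str = "NC_005816.gb") -> int:
    """Hypothetical helper: save url to dest and return the file size in bytes."""
    path, _headers = urllib.request.urlretrieve(url, dest)
    return Path(path).stat().st_size
```

Calling download_genbank() with no arguments in an interactive session saves NC_005816.gb in the current directory, like the wget command does.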

2. View the GenBank sequence file.

fyicenter$ more NC_005816.gb

LOCUS       NC_005816               9609 bp    DNA     circular BCT 21-JUL-2008
DEFINITION  Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete
            sequence.
ACCESSION   NC_005816
VERSION     NC_005816.1  GI:45478711
DBLINK      Project: 58037
KEYWORDS    .
SOURCE      Yersinia pestis biovar Microtus str. 91001
  ORGANISM  Yersinia pestis biovar Microtus str. 91001
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
            Enterobacteriaceae; Yersinia.
REFERENCE   1  (bases 1 to 9609)
  AUTHORS   Zhou,D., Tong,Z., Song,Y., Han,Y., Pei,D., Pang,X., Zhai,J., Li,M.,
            Cui,B., Qi,Z., Jin,L., Dai,R., Du,Z., Wang,J., Guo,Z., Wang,J.,
            Huang,P. and Yang,R.
  TITLE     Genetics of metabolic variations between Yersinia pestis biovars
            and the proposal of a new biovar, microtus
  JOURNAL   J. Bacteriol. 186 (15), 5147-5152 (2004)
   PUBMED   15262951
REFERENCE   2  (bases 1 to 9609)
  AUTHORS   Song,Y., Tong,Z., Wang,J., Wang,L., Guo,Z., Han,Y., Zhang,J.,
            Pei,D., Zhou,D., Qin,H., Pang,X., Han,Y., Zhai,J., Li,M., Cui,B.,
            Qi,Z., Jin,L., Dai,R., Chen,F., Li,S., Ye,C., Du,Z., Lin,W.,
            Wang,J., Yu,J., Yang,H., Wang,J., Huang,P. and Yang,R.
  TITLE     Complete genome sequence of Yersinia pestis strain 91001, an
            isolate avirulent to humans
  JOURNAL   DNA Res. 11 (3), 179-197 (2004)
   PUBMED   15368893
REFERENCE   3  (bases 1 to 9609)
  CONSRTM   NCBI Genome Project
  TITLE     Direct Submission
  JOURNAL   Submitted (16-MAR-2004) National Center for Biotechnology
            Information, NIH, Bethesda, MD 20894, USA
REFERENCE   4  (bases 1 to 9609)
  AUTHORS   Song,Y., Tong,Z., Wang,L., Han,Y., Zhang,J., Pei,D., Wang,J.,
            Zhou,D., Han,Y., Pang,X., Zhai,J., Chen,F., Qin,H., Wang,J., Li,S.,
            Guo,Z., Ye,C., Du,Z., Lin,W., Wang,J., Yu,J., Yang,H., Wang,J.,
            Huang,P. and Yang,R.
  TITLE     Direct Submission
  JOURNAL   Submitted (24-APR-2003) The Institute of Microbiology and
            Epidemiology, Academy of Military Medical Sciences, No. 20,
            Dongdajie Street, Fengtai District, Beijing 100071, People's
            Republic of China
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from AE017046.
            COMPLETENESS: full length.
FEATURES             Location/Qualifiers
     source          1..9609
                     /organism="Yersinia pestis biovar Microtus str. 91001"
                     /mol_type="genomic DNA"
                     /strain="91001"
                     /db_xref="taxon:229193"
                     /plasmid="pPCP1"
                     /biovar="Microtus"
     repeat_region   1..1954
     gene            87..1109
                     /locus_tag="YP_pPCP01"
                     /db_xref="GeneID:2767718"
     ...
     variation       8529^8530
                     /note="compared to AL109969"
                     /replace="tt"
ORIGIN      
        1 tgtaacgaac ggtgcaatag tgatccacac ccaacgcctg aaatcagatc cagggggtaa
       61 tctgctctcc tgattcagga gagtttatgg tcacttttga gacagttatg gaaattaaaa
      ...
     9541 aaaataaaaa tgtgacatcg caatgccaga taatattgac gcatgaggga atgcgtaccc
     9601 cgacccctg
//
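To see how the layout above maps to data, here is a rough stdlib-only sketch that pulls the LOCUS fields and the ORIGIN sequence out of the text. It assumes the LOCUS line has all eight whitespace-separated fields shown in this file (the real format is column-based and some records omit fields), and the helper names parse_locus() and origin_sequence() are hypothetical. Biopython's own parser is far more thorough.

```python
def parse_locus(line: str) -> dict:
    """Split a LOCUS line on whitespace (simplification: assumes all 8 fields)."""
    parts = line.split()
    if not parts or parts[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    return {
        "name": parts[1],
        "length": int(parts[2]),  # parts[3] is the unit, "bp"
        "molecule_type": parts[4],
        "topology": parts[5],
        "division": parts[6],
        "date": parts[7],
    }


def origin_sequence(lines) -> str:
    """Join the sequence letters between the ORIGIN line and the // terminator."""
    chunks = []
    in_origin = False
    for line in lines:
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break
        elif in_origin:
            # Drop the leading position number, keep the 10-letter blocks.
            chunks.extend(line.split()[1:])
    return "".join(chunks).upper()
```

Applied to the lines of the downloaded NC_005816.gb, the length of origin_sequence() should equal the 9609 bp declared on the LOCUS line.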

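3. Create a Bio.SeqRecord object with the SeqIO.read() function. SeqIO.read() expects the file to hold exactly one record; for files with several records, use SeqIO.parse() instead.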

fyicenter$ python 
>>> from Bio import SeqIO

>>> record = SeqIO.read("NC_005816.gb", "genbank")
>>> print(record)
ID: NC_005816.1
Name: NC_005816
Description: Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence
Database cross-references: Project:58037
Number of features: 41
/molecule_type=DNA
/topology=circular
/data_file_division=BCT
/date=21-JUL-2008
/accessions=['NC_005816']
/sequence_version=1
/gi=45478711
/keywords=['']
/source=Yersinia pestis biovar Microtus str. 91001
/organism=Yersinia pestis biovar Microtus str. 91001
/taxonomy=['Bacteria', 'Proteobacteria', 'Gammaproteobacteria', 'Enterobacteriales', ....
/references=[Reference(title='Genetics of metabolic variations between Yersinia ...
/comment=PROVISIONAL REFSEQ: This record has not yet been subject to final
NCBI review. The reference sequence was derived from AE017046.
COMPLETENESS: full length.
Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG')
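The ID, Name and Description printed above come from the VERSION, LOCUS and DEFINITION lines of the file. The sketch below is an illustration only, with hypothetical helper names header_fields() and record_summary(): header keywords start in column 1, values start in column 13, and continuation lines are indented by 12 spaces. The printed Description lacks the trailing period of DEFINITION, so the sketch drops it too.

```python
def header_fields(lines) -> dict:
    """Collect top-level header keywords, joining indented continuation lines."""
    fields = {}
    key = None
    for line in lines:
        if line.startswith("FEATURES"):
            break  # the header ends where the feature table starts
        if not line.strip():
            continue  # ignore blank lines
        head = line[:12]  # keyword area: columns 1-12
        if head[:1].strip():
            # Top-level keyword (LOCUS, DEFINITION, VERSION, ...) in column 1.
            key = head.strip()
            fields[key] = line[12:].strip()
        elif head.strip():
            # Indented sub-keyword (ORGANISM, AUTHORS, ...): not collected here.
            key = None
        elif key is not None:
            # Continuation line: 12 blank columns, then more of the value.
            fields[key] += " " + line.strip()
    return fields


def record_summary(lines) -> dict:
    """Derive values matching the ID, Name and Description printed above."""
    fields = header_fields(lines)
    definition = fields["DEFINITION"]
    if definition.endswith("."):
        definition = definition[:-1]  # printed Description has no final period
    return {
        "id": fields["VERSION"].split()[0],
        "name": fields["LOCUS"].split()[0],
        "description": definition,
    }
```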

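4. Look at the first 3 features stored in the Bio.SeqRecord object. Biopython reports locations as zero-based, half-open ranges (Python slice style), so the GenBank location 87..1109 appears as [86:1109].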

>>> print(record.features[0])
type: source
location: [0:9609](+)
qualifiers:
    Key: biovar, Value: ['Microtus']
    Key: db_xref, Value: ['taxon:229193']
    Key: mol_type, Value: ['genomic DNA']
    Key: organism, Value: ['Yersinia pestis biovar Microtus str. 91001']
    Key: plasmid, Value: ['pPCP1']
    Key: strain, Value: ['91001']

>>> print(record.features[1])
type: repeat_region
location: [0:1954](+)
qualifiers:

>>> print(record.features[2])
type: gene
location: [86:1109](+)
qualifiers:
    Key: db_xref, Value: ['GeneID:2767718']
    Key: locus_tag, Value: ['YP_pPCP01']

2023-04-04