Single Sequence Record in GenBank Format
How to read a Single Sequence Record in GenBank Format?
✍: FYIcenter.com
The GenBank format for DNA or protein sequences
carries more properties and a richer structure than the FASTA format.
You can follow these steps to download an example GenBank file
and create a Bio.SeqRecord object from it.
1. Download an example of a Sequence Record in GenBank Format.
fyicenter$ wget https://raw.githubusercontent.com/biopython/biopython/master/Tests/GenBank/NC_005816.gb
fyicenter$ ls -l NC_005816.gb
-rw-r--r--. 1 fyicenter staff 31838 Jan 27 23:55 NC_005816.gb
2. View the GenBank sequence file.
fyicenter$ more NC_005816.gb
LOCUS NC_005816 9609 bp DNA circular BCT 21-JUL-2008
DEFINITION Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete
sequence.
ACCESSION NC_005816
VERSION NC_005816.1 GI:45478711
DBLINK Project: 58037
KEYWORDS .
SOURCE Yersinia pestis biovar Microtus str. 91001
ORGANISM Yersinia pestis biovar Microtus str. 91001
Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
Enterobacteriaceae; Yersinia.
REFERENCE 1 (bases 1 to 9609)
AUTHORS Zhou,D., Tong,Z., Song,Y., Han,Y., Pei,D., Pang,X., Zhai,J., Li,M.,
Cui,B., Qi,Z., Jin,L., Dai,R., Du,Z., Wang,J., Guo,Z., Wang,J.,
Huang,P. and Yang,R.
TITLE Genetics of metabolic variations between Yersinia pestis biovars
and the proposal of a new biovar, microtus
JOURNAL J. Bacteriol. 186 (15), 5147-5152 (2004)
PUBMED 15262951
REFERENCE 2 (bases 1 to 9609)
AUTHORS Song,Y., Tong,Z., Wang,J., Wang,L., Guo,Z., Han,Y., Zhang,J.,
Pei,D., Zhou,D., Qin,H., Pang,X., Han,Y., Zhai,J., Li,M., Cui,B.,
Qi,Z., Jin,L., Dai,R., Chen,F., Li,S., Ye,C., Du,Z., Lin,W.,
Wang,J., Yu,J., Yang,H., Wang,J., Huang,P. and Yang,R.
TITLE Complete genome sequence of Yersinia pestis strain 91001, an
isolate avirulent to humans
JOURNAL DNA Res. 11 (3), 179-197 (2004)
PUBMED 15368893
REFERENCE 3 (bases 1 to 9609)
CONSRTM NCBI Genome Project
TITLE Direct Submission
JOURNAL Submitted (16-MAR-2004) National Center for Biotechnology
Information, NIH, Bethesda, MD 20894, USA
REFERENCE 4 (bases 1 to 9609)
AUTHORS Song,Y., Tong,Z., Wang,L., Han,Y., Zhang,J., Pei,D., Wang,J.,
Zhou,D., Han,Y., Pang,X., Zhai,J., Chen,F., Qin,H., Wang,J., Li,S.,
Guo,Z., Ye,C., Du,Z., Lin,W., Wang,J., Yu,J., Yang,H., Wang,J.,
Huang,P. and Yang,R.
TITLE Direct Submission
JOURNAL Submitted (24-APR-2003) The Institute of Microbiology and
Epidemiology, Academy of Military Medical Sciences, No. 20,
Dongdajie Street, Fengtai District, Beijing 100071, People's
Republic of China
COMMENT PROVISIONAL REFSEQ: This record has not yet been subject to final
NCBI review. The reference sequence was derived from AE017046.
COMPLETENESS: full length.
FEATURES Location/Qualifiers
source 1..9609
/organism="Yersinia pestis biovar Microtus str. 91001"
/mol_type="genomic DNA"
/strain="91001"
/db_xref="taxon:229193"
/plasmid="pPCP1"
/biovar="Microtus"
repeat_region 1..1954
gene 87..1109
/locus_tag="YP_pPCP01"
/db_xref="GeneID:2767718"
...
variation 8529^8530
/note="compared to AL109969"
/replace="tt"
ORIGIN
1 tgtaacgaac ggtgcaatag tgatccacac ccaacgcctg aaatcagatc cagggggtaa
61 tctgctctcc tgattcagga gagtttatgg tcacttttga gacagttatg gaaattaaaa
...
9541 aaaataaaaa tgtgacatcg caatgccaga taatattgac gcatgaggga atgcgtaccc
9601 cgacccctg
//
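The LOCUS line at the top of the file packs several fields into one line: the locus name, sequence length, molecule type, topology, division code and date. The sketch below only illustrates how those fields line up with the annotations Biopython reports. It assumes the line splits into exactly 8 whitespace-separated tokens, as it does in this example. Real GenBank LOCUS lines are column-based, and Biopython's own parser should be used for actual files. The helper name split_locus_line is made up for this illustration.

```python
# Illustration only: name the fields of the LOCUS line shown above.
# Assumption: exactly 8 whitespace-separated tokens, as in this example.
locus_line = "LOCUS NC_005816 9609 bp DNA circular BCT 21-JUL-2008"


def split_locus_line(line):
    """Return the LOCUS fields as a dict (hypothetical helper, not Biopython)."""
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS":
        raise ValueError("unexpected LOCUS line layout")
    _, name, length, unit, molecule, topology, division, date = tokens
    return {
        "name": name,
        "length": int(length),
        "unit": unit,
        "molecule_type": molecule,
        "topology": topology,
        "data_file_division": division,
        "date": date,
    }


fields = split_locus_line(locus_line)
for key, value in fields.items():
    print(f"{key}: {value}")
```

The molecule_type, topology, data_file_division and date keys mirror the annotation names that appear when the record is printed with Biopython.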
3. Create a Bio.SeqRecord object with the SeqIO.read() function.
fyicenter$ python
>>> from Bio import SeqIO
>>> record = SeqIO.read("NC_005816.gb", "genbank")
>>> print(record)
ID: NC_005816.1
Name: NC_005816
Description: Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence
Database cross-references: Project:58037
Number of features: 41
/molecule_type=DNA
/topology=circular
/data_file_division=BCT
/date=21-JUL-2008
/accessions=['NC_005816']
/sequence_version=1
/gi=45478711
/keywords=['']
/source=Yersinia pestis biovar Microtus str. 91001
/organism=Yersinia pestis biovar Microtus str. 91001
/taxonomy=['Bacteria', 'Proteobacteria', 'Gammaproteobacteria', 'Enterobacteriales', ....
/references=[Reference(title='Genetics of metabolic variations between Yersinia ...
/comment=PROVISIONAL REFSEQ: This record has not yet been subject to final
NCBI review. The reference sequence was derived from AE017046.
COMPLETENESS: full length.
Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG')
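Once a record is loaded, individual fields can be read as attributes instead of printing the whole record. The sketch below is an illustration under stated assumptions: it requires Biopython to be installed, and to stay self-contained it parses a short, made-up GenBank record (DEMO_0001 is not a real accession) from a string through io.StringIO rather than the downloaded NC_005816.gb file.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical 20 bp record invented for this example (not a real accession).
DEMO_GENBANK = """\
LOCUS       DEMO_0001                 20 bp    DNA     circular BCT 21-JUL-2008
DEFINITION  Hypothetical demo record.
ACCESSION   DEMO_0001
VERSION     DEMO_0001.1
KEYWORDS    .
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            other sequences; artificial sequences.
FEATURES             Location/Qualifiers
     source          1..20
                     /organism="synthetic construct"
                     /mol_type="other DNA"
     gene            3..11
                     /locus_tag="DEMO_g1"
ORIGIN
        1 tgtaacgaac ggtgcaatag
//
"""

# SeqIO.read() accepts an open handle as well as a file name.
record = SeqIO.read(StringIO(DEMO_GENBANK), "genbank")

print(record.id)                          # taken from the VERSION line
print(record.name)                        # taken from the LOCUS line
print(record.description)                 # taken from the DEFINITION line
print(len(record.seq))                    # sequence length in bases
print(record.annotations["topology"])     # annotations behave like a dict
print(len(record.features))               # number of parsed features
```

With the real file, the same attributes are available after record = SeqIO.read("NC_005816.gb", "genbank").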
4. Look at the first 3 features stored in the Bio.SeqRecord object.
>>> print(record.features[0])
type: source
location: [0:9609](+)
qualifiers:
Key: biovar, Value: ['Microtus']
Key: db_xref, Value: ['taxon:229193']
Key: mol_type, Value: ['genomic DNA']
Key: organism, Value: ['Yersinia pestis biovar Microtus str. 91001']
Key: plasmid, Value: ['pPCP1']
Key: strain, Value: ['91001']
>>> print(record.features[1])
type: repeat_region
location: [0:1954](+)
qualifiers:
>>> print(record.features[2])
type: gene
location: [86:1109](+)
qualifiers:
Key: db_xref, Value: ['GeneID:2767718']
Key: locus_tag, Value: ['YP_pPCP01']
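Comparing the feature table in step 2 with the output above shows a shift in the start positions: the gene listed as 87..1109 in the file is reported as [86:1109]. GenBank positions are 1-based and inclusive, while the printed locations follow Python's 0-based, end-exclusive slice convention. The sketch below shows the arithmetic with a made-up helper, genbank_span_to_slice. It handles only plain start..end spans and is not part of Biopython.

```python
import re


def genbank_span_to_slice(span):
    """Convert a plain GenBank span such as '87..1109' to (start, end) for slicing.

    Hypothetical helper for illustration: joins, complements, fuzzy positions
    and between-base locations such as '8529^8530' are not supported.
    """
    match = re.fullmatch(r"(\d+)\.\.(\d+)", span)
    if match is None:
        raise ValueError(f"unsupported location: {span!r}")
    start, end = int(match.group(1)), int(match.group(2))
    # Subtract 1 from the start only; the inclusive 1-based end equals
    # the exclusive 0-based end.
    return start - 1, end


for span in ("1..9609", "1..1954", "87..1109"):
    start, end = genbank_span_to_slice(span)
    print(f"{span} -> [{start}:{end}], length {end - start}")
```

The three spans are the ones from the first three features above, so the printed slices can be checked against the location lines in the output.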
2023-04-04