Nucleic Acid Sequence Databases
Introduction to Nucleic Acid Sequence Databases
- There are three major sites for finding information about
nucleic acids (DNA and/or RNA sequences) on the Web, and all of them contain
basically the same information. The methods and databases that you
will want to use will depend mainly on how much data you want and
in what form.
- GenBank is
your best bet for most sequence searches; it is updated daily, has
detailed online help, and lets you do keyword searches of an organism's or
enzyme's name to get sequence information. This service can be very slow
during peak hours, however.
- EMBL (the
European Molecular Biology Laboratory) is a flat-file database that isn't
quite as easy to use as GenBank, and is usually slow for people in North
America since it's based in Europe, but can be useful if you're looking
for a limited amount of data and when you are not trying to identify a
gene by sequence analysis.
- DDBJ (the DNA Data Bank of
Japan) is hard for beginners to use, but it is the best choice for people who
prefer a Japanese-language interface.
- Within GenBank and similar databases, use BLAST (Basic
Local Alignment Search Tool) if you wish to find which sequences
are similar to a sequence that you already have. If you want to
locate Expressed Sequence Tags ("single-pass" cDNA sequences), use NCBI's
dbEST; if you
want to locate Sequence Tagged Sites, use dbSTS.
- Another option is Entrez, which
lets you do keyword searches to retrieve molecular biology citations,
records, and nucleotide sequences (in both text and graphical format)
from the databases of the National Center for Biotechnology Information.
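For readers who would rather script a keyword search than use a browser, NCBI's E-utilities web service (not covered in the text above) accepts Entrez queries over HTTP. The sketch below only builds a search URL and never contacts the server. The function name `build_keyword_search_url`, the choice of the `nuccore` database, and the example organism and keyword are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_keyword_search_url(organism, keyword, max_records=20):
    """Return an ESearch URL for nucleotide records matching an organism and a keyword.

    This helper is hypothetical; it only assembles the query string.
    """
    # Restrict the organism name to the Organism field; the keyword is searched freely.
    term = f"{organism}[Organism] AND {keyword}"
    params = {"db": "nuccore", "term": term, "retmax": max_records}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Example organism and enzyme name, chosen only for illustration.
    print(build_keyword_search_url("Escherichia coli", "beta-galactosidase", 5))
```

The URL can be pasted into a browser or passed to any HTTP client; the response lists the identifiers of matching records, which then have to be fetched separately.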
About online nucleic acid databases
For most sequence searches, GenBank is your best bet. It offers
a daily exchange of information with other major sequence databases,
has a variety of user interfaces, fairly detailed online help (with
e-mail addresses for more information if what is already available is
not sufficient), and a speedy interface. Because of its popularity,
however, GenBank can also be very slow during peak research hours.
Very detailed searches or searches with massive amounts of output
might be completed more quickly after hours.
Maintained by the
National Center for Biotechnology Information (NCBI), GenBank is a
collection of publicly available DNA sequences submitted by scientists around the world.
As of July 1, 1996, approximately 286,000,000 bases and 352,400 sequences
are stored in GenBank, and many more are added each day.
Searching GenBank is fairly straightforward and can be done with a
variety of search tools. If you are using a forms-capable WWW browser
(such as Netscape 1.0 or higher) and if you have never used GenBank
before, you will probably want to start your search with a general
Other means of searching GenBank include:
- BLAST (Basic Local Alignment Search Tool) searches
- dbEST (Database of Expressed Sequence Tags)
- dbSTS (Database of Sequence Tagged Sites)
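To give a feel for what "local alignment" means in a similarity search, here is a minimal sketch that scores the best-matching region shared by two sequences with the classic Smith-Waterman dynamic program. BLAST itself relies on faster heuristics, so this illustrates the idea and not BLAST's actual method. The scoring values, the function names, and the toy sequences are arbitrary assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score between a and b."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_by_similarity(query, database):
    """Return database entry names sorted from most to least similar to the query."""
    return sorted(
        database,
        key=lambda name: local_alignment_score(query, database[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy sequences invented for this example, not real database entries.
    toy_database = {"seq1": "GGACGTGG", "seq2": "CCCCCCCC"}
    print(rank_by_similarity("TTACGTTT", toy_database))
```

With these scores, four matching bases in a row score 8, so an entry that shares the stretch `ACGT` with the query outranks one that shares only a single base.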
Submitting sequences to GenBank is also very easy and
is required by most journals before articles pertaining to the
sequence are published (this provides easy access to the information
for the journal's readers). You can submit sequences via the WWW with
Summary: EMBL is good to use when you need
a limited amount of data and when you are not trying to identify a gene by
sequence analysis. However, because EMBL and all of its mirror sites are
located in Europe, your connection will be slow more often than not. All
of the information submitted to EMBL is mirrored daily in both GenBank and
DDBJ, so searching elsewhere might provide the same amount of information in
EMBL is the nucleotide sequence database of the European Molecular Biology
Laboratory. It is a flat-file database that can be searched with a
variety of search engines. EMBL sequences are stored in a
form corresponding to the biological state of the information in vivo.
Thus, cDNA sequences are stored in the database as RNA sequences, even
though they usually appear in the literature as DNA.
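A flat-file database stores each entry as plain text, one field per line. The sketch below assumes the familiar EMBL-style layout in which every line begins with a two-letter code (such as `ID`, `AC`, `DE`, `SQ`), sequence lines follow the `SQ` line, and `//` ends the entry. The sample record is made up to show the layout and is not a real entry; the function name `parse_flat_file` is likewise hypothetical.

```python
# A made-up record that only imitates the general layout of a flat-file entry.
SAMPLE_RECORD = """\
ID   TOY001     standard; RNA; 24 BP.
AC   TOY001;
DE   Made-up record used only to illustrate the layout.
SQ   Sequence 24 BP;
     acgtacgtac gtacgtacgt acgt        24
//
"""


def parse_flat_file(text):
    """Split flat-file text into records, each a dict of line code -> text."""
    records = []
    fields, sequence, in_sequence = {}, [], False
    for line in text.splitlines():
        if line.startswith("//"):
            # End of entry: store the collected sequence and start a new record.
            fields["sequence"] = "".join(sequence)
            records.append(fields)
            fields, sequence, in_sequence = {}, [], False
        elif in_sequence:
            # Keep only the letters; spaces and the running base count are dropped.
            sequence.append("".join(ch for ch in line if ch.isalpha()))
        elif line.strip():
            code, value = line[:2], line[5:].strip()
            if code == "SQ":
                in_sequence = True
            # Lines that repeat a code are joined into one field.
            fields[code] = (fields.get(code, "") + " " + value).strip()
    return records


if __name__ == "__main__":
    for record in parse_flat_file(SAMPLE_RECORD):
        print(record["AC"], len(record["sequence"]))
```

Because every field sits on its own labelled line, simple text tools are enough to search or filter entries, which is part of why so many different search engines can sit on top of the same files.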
DBGET is a science links database that summarizes the major databases for nucleic acids, proteins, ligands, medicine, etc. It could prove useful for those trying to cross-reference information.
dbEST is a subdivision of GenBank dedicated to queries on expressed sequence tags ("single-pass" cDNA sequences).
Summary: Because DDBJ mirrors its
information daily with GenBank and EMBL, beginning sequence searchers might
want to try a database with a friendlier searching interface. However,
DDBJ also offers all of its pages in Japanese as well, so if you are more
comfortable reading the Japanese versions of the pages, it can be very
DDBJ, the DNA Data Bank of Japan, was established in 1986 to be one of
the major international DNA Databases (with GenBank and EMBL). It is
certified to collect information from researchers and assign accession
numbers to submitted entries.
Searching DDBJ is somewhat awkward, as the only way to access
most of the data is by its accession number via anonymous FTP.
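Since retrieval depends on having the accession number right, it can help to check its shape before requesting anything. The sketch below assumes the common nucleotide accession patterns (one letter plus five digits, or two letters plus six or eight digits, optionally followed by a version suffix such as `.1`); these patterns are not stated in the text above, and the function name is hypothetical. The FTP host and path are not given here, so the sketch stops short of any download.

```python
import re

# Assumed shapes: U12345, AB123456, AB12345678, each optionally with ".<version>".
ACCESSION_PATTERN = re.compile(
    r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?$"
)


def is_plausible_accession(text):
    """Return True if text has the shape of a nucleotide accession number."""
    return ACCESSION_PATTERN.match(text.strip().upper()) is not None


if __name__ == "__main__":
    # Example strings chosen only to exercise the pattern.
    for candidate in ["U12345", "AB123456", "lacZ", "12345"]:
        print(candidate, is_plausible_accession(candidate))
```

A check like this only catches typing mistakes; it cannot tell whether an accession number has actually been assigned to an entry.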
Ribosomal Database Project
The RDP is a Gopher collection of ribosomal sequence data.
Nucleic Acid Structure Resources
- RNA Secondary Structures -
This site provides information on secondary structures of rRNAs and group I introns. This site also contains some great links to other ribosomal and structural sites.
- Image Library of Biological Macromolecules -
This gopher-based site contains hundreds of images of molecules and complexes along with the reference information. Images are well categorized, although entering the site with a specific goal would be helpful.
Codon Usage Tables
- Indiana University's Gopher-Based Codon List -
Indiana University's codon tables summarize amino acid information for over 50 organisms.
- Codon Lists at Harvard -
More comprehensive than Indiana University's codon lists, Harvard's codon lists include more organisms and more information. For the novice or uninitiated, though, they can be quite intimidating.
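For readers new to codon tables, the following minimal sketch shows the kind of tally such a table is built from: a coding sequence is read three bases at a time and each codon is counted, then expressed per thousand codons. The function names and the short example sequence are assumptions for illustration; the published tables are compiled from many genes per organism and may use different conventions.

```python
from collections import Counter


def codon_counts(coding_sequence):
    """Count codons in a coding sequence, reading from the first base in frame."""
    # Treat RNA and DNA spellings alike by writing U as T.
    seq = coding_sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore an incomplete trailing codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies_per_thousand(coding_sequence):
    """Return each codon's frequency per 1000 codons in the sequence."""
    counts = codon_counts(coding_sequence)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000 * n / total for codon, n in counts.items()}


if __name__ == "__main__":
    # A short invented sequence: ATG, GCT, GCT, TAA.
    example = "ATGGCTGCTTAA"
    for codon, freq in sorted(codon_frequencies_per_thousand(example).items()):
        print(codon, freq)
```

Comparing such frequencies between organisms shows which of the synonymous codons for an amino acid each organism favors.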