BioTech FYI Center

Protein sequence databases

3. Protein sequence databases

3.1. General sequence databases

Database name
Full name and/or description
URL
EXProt
Sequences of proteins with experimentally verified function
NCBI Protein database

All protein sequences: translated from GenBank and imported from other protein databases

PA-GOSUB

Protein sequences from model organisms, GO assignment and subcellular localization

PIR-PSD

Protein information resource protein sequence database, has been merged into the UniProt knowledgebase

PIR-NREF
PIR's non-redundant reference protein database
PRF

Protein research foundation database of peptides: sequences, literature and unnatural amino acids

Swiss-Prot

Now UniProt/Swiss-Prot: expertly curated protein sequence database, section of the UniProt knowledgebase

TrEMBL

Now UniProt/TrEMBL: computer-annotated translations of EMBL nucleotide sequence entries: section of the UniProt knowledgebase

UniParc

UniProt archive: a repository of all protein sequences, consisting only of unique identifiers and sequence

UniProt

Universal protein knowledgebase: merged data from Swiss-Prot, TrEMBL and PIR protein sequence databases

UniRef

UniProt non-redundant reference database: clustered sets of related sequences (including splice variants and isoforms)

3.2. Protein properties

Database name
Full name and/or description
URL
AAindex
Physicochemical properties of amino acids
ProNIT
Thermodynamic data on protein-nucleic acid interactions
ProTherm
Thermodynamic data for wild-type and mutant proteins
TECRdb
Thermodynamics of enzyme-catalyzed reactions

3.3. Protein localization and targeting

Database name
Full name and/or description
URL
DBSubLoc
Database of protein subcellular localization
NESbase
Nuclear export signals database
NLSdb
Nuclear localization signals
NMPdb
Nuclear matrix associated proteins database
NOPdb
Nucleolar proteome database
PSORTdb
Protein subcellular localization in bacteria
SPD
Secreted protein database
THGS
Transmembrane helices in genome sequences
TMPDB
Experimentally characterized transmembrane topologies

3.4. Protein sequence motifs and active sites

Database name
Full name and/or description
URL
ASC
Active sequence collection: biologically active peptides
Blocks
Alignments of conserved regions in protein families
CSA

Catalytic site atlas: active sites and catalytic residues in enzymes of known 3D structure

COMe

Co-ordination of metals etc.: classification of bioinorganic proteins (metalloproteins and some other complex proteins)

CopS
Comprehensive peptide signature database
eBLOCKS
Highly conserved protein sequence blocks
eMOTIF
Protein sequence motif determination and searches
Metalloprotein Site Database
Metal-binding sites in metalloproteins
O-GlycBase
O- and C-linked glycosylation sites in proteins
PDBSite
3D structure of protein functional sites
Phospho.ELM
S/T/Y protein phosphorylation sites (formerly PhosphoBase)
PROMISE
Prosthetic centers and metal ions in protein active sites
PROSITE
Biologically significant protein patterns and profiles
ProTeus
Signature sequences at the protein N- and C-termini

3.5. Protein domain databases; protein classification

Database name
Full name and/or description
URL
ADDA
A database of protein domain classification
CDD

Conserved domain database, includes protein domains from Pfam, SMART, COG and KOG databases

CluSTr
Clusters of Swiss-Prot + TrEMBL proteins
FunShift
Functional divergence between the subfamilies of a protein domain family
Hits
A database of protein domains and motifs
InterPro
Integrated resource of protein families, domains and functional sites
iProClass
Integrated protein classification database
PIRSF
Family/superfamily classification of whole proteins

http://pir.georgetown.edu/pirsf/

PRINTS
Hierarchical gene family fingerprints
Pfam

Protein families: multiple sequence alignments and profile hidden Markov models of protein domains

PRECISE
Predicted and consensus interaction sites in enzymes
ProDom
Protein domain families
ProtoMap
Hierarchical classification of Swiss-Prot proteins
ProtoNet
Hierarchical clustering of Swiss-Prot proteins
S4
Structure-based sequence alignments of SCOP superfamilies
SBASE
Protein domain sequences and tools
SMART

Simple modular architecture research tool: signalling, extracellular and chromatin-associated protein domains

SUPFAM
Grouping of sequence families into superfamilies
SYSTERS
Systematic re-searching and clustering of proteins
TIGRFAMs
TIGR protein families adapted for functional annotation

3.6. Databases of individual protein families

Database name
Full name and/or description
URL
AARSDB
Aminoacyl-tRNA synthetase database
ASPD
Artificial selected proteins/peptides database
BacTregulators
Transcriptional regulators of AraC and TetR families
CSDBase
Cold shock domain-containing proteins
CuticleDB
Structural proteins of Arthropod cuticle
DCCP
Database of copper-chelating proteins
DExH/D Family Database
DEAD-box, DEAH-box and DExH-box proteins
Endogenous GPCR List
G protein-coupled receptors; expression in cell lines
ESTHER
Esterases and other alpha/beta hydrolase enzymes
EyeSite
Families of proteins functioning in the eye
GPCRDB
G protein-coupled receptors database
gpDB
G-proteins and their interaction with GPCRs
Histone Database
Histone fold sequences and structures
Homeobox Page
Homeobox proteins, classification and evolution
Hox-Pro
Homeobox genes database
Homeodomain Resource

Homeodomain sequences, structures and related genetic and genomic information

HORDE
Human olfactory receptor data exploratorium
InBase

Inteins (protein splicing elements) database: properties, sequences, bibliography

KinG: Kinases in Genomes
S/T/Y-specific protein kinases encoded in complete genomes
Knottins

Database of knottins: small proteins with an unusual 'disulfide through disulfide' knot

LGICdb
Ligand-gated ion channel subunit sequences database
Lipase Engineering Database
Sequence, structure and function of lipases and esterases
LOX-DB
Mammalian, invertebrate, plant and fungal lipoxygenases
MEROPS
Database of proteolytic enzymes (peptidases)
NPD
Nuclear protein database
NucleaRDB
Nuclear receptor superfamily
Nuclear Receptor Resource
Nuclear receptor superfamily
NUREBASE
Nuclear hormone receptors database
Olfactory Receptor Database
Sequences for olfactory receptor-like molecules
ooTFD
Object-oriented transcription factors database
PKR

Protein kinase resource: sequences, enzymology, genetics and molecular and structural properties

PLPMDB
Pyridoxal-5'-phosphate dependent enzymes mutations
ProLysED
A database of bacterial protease systems
Prolysis
Proteases and natural and synthetic protease inhibitors
REBASE
Restriction enzymes and associated methylases
Ribonuclease P Database
RNase P sequences, alignments and structures
RPG
Ribosomal protein gene database
RTKdb
Receptor tyrosine kinase sequences
S/MARt dB
Nuclear scaffold/matrix attached regions
Scorpion
Database of scorpion toxins
SDAP
Structural database of allergenic proteins and food allergens
SENTRA
Sensory signal transduction proteins
SEVENS
7-transmembrane helix receptors (G-protein-coupled)
SRPDB
Proteins of the signal recognition particles
TrSDB
Transcription factor database
VKCDB
Voltage-gated potassium channel database
Wnt Database
Wnt proteins and phenotypes