Bioinformatics Glossary

FASTA format

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters in length. An example sequence in FASTA format is: 
>gi|532319|pir|TVFV2E|TVFV2E envelope protein

A FASTA file can also contain multiple sequences:
>VECTOR32    Synthetic vector sequence #32
>VECTOR33    Synthetic vector sequence #33
>VECTOR34    Synthetic vector sequence #34

Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U (selenocysteine) and * (translation stop) are acceptable letters. Invalid characters (digits, blanks) are automatically removed.
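
The rules above can be sketched as a small parser. This is a minimal illustration in Python, not the parser used by any particular tool; the function name parse_fasta and the sequences in the usage note are hypothetical. It assumes that a description line is any line with ">" in the first column, and that digits and blanks are dropped before lower-case letters are mapped to upper-case.

```python
import re


def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A ">" in the first column starts a new record.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Remove digits and whitespace, then map lower case to upper case.
            chunks.append(re.sub(r"[\d\s]", "", line).upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For example, with made-up input, parse_fasta(">seq1 demo\nacgt acgt 10\nNN-A\n") yields one record whose sequence is "ACGTACGTNN-A". The hyphen is kept because it denotes a gap.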


Fingerprint

A fingerprint is a set of motifs used to predict the occurrence of similar motifs, in either an individual sequence or in a database. Fingerprints are refined by iterative scanning of a composite protein sequence database. A composite or multiple-motif fingerprint contains a number of aligned motifs taken from different parts of a multiple alignment. True family members are then easy to identify by virtue of possessing all elements of the fingerprint, while subfamily members may be identified by possessing only part of it.
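
The distinction between full and partial matches can be illustrated with a toy classifier in Python. This is only a sketch under strong simplifying assumptions: the motifs are written as hypothetical regular expressions, whereas real fingerprints are built from aligned motifs and scored rather than matched exactly. The names classify and TOY_FINGERPRINT, and all patterns and sequences, are invented for illustration.

```python
import re

# Hypothetical three-motif fingerprint; the patterns are made up.
TOY_FINGERPRINT = {
    "motif1": r"GDSL",
    "motif2": r"C..C",
    "motif3": r"HW[ST]",
}


def classify(sequence, fingerprint=TOY_FINGERPRINT):
    """Label a sequence by how many motifs of the fingerprint it contains."""
    hits = [name for name, pattern in fingerprint.items()
            if re.search(pattern, sequence)]
    if len(hits) == len(fingerprint):
        return "all motifs: likely family member"
    if hits:
        return "some motifs: possible subfamily member"
    return "no motifs: no match"
```

A sequence containing all three patterns is reported as a likely family member, while one containing only the first is reported as a possible subfamily member.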


Frameshift mutation

An insertion, deletion, or duplication of a number of bases not divisible by three, which causes the reading frame of a structural gene to shift from the normal series of triplets.
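
The effect on the triplet series can be shown with a short Python sketch. The sequence and the helper name codons are hypothetical, chosen only to make the shift visible. Removing one base changes every triplet downstream of the change, while removing three bases drops one triplet and leaves the rest of the frame intact.

```python
def codons(seq):
    """Split a sequence into consecutive triplets, dropping any incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


original = "ATGGCCAAATAG"                 # hypothetical 12-base coding sequence
one_deleted = original[:4] + original[5:]    # remove a single base
three_deleted = original[:3] + original[6:]  # remove a whole triplet

print(codons(original))       # ['ATG', 'GCC', 'AAA', 'TAG']
print(codons(one_deleted))    # ['ATG', 'GCA', 'AAT']
print(codons(three_deleted))  # ['ATG', 'AAA', 'TAG']
```

After the single-base deletion, the final TAG triplet is no longer read in frame, which is why frameshifts typically alter everything downstream of the change.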

Functional genomics

The use of genomic information to delineate protein structure, function, pathways and networks. Function may be determined by "knocking out" or "knocking in" expressed genes in model organisms such as worm, fruit fly, yeast or mouse.

Fusion protein

The protein resulting from the genetic joining and expression of two different genes (see chimeric).

