A sequence in FASTA format begins with a single-line description, followed
by lines of sequence data. The description line is distinguished from the
sequence data by a greater-than (">") symbol in the first column. It is
recommended that all lines of text be shorter than 80 characters.
An example sequence in FASTA format is:
A FASTA file can also contain multiple sequences, each introduced by its own description line.
Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below). Invalid characters (digits, blanks) are automatically removed.
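The conventions described above (a ">" description line, sequence lines that follow it, lower-case mapped to upper-case, digits and blanks removed) can be illustrated with a minimal Python sketch. The function name `parse_fasta` and the two records are hypothetical and made up for illustration; this is not the parser used by any particular tool, and it does not validate letters against the IUB/IUPAC codes.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {description: sequence} for each record in FASTA-formatted lines."""
    records: Dict[str, str] = {}
    description = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            # A ">" in the first column starts a new record.
            description = line[1:].strip()
            records[description] = ""
        elif description is not None:
            # Drop digits and blanks, then map lower-case to upper-case.
            cleaned = "".join(
                ch for ch in line if not ch.isdigit() and not ch.isspace()
            )
            records[description] += cleaned.upper()
    return records


# Hypothetical two-record input (not real sequences).
example = """>seq1 hypothetical protein fragment
mkt aayi
AKQR 10
>seq2 hypothetical nucleotide fragment
acgt-ACGT
"""

records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, seq)
```

Keying the result by description line is a simplification; a file with duplicate description lines would need a list of pairs instead.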
A fingerprint is a set of motifs used to predict the occurrence of similar motifs, in either an individual sequence or in a database. Fingerprints are refined by iterative scanning of a composite protein sequence database. A composite or multiple-motif fingerprint contains a number of aligned motifs taken from different parts of a multiple alignment. True family members are then easy to identify by virtue of possessing all elements of the fingerprint, while subfamily members may be identified by possessing only part of it.
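The idea that full family members possess every element of a fingerprint while subfamily members possess only part of it can be sketched as follows. This is a deliberately simplified illustration: the motifs are written as regular expressions and are entirely hypothetical, the function `classify` is an assumed name, and real fingerprint searches generally score aligned motifs rather than requiring exact pattern matches.

```python
import re
from typing import List


def classify(sequence: str, motifs: List[str]) -> str:
    """Label a sequence by how many fingerprint motifs it contains."""
    hits = sum(1 for motif in motifs if re.search(motif, sequence))
    if hits == len(motifs):
        return "family"               # every element of the fingerprint found
    if hits > 0:
        return "possible subfamily"   # only part of the fingerprint found
    return "no match"


# Hypothetical three-motif fingerprint; "." matches any single residue.
fingerprint = ["GDSG", "C..C", "HW"]

full = classify("AAGDSGTTCAACLLHWK", fingerprint)
partial = classify("AAGDSGTTTTTTLLHWK", fingerprint)
unrelated = classify("AAAAAAA", fingerprint)
print(full, partial, unrelated)
```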
A frameshift is an insertion, deletion, or duplication of one or more bases (in a number not divisible by three) that causes the reading frame of a structural gene to shift from the normal series of triplets.
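A short Python sketch may help show how losing a single base shifts the series of triplets, whereas losing a whole triplet leaves the downstream frame intact. The sequence and the helper name `codons` are hypothetical and chosen only for illustration.

```python
from typing import List


def codons(seq: str) -> List[str]:
    """Split a sequence into complete triplets, ignoring any trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


original = "ATGGCCAAATTTGGG"                  # hypothetical coding sequence
deleted_one = original[:4] + original[5:]     # remove one base (index 4)
deleted_three = original[:3] + original[6:]   # remove one whole triplet

print(codons(original))       # ['ATG', 'GCC', 'AAA', 'TTT', 'GGG']
print(codons(deleted_one))    # ['ATG', 'GCA', 'AAT', 'TTG']
print(codons(deleted_three))  # ['ATG', 'AAA', 'TTT', 'GGG']
```

After the single-base deletion every triplet downstream of the change is different, while the three-base deletion removes one triplet and leaves the others unchanged.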
Functional genomics is the use of genomic information to delineate protein structure, function, pathways, and networks. Function may be determined by "knocking out" or "knocking in" expressed genes in model organisms such as worm, fruit fly, yeast, or mouse.
A fusion protein is the protein resulting from the genetic joining and expression of two different genes (see chimeric).