What Is Sequence Motif Analysis

Q

What is Sequence Motif Analysis?

✍: FYIcenter.com

A

In biology, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and usually assumed to be related to biological function of the macromolecule.

For example, an N-glycosylation site motif can be defined as Asn, followed by anything but Pro, followed by either Ser or Thr, followed by anything but Pro residue. This N-glycosylation motif can be expressed as:

N-glycosylation Motif = N{P}[ST]{P}
 
where:
  N = Asn 
  P = Pro 
  S = Ser 
  T = Thr 
  {X} means any amino acid except X
  [XY] means either X or Y.

Formally, the above motif pattern example is defined by the following rules.

  • A single upper case letter, X, outside any brackets represents a nucleotide or an amino acid.
  • A single upper case letter inside a curly bracket, {X} represents any nucleotide or any amino acid except X.
  • Multiple upper case letters inside a square bracket, [XYZ] represents X, Y or Z.
  • Lower case x is a wildcard for any nucleotide or any amino acid.

The PROSITE data uses a lightly different notation to express sequence motif with the following extra rules:

  • A hyphen, -, is used as a delimiter for each nucleotide or amino acid position.
  • A single letter followed by an integer inside a bracket, X(n) represents X repeated n times.
  • A single letter followed by two integers inside a bracket, X(m,n) represents X repeated k times, where m<=k<=n.

For example, motif for the C2H2-type zinc finger domain in PROSITE notation is:

C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H

If you are familiar with regular expression, sequence motif can also be expressed as a regular expression. For example:

N-glycosylation Motif = /N[^P][ST][^P]/
C2H2-type zinc finger domain = /C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H/

 

Create Motif With Biopython Bio.motifs Module

Biopython for Sequence Motif Analysis

Biopython for Sequence Motif Analysis

⇑⇑ OBF (Open Bioinformatics Foundation) Tools

2023-07-11, 392🔥, 0💬