Motif ICM as Relative Divergence with Bio.motifs

Q

How to Calculate Motif ICM as Relative Divergence with the Bio.motifs Module?

✍: FYIcenter.com

A

ICMRD (Information Content Matrix as Relative Divergence) measures how different a given PPM is from a background distribution, which in this example is the uniform distribution. Each entry of the ICMRD can be expressed as:

ICMRD[i,j] = PPM[i,j]*PSSM[i,j]

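where i indexes the letter (row) and j the position (column):
  PPM[i,j] is the Position Probability Matrix.
  PSSM[i,j] is the Position-Specific Scoring Matrix, the base-2 log-odds of PPM[i,j] against the background frequency of letter i.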
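
The PSSM entry is the base-2 log-odds of the PPM entry against the background frequency B[i], so the formula can be written out in full. Summed over the letters of one position, it gives the relative entropy (Kullback-Leibler divergence) of that column from the background. That sum is never negative, even though individual entries can be:

```latex
\mathrm{PSSM}_{i,j} = \log_2 \frac{\mathrm{PPM}_{i,j}}{B_i}
\quad\Longrightarrow\quad
\mathrm{ICMRD}_{i,j} = \mathrm{PPM}_{i,j}\,\log_2 \frac{\mathrm{PPM}_{i,j}}{B_i},
\qquad
\sum_{i} \mathrm{ICMRD}_{i,j} = D_{\mathrm{KL}}\!\left(\mathrm{PPM}_{\cdot,j} \,\Vert\, B\right) \ge 0
```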

Here is an example of how to calculate ICMRD against the simplest background model: a uniform pseudocount with a total of 1, or P = (0.25, 0.25, 0.25, 0.25), and a uniform background frequency of 0.25 per letter, or B = (0.25, 0.25, 0.25, 0.25).
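
With these settings, each PPM entry works out to (count + 0.25) / (N + 1), where N is the number of sequences. The short sketch below assumes pseudocounts are added this way and uses a hypothetical helper, ppm_entry(), not a Biopython function. It reproduces the 0.89 and 0.04 values in column 0 of the PPM printed in step 2:

```python
def ppm_entry(count, n_seqs, pseudocount=0.25, total_pseudocount=1.0):
    """Hypothetical helper: probability of one letter at one position.

    The letter's pseudocount is added to its count, and the total
    pseudocount (0.25 for each of A, C, G, T) is added to the column total.
    """
    return (count + pseudocount) / (n_seqs + total_pseudocount)

# Column 0 of the six sample sequences holds six A's and no C, G or T.
print(round(ppm_entry(6, 6), 2))  # probability of A
print(round(ppm_entry(0, 6), 2))  # probability of C, G or T
```

This prints 0.89 and 0.04, and the four probabilities in the column still sum to 1.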

1. Create a motif object with the Bio.motifs create() function.

fyicenter$ python
>>> from Bio import motifs
>>> samples = [
...   "AAGAAT",
...   "ATCATA",
...   "AAGTAA",
...   "AACAAA",
...   "ATTAAA",
...   "AAGAAT"
... ]

>>> m = motifs.create(samples)
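
motifs.create() tallies how often each letter occurs at each position of the aligned sequences. The sketch below shows the same tally in plain Python so the counts can be checked by hand. count_matrix() is a hypothetical helper, not part of Bio.motifs:

```python
from collections import Counter

def count_matrix(seqs, alphabet="ACGT"):
    """Hypothetical helper: count each letter at each column of
    equal-length, aligned sequences."""
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must be aligned"
    columns = [Counter(s[i] for s in seqs) for i in range(length)]
    # A Counter returns 0 for letters that never occur in a column.
    return {b: [col[b] for col in columns] for b in alphabet}

samples = ["AAGAAT", "ATCATA", "AAGTAA", "AACAAA", "ATTAAA", "AAGAAT"]
for base, row in count_matrix(samples).items():
    print(base, row)
```

For the six samples this prints A [6, 4, 0, 5, 5, 4], C [0, 0, 2, 0, 0, 0], G [0, 0, 3, 0, 0, 0] and T [0, 2, 1, 1, 1, 2].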

2. Calculate the PPM and PSSM with the Bio.motifs normalize() and log_odds() methods.

>>> pseudocounts = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
>>> ppm = m.counts.normalize(pseudocounts)
>>> print(ppm)
        0      1      2      3      4      5
A:   0.89   0.61   0.04   0.75   0.75   0.61
C:   0.04   0.04   0.32   0.04   0.04   0.04
G:   0.04   0.04   0.46   0.04   0.04   0.04
T:   0.04   0.32   0.18   0.18   0.18   0.32

>>> background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
>>> pssm = ppm.log_odds(background)
>>> print(pssm)
        0      1      2      3      4      5
A:   1.84   1.28  -2.81   1.58   1.58   1.28
C:  -2.81  -2.81   0.36  -2.81  -2.81  -2.81
G:  -2.81  -2.81   0.89  -2.81  -2.81  -2.81
T:  -2.81   0.36  -0.49  -0.49  -0.49   0.36
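
log_odds() scores each PPM entry as the base-2 logarithm of its ratio to the background frequency. A probability equal to the background (0.25) would therefore score exactly 0. The quick check of column 0 below uses the exact fractions behind the rounded PPM values and a hypothetical helper, log2_odds(), which is not part of Bio.motifs:

```python
import math

def log2_odds(p, background=0.25):
    """Hypothetical helper: base-2 log-odds of probability p versus background."""
    return math.log2(p / background)

p_a = (6 + 0.25) / (6 + 1)   # PPM["A"][0]: six A's plus the 0.25 pseudocount
p_c = (0 + 0.25) / (6 + 1)   # PPM["C"][0]: pseudocount only
print(round(log2_odds(p_a), 2), round(log2_odds(p_c), 2))
```

This prints 1.84 -2.81, matching column 0 of the PSSM above.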

3. Calculate ICMRD as the element-wise product of the two matrices with the NumPy library.

>>> import numpy
>>> ppm_a = numpy.array([ppm["A"], ppm["C"], ppm["G"], ppm["T"]])
>>> pssm_a = numpy.array([pssm["A"], pssm["C"], pssm["G"], pssm["T"]])
>>> 
>>> icmrd = ppm_a * pssm_a 
>>> print(icmrd)
[[ 1.63973327  0.77720838 -0.10026268  1.18872188  1.18872188  0.77720838]
 [-0.10026268 -0.10026268  0.11654038 -0.10026268 -0.10026268 -0.10026268]
 [-0.10026268 -0.10026268  0.41464651 -0.10026268 -0.10026268 -0.10026268]
 [-0.10026268  0.11654038 -0.08668336 -0.08668336 -0.08668336  0.11654038]]
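
Individual entries such as -0.10026268 are negative. Each column of the ICMRD, however, sums to the relative entropy (Kullback-Leibler divergence) of that position from the background, and that sum is never negative. The standard-library sketch below recomputes the matrix from the sample counts and sums its columns. The variable names are illustrative only, not Biopython attributes:

```python
import math

# Count matrix of the six sample sequences (rows A, C, G, T; columns 0-5).
counts = {
    "A": [6, 4, 0, 5, 5, 4],
    "C": [0, 0, 2, 0, 0, 0],
    "G": [0, 0, 3, 0, 0, 0],
    "T": [0, 2, 1, 1, 1, 2],
}
n_seqs = 6
pseudocount = 0.25   # per letter, so the total pseudocount is 1
background = 0.25    # uniform background frequency B[i]
length = len(counts["A"])

# PPM[i,j] = (count + pseudocount) / (N + total pseudocount)
ppm = {b: [(c + pseudocount) / (n_seqs + 4 * pseudocount) for c in row]
       for b, row in counts.items()}

# ICMRD[i,j] = PPM[i,j] * log2(PPM[i,j] / B[i]), i.e. PPM times PSSM
icmrd = {b: [p * math.log2(p / background) for p in row]
         for b, row in ppm.items()}

# Summing a column gives its relative entropy against the background.
column_ic = [sum(icmrd[b][j] for b in "ACGT") for j in range(length)]
print([round(x, 2) for x in column_ic])
```

This prints [1.34, 0.69, 0.34, 0.9, 0.9, 0.69]. Every position is positive, and the most conserved position (column 0, all A) carries the most information.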

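4. Negative values appear wherever a letter's probability is below its background frequency (0.25 here). They are not allowed in the ICM, so replace them with 0.0.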

>>> icmrd[icmrd<0.0] = 0.0
>>> print(icmrd)
[[1.63973327 0.77720838 0.         1.18872188 1.18872188 0.77720838]
 [0.         0.         0.11654038 0.         0.         0.        ]
 [0.         0.         0.41464651 0.         0.         0.        ]
 [0.         0.11654038 0.         0.         0.         0.11654038]]
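
The same clipping can be done without NumPy. The sketch below uses a hypothetical helper, clip_negative(), which returns a clipped copy and leaves the input unchanged. This differs from the in-place assignment above. The input matrix is the unclipped ICMRD printed in step 3:

```python
def clip_negative(matrix):
    """Hypothetical helper: return a copy of a row-per-letter matrix with
    every negative entry replaced by 0.0; the input is left unchanged."""
    return [[max(0.0, x) for x in row] for row in matrix]

# Rows A, C, G, T of the unclipped ICMRD matrix printed in step 3.
icmrd = [
    [1.63973327, 0.77720838, -0.10026268, 1.18872188, 1.18872188, 0.77720838],
    [-0.10026268, -0.10026268, 0.11654038, -0.10026268, -0.10026268, -0.10026268],
    [-0.10026268, -0.10026268, 0.41464651, -0.10026268, -0.10026268, -0.10026268],
    [-0.10026268, 0.11654038, -0.08668336, -0.08668336, -0.08668336, 0.11654038],
]
clipped = clip_negative(icmrd)

# Total height of each position after clipping.
heights = [sum(row[j] for row in clipped) for j in range(len(clipped[0]))]
print([round(h, 4) for h in heights])
```

This prints [1.6397, 0.8937, 0.5312, 1.1887, 1.1887, 0.8937]. Each clipped column total is larger than its unclipped sum because the negative terms no longer offset the positive ones. For position 2, for example, the clipped total is 0.5312, compared with an unclipped sum of about 0.344.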

 

2023-05-31