Motif ICM with Bio.motifs
How to Calculate Motif ICM with Bio.motifs Module?
✍: FYIcenter.com
An ICM (Information Content Matrix) shows how important each position of a motif is relative to the others, that is, how strongly each position is conserved.
ICM can be expressed as:
ICM[i,j] = PPM[i,j]*(ICt - U[j])
where:
PPM[i,j] is the Position Probability Matrix
ICt is the total IC: log2(n)
n is the number of letters
U[j] is the uncertainty per position: - sum_over_i(PPM[i,j]*log2(PPM[i,j]))
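As a quick check of the formula, here is a minimal sketch that works through a single column using only the standard "math" module. The column values (A = 2/3, T = 1/3) are an assumption chosen for illustration, and the helper name column_icm is hypothetical, not part of Biopython.

```python
import math

def column_icm(column):
    """Return (uncertainty, ICM values) for one PPM column given as {letter: probability}."""
    n = len(column)          # number of letters in the alphabet
    ic_t = math.log2(n)      # total information content, log2(n)
    # Terms with p == 0 are skipped: p*log2(p) tends to 0 as p tends to 0
    u = -sum(p * math.log2(p) for p in column.values() if p > 0)
    icm = {letter: p * (ic_t - u) for letter, p in column.items()}
    return u, icm

u, icm = column_icm({"A": 2/3, "C": 0.0, "G": 0.0, "T": 1/3})
print(round(u, 4))                                # 0.9183
print({k: round(v, 4) for k, v in icm.items()})   # {'A': 0.7211, 'C': 0.0, 'G': 0.0, 'T': 0.3606}
```

Because the probabilities in a column sum to 1, the ICM values of that column add up to ICt - U[j], here about 1.08.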
To calculate motif ICM, we can get the PPM first using Biopython.
fyicenter$ python
>>> from Bio import motifs
>>> samples = [
... "AAGAAT",
... "ATCATA",
... "AAGTAA",
... "AACAAA",
... "ATTAAA",
... "AAGAAT"
... ]
>>> m = motifs.create(samples)
>>> ppm = m.counts.normalize()
>>> print(ppm)
0 1 2 3 4 5
A: 1.00 0.67 0.00 0.83 0.83 0.67
C: 0.00 0.00 0.33 0.00 0.00 0.00
G: 0.00 0.00 0.50 0.00 0.00 0.00
T: 0.00 0.33 0.17 0.17 0.17 0.33
Then we can calculate the ICM using "numpy" and "math" libraries.
>>> import numpy
>>> import math
>>> n = len(ppm)
>>> n
4
>>> ic_t = math.log(n, 2)
>>> ic_t
2.0
>>> # Stack the PPM rows into a 4 x 6 array (letters x positions)
>>> ppm_a = numpy.array([ppm["A"], ppm["C"], ppm["G"], ppm["T"]])
>>> print(ppm_a)
[[1.         0.66666667 0.         0.83333333 0.83333333 0.66666667]
 [0.         0.         0.33333333 0.         0.         0.        ]
 [0.         0.         0.5        0.         0.         0.        ]
 [0.         0.33333333 0.16666667 0.16666667 0.16666667 0.33333333]]
>>> # log2(0) is -inf; NumPy may print a RuntimeWarning here
>>> log2_ppm_a = numpy.log2(ppm_a)
>>> print(log2_ppm_a)
[[ 0.        -0.5849625       -inf -0.26303441 -0.26303441 -0.5849625 ]
 [      -inf       -inf -1.5849625       -inf       -inf       -inf]
 [      -inf       -inf -1.              -inf       -inf       -inf]
 [      -inf -1.5849625 -2.5849625 -2.5849625 -2.5849625 -1.5849625 ]]
>>> # 0 * -inf gives nan
>>> ppm_log2_ppm_a = ppm_a * log2_ppm_a
>>> print(ppm_log2_ppm_a)
[[ 0.         -0.389975           nan -0.21919534 -0.21919534 -0.389975  ]
 [        nan         nan -0.52832083         nan         nan         nan]
 [        nan         nan -0.5                nan         nan         nan]
 [        nan -0.52832083 -0.43082708 -0.43082708 -0.43082708 -0.52832083]]
>>> # Replace nan with 0, since p*log2(p) tends to 0 as p tends to 0
>>> ppm_log2_ppm_a = numpy.nan_to_num(ppm_log2_ppm_a)
>>> print(ppm_log2_ppm_a)
[[ 0.         -0.389975    0.         -0.21919534 -0.21919534 -0.389975  ]
 [ 0.          0.         -0.52832083  0.          0.          0.        ]
 [ 0.          0.         -0.5         0.          0.          0.        ]
 [ 0.         -0.52832083 -0.43082708 -0.43082708 -0.43082708 -0.52832083]]
>>> # Uncertainty per position: sum over the letters (axis 0)
>>> u_a = - numpy.sum(ppm_log2_ppm_a, axis=0)
>>> print(u_a)
[-0.          0.91829583  1.45914792  0.65002242  0.65002242  0.91829583]
>>> icm = ppm_a * (ic_t - u_a)
>>> print(icm)
[[2.         0.72113611 0.         1.12498132 1.12498132 0.72113611]
 [0.         0.         0.18028403 0.         0.         0.        ]
 [0.         0.         0.27042604 0.         0.         0.        ]
 [0.         0.36056806 0.09014201 0.22499626 0.22499626 0.36056806]]
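The same calculation can be wrapped in one function that never produces -inf or nan. The following is a sketch only, not the author's method: the function name information_content_matrix is hypothetical, and the PPM is rebuilt from the letter counts of the six sample sequences so that the block runs without Biopython. It uses the "where" and "out" arguments of numpy.log2 to take the logarithm only at non-zero entries.

```python
import numpy as np

def information_content_matrix(ppm):
    """Compute the ICM from a PPM array of shape (letters, positions)."""
    ppm = np.asarray(ppm, dtype=float)
    ic_total = np.log2(ppm.shape[0])          # log2(number of letters)
    # Evaluate log2 only where p > 0; entries with p == 0 keep the value 0
    log2_ppm = np.zeros_like(ppm)
    np.log2(ppm, out=log2_ppm, where=ppm > 0)
    uncertainty = -np.sum(ppm * log2_ppm, axis=0)   # one value per position
    return ppm * (ic_total - uncertainty)

# Letter counts (rows A, C, G, T) of the six samples, divided by 6 to get the PPM
ppm_a = np.array([
    [6, 4, 0, 5, 5, 4],
    [0, 0, 2, 0, 0, 0],
    [0, 0, 3, 0, 0, 0],
    [0, 2, 1, 1, 1, 2],
]) / 6

icm = information_content_matrix(ppm_a)
print(np.round(icm, 4))
```

Masking the zero entries up front avoids both the warnings and the numpy.nan_to_num step, and should give the same ICM as the session above.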
As you can see, the total ICM value of the first position (the sum of its column) is 2, the highest possible value for 4 letters, so it is the most important, or most conserved, position. The total ICM value of the third position is the lowest, about 0.54, so it is the least important, or least conserved, position.
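To check this ranking numerically, the column totals can be computed directly. This is a small sketch that assumes the ICM values printed above and copies them into a NumPy array; positions are numbered from 0, so index 0 is the first position and index 2 is the third.

```python
import numpy as np

# ICM values copied from the output above (rows A, C, G, T)
icm = np.array([
    [2.0, 0.72113611, 0.0,        1.12498132, 1.12498132, 0.72113611],
    [0.0, 0.0,        0.18028403, 0.0,        0.0,        0.0],
    [0.0, 0.0,        0.27042604, 0.0,        0.0,        0.0],
    [0.0, 0.36056806, 0.09014201, 0.22499626, 0.22499626, 0.36056806],
])

totals = icm.sum(axis=0)          # total information content per position
most = int(np.argmax(totals))     # index of the most conserved position
least = int(np.argmin(totals))    # index of the least conserved position
print(np.round(totals, 4))
print(most, least)                # 0 2
```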
2023-05-31