Motif ICM with Bio.motifs
How to Calculate Motif ICM with Bio.motifs Module?
✍: FYIcenter.com
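An ICM (Information Content Matrix) shows how much information each letter carries at each position of a motif, so positions can be compared by how important, or how conserved, they are. The ICM can be expressed as: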
ICM[i,j] = PPM[i,j]*(ICt - U[j])

where:
  PPM[i,j] is the Position Probability Matrix
  ICt is the total IC: log2(n)
  n is the number of letters
  U[j] is the uncertainty per position: - sum_over_i(PPM[i,j]*log2(PPM[i,j]))
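To make the formula concrete, here is a minimal sketch that applies it to a single motif position using only the standard library. The function name column_icm and the example probabilities are hypothetical, chosen for illustration; terms with probability 0 are treated as contributing 0 to the uncertainty, which is the usual convention.

```python
import math

def column_icm(column):
    """Return the ICM entries for one motif position.

    `column` maps each letter to its probability at that position.
    """
    n = len(column)                      # number of letters in the alphabet
    ic_t = math.log2(n)                  # total information content, ICt
    # Uncertainty U[j]; terms with probability 0 are skipped (count as 0).
    u = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return {letter: p * (ic_t - u) for letter, p in column.items()}

# Hypothetical position where A and G are equally likely.
print(column_icm({"A": 0.5, "C": 0.0, "G": 0.5, "T": 0.0}))
# {'A': 0.5, 'C': 0.0, 'G': 0.5, 'T': 0.0}
```

With two equally likely letters the uncertainty is 1 bit, so the position carries 2 - 1 = 1 bit in total, split evenly between A and G.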
To calculate the motif ICM, we can first get the PPM using Biopython.
fyicenter$ python
>>> from Bio import motifs
>>> samples = [
...   "AAGAAT",
...   "ATCATA",
...   "AAGTAA",
...   "AACAAA",
...   "ATTAAA",
...   "AAGAAT"
... ]
>>> m = motifs.create(samples)
>>> ppm = m.counts.normalize()
>>> print(ppm)
        0      1      2      3      4      5
A:   1.00   0.67   0.00   0.83   0.83   0.67
C:   0.00   0.00   0.33   0.00   0.00   0.00
G:   0.00   0.00   0.50   0.00   0.00   0.00
T:   0.00   0.33   0.17   0.17   0.17   0.33
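As a cross-check on the PPM, the same probabilities can be reproduced by counting letters column by column. This is only a sketch using the standard library, not how Biopython implements normalize(); it assumes no pseudocounts are applied, as in the call above. The names position_probability_matrix and ppm_check are hypothetical.

```python
from collections import Counter

samples = ["AAGAAT", "ATCATA", "AAGTAA", "AACAAA", "ATTAAA", "AAGAAT"]
letters = "ACGT"

def position_probability_matrix(sequences, alphabet):
    """Count letters per position and divide by the number of sequences."""
    total = len(sequences)
    ppm = {letter: [] for letter in alphabet}
    for column in zip(*sequences):       # one tuple of letters per position
        counts = Counter(column)         # missing letters count as 0
        for letter in alphabet:
            ppm[letter].append(counts[letter] / total)
    return ppm

ppm_check = position_probability_matrix(samples, letters)
for letter in letters:
    print(letter, [round(p, 2) for p in ppm_check[letter]])
```

The rounded rows should agree with the table printed by Biopython.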
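Then we can calculate the ICM using the "numpy" and "math" libraries. Note that zero probabilities give log2(0) = -inf and 0 * -inf = nan, so numpy.nan_to_num() is used to replace those nan values with 0.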
>>> import numpy
>>> import math
>>> n = len(ppm)
>>> n
4
>>> ic_t = math.log(n, 2)
>>> ic_t
2.0
>>> ppm_a = numpy.array([ppm["A"], ppm["C"], ppm["G"], ppm["T"]])
>>> print(ppm_a)
[[1.         0.66666667 0.         0.83333333 0.83333333 0.66666667]
 [0.         0.         0.33333333 0.         0.         0.        ]
 [0.         0.         0.5        0.         0.         0.        ]
 [0.         0.33333333 0.16666667 0.16666667 0.16666667 0.33333333]]
>>> log2_ppm_a = numpy.log2(ppm_a)
>>> print(log2_ppm_a)
[[ 0.         -0.5849625        -inf -0.26303441 -0.26303441 -0.5849625 ]
 [       -inf        -inf -1.5849625        -inf        -inf        -inf]
 [       -inf        -inf -1.                -inf        -inf        -inf]
 [       -inf -1.5849625  -2.5849625  -2.5849625  -2.5849625  -1.5849625 ]]
>>> ppm_log2_ppm_a = ppm_a * log2_ppm_a
>>> print(ppm_log2_ppm_a)
[[ 0.         -0.389975           nan -0.21919534 -0.21919534 -0.389975  ]
 [        nan         nan -0.52832083         nan         nan         nan]
 [        nan         nan -0.5                nan         nan         nan]
 [        nan -0.52832083 -0.43082708 -0.43082708 -0.43082708 -0.52832083]]
>>> ppm_log2_ppm_a = numpy.nan_to_num(ppm_log2_ppm_a)
>>> print(ppm_log2_ppm_a)
[[ 0.         -0.389975    0.         -0.21919534 -0.21919534 -0.389975  ]
 [ 0.          0.         -0.52832083  0.          0.          0.        ]
 [ 0.          0.         -0.5         0.          0.          0.        ]
 [ 0.         -0.52832083 -0.43082708 -0.43082708 -0.43082708 -0.52832083]]
>>> u_a = - numpy.sum(ppm_log2_ppm_a, axis=0)
>>> print(u_a)
[-0.          0.91829583  1.45914792  0.65002242  0.65002242  0.91829583]
>>> icm = ppm_a * (ic_t - u_a)
>>> print(icm)
[[2.         0.72113611 0.         1.12498132 1.12498132 0.72113611]
 [0.         0.         0.18028403 0.         0.         0.        ]
 [0.         0.         0.27042604 0.         0.         0.        ]
 [0.         0.36056806 0.09014201 0.22499626 0.22499626 0.36056806]]
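The steps in this session can also be collected into one function. The sketch below is one possible arrangement, not part of Biopython: the name information_content_matrix is hypothetical, and the PPM is typed in as fractions so the block runs without Biopython. It uses the `out` and `where` arguments of numpy.log2() to skip zero probabilities, so no -inf or nan values are produced along the way.

```python
import numpy

def information_content_matrix(ppm_a):
    """ICM for a PPM array with one row per letter, one column per position."""
    ppm_a = numpy.asarray(ppm_a, dtype=float)
    ic_t = numpy.log2(ppm_a.shape[0])    # total IC: log2(number of letters)
    # Evaluate log2 only where the probability is positive; elsewhere the
    # result stays 0, so p*log2(p) is 0 for p == 0.
    log2_ppm = numpy.log2(ppm_a, out=numpy.zeros_like(ppm_a), where=ppm_a > 0)
    u = -numpy.sum(ppm_a * log2_ppm, axis=0)   # uncertainty per position
    return ppm_a * (ic_t - u)

# Same PPM as in the session, written as fractions of the 6 samples.
ppm_a = numpy.array([
    [1, 4/6, 0,   5/6, 5/6, 4/6],   # A
    [0, 0,   2/6, 0,   0,   0],     # C
    [0, 0,   3/6, 0,   0,   0],     # G
    [0, 2/6, 1/6, 1/6, 1/6, 2/6],   # T
])
icm = information_content_matrix(ppm_a)
print(numpy.round(icm, 4))
```

The result should match the icm array printed in the session, up to rounding.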
As you can see, the first position has the highest total ICM value, 2 (the sum of its column), so it is the most important, or most conserved, position. The third position has the lowest total ICM value, about 0.54, so it is the least important, or least conserved, position.
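The per-position totals can be checked with a few lines of Python. Because each PPM column sums to 1, the column total of the ICM reduces to ICt - U[j]. The sketch below copies the u_a values from the session output; the names totals, most_conserved and least_conserved are hypothetical, and positions are zero-based as in the Biopython table.

```python
# Uncertainty per position, copied from the u_a output in the session.
u_a = [0.0, 0.91829583, 1.45914792, 0.65002242, 0.65002242, 0.91829583]
ic_t = 2.0  # log2(4) for the four-letter DNA alphabet

# Each PPM column sums to 1, so the ICM column total is ic_t - U[j].
totals = [ic_t - u for u in u_a]
for position, total in enumerate(totals):
    print(position, round(total, 4))

# Zero-based indices of the most and least conserved positions.
most_conserved = max(range(len(totals)), key=totals.__getitem__)
least_conserved = min(range(len(totals)), key=totals.__getitem__)
print(most_conserved, least_conserved)   # 0 2
```

Index 0 is the first position and index 2 is the third position.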
2023-05-31