Motif PSSM with Bio.motifs
How to Calculate Motif PSSM with Bio.motifs Module?
✍: FYIcenter.com
PSSM (Position-Specific Scoring Matrix), also referred to as PSWM (Position-Specific Weight Matrix) or LSM (Log-odds Scoring Matrix), compares the frequency of each letter at each position with a given background frequency. A positive score means the letter is more frequent at that position than in the background, and a negative score means it is less frequent. PSSM can be expressed as:
PSSM[i,j] = log2(PPM[i,j]/B[i])

where:
  PPM[i,j] is the Position Probability Matrix entry for letter i at position j.
  B[i] is the background frequency of letter i.
  log2() is the base-2 logarithm function.
The simplest background frequency model assumes that every letter is equally frequent in the entire population. So for DNA sequences, the simplest background frequency column is B = (Ba, Bc, Bg, Bt) = (0.25, 0.25, 0.25, 0.25).
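The formula and the uniform background above can be sketched in plain Python, without Biopython. This is only an illustration under stated assumptions: the helper names position_probability_matrix and log_odds are hypothetical, and zero probabilities are mapped to -inf by choice so that the result lines up with the Biopython output shown later in this tutorial.

```python
import math

samples = ["AAGAAT", "ATCATA", "AAGTAA", "AACAAA", "ATTAAA", "AAGAAT"]
alphabet = "ACGT"

# Simplest background model: every letter is equally frequent.
background = {letter: 0.25 for letter in alphabet}


def position_probability_matrix(seqs, letters):
    """Count each letter per position, then divide by the number of sequences."""
    length = len(seqs[0])
    counts = {letter: [0] * length for letter in letters}
    for seq in seqs:
        for j, letter in enumerate(seq):
            counts[letter][j] += 1
    return {letter: [c / len(seqs) for c in row] for letter, row in counts.items()}


def log_odds(ppm, bg):
    """Apply PSSM[i,j] = log2(PPM[i,j]/B[i]); a zero probability becomes -inf."""
    return {
        letter: [math.log2(p / bg[letter]) if p > 0 else -math.inf for p in row]
        for letter, row in ppm.items()
    }


ppm = position_probability_matrix(samples, alphabet)
pssm = log_odds(ppm, background)
print(round(pssm["A"][0], 2), round(pssm["A"][1], 2), pssm["C"][0])
# prints: 2.0 1.42 -inf
```

Swapping in a different background dictionary, for example one estimated from a whole genome, changes only the divisor B[i] for each letter.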
In Biopython, we can call the log_odds() method on the PPM to calculate the PSSM against the simplest background frequency model. Note that log_odds() uses B = (0.25, 0.25, 0.25, 0.25) by default.
fyicenter$ python
>>> from Bio import motifs
>>> samples = [
...     "AAGAAT",
...     "ATCATA",
...     "AAGTAA",
...     "AACAAA",
...     "ATTAAA",
...     "AAGAAT"
... ]
>>> m = motifs.create(samples)

>>> ppm = m.counts.normalize()
>>> print(ppm)
        0      1      2      3      4      5
A:   1.00   0.67   0.00   0.83   0.83   0.67
C:   0.00   0.00   0.33   0.00   0.00   0.00
G:   0.00   0.00   0.50   0.00   0.00   0.00
T:   0.00   0.33   0.17   0.17   0.17   0.33

>>> pssm = ppm.log_odds()
>>> print(pssm)
        0      1      2      3      4      5
A:   2.00   1.42   -inf   1.74   1.74   1.42
C:   -inf   -inf   0.42   -inf   -inf   -inf
G:   -inf   -inf   1.00   -inf   -inf   -inf
T:   -inf   0.42  -0.58  -0.58  -0.58   0.42
We can verify the calculation using the math.log(x,2) function for a couple of locations in the matrix.
>>> import math
>>> math.log(ppm["A",0]/0.25, 2)
2.0
>>> math.log(ppm["A",1]/0.25, 2)
1.4150374992788437
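The same check does not work for cells where the probability is 0, because math.log(0, 2) raises a ValueError instead of returning -inf. A minimal sketch of a helper that handles both cases is shown below; the name log_odds_cell is hypothetical and not part of Biopython.

```python
import math


def log_odds_cell(probability, background=0.25):
    """Return log2(probability / background), or -inf for a zero probability."""
    if probability == 0:
        # math.log(0, 2) would raise ValueError, so handle this case explicitly.
        return -math.inf
    return math.log(probability / background, 2)


print(log_odds_cell(1.0))  # same value as cell A,0 in the PSSM: 2.0
print(log_odds_cell(0.0))  # same value as cell C,0 in the PSSM: -inf
```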
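A -inf score appears wherever a letter was never observed at a position. To avoid -inf in the PSSM, we can add a set of pseudocounts to the counts when normalizing them into the PPM, so that no probability is exactly 0.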
>>> pseudocounts = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
>>> ppm = m.counts.normalize(pseudocounts)

>>> background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
>>> pssm = ppm.log_odds(background)
>>> print(pssm)
        0      1      2      3      4      5
A:   1.84   1.28  -2.81   1.58   1.58   1.28
C:  -2.81  -2.81   0.36  -2.81  -2.81  -2.81
G:  -2.81  -2.81   0.89  -2.81  -2.81  -2.81
T:  -2.81   0.36  -0.49  -0.49  -0.49   0.36
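The effect of the pseudocounts can be reproduced by hand. The sketch below assumes that each pseudocount is added to the raw count before dividing by the enlarged column total (6 sequences plus 4 x 0.25), which is consistent with the values printed above. It uses plain Python only, and the variable names are illustrative.

```python
import math

samples = ["AAGAAT", "ATCATA", "AAGTAA", "AACAAA", "ATTAAA", "AAGAAT"]
alphabet = "ACGT"
pseudocount = 0.25  # same pseudocount for every letter, as in the session above
background = 0.25   # uniform background frequency

length = len(samples[0])
# Each column total grows by one pseudocount per letter: 6 + 4 * 0.25 = 7.
column_total = len(samples) + pseudocount * len(alphabet)

pssm = {}
for letter in alphabet:
    row = []
    for j in range(length):
        count = sum(1 for seq in samples if seq[j] == letter)
        probability = (count + pseudocount) / column_total
        row.append(math.log2(probability / background))
    pssm[letter] = row

print(round(pssm["A"][0], 2), round(pssm["C"][0], 2))
# prints: 1.84 -2.81
```

Because every probability is now strictly positive, every score is finite. Larger pseudocounts pull all scores closer to 0.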
2023-07-01