Motif PSSM with Bio.motifs
How to Calculate Motif PSSM with Bio.motifs Module?
✍: FYIcenter.com
PSSM (Position-Specific Scoring Matrix), also referred to as PSWM (Position-Specific Weight Matrix) or LSM (Log-odds Scoring Matrix), stores, for each letter at each position, the log-odds of its observed frequency relative to a given background frequency. PSSM can be expressed as:
PSSM[i,j] = log2(PPM[i,j]/B[i])
where:
PPM[i,j] is the Position Probability Matrix entry for letter i at position j.
B[i] is the background frequency of letter i.
log2() is the base-2 logarithm.
The simplest background frequency model assumes that each letter appears equally often in the entire population. So for DNA sequences, the simplest background frequency column is B = (Ba, Bc, Bg, Bt) = (0.25, 0.25, 0.25, 0.25).
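To make the formula concrete, here is a minimal pure-Python sketch that applies it without Biopython. The helper names column_probabilities() and log_odds() are hypothetical and exist only for this illustration. The sketch assumes the uniform background described above and returns -inf wherever a letter never occurs at a position.

```python
import math

# Six short DNA samples, all of the same length.
samples = ["AAGAAT", "ATCATA", "AAGTAA", "AACAAA", "ATTAAA", "AAGAAT"]
letters = "ACGT"
background = {letter: 0.25 for letter in letters}  # uniform background B


def column_probabilities(sequences, position):
    """Return one PPM column: letter -> relative frequency at this position."""
    column = [seq[position] for seq in sequences]
    return {letter: column.count(letter) / len(column) for letter in letters}


def log_odds(probability, background_frequency):
    """Return log2(p / b), or -inf when the letter never occurs (p == 0)."""
    if probability == 0:
        return float("-inf")
    return math.log2(probability / background_frequency)


length = len(samples[0])

# ppm[j][letter] corresponds to PPM[i,j] in the formula (position first here).
ppm = [column_probabilities(samples, j) for j in range(length)]

# pssm[j][letter] corresponds to PSSM[i,j] = log2(PPM[i,j] / B[i]).
pssm = [
    {letter: log_odds(col[letter], background[letter]) for letter in letters}
    for col in ppm
]

# Print one row per letter, one column per position.
for letter in letters:
    row = " ".join(f"{pssm[j][letter]:6.2f}" for j in range(length))
    print(f"{letter}: {row}")
```

A letter that fills a whole column (frequency 1.0) scores log2(1.0/0.25) = 2.0, which is the highest score possible under the uniform background.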
In Biopython, we can call the log_odds() method on the PPM to calculate the PSSM. When called with no argument, log_odds() uses the simplest background frequency model, B = (0.25, 0.25, 0.25, 0.25).
fyicenter$ python
>>> from Bio import motifs
>>> samples = [
...     "AAGAAT",
...     "ATCATA",
...     "AAGTAA",
...     "AACAAA",
...     "ATTAAA",
...     "AAGAAT"
... ]
>>> m = motifs.create(samples)
>>> ppm = m.counts.normalize()
>>> print(ppm)
        0      1      2      3      4      5
A:   1.00   0.67   0.00   0.83   0.83   0.67
C:   0.00   0.00   0.33   0.00   0.00   0.00
G:   0.00   0.00   0.50   0.00   0.00   0.00
T:   0.00   0.33   0.17   0.17   0.17   0.33

>>> pssm = ppm.log_odds()
>>> print(pssm)
        0      1      2      3      4      5
A:   2.00   1.42   -inf   1.74   1.74   1.42
C:   -inf   -inf   0.42   -inf   -inf   -inf
G:   -inf   -inf   1.00   -inf   -inf   -inf
T:   -inf   0.42  -0.58  -0.58  -0.58   0.42
We can verify the calculation using the math.log(x,2) function for a couple of locations in the matrix.
>>> import math
>>> math.log(ppm["A",0]/0.25, 2)
2.0
>>> math.log(ppm["A",1]/0.25, 2)
1.4150374992788437
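The checks above all use the uniform background of 0.25. As a rough illustration of how the B[i] term changes the result, the sketch below rescores position 2 of the PPM against a hypothetical AT-rich background. The at_rich values and the score_column() helper are assumptions made up for this example and are not taken from any real genome.

```python
import math

# Position 2 of the PPM from the session above, written as exact fractions.
column_2 = {"A": 0.0, "C": 2 / 6, "G": 3 / 6, "T": 1 / 6}

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}  # hypothetical background


def score_column(column, background):
    """Apply log2(p / b) to each letter; a zero frequency maps to -inf."""
    return {
        letter: math.log2(p / background[letter]) if p > 0 else float("-inf")
        for letter, p in column.items()
    }


uniform_scores = score_column(column_2, uniform)
at_rich_scores = score_column(column_2, at_rich)

for letter in "ACGT":
    print(letter, round(uniform_scores[letter], 2), round(at_rich_scores[letter], 2))
```

Under the AT-rich background, G scores higher and T scores lower than under the uniform one, because a letter that is rarer in the background is more informative when it does appear in the motif.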
To avoid -inf in the PSSM, we can add a set of pseudocounts to the letter counts when normalizing them into the PPM, so that no letter ends up with a probability of exactly 0.
>>> pseudocounts = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
>>> ppm = m.counts.normalize(pseudocounts)
>>> background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
>>> pssm = ppm.log_odds(background)
>>> print(pssm)
        0      1      2      3      4      5
A:   1.84   1.28  -2.81   1.58   1.58   1.28
C:  -2.81  -2.81   0.36  -2.81  -2.81  -2.81
G:  -2.81  -2.81   0.89  -2.81  -2.81  -2.81
T:  -2.81   0.36  -0.49  -0.49  -0.49   0.36
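The smoothed values can be checked by hand in the same spirit as the earlier verification. The sketch below assumes that each pseudocount is added to the raw letter count before dividing by the enlarged column total, which is consistent with the numbers printed above. The smoothed_log_odds() helper is hypothetical and is not part of Biopython.

```python
import math

samples = ["AAGAAT", "ATCATA", "AAGTAA", "AACAAA", "ATTAAA", "AAGAAT"]
letters = "ACGT"
pseudocount = 0.25  # added to every letter's count at every position
background = 0.25   # uniform background frequency for every letter


def smoothed_log_odds(sequences, letter, position):
    """Return log2 of the pseudocount-smoothed frequency over the background."""
    column = [seq[position] for seq in sequences]
    # Column total grows by one pseudocount per letter: 6 + 4 * 0.25 = 7.
    total = len(column) + pseudocount * len(letters)
    probability = (column.count(letter) + pseudocount) / total
    return math.log2(probability / background)


# "A" occurs 6 times at position 0: log2((6.25 / 7) / 0.25)
print(round(smoothed_log_odds(samples, "A", 0), 2))  # 1.84

# "C" never occurs at position 0: log2((0.25 / 7) / 0.25) = log2(1 / 7)
print(round(smoothed_log_odds(samples, "C", 0), 2))  # -2.81
```

Both results match the corresponding cells in the PSSM printed above. With pseudocounts, an unseen letter gets a large negative but finite score instead of -inf.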
2023-07-01