Sequence Score against PSSM with Bio.motifs
How to Calculate Sequence Score against PSSM with Bio.motifs?
✍: FYIcenter.com
With the motif PSSM (Position-Specific Scoring Matrix) defined in the previous tutorial, we can define a matching score of any given sequence against the motif:
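$$\text{score}(S) = \sum_{j} \mathrm{PSSM}[S_j, j]$$
where S_j is the letter at position j of the given sequence S (which has the same length as the motif), and PSSM[i, j] is the motif's score for letter i at position j.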
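The formula is one table lookup per position, summed. The sketch below is a minimal version with no Biopython dependency. It stores the PSSM as a plain dict that maps each letter to its list of per-position scores, using the rounded values printed in step 2. The `pssm_score` helper is hypothetical and not part of Bio.motifs. Because its inputs are rounded, its results can differ from the exact scores in step 3 in the second decimal.

```python
# Rounded PSSM values copied from the step 2 printout:
# letter -> scores at motif positions 0..4
PSSM = {
    "A": [0.42, 1.49, -2.17, -0.05, -0.75],
    "C": [-2.17, -2.17, 1.58, 0.42, 1.83],
    "G": [-2.17, -2.17, -2.17, 0.92, -2.17],
    "T": [0.77, -2.17, -0.05, -2.17, -2.17],
}


def pssm_score(pssm, seq):
    """Hypothetical helper: sum of pssm[seq[j]][j] over all positions j."""
    length = len(next(iter(pssm.values())))
    if len(seq) != length:
        raise ValueError(f"expected a sequence of length {length}, got {len(seq)}")
    return sum(pssm[letter][j] for j, letter in enumerate(seq))


for s in ["TACAA", "CCGTG", "AAAAA"]:
    print(s, round(pssm_score(PSSM, s), 2))
```

In the Biopython session, the same pattern works on the PSSM object itself: `sum(pssm[letter, j] for j, letter in enumerate(seq))`. This avoids typing out the five terms by hand as in step 3.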
If we treat the matching score as a distribution over possible sequences, we can calculate its summary statistics: minimum, maximum, mean and standard deviation.
1. Create an example motif with 7 DNA sequences.
fyicenter$ python
>>> from Bio import motifs
>>> from Bio.Seq import Seq
>>> instances = [
...     Seq("TACAA"),
...     Seq("TACGC"),
...     Seq("TACAC"),
...     Seq("TACCC"),
...     Seq("AACCC"),
...     Seq("AATGC"),
...     Seq("AATGC"),
... ]
>>> m = motifs.create(instances)
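`motifs.create` tallies how often each letter occurs at each position of the aligned instances, and `m.counts` holds that table. The sketch below reproduces the tally in plain Python. It is an illustration, not Biopython's actual implementation, and `count_matrix` is a hypothetical name.

```python
INSTANCES = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]


def count_matrix(instances, alphabet="ACGT"):
    """Hypothetical helper: {letter: [count at position 0, 1, ...]}."""
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all instances must have the same length")
    counts = {letter: [0] * length for letter in alphabet}
    for seq in instances:
        for j, letter in enumerate(seq):
            counts[letter][j] += 1  # one more occurrence of letter at position j
    return counts


for letter, row in count_matrix(INSTANCES).items():
    print(f"{letter}: {row}")
```

The rows should match what `print(m.counts)` shows for the same instances. For example, column 0 has three A's and four T's.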
2. Calculate PWM and PSSM.
>>> pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
>>> pwm = m.counts.normalize(pseudocounts)
>>> print(pwm)
        0      1      2      3      4
A:   0.40   0.84   0.07   0.29   0.18
C:   0.04   0.04   0.60   0.27   0.71
G:   0.04   0.04   0.04   0.38   0.04
T:   0.51   0.07   0.29   0.07   0.07

>>> background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
>>> pssm = pwm.log_odds(background)
>>> print(pssm)
        0      1      2      3      4
A:   0.42   1.49  -2.17  -0.05  -0.75
C:  -2.17  -2.17   1.58   0.42   1.83
G:  -2.17  -2.17  -2.17   0.92  -2.17
T:   0.77  -2.17  -0.05  -2.17  -2.17
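This sketch assumes the usual definitions of the two steps:

- `normalize` adds each letter's pseudocount to its count and divides by the column total, which is 7 + 2.0 = 9 here.
- `log_odds` takes the base-2 logarithm of each PWM entry divided by that letter's background frequency.

The plain-Python version below applies both steps under that assumption. The helper names are hypothetical. Rounded to two decimals, its output should agree with the two printouts above.

```python
import math

INSTANCES = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
PSEUDOCOUNTS = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
BACKGROUND = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}


def position_weight_matrix(instances, pseudocounts):
    """Hypothetical helper: (count + pseudocount) / column total."""
    length = len(instances[0])
    total = len(instances) + sum(pseudocounts.values())
    return {
        letter: [
            (sum(seq[j] == letter for seq in instances) + pc) / total
            for j in range(length)
        ]
        for letter, pc in pseudocounts.items()
    }


def log_odds(pwm, background):
    """Hypothetical helper: log2(PWM entry / background frequency)."""
    return {
        letter: [math.log2(p / background[letter]) for p in row]
        for letter, row in pwm.items()
    }


pwm = position_weight_matrix(INSTANCES, PSEUDOCOUNTS)
pssm = log_odds(pwm, BACKGROUND)
for name, matrix in (("PWM", pwm), ("PSSM", pssm)):
    print(name)
    for letter, row in matrix.items():
        print(letter + ":", " ".join(f"{v:6.2f}" for v in row))
```

The pseudocounts are in the same 3:2:2:3 ratio as the background. As a result, every letter that never occurs in a column gets the same log-odds, log2(2/9) ≈ -2.17, which is why that value fills so many cells.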
3. Calculate matching scores of some given sequences.
>>> seq = "TACAA"
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
3.037341679708973

>>> seq = "CCGTG"
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
-10.849625007211563

>>> seq = "AAAAA"
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
-1.071182777069196
4. Calculate the minimum and maximum of the PSSM, defined as follows.
>>> # minimum = sum_over_j(min_over_i(PSSM[i,j]))
>>> print("%4.2f" % pssm.min)
-10.85
>>> # maximum = sum_over_j(max_over_i(PSSM[i,j]))
>>> print("%4.2f" % pssm.max)
6.59
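The score is a sum of independent per-position terms. Its smallest possible value therefore comes from picking the smallest entry in every column, and its largest from picking the largest entry, exactly as the two comments state. The minimal plain-Python sketch below assumes the rounded PSSM printed in step 2, so its totals agree with `pssm.min` and `pssm.max` only to two decimals. `extreme_score` is a hypothetical helper that also returns the letters it picked.

```python
# Rounded PSSM from the step 2 printout (an approximation of the exact matrix)
PSSM = {
    "A": [0.42, 1.49, -2.17, -0.05, -0.75],
    "C": [-2.17, -2.17, 1.58, 0.42, 1.83],
    "G": [-2.17, -2.17, -2.17, 0.92, -2.17],
    "T": [0.77, -2.17, -0.05, -2.17, -2.17],
}


def extreme_score(pssm, pick):
    """Hypothetical helper: apply pick (min or max) to every column.

    Returns (total score, letters chosen per position).
    """
    length = len(next(iter(pssm.values())))
    total, letters = 0.0, []
    for j in range(length):
        # pick(...) iterates the dict keys, comparing each letter's score at position j
        letter = pick(pssm, key=lambda a: pssm[a][j])
        letters.append(letter)
        total += pssm[letter][j]
    return total, "".join(letters)


print("minimum: %.2f via %s" % extreme_score(PSSM, min))
print("maximum: %.2f via %s" % extreme_score(PSSM, max))
```

With `max`, the helper picks T, A, C, G, C, which spells the sequence TACGC. With `min`, several letters tie at -2.17 in four of the five columns, so more than one sequence reaches the minimum.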
5. In this example, the minimum of the PSSM equals the matching score of the motif's anticonsensus sequence, and the maximum equals the matching score of its consensus sequence.
>>> seq = m.anticonsensus
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
-10.84962500721156

>>> seq = m.consensus
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
6.594289804260533
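The consensus takes the most frequent letter in each column of the counts, and the anticonsensus takes the least frequent. The PSSM extremes instead take the highest and lowest log-odds. The two coincide here. The plain-Python sketch below recomputes the exact PSSM from the instances and checks this. It assumes the normalization and log-odds definitions from step 2. Tie-breaking among equally rare letters follows alphabetical order here, which may not match Biopython's choice.

```python
import math

INSTANCES = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
PSEUDOCOUNTS = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
BACKGROUND = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
ALPHABET = "ACGT"
LENGTH = len(INSTANCES[0])
TOTAL = len(INSTANCES) + sum(PSEUDOCOUNTS.values())

# counts[letter][j]: occurrences of letter at position j
counts = {a: [sum(s[j] == a for s in INSTANCES) for j in range(LENGTH)] for a in ALPHABET}
# Exact PSSM: log2(((count + pseudocount) / total) / background)
pssm = {
    a: [math.log2((counts[a][j] + PSEUDOCOUNTS[a]) / TOTAL / BACKGROUND[a]) for j in range(LENGTH)]
    for a in ALPHABET
}


def score(seq):
    """Sum of the PSSM entries selected by seq, one per position."""
    return sum(pssm[a][j] for j, a in enumerate(seq))


# Most / least frequent letter per column (first in ALPHABET order on ties)
consensus = "".join(max(ALPHABET, key=lambda a: counts[a][j]) for j in range(LENGTH))
anticonsensus = "".join(min(ALPHABET, key=lambda a: counts[a][j]) for j in range(LENGTH))
# Column-wise extremes of the PSSM itself
best = sum(max(pssm[a][j] for a in ALPHABET) for j in range(LENGTH))
worst = sum(min(pssm[a][j] for a in ALPHABET) for j in range(LENGTH))

print(consensus, round(score(consensus), 4), round(best, 4))
print(anticonsensus, round(score(anticonsensus), 4), round(worst, 4))
```

Ties are harmless here because the pseudocounts are proportional to the background, so every zero-count letter gets the same log-odds, log2(2/9). In general, when the background is not uniform, the most frequent letter need not have the highest log-odds. The equality is therefore worth checking rather than assuming.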
6. Calculate the mean and standard deviation of the PSSM score.
>>> mean = pssm.mean(background)
>>> std = pssm.std(background)
>>> print("mean = %0.2f, standard deviation = %0.2f" % (mean, std))
mean = 3.21, standard deviation = 2.59
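My reading of the Biopython source, which is an assumption worth verifying against your installed version, is that `pssm.mean` and `pssm.std` describe the score of a sequence generated by the motif itself. At each position, a letter is drawn with its PWM probability, which Biopython recovers as background × 2**log-odds. Because positions are independent, both the mean and the variance are sums of per-position values. The plain-Python sketch below follows that interpretation and uses the PWM probabilities directly. If the assumption holds, it should reproduce mean ≈ 3.21 and standard deviation ≈ 2.59.

```python
import math

INSTANCES = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
PSEUDOCOUNTS = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
BACKGROUND = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
ALPHABET = "ACGT"
LENGTH = len(INSTANCES[0])
TOTAL = len(INSTANCES) + sum(PSEUDOCOUNTS.values())

# PWM: probability of each letter at each position (counts plus pseudocounts)
pwm = {
    a: [(sum(s[j] == a for s in INSTANCES) + PSEUDOCOUNTS[a]) / TOTAL for j in range(LENGTH)]
    for a in ALPHABET
}
# PSSM: log2 odds of the PWM against the background
pssm = {a: [math.log2(p / BACKGROUND[a]) for p in pwm[a]] for a in ALPHABET}

mean = 0.0
variance = 0.0
for j in range(LENGTH):
    # First and second moments of the position-j term, weighting
    # each letter by its PWM probability
    m1 = sum(pwm[a][j] * pssm[a][j] for a in ALPHABET)
    m2 = sum(pwm[a][j] * pssm[a][j] ** 2 for a in ALPHABET)
    mean += m1
    variance += m2 - m1 * m1  # positions are independent, so variances add
std = math.sqrt(variance)
print("mean = %0.2f, standard deviation = %0.2f" % (mean, std))
```

Computed this way, the mean is Σ_j Σ_i p_ij log2(p_ij / b_i). That is the relative entropy, in bits, of the PWM against the background, so it measures how far the motif's letter distribution departs from the background.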
2023-06-19