Sequence Score against PSSM with Bio.motifs
How to Calculate Sequence Score against PSSM with Bio.motifs?
✍: FYIcenter.com
With the motif PSSM (Position-Specific Scoring Matrix) defined in the previous
tutorial, we can define a matching score of any given sequence against
the motif:
score = sum_over_j(PSSM[S[j], j])
where:
S[j] is the letter at position j of the given sequence
PSSM[i, j] is the PSSM value of the motif for letter i at position j
If we treat the matching score as a distribution, we can calculate its summary statistics: minimum, maximum, mean and standard deviation.
1. Create an example motif with 7 DNA sequences.
fyicenter$ python
>>> from Bio import motifs
>>> from Bio.Seq import Seq
>>> instances = [
... "TACAA",
... "TACGC",
... "TACAC",
... "TACCC",
... "AACCC",
... "AATGC",
... "AATGC",
... ]
>>> m = motifs.create(instances)
2. Calculate PWM and PSSM.
>>> pseudocounts={"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
>>> pwm = m.counts.normalize(pseudocounts)
>>> print(pwm)
        0      1      2      3      4
A:   0.40   0.84   0.07   0.29   0.18
C:   0.04   0.04   0.60   0.27   0.71
G:   0.04   0.04   0.04   0.38   0.04
T:   0.51   0.07   0.29   0.07   0.07
>>> background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
>>> pssm = pwm.log_odds(background)
>>> print(pssm)
        0      1      2      3      4
A:   0.42   1.49  -2.17  -0.05  -0.75
C:  -2.17  -2.17   1.58   0.42   1.83
G:  -2.17  -2.17  -2.17   0.92  -2.17
T:   0.77  -2.17  -0.05  -2.17  -2.17
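The two tables above can be checked by hand. The sketch below recomputes them in plain Python, without Biopython. It assumes that normalize() adds the pseudocount to each letter count and divides by the column total, and that log_odds() takes the base-2 logarithm of the ratio to the background. These assumptions reproduce the printed values, and the variable names are illustrative only.

```python
from math import log2

instances = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
length = len(instances[0])

# Count how often each letter occurs at each position.
counts = {
    letter: [sum(seq[j] == letter for seq in instances) for j in range(length)]
    for letter in "ACGT"
}

# Every column has the same total: 7 sequences plus the pseudocounts.
total = len(instances) + sum(pseudocounts.values())

# PWM: pseudocount-adjusted frequency of each letter at each position.
pwm = {
    letter: [(counts[letter][j] + pseudocounts[letter]) / total for j in range(length)]
    for letter in "ACGT"
}

# PSSM: log2 of the PWM probability relative to the background probability.
pssm = {
    letter: [log2(pwm[letter][j] / background[letter]) for j in range(length)]
    for letter in "ACGT"
}

print("%.2f %.2f" % (pwm["A"][0], pssm["T"][0]))  # 0.40 0.77
```

For example, "A" occurs 3 times at position 0, so its PWM value is (3 + 0.6) / (7 + 2.0) = 0.40.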
3. Calculate matching scores of some given sequences.
>>> seq = "TACAA"
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
3.037341679708973
>>> seq = "CCGTG"
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
-10.849625007211563
>>> seq = "AAAAA"
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
-1.071182777069196
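The five-term sum above only works for motifs of length 5. A small helper can apply the same formula to a motif of any length. The sketch below is illustrative: matching_score() and toy_pssm are hypothetical names, and toy_pssm is a plain dictionary filled with the rounded values printed in step 2, standing in for the Biopython object so that the example runs on its own. Because the helper only uses the pssm[letter, position] lookup shown above, it should also accept the real pssm object.

```python
def matching_score(pssm, seq):
    """Sum pssm[letter, j] over every position j of seq."""
    return sum(pssm[letter, j] for j, letter in enumerate(seq))


# Rounded PSSM values copied from the table printed in step 2.
rows = {
    "A": [0.42, 1.49, -2.17, -0.05, -0.75],
    "C": [-2.17, -2.17, 1.58, 0.42, 1.83],
    "G": [-2.17, -2.17, -2.17, 0.92, -2.17],
    "T": [0.77, -2.17, -0.05, -2.17, -2.17],
}

# Stand-in for the Biopython PSSM: a dict keyed by (letter, position).
toy_pssm = {
    (letter, j): value
    for letter, row in rows.items()
    for j, value in enumerate(row)
}

print("%.2f" % matching_score(toy_pssm, "TACAA"))  # 3.04
```

The result differs slightly from the full-precision score because the table values are rounded to two decimals.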
4. Calculate the minimum and maximum scores of the PSSM, which are defined as follows.
# minimum = sum_over_j(min_over_i(PSSM[i,j]))
>>> print("%4.2f" % pssm.min)
-10.85
# maximum = sum_over_j(max_over_i(PSSM[i,j]))
>>> print("%4.2f" % pssm.max)
6.59
5. The minimum of the PSSM is the matching score of the anticonsensus sequence of the motif, and the maximum is the matching score of the consensus sequence.
>>> seq = m.anticonsensus
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
-10.84962500721156
>>> seq = m.consensus
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
6.594289804260533
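The two definitions can also be applied directly to the table, one column at a time. The sketch below is a plain-Python illustration that uses the rounded PSSM values printed in step 2. The names rows, columns and consensus are hypothetical and not part of Bio.motifs.

```python
# Rounded PSSM values copied from the table printed in step 2.
rows = {
    "A": [0.42, 1.49, -2.17, -0.05, -0.75],
    "C": [-2.17, -2.17, 1.58, 0.42, 1.83],
    "G": [-2.17, -2.17, -2.17, 0.92, -2.17],
    "T": [0.77, -2.17, -0.05, -2.17, -2.17],
}
letters = list(rows)  # ["A", "C", "G", "T"]

# Transpose the table: one tuple of four scores per position.
columns = list(zip(*rows.values()))

# Take the worst and the best score in each column and add them up.
minimum = sum(min(col) for col in columns)
maximum = sum(max(col) for col in columns)

# The letter with the best score in each column spells the consensus.
consensus = "".join(letters[col.index(max(col))] for col in columns)

print("%.2f %.2f %s" % (minimum, maximum, consensus))  # -10.85 6.59 TACGC
```

Several letters tie for the lowest score in most columns (for example "C" and "G" at position 0), so more than one sequence reaches the minimum. This is why "CCGTG" in step 3 has the same score as the anticonsensus sequence.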
6. Calculate mean and standard deviation of PSSM.
>>> mean = pssm.mean(background)
>>> std = pssm.std(background)
>>> print("mean = %0.2f, standard deviation = %0.2f" % (mean, std))
mean = 3.21, standard deviation = 2.59
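To see where these two numbers come from, the sketch below recomputes them in plain Python. It rests on assumptions rather than on a documented definition: each letter at each position is weighted by its motif probability p = background * 2**score (which is the PWM value), and positions are treated as independent, so the per-position means and variances add up. Under these assumptions the result should land close to the values printed above.

```python
from math import log2, sqrt

instances = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

# Column total: 7 sequences plus the pseudocounts.
total = len(instances) + sum(pseudocounts.values())

mean = 0.0
variance = 0.0
for j in range(len(instances[0])):
    sx = 0.0   # expected score at position j
    sxx = 0.0  # expected squared score at position j
    for letter in "ACGT":
        count = sum(seq[j] == letter for seq in instances)
        p = (count + pseudocounts[letter]) / total  # PWM probability
        logodds = log2(p / background[letter])      # PSSM score
        sx += p * logodds
        sxx += p * logodds ** 2
    mean += sx
    variance += sxx - sx * sx  # variance contributed by position j

std = sqrt(variance)
print("mean = %0.2f, standard deviation = %0.2f" % (mean, std))
```

Read this way, the mean is the score we expect from a sequence generated by the motif itself, not from a random background sequence.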
⇒ Search for Motif Matches with Bio.motifs
2023-06-19