Sequence Score against PSSM with Bio.motifs

Q

How to Calculate Sequence Score against PSSM with Bio.motifs?

✍: FYIcenter.com

A

With the motif PSSM (Position-Specific Scoring Matrix) defined in the previous tutorial, we can define a matching score of any given sequence against the motif:

score = sum_over_j(PSSM[S[j], j])

where:
  S[j] is the letter at position j of the given sequence S
  PSSM[i, j] is the PSSM value of letter i at position j of the motif
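
The formula above can be sketched in plain Python without Biopython. The two-position matrix toy_pssm and the helper match_score are hypothetical names, and the numbers are invented for illustration only:

```python
# Toy log-odds matrix with two positions; the values are invented for illustration.
toy_pssm = {
    "A": [1.0, -2.0],
    "C": [-1.0, 0.5],
    "G": [-2.0, 1.5],
    "T": [0.5, -1.0],
}


def match_score(matrix, sequence):
    """Sum matrix[letter][j] over every position j of the sequence."""
    return sum(matrix[letter][j] for j, letter in enumerate(sequence))


print(match_score(toy_pssm, "AG"))  # 1.0 + 1.5 = 2.5
print(match_score(toy_pssm, "CT"))  # -1.0 + -1.0 = -2.0
```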

If we treat the matching score as a distribution, we can calculate its summary statistics: minimum, maximum, mean and standard deviation.

1. Create an example motif with 7 DNA sequences.

fyicenter$ python
>>> from Bio import motifs
>>> from Bio.Seq import Seq
>>> instances = [
...   Seq("TACAA"),
...   Seq("TACGC"),
...   Seq("TACAC"),
...   Seq("TACCC"),
...   Seq("AACCC"),
...   Seq("AATGC"),
...   Seq("AATGC"),
... ]
>>> m = motifs.create(instances)

2. Calculate PWM and PSSM.

>>> pseudocounts={"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
>>> pwm = m.counts.normalize(pseudocounts)
>>> print(pwm)
        0      1      2      3      4
A:   0.40   0.84   0.07   0.29   0.18
C:   0.04   0.04   0.60   0.27   0.71
G:   0.04   0.04   0.04   0.38   0.04
T:   0.51   0.07   0.29   0.07   0.07

>>> background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
>>> pssm = pwm.log_odds(background)
>>> print(pssm)
        0      1      2      3      4
A:   0.42   1.49  -2.17  -0.05  -0.75
C:  -2.17  -2.17   1.58   0.42   1.83
G:  -2.17  -2.17  -2.17   0.92  -2.17
T:   0.77  -2.17  -0.05  -2.17  -2.17
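
To see where these two tables come from, the sketch below rebuilds them in plain Python. It assumes that the PWM is (count + pseudocount) divided by the column total, and that the PSSM is the base-2 logarithm of the PWM value over the background frequency. The variable names are illustrative and not part of Bio.motifs:

```python
import math

instances = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
letters = "ACGT"
length = len(instances[0])

# Count how often each letter occurs in each column.
counts = {letter: [0] * length for letter in letters}
for instance in instances:
    for j, letter in enumerate(instance):
        counts[letter][j] += 1

# PWM: add the pseudocount to each count, then divide by the column total.
pwm = {letter: [] for letter in letters}
for j in range(length):
    total = sum(counts[letter][j] + pseudocounts[letter] for letter in letters)
    for letter in letters:
        pwm[letter].append((counts[letter][j] + pseudocounts[letter]) / total)

# PSSM: base-2 log of the ratio between PWM probability and background.
pssm = {
    letter: [math.log2(p / background[letter]) for p in pwm[letter]]
    for letter in letters
}

for name, matrix in (("PWM", pwm), ("PSSM", pssm)):
    print(name)
    for letter in letters:
        print(letter + ":", " ".join("%6.2f" % value for value in matrix[letter]))
```

For example, "A" at position 0 occurs 3 times, so its PWM value is (3 + 0.6) / 9 = 0.40 and its PSSM value is log2(0.40 / 0.3) = 0.42.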

3. Calculate matching scores of some given sequences.

>>> seq = "TACAA"
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
3.037341679708973

>>> seq = "CCGTG"
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
-10.849625007211563

>>> seq = "AAAAA"
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
-1.071182777069196
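
The five-term sum repeated above can be wrapped in a loop so that it works for a motif of any length. This is a minimal sketch in plain Python. build_pssm and match_score are hypothetical helpers, and the count table is the per-column letter count of the 7 example sequences:

```python
import math

# Letter counts per column for the 7 example sequences.
counts = {
    "A": [3, 7, 0, 2, 1],
    "C": [0, 0, 5, 2, 6],
    "G": [0, 0, 0, 3, 0],
    "T": [4, 0, 2, 0, 0],
}
pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}


def build_pssm(counts, pseudocounts, background):
    """Return log2((count + pseudocount) / column total / background)."""
    length = len(counts["A"])
    totals = [
        sum(counts[b][j] + pseudocounts[b] for b in counts) for j in range(length)
    ]
    return {
        b: [
            math.log2((counts[b][j] + pseudocounts[b]) / totals[j] / background[b])
            for j in range(length)
        ]
        for b in counts
    }


def match_score(pssm, sequence):
    """Sum the PSSM entry of each letter at its own position."""
    if len(sequence) != len(pssm["A"]):
        raise ValueError("sequence and motif must have the same length")
    return sum(pssm[letter][j] for j, letter in enumerate(sequence))


pssm = build_pssm(counts, pseudocounts, background)
for seq in ("TACAA", "CCGTG", "AAAAA"):
    print(seq, "%.4f" % match_score(pssm, seq))
```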

4. Calculate the minimum and maximum of the PSSM, which are defined as follows.

# minimum = sum_over_j(min_over_i(PSSM[i,j]))
>>> print("%4.2f" % pssm.min)
-10.85

# maximum = sum_over_j(max_over_i(PSSM[i,j]))
>>> print("%4.2f" % pssm.max)
6.59
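
The two definitions can be checked by hand with a short plain-Python sketch. It rebuilds the PSSM from the letter counts of the example motif and does not use Biopython. The names are illustrative only:

```python
import math

counts = {
    "A": [3, 7, 0, 2, 1],
    "C": [0, 0, 5, 2, 6],
    "G": [0, 0, 0, 3, 0],
    "T": [4, 0, 2, 0, 0],
}
pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

# Every column holds 7 counts plus 2.0 pseudocounts in total, hence the 9.0.
pssm = {
    b: [math.log2((c + pseudocounts[b]) / 9.0 / background[b]) for c in counts[b]]
    for b in counts
}

# Regroup the matrix by position: one tuple of four scores per column.
columns = list(zip(*pssm.values()))

minimum = sum(min(column) for column in columns)
maximum = sum(max(column) for column in columns)
print("%4.2f" % minimum)  # -10.85
print("%4.2f" % maximum)  # 6.59
```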

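5. In this example, the minimum of the PSSM is the matching score of the anticonsensus sequence of the motif, and the maximum of the PSSM is the matching score of the consensus sequence of the motif. Both sequences are derived from the letter counts, so this equality is not guaranteed if the pseudocounts or the background change the ranking of letters within a column.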

>>> seq = m.anticonsensus
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
-10.84962500721156

>>> seq = m.consensus
>>> score = pssm[seq[0],0]+pssm[seq[1],1]+pssm[seq[2],2]+pssm[seq[3],3]+pssm[seq[4],4]
>>> score
6.594289804260533
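
As a cross-check, the best-scoring and worst-scoring letters can be read directly from the PSSM columns. The plain-Python sketch below uses illustrative names only. It shows that several letters tie for the lowest score in most columns, so more than one sequence reaches the minimum:

```python
import math

counts = {
    "A": [3, 7, 0, 2, 1],
    "C": [0, 0, 5, 2, 6],
    "G": [0, 0, 0, 3, 0],
    "T": [4, 0, 2, 0, 0],
}
pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

# Every column holds 7 counts plus 2.0 pseudocounts in total, hence the 9.0.
pssm = {
    b: [math.log2((c + pseudocounts[b]) / 9.0 / background[b]) for c in counts[b]]
    for b in counts
}

best = ""    # highest-scoring letter per column
worst = []   # all letters that tie for the lowest score per column
for j in range(5):
    column = {b: pssm[b][j] for b in "ACGT"}
    best += max(column, key=column.get)
    low = min(column.values())
    worst.append("".join(b for b in "ACGT" if math.isclose(column[b], low)))

print(best)   # TACGC
print(worst)  # ['CG', 'CGT', 'AG', 'T', 'GT']
```

Picking any one letter from each tied group gives the same minimum score. The sequence "CCGTG" from step 3 is one such choice, which is why its score is also -10.85.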

6. Calculate the mean and standard deviation of the PSSM.

>>> mean = pssm.mean(background)
>>> std = pssm.std(background)
>>> print("mean = %0.2f, standard deviation = %0.2f" % (mean, std))
mean = 3.21, standard deviation = 2.59
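
The printed values can be reproduced in plain Python by treating each column as an independent random letter drawn from the motif's own probabilities, which are recovered as background * 2**score. That this is how pssm.mean() and pssm.std() work internally is an assumption inferred from the matching numbers, not something the output above states. The names are illustrative only:

```python
import math

counts = {
    "A": [3, 7, 0, 2, 1],
    "C": [0, 0, 5, 2, 6],
    "G": [0, 0, 0, 3, 0],
    "T": [4, 0, 2, 0, 0],
}
pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

# Every column holds 7 counts plus 2.0 pseudocounts in total, hence the 9.0.
pssm = {
    b: [math.log2((c + pseudocounts[b]) / 9.0 / background[b]) for c in counts[b]]
    for b in counts
}

mean = 0.0
variance = 0.0
for j in range(5):
    sx = 0.0   # expected score of column j
    sxx = 0.0  # expected squared score of column j
    for b in "ACGT":
        score = pssm[b][j]
        p = background[b] * 2 ** score  # motif probability of letter b in column j
        sx += p * score
        sxx += p * score * score
    mean += sx                 # means of independent columns add up
    variance += sxx - sx * sx  # so do their variances
std = math.sqrt(variance)

print("mean = %0.2f, standard deviation = %0.2f" % (mean, std))
```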

 


2023-06-19