Search for Motif Matches with Bio.motifs

Q

How to search a target sequence for matches against a motif with Bio.motifs?

✍: FYIcenter.com

A

The Bio.motifs module offers two ways to search a target sequence for segments that match a motif.

1. Use the motif instances to search for exact matches.

fyicenter$ python
>>> from Bio import motifs
>>> from Bio.Seq import Seq
>>> instances = [
...   "TACAA",
...   "TACGC",
...   "TACAC",
...   "TACCC",
...   "AACCC",
...   "AATGC",
...   "AATGC",
... ]
>>> m = motifs.create(instances)

>>> test_seq = Seq("TACACTGCATTACAACCCAAGCATTA")
>>> for pos, seq in m.instances.search(test_seq):
...   print("%i %s" % (pos, seq))
... 
0 TACAC
10 TACAA
13 AACCC

As you can see, three segments of the target sequence exactly match one of the motif instances.
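
To make clear what an exact-instance search does, here is a minimal pure-Python sketch that does not use Biopython. It slides a window of the motif width over the target and reports every window that equals one of the instances. The function name find_exact_matches is hypothetical, and this is only an illustration of the idea, not how Bio.motifs is implemented internally.

```python
def find_exact_matches(instances, target):
    """Yield (position, segment) for each window of target equal to an instance.

    Hypothetical helper for illustration; assumes all instances share one width.
    """
    width = len(instances[0])
    known = set(instances)  # duplicates such as "AATGC" collapse here
    for pos in range(len(target) - width + 1):
        segment = target[pos:pos + width]
        if segment in known:
            yield pos, segment


instances = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
test_seq = "TACACTGCATTACAACCCAAGCATTA"

matches = list(find_exact_matches(instances, test_seq))
for pos, segment in matches:
    print("%i %s" % (pos, segment))
```

This sketch only scans the forward strand, so it reports the same three positions as the session above.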

2. Use the motif PSSM to search for approximate matches.

>>> pseudocounts={"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
>>> pwm = m.counts.normalize(pseudocounts)

>>> background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
>>> pssm = pwm.log_odds(background)
>>> print(pssm)
        0      1      2      3      4
A:   0.42   1.49  -2.17  -0.05  -0.75
C:  -2.17  -2.17   1.58   0.42   1.83
G:  -2.17  -2.17  -2.17   0.92  -2.17
T:   0.77  -2.17  -0.05  -2.17  -2.17
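
The values in this matrix can be checked by hand. The sketch below is a pure-Python approximation that assumes each entry is log2 of ((count + pseudocount) / (number of instances + sum of pseudocounts)) divided by the background frequency, which agrees with the numbers printed above. The function name build_pssm is hypothetical and the code is not Biopython's own implementation.

```python
import math

instances = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}


def build_pssm(instances, pseudocounts, background):
    """Return {letter: [log-odds score per column]} (hypothetical helper)."""
    width = len(instances[0])
    pssm = {letter: [] for letter in "ACGT"}
    for col in range(width):
        column = [seq[col] for seq in instances]
        # Column total after pseudocounts are added: 7 + 2.0 = 9.0 here
        total = len(column) + sum(pseudocounts.values())
        for letter in "ACGT":
            freq = (column.count(letter) + pseudocounts[letter]) / total
            pssm[letter].append(math.log2(freq / background[letter]))
    return pssm


pssm_by_hand = build_pssm(instances, pseudocounts, background)
for letter in "ACGT":
    print(letter + ":", " ".join("%6.2f" % v for v in pssm_by_hand[letter]))
```

For example, "C" appears 5 times in column 2, so its frequency is (5 + 0.4) / 9 = 0.6, and log2(0.6 / 0.2) is about 1.58.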

>>> for position, score in pssm.search(test_seq, threshold=3.0):
...   print("Position %d: score = %5.3f" % (position, score))
... 
Position 0: score = 5.622
Position -20: score = 4.601
Position 10: score = 3.037
Position 13: score = 5.738
Position -6: score = 4.601

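Note that negative positions refer to matches of the motif found on the reverse strand of the test sequence. They are counted backward from the end of the test sequence, so adding the sequence length (26 here) gives the forward-strand start index: -20 corresponds to index 6 and -6 to index 20.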
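
The following pure-Python sketch shows one way to turn the reported positions into forward-strand coordinates. It assumes that a reverse-strand hit is reported as its forward start index minus the sequence length, which is consistent with the output above. The helper names forward_start and reverse_complement are hypothetical.

```python
def reverse_complement(seq):
    """Reverse-complement a plain DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def forward_start(position, seq_len):
    """Map a reported position to a 0-based forward-strand start index."""
    return position if position >= 0 else position + seq_len


test_seq = "TACACTGCATTACAACCCAAGCATTA"
width = 5  # motif length

for position in (0, -20, 10, 13, -6):
    start = forward_start(position, len(test_seq))
    window = test_seq[start:start + width]
    if position >= 0:
        strand, as_motif = "+", window
    else:
        # On the reverse strand the motif reads as the reverse complement
        strand, as_motif = "-", reverse_complement(window)
    print("%4d -> start %2d, strand %s, reads %s" % (position, start, strand, as_motif))
```

Both reverse-strand hits cover the forward segment "GCATT", whose reverse complement "AATGC" is one of the motif instances.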

3. Calculate the matching score at every position. The output contains only forward-strand scores, one for each of the 22 possible start positions.

>>> scores = pssm.calculate(test_seq)
>>> print(scores)
[  5.622304    -5.6797      -3.4317725    0.93827754  -6.849625
  -2.0406609  -10.849625    -3.6561453   -0.03370807  -3.9110255
   3.0373416   -2.1491852   -0.6016975    5.7381525   -0.509775
  -3.5642228   -8.734148    -0.09919716  -0.6016975   -2.3942978
 -10.849625    -3.6561453 ]

As you can see, the match at position 13, "AACCC", has the highest score of 5.74, followed by the match at the first position, "TACAC", with a score of 5.62.
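
Picking out the interesting positions from such a score list is straightforward. The sketch below copies the scores printed above into a plain Python list (rather than calling Biopython) and finds the best position as well as all positions at or above a threshold of 3.0. The variable names are illustrative only.

```python
# Scores copied from the output above; index = start position on the forward strand
scores = [
    5.622304, -5.6797, -3.4317725, 0.93827754, -6.849625,
    -2.0406609, -10.849625, -3.6561453, -0.03370807, -3.9110255,
    3.0373416, -2.1491852, -0.6016975, 5.7381525, -0.509775,
    -3.5642228, -8.734148, -0.09919716, -0.6016975, -2.3942978,
    -10.849625, -3.6561453,
]

# Position with the highest score
best_pos = max(range(len(scores)), key=scores.__getitem__)
print("Best position %d: score = %5.3f" % (best_pos, scores[best_pos]))

# All forward-strand positions at or above the threshold
threshold = 3.0
hits = [(pos, score) for pos, score in enumerate(scores) if score >= threshold]
for pos, score in hits:
    print("Position %d: score = %5.3f" % (pos, score))
```

The positions that pass the threshold are the same forward-strand positions that pssm.search() reported with threshold=3.0.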

 


2023-06-19