Motif Counts and Consensus with Bio.motifs

Q

How to Get Motif Counts and Consensus with the Bio.motifs Module?

✍: FYIcenter.com

A

Motif counts record how often each letter appears at each position across a set of motif samples. The counts matrix is also called a PFM (Position Frequency Matrix).
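
The definition above can be sketched in plain Python. This is only an illustration of what a counts matrix holds, not how Biopython builds it. The helper name count_matrix and the three two-letter samples are made up for this example.

```python
# Pure-Python sketch of a position frequency matrix (PFM).
# count_matrix is a hypothetical helper, not a Biopython function.

def count_matrix(samples, alphabet="ACGT"):
    """Return {letter: [count at position 0, count at position 1, ...]}."""
    length = len(samples[0])
    if any(len(sample) != length for sample in samples):
        raise ValueError("all samples must have the same length")
    counts = {letter: [0] * length for letter in alphabet}
    for sample in samples:
        for position, letter in enumerate(sample):
            counts[letter][position] += 1
    return counts

# Three made-up samples of length 2.
pfm = count_matrix(["AC", "AT", "GC"])
for letter in "ACGT":
    print(letter, pfm[letter])
# A [2, 0]
# C [0, 2]
# G [1, 0]
# T [0, 1]
```

Each row belongs to one letter and each column to one position, which is the same layout that print(m.counts) uses.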

Motif consensus is the sequence formed by taking, at each position of the motif, the letter with the largest value in the corresponding column of the motif counts. In other words, the motif consensus is the most probable sequence under the per-position frequencies of the given sample set, and so the best guess at the sequence most likely to appear in the entire population.

Motif anticonsensus is the sequence formed by taking, at each position of the motif, the letter with the smallest value in the corresponding column of the motif counts. In other words, the motif anticonsensus is the least probable sequence under the per-position frequencies of the given sample set, and so the best guess at the sequence least likely to appear in the entire population.
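
As a rough illustration of both definitions, the sketch below picks the largest and the smallest entry in every column of a counts matrix. The counts are copied from the table printed in step 2 of this tutorial. The helper pick_per_column is hypothetical, and relying on Python's max() and min() returning the first of several equal values is an assumption that happens to reproduce the Biopython output shown in this tutorial.

```python
# Counts copied from the step 2 table of this tutorial (7 samples).
counts = {
    "A": [3, 7, 0, 2, 1],
    "C": [0, 0, 5, 2, 6],
    "G": [0, 0, 0, 3, 0],
    "T": [4, 0, 2, 0, 0],
}

def pick_per_column(counts, chooser, alphabet="ACGT"):
    """Apply chooser (max or min) to every column and join the chosen letters.

    Hypothetical helper for illustration only, not part of Biopython.
    """
    length = len(counts[alphabet[0]])
    return "".join(
        chooser(alphabet, key=lambda letter: counts[letter][position])
        for position in range(length)
    )

consensus = pick_per_column(counts, max)      # largest count per column
anticonsensus = pick_per_column(counts, min)  # smallest count per column
print(consensus)      # TACGC
print(anticonsensus)  # CCATG
```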

1. Create a motif object from 7 sequences that match the motif pattern "[AT]A[CT][ACG][AC]".

fyicenter$ python
>>> from Bio import motifs
>>> samples = [
...     "TACAA",
...     "TACGC",
...     "TACAC",
...     "TACCC",
...     "AACCC",
...     "AATGC",
...     "AATGC"
... ]

>>> m = motifs.create(samples)

2. View motif counts.

>>> print(m.counts)
        0      1      2      3      4
A:   3.00   7.00   0.00   2.00   1.00
C:   0.00   0.00   5.00   2.00   6.00
G:   0.00   0.00   0.00   3.00   0.00
T:   4.00   0.00   2.00   0.00   0.00

3. View motif consensus.

>>> print(m.consensus)
TACGC

4. View motif anticonsensus.

>>> print(m.anticonsensus)
CCATG
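
Several letters have a count of 0 in most columns, so "CCATG" is only one of many sequences that reach the minimum at every position. The sketch below lists the equally minimal letters per position, using the counts from step 2. The helper minimal_letters is hypothetical and is not a Biopython function.

```python
import math

# Counts copied from the step 2 table of this tutorial (7 samples).
counts = {
    "A": [3, 7, 0, 2, 1],
    "C": [0, 0, 5, 2, 6],
    "G": [0, 0, 0, 3, 0],
    "T": [4, 0, 2, 0, 0],
}

def minimal_letters(counts, alphabet="ACGT"):
    """For each position, list every letter that shares the smallest count."""
    length = len(counts[alphabet[0]])
    result = []
    for position in range(length):
        column = {letter: counts[letter][position] for letter in alphabet}
        lowest = min(column.values())
        result.append([letter for letter in alphabet if column[letter] == lowest])
    return result

candidates = minimal_letters(counts)
print(candidates)
# [['C', 'G'], ['C', 'G', 'T'], ['A', 'G'], ['T'], ['G', 'T']]

# Number of distinct sequences that are minimal at every position.
total = math.prod(len(letters) for letters in candidates)
print(total)  # 24
```

The reported anticonsensus "CCATG" is the first candidate at each position when letters are listed in "ACGT" order.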

5. If several letters at a position share the highest count, Biopython selects only one of them for the consensus.

>>> samples = [
...     "TACAA",
...     "TACGC",
...     "TACAC",
...     "TACCC",
...     "AACCC",
...     "AATGC",
...     "AATGC",
...     "AACGC"
... ]

>>> m = motifs.create(samples)
>>> print(m.consensus)
AACGC

In this sample set, A and T both occur 4 times at position 0 (the first column), which is the highest count there. Biopython selects A.
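
One rule that would explain this choice is "scan the letters in alphabet order and keep the first one that reaches the maximum". The sketch below implements that rule in plain Python. It is an assumption that matches the output shown in this tutorial, not a copy of Biopython's code, and first_max_consensus is a hypothetical name.

```python
# The 8 samples from step 5 of this tutorial.
samples = [
    "TACAA", "TACGC", "TACAC", "TACCC",
    "AACCC", "AATGC", "AATGC", "AACGC",
]

def first_max_consensus(samples, alphabet="ACGT"):
    """Pick, per position, the first letter in `alphabet` with the top count."""
    length = len(samples[0])
    result = []
    for position in range(length):
        column = [sample[position] for sample in samples]
        best_letter, best_count = None, -1
        for letter in alphabet:
            count = column.count(letter)
            if count > best_count:  # strict comparison: the earlier letter wins a tie
                best_letter, best_count = letter, count
        result.append(best_letter)
    return "".join(result)

print(first_max_consensus(samples))          # AACGC
print(first_max_consensus(samples, "TGCA"))  # TACGC
```

Scanning in "ACGT" order gives A at the tied position, while scanning in the reverse order gives T. The result at a tied position therefore depends on letter order rather than on the data.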

 


2023-07-05