Compare Motifs Using PSSM with Bio.motifs
How to Compare Motifs Using PSSM with Bio.motifs?
✍: FYIcenter.com
If you know the PSSMs of two motifs, you can compare them
using the PSSM's dist_pearson() method. It returns a tuple of two
values: the distance between the two motifs at their best alignment,
and the position offset of that alignment.
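To make the idea concrete, here is a minimal pure-Python sketch of a Pearson-based motif distance. It assumes each PSSM is a plain dict mapping "A", "C", "G" and "T" to lists of log-odds scores. It slides the second matrix along the first, scores positions that a motif does not cover as 0.0 (a log-odds score of 0 means the same probability as the background), and keeps the offset with the highest correlation r. The helpers pearson(), column() and pearson_distance() are hypothetical names for illustration. This is a simplified version of the idea, not Biopython's source code, so its numbers may differ from dist_pearson().

```python
import math

LETTERS = "ACGT"


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists (None if undefined)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # a constant list has no correlation
    return cov / math.sqrt(vx * vy)


def column(pssm, pos):
    """Scores at one position; positions outside the motif score 0.0 (background)."""
    length = len(pssm["A"])
    if 0 <= pos < length:
        return [pssm[letter][pos] for letter in LETTERS]
    return [0.0] * len(LETTERS)


def pearson_distance(a, b):
    """Return (1 - best r, offset), where offset is the position in a where b starts."""
    len_a, len_b = len(a["A"]), len(b["A"])
    best_r, best_offset = None, None
    for offset in range(-(len_b - 1), len_a):
        # Cover every position touched by either motif at this offset.
        start, end = min(0, offset), max(len_a, offset + len_b)
        xs, ys = [], []
        for pos in range(start, end):
            xs.extend(column(a, pos))
            ys.extend(column(b, pos - offset))
        r = pearson(xs, ys)
        if r is not None and (best_r is None or r > best_r):
            best_r, best_offset = r, offset
    if best_r is None:
        raise ValueError("correlation is undefined at every offset")
    return 1.0 - best_r, best_offset


# Toy check: b is the middle of a, so the offset should be 1 and the distance close to 0.
a = {"A": [0.0, 1.0, -1.0, 0.0], "C": [0.0, -1.0, 2.0, 0.0],
     "G": [0.0, 0.5, -0.5, 0.0], "T": [0.0, -0.5, -0.5, 0.0]}
b = {"A": [1.0, -1.0], "C": [-1.0, 2.0], "G": [0.5, -0.5], "T": [-0.5, -0.5]}
distance, offset = pearson_distance(a, b)
print(distance, offset)
```

Padding with 0.0 is what makes motifs of different lengths comparable: every offset is scored over the same kind of vector, covering all letters at all positions touched by either motif.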
1. Create a shorter motif from a given PCM without actual instances.
fyicenter$ python
>>> from io import StringIO
>>> from Bio import motifs
>>> pcm_s = """>Shorter
... A [ 3.00 7.00 0.00 2.00 1.00 ]
... C [ 0.00 0.00 5.00 2.00 6.00 ]
... G [ 0.00 0.00 0.00 3.00 0.00 ]
... T [ 4.00 0.00 2.00 0.00 0.00 ]
... """
>>> handle = StringIO(pcm_s)
>>> m_s = motifs.read(handle, "jaspar")
2. Create a longer motif from a given PCM without actual instances.
>>> pcm_l = """>Longer
... A [ 30.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00 15.00 ]
... C [ 10.00 0.00 0.00 0.00 100.00 100.00 100.00 0.00 15.00 ]
... G [ 50.00 0.00 0.00 0.00 0.00 0.00 0.00 60.00 55.00 ]
... T [ 10.00 100.00 100.00 0.00 0.00 0.00 0.00 40.00 15.00 ]
... """
>>> handle = StringIO(pcm_l)
>>> m_l = motifs.read(handle, "jaspar")
3. Calculate their PSSMs with the same pseudocounts and background.
>>> pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
>>> background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
>>> pwm_s = m_s.counts.normalize(pseudocounts)
>>> pssm_s = pwm_s.log_odds(background)
>>> pwm_l = m_l.counts.normalize(pseudocounts)
>>> pssm_l = pwm_l.log_odds(background)
4. Compare them by calling the dist_pearson() function.
>>> distance, offset = pssm_l.dist_pearson(pssm_s)
>>> distance
0.23924403149343054
>>> offset
2
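Steps 1 to 3 can also be sketched in plain Python, which makes the arithmetic behind counts.normalize() and log_odds() visible. This sketch assumes that normalization adds each letter's pseudocount to its count and divides by the new column total, and that the log-odds scores use base-2 logarithms. The helpers parse_pcm(), consensus(), normalize() and log_odds() are hypothetical stand-ins written for illustration, not Biopython's API.

```python
import math

LETTERS = "ACGT"


def parse_pcm(text):
    """Parse a JASPAR-style count matrix: a '>name' line, then 'A [ ... ]' rows."""
    lines = [line.strip() for line in text.strip().splitlines()]
    name = lines[0].lstrip(">")
    counts = {}
    for line in lines[1:]:
        letter, values = line.split(None, 1)
        counts[letter] = [float(v) for v in values.strip("[] ").split()]
    return name, counts


def consensus(counts):
    """Most frequent letter at each position (ties go to the earlier letter in ACGT)."""
    length = len(counts["A"])
    return "".join(
        max(LETTERS, key=lambda letter: counts[letter][i]) for i in range(length)
    )


def normalize(counts, pseudocounts):
    """Add per-letter pseudocounts, then divide by the column total."""
    length = len(counts["A"])
    pwm = {letter: [] for letter in LETTERS}
    for i in range(length):
        total = sum(counts[letter][i] + pseudocounts[letter] for letter in LETTERS)
        for letter in LETTERS:
            pwm[letter].append((counts[letter][i] + pseudocounts[letter]) / total)
    return pwm


def log_odds(pwm, background):
    """Base-2 log-odds score log2(p / b) for every letter and position."""
    return {
        letter: [math.log2(p / background[letter]) for p in pwm[letter]]
        for letter in LETTERS
    }


pcm_s = """>Shorter
A [ 3.00 7.00 0.00 2.00 1.00 ]
C [ 0.00 0.00 5.00 2.00 6.00 ]
G [ 0.00 0.00 0.00 3.00 0.00 ]
T [ 4.00 0.00 2.00 0.00 0.00 ]
"""
pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

name, counts_s = parse_pcm(pcm_s)
pssm_s = log_odds(normalize(counts_s, pseudocounts), background)
print(name, consensus(counts_s))  # Shorter TACGC
print(round(pssm_s["A"][0], 3))   # 0.415, i.e. log2(0.4 / 0.3)
```

For example, the first column of the shorter motif has 7 counts plus 2.0 in pseudocounts, so A gets (3 + 0.6) / 9 = 0.4 and a score of log2(0.4 / 0.3), about 0.415. A positive score means the letter is more likely in the motif than in the background.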
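The distance is 1.0 − r, where r is the Pearson correlation coefficient (PCC) between the scores of the two PSSMs at their best alignment, so here r ≈ 0.761. The offset 2 means the shorter motif is shifted two positions to the right relative to the longer one, and the positions it does not cover are padded with the background distribution. The consensus sequences show this alignment: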
>>> m_s.consensus
Seq('TACGC')
>>> m_l.consensus
Seq('GTTACCCGG')
# alignment using b as background distribution
m_s: bbTACGCbb
m_l: GTTACCCGG
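The alignment above can be rebuilt from the two consensus strings and the offset returned by dist_pearson(). This is a minimal sketch, assuming the offset counts how many positions the shorter motif is shifted to the right relative to the longer one (a negative offset pads the longer row instead). show_alignment() is a hypothetical helper, not part of Bio.motifs.

```python
def show_alignment(long_seq, short_seq, offset, pad="b"):
    """Return (short_row, long_row) with short_seq starting at `offset` in long_seq.

    Positions not covered by a sequence are filled with `pad` (background).
    """
    start = min(0, offset)
    end = max(len(long_seq), offset + len(short_seq))
    long_row = "".join(
        long_seq[i] if 0 <= i < len(long_seq) else pad for i in range(start, end)
    )
    short_row = "".join(
        short_seq[i - offset] if 0 <= i - offset < len(short_seq) else pad
        for i in range(start, end)
    )
    return short_row, long_row


m_s_row, m_l_row = show_alignment("GTTACCCGG", "TACGC", 2)
print("m_s:", m_s_row)  # m_s: bbTACGCbb
print("m_l:", m_l_row)  # m_l: GTTACCCGG
```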
2023-06-19