Compare Motifs Using PSSM with Bio.motifs
How to Compare Motifs Using PSSM with Bio.motifs?
✍: FYIcenter.com
If you have the PSSMs (position-specific scoring matrices) of two motifs,
you can compare them with the dist_pearson() method of the PSSM object.
It returns the distance between the two motifs and the position
offset of their best alignment.
1. Create a shorter motif from a given PCM without actual instances.
fyicenter$ python
>>> from io import StringIO
>>> from Bio import motifs
>>> pcm_s = """>Shorter
... A [ 3.00 7.00 0.00 2.00 1.00 ]
... C [ 0.00 0.00 5.00 2.00 6.00 ]
... G [ 0.00 0.00 0.00 3.00 0.00 ]
... T [ 4.00 0.00 2.00 0.00 0.00 ]
... """
>>> handle = StringIO(pcm_s)
>>> m_s = motifs.read(handle, "jaspar")
2. Create a longer motif from a given PCM without actual instances.
>>> pcm_l = """>Longer
... A [ 30.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00 15.00 ]
... C [ 10.00 0.00 0.00 0.00 100.00 100.00 100.00 0.00 15.00 ]
... G [ 50.00 0.00 0.00 0.00 0.00 0.00 0.00 60.00 55.00 ]
... T [ 10.00 100.00 100.00 0.00 0.00 0.00 0.00 40.00 15.00 ]
... """
>>> handle = StringIO(pcm_l)
>>> m_l = motifs.read(handle, "jaspar")
3. Calculate their PSSMs with the same pseudocounts and background.
>>> pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
>>> background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
>>> pwm_s = m_s.counts.normalize(pseudocounts)
>>> pssm_s = pwm_s.log_odds(background)
>>> pwm_l = m_l.counts.normalize(pseudocounts)
>>> pssm_l = pwm_l.log_odds(background)
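To see what these two calls do to a single column, here is a minimal pure-Python sketch. It assumes that normalize() adds the pseudocounts to the raw counts before dividing by the column total, and that log_odds() takes the base-2 logarithm of the ratio between that probability and the background frequency. The helper name log_odds_column is hypothetical and is not part of Bio.motifs.

```python
import math

def log_odds_column(counts, pseudocounts, background):
    """Return log2-odds scores for one motif column (hypothetical helper)."""
    # Add pseudocounts so that zero counts do not lead to log(0).
    adjusted = {base: counts[base] + pseudocounts[base] for base in counts}
    total = sum(adjusted.values())
    # Probability of each base in the column, relative to the background.
    return {
        base: math.log2(adjusted[base] / total / background[base])
        for base in counts
    }

# First column of the shorter motif from step 1.
counts = {"A": 3.0, "C": 0.0, "G": 0.0, "T": 4.0}
pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

scores = log_odds_column(counts, pseudocounts, background)
for base in "ACGT":
    print(base, round(scores[base], 3))
```

A positive score means the base is more frequent in the motif column than in the background; a negative score means it is less frequent.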
4. Compare them by calling the dist_pearson() function.
>>> distance, offset = pssm_l.dist_pearson(pssm_s)
>>> distance
0.23924403149343054
>>> offset
2
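The distance is 1.0 − r, where r is the Pearson correlation coefficient (PCC) between the two PSSMs at their best alignment, with the shorter motif padded with the background distribution. Here the PCC is roughly 1.0 − 0.239 = 0.761. The consensus sequences of the two motifs help to visualize that alignment.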
>>> m_s.consensus
Seq('TACGC')
>>> m_l.consensus
Seq('GTTACCCGG')
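The consensus sequences above can be checked by hand from the count matrices: each position takes the letter with the largest count in its column. The sketch below is a plain-Python illustration of that rule, not the Bio.motifs implementation. The helper name consensus_from_counts is hypothetical, and ties (which do not occur in these two matrices) are not handled in any particular way.

```python
def consensus_from_counts(counts):
    """Pick the most frequent letter in each column (hypothetical helper)."""
    letters = sorted(counts)
    length = len(counts[letters[0]])
    return "".join(
        max(letters, key=lambda base: counts[base][i]) for i in range(length)
    )

# Count matrices copied from steps 1 and 2.
counts_s = {
    "A": [3, 7, 0, 2, 1],
    "C": [0, 0, 5, 2, 6],
    "G": [0, 0, 0, 3, 0],
    "T": [4, 0, 2, 0, 0],
}
counts_l = {
    "A": [30, 0, 0, 100, 0, 0, 0, 0, 15],
    "C": [10, 0, 0, 0, 100, 100, 100, 0, 15],
    "G": [50, 0, 0, 0, 0, 0, 0, 60, 55],
    "T": [10, 100, 100, 0, 0, 0, 0, 40, 15],
}

consensus_s = consensus_from_counts(counts_s)
consensus_l = consensus_from_counts(counts_l)
print(consensus_s)  # TACGC
print(consensus_l)  # GTTACCCGG
```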
# Alignment at offset 2, where b denotes the background distribution
m_s: bbTACGCbb
m_l: GTTACCCGG
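The idea behind this padded alignment can be sketched in plain Python. The example below pads the shorter matrix with zero columns (a background column has a log-odds score of 0), flattens both matrices, and computes 1.0 − r with an ordinary Pearson correlation. It is a simplified illustration on toy data: the helper names pearson and padded_distance are hypothetical, and the internal normalization used by dist_pearson() may differ, so this sketch is not expected to reproduce the value 0.239 above.

```python
import math

def pearson(x, y):
    """Ordinary Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def padded_distance(long_cols, short_cols, offset):
    """Return 1 - r after padding the short motif to the long motif's length.

    Each column is a list of four log-odds scores in A, C, G, T order.
    Assumes 0 <= offset and offset + len(short_cols) <= len(long_cols).
    """
    pad = [0.0, 0.0, 0.0, 0.0]  # log-odds of a pure background column
    right = len(long_cols) - offset - len(short_cols)
    padded = [pad] * offset + short_cols + [pad] * right
    x = [score for col in long_cols for score in col]
    y = [score for col in padded for score in col]
    return 1.0 - pearson(x, y)

# Toy log-odds columns (made-up values, not taken from the motifs above).
long_cols = [
    [1.0, -1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0, -1.0],
]
short_cols = [[-1.0, 1.0, -1.0, -1.0]]

# Try every offset at which the short motif fits and keep the best one.
distances = {
    offset: padded_distance(long_cols, short_cols, offset)
    for offset in range(len(long_cols) - len(short_cols) + 1)
}
best_offset = min(distances, key=distances.get)
print(best_offset, distances[best_offset])
```

In this toy case the short motif is identical to the middle column of the long one, so the smallest distance is found at offset 1.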
2023-06-19