Compare Motifs Using PSSM with Bio.motifs
How to Compare Motifs Using PSSM with Bio.motifs?
✍: FYIcenter.com
If you have the PSSMs of two motifs, you can compare them
using the PSSM's dist_pearson() function. It returns the distance
between the two motifs and the position offset of the best
alignment, in that order.
1. Create a shorter motif from a given PCM without actual instances.
fyicenter$ python
>>> from io import StringIO
>>> from Bio import motifs
>>> pcm_s = """>Shorter
... A [ 3.00 7.00 0.00 2.00 1.00 ]
... C [ 0.00 0.00 5.00 2.00 6.00 ]
... G [ 0.00 0.00 0.00 3.00 0.00 ]
... T [ 4.00 0.00 2.00 0.00 0.00 ]
... """
>>> handle = StringIO(pcm_s)
>>> m_s = motifs.read(handle, "jaspar")
2. Create a longer motif from a given PCM without actual instances.
>>> pcm_l = """>Longer
... A [ 30.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00 15.00 ]
... C [ 10.00 0.00 0.00 0.00 100.00 100.00 100.00 0.00 15.00 ]
... G [ 50.00 0.00 0.00 0.00 0.00 0.00 0.00 60.00 55.00 ]
... T [ 10.00 100.00 100.00 0.00 0.00 0.00 0.00 40.00 15.00 ]
... """
>>> handle = StringIO(pcm_l)
>>> m_l = motifs.read(handle, "jaspar")
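To make the PCM text layout concrete, here is a minimal pure-Python sketch that turns one such block into a motif name and a dictionary of per-letter counts. The parse_pcm() helper is hypothetical and written only for illustration. It is not how Bio.motifs reads the "jaspar" format, and it assumes one bracketed row per letter, as in the example above.

```python
def parse_pcm(text):
    """Parse one bracketed PCM block into (name, counts).

    Hypothetical helper for illustration only; Bio.motifs.read() is the
    real parser and handles more variants of the format.
    """
    name = None
    counts = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: everything after ">" is taken as the motif name.
            name = line[1:].strip()
            continue
        # Data line: a letter, then whitespace-separated counts in brackets.
        letter, rest = line.split("[", 1)
        values = rest.rstrip("]").split()
        counts[letter.strip()] = [float(v) for v in values]
    return name, counts


pcm_l = """>Longer
A [ 30.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00 15.00 ]
C [ 10.00 0.00 0.00 0.00 100.00 100.00 100.00 0.00 15.00 ]
G [ 50.00 0.00 0.00 0.00 0.00 0.00 0.00 60.00 55.00 ]
T [ 10.00 100.00 100.00 0.00 0.00 0.00 0.00 40.00 15.00 ]
"""

name, counts = parse_pcm(pcm_l)
print(name, len(counts["A"]))
```

Each letter maps to one count per motif position, so all four lists have the same length as the motif.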
3. Calculate their PSSMs with the same pseudocounts and background.
>>> pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
>>> background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
>>> pwm_s = m_s.counts.normalize(pseudocounts)
>>> pssm_s = pwm_s.log_odds(background)
>>> pwm_l = m_l.counts.normalize(pseudocounts)
>>> pssm_l = pwm_l.log_odds(background)
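The arithmetic behind this step can be sketched in plain Python, without Biopython. The normalize() and log_odds() functions below are illustrative stand-ins, not the library's code. They assume that each column's counts plus pseudocounts are divided by the column total, and that the log-odds score is the base-2 logarithm of probability over background.

```python
import math


def normalize(counts, pseudocounts):
    """Turn counts into per-position probabilities (illustrative stand-in)."""
    letters = sorted(counts)
    length = len(counts[letters[0]])
    pwm = {letter: [] for letter in letters}
    for i in range(length):
        # Column total includes the pseudocount of every letter.
        total = sum(counts[l][i] + pseudocounts[l] for l in letters)
        for l in letters:
            pwm[l].append((counts[l][i] + pseudocounts[l]) / total)
    return pwm


def log_odds(pwm, background):
    """Base-2 log of probability over background, per letter and position."""
    return {
        letter: [math.log2(p / background[letter]) for p in probs]
        for letter, probs in pwm.items()
    }


# Counts of the shorter motif from step 1.
counts_s = {
    "A": [3.0, 7.0, 0.0, 2.0, 1.0],
    "C": [0.0, 0.0, 5.0, 2.0, 6.0],
    "G": [0.0, 0.0, 0.0, 3.0, 0.0],
    "T": [4.0, 0.0, 2.0, 0.0, 0.0],
}
pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

pwm_s = normalize(counts_s, pseudocounts)
pssm_s = log_odds(pwm_s, background)
print(round(pwm_s["A"][0], 3))   # 0.4, since (3 + 0.6) / (7 + 2) = 0.4
print(round(pssm_s["A"][0], 3))  # 0.415, since log2(0.4 / 0.3) is about 0.415
```

Because every pseudocount is positive, no probability is zero and every log-odds score is finite. Using the same pseudocounts and background for both motifs keeps their scores on the same scale.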
4. Compare them by calling the dist_pearson() function.
>>> distance, offset = pssm_l.dist_pearson(pssm_s)
>>> distance
0.23924403149343054
>>> offset
2
The distance is 1.0 − r, where r is the Pearson correlation coefficient (PCC) between the two PSSMs at their best alignment, with the shorter motif padded by the background distribution. The consensus sequences of the two motifs illustrate that alignment:
>>> m_s.consensus
Seq('TACGC')
>>> m_l.consensus
Seq('GTTACCCGG')
# best alignment, where b stands for the background distribution
m_s: bbTACGCbb
m_l: GTTACCCGG
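The idea of "1 − r with background padding" can be sketched in plain Python. This is a simplified illustration, not Biopython's dist_pearson() implementation, and it is not guaranteed to reproduce the distance shown above. The function names and the toy log-odds values are made up. The sketch pads the shorter motif with 0.0 scores (the log-odds of the background against itself) and only tries offsets where the shorter motif fits entirely inside the longer one.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def padded_distance(long_cols, short_cols, offset):
    """Return 1 - r with the shorter motif placed at `offset`, padded with 0.0."""
    pad = [0.0] * len(long_cols[0])
    left = [pad] * offset
    right = [pad] * (len(long_cols) - offset - len(short_cols))
    padded = left + short_cols + right
    xs = [v for col in long_cols for v in col]
    ys = [v for col in padded for v in col]
    return 1.0 - pearson(xs, ys)


def best_alignment(long_cols, short_cols):
    """Try every offset where the shorter motif fits; return (distance, offset)."""
    offsets = range(len(long_cols) - len(short_cols) + 1)
    return min((padded_distance(long_cols, short_cols, o), o) for o in offsets)


# Made-up log-odds columns in A, C, G, T order (not taken from the motifs above).
long_cols = [
    [1.0, -2.0, -2.0, 0.5],
    [-2.0, 1.5, -2.0, -2.0],
    [-2.0, -2.0, 1.5, -2.0],
    [0.5, -2.0, -2.0, 1.0],
]
short_cols = long_cols[1:3]  # the shorter motif equals columns 1 and 2

distance, offset = best_alignment(long_cols, short_cols)
print(distance, offset)
```

In this toy case the smallest distance is found at offset 1, where the shorter motif lines up with the columns it was copied from.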
2023-06-19