Search for Motif Matches with Bio.motifs
How to Search a Target Sequence for Matches against a Motif with Bio.motifs?
✍: FYIcenter.com
The Bio.motifs module offers two options to search a target sequence for segments that match a motif.
1. Use the motif instances to search for exact matches.
fyicenter$ python
>>> from Bio import motifs
>>> from Bio.Seq import Seq
>>> instances = [
...     "TACAA",
...     "TACGC",
...     "TACAC",
...     "TACCC",
...     "AACCC",
...     "AATGC",
...     "AATGC",
... ]
>>> m = motifs.create(instances)
>>> test_seq = Seq("TACACTGCATTACAACCCAAGCATTA")
>>> for pos, seq in m.instances.search(test_seq):
...     print("%i %s" % (pos, seq))
...
0 TACAC
10 TACAA
13 AACCC
As you can see, 3 matches are found, and each one matches one of the sample sequences exactly.
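To make the idea of an exact match concrete, here is a minimal pure-Python sketch that does not use Biopython. It slides a window of the motif length along the target and reports every window that equals one of the instances. The helper name search_exact is hypothetical and is not part of Bio.motifs; it is only meant to illustrate what an exact-instance search does, not how the library implements it.

```python
instances = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
test_seq = "TACACTGCATTACAACCCAAGCATTA"


def search_exact(instances, sequence):
    """Yield (position, segment) for each window equal to a motif instance."""
    width = len(instances[0])      # all instances share the same length
    wanted = set(instances)        # set membership makes each lookup cheap
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        if window in wanted:
            yield pos, window


matches = list(search_exact(instances, test_seq))
for pos, seq in matches:
    print("%i %s" % (pos, seq))
```

With the data above, this reports the same three forward-strand positions (0, 10 and 13) as the Biopython session.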
2. Use the motif PSSM to search for approximate matches.
>>> pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
>>> pwm = m.counts.normalize(pseudocounts)
>>> background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
>>> pssm = pwm.log_odds(background)
>>> print(pssm)
        0      1      2      3      4
A:   0.42   1.49  -2.17  -0.05  -0.75
C:  -2.17  -2.17   1.58   0.42   1.83
G:  -2.17  -2.17  -2.17   0.92  -2.17
T:   0.77  -2.17  -0.05  -2.17  -2.17
>>> for position, score in pssm.search(test_seq, threshold=3.0):
...     print("Position %d: score = %5.3f" % (position, score))
...
Position 0: score = 5.622
Position -20: score = 4.601
Position 10: score = 3.037
Position 13: score = 5.738
Position -6: score = 4.601
Note that the negative positions refer to matches of the motif found on the reverse strand of the test sequence, which are positioned backward starting from the end of the test sequence.
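The negative positions follow Python's negative-index convention, so a reverse-strand hit at pos covers the same letters as test_seq[pos:pos+5]. The sketch below, written in plain Python without Biopython, converts the two negative positions from the output above into forward-strand coordinates and reverse-complements the segment to show what was actually matched. The helper name reverse_strand_hit is hypothetical and only for illustration.

```python
test_seq = "TACACTGCATTACAACCCAAGCATTA"
motif_length = 5
complement = str.maketrans("ACGT", "TGCA")


def reverse_strand_hit(seq, pos, length):
    """Return (forward start, forward segment, reverse complement) for a negative pos."""
    start = pos + len(seq)                      # same start as Python's seq[pos]
    segment = seq[start:start + length]         # letters as read on the forward strand
    revcomp = segment.translate(complement)[::-1]
    return start, segment, revcomp


hits = [reverse_strand_hit(test_seq, pos, motif_length) for pos in (-20, -6)]
for pos, (start, segment, revcomp) in zip((-20, -6), hits):
    print(pos, start, segment, revcomp)
```

Both reverse-strand hits read "GCATT" on the forward strand, whose reverse complement "AATGC" is one of the motif instances, which is why they share the same score of 4.601.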
3. Calculate match scores at all positions. The output shows scores for the forward strand only.
>>> scores = pssm.calculate(test_seq)
>>> print(scores)
[  5.622304    -5.6797      -3.4317725    0.93827754  -6.849625
  -2.0406609  -10.849625    -3.6561453   -0.03370807  -3.9110255
   3.0373416   -2.1491852   -0.6016975    5.7381525   -0.509775
  -3.5642228   -8.734148    -0.09919716  -0.6016975   -2.3942978
 -10.849625    -3.6561453 ]
As you can see, the match at position 13, "AACCC", has the highest score of 5.738, closely followed by the match at the first position, "TACAC", with a score of 5.622.
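The scores above can be reproduced by hand, which helps show what the PSSM contains. The following is a minimal pure-Python sketch, assuming each PSSM entry is log2((count + pseudocount) / column total / background) and that the score of a window is the sum of the entries for its letters. It is an illustration of the arithmetic, not Biopython's own implementation, and it scores the forward strand only.

```python
import math

instances = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
pseudocounts = {"A": 0.6, "C": 0.4, "G": 0.4, "T": 0.6}
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
test_seq = "TACACTGCATTACAACCCAAGCATTA"
width = len(instances[0])

# Build one dict of log-odds scores per motif column.
pssm = []
for i in range(width):
    column = [inst[i] for inst in instances]
    total = len(column) + sum(pseudocounts.values())
    pssm.append({
        letter: math.log2(
            (column.count(letter) + pseudocounts[letter]) / total / background[letter]
        )
        for letter in "ACGT"
    })

# Score every forward-strand window by summing the per-column entries.
scores = [
    sum(pssm[i][test_seq[start + i]] for i in range(width))
    for start in range(len(test_seq) - width + 1)
]

# Positions at or above the threshold used earlier, and the best position.
hits = [pos for pos, score in enumerate(scores) if score >= 3.0]
best = max(range(len(scores)), key=scores.__getitem__)
print(hits)
print(best, test_seq[best:best + width], round(scores[best], 3))
```

For example, the first column holds 3 A's and 4 T's, so the entry for A is log2((3 + 0.6) / 9 / 0.3), which is about 0.42, matching the printed PSSM.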
2023-06-19