Read Motif in JASPAR Format with Bio.motifs
How to read a motif in JASPAR format with the Bio.motifs module?
✍: FYIcenter.com
The Bio.motifs.read() function reads motif files in several formats, including JASPAR.
1. Download a motif file in JASPAR format by going to https://jaspar.genereg.net/matrix/MA0080.5/ and clicking the "JASPAR" download button. The file MA0080.5.jaspar is saved on your computer.
2. Read the motif file with the read() function. The length of the resulting motif object is the number of positions in the matrix.
fyicenter$ python
>>> from Bio import motifs
>>> handle = open("MA0080.5.jaspar")
>>> m = motifs.read(handle, "jaspar")
>>> len(m)
20
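The same read can be tried without the downloaded file, assuming Biopython is installed. The sketch below feeds motifs.read() an in-memory handle; the JASPAR_TEXT string is a hypothetical 8-column excerpt typed in from the counts printed in step 3, not the full 20-column MA0080.5 file.

```python
import io

from Bio import motifs

# Hypothetical excerpt: only the first 8 of the 20 columns of MA0080.5,
# written in the JASPAR layout (header line, then one bracketed row per base).
JASPAR_TEXT = """>MA0080.5\tSPI1
A  [ 42201 48240 54154 78831 81904 99739  15301 113087 ]
C  [ 22587 21262 20183 11424 12269  2914  10958   3425 ]
G  [ 38405 34277 37341 25893 25580 13479 100825  12544 ]
T  [ 30010 29424 21525 17055 13450 17071   6119   4147 ]
"""

# The "with" block closes the handle as soon as the motif has been read.
with io.StringIO(JASPAR_TEXT) as handle:
    m = motifs.read(handle, "jaspar")

print(m.matrix_id, m.name, len(m))
```

For the real file, replacing io.StringIO(JASPAR_TEXT) with open("MA0080.5.jaspar") should behave the same way and also closes the file automatically.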
3. View the motif object structure.
>>> type(m)
<class 'Bio.motifs.jaspar.Motif'>
>>> print(m)
TF name SPI1
Matrix ID MA0080.5
Matrix:
          0        1        2        3        4        5         6         7 ...
A: 42201.00 48240.00 54154.00 78831.00 81904.00 99739.00  15301.00 113087.00 ...
C: 22587.00 21262.00 20183.00 11424.00 12269.00  2914.00  10958.00   3425.00 ...
G: 38405.00 34277.00 37341.00 25893.00 25580.00 13479.00 100825.00  12544.00 ...
T: 30010.00 29424.00 21525.00 17055.00 13450.00 17071.00   6119.00   4147.00 ...
4. View the consensus and anticonsensus sequences.
>>> m.consensus
Seq('AAAAAAGAGGAAGTGAAAAA')
>>> m.anticonsensus
Seq('CCCCCCTCCCTCTCTTCCCC')
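The consensus takes the most frequent nucleotide at each position, and the anticonsensus takes the least frequent one. The sketch below reproduces that rule in plain Python over the first 8 columns printed in step 3. It is an illustration of the idea, not Biopython's own implementation, and the helper name pick_per_position is made up for this example.

```python
# First 8 columns of the MA0080.5 count matrix, copied from the print(m) output.
counts = {
    "A": [42201, 48240, 54154, 78831, 81904, 99739, 15301, 113087],
    "C": [22587, 21262, 20183, 11424, 12269, 2914, 10958, 3425],
    "G": [38405, 34277, 37341, 25893, 25580, 13479, 100825, 12544],
    "T": [30010, 29424, 21525, 17055, 13450, 17071, 6119, 4147],
}


def pick_per_position(counts, chooser):
    """Apply chooser (max or min) to the base counts at every position."""
    length = len(next(iter(counts.values())))
    return "".join(
        chooser(counts, key=lambda base: counts[base][i]) for i in range(length)
    )


consensus = pick_per_position(counts, max)      # most frequent base per column
anticonsensus = pick_per_position(counts, min)  # least frequent base per column

print(consensus)      # AAAAAAGA
print(anticonsensus)  # CCCCCCTC
```

Both strings match the first 8 letters of the m.consensus and m.anticonsensus values shown above.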
5. Calculate the total number of sequences used to build the motif by summing the counts at the first position.
>>> m.counts[:, 0]
{'A': 42201.0, 'C': 22587.0, 'G': 38405.0, 'T': 30010.0}
>>> sum(m.counts[:, 0].values())
133203.0
So 133,203 DNA sequences were used to create this motif. Those sequences are not included in the input file, which stores only the counts, so the motif has no instances:
>>> type(m.instances)
<class 'NoneType'>
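Using the first position alone assumes that every position was counted over the same set of sequences. A quick plain-Python check over the first 8 columns printed in step 3 suggests this holds for MA0080.5; the remaining 12 columns are not shown in this tutorial and are not checked here.

```python
# First 8 columns of the MA0080.5 count matrix, copied from the print(m) output.
counts = {
    "A": [42201, 48240, 54154, 78831, 81904, 99739, 15301, 113087],
    "C": [22587, 21262, 20183, 11424, 12269, 2914, 10958, 3425],
    "G": [38405, 34277, 37341, 25893, 25580, 13479, 100825, 12544],
    "T": [30010, 29424, 21525, 17055, 13450, 17071, 6119, 4147],
}

# zip(*rows) turns the four per-base rows into per-position columns.
totals = [sum(column) for column in zip(*counts.values())]

print(totals)
print(len(set(totals)) == 1)  # True when every column has the same total
```

With a motif object loaded in Biopython, the same per-position total can be obtained with sum(m.counts[:, i].values()) for each position i, as in step 5.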
2023-07-05