Calculate Substitutions in Alignments
How to Calculate Substitutions in Sequence Alignments?
✍: FYIcenter.com
The substitutions property of an alignment reports how often letters in the alignment are substituted for each other. It is calculated by taking every pair of rows in the alignment, counting how many times each pair of letters is aligned in the same column, and summing these counts over all pairs of rows. Columns in which either row has a gap are not counted.
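The calculation described above can be sketched in plain Python without Biopython. This is an illustrative re-implementation, not Biopython's own code: substitution_counts is a hypothetical helper, rows are assumed to be plain strings of equal length, "-" is assumed to be the gap character, and the even split of each off-diagonal count between (x, y) and (y, x) is inferred from the 0.5 entries in the example output below.

```python
from collections import defaultdict
from itertools import combinations


def substitution_counts(rows, gap="-"):
    """Count aligned letter pairs over all pairs of rows (illustrative sketch).

    Columns where either row has a gap are skipped. Each off-diagonal count
    is split evenly between (x, y) and (y, x), so the result is symmetric.
    """
    raw = defaultdict(float)
    for row1, row2 in combinations(rows, 2):  # every pair of rows, once
        for x, y in zip(row1, row2):          # walk both rows column by column
            if x == gap or y == gap:
                continue                      # a gap is not a substitution
            raw[(x, y)] += 1
    counts = defaultdict(float)
    for (x, y), n in raw.items():
        # For x == y both halves land on the same key, so the count stays whole.
        counts[(x, y)] += n / 2
        counts[(y, x)] += n / 2
    return dict(counts)


counts = substitution_counts(["ACTCCTA", "AAT-CTA"])
print(counts[("A", "A")], counts[("A", "C")], counts[("C", "A")])  # 2.0 0.5 0.5
```

The single C/A column is what produces the two 0.5 entries: one raw count shared between both sides of the diagonal.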
Here is an example of how to print the substitutions property.
fyicenter$ python
>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> from Bio.Align import MultipleSeqAlignment
>>> alignment = MultipleSeqAlignment([
...     SeqRecord(Seq("ACTCCTA"), id="seq1"),
...     SeqRecord(Seq("AAT-CTA"), id="seq2")
... ])
>>> print(alignment.substitutions)
    A   C   T
A 2.0 0.5 0.0
C 0.5 1.0 0.0
T 0.0 0.0 2.0
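With many rows, the number of row pairs grows quickly (a 7-row alignment already has 21 pairs). A minimal sketch of an equivalent, column-by-column way to get the same numbers, assuming plain equal-length strings and "-" as the gap: in a column where letter x occurs n_x times, the pairs of rows contribute n_x(n_x - 1)/2 x-x pairs and n_x * n_y x-y pairs. substitution_counts_by_column is a hypothetical helper, and the half-split of off-diagonal counts is chosen to match the 0.5 entries printed above.

```python
from collections import Counter


def substitution_counts_by_column(rows, gap="-"):
    """Compute pair counts one column at a time (illustrative sketch).

    In a column where letter x occurs n_x times, the pairs of rows contribute
    n_x * (n_x - 1) / 2 x-x pairs and n_x * n_y x-y pairs; the x-y pairs are
    split evenly between (x, y) and (y, x) to keep the matrix symmetric.
    """
    counts = Counter()
    for column in zip(*rows):                          # one tuple per column
        freq = Counter(c for c in column if c != gap)  # gaps are not counted
        letters = sorted(freq)
        for i, x in enumerate(letters):
            counts[(x, x)] += freq[x] * (freq[x] - 1) / 2
            for y in letters[i + 1:]:
                half = freq[x] * freq[y] / 2
                counts[(x, y)] += half
                counts[(y, x)] += half
    return counts


counts = substitution_counts_by_column(["ACTCCTA", "AAT-CTA"])
print(counts[("A", "A")], counts[("A", "C")], counts[("C", "C")])  # 2.0 0.5 1.0
```

To run it on a Biopython alignment, the rows can be taken as [str(record.seq) for record in alignment].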
Here is another example, using the sequence alignment file PF05371_seed.faa.
>>> from Bio import AlignIO
>>> alignment = AlignIO.read("PF05371_seed.faa", "fasta")
>>> print(alignment)
Alignment with 7 rows and 52 columns
AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRL...SKA COATB_BPIKE/30-81
AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIKL...SRA Q9T0Q8_BPIKE/1-52
DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRL...SKA COATB_BPI22/32-83
AEGDDP---AKAAFNSLQASATEYIGYAWAMVVVIVGATIGIKL...SKA COATB_BPM13/24-72
AEGDDP---AKAAFDSLQASATEYIGYAWAMVVVIVGATIGIKL...SKA COATB_BPZJ2/1-49
AEGDDP---AKAAFDSLQASATEYIGYAWAMVVVIVGATIGIKL...SKA Q9T0Q9_BPFD/1-49
FAADDATSQAKAAFDSLTAQATEMSGYAWALVVLVVGATVGIKL...SRA COATB_BPIF1/22-73
>>>
>>> print(alignment.substitutions)
      A    D    E    F    G    I    K    L    M    N    P    Q    R    S    T    V    W    Y
A 146.0  6.5  8.5  2.5  4.0  0.0  0.0  0.0  0.0  0.0 13.0  0.0  0.0  1.0 14.5  5.0  0.0  0.0
D   6.5 25.0  6.0  0.5  2.0  0.0  0.0  0.0  0.0  9.0  0.0  0.0  0.0  3.0  2.0  0.0  0.0  0.0
E   8.5  6.0 19.0  0.0  2.5  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
F   2.5  0.5  0.0 48.0  0.0  0.0  0.0  0.0  6.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
G   4.0  2.0  2.5  0.0 24.0  0.0  0.0  0.0  0.0  0.0  3.0  0.0  0.0  4.0  7.5 10.0  0.0  0.0
I   0.0  0.0  0.0  0.0  0.0 43.0  0.0  4.5  0.0  0.0  0.0  0.0  0.0  3.0  5.0  7.5  0.0  0.0
K   0.0  0.0  0.0  0.0  0.0  0.0 71.0  0.0  0.0  0.0  0.0  4.5 10.0  0.0  7.5  0.0  0.0  0.0
L   0.0  0.0  0.0  0.0  0.0  4.5  0.0 48.0  3.0  0.0  0.0  0.0  0.0  0.5  1.0  4.5  0.0  4.5
M   0.0  0.0  0.0  6.0  0.0  0.0  0.0  3.0  6.0  0.0  0.0  0.0  0.0  0.0  0.0  4.5  0.0  1.5
N   0.0  9.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  3.0  0.0  0.0  0.0  3.0  0.0  0.0  0.0  0.0
P  13.0  0.0  0.0  0.0  3.0  0.0  0.0  0.0  0.0  0.0  7.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0
Q   0.0  0.0  0.0  0.0  0.0  0.0  4.5  0.0  0.0  0.0  0.0 12.0  0.0  6.0  1.5  0.0  0.0  7.5
R   0.0  0.0  0.0  0.0  0.0  0.0 10.0  0.0  0.0  0.0  0.0  0.0  2.0  0.0  0.0  0.0  0.0  0.0
S   1.0  3.0  0.0  0.0  4.0  3.0  0.0  0.5  0.0  3.0  0.0  6.0  0.0 48.0  3.0  3.5  0.0  0.0
T  14.5  2.0  0.0  0.0  7.5  5.0  7.5  1.0  0.0  0.0  1.0  1.5  0.0  3.0 36.0 11.0  0.0  0.0
V   5.0  0.0  0.0  0.0 10.0  7.5  0.0  4.5  4.5  0.0  0.0  0.0  0.0  3.5 11.0 59.0  0.0  0.0
W   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0 21.0  0.0
Y   0.0  0.0  0.0  0.0  0.0  0.0  0.0  4.5  1.5  0.0  0.0  7.5  0.0  0.0  0.0  0.0  0.0 12.0
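Note that the numbers on the diagonal of the output show how often each letter is aligned with itself. For example, the A-A entry is 146.0, so, summed over all pairs of rows, the letter A is aligned with another A 146 times. The matrix is symmetric, with each off-diagonal count split equally between the two sides of the diagonal: the total number of A-D substitutions, for example, is 6.5 + 6.5 = 13.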
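A minimal sketch, in plain Python, of turning the printed matrix into one total per letter pair: each off-diagonal entry is added to its mirror image, while diagonal entries are kept as they are. The dict-of-pairs input is an assumption made for illustration (the property itself returns a Biopython Array), filled with a few values copied from the PF05371_seed.faa output above; pair_totals is a hypothetical helper.

```python
def pair_totals(counts):
    """Collapse a symmetric, half-split pair matrix into one total per pair.

    For x != y, the entries (x, y) and (y, x) are added together; a diagonal
    entry (x, x) is already a total and is kept unchanged.
    """
    totals = {}
    for (x, y), value in counts.items():
        key = tuple(sorted((x, y)))  # (x, y) and (y, x) map to the same key
        totals[key] = totals.get(key, 0.0) + value
    return totals


# A few entries copied from the PF05371_seed.faa output above.
counts = {("A", "A"): 146.0, ("A", "D"): 6.5, ("D", "A"): 6.5, ("D", "D"): 25.0}
totals = pair_totals(counts)
print(totals[("A", "D")], totals[("A", "A")])  # 13.0 146.0
```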
2023-08-03