Calculate Substitutions in Alignments
How to Calculate Substitutions in Sequence Alignments?
✍: FYIcenter.com
The substitutions property of an alignment reports how often letters in
the alignment are substituted for each other. This is calculated by
taking all pairs of rows in the alignment, counting the number of times
two letters are aligned to each other, and summing this over all pairs.
Here is an example of how to print the substitutions property.
fyicenter$ python
>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> from Bio.Align import MultipleSeqAlignment
>>> alignment = MultipleSeqAlignment([
...     SeqRecord(Seq("ACTCCTA"), id="seq1"),
...     SeqRecord(Seq("AAT-CTA"), id="seq2")
... ])
>>> print(alignment.substitutions)
    A   C   T
A 2.0 0.5 0.0
C 0.5 1.0 0.0
T 0.0 0.0 2.0
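
The 0.5 entries in the output above can be reproduced by hand. The following is a minimal pure-Python sketch of the pairwise counting described earlier, not Biopython's own code. The function name count_substitutions is hypothetical, and the sketch assumes that columns containing a gap are skipped and that each mismatch is split evenly between the two symmetric cells, which is consistent with the matrix printed above.

```python
from collections import defaultdict
from itertools import combinations


def count_substitutions(rows, gap="-"):
    """Sum aligned letter pairs over all pairs of rows (illustrative sketch)."""
    counts = defaultdict(float)
    for row1, row2 in combinations(rows, 2):
        for a, b in zip(row1, row2):
            if a == gap or b == gap:
                continue  # assumption: columns with a gap are not counted
            if a == b:
                counts[a, a] += 1.0  # identical letters go on the diagonal
            else:
                # assumption: a mismatch is split evenly so the matrix is symmetric
                counts[a, b] += 0.5
                counts[b, a] += 0.5
    return dict(counts)


counts = count_substitutions(["ACTCCTA", "AAT-CTA"])
print(counts["A", "A"], counts["A", "C"], counts["C", "C"], counts["T", "T"])
```

For the two rows ACTCCTA and AAT-CTA, the single C/A column gives 0.5 in each of the A-C and C-A cells, and the C aligned to the gap is ignored, so C-C is 1.0 rather than 2.0.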
Here is another example using the sequence alignment file, PF05371_seed.faa.
>>> from Bio import AlignIO
>>> alignment = AlignIO.read("PF05371_seed.faa", "fasta")
>>> print(alignment)
Alignment with 7 rows and 52 columns
AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRL...SKA COATB_BPIKE/30-81
AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIKL...SRA Q9T0Q8_BPIKE/1-52
DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRL...SKA COATB_BPI22/32-83
AEGDDP---AKAAFNSLQASATEYIGYAWAMVVVIVGATIGIKL...SKA COATB_BPM13/24-72
AEGDDP---AKAAFDSLQASATEYIGYAWAMVVVIVGATIGIKL...SKA COATB_BPZJ2/1-49
AEGDDP---AKAAFDSLQASATEYIGYAWAMVVVIVGATIGIKL...SKA Q9T0Q9_BPFD/1-49
FAADDATSQAKAAFDSLTAQATEMSGYAWALVVLVVGATVGIKL...SRA COATB_BPIF1/22-73
>>> print(alignment.substitutions)
      A     D     E     F     G     I     K     L     M     N     P     Q     R     S     T     V     W     Y
A 146.0   6.5   8.5   2.5   4.0   0.0   0.0   0.0   0.0   0.0  13.0   0.0   0.0   1.0  14.5   5.0   0.0   0.0
D   6.5  25.0   6.0   0.5   2.0   0.0   0.0   0.0   0.0   9.0   0.0   0.0   0.0   3.0   2.0   0.0   0.0   0.0
E   8.5   6.0  19.0   0.0   2.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
F   2.5   0.5   0.0  48.0   0.0   0.0   0.0   0.0   6.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
G   4.0   2.0   2.5   0.0  24.0   0.0   0.0   0.0   0.0   0.0   3.0   0.0   0.0   4.0   7.5  10.0   0.0   0.0
I   0.0   0.0   0.0   0.0   0.0  43.0   0.0   4.5   0.0   0.0   0.0   0.0   0.0   3.0   5.0   7.5   0.0   0.0
K   0.0   0.0   0.0   0.0   0.0   0.0  71.0   0.0   0.0   0.0   0.0   4.5  10.0   0.0   7.5   0.0   0.0   0.0
L   0.0   0.0   0.0   0.0   0.0   4.5   0.0  48.0   3.0   0.0   0.0   0.0   0.0   0.5   1.0   4.5   0.0   4.5
M   0.0   0.0   0.0   6.0   0.0   0.0   0.0   3.0   6.0   0.0   0.0   0.0   0.0   0.0   0.0   4.5   0.0   1.5
N   0.0   9.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   3.0   0.0   0.0   0.0   3.0   0.0   0.0   0.0   0.0
P  13.0   0.0   0.0   0.0   3.0   0.0   0.0   0.0   0.0   0.0   7.0   0.0   0.0   0.0   1.0   0.0   0.0   0.0
Q   0.0   0.0   0.0   0.0   0.0   0.0   4.5   0.0   0.0   0.0   0.0  12.0   0.0   6.0   1.5   0.0   0.0   7.5
R   0.0   0.0   0.0   0.0   0.0   0.0  10.0   0.0   0.0   0.0   0.0   0.0   2.0   0.0   0.0   0.0   0.0   0.0
S   1.0   3.0   0.0   0.0   4.0   3.0   0.0   0.5   0.0   3.0   0.0   6.0   0.0  48.0   3.0   3.5   0.0   0.0
T  14.5   2.0   0.0   0.0   7.5   5.0   7.5   1.0   0.0   0.0   1.0   1.5   0.0   3.0  36.0  11.0   0.0   0.0
V   5.0   0.0   0.0   0.0  10.0   7.5   0.0   4.5   4.5   0.0   0.0   0.0   0.0   3.5  11.0  59.0   0.0   0.0
W   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  21.0   0.0
Y   0.0   0.0   0.0   0.0   0.0   0.0   0.0   4.5   1.5   0.0   0.0   7.5   0.0   0.0   0.0   0.0   0.0  12.0
Note that the numbers on the diagonal of the output indicate how often the corresponding letter is aligned with itself. For example, the A-A entry is 146.0, so letter A is aligned with another A 146 times, counted over all pairs of rows.
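
Because the counts are summed over pairs of rows, a diagonal value counts pairs, not occurrences of the letter. As a rough check, the W-W entry of 21.0 is what one would expect if W sits in a single column shared by all 7 rows, which the visible part of the alignment suggests. The helper name identical_pairs below is hypothetical and the snippet is only an arithmetic sketch.

```python
from math import comb


def identical_pairs(n_rows_with_letter):
    """Number of row pairs formed by n rows that share a letter in one column."""
    return comb(n_rows_with_letter, 2)


# Assumption: all 7 rows carry W in the same column.
print(identical_pairs(7))  # 21

# In the two-row example, each A-A column contributes a single pair.
print(identical_pairs(2))  # 1
```

Summing this quantity over every column, for a given letter, gives that letter's diagonal entry under the same assumptions.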
2023-08-03