Calculate Substitutions in Alignments

Q

How to Calculate Substitutions in Sequence Alignments?

✍: FYIcenter.com

A

The substitutions property of an alignment reports how often letters in the alignment are substituted for each other. This is calculated by taking all pairs of rows in the alignment, counting the number of times two letters are aligned to each other, and summing this over all pairs.

Here is an example of how to print the substitutions property.

fyicenter$ python
>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> from Bio.Align import MultipleSeqAlignment
>>> alignment = MultipleSeqAlignment([
...     SeqRecord(Seq("ACTCCTA"), id="seq1"),
...     SeqRecord(Seq("AAT-CTA"), id="seq2")
...   ])
>>> print(alignment.substitutions)
    A   C   T
A 2.0 0.5 0.0
C 0.5 1.0 0.0
T 0.0 0.0 2.0
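
The counting rule can be checked by hand for the two-row example above. The following is a minimal pure-Python sketch, not Biopython's own code: the helper name count_substitutions is hypothetical, and it assumes that columns containing a gap are skipped and that each aligned pair is split evenly between the two symmetric cells, which is what the 0.5 entries in the output suggest.

```python
from collections import defaultdict
from itertools import combinations


def count_substitutions(rows, gap="-"):
    """Count aligned letter pairs, summed over all pairs of rows."""
    counts = defaultdict(float)
    for row1, row2 in combinations(rows, 2):
        for a, b in zip(row1, row2):
            if a == gap or b == gap:
                continue  # assumption: a gap in either row is not counted
            # Split the pair between (a, b) and (b, a); when a == b,
            # both halves land in the same diagonal cell.
            counts[a, b] += 0.5
            counts[b, a] += 0.5
    return dict(counts)


counts = count_substitutions(["ACTCCTA", "AAT-CTA"])
for pair in sorted(counts):
    print(pair, counts[pair])
```

Under these assumptions the sketch gives 2.0 for A-A, 0.5 for A-C and C-A, 1.0 for C-C, and 2.0 for T-T, which agrees with the matrix printed above.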

Here is another example using the sequence alignment file, PF05371_seed.faa.

>>> from Bio import AlignIO
>>> alignment = AlignIO.read("PF05371_seed.faa", "fasta")
>>> print(alignment)
Alignment with 7 rows and 52 columns
AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRL...SKA COATB_BPIKE/30-81
AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIKL...SRA Q9T0Q8_BPIKE/1-52
DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRL...SKA COATB_BPI22/32-83
AEGDDP---AKAAFNSLQASATEYIGYAWAMVVVIVGATIGIKL...SKA COATB_BPM13/24-72
AEGDDP---AKAAFDSLQASATEYIGYAWAMVVVIVGATIGIKL...SKA COATB_BPZJ2/1-49
AEGDDP---AKAAFDSLQASATEYIGYAWAMVVVIVGATIGIKL...SKA Q9T0Q9_BPFD/1-49
FAADDATSQAKAAFDSLTAQATEMSGYAWALVVLVVGATVGIKL...SRA COATB_BPIF1/22-73
>>>
>>> print(alignment.substitutions)
      A    D    E    F    G    I    K    L   M   N    P    Q    R    S    T    V    W    Y
A 146.0  6.5  8.5  2.5  4.0  0.0  0.0  0.0 0.0 0.0 13.0  0.0  0.0  1.0 14.5  5.0  0.0  0.0
D   6.5 25.0  6.0  0.5  2.0  0.0  0.0  0.0 0.0 9.0  0.0  0.0  0.0  3.0  2.0  0.0  0.0  0.0
E   8.5  6.0 19.0  0.0  2.5  0.0  0.0  0.0 0.0 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
F   2.5  0.5  0.0 48.0  0.0  0.0  0.0  0.0 6.0 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
G   4.0  2.0  2.5  0.0 24.0  0.0  0.0  0.0 0.0 0.0  3.0  0.0  0.0  4.0  7.5 10.0  0.0  0.0
I   0.0  0.0  0.0  0.0  0.0 43.0  0.0  4.5 0.0 0.0  0.0  0.0  0.0  3.0  5.0  7.5  0.0  0.0
K   0.0  0.0  0.0  0.0  0.0  0.0 71.0  0.0 0.0 0.0  0.0  4.5 10.0  0.0  7.5  0.0  0.0  0.0
L   0.0  0.0  0.0  0.0  0.0  4.5  0.0 48.0 3.0 0.0  0.0  0.0  0.0  0.5  1.0  4.5  0.0  4.5
M   0.0  0.0  0.0  6.0  0.0  0.0  0.0  3.0 6.0 0.0  0.0  0.0  0.0  0.0  0.0  4.5  0.0  1.5
N   0.0  9.0  0.0  0.0  0.0  0.0  0.0  0.0 0.0 3.0  0.0  0.0  0.0  3.0  0.0  0.0  0.0  0.0
P  13.0  0.0  0.0  0.0  3.0  0.0  0.0  0.0 0.0 0.0  7.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0
Q   0.0  0.0  0.0  0.0  0.0  0.0  4.5  0.0 0.0 0.0  0.0 12.0  0.0  6.0  1.5  0.0  0.0  7.5
R   0.0  0.0  0.0  0.0  0.0  0.0 10.0  0.0 0.0 0.0  0.0  0.0  2.0  0.0  0.0  0.0  0.0  0.0
S   1.0  3.0  0.0  0.0  4.0  3.0  0.0  0.5 0.0 3.0  0.0  6.0  0.0 48.0  3.0  3.5  0.0  0.0
T  14.5  2.0  0.0  0.0  7.5  5.0  7.5  1.0 0.0 0.0  1.0  1.5  0.0  3.0 36.0 11.0  0.0  0.0
V   5.0  0.0  0.0  0.0 10.0  7.5  0.0  4.5 4.5 0.0  0.0  0.0  0.0  3.5 11.0 59.0  0.0  0.0
W   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0 0.0 0.0  0.0  0.0  0.0  0.0  0.0  0.0 21.0  0.0
Y   0.0  0.0  0.0  0.0  0.0  0.0  0.0  4.5 1.5 0.0  0.0  7.5  0.0  0.0  0.0  0.0  0.0 12.0

Note that the numbers on the diagonal of the output indicate how often the corresponding letter is aligned with itself, summed over all pairs of rows. For example, the A-A entry is 146.0, so the letter A is aligned with another A 146 times.
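
One way to read a diagonal entry: because the count runs over all pairs of rows, a column in which n rows share the same letter contributes n(n-1)/2 identical pairs. The sketch below illustrates this with a hypothetical helper, diagonal_from_column, which is not part of Biopython and assumes gaps are ignored. The W-W entry of 21.0 is consistent with a single column in which all 7 rows carry W, since 7 rows form 21 pairs.

```python
from collections import Counter
from math import comb


def diagonal_from_column(column, gap="-"):
    """Return the identical-pair count per letter for one alignment column."""
    letter_counts = Counter(letter for letter in column if letter != gap)
    # n copies of a letter in one column form n * (n - 1) / 2 pairs of rows.
    return {letter: float(comb(n, 2)) for letter, n in letter_counts.items()}


# A fully conserved column across 7 rows.
conserved = diagonal_from_column("WWWWWWW")
print(conserved)

# A made-up mixed column with one gap, for illustration only.
mixed = diagonal_from_column("AAAT-TT")
print(mixed)
```

Summing these per-column values over every column of an alignment would give the diagonal of the matrix, under the same assumptions.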

 


2023-08-03