What Is the Tanimoto Coefficient

Q

What is the Tanimoto coefficient?

✍: FYIcenter.com

A

The Tanimoto coefficient is a metric (or score) that measures the similarity of two sets of elements. It ranges from 0 (no shared elements) to 1 (identical sets).

In simple terms, it is the number of elements in the intersection of the two sets divided by the number of elements in their union.

More precisely, the Tanimoto coefficient of set A and set B is defined as:

T = Nc / (Na + Nb - Nc)

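where:
  Na is the number of elements in set A
  Nb is the number of elements in set B
  Nc is the number of elements shared by A and B

The denominator Na + Nb - Nc is the size of the union of A and B: the shared elements are counted in both Na and Nb, so they are subtracted once.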

For example, for (A, B, C, D, E) and (I, H, G, F, E, D), Na = 5, Nb = 6 and Nc = 2 (D and E are shared). So the Tanimoto coefficient is 2 / (5 + 6 - 2) = 2/9 = 0.22.
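The definition above can be sketched in a few lines of Python. This is only an illustration: the function name tanimoto() is hypothetical, and returning 0.0 for two empty sets is an assumption of this sketch, since the ratio is undefined in that case.

```python
def tanimoto(a, b):
    """Return Nc / (Na + Nb - Nc) for two collections of hashable elements."""
    set_a, set_b = set(a), set(b)
    n_a = len(set_a)            # Na: number of elements in set A
    n_b = len(set_b)            # Nb: number of elements in set B
    n_c = len(set_a & set_b)    # Nc: number of elements shared by A and B
    denominator = n_a + n_b - n_c
    if denominator == 0:
        # Both sets are empty, so the ratio is undefined.
        # Returning 0.0 is an arbitrary choice made for this sketch.
        return 0.0
    return n_c / denominator


# The example from the text: (A, B, C, D, E) and (I, H, G, F, E, D)
score = tanimoto("ABCDE", "IHGFED")
print(round(score, 2))  # 0.22
```

Each string is converted to a set of its letters, so the two sets share D and E and the result is 2/9.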

You can use the above definition to verify the similarity scores reported by the Open Babel "obabel" command for some simple molecules:

Example 1 - Tanimoto coefficient (similarity score) between the ethane (CC) and propane (CCC) molecules. The fingerprint of CC has 1 bit set, and the fingerprint of CCC has 2 bits set, 1 of which is shared with CC. So the Tanimoto coefficient is 1 / (1 + 2 - 1) = 1/2 = 0.5. This matches the "obabel" result.

fyicenter$ obabel -:CC -o fpt
>   1 bits set 
00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 40000000 
00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 
1 molecule converted

fyicenter$ obabel -:CCC -o fpt
>   2 bits set 
00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 40000000 
00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 00000010 
00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 
1 molecule converted

fyicenter$ obabel -:CC -:CCC -o fpt
>
>   Tanimoto from first mol = 0.5
Possible superstructure of first mol
2 molecules converted
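If you want to count the bits without reading the hexadecimal dump by eye, the check for Example 1 can be sketched in Python. The helper name set_bits() is hypothetical, and the bit positions it returns are simply counted from the right end of the dump as printed, which is an assumption of this sketch and not necessarily Open Babel's internal bit numbering. The Tanimoto coefficient does not depend on the numbering, as long as both fingerprints use the same one.

```python
# Fingerprint dumps copied from the "obabel -o fpt" outputs for CC and CCC.
CC_DUMP = """
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 40000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
"""

CCC_DUMP = """
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 40000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000010
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
"""


def set_bits(dump):
    """Return the positions of the 1 bits in a hexadecimal fingerprint dump.

    Positions are counted from the right end of the dump as printed
    (a convention chosen for this sketch only).
    """
    value = int("".join(dump.split()), 16)
    return {i for i in range(value.bit_length()) if (value >> i) & 1}


cc = set_bits(CC_DUMP)
ccc = set_bits(CCC_DUMP)
shared = cc & ccc

# T = Nc / (Na + Nb - Nc)
score = len(shared) / (len(cc) + len(ccc) - len(shared))
print(len(cc), len(ccc), len(shared), score)  # 1 2 1 0.5
```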

Example 2 - Tanimoto coefficient (similarity score) between a 6-carbon chain (CCCCCC) and a 6-carbon ring (C1CCCCC1). The fingerprint of the 6-carbon chain has 5 bits set, and the fingerprint of the 6-carbon ring has 6 bits set, 5 of which are shared with the chain. So the Tanimoto coefficient is 5 / (5 + 6 - 5) = 5/6 = 0.83. This matches the "obabel" result.

fyicenter$ obabel -:CCCCCC -o fpt
>   5 bits set 
00000000 01000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 40000000 
00000000 00000000 00000000 00000000 00000000 00000000 
00002000 00000001 00000000 00000000 00000000 00000010 
00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 
1 molecule converted

fyicenter$ obabel -:C1CCCCC1 -o fpt
>   6 bits set 
00000000 01000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 40000000 
00000000 00000000 00000000 00000000 00000000 00000000 
02002000 00000001 00000000 00000000 00000000 00000010 
00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 
1 molecule converted

fyicenter$ obabel -:CCCCCC -:C1CCCCC1 -o fpt
>
>   Tanimoto from first mol = 0.833333
Possible superstructure of first mol
2 molecules converted
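Example 2 can also be checked with the "intersection over union" form of the definition, using bitwise AND and OR on the two fingerprints. The following Python sketch treats each dump as one large integer. The helper names to_int() and popcount() are hypothetical. The final subset test shows that every bit of the chain is also set in the ring, which appears consistent with the "Possible superstructure of first mol" message, although how Open Babel decides to print that message is an assumption here.

```python
# Fingerprint dumps copied from the "obabel -o fpt" outputs
# for CCCCCC (chain) and C1CCCCC1 (ring).
CHAIN_DUMP = """
00000000 01000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 40000000
00000000 00000000 00000000 00000000 00000000 00000000
00002000 00000001 00000000 00000000 00000000 00000010
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
"""

RING_DUMP = """
00000000 01000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 40000000
00000000 00000000 00000000 00000000 00000000 00000000
02002000 00000001 00000000 00000000 00000000 00000010
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
"""


def to_int(dump):
    """Join the hexadecimal words of a dump into one Python integer."""
    return int("".join(dump.split()), 16)


def popcount(value):
    """Count the 1 bits in a non-negative integer."""
    return bin(value).count("1")


chain = to_int(CHAIN_DUMP)
ring = to_int(RING_DUMP)

n_c = popcount(chain & ring)      # bits set in both fingerprints (intersection)
n_union = popcount(chain | ring)  # bits set in either fingerprint (union)
score = n_c / n_union

# True when every bit of the chain fingerprint is also set in the ring fingerprint.
is_subset = (chain & ring) == chain

print(popcount(chain), popcount(ring), n_c, round(score, 6))  # 5 6 5 0.833333
print(is_subset)  # True
```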


2022-12-15