"babel -i fs ..." - Fingerprint Index Based Search

Q

How to perform a Fingerprint Index Based Search with the "babel" command?

✍: FYIcenter.com

A

You can perform a fingerprint index based search with the "babel" command by giving a fastsearch index file as the input. The "-i fs" option tells babel to read the input in the "fs" (fastsearch) format.

For more information, read the help document on fastsearch index file format "fs":

fyicenter$ babel -H fs 

fs  Fastsearch format
Fingerprint-aided substructure and similarity searching

Writing to the fs format makes an index of a multi-molecule datafile::

      babel dataset.sdf -ofs

This prepares an index :file:`dataset.fs` with default parameters, and is slow
(~30 minutes for a 250,000 molecule file).

However, when reading from the fs format searches are much faster, a few seconds,
and so can be done interactively.
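
The index-once, search-many pattern described above can be sketched as a pair of shell functions. This is only a rough sketch: the function names ``index_name`` and ``fs_search`` are hypothetical, and it assumes the index is written next to the datafile with the extension replaced by ``.fs``, as in the ``dataset.sdf`` to ``dataset.fs`` example.

```shell
# Hypothetical helper: derive the index name for a datafile
# (dataset.sdf -> dataset.fs, following the example in the help text).
index_name() {
    printf '%s.fs\n' "${1%.*}"
}

# Hypothetical helper: build the index only if it is missing (the slow step),
# then run a substructure search against it (the fast step).
# Arguments: datafile, SMILES query, output file.
fs_search() {
    data=$1 query=$2 out=$3
    index=$(index_name "$data")
    [ -f "$index" ] || babel "$data" -ofs || return 1
    babel "$index" "$out" -s "$query"
}
```

For example, ``fs_search dataset.sdf 'c1ccccc1' hits.sdf`` would index ``dataset.sdf`` on the first call only and search for benzene rings each time.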

The search target is the parameter of the ``-s`` option and can be
slightly extended SMILES (with ``[#n]`` atoms and ``~`` bonds) or
the name of a file containing a molecule.

Several types of searches are possible:

- Identical molecule::

      babel index.fs outfile.yyy -s SMILES exact

- Substructure::

      babel index.fs outfile.yyy  -s SMILES   or
      babel index.fs outfile.yyy  -s filename.xxx

  where ``xxx`` is a format id known to OpenBabel, e.g. sdf
- Molecular similarity based on Tanimoto coefficient::

      babel index.fs outfile.yyy -at15  -sSMILES  # best 15 molecules
      babel index.fs outfile.yyy -at0.7 -sSMILES  # Tanimoto >0.7
      babel index.fs outfile.yyy -at0.7,0.9 -sSMILES
      #     Tanimoto >0.7 and Tanimoto < 0.9
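
To make the thresholds in the similarity examples more concrete, here is a minimal sketch of the Tanimoto coefficient for two bit fingerprints: the number of bits set in both, divided by the number of bits set in either. The function name ``tanimoto`` and the 8-bit strings are made up for illustration; real fingerprints are much longer, and this is not how babel itself is invoked.

```shell
# Hypothetical helper: Tanimoto coefficient of two equal-length bit strings.
# Counts positions set in both strings and positions set in at least one.
tanimoto() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = length(a)
        for (i = 1; i <= n; i++) {
            x = substr(a, i, 1); y = substr(b, i, 1)
            if (x == "1" && y == "1") both++
            if (x == "1" || y == "1") either++
        }
        # Two all-zero fingerprints are reported as 0 to avoid dividing by zero.
        printf "%.3f\n", (either ? both / either : 0)
    }'
}

tanimoto 11010010 10010110   # 3 shared bits out of 5 set in either: prints 0.600
```

With ``-at0.7``, a pair like this one (0.600) would fall below the cutoff and not be reported.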

The datafile plus the ``-ifs`` option can be used instead of the index file.

NOTE that the datafile MUST NOT be larger than 4GB. (A 32-bit pointer is used.)
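
Since indexing is slow, it may be worth checking the datafile size before starting. The sketch below uses a hypothetical function name, ``fits_fs_limit``, and assumes that "4GB" means 2^32 bytes (4294967296), which is what a 32-bit offset can address. The optional second argument overrides the limit.

```shell
# Hypothetical helper: succeed only if the file is smaller than the limit.
# The default limit of 2^32 bytes is an assumption based on the 32-bit pointer note.
fits_fs_limit() {
    [ -f "$1" ] || return 2
    limit=${2:-4294967296}
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -lt "$limit" ]
}
```

A call such as ``fits_fs_limit dataset.sdf && babel dataset.sdf -ofs`` would then skip indexing for files that are too large.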

.. seealso::

    :ref:`fingerprints`

Write Options (when making index) e.g. -xfFP3
 f# Fingerprint type
     If not specified, the default fingerprint (currently FP2) is used
 N# Fold fingerprint to # bits
 u  Update an existing index
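
As a rough illustration of what folding a fingerprint means, the sketch below halves a bit string repeatedly by OR-ing its two halves until it is down to the requested number of bits. This is one common folding scheme; whether the ``N`` option folds in exactly this way is an assumption here, and the function name ``fold_bits`` is hypothetical. It expects a bit string whose length is a power of two.

```shell
# Hypothetical helper: fold a bit string to n bits by OR-ing its halves.
# Assumes the input length and n are both powers of two.
fold_bits() {
    awk -v fp="$1" -v n="$2" 'BEGIN {
        while (length(fp) / 2 >= n) {
            half = length(fp) / 2
            out = ""
            for (i = 1; i <= half; i++) {
                lo = substr(fp, i, 1); hi = substr(fp, half + i, 1)
                out = out ((lo == "1" || hi == "1") ? "1" : "0")
            }
            fp = out
        }
        print fp
    }'
}

fold_bits 10000001 4   # prints 1001
```

Folding makes the index smaller, at the cost of more bit collisions between unrelated molecules.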

Read Options (when searching) e.g. -at0.7
 t# Do similarity search:#mols or # as min Tanimoto
 a  Add Tanimoto coeff to title in similarity search
 l# Maximum number of candidates. Default is 4000
 e  Exact match
     Alternative to using exact in ``-s`` parameter, see above
 n  No further SMARTS filtering after fingerprint phase

 


2020-08-25