RDKit WebAssembly

In the latest version of RDKit WASM with substructLibrary support, thanks to @ptosco, the compilation option ALLOW_MEMORY_GROWTH opens up the entire 4 GB memory space for the library, compared to just 16 MB available by default. We’ve also posted recent benchmarks in Cheminformatics, which show a consistent 10x speedup compared to a “naive” graph-based search.

We’ve also learned that you need to use add_smiles instead of add_trusted_smiles when your SMILES does not come from RDKit itself and is therefore not a normalized (trusted) SMILES. For example, this is not a normalized SMILES: COc1ccc(c2c1cccc2)C(=O)CCC(=O)O, but this one is: COc1ccc(C(=O)CCC(=O)O)c2ccccc12.

There are some remaining questions. I hope @ptosco can find some time to answer them.

  1. What is the difference between get_mol and get_qmol?

  2. In general, what is a safe way to estimate whether there is enough memory for a given number of molecules, say, for an array of N molecules?

  3. How would performance change compared to substructLibrary in the following scenario? First (1), we compute fingerprints for the molecules, using the fingerprint type that substructure search relies on (which type does the RDKit substructLibrary use?). Second (2), we simply go through these pre-computed fingerprints and match them against the pattern fingerprint, perhaps with some additional matching logic, perhaps via an additional function that RDKit could expose for fingerprint matching. How much more than this does substructLibrary do? We are interested in this use case because in many applications we can compute and cache these fingerprints once and then reuse them all the time.
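
To make the scenario in question 3 concrete, here is a minimal, language-neutral sketch in plain Python. It does not call RDKit. It assumes the fingerprints behave like pattern-style fingerprints, meaning every bit set by a substructure is also set by any molecule containing it. All names (`make_fingerprint`, `passes_screen`, `screen_then_verify`) and the toy bit positions are hypothetical and only illustrate the idea.

```python
from typing import Callable, Iterable, List, Sequence


def make_fingerprint(on_bits: Iterable[int]) -> int:
    """Pack 'on' bit positions into a Python int used as a bitset (toy stand-in)."""
    fp = 0
    for bit in on_bits:
        fp |= 1 << bit
    return fp


def passes_screen(query_fp: int, mol_fp: int) -> bool:
    """True if every bit set in the query is also set in the molecule."""
    return (query_fp & mol_fp) == query_fp


def screen_then_verify(
    query_fp: int,
    cached_fps: Sequence[int],
    verify: Callable[[int], bool],
) -> List[int]:
    """Step (2) of the scenario: scan cached fingerprints, then confirm survivors.

    `verify` is a hypothetical callback standing in for an exact
    atom-by-atom substructure match on the molecule at that index.
    """
    hits = []
    for idx, mol_fp in enumerate(cached_fps):
        # Cheap bitwise test first; the expensive check runs only on survivors.
        if passes_screen(query_fp, mol_fp) and verify(idx):
            hits.append(idx)
    return hits


# Step (1): fingerprints computed once and cached (toy bit positions).
cached_fps = [
    make_fingerprint([1, 3, 5, 8]),
    make_fingerprint([1, 3]),
    make_fingerprint([2, 3, 5, 9]),
    make_fingerprint([1, 5, 8, 11]),
]
query_fp = make_fingerprint([1, 5])

# Assumption for the demo: molecule 3 passes the screen but is a false positive.
hits = screen_then_verify(query_fp, cached_fps, verify=lambda idx: idx == 0)
print(hits)
```

The point of the sketch is that the fingerprint pass can only rule molecules out. Anything that survives it may still be a false positive, so an exact match is still needed on the survivors, and the open question is how much of the total time that second step accounts for.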