In the latest version of RDKit WASM with substructLibrary support, thanks to @ptosco, the compilation option ALLOW_MEMORY_GROWTH opened up the entire 4 GB memory space for the library, compared to just 16 MB available by default. We’ve also posted recent benchmarks in Cheminformatics, which show a consistent 10x speedup compared to a “naive” graph-based search.
We’ve also learned that one needs to use add_smiles instead of add_trusted_smiles when the SMILES does not come from RDKit itself and is therefore not a normalized (trusted) SMILES. For example, this is not a normalized SMILES: COc1ccc(c2c1cccc2)C(=O)CCC(=O)O, but this one is: COc1ccc(C(=O)CCC(=O)O)c2ccccc12.
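The choice between the two methods can be sketched as below. This is only an illustration under assumptions: `sslib` is taken to be a SubstructLibrary instance created from the RDKit JS module, while `addSmilesList` and the `fromRDKit` flag are hypothetical names introduced here and are not part of RDKit.

```javascript
// Sketch: pick the SubstructLibrary add method based on where the SMILES came from.
// Assumption: `sslib` is an RDKit JS SubstructLibrary instance, e.g.
//   const sslib = new RDKitModule.SubstructLibrary();
// `addSmilesList` and `fromRDKit` are hypothetical names used for illustration only.
function addSmilesList(sslib, smilesList, fromRDKit) {
  let added = 0;
  for (const smi of smilesList) {
    if (fromRDKit) {
      // SMILES written by RDKit itself: already normalized, so it can be trusted.
      sslib.add_trusted_smiles(smi);
    } else {
      // SMILES from any other source: let RDKit parse and normalize it.
      sslib.add_smiles(smi);
    }
    added += 1;
  }
  return added;
}
```

With the two SMILES above, the first one would go through the `fromRDKit = false` branch and the second one could go through the `fromRDKit = true` branch.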
There are some remaining questions. I hope @ptosco will have some time to answer them.
- What is the difference between get_mol and get_qmol?
- In general, what is a safe way to estimate whether there is enough memory for a given number of molecules, say, for an array of N molecules?
- How would the performance change compared to substructLibrary in the following scenario? First (1), we compute only the fingerprints for the molecules, using the fingerprint type used for substructure search (which type does the RDKit substructLibrary use?). Second (2), we simply go through these pre-computed fingerprints and match them against the pattern fingerprint, perhaps with some additional matching logic, perhaps via an additional function that RDKit could expose for matching fingerprints. How much more than this does substructLibrary do? We are interested in this use case because in many applications we can compute and cache these fingerprints once and then reuse them all the time.
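To make the scenario in the last question concrete, here is a minimal sketch of step (2), assuming the cached fingerprints and the query fingerprint are bit vectors of equal length stored as Uint8Array. The helper names (`fpFromBits`, `mayContain`, `screenFingerprints`) are hypothetical and not RDKit functions. RDKit JS may be able to return pattern fingerprints as byte arrays (something like get_pattern_fp_as_uint8array on a molecule object), but that is an assumption to verify against the installed version.

```javascript
// Build a toy fingerprint from a list of "on" bit positions (demo helper only).
function fpFromBits(numBits, onBits) {
  const fp = new Uint8Array(Math.ceil(numBits / 8));
  for (const b of onBits) {
    fp[b >> 3] |= 1 << (b & 7);
  }
  return fp;
}

// Screening test: a target can only contain the query as a substructure
// if every bit set in the query fingerprint is also set in the target one.
function mayContain(targetFp, queryFp) {
  for (let i = 0; i < queryFp.length; i++) {
    if ((targetFp[i] & queryFp[i]) !== queryFp[i]) {
      return false;
    }
  }
  return true;
}

// Go through the cached fingerprints and return the indices that pass the screen.
function screenFingerprints(cachedFps, queryFp) {
  const candidates = [];
  for (let i = 0; i < cachedFps.length; i++) {
    if (mayContain(cachedFps[i], queryFp)) {
      candidates.push(i);
    }
  }
  return candidates;
}

// Toy data: three cached "molecule" fingerprints and one query fingerprint.
const cached = [
  fpFromBits(64, [1, 5, 9]),
  fpFromBits(64, [1, 9]),
  fpFromBits(64, [5]),
];
const query = fpFromBits(64, [1, 9]);
console.log(screenFingerprints(cached, query));
```

A screen like this can only rule molecules out: fingerprint bits can collide, so the surviving candidates would still need a real graph-based substructure match to remove false positives.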