RDKit WebAssembly

skalkin · December 1, 2020, 4:04am

Since performance is really pivotal here, I think it makes sense to start with a set of benchmarks, so that we can set goals and track improvements. Let’s keep it very simple, yet representative of a chemist’s everyday work. Here are some ideas:

Rendering

Dataset: 1,000 random ChEMBL molecules
Goal: make rendering seamless, with no visible lag (currently, RDKit is slower than OpenChemLib in this area)

Render 1,000 molecules (overall rendering performance)
Render 20 molecules 100 times (horizontal scrolling)
Render 20 molecules 100 times with a sliding window (vertical scrolling)
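To make the three rendering tests concrete, here is a rough TypeScript sketch of how the workloads could be generated and timed. Everything in it is an assumption rather than an existing API: `Renderer` is a placeholder for whatever drawing call is being measured, I read the second test as redrawing the same 20 molecules, and the one-molecule scroll step for the sliding window is arbitrary.

```typescript
// Hypothetical hook: draws one molecule given as SMILES. In a real run this
// would wrap the drawing call of the toolkit build under test.
type Renderer = (smiles: string) => void;

// The clock is injectable so the harness can be tested deterministically.
// In a browser, passing () => performance.now() gives better resolution.
type Clock = () => number;

interface Workload {
  name: string;
  frames: string[][]; // each inner array is one "frame" of molecules drawn together
}

interface WorkloadResult {
  name: string;
  totalMs: number;
  worstFrameMs: number;
}

// Builds the three workloads listed above from a dataset of SMILES strings.
function buildWorkloads(smiles: string[], visible = 20, repeats = 100, step = 1): Workload[] {
  const lastStart = Math.max(0, smiles.length - visible);
  const windowAt = (start: number): string[] => smiles.slice(start, start + visible);
  return [
    // Overall rendering performance: every molecule once, in a single frame.
    { name: "render-all", frames: [smiles.slice()] },
    // The same window redrawn `repeats` times.
    { name: "repeat-window", frames: Array.from({ length: repeats }, () => windowAt(0)) },
    // A window that advances by `step` molecules per frame, clamped at the end.
    {
      name: "sliding-window",
      frames: Array.from({ length: repeats }, (_, i) => windowAt(Math.min(i * step, lastStart))),
    },
  ];
}

// Times each frame separately and reports the total and the slowest frame.
function runWorkload(w: Workload, render: Renderer, now: Clock = () => Date.now()): WorkloadResult {
  const frameMs: number[] = [];
  for (const frame of w.frames) {
    const t0 = now();
    for (const s of frame) render(s);
    frameMs.push(now() - t0);
  }
  const totalMs = frameMs.reduce((a, b) => a + b, 0);
  return { name: w.name, totalMs, worstFrameMs: Math.max(...frameMs) };
}
```

Tracking `worstFrameMs` next to the total seems worthwhile, because a single slow frame is what shows up as lag while scrolling.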

Substructure Search

Dataset: 100,000 random ChEMBL molecules
Goal: make substructure search in 1 million molecules an interactive experience

Let’s start with plain substructure queries and ignore complex SMARTS for the moment. For each test, we will run the search twice: the first run might include calculating the fingerprints, which we could also do speculatively in the background.

Search for benzene ring
Search for aspirin
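The reason to measure two searches per query can be sketched as follows. This is only an illustration under assumptions: the `Toolkit` interface and both of its methods are hypothetical stand-ins, not real RDKit calls, and the screening step is the usual bit-subset test (a molecule can only contain the query if every bit set in the query fingerprint is also set in its own fingerprint).

```typescript
// Hypothetical hooks standing in for the toolkit under test.
interface Toolkit {
  patternFingerprint(smiles: string): Uint32Array; // screening fingerprint
  hasSubstructure(smiles: string, query: string): boolean; // exact, slower match
}

// Necessary (not sufficient) condition for a substructure match:
// the query bits must be a subset of the molecule bits.
// Words missing from a shorter molecule fingerprint are treated as zero.
function mayContain(mol: Uint32Array, query: Uint32Array): boolean {
  return query.every((bits, i) => (bits & ~(mol[i] ?? 0)) === 0);
}

function createSubstructureIndex(smiles: string[], tk: Toolkit) {
  let cache: Uint32Array[] | null = null;

  // Computes all fingerprints once. This is the part that could run
  // speculatively in the background before the user's first search.
  const warmUp = (): Uint32Array[] => {
    const fps = cache ?? smiles.map((s) => tk.patternFingerprint(s));
    cache = fps;
    return fps;
  };

  // Returns the indices of molecules that contain the query.
  const search = (query: string): number[] => {
    const fps = warmUp(); // expensive on the first call only
    const queryFp = tk.patternFingerprint(query);
    const hits: number[] = [];
    smiles.forEach((s, i) => {
      const fp = fps[i];
      // The cheap screen runs first; the exact match only runs on survivors.
      if (fp !== undefined && mayContain(fp, queryFp) && tk.hasSubstructure(s, query)) {
        hits.push(i);
      }
    });
    return hits;
  };

  return { warmUp, search };
}
```

Timing the first and second `search` calls separately would show how much of the cold cost is fingerprint generation, and calling `warmUp` ahead of time approximates the background precomputation case.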

Similarity Search

Dataset: 100,000 random ChEMBL molecules
Goal: make similarity search in 1 million molecules an interactive experience

For each of 10 random molecules, find the 50 most similar molecules
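For reference, the inner loop of this test could look roughly like the sketch below. It assumes Tanimoto similarity over equal-length bit-vector fingerprints stored as `Uint32Array`s, which is a common choice but not something fixed above, and `mostSimilar` is a made-up helper name. A full sort is used for brevity; a bounded heap would avoid sorting all 100,000 scores.

```typescript
// Number of set bits in a 32-bit word (standard SWAR popcount).
function popcount32(x: number): number {
  x = x - ((x >>> 1) & 0x55555555);
  x = (x & 0x33333333) + ((x >>> 2) & 0x33333333);
  x = (x + (x >>> 4)) & 0x0f0f0f0f;
  return Math.imul(x, 0x01010101) >>> 24;
}

// Tanimoto similarity: |A AND B| / |A OR B|, defined here as 0 when both are empty.
// Assumes both fingerprints have the same number of words.
function tanimoto(a: Uint32Array, b: Uint32Array): number {
  let both = 0;
  let either = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    both += popcount32(x & y);
    either += popcount32(x | y);
  }
  return either === 0 ? 0 : both / either;
}

interface Hit {
  index: number; // position of the molecule in the library
  score: number; // Tanimoto similarity to the query
}

// Returns the k library entries most similar to the query, best first.
// Ties are broken by library order so results are reproducible.
function mostSimilar(query: Uint32Array, library: Uint32Array[], k = 50): Hit[] {
  return library
    .map((fp, index) => ({ index, score: tanimoto(query, fp) }))
    .sort((p, q) => q.score - p.score || p.index - q.index)
    .slice(0, k);
}
```

Running `mostSimilar` once per query molecule and timing the whole batch would give the number to track for this test.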

Miscellaneous

R-group analysis (needs a good dataset)
Computing Lipinski properties for 100,000 molecules
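For the Lipinski test, the part worth timing is the descriptor calculation itself, so the sketch below takes it as an injected function. `describe` and the `Descriptors` shape are hypothetical placeholders for whatever the toolkit provides; the thresholds are the usual rule-of-five limits (molecular weight 500, logP 5, 5 hydrogen-bond donors, 10 hydrogen-bond acceptors).

```typescript
// Hypothetical shape for the four Lipinski descriptors of one molecule.
interface Descriptors {
  molWeight: number; // in daltons
  logP: number;
  hDonors: number;
  hAcceptors: number;
}

// Counts how many of the four rule-of-five limits a molecule exceeds (0..4).
function ruleOfFiveViolations(d: Descriptors): number {
  return [d.molWeight > 500, d.logP > 5, d.hDonors > 5, d.hAcceptors > 10].filter(Boolean).length;
}

// Runs the descriptor function over the whole dataset and returns a histogram
// where entry v is the number of molecules with exactly v violations.
function violationHistogram(smiles: string[], describe: (s: string) => Descriptors): number[] {
  const histogram = [0, 0, 0, 0, 0];
  for (const s of smiles) {
    const v = ruleOfFiveViolations(describe(s));
    histogram[v] = (histogram[v] ?? 0) + 1;
  }
  return histogram;
}
```

Returning a histogram rather than discarding the results keeps the benchmark honest, since the computed values are actually consumed.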