This is the root topic for discussing cheminformatics, including the features of our Chem package.
The Chem package provides first-class cheminformatics support for the Datagrok platform. See it in action on YouTube. Combined with the platform's existing capabilities in rich exploratory data analysis, advanced data mining, and out-of-the-box support for predictive modeling and scientific computations, this package turns Datagrok into a comprehensive platform for working with chemical and biological data.
Our performance goal is to open chemical datasets of up to 10 million small molecules completely in the browser, and to interactively perform commonly used operations such as substructure and similarity search without relying on a server. To hit this goal, we use a couple of techniques. First, we leverage Datagrok's capability to work efficiently with relational data. For cheminformatics, we rely on the RDKit library compiled to WebAssembly. Not only does this let us execute C++ code at near-native speed, it also lets us take full advantage of modern multicore CPUs by running computations in multiple threads.
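To make the similarity search mentioned above concrete, here is a minimal sketch of a threshold search using the Tanimoto coefficient over bit fingerprints. It is an illustration only and not the Chem package's API: the function names are hypothetical, and fingerprints are assumed to be stored as plain Python integers used as bitsets, whereas the package itself computes fingerprints with RDKit in WebAssembly.

```python
from typing import List, Tuple


def popcount(bits: int) -> int:
    """Number of set bits in a fingerprint stored as an integer bitset."""
    return bin(bits).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    union = popcount(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return popcount(a & b) / union


def similarity_search(query: int, fingerprints: List[int],
                      threshold: float = 0.7) -> List[Tuple[int, float]]:
    """Return (row index, score) pairs at or above the threshold, best first."""
    hits = []
    for index, fp in enumerate(fingerprints):
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((index, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


if __name__ == "__main__":
    # Toy 4-bit fingerprints; real fingerprints are typically 1024 or 2048 bits.
    library = [0b1111, 0b1110, 0b0001, 0b0000]
    print(similarity_search(0b1111, library, threshold=0.7))
```

Because each comparison is independent of the others, a scan like this splits naturally into chunks that separate threads can process in parallel.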
Here are some of the Chem features:
- Works completely on the client side where possible
- Highlighting substructures on search (in progress)
- Aligning to scaffolds (in progress)
- Rendering options (in progress)
- Fingerprint-based similarity and diversity analyses (see video)
- Efficient in-memory substructure and similarity searching
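As an illustration of the fingerprint-based diversity analysis listed above, the following sketch picks a diverse subset with the greedy MaxMin heuristic, a common technique for this task. The source does not say which algorithm the Chem package uses, so treat this as an assumption; the function names are hypothetical and fingerprints are again modeled as Python integers used as bitsets.

```python
from typing import List


def tanimoto_distance(a: int, b: int) -> float:
    """1 - Tanimoto similarity for two fingerprints stored as integer bitsets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here: two empty fingerprints are identical
    return 1.0 - bin(a & b).count("1") / union


def maxmin_pick(fingerprints: List[int], k: int, seed_index: int = 0) -> List[int]:
    """Greedily pick up to k indices, each one farthest from those already picked."""
    if not fingerprints or k <= 0:
        return []
    picked = [seed_index]
    # min_dist[i] is the distance from item i to its nearest picked item.
    min_dist = [tanimoto_distance(fp, fingerprints[seed_index]) for fp in fingerprints]
    while len(picked) < min(k, len(fingerprints)):
        candidate = max(range(len(fingerprints)), key=lambda i: min_dist[i])
        if min_dist[candidate] == 0.0:
            break  # everything left duplicates an already picked fingerprint
        picked.append(candidate)
        for i, fp in enumerate(fingerprints):
            d = tanimoto_distance(fp, fingerprints[candidate])
            if d < min_dist[i]:
                min_dist[i] = d
    return picked


if __name__ == "__main__":
    library = [0b001111, 0b001110, 0b000001, 0b110000]
    print(maxmin_pick(library, k=3))
```

Each round costs one pass over the dataset, so selecting k items takes on the order of k times N distance computations rather than a full N by N distance matrix.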
The following Chem features currently live in the platform core, but we plan to move them into this package:
- Molecule sketching
- SAR analysis
- Property calculators (server-side)
- 3D: coordinate calculation using RDKit, rendering using NGL Viewer
- ChEMBL integration
- PubChem integration
- “Sketch-to-predict”: run predictive models as you sketch the molecule
Join our discussion here if you are interested in high-performance cheminformatics, and check what’s already available with the Chem package.