Hi,
I would like to utilize Hit Triage module to select favorable records from data sources using both file and DB.
Is it possible to add customized SQL query in “Source”?, shown in above figure.
Probably, there is a place to modify and add new query with SQL. Please let me know how can I modify it.
Hello.
Indeed there is. Hit Triage was built with extensibility in mind so that sources and functions can be extended from any installed package, including HitTriage as well.
As an example you can see that the Chembl sources are coming from the Chembl package itself. The source function needs to be in package.ts
file of any installed package and have a HitTriageDataSource
tag. Example of Chembl can be seen here (line 120). In this case the function is calling a query existing in the package, but this is not necessary as long as query exists on the platform. The query can be seen here (line 34).
To add your own source, you could add the query with your connection and expose the function from any package. Let me know if you need any help with this.
You can also write annotated SQL queries directly, and they become functions that HitTriage / HitDesign can use. @drizhinashvili will provide examples and documentation soon
We have released the Hit triage package version 1.1.4, which now supports direct use of queries as source functions. This means that now there is no need to have such functions exported from any package, as long as the query is saved on platform and shared to user. Data source queries should be tagged with same HitTriageDataSource
tag. As an example, you can add a query which loads molecules from Chembl database and add the tag:
--name: _someChemblStructure
--friendlyName: Load Some Chembl structures
--input: int numberOfMolecules = 1000
--tags: HitTriageDataSource
--connection: Chembl
select
canonical_smiles, molregno
from
compound_structures
limit @numberOfMolecules
As in functions, Hit triage will understand that this query needs an input and will add corresponding input to new campaign form. This example is also available in the README of Hit triage app.
Also, version 1.1.4 addresses the issues with incorrect addition and saving of calculated parameters (through the wrench icon on ribbon panel).
Thanks Davit and Andrew.
I’ll check if I can do it. Direct use of query seems easier to use. I appreciate your quick and kind help.
Hi,
Thank you for sharing informative pages to learn how to build queries. It would be really helpful if you could share how to dynamically refering column values in a given table and add new column or add a new page.
I guess, how to check if the same ChEMBL compound exists would be similar to what I want to do. That’s why please let me know how to set save-query to a given table to add new column(s) or a new page.
If this additional question is recommended to put a new topic, please let me know. I will do so.
Hi.
We have updated Hit triage package to version 1.1.5, which includes the following:
- Fixes to script usage as compute functions in HitTriage App.
- Updated and improved documentation
- Ability to use queries as compute functions.
You can read more about query creation here.
Adding compute query that will work with HitDesign and HitTriage apps, it is similar to scripts and functions. The query should have the HitTriageFunction
tag and have at least one input, first of which must be list<string>
, representing the list of molecules. The query must return a dataframe, which should contain column molecules
(in order to join result with initial dataframe) and other columns which you wish to add. molecules
column will be used as key for joining tables. For example, you can create a query that looks for the molecule in Chembl database and returns the molregno number:
--name: ChemblMolregNoBySmiles
--friendlyName: Chembl Molregno by smiles
--input: list<string> molecules
--tags: HitTriageFunction
--connection: Chembl
select molregno, molecules from compound_structures c
INNER JOIN unnest(@molecules) molecules
ON molecules.molecules = c.canonical_smiles
Similarly, you can use RDKit SQL cartridge to calculate for example fraction of sp3 hybridized carbons in the molecule:
--name: SP3Fraction
--friendlyName: SP3 fraction of carbons
--input: list<string> molecules
--tags: HitTriageFunction
--connection: Chembl
select molecules, mol_fractioncsp3(Cast(molecules as mol))
from unnest(@molecules) as molecules
where is_valid_smiles(Cast(molecules as cstring))
After the query is saved, it will be automatically picked up by HT/HD applications, and can be used for computations. More information about Hit Triage and Hit Design Can be found in their readme files.
If you have any questions or need any support with writing custom queries/scripts, feel free contact us.
Hi, Devit,
Thank you for updating documents and sharing SQL examples. That’s informative for me to understand more about how to build flexible query in HD/HT. Actually, I confirmed I can use both SQL examples to get molregno or mol_fractioncsp3.
However, since I use Oracle DB, I would like to ask you more about SQL to write correct code. I made a mock example, depicted in attached image. In exemplified story, I have a table data in HitDesign whose columns are country and branch. I have country-capital corresponding inhouse database, and I would like to join “capital” column to HitDesign table.
I wrote below SQL query and it worked when I put input value(s) manually. On the other hand, such query didn’t join any values in HitDesign. odcivarchar2list (oracle) function was used in stead of unnest (postgresql).
Do you have any thoughts about this phenomenon? Any thoughts and comments are well appreciated.
--name: Get_capital_from_country
--friendlyName: from country to get capital
--input: list<string> country
--tags: HitTriageFunction
--connection: InhouseDB
select cap.country, cap.capital
from capital_mst cap
INNER JOIN TABLE(sys.odcivarchar2list(@country)) t
ON t.column_value = cap.country
Thank you in advance,
Kosuke
Hi.
The compute functions that are tagged with HitTriageFunction
are intended for the molecules only. Meaning that the app will send a list of molecules to the query, and nothing else. It will not work with other columns, as the compute functions are intended for molecular calculations.
Oracle sets the name of the table created from list as column_value
, and therefore you should select it and change its name. That being said, here is the version of query that will work with oracle (adapted to use molecules):
--name: Get_inhouse_id_from_molecule
--friendlyName: Get proprietary id of the molecule
--input: list<string> molecules
--tags: HitTriageFunction
--connection: InhouseDB
select id, column_value as molecules
from inhouse_molecule_storage db
INNER JOIN sys.odcivarchar2list(@molecules) molecules
on molecules.column_value = db.smiles
this query assumes that you have some database called inhouse_molecule_storage
where there are at least two columns: id
which has the id of molecule and smiles
which contains the molecule in smiles format.
I hope this information helps.
let me know if you need any additional support.
Best.
Davit.
Thank you for quick reply. It was nice to know the query is designed to handle molecule object.
SQL example for Oracle DB is enough clear to understand. I will modify it. Just for sure, I would like to confirm how it works because @molecules will include SMILES strings and “on molecules.column_value = db.smiles” statement is used to specify data to join. As you know, SMILES can be written in multiple ways in a single molecule, so does datagrok re-compute SMILES to match or literally checking the same SMILES strings?
Hi Kosuke, this is a great question. I think by default HitDesign uses RDKit to canonicalizes the molecules, however if they are stored in a different format in the database there are at least two possible ways to address it:
- if you have a chemical cartridge such as RDKit available in your database, use the corresponding cartridge function (such as mol_to_smiles in RDKit) to canonicalize molecule representation within the query
- if you want to convert (canonicalize probably) the molecule to the way it is represented in the database on the client side, this could be done by adding a function (could be in JavaScript, could be in another language) annotated in a special way - @drizhinashvili will provide more info on that.
Hi.
grok.chem.convert
can be used in scripts or function to convert molecules of different notations. If the target notation is set as smiles, the output will be molecule in canonical smiles notation. Here is the example script that will convert the molecules of unknown notation to all other notations offered by DG (Smiles, Smarts, mol blocks, V3 mol blocks) and add them as new columns.
//name: getOtherNotations
//description: Converts molecule in unknown notation to different notations
//language: javascript
//tags: HitTriageFunction
//input: dataframe t
//input: column col
const smileCol = t.columns.addNewString(t.columns.getUnusedName(`Smiles(${col.name})`));
const smartsCol = t.columns.addNewString(t.columns.getUnusedName(`Smarts(${col.name})`));
const molBlockCol = t.columns.addNewString(t.columns.getUnusedName(`molBlock(${col.name})`));
const V3MolBlockCol = t.columns.addNewString(t.columns.getUnusedName(`V3MolBlock(${col.name})`));
smileCol.init((i) => grok.chem.convert(col.get(i), DG.chem.Notation.Unknown, DG.chem.Notation.Smiles));
smartsCol.init((i) => grok.chem.convert(col.get(i), DG.chem.Notation.Unknown, DG.chem.Notation.Smarts));
molBlockCol.init((i) => grok.chem.convert(col.get(i), DG.chem.Notation.Unknown, DG.chem.Notation.MolBlock));
V3MolBlockCol.init((i) => grok.chem.convert(col.get(i), DG.chem.Notation.Unknown, DG.chem.Notation.V3KMolBlock));
smileCol.semType = DG.SEMTYPE.MOLECULE;
smartsCol.semType = DG.SEMTYPE.MOLECULE;
molBlockCol.semType = DG.SEMTYPE.MOLECULE;
V3MolBlockCol.semType = DG.SEMTYPE.MOLECULE;