Hierarchical clustering

Hi Team

I wanted to play with the hierarchical clustering, but I do not seem to be able to get it running on our env, nor on the public grok instance.

I used the following command:
grok.functions.call('Dendrogram:hierarchicalClustering', {df: grok.shell.tv.dataFrame, colNameList: ['Smiles'], distance: 'euclidean', linkage: 'ward'})

In our env it fails with an uncaught exception in a promise; on the public grok instance it fails with “Unable to get project asset "getMorganFingerprints"”.

Does it need a compute server to be set up?


Hello Nico.

Hierarchical clustering of molecules depends on the distances between their Morgan fingerprints, which are calculated by the Chem package. Could you please specify which version of Chem is installed on your env? The error means that your version of Chem does not include the Morgan fingerprint calculation function.

So the Morgan fingerprint error happened on the public grok instance :wink:

For us it failed with the uncaught exception.

Hi Nico!
Sorry, an old version of Chem was deployed on our public instance. Hierarchical clustering works fine there now.

ok, thanks.

For our internal instance, could the problem be that we have no running compute server?

I also played with the public SPGI demo file, which contains 3624 rows, and the dendrogram is hard to use, as you never get a compact overview of it.

Would it be possible to get a separate dendrogram-like viewer, where one can zoom out to see the whole overview and select groups from there?

Hello Nico.

Thank you for the suggestion. We do have a separate dendrogram viewer that supports zooming, moving around, and other features, but it is not currently adapted to molecules/macromolecules. We will definitely add this feature. In the meantime, you can try different linkage methods from the menu or from the function call. From the picture I can see that you are probably using ward linkage, which tends to produce such results on large data. I can suggest single or average linkage for molecules, which produce a more readable dendrogram. The list of available linkage options is as follows:
'single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'
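
A minimal sketch of switching linkage from code, in case it helps. The helper name `clusterWithLinkage` and the `callFn` argument are made up for illustration; the function name and the `df`, `colNameList`, `distance`, and `linkage` parameters are the ones used in this thread. It assumes `callFn` behaves like `grok.functions.call` and returns a promise.

```javascript
// The linkage names listed above.
const LINKAGES = ['single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'];

// Hypothetical helper (not part of the Dendrogram package).
// `callFn` is assumed to behave like grok.functions.call(name, params).
function clusterWithLinkage(callFn, df, column, linkage) {
  // Reject misspelled linkage names before anything is sent to the platform.
  if (!LINKAGES.includes(linkage))
    throw new Error(`Unknown linkage "${linkage}"; expected one of: ${LINKAGES.join(', ')}`);
  // Returns whatever callFn returns (a promise, in the assumed case).
  return callFn('Dendrogram:hierarchicalClustering', {
    df: df,
    colNameList: [column],
    distance: 'euclidean',
    linkage: linkage,
  });
}

// Assumed usage in the Datagrok console:
// await clusterWithLinkage((name, params) => grok.functions.call(name, params),
//                          grok.shell.tv.dataFrame, 'Structure', 'single');
```

Passing the call function in as an argument is only there so the helper can be tried outside the platform; in the console you can of course call `grok.functions.call` directly.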

I hope this helps.

As for your first question, clustering runs on the client, so no compute server is needed. Are there any details in the error thrown? If so, could you provide them?
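
If the console only shows an uncaught exception in a promise, a small wrapper like the one below can make the details easier to copy. This is just a sketch: `describeError` is a made-up helper name, and the commented usage assumes the call rejects with either an `Error` or some other value.

```javascript
// Hypothetical helper: turns whatever a rejected promise carried into a plain report.
function describeError(err) {
  if (err instanceof Error)
    return {name: err.name, message: err.message, stack: err.stack ?? ''};
  // Non-Error rejections (strings, plain objects) are reported as text.
  let message;
  if (typeof err === 'string') {
    message = err;
  } else {
    try {
      message = JSON.stringify(err) ?? String(err);
    } catch {
      message = String(err); // e.g. objects with circular references
    }
  }
  return {name: 'NonError', message: message, stack: ''};
}

// Assumed usage in the browser console on a Datagrok page:
// try {
//   await grok.functions.call('Dendrogram:hierarchicalClustering',
//     {df: grok.shell.tv.dataFrame, colNameList: ['Smiles'], distance: 'euclidean', linkage: 'ward'});
// } catch (e) {
//   console.log(describeError(e));
// }
```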

Also, the exposed API method in Dendrogram was added in version 1.2.21. Could you please make sure this is the version of Dendrogram you are using?

It does indeed look better with “single”.

We are on 1.2.20 right now; I will upgrade to 1.2.21 and test again.

With 1.2.21 it now fails differently:

Does it build and attach the dendrogram though? This is a warning that is not supposed to interfere with dendrogram building.

No, nothing is happening (no dendrogram appears).

Interesting… Is there anything else in console?

Nope, nothing else appears in the console.

Hi, Nico!
I’ve just tried to run HC on the sandbox on your side on the SPGI dataset with the following script:
grok.functions.call('Dendrogram:hierarchicalClustering', {df: grok.shell.tv.dataFrame, colNameList: ['Structure'], distance: 'euclidean', linkage: 'single'})

And I’ve received the dendrogram.

Indeed, it does work with the native Datagrok application. It seems our Intuence Discovery app is breaking something.
I will pass that to our team.
Thanks for your help!