Hi datagrok team,
I would like to suggest additional functionality for the “Chem | Analyze | Hierarchical Clustering” feature.
One advantage of this tool is that it supports Ward-method clustering. Because it lets users view dendrograms, it offers a visually intuitive way to understand clustering results.
In addition, the Ward method is reproducible, so it would be very useful to be able to set a threshold that divides the data into an arbitrary number of clusters.
Currently, the Ward method appears to be limited to visualization only, so I would like to suggest adding new functionality.
For example, it would be helpful if users could specify a threshold that determines how many clusters the data is divided into, and then add a column with the cluster number assigned to each data point.
For your reference, I have attached an image that illustrates this idea. I hope it is helpful when you consider future improvements.
Thank you very much for your time and attention.
Best regards,
Kosuke
Hi Kosuke,
We always greatly appreciate your feedback and ideas! Here’s a ticket to track progress on this feature:
#3641 Chem: Add ability to determine the number of clusters in Ward hierarchical clustering
Dear Olena,
I appreciate your prompt action in opening a ticket.
I reconsidered my suggestion and realized that setting a threshold needs only a single value, so the double-sided slider shown in the image above is not necessary. In any case, if you have any questions or comments, please let me know.
Best,
Kosuke
Dear Kosuke!
Thank you for the great suggestion. Indeed, what you mention is very useful, especially considering that all the information is already there and it's easy to compute.
We have released the Dendrogram plugin version 1.4.6, which includes this update.
With the new version (which also contains some bug fixes), the generated dendrogram panel is now resizable: you can grab it at the edge and resize it to the desired width.
To cluster your data and generate a column, right-click on the tree panel and choose "assign clusters", or click the button in the top left corner. In the dialog that opens, you can specify either the tree height threshold (height is counted from the root) or the number of clusters you want. In both cases, the other input is recalculated automatically. The threshold is in units of absolute tree height, so it is quite arbitrary. Let me know if it would make more sense to have a threshold between 0 and 1 that is then mapped to tree units. The generated cluster column is named `Cluster(thresholdValue)`, so that information is also preserved.
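For readers who want to see how a height threshold and a cluster count determine each other, here is a minimal Python sketch. It is not the Dendrogram plugin's code. It assumes a tree given as a list of merges `(child_a, child_b, height)` with heights measured from the leaves and never decreasing (as with Ward linkage), where leaves are numbered `0..n-1` and the i-th merge creates node `n + i`. The names `cut_tree`, `threshold_for_k`, `leaf_height_from_root`, `leaf_height_from_fraction`, and `example_merges` are all hypothetical and chosen for illustration.

```python
def cut_tree(merges, n_leaves, threshold):
    """Label each leaf after applying every merge whose height is <= threshold."""
    parent = list(range(n_leaves + len(merges)))

    def find(x):
        # Follow parent links to the current top node, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, (a, b, height) in enumerate(merges):
        if height <= threshold:
            node = n_leaves + i  # id of the node created by this merge
            parent[find(a)] = node
            parent[find(b)] = node

    labels, seen = [], {}
    for leaf in range(n_leaves):
        # Number clusters 1, 2, 3, ... in order of first appearance.
        labels.append(seen.setdefault(find(leaf), len(seen) + 1))
    return labels


def threshold_for_k(merges, k):
    """Return a leaf-based height that yields k clusters (assumes distinct, positive heights)."""
    heights = sorted(h for _, _, h in merges)
    n_leaves = len(heights) + 1
    if not 1 <= k <= n_leaves:
        raise ValueError("k must be between 1 and the number of leaves")
    applied = n_leaves - k  # number of merges that must be applied
    return heights[applied - 1] if applied > 0 else 0.0


def leaf_height_from_root(root_height, height_from_root):
    """Convert a height counted from the root into a height counted from the leaves."""
    return root_height - height_from_root


def leaf_height_from_fraction(root_height, fraction):
    """Map a normalized threshold in [0, 1] to absolute tree units."""
    return fraction * root_height


# Hypothetical 5-leaf tree: nodes 5, 6, 7 and 8 (the root) are created in this order.
example_merges = [(0, 1, 1.0), (2, 3, 1.5), (5, 4, 3.0), (7, 6, 6.0)]

t = threshold_for_k(example_merges, 3)
print(t, cut_tree(example_merges, 5, t))
```

Each merge applied reduces the cluster count by one, which is why either input can be recalculated from the other. With tied heights, a single threshold can apply several merges at once, so not every cluster count is reachable in that case.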
Attaching two videos for reference.
Best.
Davit.
Dear Davit,
Thank you for the prompt deployment of this greatly improved feature for Ward hierarchical clustering.
I think the new function fully covers what I suggested. My team and I will try it and send feedback if we have any additional suggestions or questions.
Best,
Kosuke
Hey Kosuke!
I realized that in crowded dendrograms (especially with Ward clustering), node heights can be so close together that they are hard to distinguish even after resizing the panel.
The latest version of the plugin (1.4.9) now includes the ability to zoom in on the dendrogram leaves to see the connections better (video below). Zoom with Ctrl+mouse wheel, and double-click on empty space in the panel to reset the zoom.
Also, when you open the cluster assignment dialog and move the cutoff slider, a red vertical line indicates the position of the cut.
I hope you find these useful.
Davit.
Dear Davit,
Thank you for the further update. It is definitely useful to be able to check the dendrogram visually and set the threshold precisely, and the attached video is easy to understand. With the improved dendrogram feature, I can more quickly visualize and check a large number of structures by clustering them and dividing them into a designated number of groups of similar structures, which lets me focus on a smaller number of structures within each cluster.
Best,
Kosuke