Hi everyone!
We have started developing a new package, “Bio”, for processing macromolecule entities and their monomer constituents.
A sequence renderer has been added to Bio.
Sequences are now automatically detected and rendered in a grid with monomers (both natural and unnatural) aligned.
Aligning multiple sequences is a fundamental step in analyzing biological data. For this, we use kalign, which runs in the browser thanks to Robert Aboukhalil’s biowasm. It can align thousands of protein or nucleotide sequences.
To run multiple alignment, select Bio | MSA... from the top menu.
Support for HELM notation is provided.
Its capabilities include rendering in the spreadsheet, calculations, etc. More detailed information can be found here.
In addition, we are introducing a feature that opens the Helm Web Editor for ’fasta’ and ’separator’ notations from the Context menu and the Actions panel. An example is presented below:
The Sequence space tool is now available in the Bio package.
Datagrok visualizes multidimensional sequence space using dimensionality reduction. Several distance-based algorithms are available, such as UMAP and t-SNE. Sequences are projected onto 2D space so that those corresponding to similar structures lie closer together and dissimilar ones lie farther apart.
To launch the analysis from the top menu, select Bio | Sequence space.
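Distance-based methods such as UMAP or t-SNE start from pairwise distances between sequences. The sketch below illustrates only that first step, for aligned sequences of equal length. The normalized Hamming distance and the function names are assumptions chosen for illustration, not the metric or API used by the Bio package.

```python
from typing import List


def hamming_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def distance_matrix(seqs: List[str]) -> List[List[float]]:
    """Symmetric matrix of pairwise distances, with zeros on the diagonal."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = hamming_distance(seqs[i], seqs[j])
            dist[i][j] = dist[j][i] = d
    return dist
```

A matrix like this is what a dimensionality reduction algorithm would take as input to place similar sequences near each other in 2D.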
Activity cliffs analysis is available in the Bio package.
The Activity cliffs tool finds pairs of sequences where small changes in the sequence yield significant changes in activity or any other numerical property. The similarity cutoff and similarity metric are configurable. As in Sequence space, you can choose among different dimensionality reduction algorithms.
To launch the analysis from the top menu, select Bio | Sequence Activity Cliffs.
After the scatter plot is generated, a link with the number of identified activity cliffs appears in the top right corner. Click this link to open a dialog with the list of cliffs. Then, click a particular pair to zoom in to it on the scatter plot. Hover over a line to see the details of the corresponding sequences and activities.
Ctrl+click on the line to select the corresponding sequences.
Marker color corresponds to activity; marker size and line opacity correspond to the SALI parameter (similarity/activity difference relation).
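To make the SALI relation concrete, here is a minimal sketch. It uses the commonly cited form of the Structure-Activity Landscape Index, the absolute activity difference divided by one minus the similarity. Whether Bio computes it in exactly this way is an assumption, as are the identity-based similarity, the default cutoff, and all function names.

```python
from itertools import combinations
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Similarity as the fraction of identical aligned positions (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 1.0
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def sali(similarity: float, activity_a: float, activity_b: float) -> float:
    """SALI = |A_i - A_j| / (1 - similarity); infinite for identical sequences."""
    diff = abs(activity_a - activity_b)
    if similarity >= 1.0:
        return float("inf") if diff > 0 else 0.0
    return diff / (1.0 - similarity)


def find_cliffs(seqs: List[str], activities: List[float],
                cutoff: float = 0.7) -> List[Tuple[int, int, float]]:
    """Pairs at or above the similarity cutoff, sorted by descending SALI."""
    cliffs = []
    for i, j in combinations(range(len(seqs)), 2):
        sim = identity(seqs[i], seqs[j])
        if sim >= cutoff:
            cliffs.append((i, j, sali(sim, activities[i], activities[j])))
    cliffs.sort(key=lambda c: c[2], reverse=True)
    return cliffs
```

Pairs with a high SALI value combine high similarity with a large activity difference, which is what larger markers and more opaque lines indicate.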
Splitting to monomers
Splitting to monomers splits an aligned sequence column into multiple monomer columns, one per position. The function is available among the actions of the aligned sequence column in the property panel.
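As an illustration of the idea, the sketch below splits aligned sequences into one column per position. The column naming (1, 2, 3, ...), the padding of shorter rows with empty strings, and the function name are assumptions, not the behavior of the Bio package.

```python
from typing import Dict, List


def split_to_monomers(aligned: List[str], separator: str = "") -> Dict[str, List[str]]:
    """Return one column per position; an empty separator means one character per monomer."""
    rows = [list(s) if separator == "" else s.split(separator) for s in aligned]
    width = max((len(r) for r in rows), default=0)
    columns: Dict[str, List[str]] = {}
    for pos in range(width):
        # Pad rows that are shorter than the widest one with empty strings.
        columns[str(pos + 1)] = [r[pos] if pos < len(r) else "" for r in rows]
    return columns
```

For example, `split_to_monomers(["meI/A/-", "hHis/A/G"], "/")` yields three columns, the first holding `meI` and `hHis`.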
Comparison of monomer sequences is now available
Datagrok allows comparing the current sequence of monomers (or the reference one) with all the others in the table. In the compare mode, the identical monomers in the corresponding positions are displayed with transparency, and the different ones are highlighted.
To set rendering options and turn the comparison of monomer sequences on or off, expand the Sequence Renderer info panel on the Context Pane:
- Monomer width. In short mode, only the first character of the monomer is visible, followed by … if there are more characters.
- Color code. If disabled, all monomers are displayed in black.
- Reference sequence. When defined, the renderer displays all sequences in the compare mode with the specified sequence.
- Compare with current. If enabled, the renderer considers the current sequence as a reference and compares all other sequences with it. This option is enabled by default.
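The compare mode and the short monomer width described above can be sketched as follows. This is only an illustration of the logic; the function names and the representation of a sequence as a list of monomer names are assumptions.

```python
from typing import List, Tuple


def shorten(monomer: str) -> str:
    """Short mode: keep the first character and add an ellipsis if anything was cut."""
    return monomer if len(monomer) <= 1 else monomer[0] + "…"


def compare_with_reference(seq: List[str], reference: List[str]) -> List[Tuple[str, bool]]:
    """Pair each monomer with True if it differs from the reference at that position.

    A renderer could draw the False entries with transparency and
    highlight the True ones.
    """
    result = []
    for pos, monomer in enumerate(seq):
        ref = reference[pos] if pos < len(reference) else None
        result.append((monomer, monomer != ref))
    return result
```

With Compare with current enabled, the reference passed in would simply be the current row's sequence.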
Peptides 1.5.0 is out!
This release addresses performance issues in the package, introduces a couple of useful features such as Invariant Map color-coding and Logo Summary Table customization, and includes bug fixes!
Performance
Some of the analyses involve heavy computations that made the UI unresponsive. We analyzed the bottlenecks in the package and improved the user experience by optimizing the algorithms and switching to a calculate-when-requested model. Here are the benchmark results on 5,000 peptides:
- Mutation Cliffs - 14x speedup
- Monomer-Position statistics - 38x speedup
- Cluster statistics - 6x speedup
- Refactoring - 3x speedup (improved overall performance)
Overall, the analysis now starts 30 times faster!
Invariant Map
Invariant Map can now be color-coded. Color-coding settings can be modified in the viewer properties: you can choose the column that cell colors are based on and the function used to aggregate its values.
Logo Summary Table
Logo Summary Table now lets you change the WebLogo rendering mode and the members ratio. The available WebLogo rendering modes are 100% height and Entropy. The members ratio sets a minimum threshold that filters out clusters with fewer members.
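The difference between the two rendering modes can be sketched for a single position. In this sketch, 100% height scales letters by frequency so the stack always fills the full height, while Entropy additionally scales the stack by its information content, as in classic sequence logos. How Peptides computes this is not stated here, so the formula, the alphabet size of 20, and the names below are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def letter_heights(column: List[str], mode: str = "full",
                   alphabet_size: int = 20) -> Dict[str, float]:
    """Relative letter heights (0..1) for the monomers at one position."""
    total = len(column)
    freqs = {m: c / total for m, c in Counter(column).items()}
    if mode == "full":
        scale = 1.0  # the stack always fills the full height
    elif mode == "entropy":
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        max_bits = math.log2(alphabet_size)
        scale = (max_bits - entropy) / max_bits  # conserved positions are taller
    else:
        raise ValueError("mode must be 'full' or 'entropy'")
    return {m: p * scale for m, p in freqs.items()}
```

In this sketch a fully conserved position gets the same height in both modes, while a variable position shrinks in entropy mode.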
Settings: Include columns
Include columns lets you choose any numeric column from the dataset to show in the Logo Summary Table. In future releases, we plan to show such values in Monomer-Position tooltips and the main dataframe view.
Bug fixes
- Fixed a bug that rendered header WebLogo at the wrong coordinates
- Fixed a bug that prevented Mutation Cliffs from rendering when sequences had different lengths
- Fixed unreadable long monomer names
Peptides 1.5.0 is already available at public.datagrok.ai!
Notation converter for macromolecules
Any macromolecule can be represented in different notations: HELM, FASTA, and separator (with a specified delimiter for multicharacter monomer codes). To get a new column in a specific notation, follow these steps:
- Click the hamburger menu of the column with macromolecules.
- Select Actions > Bio | Convert. A dialog opens.
- Select the desired notation from the Convert to dropdown list.
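To show how the three notations relate, here is a minimal sketch for simple linear peptides without gaps. It relies on the HELM convention that multicharacter monomers are wrapped in square brackets and that a simple polymer ends with `$$$$`. The function names are hypothetical, and the sketch does not reflect how the Bio converter is implemented.

```python
from typing import List


def parse_fasta(seq: str) -> List[str]:
    """FASTA: one character per monomer."""
    return list(seq)


def parse_separator(seq: str, separator: str) -> List[str]:
    """Separator notation: monomer codes split by a delimiter."""
    return seq.split(separator)


def to_separator(monomers: List[str], separator: str = "-") -> str:
    return separator.join(monomers)


def to_helm(monomers: List[str], polymer_type: str = "PEPTIDE") -> str:
    """Simple linear polymer in HELM; multicharacter monomers go in brackets."""
    wrapped = [m if len(m) == 1 else "[" + m + "]" for m in monomers]
    return polymer_type + "1{" + ".".join(wrapped) + "}$$$$"
```

For example, `to_helm(parse_separator("A/meI/G", "/"))` gives `PEPTIDE1{A.[meI].G}$$$$`.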
Peptides 1.6.1 is out!
This release introduces new features and capabilities such as custom clustering and multiple views, as well as a slightly redesigned user interface and improved application stability. Let’s take a closer look at the new features.
Custom Clustering
Custom Clustering is a feature in the Logo Summary Table (LST) that allows users to create a new cluster from the current selection and filters. The new cluster is immediately added to the end of the LST. For custom clusters, LST calculates statistics such as member count, mean difference, and p-value, and builds plots such as WebLogo and Histogram.
It’s super easy to create a new cluster: simply select the monomer-position pairs or existing clusters you want to include in your new cluster and click the plus sign in the upper right corner of the LST. You can also customize the name of the new cluster in the viewer properties.
Users can also delete the created clusters by selecting the cluster to delete and clicking the trash can icon.
Multiple Views
Multiple Views allows you to create a separate view for a subset of your data. New views are created from the visible selected rows (visible means rows are not filtered out) and contain only the dataframe of the selected rows and the Logo Summary Table viewer.
To create a new view, simply select the monomer-position pairs or clusters in the LST and click the New View button at the top of the screen. The new view will immediately appear on your screen. You can customize the name of the view by editing the text box to the left of the New View button.
Note: each view must have a unique name.
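The row rule for new views (selected and not filtered out) and the unique-name requirement can be sketched as follows. The boolean-mask representation and the function names are assumptions for illustration only.

```python
from typing import Dict, List


def visible_selected_rows(selected: List[bool], passes_filter: List[bool]) -> List[int]:
    """Indices of rows that are both selected and not filtered out."""
    return [i for i, (s, f) in enumerate(zip(selected, passes_filter)) if s and f]


def add_view(views: Dict[str, List[int]], name: str, rows: List[int]) -> None:
    """Register a new view, rejecting a name that is already in use."""
    if name in views:
        raise ValueError("a view named '" + name + "' already exists")
    views[name] = rows
```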
Peptides 1.6.1 is already available on public.datagrok.ai!
It is now possible to open docking results in Datagrok and display both the target and the ligands with the help of NglViewer. The ligand positions must be computed in advance and opened as a table.
To add NglViewer:
- On the menu ribbon, click the Add viewer icon.
- In the dialog, select NglViewer.
To update the position of the ligand in the viewer, click the appropriate row in the grid, as shown in the gif.