Qualified numbers

skalkin · August 4, 2020, 5:14am

Due to popular demand from our clients in the pharma R&D space, we are introducing a special type of number: qualified numbers, or qnum. They are typically used to represent measurements when the exact value is not known, but we know that it is either less than or greater than some value. Examples of such values are “<3.5” or “>5E-6”.

In Datagrok, it was already possible to represent such values by having two columns (qualifier and value) and customizing the cell renderer to show the qualifier before the actual value - we even did this exercise during a hackathon with one of our clients. However, this solution quickly becomes limited and kludgy once you consider free-text editing, rendering with different numerical formats, removing columns, or sorting.

To support these scenarios and keep the performance on par with the integer and float number types that the platform supports, we needed to come up with a solid solution. We did that by introducing a new data type based on the 64-bit floating point number, with the two least significant bits of the mantissa reserved for the qualifier. This way, we can still store the numbers efficiently and perform arithmetic operations on them without incurring costly packing/unpacking. The qualifier is either 1 (LESS), 2 (EXACT), or 3 (GREATER) - this way, we can compare qnums using regular floating point number comparison.
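
To make the bit layout concrete, here is a minimal sketch of the idea in plain JavaScript. It is not the Datagrok implementation: the names pack, getQualifier and getValue are hypothetical, and clearing the two low bits to recover the value is an assumption about how unpacking might work. The comparison at the end only covers positive values.

```javascript
// Hypothetical helpers illustrating the layout; not the Datagrok Qnum API.
const LESS = 1, EXACT = 2, GREATER = 3;

// An 8-byte scratch buffer. Little-endian is requested explicitly on every
// access, so byte offset 0 always holds the low 32 bits of the float.
const view = new DataView(new ArrayBuffer(8));

// Store the qualifier in the two least significant mantissa bits.
function pack(value, qualifier) {
  view.setFloat64(0, value, true);
  const low = view.getUint32(0, true);
  view.setUint32(0, ((low & ~3) | qualifier) >>> 0, true);
  return view.getFloat64(0, true);
}

// Read the qualifier back from the two low bits.
function getQualifier(x) {
  view.setFloat64(0, x, true);
  return view.getUint32(0, true) & 3;
}

// Assumption: the plain value is recovered by zeroing the two low bits.
function getValue(x) {
  view.setFloat64(0, x, true);
  const low = view.getUint32(0, true);
  view.setUint32(0, (low & ~3) >>> 0, true);
  return view.getFloat64(0, true);
}

const lessThan = pack(3.5, LESS);
const exact = pack(3.5, EXACT);
const greaterThan = pack(3.5, GREATER);

console.log(getQualifier(lessThan) === LESS);            // qualifier survives the round trip
console.log(getValue(lessThan));                         // value with the low bits cleared
console.log(lessThan < exact && exact < greaterThan);    // ordinary float comparison
console.log(Math.abs(lessThan - 3.5) < 1e-12);           // arithmetic sees almost the same number
```

The packed number differs from the original by at most a few units in the last place, which is why rendering, plotting and arithmetic can use it directly without unpacking.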

Internally, there is no special storage type: the values are 64-bit IEEE 754 floats, and in most cases application developers won’t even have to check whether a value is qualified or not, since the result of most operations (rendering, visualization, arithmetic) will be the same. However, in cases where it does matter, the programmer has to keep track of whether the values are qualified, and pack/unpack accordingly (check out the Qnum class, which contains helper methods for that).

The devil is always in the details, and adding a new core data type is no small feat, since all numerical types support the following:

binary serialization
format-supported CSV serialization
converting to/from string, float, int, and qnum column type
format-based parsing (i.e., you can enter “>$10,000” for the “money” format)
proper sorting that takes qualifiers into account
aggregation (results are always exact numbers though)
support for raw byte buffers (useful for JS-based high-performance computations and visualizations)
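
As an illustration of the sorting, string conversion and raw-buffer items above, here is a small self-contained sketch in plain JavaScript. It does not use the Datagrok API: pack and toText are hypothetical helpers, the "<" and ">" prefixes are an assumed rendering, and only positive values are covered, since the post does not describe how negative values are ordered.

```javascript
// Hypothetical helpers; not the Datagrok Qnum API.
const LESS = 1, EXACT = 2, GREATER = 3;
const PREFIX = ['', '<', '', '>']; // indexed by qualifier

// Two views over the same 8 bytes: one as a float, one as its raw 64-bit pattern.
const scratchF = new Float64Array(1);
const scratchB = new BigUint64Array(scratchF.buffer);

// Write the qualifier into the two least significant mantissa bits.
function pack(value, qualifier) {
  scratchF[0] = value;
  scratchB[0] = (scratchB[0] & ~3n) | BigInt(qualifier);
  return scratchF[0];
}

// Render a packed number as text, e.g. "<3.5".
function toText(x) {
  scratchF[0] = x;
  const qualifier = Number(scratchB[0] & 3n);
  scratchB[0] = scratchB[0] & ~3n; // assumption: clear the low bits to get the value
  return PREFIX[qualifier] + String(scratchF[0]);
}

// A "column" is just a raw buffer of 64-bit floats.
const column = new Float64Array([
  pack(5, EXACT),
  pack(3.5, GREATER),
  pack(3.5, LESS),
  pack(3.5, EXACT),
  pack(1, GREATER),
]);

// The default typed-array sort is numeric, so qualifiers break ties for free.
column.sort();

const rendered = Array.from(column, toText);
console.log(rendered.join(', '));
```

Because the qualifier lives inside the number itself, the column needs no companion array and no custom comparator for this case.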

Here are some examples of creating and manipulating qnum columns (keep in mind that this is our dev server - please request an account if you want to be on the bleeding edge):

https://dev.datagrok.ai/js/samples/data-frame/qualified-numbers
https://dev.datagrok.ai/js/samples/data-frame/qualified-numbers-2

The public version with qnums will likely be released later this week.