Introducing Parquet and Feather support

oserhiienko · June 4, 2022, 3:43pm

Comma-separated values (CSV) files have long been enormously popular and are heavily used for data storage and manipulation.

However, WebAssembly has boosted the processing capabilities of web browsers and opened new opportunities for efficient in-browser analytics. So, instead of CSV files, we suggest working with the columnar Parquet and Feather formats, which can significantly speed up OLAP workloads in the browser.
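To illustrate why a columnar layout suits analytical queries, here is a minimal sketch in plain TypeScript. It does not use parquet-wasm, apache-arrow, or any Datagrok API; the sample data, the `ColumnarTable` interface, and all function names are hypothetical and exist only for this illustration. It contrasts re-parsing CSV text on every query with scanning a single typed-array column, which is roughly the idea behind column-oriented formats such as Parquet and Feather.

```typescript
// Hypothetical sample data: a three-row table in CSV form.
const csvText = "id,price,qty\n1,9.5,2\n2,20.0,1\n3,4.25,4\n";

// Row-oriented path: each query splits the text again and touches every row's fields.
function sumPriceFromCsv(csv: string): number {
  const lines = csv.trim().split("\n");
  const priceIdx = lines[0].split(",").indexOf("price");
  let total = 0;
  for (let i = 1; i < lines.length; i++) {
    total += Number(lines[i].split(",")[priceIdx]);
  }
  return total;
}

// Column-oriented layout: one contiguous typed array per column.
interface ColumnarTable {
  id: Int32Array;
  price: Float64Array;
  qty: Int32Array;
}

// One-time conversion; assumes the column order id, price, qty from the sample above.
function csvToColumns(csv: string): ColumnarTable {
  const rows = csv.trim().split("\n").slice(1); // drop the header line
  const table: ColumnarTable = {
    id: new Int32Array(rows.length),
    price: new Float64Array(rows.length),
    qty: new Int32Array(rows.length),
  };
  for (let i = 0; i < rows.length; i++) {
    const fields = rows[i].split(",");
    table.id[i] = Number(fields[0]);
    table.price[i] = Number(fields[1]);
    table.qty[i] = Number(fields[2]);
  }
  return table;
}

// An aggregate now reads only the column it needs, with no text parsing.
function sumColumn(col: Float64Array): number {
  let total = 0;
  for (let i = 0; i < col.length; i++) {
    total += col[i];
  }
  return total;
}

const table = csvToColumns(csvText);
console.log(sumPriceFromCsv(csvText), sumColumn(table.price));
```

Both calls return the same sum, but the columnar version pays the parsing cost once and then scans a single numeric buffer per query. Real Parquet and Feather files add typed schemas, and Parquet also adds compression and encoding, on top of this basic layout.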

Today we are excited to announce Parquet and Feather format support on the Datagrok platform through the Arrow package, which is built on the parquet-wasm and apache-arrow libraries. It enables the following:

exporting any Datagrok dataframe to the Parquet or Feather format;
importing any existing file in either of these formats as a Datagrok dataframe.

It is super easy to use.
To save data in the Parquet or Feather format, choose “Save as Parquet” or “Save as Feather” from the drop-down list. The file is created and saved to your device automatically.
To open a local file, go to File -> Open -> File and choose a file in either format.
Let’s take a look at how it works.

alex.paramonov · June 4, 2022, 3:49pm

Great feature! Looking forward to using it for CVM integration.

skalkin · June 4, 2022, 7:17pm

This is awesome, thank you @oserhiienko! Not only does this allow the platform to open and save these file types, but, as @alex.paramonov mentioned, Parquet will eventually become the standard for passing dataframes to our computational services. Moreover, the ability to convert our dataframes to Arrow is pivotal for our future computational services framework. By having scientific computations as WASM-compiled functions that work on Arrow dataframes, we will get unmatched performance as well as the ability to execute them on either the server or the client side. This is a game-changer for high-performance computing.