Calculated columns with Python/R vs. in the UI

nikolaus.stiefl.novartis.com · June 22, 2021, 3:46pm

Hi there,
we are looking at calculated columns and have some general questions that we would like advice on:

When should we use R/Python to generate calculated columns vs. the built-in functionality?
Do these scripts always run on the server, or is some local (client-side) functionality also exposed?
If I add a column using a script, is there a way to have it “repeated” when the dataset is reopened? For example, a script or a set of scripts attached to a dataset that is executed when the data is loaded?
Thanks
Nik

alex.paramonov · June 23, 2021, 8:22am

Hi,
DG supports R and Python functions in the Add New Column dialog, and they are executed on the backend (server).
These calls are vectorized: when all script parameters (inputs and outputs) are scalar, DG executes the script only once, with the per-row computation wrapped in a loop, instead of launching it once per row.
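A minimal plain-Python sketch of what this vectorization means, assuming each script launch on the backend carries some fixed overhead. This is not Datagrok code: `bmi`, `ScriptRunner`, `run_per_row` and `run_vectorized` are hypothetical names, and the invocation counter merely stands in for a backend call. Both strategies produce the same column; only the number of launches differs.

```python
from typing import Callable, List


# Hypothetical scalar script: scalar inputs, scalar output.
def bmi(weight_kg: float, height_m: float) -> float:
    return weight_kg / (height_m ** 2)


class ScriptRunner:
    """Counts script launches (a stand-in for backend calls)."""

    def __init__(self) -> None:
        self.invocations = 0

    def run_per_row(self, fn: Callable[..., float], *columns: List[float]) -> List[float]:
        # Naive strategy: one launch per row.
        results = []
        for row in zip(*columns):
            self.invocations += 1
            results.append(fn(*row))
        return results

    def run_vectorized(self, fn: Callable[..., float], *columns: List[float]) -> List[float]:
        # Vectorized strategy: a single launch, with the row loop inside it.
        self.invocations += 1
        return [fn(*row) for row in zip(*columns)]


if __name__ == "__main__":
    weights = [70.0, 80.0, 90.0]
    heights = [1.75, 1.80, 1.90]

    per_row = ScriptRunner()
    per_row.run_per_row(bmi, weights, heights)

    vectorized = ScriptRunner()
    vectorized.run_vectorized(bmi, weights, heights)

    print(per_row.invocations, vectorized.invocations)  # 3 1
```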

Re-executing the script when a table is opened is a good idea. We don’t have it right now, but we are very close: DG already saves all formulas to column tags, so we just need an option like “recalculate column when opening table”.
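The idea of keeping the formula in column metadata and replaying it on open can be sketched as below. Everything here is hypothetical: the `Table`/`Column` classes, the `FUNCTIONS` registry, and the tag keys `formula.func` and `formula.inputs` are illustrative stand-ins, not Datagrok's actual tag format or API. Tags are modeled as string key/value pairs, which is also an assumption. The sketch only shows that, once the formula is stored as tags, recalculation on open is a straightforward loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical registry of functions that stored formulas may call.
FUNCTIONS: Dict[str, Callable[..., float]] = {
    "ratio": lambda a, b: a / b,
}


@dataclass
class Column:
    name: str
    values: List[float]
    # Free-form string metadata; a calculated column keeps its formula here.
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Table:
    columns: Dict[str, Column]


def recalculate_on_open(table: Table) -> None:
    """Re-evaluate every column whose tags describe a formula.

    The tag keys "formula.func" and "formula.inputs" are hypothetical.
    """
    for col in table.columns.values():
        func_name = col.tags.get("formula.func")
        if func_name is None:
            continue  # ordinary data column: nothing to recompute
        input_names = col.tags["formula.inputs"].split(",")
        fn = FUNCTIONS[func_name]
        input_values = [table.columns[name].values for name in input_names]
        # Apply the scalar function row by row over the input columns.
        col.values = [fn(*row) for row in zip(*input_values)]
```

This sketch recomputes columns in insertion order; if one calculated column fed into another, a real implementation would need to evaluate them in dependency order.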

We will discuss it inside our team!

nikolaus.stiefl.novartis.com · June 24, 2021, 10:43am

Hi Alex,
thanks for the explanation. Just one question for clarification:
you say that you support Python functions in the Add New Column dialog. What I saw in the tutorial video seemed to be support for functions defined inside DG, but I was asking more whether there is “native” Python support in the Add New Column dialog (i.e., things like if/else, or even a Python script written directly there).
Or is the idea to always define the functions first and then call only those functions from the interface?
Maybe an example would help.
Cheers
Nik

alex.paramonov · July 6, 2021, 9:51am

Unfortunately, there is no native Python support in Add New Column yet (but it’s a good idea, and we will think about it). For now, you need to define a function first and then call it from the Add New Column dialog. The function can be written in any supported language: JS, Python, R, etc. For remote scripts such as Python, vectorization is supported.
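To illustrate the “define first, then call” workflow, here is a hypothetical Python function containing the kind of if/else logic that cannot be written inline in the dialog. The name `size_category` and its thresholds are made up for this example, and the platform-specific step of registering it as a Datagrok script is not shown. Because it takes and returns scalars, it is the kind of script that qualifies for the vectorized execution mentioned above.

```python
import math


# Hypothetical user-defined function with branching logic.
# Scalar in, scalar out, so it can be applied row by row to a column.
def size_category(value: float, small_limit: float = 10.0, large_limit: float = 100.0) -> str:
    if math.isnan(value):
        return "missing"
    if value < small_limit:
        return "small"
    elif value < large_limit:
        return "medium"
    else:
        return "large"


# Roughly what applying the function to a whole column amounts to.
column = [5.0, 42.0, 250.0, float("nan")]
categories = [size_category(v) for v in column]
print(categories)  # ['small', 'medium', 'large', 'missing']
```

Once such a function is registered with the platform, the Add New Column formula only needs to call it by name with a column as the argument; the branching stays inside the function.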