Approach to aggregate newest/max date from various date columns in dataframe

I have a dataframe with multiple date columns.
I want to add an additional date column to the dataframe called maxDate.
Per row I want to efficiently aggregate the newest date, and set it as value for the maxDate column.

Example source dataframe:

| id  | colA | dateColB                 | dateColC                 |
====================================================================
| 1   | foo  | 2019-11-13T00:00:00.000Z | 2019-11-15T00:00:00.000Z |
| 2   | bar  | 2020-09-28T00:00:00.000Z | 2020-06-01T00:00:00.000Z |
| 3   | zoo  |                          | 2020-06-01T00:00:00.000Z |
| 4   | beer |                          |                          |

Example target dataframe:

| id  | colA | dateColB                 | dateColC                 | maxDate                  |
===============================================================================================
| 1   | foo  | 2019-11-13T00:00:00.000Z | 2019-11-15T00:00:00.000Z | 2019-11-15T00:00:00.000Z |
| 2   | bar  | 2020-09-28T00:00:00.000Z | 2020-06-01T00:00:00.000Z | 2020-09-28T00:00:00.000Z |
| 3   | zoo  |                          | 2020-06-01T00:00:00.000Z | 2020-06-01T00:00:00.000Z |
| 4   | beer |                          |                          |                          |

What do you recommend to use from grok javascript api to accomplish this?

1 Like

Hi Ralf, great to see you here, please stop by more often :slight_smile:

One way to achieve that is to perform a series of relational operations that would transform your table to the desired form. In the script below (which would currently only work on our dev server only since I simplified our API a little), we do it in three simple steps:

  1. Unpivot the table to merge dates in one column
  2. Aggregate this table and find max date for each combination of keys
  3. Join the aggregated result to the original table

Here’s the code:

let original = DG.DataFrame.fromCsv(`id,colA,dateColB,dateColC
1,foo  ,11/12/2019 19:00,11/14/2019 19:00
2,bar  ,9/27/2020 20:00,5/31/2020 20:00
3,zoo  ,,5/31/2020 20:00
4,beer ,,`);

let ids = ['id', 'colA'];
let unpivoted = original.unpivot(ids, ['dateColB', 'dateColC']);
let grouped = unpivoted.groupBy(ids).max('Value', 'last').aggregate();
let result = original.join(grouped, ids, ids, ['dateColB', 'dateColC'], ['last']);

And here is the outcome:

It is also be possible to write this in the imperative style (which would be more performant if you care about it in this case), someone will post a short canonical code snippet for that case soon.

Cheers,
Andrew

2 Likes

Thanks for your response, Andrew. Looks good, will test out the approach on the actual dataframe.