How do I select rows via JS API?

nikolay.seliverstov · August 11, 2020, 1:44pm

skalkin · August 11, 2020, 2:01pm

Use the table.selection bitset. Here is an example:

https://public.datagrok.ai/js/samples/data-frame/manipulate
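To make the idea of a selection bitset concrete, here is a minimal stand-in in plain JavaScript: one bit per row, packed into 32-bit words. It is not the platform's BitSet class; the names SimpleBitSet, set, get and trueCount are assumptions chosen to mirror the calls used in this thread.

```javascript
// Hypothetical stand-in for a selection bitset: one bit per row,
// packed into 32-bit words. Not the platform's actual BitSet class.
class SimpleBitSet {
  constructor(length) {
    this.length = length;
    this.words = new Uint32Array(Math.ceil(length / 32));
  }

  // Set (value = true) or clear (value = false) the bit for row index i.
  set(i, value) {
    const mask = 1 << (i % 32);
    if (value) this.words[i >>> 5] |= mask;
    else this.words[i >>> 5] &= ~mask;
  }

  // True if row i is selected.
  get(i) {
    return (this.words[i >>> 5] & (1 << (i % 32))) !== 0;
  }

  // Number of selected rows.
  get trueCount() {
    let count = 0;
    for (let i = 0; i < this.length; i++) if (this.get(i)) count++;
    return count;
  }
}

// Select rows 0, 2 and 40 of a 50-row table by index.
const selection = new SimpleBitSet(50);
[0, 2, 40].forEach((i) => selection.set(i, true));
```

Selecting by index is then just a matter of calling set for each index you care about.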

nikolay.seliverstov · August 11, 2020, 2:12pm

OK, now I can select rows by their indices. But what if I want to select them by a particular value in a column? What should I do?

skalkin · August 11, 2020, 2:20pm

You would have to do it in a loop, like this:

let myColumn = table.col('myColumn');  // the column whose values are tested
for (let i = 0; i < table.rowCount; i++)
    table.selection.set(i, myColumn.get(i) === 'myCategory');
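Since the platform objects are not available outside the platform, here is a self-contained sketch of the same loop against minimal mock objects. MockColumn, MockTable and the boolean-array selection are hypothetical; they only imitate the column get and selection set calls from the loop above.

```javascript
// Minimal mocks imitating the calls used in the loop above.
// These are illustrative stand-ins, not platform classes.
class MockColumn {
  constructor(name, values) { this.name = name; this.values = values; }
  get(i) { return this.values[i]; }
}

class MockTable {
  constructor(columns) {
    this.columns = columns;
    this.rowCount = columns[0].values.length;
    const bits = new Array(this.rowCount).fill(false);
    // Selection exposed through set/get, backed by a plain array.
    this.selection = {
      set: (i, value) => { bits[i] = value; },
      get: (i) => bits[i],
    };
  }
  col(name) { return this.columns.find((c) => c.name === name); }
}

const table = new MockTable([
  new MockColumn('make', ['Honda', 'Ford', 'Honda', 'BMW']),
]);
const myColumn = table.col('make');

// Select every row whose value equals the category of interest.
for (let i = 0; i < table.rowCount; i++)
  table.selection.set(i, myColumn.get(i) === 'Honda');
```

Note that the loop both sets and clears bits, so any previous selection is overwritten rather than extended.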

nikolay.seliverstov · August 11, 2020, 2:24pm

I was expecting some query result that I could apply to table.selection. OK, got it.

nikolay.seliverstov · August 11, 2020, 6:07pm

Will the query-selection workflow be supported in the near future? It seems a bit strange to have such a powerful platform and still iterate over rows the old-school way.

skalkin · August 12, 2020, 12:28pm

Can you elaborate on what you mean by “query-selection workflow”? I’m intrigued, we’ll happily add new API methods if they would make the API more expressive.

nikolay.seliverstov · August 12, 2020, 12:43pm

Selecting rows is similar to a SQL SELECT query without actually retrieving the column data, right? So I would imagine something like

rowset = table.select().where("col = val").and("col2 = val2");
table.selection.set(rowset);

Or similar.
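A rough sketch of how such a fluent query could behave, assuming a plain column-oriented table ({ columnName: valuesArray }) and equality-only conditions. RowQuery, where, and and toBitSet are hypothetical names, and the "col = val" strings are replaced by (column, value) pairs to avoid parsing.

```javascript
// Hypothetical fluent query builder; equality conditions only.
// The table is assumed to be column-oriented: { columnName: valuesArray }.
class RowQuery {
  constructor(table) {
    this.table = table;
    this.conditions = [];
  }
  where(column, value) {
    this.conditions.push([column, value]);
    return this; // return this to allow chaining
  }
  and(column, value) {
    return this.where(column, value);
  }
  // Evaluate all conditions (logical AND) and return one boolean per row.
  toBitSet() {
    const rowCount = Object.values(this.table)[0].length;
    const bits = new Array(rowCount).fill(true);
    for (const [column, value] of this.conditions) {
      const values = this.table[column];
      for (let i = 0; i < rowCount; i++)
        bits[i] = bits[i] && values[i] === value;
    }
    return bits;
  }
}

const cars = { make: ['Honda', 'Ford', 'Honda'], color: ['red', 'red', 'blue'] };
const rowset = new RowQuery(cars).where('make', 'Honda').and('color', 'red').toBitSet();
```

Evaluating one condition at a time over a whole column is the kind of layout that could make declarative queries fast, since each pass scans a single contiguous array.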

nikolay.seliverstov · August 12, 2020, 12:47pm

I had this idea in mind because of the groupBy API. I thought you might already have something similar for select, but I didn’t find it.

skalkin · August 12, 2020, 12:58pm

Thanks for the example! Usually, the predicates are a bit more complex than that, and currently I see no reason to construct a query, but we can easily add the following method, which would take a predicate with the Row as a parameter:

table.selection.setWhere((row) => row.height > 50 || row.make == "Honda")

nikolay.seliverstov · August 12, 2020, 1:04pm

Would it be efficient? I didn’t propose an API like the one you’ve shown because table.selection is a BitSet and thus somewhat unrelated to the table itself. Am I wrong?

I mean, I’m not sure how you would modify BitSet to iterate over rows rather than indices.

skalkin · August 12, 2020, 2:05pm

We might add a reference to the DataFrame to the bitsets that serve as the DataFrame’s filter and selection, in order to enable that sort of expressive API. Regarding efficiency, it indeed won’t be optimal (but it will likely be fast enough for user-driven interactions on datasets with fewer than 10M records). String-based queries like the one you proposed could theoretically be made very efficient, but specifying conditions as strings is error-prone. Perhaps there is a middle ground.
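A sketch of the back-reference idea, under stated assumptions: the bitset stores a reference to its table so that a setWhere method can hand each row to the predicate. TableBitSet, setWhere and the per-row objects are hypothetical; building one object per row is the overhead that makes this slower than a direct index loop.

```javascript
// Hypothetical bitset that knows which table it belongs to.
class TableBitSet {
  constructor(table) {
    this.table = table; // back-reference to the owning table
    this.bits = new Array(table.rowCount).fill(false);
  }
  get(i) { return this.bits[i]; }
  // Calls the predicate with a row object; allocating one object per row
  // is the overhead that makes this slower than a plain index loop.
  setWhere(predicate) {
    const names = Object.keys(this.table.columns);
    for (let i = 0; i < this.table.rowCount; i++) {
      const row = {};
      for (const name of names) row[name] = this.table.columns[name][i];
      this.bits[i] = Boolean(predicate(row));
    }
  }
}

const table = {
  rowCount: 3,
  columns: { height: [40, 60, 45], make: ['Ford', 'Ford', 'Honda'] },
};
table.selection = new TableBitSet(table);
table.selection.setWhere((row) => row.height > 50 || row.make === 'Honda');
```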

nikolay.seliverstov · August 12, 2020, 2:24pm

I’m not insisting on strings; that was just a way of expressing the idea. I used plain SQL for the WHERE clause. Of course, the API could parameterize the WHERE clause, joins, etc. differently.

The main idea is to get a ready-to-use BitSet from the platform itself instead of building my own by iterating over numerous rows and their cells.

It could have a query-by-example-style API: https://docs.spring.io/spring-data/jpa/docs/current/reference/html/#query-by-example
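A minimal query-by-example sketch: the example is a plain object whose fields must all match, and fields that are left out are ignored. This only imitates the general idea and is unrelated to the Spring Data library; selectByExample and the table shape are hypothetical.

```javascript
// Hypothetical query-by-example: every field present in `example`
// must equal the row's value; absent fields match anything.
function selectByExample(table, example) {
  const fields = Object.entries(example);
  const bits = new Array(table.rowCount);
  for (let i = 0; i < table.rowCount; i++)
    bits[i] = fields.every(([name, value]) => table.columns[name][i] === value);
  return bits;
}

const demog = {
  rowCount: 4,
  columns: { sex: ['M', 'F', 'M', 'M'], race: ['Asian', 'Asian', 'Black', 'Asian'] },
};
const bits = selectByExample(demog, { sex: 'M', race: 'Asian' });
```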

skalkin · August 13, 2020, 1:12pm

I still don’t see a compelling reason to introduce another abstraction layer for selecting rows. In the general case, it is exactly that: iterating over indexes, checking whether you want each row to be selected (keep in mind that the condition might depend on external variables, or on functions that take multiple parameters), and changing the bitset to reflect it. The proposed method is canonical, fast, and flexible. Unlike grouping, there are no joins or aggregations involved, and we like to keep things as simple as possible.
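To illustrate why a plain index loop is flexible, here is a sketch in which the condition depends on an external variable and on a helper function of two columns, which a fixed query vocabulary would not cover out of the box. The columns, maxBmi and bmi are made-up illustrative data and helpers.

```javascript
// Plain index loop where the condition uses an external variable
// (maxBmi) and a helper function of two columns. Data is illustrative.
const height = [180, 165, 170, 190]; // cm
const weight = [95, 55, 60, 120];    // kg
const maxBmi = 25;                   // hypothetical external value, e.g. from a UI control

function bmi(heightCm, weightKg) {
  const meters = heightCm / 100;
  return weightKg / (meters * meters);
}

const selected = new Array(height.length);
for (let i = 0; i < height.length; i++)
  selected[i] = bmi(height[i], weight[i]) > maxBmi;
```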

Having said that, I agree that a more fluent API would be a welcome addition. In the end, we have added “select” and “filter” methods to the RowList class, so now you can do it like this:

// Selecting or filtering rows by predicates
let demog = grok.data.testData('demog', 5000);
demog.rows.select((row) => row.sex === 'M');
demog.rows.filter((row) => row.age > 42);
grok.shell.add(demog);
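The sketch below only illustrates how predicate helpers like these could sit on top of a plain index loop, with selection and filter kept as two independent bitsets over the same rows. MiniRowList, its fields and the row objects are hypothetical and do not reflect the platform's actual implementation.

```javascript
// Hypothetical RowList-style helpers: `select` and `filter` each
// evaluate a predicate per row and write into their own bitset.
class MiniRowList {
  constructor(columns, rowCount) {
    this.columns = columns;
    this.rowCount = rowCount;
    this.selection = new Array(rowCount).fill(false);
    this.filterBits = new Array(rowCount).fill(true); // true = row passes the filter
  }
  row(i) {
    const r = {};
    for (const name of Object.keys(this.columns)) r[name] = this.columns[name][i];
    return r;
  }
  select(predicate) {
    for (let i = 0; i < this.rowCount; i++) this.selection[i] = Boolean(predicate(this.row(i)));
  }
  filter(predicate) {
    for (let i = 0; i < this.rowCount; i++) this.filterBits[i] = Boolean(predicate(this.row(i)));
  }
}

const rows = new MiniRowList({ sex: ['M', 'F', 'M'], age: [50, 60, 30] }, 3);
rows.select((row) => row.sex === 'M');
rows.filter((row) => row.age > 42);

// Rows that are both selected and still visible after filtering.
const selectedAndVisible = rows.selection.map((s, i) => s && rows.filterBits[i]);
```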

nikolay.seliverstov · August 13, 2020, 1:21pm

I’m not insisting on anything. You know your product better. You asked me to explain my idea, and I did. Thanks for your effort in extending the API.

skalkin · August 13, 2020, 2:01pm

Thanks for explaining the idea; it actually got me thinking about introducing reusable and persistable matchers that could be used for row categorization and other purposes. Please don’t take my comments the wrong way. I was merely explaining the reasoning for sticking with the simpler API in this case. Please keep your suggestions coming!
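One way a reusable, persistable matcher might look, purely as a hypothetical illustration and not an existing platform feature: the matcher is plain data (column, operator, value) that can be serialized to JSON, stored, loaded back and applied to any table. compileMatcher, the operator names and the JSON shape are all assumptions.

```javascript
// Hypothetical persistable matcher: stored as plain JSON data,
// turned into a row predicate only when it is applied.
const operators = {
  '=': (a, b) => a === b,
  '>': (a, b) => a > b,
  '<': (a, b) => a < b,
};

// All conditions must hold (logical AND).
function compileMatcher(matcher) {
  return (columns, i) =>
    matcher.conditions.every(({ column, op, value }) => operators[op](columns[column][i], value));
}

const matcher = {
  name: 'older men',
  conditions: [
    { column: 'sex', op: '=', value: 'M' },
    { column: 'age', op: '>', value: 42 },
  ],
};

// Round-trip through JSON to mimic saving and loading the matcher.
const restored = JSON.parse(JSON.stringify(matcher));
const matches = compileMatcher(restored);

const columns = { sex: ['M', 'F', 'M'], age: [50, 60, 30] };
const categorized = [0, 1, 2].map((i) => matches(columns, i));
```

Keeping the matcher as data rather than as a closure is what makes it persistable; the trade-off is that only the operators listed in the table can be expressed.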