How do I select rows via JS API?
OK, now I can select rows by their indices. But what if I want to select them by a particular value in a column? What should I do?
You would have to do it in a loop, like this:
// len is the number of rows in the table
for (let i = 0; i < len; i++)
  table.selection.set(i, myColumn.get(i) === 'myCategory');
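For anyone who wants to try the loop outside the platform, here is a minimal self-contained sketch. The plain arrays `categories` and `selection` are assumptions that stand in for the column and the selection bitset; they are not the platform's classes.

```javascript
// Stand-ins for a column and a selection bitset (hypothetical data, plain arrays).
const categories = ['myCategory', 'other', 'myCategory', 'other'];
const selection = new Array(categories.length).fill(false);

// Same shape as the loop above: visit each row index and set its selection bit.
for (let i = 0; i < categories.length; i++)
  selection[i] = categories[i] === 'myCategory';

console.log(selection); // [ true, false, true, false ]
```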
I would expect some query result that I can apply to the table.selection. OK. Got it.
Will the query-selection workflow be supported in the near future? It seems a bit strange to have such a powerful platform and still iterate over the rows the old-school way.
Can you elaborate on what you mean by “query-selection workflow”? I’m intrigued, we’ll happily add new API methods if they would make the API more expressive.
Selecting rows is similar to a SQL SELECT query without the actual column data retrieval, right? So I would imagine something like
rowset = table.select().where("col = val").and("col2 = val2");
table.selection.set(rowset);
Or similar.
I had this idea in mind because of the groupBy API. I thought you already had something similar with select, but I didn’t find it.
Thanks for the example! Usually, the predicates are a bit more complex than that, and currently I see no reason to construct a query, but we can easily make the following method that would take a predicate with the Row as a parameter:
table.selection.setWhere((row) => row.height > 50 || row.make == "Honda")
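To make the idea concrete, here is a rough sketch of how such a method could behave. Everything in it is an assumption for illustration: `columns`, `rowAt` and the free function `setWhere` are hypothetical stand-ins working on plain arrays, not the platform's implementation.

```javascript
// Hypothetical mini table: column name -> array of values (not the platform's DataFrame).
const columns = {
  height: [40, 55, 62, 30],
  make: ['Honda', 'Ford', 'Honda', 'Ford'],
};
const rowCount = columns.height.length;

// Build a lightweight row view for index i, so a predicate can read row.height, row.make.
function rowAt(i) {
  const row = {};
  for (const name of Object.keys(columns)) row[name] = columns[name][i];
  return row;
}

// Assumed behaviour: evaluate the predicate once per row and store the result as that row's bit.
function setWhere(bits, predicate) {
  for (let i = 0; i < rowCount; i++) bits[i] = Boolean(predicate(rowAt(i)));
  return bits;
}

const selected = setWhere(new Array(rowCount).fill(false),
  (row) => row.height > 50 || row.make === 'Honda');
```

Note that the sketch builds a row object per index, which is the per-row overhead that the efficiency question refers to.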
Would it be efficient? I didn’t propose an API like the one you’ve shown because table.selection is a BitSet and thus kind of unrelated to the table itself. Am I wrong?
I mean I’m not sure how you would modify BitSet to iterate over rows and not the indices.
We might add a reference to the DataFrame for bitsets that are used as a DataFrame’s filter and selection in order to enable that sort of expressive API. Regarding efficiency, indeed it won’t be efficient (but will likely be fast enough for user-driven interactions on datasets with fewer than 10M records). The string-based queries you proposed could theoretically be made very efficient, but specifying the conditions as strings is error-prone. Perhaps there is a middle ground.
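One possible middle ground, sketched under assumptions: describe the conditions as plain objects instead of a string, so that a misspelled column name fails early, and evaluate them one column at a time without building row objects. The names `data`, `selectWhere`, `column` and `equals` are hypothetical and only illustrate the shape of such an API.

```javascript
// Hypothetical columnar data: column name -> array of values.
const data = {
  col: ['a', 'b', 'a', 'a'],
  col2: [1, 1, 2, 1],
};

// Conditions are plain objects rather than a query string, so an unknown
// column is reported before any row is scanned.
function selectWhere(table, conditions) {
  const names = Object.keys(table);
  const rowCount = names.length ? table[names[0]].length : 0;
  const bits = new Array(rowCount).fill(true);
  for (const { column, equals } of conditions) {
    const values = table[column];
    if (values === undefined) throw new Error(`Unknown column: ${column}`);
    // Scan one column at a time and AND the result into the bits.
    for (let i = 0; i < rowCount; i++) bits[i] = bits[i] && values[i] === equals;
  }
  return bits;
}

const rowset = selectWhere(data, [
  { column: 'col', equals: 'a' },
  { column: 'col2', equals: 1 },
]);
```

This only covers equality combined with AND; anything richer would need more operators or a fallback to a predicate function.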
I don’t insist on strings. That was just a way to express the idea, so I used plain SQL for the where clause. Sure, it can have a different API in terms of parametrisation of the where clause, joins, etc.
The main idea is to get a ready-to-use BitSet from the platform itself instead of creating my own by iterating over numerous rows and the cells in those rows.
It could have a query-by-example-like API: https://docs.spring.io/spring-data/jpa/docs/current/reference/html/#query-by-example
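As a rough illustration of what query-by-example could mean for row selection: a row matches when every property set on an example object (the probe) has the same value in the row. The names `people` and `matchByExample` are hypothetical, and the rows are plain objects rather than the platform's Row class.

```javascript
// Hypothetical rows (plain objects).
const people = [
  { sex: 'M', race: 'Asian', age: 30 },
  { sex: 'F', race: 'Asian', age: 45 },
  { sex: 'M', race: 'Other', age: 51 },
];

// A row matches when every property present in the probe has the same value
// in the row. Properties missing from the probe are ignored.
function matchByExample(rows, probe) {
  const keys = Object.keys(probe);
  return rows.map((row) => keys.every((key) => row[key] === probe[key]));
}

// One boolean per row, ready to be copied into a selection.
const matches = matchByExample(people, { sex: 'M' });
```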
I still don’t see a compelling reason for introducing another abstraction layer for selecting rows. In the general case, it is exactly this: iterating over indexes, checking whether you want each row to be selected (keep in mind that the condition might depend on external variables, or on functions that take multiple parameters), and changing the bitset to reflect it. The proposed method is canonical, fast, and flexible. Unlike grouping, there are no joins or aggregations going on, and we like to keep things as simple as possible.
Having said that, I agree that a more fluent API would be a welcome addition. In the end, we have added “select” and “filter” methods to the RowList class, so now you can do it like this:
// Selecting or filtering rows by predicates
let demog = grok.data.testData('demog', 5000);
demog.rows.select((row) => row.sex === 'M');
demog.rows.filter((row) => row.age > 42);
grok.shell.add(demog);
I don’t insist on anything. You know your product better. You asked me to explain my idea and I did. Thanks for your effort in extending the API.
Thanks for explaining the idea, it actually got me thinking about introducing reusable and persistable matchers that would be used for row categorizations and other purposes. Please don’t take my comments the wrong way, I was merely explaining the reasoning for sticking with the simpler API in this case - please keep your suggestions coming!