Count command optimization #809

maheshrajamani · 2024-01-18T20:11:25Z

The Count command in the API uses Cassandra's count function. The Cassandra coordinator fetches all the filtered rows from storage and increments a counter for each valid row. Because of this approach, counting a Cassandra table with a large number of rows can take a long time and eventually starts timing out.

Below is the proposed change for the count operation in the JSON API.

A configurable limit will specify the maximum count that the API will return. The default will be 1000.
The Count command will not use Cassandra's count aggregation function.
Instead, it will read the primary key (_id) column from the DB for the given filters.
The select query will be issued with limit <configured limit + 1>.
If the total number of primary keys read from the DB is equal to <configured limit + 1>, then return count as <configured limit> and set more_data to true. Otherwise, return the count of keys retrieved.

Translated CQL for count command will look as:
SELECT key FROM <table> WHERE <filters> LIMIT <configured limit + 1>
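
The proposal above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual JSON API implementation: the names `build_count_query`, `capped_count`, `CountResult`, and `DEFAULT_COUNT_LIMIT` are hypothetical, the keys are passed in as a plain iterable rather than read through a real driver, and returning the configured limit as the count when more data exists is an assumption about the intended behavior.

```python
from itertools import islice
from typing import Iterable, NamedTuple

# Default maximum count from the proposal (configurable in the real service).
DEFAULT_COUNT_LIMIT = 1000


class CountResult(NamedTuple):
    count: int
    more_data: bool


def build_count_query(table: str, where_clause: str,
                      limit: int = DEFAULT_COUNT_LIMIT) -> str:
    """Build the key-only SELECT with LIMIT <configured limit + 1> (hypothetical helper)."""
    return f"SELECT key FROM {table} WHERE {where_clause} LIMIT {limit + 1}"


def capped_count(keys: Iterable[object],
                 limit: int = DEFAULT_COUNT_LIMIT) -> CountResult:
    """Count at most limit + 1 keys and flag whether more rows exist."""
    # Never consume more than limit + 1 keys, mirroring the LIMIT in the query.
    fetched = sum(1 for _ in islice(keys, limit + 1))
    if fetched == limit + 1:
        # Assumption: report the configured limit and signal that more data exists.
        return CountResult(count=limit, more_data=True)
    return CountResult(count=fetched, more_data=False)
```

Fetching one key more than the limit is what lets the caller tell "exactly limit rows" apart from "more than limit rows" without scanning the rest of the table.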


JeremiahDJordan · 2024-01-18T20:57:37Z

Pretty sure you can still push down “count” with a LIMIT specified. No reason to return back the keys if they are not needed.

tatu-at-datastax · 2024-01-18T21:00:37Z

@JeremiahDJordan If I remember correctly, LIMIT would not work the way we want: it would limit the number of returned rows, which is 1 (for the count), and not the number of rows being counted.

EDIT: Maybe I misunderstood the command: LIMIT would of course work for the actual query, if used as a workaround.

tatu-at-datastax · 2024-01-18T21:01:11Z

Is this same as #793 ?

JeremiahDJordan · 2024-01-18T21:25:39Z

I read this ticket as a “quick fix” and #793 as “see if there is something else to be done that still gives an approximate count”

JeremiahDJordan · 2024-01-18T21:42:04Z

@tatu-at-datastax looks like you are correct. It used to work the way I said in 2.1.x and earlier. But it was “fixed” in 2.2 and later per https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-8216

maheshrajamani self-assigned this Jan 18, 2024

maheshrajamani mentioned this issue Jan 19, 2024

Count optimization changes #811

Merged

4 tasks

maheshrajamani closed this as completed in #811 Jan 19, 2024
