Count command optimization #809

Closed
maheshrajamani opened this issue Jan 18, 2024 · 5 comments · Fixed by #811

@maheshrajamani
Contributor

maheshrajamani commented Jan 18, 2024

The count command in the API uses Cassandra's count function. The Cassandra coordinator fetches all the filtered rows from storage and increments a counter for each valid row. Because of this, counting a Cassandra table with a large number of rows can take a long time and eventually starts timing out.

Below is the proposed change for the count operation in the JSON API.

  • A configurable limit will specify the maximum count that the API will return. The default will be 1000.
  • The count command will not use Cassandra's count aggregation function.
  • Read the primary key (_id) column from the DB for the given filters.
  • The select query will be specified with limit <configured limit + 1>.
  • If the total number of primary keys read from the DB is equal to <configured limit + 1>, then return count as <configured limit> and set more_data to true. Otherwise return the count of keys retrieved.

The translated CQL for the count command will look like:
SELECT key FROM table WHERE <filters> LIMIT <configured limit + 1>
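
The steps above can be sketched roughly as follows. This is only an illustration, not the actual JSON API implementation: `capped_count`, `CountResult`, and `make_fake_table` are hypothetical names, the fake table stands in for the real CQL query, and the sketch assumes that the count reported in the overflow case is the configured limit itself.

```python
from typing import Callable, Iterable, NamedTuple


class CountResult(NamedTuple):
    count: int
    more_data: bool


def capped_count(
    fetch_keys: Callable[[int], Iterable[object]],
    max_count: int = 1000,
) -> CountResult:
    """Count matching rows up to max_count by reading at most max_count + 1 keys."""
    # Ask for one key more than the cap; the extra key only signals overflow.
    keys_read = sum(1 for _ in fetch_keys(max_count + 1))
    if keys_read > max_count:
        # Assumption: report the cap itself and flag that more rows exist.
        return CountResult(count=max_count, more_data=True)
    return CountResult(count=keys_read, more_data=False)


def make_fake_table(total_rows: int) -> Callable[[int], Iterable[object]]:
    """Hypothetical stand-in for running the limited key query against the DB."""
    def fetch_keys(limit: int) -> Iterable[object]:
        # Mimics LIMIT: never returns more than `limit` keys.
        return range(min(total_rows, limit))
    return fetch_keys
```

With a cap of 10, a table of 5 matching rows would report a count of 5 with more_data false, while a table of 11 or more rows would report a count of 10 with more_data true. The coordinator never reads more than the cap plus one keys, which is what bounds the cost.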

@maheshrajamani maheshrajamani self-assigned this Jan 18, 2024
@JeremiahDJordan

Pretty sure you can still push down “count” with a LIMIT specified. No reason to return back the keys if they are not needed.

@tatu-at-datastax
Contributor

tatu-at-datastax commented Jan 18, 2024

@JeremiahDJordan If I remember correctly, LIMIT would not work the way we want -- it'd limit the number of returned rows, which is 1 (for the count), and not the number of rows being counted.

EDIT: Maybe I misunderstood the command: LIMIT would of course work for the actual query, if used as a workaround.
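
The distinction can be illustrated with a small analogy. The sketch below uses SQLite through Python's `sqlite3` module, not Cassandra, so it only shows the general idea that a LIMIT on an aggregate query applies to the single result row; the `docs` table and its contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (key INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO docs (key) VALUES (?)", [(i,) for i in range(50)])

# LIMIT on the aggregate applies to the one result row,
# so all 50 rows are still counted.
limit_on_aggregate = conn.execute(
    "SELECT COUNT(*) FROM docs LIMIT 10"
).fetchone()[0]

# Workaround: apply LIMIT to the row-returning query
# and count the keys that come back.
limit_on_rows = len(conn.execute("SELECT key FROM docs LIMIT 10").fetchall())

print(limit_on_aggregate, limit_on_rows)
```

In this SQLite example the first query still scans and counts every row, while the second stops after 10 keys, which is the behavior the proposed workaround relies on.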

@tatu-at-datastax
Contributor

Is this the same as #793?

@JeremiahDJordan

I read this ticket as a “quick fix” and #793 as “see if there is something else to be done that still gives an approximate count”

@JeremiahDJordan

@tatu-at-datastax looks like you are correct. It used to work the way I said in 2.1.x and earlier. But it was “fixed” in 2.2 and later per https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-8216
