Fix df.count() behavior to perform count_rows instead #1996

jaychia · 2024-03-08T19:54:35Z

Is your feature request related to a problem? Please describe.

When users run df.count(), they often expect df.count_rows() behavior. Instead, df.count() performs a count aggregation on every column, which is not what they expect.
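To make the difference concrete, here is a minimal plain-Python sketch of the two semantics. It does not call Daft's API: the `Table` type and the helpers `count_per_column` and `count_rows` are hypothetical stand-ins, and it assumes a count aggregation skips nulls, as is common in dataframe libraries.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for a dataframe: column name -> list of values.
Table = Dict[str, List[Optional[Any]]]


def count_per_column(table: Table) -> Dict[str, int]:
    """Count aggregation on every column (assumed to skip nulls)."""
    return {
        name: sum(value is not None for value in values)
        for name, values in table.items()
    }


def count_rows(table: Table) -> int:
    """Global row count: a single integer, nulls included."""
    if not table:
        return 0
    # All columns are assumed to have the same length.
    return len(next(iter(table.values())))


table: Table = {"x": [1, None, 3], "y": ["a", "b", None]}

per_column = count_per_column(table)  # one count per column
total_rows = count_rows(table)        # one number for the whole table

print(per_column)
print(total_rows)
```

The per-column result has one entry per column and can differ between columns when nulls are present, whereas the row count is a single number. That mismatch is the source of the confusion described here.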

Interestingly, Spark returns an int for the .count() operation, which is perhaps most intuitive.

See Spark behavior: https://saturncloud.io/blog/counting-rows-in-pyspark-dataframes-a-comprehensive-guide/#counting-rows-using-the-count-function

We should discuss the intended behavior for our df.count() operation and implement a fix.

universalmind303 · 2024-08-14T03:26:47Z

Coming from other dataframe libraries, this has tripped me up several times.

+1 to changing the behavior

samster25 · 2024-08-14T18:20:09Z

Yeah, it's a common confusion point. I think it's finally time!

jaychia added the p1 Important to tackle soon, but preemptable by p0 label Mar 8, 2024

jaychia added this to Daft-OSS Mar 8, 2024

github-project-automation bot moved this to On Deck in Daft-OSS Mar 8, 2024

jaychia removed the status in Daft-OSS Jun 28, 2024

jaychia mentioned this issue Aug 14, 2024

[FEAT] Changes the default count() behavior to perform a global row count instead #2653

Merged

jaychia assigned kevinzwang Aug 14, 2024

jaychia closed this as completed in #2653 Aug 15, 2024

jaychia closed this as completed in b961ad3 Aug 15, 2024

github-project-automation bot moved this to Done in Daft-OSS Aug 15, 2024
