Is your feature request related to a problem? Please describe.
When users run `df.count()`, they often expect `df.count_rows()` behavior. Instead, `df.count()` performs a count aggregation on every column, which is usually not what they intended.
Interestingly, Spark's `.count()` returns an `int` (the number of rows), which is perhaps the most intuitive behavior.
See Spark behavior: https://saturncloud.io/blog/counting-rows-in-pyspark-dataframes-a-comprehensive-guide/#counting-rows-using-the-count-function
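To make the difference concrete, here is a minimal sketch of the two semantics. `ToyFrame` is a hypothetical stand-in written only for this illustration, not our actual DataFrame class, and it assumes the per-column count aggregation skips null values:

```python
from typing import Any, Dict, List, Optional


class ToyFrame:
    """Hypothetical stand-in for a dataframe; not the real library class."""

    def __init__(self, columns: Dict[str, List[Optional[Any]]]) -> None:
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self._columns = columns

    def count(self) -> Dict[str, int]:
        # Per-column count aggregation (assumed here to skip None values).
        return {
            name: sum(value is not None for value in values)
            for name, values in self._columns.items()
        }

    def count_rows(self) -> int:
        # Total number of rows regardless of nulls, returned as a plain int.
        if not self._columns:
            return 0
        return len(next(iter(self._columns.values())))


df = ToyFrame({"a": [1, None, 3], "b": ["x", "y", None]})
per_column = df.count()    # one count per column
n_rows = df.count_rows()   # a single int
print(per_column)  # {'a': 2, 'b': 2}
print(n_rows)      # 3
```

In this sketch, a user who calls `count()` expecting `3` gets a per-column result instead, which is the confusion described above.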
We should discuss the intended behavior for our `df.count()` operation and implement a fix.