[BUG] count() in avro failed when reader_types is coalescing #6131
Comments
One more issue: a leak occurs when reader_types is MULTITHREADED and the version is v2.
This is probably caused by the behavior of [...]. One fix is to add a check for an empty read schema: if it is empty, return an empty batch with the correct row count instead of calling into [...]. It would also be good to check whether Parquet and ORC have the same issue.
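The empty-schema guard suggested above can be sketched roughly as follows. This is a toy model in plain Python, not the plugin's actual code: `Batch`, `read_batch`, `decode_whole_file`, and `FILE_DATA` are all hypothetical names made up for illustration, and the decoder is assumed to ignore the requested schema and return every column in the file, which is one way the "Expected 0 columns but read 8" mismatch could arise.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class Batch:
    """Toy stand-in for a columnar batch: named columns plus an explicit row count."""
    num_rows: int
    columns: Dict[str, List[object]] = field(default_factory=dict)


def read_batch(
    read_schema: Sequence[str],
    row_count: int,
    decode: Callable[[Sequence[str]], Dict[str, List[object]]],
) -> Batch:
    """Return a batch holding the requested columns (hypothetical helper).

    When no columns are requested, as for count(), skip decoding entirely
    and return a rows-only batch that still reports the right row count.
    """
    if len(read_schema) == 0:
        return Batch(num_rows=row_count)

    columns = decode(read_schema)
    if len(columns) != len(read_schema):
        raise ValueError(
            f"Expected {len(read_schema)} columns but read {len(columns)}"
        )
    return Batch(num_rows=row_count, columns=columns)


# Hypothetical file contents: 3 rows, 2 columns.
FILE_DATA = {"id": [1, 2, 3], "name": ["a", "b", "c"]}


def decode_whole_file(_read_schema: Sequence[str]) -> Dict[str, List[object]]:
    # Assumption: models a decoder that ignores the requested schema
    # and always returns every column stored in the file.
    return dict(FILE_DATA)


if __name__ == "__main__":
    # count() asks for zero columns; the guard avoids the decoder altogether.
    rows_only = read_batch([], row_count=3, decode=decode_whole_file)
    print(rows_only.num_rows, len(rows_only.columns))
```

Without the early return, the same call would reach the column-count check with 0 requested columns and 2 decoded ones and raise. The row count has to come from somewhere other than the decoded columns, for example from file or block metadata; where exactly is left open here.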
Avro always specifies the schema from a file here, which is not correct.
Related issue to add tests: #717
Describe the bug
spark.read.format("avro").load(data_path).count() reports the error
QueryExecutionException: Expected 0 columns but read 8 from ArrayBuffer
if reader_types is COALESCING.
Steps/Code to reproduce bug
My test code is: [...]
in avro_test.py.
And it produces: [...]
Expected behavior
spark.read.format("avro").load(data_path).count() should return the number of rows in the Avro file, the same as the result in the CPU version.
We should add a test for .count() in the Avro tests once this has been fixed.