Be permissive when reading avro files with inconsistent schema #31
Comments
This also affects #49.
Any workarounds? We have an evolving but backwards-compatible Avro schema, and this is a blocker for us.
I would be happy to have this, as I also ran into the same issue as #49. Is anyone planning to pick it up?
@JoshRosen can you give an indication of whether someone is going to pick this up? If needed, I can also invest some time and see whether I can fix it. Any pointers to the piece of code that has to be changed would be much appreciated.
@nlande have you found a workaround so far? My workaround was to load every file individually and then union them after selecting. However, I ran into huge performance problems with several thousand Avro files because the DAG got pretty big... When I tried to show it in the Spark UI, the UI even ran out of memory.
This should be solvable by specifying a schema that is itself a union. Pull request #95 addresses that.
Hi guys, is there any plan to fix this issue? We are currently facing the same problem.
You should be able to do this by specifying a schema while loading files: b078cca
I think @Gauravshah is right: this should be addressed by the "custom read schema" support being added in the forthcoming 3.1.0 release, so I'm going to go ahead and tentatively mark this as fixed.
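To illustrate the idea behind reading with one explicit schema, here is a minimal pure-Python sketch. It does not use spark-avro or Spark at all; `READER_SCHEMA`, `project`, and the sample records are hypothetical names invented for this example. It assumes that every file is projected onto a single reader schema, that fields missing from older files are filled with a default, and that fields the reader schema does not mention are dropped.

```python
# Hypothetical reader schema: field name -> default used when a file lacks the field.
READER_SCHEMA = {"id": None, "name": "", "email": None}


def project(record, reader_schema):
    """Reshape one record to the reader schema.

    Fields missing from the record get the schema default;
    fields not listed in the schema are dropped.
    """
    return {field: record.get(field, default)
            for field, default in reader_schema.items()}


# Two "files" written under different versions of the schema (made-up data).
old_file = [{"id": 1, "name": "a"}]
new_file = [{"id": 2, "name": "b", "email": "b@example.com", "phone": "555"}]

# Every record ends up with the same set of fields, whatever its writer schema was.
rows = [project(record, READER_SCHEMA)
        for records in (old_file, new_file)
        for record in records]
```

Because every record is reshaped to the same fields up front, the files never need to be loaded one by one and unioned afterwards.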
If there are multiple files in a directory and some of them have additional records, do not throw an exception as long as those fields are not accessed. Ideally, show a warning when loading.
On a related note, this could be controlled with a flag in the options.
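The requested behavior could look roughly like the following pure-Python sketch. It is not the spark-avro implementation; `read_files` and its `permissive` flag are hypothetical. It assumes that "additional records" means extra fields, and that the first file's fields define the baseline.

```python
import warnings


def read_files(files, permissive=True):
    """Read several (field_names, records) pairs as one list of rows.

    The first file's fields are the baseline. Extra fields in later files
    trigger a warning when permissive is True and a ValueError otherwise.
    """
    base_fields = list(files[0][0])
    rows = []
    for fields, records in files:
        extra = sorted(set(fields) - set(base_fields))
        if extra:
            message = f"ignoring extra fields {extra}"
            if not permissive:
                raise ValueError(message)
            warnings.warn(message)
        # Keep only the baseline fields, so the extra ones are never accessed.
        rows.extend({key: record.get(key) for key in base_fields}
                    for record in records)
    return rows
```

With `permissive=False` the same input fails fast, which matches the current strict behavior described in this issue.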