This repository has been archived by the owner on Dec 20, 2018. It is now read-only.

Handling more complex unions in SchemaConverter #108

Closed
hkothari opened this issue Dec 18, 2015 · 3 comments

Comments

@hkothari

Hello,

I'd like to propose adding support for more complex union types, e.g. (ENUM, RECORD, RECORD) or (RECORD, RECORD, RECORD), in Avro files. The way I would like to do this is via a Spark SQL UDT that essentially wraps a java.lang.Object (with some nicer functionality for size estimation) and passes that back to the user.

Object seems like the right way to handle this, as it's how the Avro Java compiler handles union types when they're not a combination of null and something else: https://github.com/apache/avro/blob/trunk/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java#L513

Does anyone have objections to or suggestions for this plan? If I were to implement it, would we be able to merge it?

Best,
Hamel

@silasdavis

The lack of support for union types over records is causing us problems too.

@manugarri

I was wondering, are there any plans to further expand the supported union types? I am facing a case where the existing schema is ["string", "long", "null"] (the schema changed over time), and I can't seem to read it.

@JoshRosen
Contributor

I just merged #117, which addresses this in a slightly different way, without using UDTs. If you'd like to recover the UDT-like behavior, I think you can define a UDT and write a UDF that takes a struct with a single non-null field and extracts that field into an object.

Since I think that #117 addresses the immediate issue of simply not being able to read data containing complex UNION types, I'm going to optimistically mark this issue as resolved. If there are specific needs that aren't covered by #117 then please re-open and comment on this issue.
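
For readers who want the "struct with a single non-null field" extraction mentioned above, here is a minimal sketch of that logic in plain Python. It is not code from #117. It assumes the union arrives as a struct whose fields are named member0, member1, and so on (the naming spark-avro documents for complex unions), and it models that struct as a dict, such as the result of calling asDict() on a PySpark Row. The function name extract_union_value is hypothetical.

```python
from typing import Any, Mapping, Optional


def extract_union_value(row: Optional[Mapping[str, Any]]) -> Any:
    """Return the single non-null field of a union struct, or None.

    `row` stands in for the struct produced for a complex union,
    e.g. {"member0": "abc", "member1": None}. This is an assumption
    for illustration, not spark-avro's own API.
    """
    if row is None:
        # The union itself was null.
        return None
    non_null = [value for value in row.values() if value is not None]
    if len(non_null) > 1:
        # A well-formed union value populates at most one branch.
        raise ValueError("expected at most one non-null union member")
    return non_null[0] if non_null else None


# A ["string", "long", "null"] union, modeled as two-member structs.
string_branch = {"member0": "abc", "member1": None}
long_branch = {"member0": None, "member1": 42}

print(extract_union_value(string_branch))
print(extract_union_value(long_branch))
```

To use this inside Spark it would still need to be wrapped as a UDF, and because a UDF declares a single return type, branches of different types would have to be converted to a common representation (or to a UDT, as suggested above).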

@JoshRosen JoshRosen added this to the 3.1.0 milestone Nov 24, 2016