This repository has been archived by the owner on Dec 20, 2018. It is now read-only.

Reading multiple AVRO files with different schema #165

Closed
smunigati opened this issue Aug 17, 2016 · 2 comments

@smunigati

How do we load multiple Avro files, potentially with different schemas, from a single directory? Is there a schema-merge facility like the one available when loading Parquet files?

sqlContext.read.option("mergeSchema","true")

The following does not work for Avro files:
var df = sqlContext.read.option("mergeSchema","true").format("com.databricks.spark.avro").avro("/user/demo/")

@koertkuipers
Contributor

This currently doesn't work, despite Avro having built-in support for merging schemas.
The problem is that the GenericDatumReader is currently created without passing in an Avro schema.
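
To illustrate the point about GenericDatumReader, here is a minimal sketch using plain Avro (no Spark) of what passing a reader schema buys you. The `User` schemas, the `ReaderSchemaSketch` object, and its `encode`/`decode` helpers are hypothetical names made up for this example; only the Avro classes themselves come from the library. When the reader is given both the writer schema and a wider reader schema, Avro resolves the difference and fills the missing field from its default.

```scala
import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

object ReaderSchemaSketch {
  // Hypothetical writer schema: what an older file was written with.
  val writerSchema: Schema = new Schema.Parser().parse(
    """{"type":"record","name":"User","fields":[
      |  {"name":"name","type":"string"}
      |]}""".stripMargin)

  // Hypothetical reader schema: adds an optional field with a default,
  // so records written without it can still be resolved.
  val readerSchema: Schema = new Schema.Parser().parse(
    """{"type":"record","name":"User","fields":[
      |  {"name":"name","type":"string"},
      |  {"name":"age","type":["null","int"],"default":null}
      |]}""".stripMargin)

  // Encode a single record using the writer schema only.
  def encode(name: String): Array[Byte] = {
    val record = new GenericData.Record(writerSchema)
    record.put("name", name)
    val out = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    new GenericDatumWriter[GenericRecord](writerSchema).write(record, encoder)
    encoder.flush()
    out.toByteArray
  }

  // Decode with BOTH schemas so Avro performs schema resolution.
  def decode(bytes: Array[Byte]): GenericRecord = {
    val reader = new GenericDatumReader[GenericRecord](writerSchema, readerSchema)
    val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
    reader.read(null, decoder)
  }
}
```

Calling `ReaderSchemaSketch.decode(ReaderSchemaSketch.encode("ada"))` yields a record shaped like the reader schema, with `age` taken from its default. For Avro data files the writer schema comes from the file header, so only the reader schema would need to be supplied.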

@JoshRosen
Contributor

As of #195, we should now have the ability to use a user-provided Avro schema at read time, so I'm going to mark this as "fixed in 3.1.0". Please re-open if this problem hasn't been fully addressed.
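
A rough sketch of what reading with a user-provided schema might look like in 3.1.0 or later. This assumes the reader option is named `avroSchema` and takes the schema as a JSON string, which is how I recall the spark-avro README describing it; check the documentation for your version. The `Event` schema, the `AvroSchemaReadSketch` object, and `readWithSchema` are hypothetical names for illustration only.

```scala
import org.apache.avro.Schema
import org.apache.spark.sql.{DataFrame, SQLContext}

object AvroSchemaReadSketch {
  // Hypothetical reader schema meant to cover every file in the directory.
  // Fields that some files lack need a default so Avro can fill them in.
  val readerSchemaJson: String =
    """{"type":"record","name":"Event","fields":[
      |  {"name":"id","type":"string"},
      |  {"name":"source","type":["null","string"],"default":null}
      |]}""".stripMargin

  def readWithSchema(sqlContext: SQLContext, path: String): DataFrame = {
    // Parse up front so a malformed schema fails before the read starts.
    val schema = new Schema.Parser().parse(readerSchemaJson)
    sqlContext.read
      .format("com.databricks.spark.avro")
      .option("avroSchema", schema.toString) // assumed option name, see note above
      .load(path)
  }
}
```

For the directory from the original question this would be called as `AvroSchemaReadSketch.readWithSchema(sqlContext, "/user/demo/")`. Note that this is not automatic merging as with Parquet's `mergeSchema`: the combined schema has to be written by hand.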

@JoshRosen added this to the 3.1.0 milestone on Nov 27, 2016