This repository has been archived by the owner on Dec 20, 2018. It is now read-only.

ISSUE #96: Specifying a read schema #109

Closed


yanxiaole

Addressing issue #96.

My solution is, err, a bit violent: all unknown columns are set to null instead of throwing an exception. I hope someone can improve it. Right now it works for me. Here's the test example:

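import java.io.File
import org.apache.avro.Schema
import com.databricks.spark.avro._  // provides .avro(...) and, with this PR, .schema(avroSchema)

// Parse the Avro read schema, then read every Avro file under avro/ with it
val avroSchema = new Schema.Parser().parse(new File("avsc/user.avsc"))
val df = sqlContext.read.schema(avroSchema).avro("avro/")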
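
The null-filling behaviour described above can be illustrated with a minimal, dependency-free sketch. This is not the code from this PR: `ReadSchemaSketch`, `projectLenient`, and `projectStrict` are hypothetical names, and a record is modelled as a plain `Map` rather than an Avro `GenericRecord`. It only contrasts the two policies for a field that is in the read schema but absent from the data.

```scala
object ReadSchemaSketch {
  // Lenient policy (the behaviour described in this PR):
  // a field requested by the read schema but missing from the record becomes null.
  def projectLenient(readFields: Seq[String], record: Map[String, Any]): Seq[Any] =
    readFields.map(name => record.getOrElse(name, null))

  // Strict policy (hypothetical alternative): fail on the first missing field.
  def projectStrict(readFields: Seq[String], record: Map[String, Any]): Seq[Any] =
    readFields.map { name =>
      record.getOrElse(
        name,
        throw new NoSuchElementException(s"Field '$name' not found in record"))
    }
}
```

With the lenient policy, fields in the record that the read schema does not mention are simply dropped, which is what lets files written with older and newer schema versions come out as rows of the same shape.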

@yanxiaole
Author

Hi guys, Spark 1.5.0-rc2 no longer exists. Should I update the Spark version in .travis.yml to 1.5.2?

@codecov-io

Current coverage is 93.90%

Merging #109 into master will increase coverage by +0.45% as of da9274d

@@            master   #109   diff @@
=====================================
  Files            6      6       
  Stmts          275    279     +4
  Branches        45     46     +1
  Methods          0      0       
=====================================
+ Hit            257    262     +5
  Partial          0      0       
+ Missed          18     17     -1


@JoshRosen
Contributor

I'm splitting the build changes into a separate PR, #112, so you may have to rebase this.

@yanxiaole
Author

ok, rebased.

@talebzeghmi

We are looking forward to this change, as it would enable Spark DataFrame schema evolution using Avro.

It would enable my scenario of reading Parquet files defined with an Avro schema.

val avroSchema = new Schema.Parser().parse(new File("avsc/user.avsc"))
val df = sqlContext.read.schema(avroSchema).format("parquet").load(path)

@talebzeghmi

When will this PR be merged, or has it been abandoned? Thanks.


/**
 * Adds a method, `schema`, to DataFrameReader that allows you to specify an Avro schema
 */

@vlyubin
Contributor

vlyubin commented Feb 1, 2016

Can you please add a test (including scenario with unknown columns)?

Thanks for working on this!

@yanxiaole
Author

@vlyubin I will fix the code style and add a test.

@yanxiaole
Author

@vlyubin done, with tests. Many details of the tests were copied from PR #113.

@jyssky

jyssky commented Feb 10, 2016

@bottleimp I just made a PR regarding the "try-catch" code. Hope it helps!

replace try-exception in buildScan
@pauldwolfe

What's the status of this guy? This is a really important feature for those of us who need schema evolution and who have run into the stack overflow problem when reading recursive schemas!

@yanxiaole
Author

@pauldwolfe, you may want to check #95 to see if it meets your needs.

@Gauravshah Gauravshah mentioned this pull request May 25, 2016
@jamesmatanle

This looks like the best solution to this issue, and reading multiple Avro files with evolved schemas is very important to me. What is the consensus here?

@jamesmatanle

Just following up to say that this PR has met my schema evolution needs very well, but I'm resistant to utilizing a forked repo in production. Shall we rebase?

@yanxiaole
Author

yanxiaole commented Aug 10, 2016

@jamesmatanle, I'll do a rebase today and check what I can do next.

@yanxiaole
Author

With #155, my PR is outdated, so I think I will just close it.

@yanxiaole yanxiaole closed this Aug 13, 2016