Add a backend for apache Avro #370

Open
ahasha opened this issue Nov 30, 2015 · 16 comments · May be fixed by #386

Comments

@ahasha

ahasha commented Nov 30, 2015

I use Avro as a standard data format on most of my projects, and have been working on an Avro backend as a way of learning the odo project. I have a patch ready, and am just waiting on approval from my company's open source office to submit it for feedback.

@cpcloud
Member

cpcloud commented Nov 30, 2015

@ahasha Excellent! We would happily review and likely accept such a patch :)

@ahasha
Author

ahasha commented Nov 30, 2015

One question -- how do you guys handle backend-specific dependencies? It looks like they go in the test requirements of conda.recipe/meta.yaml. Anywhere else I need to register them?

@cpcloud
Member

cpcloud commented Nov 30, 2015

@ahasha You can also put them in https://github.com/blaze/odo/blob/master/recommended-requirements.txt if you like

@ahasha
Author

ahasha commented Dec 3, 2015

Quick question -- Are there patterns I should look at for ensuring invertibility when you transition from one data model to another and back? It seems like a hard problem in general, as some data models are less expressive than others. Pandas seems particularly difficult since it does a lot of implicit type casting that can destroy information.

Is there a way to attach a master datashape through conversions to use for disambiguation?

@cpcloud
Member

cpcloud commented Dec 3, 2015

@ahasha The odo function accepts a dshape argument which is passed through every convert (and append) function. Is that enough to do what you want?
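To make the idea of carrying one master schema through a round trip concrete, here is a minimal stdlib-only sketch. It does not use odo or pandas: `MASTER_SCHEMA`, `to_csv_text`, and `from_csv_text` are hypothetical names, and a plain dict stands in for a datashape. CSV plays the role of the less expressive data model that drops type information.

```python
import csv
import io

# Hypothetical master schema: column name -> (Python type, nullable).
# A plain dict stands in for a datashape in this sketch.
MASTER_SCHEMA = {
    "id": (int, False),
    "score": (float, True),
    "name": (str, False),
}


def to_csv_text(rows, schema):
    """Write rows to CSV text. Types are lost: every value becomes a string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(schema))
    writer.writeheader()
    for row in rows:
        # Missing values are written as empty strings.
        writer.writerow({k: "" if row[k] is None else row[k] for k in schema})
    return buf.getvalue()


def from_csv_text(text, schema):
    """Read CSV text back, using the schema to restore types and nulls."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, (typ, nullable) in schema.items():
            value = raw[name]
            if value == "" and nullable:
                row[name] = None
            else:
                row[name] = typ(value)
        rows.append(row)
    return rows


original = [
    {"id": 1, "score": 0.5, "name": "a"},
    {"id": 2, "score": None, "name": "b"},
]
round_tripped = from_csv_text(to_csv_text(original, MASTER_SCHEMA), MASTER_SCHEMA)
```

Without the schema, the reader would have no way to tell that `id` was an integer or that the empty `score` was a missing value rather than an empty string. Passing the same schema to both directions is what makes the conversion invertible in this toy case.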

@ahasha
Author

ahasha commented Dec 3, 2015

I guess I'll find out! Just wanted to see if there were any examples for me to look at -- I get the sense you guys have built up some expertise with the mysterious world of implicit type conversions in pandas dataframes, and I'm a little intimidated to get started without something to look at.

I found you participating in a discussion of using Avro as a serialization format for pandas dataframes way back when (pandas-dev/pandas/pull/3525) saying "Avro looks kind of insane". How fresh is your memory of what made you say that?

@jreback
Contributor

jreback commented Dec 3, 2015

see also here: pandas-dev/pandas#11752

@cpcloud
Member

cpcloud commented Dec 3, 2015

@ahasha That was a while ago :) I haven't looked at Avro in quite some time. I have zero memory of why I said that.

@cpcloud cpcloud modified the milestones: 0.4.0, 0.4.1 Dec 4, 2015
@ahasha ahasha linked a pull request Dec 18, 2015 that will close this issue
@mrocklin
Member

Any thoughts on fastavro or cyavro?

@cpcloud
Member

cpcloud commented Dec 18, 2015

Just tried the example from fastavro; it seems pretty easy to install and use.

@ahasha
Author

ahasha commented Jan 6, 2016

(ping) -- I get the sense folks are on vacation, but just wanted to see if anyone's back. It looks like the build is broken by an issue unrelated to this PR.

I submitted a bunch of edits to address earlier comments. Let me know if there are any other major concerns -- like if you'd potentially want to see fastavro or cyavro substituted in this PR as opposed to a follow-up.

@ahasha
Author

ahasha commented Jan 30, 2016

I switched to fastavro for reading since it is faster and is compatible with both Python 2 and 3. I needed to keep the standard avro dependency because neither fastavro nor cyavro has the schema manipulation capabilities I needed for discover. I also stuck with the standard avro library for writing, because fastavro does not support appending to a file as far as I can tell.
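For readers wondering what schema handling for discover might involve, here is a rough stdlib-only sketch that walks a parsed Avro schema and produces a datashape-like string. It is not the code from #386 and uses neither avro nor fastavro: the function name `avro_to_dshape` and the type mapping in `PRIMITIVES` are assumptions made for illustration, and only records, arrays, primitives, and `["null", T]` unions are handled.

```python
import json

# Assumed mapping from Avro primitive type names to datashape type names.
PRIMITIVES = {
    "boolean": "bool",
    "int": "int32",
    "long": "int64",
    "float": "float32",
    "double": "float64",
    "bytes": "bytes",
    "string": "string",
}


def avro_to_dshape(schema):
    """Translate a parsed Avro schema (str, list, or dict) to a datashape-like string."""
    if isinstance(schema, str):
        return PRIMITIVES[schema]
    if isinstance(schema, list):
        # Only the common ["null", T] union is handled; it becomes optional ?T.
        non_null = [s for s in schema if s != "null"]
        if len(non_null) != 1 or len(non_null) == len(schema):
            raise ValueError("unsupported union: %r" % (schema,))
        return "?" + avro_to_dshape(non_null[0])
    kind = schema["type"]
    if kind == "record":
        fields = ", ".join(
            "%s: %s" % (f["name"], avro_to_dshape(f["type"]))
            for f in schema["fields"]
        )
        return "{" + fields + "}"
    if kind == "array":
        return "var * " + avro_to_dshape(schema["items"])
    # Covers wrapped forms such as {"type": "string"}.
    return avro_to_dshape(kind)


# Hypothetical example schema, written as Avro JSON.
schema = json.loads("""
{"type": "record", "name": "User", "fields": [
  {"name": "id", "type": "long"},
  {"name": "email", "type": ["null", "string"]},
  {"name": "scores", "type": {"type": "array", "items": "double"}}
]}
""")
record_dshape = avro_to_dshape(schema)
file_dshape = "var * " + record_dshape
```

An Avro file holds a variable number of records, so the sketch prefixes the record shape with `var *` to describe the whole file. Maps, enums, fixed types, and multi-branch unions are left out on purpose.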

@kwmsmith kwmsmith modified the milestones: 0.4.1, 0.4.2 Feb 2, 2016
@kwmsmith
Member

kwmsmith commented Feb 2, 2016

@ahasha apologies for the radio silence. I'm cutting the 0.4.1 release today, and will review this work for the 0.4.2 (or 0.5.0) release. Thanks for your work on #386; since this is a non-trivial backend and Phillip has moved on to new things, it will take me a day or two to review and sign off on the PR.

@ahasha
Author

ahasha commented Feb 2, 2016

Sorry to hear about that. I suspected something was up.

Let me know if you'd suggest any further changes. Since submitting I've become a pretty enthusiastic odo user and have some other extensions in mind, so I want to make sure I learn the standards of the project well.

@kwmsmith kwmsmith modified the milestones: 0.4.2, 0.5.0 Feb 5, 2016
@kwmsmith kwmsmith removed this from the 0.4.2 milestone Feb 5, 2016
@manugarri

@ahasha @kwmsmith is there any update on this? I am not sure if the Odo project is deprecated.

@nncary

nncary commented Sep 8, 2017

big odo fan here, what would be replacing odo?


7 participants