Add a backend for Apache Avro #370
Comments
@ahasha Excellent! We would happily review and likely accept such a patch :)
One question -- how do you guys handle backend-specific dependencies? It looks like they go in the test requirements of conda.recipe/meta.yaml. Anywhere else I need to register them?
@ahasha You can also put them in https://github.com/blaze/odo/blob/master/recommended-requirements.txt if you like
Quick question -- are there patterns I should look at for ensuring invertibility when you transition from one data model to another and back? It seems like a hard problem in general, as some data models are less expressive than others. Pandas seems particularly difficult since it does a lot of implicit type casting that can destroy information. Is there a way to attach a master datashape through conversions to use for disambiguation?
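To make the round-trip concern concrete, here is a minimal pure-Python sketch of the idea of carrying a "master" schema alongside a lossy conversion. It does not use odo, datashape, or pandas: `SCHEMA`, `to_lossy`, and `restore` are hypothetical names, and `to_lossy` only imitates the kind of implicit cast where an integer column with missing values ends up as floats with NaN.

```python
import math

# Hypothetical "master" schema: column name -> (Python type, nullable).
# This is an illustration only, not odo's datashape API.
SCHEMA = {"id": (int, False), "count": (int, True)}


def to_lossy(rows):
    """Imitate a float-only column store: ints become floats, None becomes NaN."""
    return [
        {name: (math.nan if value is None else float(value))
         for name, value in row.items()}
        for row in rows
    ]


def restore(rows, schema):
    """Use the recorded schema to undo the implicit casts."""
    restored = []
    for row in rows:
        fixed = {}
        for name, value in row.items():
            typ, nullable = schema[name]
            if nullable and isinstance(value, float) and math.isnan(value):
                fixed[name] = None          # NaN stood in for a missing value
            else:
                fixed[name] = typ(value)    # cast back to the declared type
        restored.append(fixed)
    return restored


original = [{"id": 1, "count": 10}, {"id": 2, "count": None}]
lossy = to_lossy(original)
round_tripped = restore(lossy, SCHEMA)
```

The schema only disambiguates what the lossy form still encodes. Integers larger than 2**53 cannot survive a trip through a 64-bit float no matter what schema is attached, so some conversions stay non-invertible.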
@ahasha The
I guess I'll find out! Just wanted to see if there were any examples for me to look at -- I get the sense you guys have built up some expertise with the mysterious world of implicit type conversions in pandas dataframes, and I'm a little intimidated to get started without something to look at. I found you participating in a discussion of using Avro as a serialization format for pandas dataframes way back when (pandas-dev/pandas/pull/3525) saying "Avro looks kind of insane". How fresh is your memory of what made you say that?
See also: pandas-dev/pandas#11752
@ahasha That was a while ago :) I haven't looked at Avro in quite some time. I have zero memory of why I said that.
Just tried the example from fastavro. It seems pretty easy to install and use.
(ping) -- I get the sense folks are on vacation, but just wanted to see if anyone's back. It looks like the build is broken by an issue unrelated to this PR. I submitted a bunch of edits to address earlier comments. Let me know if there are any other major concerns -- like if you'd potentially want to see fastavro or cyavro substituted in this PR as opposed to in a follow-up.
I switched to fastavro for reading since it is faster and is compatible with both Python 2 and 3. I needed to keep the standard avro dependency because neither fastavro nor cyavro has the schema manipulation capabilities I needed for discover. I also stuck with the standard avro library for writing, because fastavro does not support appending to a file as far as I can tell.
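For readers wondering what "schema manipulation for discover" might involve, here is a rough sketch that turns an Avro record schema (as a plain dict) into a datashape-style string. It is not the code from the PR and it calls neither avro nor odo: the type mapping and the names `AVRO_TO_DATASHAPE`, `field_type`, and `sketch_discover` are assumptions for illustration, and only primitive fields and two-branch nullable unions are handled.

```python
# Assumed mapping from Avro primitive type names to datashape type names.
AVRO_TO_DATASHAPE = {
    "boolean": "bool",
    "int": "int32",
    "long": "int64",
    "float": "float32",
    "double": "float64",
    "bytes": "bytes",
    "string": "string",
}


def field_type(avro_type):
    """Translate one Avro field type into a datashape-style type string."""
    if isinstance(avro_type, list):
        # Avro spells a nullable field as a union such as ["null", "string"].
        if len(avro_type) != 2 or avro_type.count("null") != 1:
            raise NotImplementedError("only two-branch nullable unions are sketched")
        (inner,) = [t for t in avro_type if t != "null"]
        return "?" + field_type(inner)
    if isinstance(avro_type, str) and avro_type in AVRO_TO_DATASHAPE:
        return AVRO_TO_DATASHAPE[avro_type]
    raise NotImplementedError("unsupported Avro type: %r" % (avro_type,))


def sketch_discover(schema):
    """Describe a flat Avro record schema as a variable-length sequence of records."""
    if schema.get("type") != "record":
        raise NotImplementedError("only top-level records are sketched")
    fields = ", ".join(
        "%s: %s" % (field["name"], field_type(field["type"]))
        for field in schema["fields"]
    )
    return "var * {%s}" % fields


example_schema = {
    "type": "record",
    "name": "Row",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": ["null", "string"]},
        {"name": "score", "type": "double"},
    ],
}
shape = sketch_discover(example_schema)
```

Nested records, arrays, maps, enums, and wider unions are where the real work is, and they are deliberately left out of this sketch.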
@ahasha Apologies for the radio silence. I'm cutting the 0.4.1 release today, and will review this work for the 0.4.2 (or 0.5.0) release. Thanks for your work on #386; since this is a non-trivial backend and Phillip has moved on to new things, it will take me a day or two to review and sign off on the PR.
Sorry to hear about that. I suspected something was up. Let me know if you'd suggest any further changes. Since submitting I've become a pretty enthusiastic odo user and have some other extensions in mind, so I want to make sure I learn the standards of the project well.
Big odo fan here. What would be replacing odo?
I use Avro as a standard data format on most of my projects, and have been working on an Avro backend as a way of learning the odo project. I have a patch ready, and am just waiting on approval from my company's open source office to submit it for feedback.