pandas pull requests for .to_avro/.read_avro are welcome! #1

Open
jreback opened this issue Dec 2, 2015 · 6 comments

Comments

@jreback

jreback commented Dec 2, 2015

thanks @mariusvniekerk

@mariusvniekerk
Collaborator

This has a whole bunch of C deps and no Windows support.

It's pretty easy to build all of the dependencies as conda packages.

Where does pandas stand on optional dependencies for top-level APIs?
Is that okay to add to pandas?

@jreback
Author

jreback commented Dec 2, 2015

ok for something like this.

we bundled the C deps in-line for msgpack, but that was reasonably small. So that's an option (at some point).

conda-only is ok as well. This is a purely optional feature; if people want to use it, they need to install the deps (or use conda, which they should be anyhow).
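For illustration, an optional dependency is usually guarded with a deferred import that fails with a helpful message. A minimal sketch, assuming a hypothetical helper name `require_optional` and a placeholder backend module name (neither comes from pandas or from this thread):

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional module, or raise an ImportError that explains why.

    `require_optional` is a hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'; "
            "install it (for example with conda) to use this feature."
        ) from exc


def read_avro(path):
    """Hypothetical top-level reader that only imports its backend when called."""
    # "some_avro_backend" is a placeholder module name, not a real package.
    backend = require_optional("some_avro_backend", "read_avro")
    return backend  # a real reader would go on to parse `path` here
```

Because the import happens inside the function, users who never call the Avro reader are not affected by the missing C dependencies.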

biggest question I would have is: is there a standard-ish schema already out there for dataframe-type stuff? (Even though I ended up creating an internal one for msgpack, I think it's better to hijack an existing one.)

@mariusvniekerk
Collaborator

I have a converter function that will infer a schema for a given dataframe. It should work for a reasonable number of types.

Non-primitive classes are not supported atm. It's probably not something that makes a lot of sense in any case.
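To make the idea concrete, here is a rough sketch of what dtype-based schema inference could look like. It is not the converter mentioned above; the `DTYPE_TO_AVRO` table and the `infer_avro_schema` name are assumptions made for illustration, and the function works on plain dtype names so it runs without pandas installed.

```python
import json

# Assumed mapping from pandas/NumPy dtype names to Avro primitive types.
# Illustrative only; a real converter may cover more types or differ in detail.
DTYPE_TO_AVRO = {
    "bool": "boolean",
    "int32": "int",
    "int64": "long",
    "float32": "float",
    "float64": "double",
    "object": "string",        # assumes object columns hold text
    "datetime64[ns]": "long",  # stored as Unix epoch milliseconds
}


def infer_avro_schema(dtypes, name="DataFrame"):
    """Build an Avro record schema from a {column: dtype} mapping."""
    fields = []
    for column, dtype in dtypes.items():
        key = str(dtype)
        if key not in DTYPE_TO_AVRO:
            # Non-primitive types are rejected rather than guessed at.
            raise TypeError(f"unsupported dtype {key!r} for column {column!r}")
        # A union with "null" lets missing values be represented.
        fields.append({"name": str(column), "type": ["null", DTYPE_TO_AVRO[key]]})
    return {"type": "record", "name": name, "fields": fields}


schema = infer_avro_schema({"id": "int64", "price": "float64", "label": "object"})
print(json.dumps(schema, indent=2))
```

With pandas available, the same function could be fed `df.dtypes.to_dict()`, since each dtype is converted to its name with `str()`.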

@jreback
Author

jreback commented Dec 2, 2015

gr8!

yeh, that all sounds good.

@mariusvniekerk
Collaborator

The only types that are problematic in a generic sense are timestamps.

Avro does not provide a native timestamp type, so these are just converted to Long (Unix epoch milliseconds). We can easily add some metadata to the Avro header so that these types are preserved when the file is read back with pandas. Other systems, though, would just see Long.
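A small sketch of that round trip, using only the standard library: timestamps become epoch milliseconds on write, and a metadata entry records which columns to restore on read. The metadata key `pandas.columns` and the function names are made up for illustration; they are not an agreed convention or part of any Avro library.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)

# Hypothetical header metadata key recording the original column types.
PANDAS_META_KEY = "pandas.columns"


def to_epoch_millis(ts):
    """Convert a timezone-aware datetime to integer Unix epoch milliseconds."""
    return (ts - EPOCH) // ONE_MS


def from_epoch_millis(ms):
    """Convert integer Unix epoch milliseconds back to a UTC datetime."""
    return EPOCH + ms * ONE_MS


def encode_timestamps(records, ts_columns):
    """Return (records with timestamps as longs, header metadata)."""
    encoded = [
        {k: to_epoch_millis(v) if k in ts_columns else v for k, v in row.items()}
        for row in records
    ]
    metadata = {PANDAS_META_KEY: {c: "datetime64[ns]" for c in ts_columns}}
    return encoded, metadata


def decode_timestamps(records, metadata):
    """Restore timestamp columns named in the metadata; others are untouched."""
    ts_columns = metadata.get(PANDAS_META_KEY, {})
    return [
        {k: from_epoch_millis(v) if k in ts_columns else v for k, v in row.items()}
        for row in records
    ]


rows = [{"when": datetime(2015, 12, 2, tzinfo=timezone.utc), "value": 1}]
encoded, meta = encode_timestamps(rows, ["when"])
restored = decode_timestamps(encoded, meta)
```

A reader that ignores the metadata still gets valid data, just with plain Long values in the timestamp columns. Anything finer than a millisecond is dropped by this encoding.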

@jreback
Author

jreback commented Dec 2, 2015

ahh I see. yeh, fully converting pandas types is actually a bit tricky (but see here for how the type conversions for msgpack are done).

We've also used a schema defined here for JSON-y data.


2 participants