pandas pull requests for .to_avro/.read_avro are welcome! #1
This has a whole bunch of C deps and no Windows support. It's pretty easy to build all of the dependencies as conda packages. Where does pandas stand on optional dependencies for top-level APIs?
OK for something like this. Bundling the C deps in-line, or making it conda-only, is OK as well. This is a purely optional feature; if people want to use it, they need to install the deps (or use conda, which they should be doing anyhow). The biggest question I have is: is there a standard-ish schema already out there for dataframe-type stuff? (Even though I ended up creating an internal one for msgpack, I think it's better to hijack an existing one.)
I have a converter function that will infer a schema for a given dataframe. It should work for a reasonable number of types. Non-primitive classes are not supported at the moment. Supporting them probably doesn't make a lot of sense in any case.
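For illustration only, here is a minimal sketch of what such schema inference could look like. It is not the converter mentioned above: the function name `infer_avro_schema`, the dtype mapping, and the choice to make every field nullable are all assumptions. To stay dependency-free it takes a plain mapping of column name to dtype name rather than a dataframe.

```python
# Hypothetical mapping from pandas/NumPy dtype names to Avro primitive types.
# Treating "object" columns as strings is an assumption; they can hold anything.
DTYPE_TO_AVRO = {
    "bool": "boolean",
    "int32": "int",
    "int64": "long",
    "float32": "float",
    "float64": "double",
    "object": "string",
    "datetime64[ns]": "long",  # epoch milliseconds, see the timestamp discussion
}


def infer_avro_schema(dtypes, name="DataFrame"):
    """Build an Avro record schema (as a dict) from {column: dtype name}."""
    fields = []
    for column, dtype in dtypes.items():
        dtype_name = str(dtype)
        if dtype_name not in DTYPE_TO_AVRO:
            # Non-primitive types are rejected rather than guessed at.
            raise TypeError(f"unsupported dtype for column {column!r}: {dtype_name}")
        # A union with "null" lets missing values be represented.
        fields.append({"name": str(column), "type": ["null", DTYPE_TO_AVRO[dtype_name]]})
    return {"type": "record", "name": name, "fields": fields}


schema = infer_avro_schema({"id": "int64", "price": "float64", "label": "object"})
```

With pandas installed, something like `infer_avro_schema(df.dtypes.astype(str).to_dict())` should feed this from a real dataframe.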
Great! Yeah, that all sounds good.
The only types that are problematic in a generic sense are timestamps. Avro does not provide a native timestamp type, so these are just converted to long (Unix epoch milliseconds). We can easily add some metadata to the Avro header so that these types are preserved when the file is read back with pandas. Other systems, though, would just see a long.
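A rough sketch of that round trip, using only the standard library. The metadata key `pandas.timestamp_columns` and the helper names are made up for illustration; the only thing taken from Avro itself is that file-header metadata maps string keys to bytes values. Naive datetimes are assumed to be UTC.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
META_KEY = "pandas.timestamp_columns"  # hypothetical header key


def to_epoch_millis(ts):
    """Convert a datetime to Unix epoch milliseconds (naive values assumed UTC)."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return (ts - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(ms):
    """Convert Unix epoch milliseconds back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def encode_records(records, timestamp_columns):
    """Replace timestamps with longs and record which columns were converted."""
    meta = {META_KEY: json.dumps(sorted(timestamp_columns)).encode("utf-8")}
    encoded = [
        {k: to_epoch_millis(v) if k in timestamp_columns else v for k, v in rec.items()}
        for rec in records
    ]
    return meta, encoded


def decode_records(meta, records):
    """Restore timestamps if the metadata is present; otherwise leave longs as-is."""
    columns = set(json.loads(meta.get(META_KEY, b"[]").decode("utf-8")))
    return [
        {k: from_epoch_millis(v) if k in columns else v for k, v in rec.items()}
        for rec in records
    ]


rows = [{"id": 1, "seen": datetime(2015, 1, 1, tzinfo=timezone.utc)}]
meta, encoded = encode_records(rows, {"seen"})
restored = decode_records(meta, encoded)
```

A reader that ignores the metadata (here, `decode_records({}, encoded)`) just gets the long values back, which matches what other systems would see.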
thanks @mariusvniekerk