ENH: avro to/from serialization #11752

Closed
jreback opened this issue Dec 3, 2015 · 11 comments
Labels
Compat (pandas objects compatibility with Numpy or Python functions), IO Data (IO issues that don't fit into a more specific label)

Comments

@jreback
Contributor

jreback commented Dec 3, 2015

discussed in #3525

shiny new fast version of avro might be interesting: vericast/cyavro#1
by @mariusvniekerk

@jreback added the IO Data and Compat labels Dec 3, 2015
@jreback added this to the Someday milestone Dec 3, 2015
@jreback
Contributor Author

jreback commented Dec 3, 2015

cc @wesm

@wesm
Member

wesm commented Dec 4, 2015

any performance numbers on cyavro vs pyavroc vs fastavro?

@VelizarVESSELINOV

👍

@manugarri

👍

@Khrol

Khrol commented Oct 4, 2017

👍

@manugarri

manugarri commented Oct 4, 2017

Any updates on this? I am willing to put in time to implement this, but I would need some pointers, especially regarding:

  • Would we cast pandas dtypes to Avro types automatically, or would we require a user-defined schema? I believe the pandas approach would be the former, but it would be good (and easier) to allow a specific schema as well. Casting the types would require some assumptions, since the set of primitives Avro supports might not match pandas' dtypes.

  • Would read_avro support headerless data (i.e. streaming data without the header)? It would make sense to allow this as long as the user provides the schema.

@wesm
Member

wesm commented Oct 4, 2017

One possible route for this is https://issues.apache.org/jira/browse/ARROW-1209 -- @mariusvniekerk has started working on this, and it would be easier to accommodate Avro's types in Arrow and deal with the marshaling to/from pandas DataFrame at a central location in Arrow-land.

@jbrockmendel
Member

Is this a live topic or is it subsumed by more recent arrow ecosystem developments?

@wesm
Member

wesm commented Sep 12, 2019

Arrow-based Avro read/write isn't shipping yet; @emkornfield is interested in this as it relates to various Google services (among other uses). The timeline is hard to predict, though.

@emkornfield

Yes, I'm hoping to make some progress on this towards the end of this month/early next month.

@jbrockmendel
Member

Closing and adding to tracker issue #30407 for IO format requests; can re-open if interest is expressed.
