document pandas-gbq vision and roadmap #149
Comments
Yeah, there's overlap. There are some things that Not supported at all in
Not supported well in
In
The future? My thought is that more and more
My thought is that we'd always have
Some updates: I filed #175 and #174 while investigating the Also, @alixhami recently added a load_table_from_dataframe method to
Great! And that would be faster and smaller too. Are there any compatibility issues with using Parquet? Is it OK for Windows users? I'd be up for making the change to just defer to that method. (Though, tangentially, I still think supporting nested structures is going to set bad expectations; this is coming from someone who uses structs and arrays in BQ a lot.)
The only caveat for Windows users is that they can't use Python 2.7; they have to use Python 3 because it uses PyArrow under the covers. googleapis/google-cloud-python#5441 (comment)
OK, cool. I guess we could leave the old implementation in there and provide a fallback option until the end of the year.
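A minimal sketch of what such a fallback could look like, assuming the only deciding factor is whether PyArrow can be imported. The function name pick_load_format and the "parquet"/"csv" labels are hypothetical and are not part of pandas-gbq or google-cloud-bigquery:

```python
import importlib.util


def pick_load_format(preferred="parquet", fallback="csv"):
    """Return the serialization format to use for a DataFrame upload.

    Hypothetical helper: it only checks whether pyarrow is installed,
    which is an assumption about what the fallback would depend on.
    """
    # Writing a DataFrame to Parquet needs pyarrow; without it, use the
    # older (CSV-based) path instead of failing.
    if preferred == "parquet" and importlib.util.find_spec("pyarrow") is None:
        return fallback
    return preferred


load_format = pick_load_format()
print(f"Uploading with format: {load_format}")
```

Keeping the decision in one small function would make it easy to delete the fallback branch once the old implementation is retired.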
Ah, TIL pandas is dropping support for Python 2.7 at the end of the year. Thanks for pointing that out.
To make this task more concrete, I'd like to propose the following two sub-tasks:
With the exception of schema overriding, I think it should be possible to implement these subtasks without changing the public interface of
I think the I had thought our implementation of
Performance-wise, I don't see a difference at the moment. Both create a DataFrame from an iterable of all rows in the result set.
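As an illustration of that shared pattern (not the actual code of either library), here is a small sketch that materializes an iterable of rows into a DataFrame. The generator fake_result_rows and the column names are made up for the example and stand in for a real result set:

```python
import pandas as pd


def rows_to_dataframe(rows, column_names):
    """Materialize an iterable of row tuples into a DataFrame.

    All rows are pulled into memory first, which is why the two
    approaches discussed above would perform about the same.
    """
    return pd.DataFrame(list(rows), columns=column_names)


def fake_result_rows():
    # Hypothetical stand-in for rows streamed back from a query.
    yield ("alice", 1)
    yield ("bob", 2)


df = rows_to_dataframe(fake_result_rows(), ["name", "visits"])
print(df)
```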
I think previously
I'm not sure how necessary this is.
and also reordering the columns, We could keep this logic in
googleapis/google-cloud-python#7370 points out that the BQ schema logic is different in
Finally added CSV support to I imagine we'll want to support older versions of
In the interest of not keeping issues open forever, I'm going to treat this issue as a request to document the project vision/roadmap. That should be useful for contributors and also for understanding the purpose of this project compared to using the pandas connector in
Both pandas-gbq and google-cloud-bigquery are doing many of the same things, and increasingly so (e.g. .to_dataframe() in google-cloud-bigquery).