-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: remove pandas/io/gbq.py and tests and replace with pandas-gbq #15484
Conversation
cc @parthea |
Codecov Report
@@ Coverage Diff @@
## master #15484 +/- ##
==========================================
+ Coverage 90.36% 91.07% +0.71%
==========================================
Files 136 136
Lines 49553 49096 -457
==========================================
- Hits 44780 44716 -64
+ Misses 4773 4380 -393
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
- whatsnew note
- I would also note it in the docstrings of read_gqb/to_gbq that the functionality comes from another package. We could even remove the docstring apart from that note, and then append the docstrings from pandas_gbq, so we don't need to keep the docstring updated in our code when there are changes in pandas_gbq.
pandas/io/gbq.py
Outdated
# give a nice error message | ||
raise ImportError("the pandas-gbq library is not installed\n" | ||
"you can install\n" | ||
"pip install pandas-gbqt\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gbqt -> gbq
@jorisvandenbossche ya, i'll fiddle with the doc-strings a bit. The problem is I don't want to have it import |
Ah, yes, forgot that part .. Then it seems not that easy to append the docstrings. |
pandas/util/decorators.py
Outdated
""" | ||
this function returns a dynamically generated doc-string | ||
that will call the creator callable to actually | ||
create the docstring. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche this will dynamically create the docstrings
though I did this by experimentation (e.g. its code in inspect.py
which actually gets the docs)
pandas/io/gbq.py
Outdated
|
||
Authentication to the Google BigQuery service is via OAuth 2.0. | ||
read_gbq.__doc__ = generate_dynamic_docstring( | ||
lambda: _try_import().read_gbq.__doc__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to start testing that these changes don't damage the serializability of Pandas functions. Pandas does a lot of dynamic things to its functions and methods which has caused some pain in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you adapt the decorator (or the function that returns the docstring) to have a default return value when the package could not be imported?
Now the docstring is just empty, while there could be a short explanation that you need pandas-gbq library for this functionality.
doc/source/whatsnew/v0.20.0.txt
Outdated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
pandas has split off Google BigQuery support into a separate package ``pandas-gbq``. You can ``pip install pandas-gbq`` to get it. | ||
The functionaility of ``pd.read_gbq()`` and ``.to_gbq()`` remains the same with the currently released version of ``pandas-gbq=0.1.2``. (:issue:`15347`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
functionaility -> functionality
docs are now up: http://pandas-gbq.readthedocs.io/en/latest/index.html# going to edit pandas docs and fix other suggestions by @jorisvandenbossche |
pandas/util/decorators.py
Outdated
""" | ||
|
||
def __init__(self, func, creator, default=None): | ||
self.func = func |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche ok this should do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, now also ??
works properly to see the source code!
using SQL-like queries. Result sets are parsed into a pandas | ||
DataFrame with a shape and data types derived from the source table. | ||
Additionally, DataFrames can be inserted into new BigQuery tables or appended | ||
to existing tables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you keep this first paragraph as intro, or part of it? (but just replace the 'the pandas.io.gbq module' with 'the pandas-gbq package')
doc/source/io.rst
Outdated
you must wait 2 minutes before streaming data into the table. As a workaround, consider creating | ||
the new table with a different name. Refer to | ||
`Google BigQuery issue 191 <https://code.google.com/p/google-bigquery/issues/detail?id=191>`__. | ||
Starting in 0.20.0, pandas has split off Google BigQuery support into the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This warning is OK, but apart from that I would also give some more explanation (outside of the warning, with new people as audience who never used it from within pandas)
->
pandas-gbq package provides functionality to read/write from gbq
pandas integrates with this external package
if pandas-gbq is installed, you can can use the pandas methods read_gbq and DataFrame.to_gbq, which will call the respective functions from pandas-gbq.
Full documentation can be found in the package's docs.
pandas/core/frame.py
Outdated
from pandas.io.gbq import _try_import | ||
return _try_import().to_gbq.__doc__ | ||
to_gbq = docstring_wrapper( | ||
to_gbq, _f, default='the pandas_gbq package is not installed') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this a bit more verbose, eg:
default = """
Load data from Google BigQuery.
To be able to use this function, you need to install the
`pandas_gbq` package (https://pandas-gbq.readthedocs.io/).
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In [1]: pd.read_gbq?
Type: docstring_wrapper
String form: <pandas.util.decorators.docstring_wrapper object at 0x10d8a6d68>
File: ~/pandas/pandas/io/gbq.py
Signature: pd.read_gbq(query, project_id=None, index_col=None, col_order=None, reauth=False, verbose=True, private_key=None, dialect='legacy', **kwargs)
Docstring:
Load data from Google BigQuery
the pandas-gbq package is not installed
see the docs: https://pandas-gbq.readthedocs.io
you can install via:
pip install pandas-gbq
Call signature: pd.read_gbq(func, *args, **kwargs)
updated |
@jorisvandenbossche lmk if any further comments. FYI I also am going to remove the 2 issues from whatsnew about gbq (as they are doc-ed in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Did you address @mrocklin comment from above regarding serializability of the function with dynamic docstring?
That's also a long-standing issue. I don't expect this to be significantly worse than other things that already happen in Pandas. Please don't consider my comment as a blocking issue. |
I actually think this should work on serialization, as we do properly setup |
closes pandas-dev#15347 Author: Jeff Reback <jeff@reback.net> Closes pandas-dev#15484 from jreback/gbq and squashes the following commits: 0fd8d06 [Jeff Reback] wip 3222de1 [Jeff Reback] CLN: remove pandas/io/gbq.py and tests and replace with pandas-gbq
closes #15347