Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: remove pandas/io/gbq.py and tests and replace with pandas-gbq #15484

Closed
wants to merge 2 commits into from

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Feb 23, 2017

closes #15347

  • going to have to migrate the docs as well, but need some work on pandas-gbq side for this.
  • should change the tests to check a bit more (a more comprehensive check)

@jreback jreback added this to the 0.20.0 milestone Feb 23, 2017
@jreback
Copy link
Contributor Author

jreback commented Feb 23, 2017

cc @parthea

@codecov-io
Copy link

codecov-io commented Feb 23, 2017

Codecov Report

Merging #15484 into master will increase coverage by 0.71%.
The diff coverage is 53.12%.

@@            Coverage Diff             @@
##           master   #15484      +/-   ##
==========================================
+ Coverage   90.36%   91.07%   +0.71%     
==========================================
  Files         136      136              
  Lines       49553    49096     -457     
==========================================
- Hits        44780    44716      -64     
+ Misses       4773     4380     -393
Impacted Files Coverage Δ
pandas/util/print_versions.py 15.71% <ø> (ø)
pandas/io/gbq.py 40% <36.36%> (+23.53%)
pandas/core/frame.py 97.73% <60%> (-0.1%)
pandas/util/decorators.py 67.32% <62.5%> (-1.28%)
pandas/util/testing.py 81.91% <0%> (-0.19%)
pandas/types/cast.py 85.45% <0%> (+0.02%)
pandas/core/common.py 91.36% <0%> (+0.33%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fb7dc7d...0fd8d06. Read the comment docs.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

  • whatsnew note
  • I would also note it in the docstrings of read_gqb/to_gbq that the functionality comes from another package. We could even remove the docstring apart from that note, and then append the docstrings from pandas_gbq, so we don't need to keep the docstring updated in our code when there are changes in pandas_gbq.

pandas/io/gbq.py Outdated
# give a nice error message
raise ImportError("the pandas-gbq library is not installed\n"
"you can install\n"
"pip install pandas-gbqt\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gbqt -> gbq

@jreback
Copy link
Contributor Author

jreback commented Feb 23, 2017

@jorisvandenbossche ya, i'll fiddle with the doc-strings a bit. The problem is I don't want to have it import pandas_gbq always on pandas import itself. (otherwise I would simply assign the functions).

@jorisvandenbossche
Copy link
Member

Ah, yes, forgot that part .. Then it seems not that easy to append the docstrings.

"""
this function returns a dynamically generated doc-string
that will call the creator callable to actually
create the docstring.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche this will dynamically create the docstrings

though I did this by experimentation (e.g. its code in inspect.py which actually gets the docs)

pandas/io/gbq.py Outdated

Authentication to the Google BigQuery service is via OAuth 2.0.
read_gbq.__doc__ = generate_dynamic_docstring(
lambda: _try_import().read_gbq.__doc__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to start testing that these changes don't damage the serializability of Pandas functions. Pandas does a lot of dynamic things to its functions and methods which has caused some pain in the past.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you adapt the decorator (or the function that returns the docstring) to have a default return value when the package could not be imported?

Now the docstring is just empty, while there could be a short explanation that you need pandas-gbq library for this functionality.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

pandas has split off Google BigQuery support into a separate package ``pandas-gbq``. You can ``pip install pandas-gbq`` to get it.
The functionaility of ``pd.read_gbq()`` and ``.to_gbq()`` remains the same with the currently released version of ``pandas-gbq=0.1.2``. (:issue:`15347`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

functionaility -> functionality

@jreback
Copy link
Contributor Author

jreback commented Feb 26, 2017

docs are now up: http://pandas-gbq.readthedocs.io/en/latest/index.html#

going to edit pandas docs and fix other suggestions by @jorisvandenbossche

"""

def __init__(self, func, creator, default=None):
self.func = func
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche ok this should do it.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, now also ?? works properly to see the source code!

using SQL-like queries. Result sets are parsed into a pandas
DataFrame with a shape and data types derived from the source table.
Additionally, DataFrames can be inserted into new BigQuery tables or appended
to existing tables.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you keep this first paragraph as intro, or part of it? (but just replace the 'the pandas.io.gbq module' with 'the pandas-gbq package')

you must wait 2 minutes before streaming data into the table. As a workaround, consider creating
the new table with a different name. Refer to
`Google BigQuery issue 191 <https://code.google.com/p/google-bigquery/issues/detail?id=191>`__.
Starting in 0.20.0, pandas has split off Google BigQuery support into the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning is OK, but apart from that I would also give some more explanation (outside of the warning, with new people as audience who never used it from within pandas)

->

pandas-gbq package provides functionality to read/write from gbq
pandas integrates with this external package
if pandas-gbq is installed, you can can use the pandas methods read_gbq and DataFrame.to_gbq, which will call the respective functions from pandas-gbq.
Full documentation can be found in the package's docs.

from pandas.io.gbq import _try_import
return _try_import().to_gbq.__doc__
to_gbq = docstring_wrapper(
to_gbq, _f, default='the pandas_gbq package is not installed')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this a bit more verbose, eg:

default = """
Load data from Google BigQuery.

To be able to use this function, you need to install the 
`pandas_gbq` package (https://pandas-gbq.readthedocs.io/).
"""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [1]: pd.read_gbq?
Type:           docstring_wrapper
String form:    <pandas.util.decorators.docstring_wrapper object at 0x10d8a6d68>
File:           ~/pandas/pandas/io/gbq.py
Signature:      pd.read_gbq(query, project_id=None, index_col=None, col_order=None, reauth=False, verbose=True, private_key=None, dialect='legacy', **kwargs)
Docstring:     
Load data from Google BigQuery

the pandas-gbq package is not installed
see the docs: https://pandas-gbq.readthedocs.io

you can install via:
pip install pandas-gbq
Call signature: pd.read_gbq(func, *args, **kwargs)

@jreback
Copy link
Contributor Author

jreback commented Feb 27, 2017

updated

@jreback
Copy link
Contributor Author

jreback commented Feb 27, 2017

@jorisvandenbossche lmk if any further comments. FYI I also am going to remove the 2 issues from whatsnew about gbq (as they are doc-ed in the pandas-gbq).

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Did you address @mrocklin comment from above regarding serializability of the function with dynamic docstring?

@mrocklin
Copy link
Contributor

That's also a long-standing issue. I don't expect this to be significantly worse than other things that already happen in Pandas. Please don't consider my comment as a blocking issue.

@jreback
Copy link
Contributor Author

jreback commented Feb 27, 2017

I actually think this should work on serialization, as we do properly setup __name__, __qualname__ and __module__. That said really not sure. Can always revisit if someone has an issue.

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
closes pandas-dev#15347

Author: Jeff Reback <jeff@reback.net>

Closes pandas-dev#15484 from jreback/gbq and squashes the following commits:

0fd8d06 [Jeff Reback] wip
3222de1 [Jeff Reback] CLN: remove pandas/io/gbq.py and tests and replace with pandas-gbq
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MIGRATE: pandas-gbq
4 participants