Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parameters to allow adding SQL filters while downloading a dataset #1604

Merged
merged 11 commits into from
Apr 15, 2020

Conversation

dgaubert
Copy link
Contributor

Add 'sql_query' and 'add_geom' parameters to allow adding SQL filters while downloading a dataset

@dgaubert dgaubert changed the base branch from dgaubert/ch61421/integrate-do-client-in-to-dataframe-and-to to release/1.0.2 April 2, 2020 13:00
@dgaubert dgaubert changed the base branch from release/1.0.2 to develop April 6, 2020 17:19
@dgaubert dgaubert marked this pull request as ready for review April 10, 2020 11:05
@dgaubert
Copy link
Contributor Author

Note: tests are failing because it needs to have carto-python 1.11.0 released.

I've tried to install the devel version of carto-python where it includes the required changes to make tests to pass. I've not been able to achieve that. My best chance was:

$ git show acceaa8b04fa5693022353c486f85b48a426d02c
commit acceaa8b04fa5693022353c486f85b48a426d02c
Author: Daniel García Aubert <danielgarciaaubert@gmail.com>
Date:   Fri Apr 10 12:42:18 2020 +0200

    Install carto-python from custom github branch

diff --git a/setup.py b/setup.py
index 7061b4c..6395772 100644
--- a/setup.py
+++ b/setup.py
@@ -25,7 +25,7 @@ def get_version():

 REQUIRES = [
     'appdirs>=1.4.3,<2.0',
-    'carto>=1.10.1,<2.0',
+    'carto@git+https://github.com/cartodb/carto-python.git@dgaubert/ch58107/add-sql-filter-to-do-datasets#egg=carto',
     'jinja2>=2.10.1,<3.0',
     'geopandas>=0.6.0,<1.0',
     'tqdm>=4.32.1,<5.0',

and then: pip install -r requirements.txt

But the CI is failing due to:

Processing ./.tox/.tmp/package/1/cartoframes-1.0.2.zip
Direct url requirement (like carto@ git+https://github.com/cartodb/carto-python.git@dgaubert/ch58107/add-sql-filter-to-do-datasets#egg=carto) are not allowed for dependencies

Acceptance

So, if you are willing to test it locally, you must:

/path/to/carto-python$ git fetch origin
/path/to/carto-python$ git checkout dgaubert/ch58107/add-sql-filter-to-do-datasets
/path/to/carto-python$ cd /path/to/cartoframes
/path/to/cartoframes$ git fetch origin
/path/to/cartoframes$ git checkout dgaubert/ch58107/add-sql-filter-to-do-datasets
/path/to/cartoframes$ pip install -r requirements.txt
/path/to/cartoframes$ pip install -e /path/to/carto-python

Copy link
Contributor

@simon-contreras-deel simon-contreras-deel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment

auth_client = credentials.get_api_key_auth_client()
rows = DODataset(auth_client=auth_client).name(self.id).download_stream(limit=limit, order_by=order_by)

is_geography = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does is_geography = None mean?
Why does it depend on sql_qurery?

I mean, I see easier to add is_geography=True from geography and is_geography=False from dataset, and in the backend, get the all the options with sql_query and is_geography

Copy link
Contributor Author

@dgaubert dgaubert Apr 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an internal param when using the Geography class is set to True. We need to detect it when the user wants to download the geography dataset as the placeholder defined in the story is {geography} instead of {dataset} in the query. We can't know it by using only the sql_query param as we might need to parse it or use a regex and is troublesome.

I preferred being explicit in the client than trying to be smart in the backend.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with what you say, but to keep the code simple you could do:

is_geography = self.__class__.__name__ == 'Geography'

(whether there's a sql_query or not shouldn't matter, right?)

Copy link
Contributor

@rafatower rafatower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just left a couple minor comments.

Yet we need to test this as much as needed in staging.

auth_client = credentials.get_api_key_auth_client()
rows = DODataset(auth_client=auth_client).name(self.id).download_stream(limit=limit, order_by=order_by)

is_geography = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with what you say, but to keep the code simple you could do:

is_geography = self.__class__.__name__ == 'Geography'

(whether there's a sql_query or not shouldn't matter, right?)

tests/e2e/data/observatory/catalog/test_download.py Outdated Show resolved Hide resolved
sql_query = 'select * from {dataset} order by geoid limit 2'
add_geom = True
df = public_dataset.to_dataframe(self.credentials, sql_query=sql_query, add_geom=add_geom)
df.to_csv(self.tmp_file, index=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need to store df into a file for then reading and comparing with the expected_df?

Copy link
Contributor Author

@dgaubert dgaubert Apr 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just followed what it's done in the rest of the tests.

@rafatower rafatower merged commit d7bc553 into develop Apr 15, 2020
@rafatower rafatower deleted the dgaubert/ch58107/add-sql-filter-to-do-datasets branch April 15, 2020 11:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants