Add parameters to allow adding SQL filters while downloading a dataset #1604

dgaubert · 2020-03-31T17:19:39Z

Add 'sql_query' and 'add_geom' parameters to allow adding SQL filters while downloading a dataset

dgaubert · 2020-04-10T11:06:01Z

Note: tests are failing because they need carto-python 1.11.0, which has not been released yet.

I tried to install the development version of carto-python, which includes the changes required for the tests to pass, but I was not able to get it working. My best attempt was:

$ git show acceaa8b04fa5693022353c486f85b48a426d02c
commit acceaa8b04fa5693022353c486f85b48a426d02c
Author: Daniel García Aubert <danielgarciaaubert@gmail.com>
Date:   Fri Apr 10 12:42:18 2020 +0200

    Install carto-python from custom github branch

diff --git a/setup.py b/setup.py
index 7061b4c..6395772 100644
--- a/setup.py
+++ b/setup.py
@@ -25,7 +25,7 @@ def get_version():

 REQUIRES = [
     'appdirs>=1.4.3,<2.0',
-    'carto>=1.10.1,<2.0',
+    'carto@git+https://github.com/cartodb/carto-python.git@dgaubert/ch58107/add-sql-filter-to-do-datasets#egg=carto',
     'jinja2>=2.10.1,<3.0',
     'geopandas>=0.6.0,<1.0',
     'tqdm>=4.32.1,<5.0',

and then: pip install -r requirements.txt

But the CI is failing due to:

Processing ./.tox/.tmp/package/1/cartoframes-1.0.2.zip
Direct url requirement (like carto@ git+https://github.com/cartodb/carto-python.git@dgaubert/ch58107/add-sql-filter-to-do-datasets#egg=carto) are not allowed for dependencies

Acceptance

So, if you want to test it locally, run the following:

/path/to/carto-python$ git fetch origin
/path/to/carto-python$ git checkout dgaubert/ch58107/add-sql-filter-to-do-datasets
/path/to/carto-python$ cd /path/to/cartoframes
/path/to/cartoframes$ git fetch origin
/path/to/cartoframes$ git checkout dgaubert/ch58107/add-sql-filter-to-do-datasets
/path/to/cartoframes$ pip install -r requirements.txt
/path/to/cartoframes$ pip install -e /path/to/carto-python

simon-contreras-deel

Just a comment

simon-contreras-deel · 2020-04-13T07:48:48Z

cartoframes/data/observatory/catalog/entity.py

        auth_client = credentials.get_api_key_auth_client()
-        rows = DODataset(auth_client=auth_client).name(self.id).download_stream(limit=limit, order_by=order_by)
+
+        is_geography = None


What does is_geography = None mean?
Why does it depend on sql_query?

I mean, I think it would be easier to pass is_geography=True from geography and is_geography=False from dataset, and let the backend handle all the options using sql_query and is_geography.

It's an internal parameter that is set to True when the Geography class is used. We need to detect when the user wants to download the geography dataset, because in that case the placeholder defined in the story is {geography} instead of {dataset} in the query. We can't determine this from the sql_query parameter alone: we would have to parse it or use a regex, which is troublesome.

I preferred being explicit in the client rather than trying to be smart in the backend.
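
To illustrate the point about placeholders, here is a minimal sketch of how an explicit flag can select which placeholder to substitute, with no need to parse the query. The function name `render_query` and the table identifiers are hypothetical. The real substitution happens elsewhere and is not shown in this thread, so treat this as an assumption about the general shape rather than the actual implementation.

```python
def render_query(sql_query, table_id, is_geography):
    """Replace the placeholder that matches the entity type.

    Hypothetical helper: the caller states explicitly whether the
    entity is a geography, so the query never has to be inspected
    to guess which placeholder applies.
    """
    placeholder = '{geography}' if is_geography else '{dataset}'
    if placeholder not in sql_query:
        raise ValueError('expected {} in the query'.format(placeholder))
    return sql_query.replace(placeholder, table_id)


# Made-up table identifiers, for illustration only.
dataset_query = render_query(
    'select * from {dataset} order by geoid limit 2',
    'my_dataset_table',
    is_geography=False,
)
geography_query = render_query(
    'select * from {geography} limit 2',
    'my_geography_table',
    is_geography=True,
)
print(dataset_query)
print(geography_query)
```

With the flag passed in, a query that uses the wrong placeholder fails early with a clear error instead of being sent unchanged.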

Agree with what you say, but to keep the code simple you could do:

is_geography = self.__class__.__name__ == 'Geography'

(whether there's a sql_query or not shouldn't matter, right?)
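
A minimal sketch of the class-name check suggested above, assuming a shared base class with `Dataset` and `Geography` subclasses. The classes here are stripped-down stand-ins, and the helper method `_is_geography` is a hypothetical name, not code from the PR.

```python
class CatalogEntity:
    """Stand-in for the shared base class (assumed structure)."""

    def _is_geography(self):
        # Compare by class name so the base class does not need to
        # import its own subclasses.
        return self.__class__.__name__ == 'Geography'


class Dataset(CatalogEntity):
    pass


class Geography(CatalogEntity):
    pass


print(Dataset()._is_geography())
print(Geography()._is_geography())
```

One trade-off of comparing names: a subclass of `Geography` with a different class name would not match, whereas an `isinstance` check would. Whether that matters depends on how the catalog classes are used.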

rafatower

LGTM, just left a couple minor comments.

We still need to test this in staging as much as needed.

rafatower · 2020-04-13T14:38:32Z

cartoframes/data/observatory/catalog/entity.py

        auth_client = credentials.get_api_key_auth_client()
-        rows = DODataset(auth_client=auth_client).name(self.id).download_stream(limit=limit, order_by=order_by)
+
+        is_geography = None


Agree with what you say, but to keep the code simple you could do:

is_geography = self.__class__.__name__ == 'Geography'

(whether there's a sql_query or not shouldn't matter, right?)

rafatower · 2020-04-13T14:42:59Z

tests/e2e/data/observatory/catalog/test_download.py

+        sql_query = 'select * from {dataset} order by geoid limit 2'
+        add_geom = True
+        df = public_dataset.to_dataframe(self.credentials, sql_query=sql_query, add_geom=add_geom)
+        df.to_csv(self.tmp_file, index=False)


Why do you need to store df in a file, only to read it back and compare it with expected_df?

I just followed what is done in the rest of the tests.
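
For reference, the pattern being questioned can be sketched with the standard library alone. The thread does not say why the existing tests are written this way, so the following only shows one observable effect of a CSV round trip: values come back as strings, so a result that has been written and re-read is compared like-for-like against an expected fixture that was itself loaded from CSV. The rows and column names are made up, and the `csv` module stands in for the pandas calls used in the real test.

```python
import csv
import io


def round_trip(rows, fieldnames):
    """Write rows to an in-memory CSV and read them back."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    buffer.seek(0)
    # DictReader yields every value as a string.
    return [dict(row) for row in csv.DictReader(buffer)]


# Hypothetical in-memory result with native Python types.
rows = [{'geoid': 1, 'value': 2.5}, {'geoid': 2, 'value': 3.0}]
result = round_trip(rows, ['geoid', 'value'])
print(result)
```

If the expected data is built in memory with the right types, comparing directly and skipping the file is simpler, which seems to be what the question is getting at.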

Add 'sql_query' and 'add_geom' parameters to allow adding SQL filters…

889f3a8

… while downloading a dataset

dgaubert changed the base branch from dgaubert/ch61421/integrate-do-client-in-to-dataframe-and-to to release/1.0.2 April 2, 2020 13:00

dgaubert added 2 commits April 2, 2020 15:20

Merge branch 'release/1.0.2' into dgaubert/ch58107/add-sql-filter-to-…

7937537

…do-datasets

Merge branch 'release/1.0.2' into dgaubert/ch58107/add-sql-filter-to-…

6a0cefd

…do-datasets

dgaubert changed the base branch from release/1.0.2 to develop April 6, 2020 17:19

dgaubert added 4 commits April 10, 2020 10:40

Add test for download dataset with sql filters

1e1ff43

Install carto-python from custom github branch

acceaa8

Revert "Install carto-python from custom github branch"

e54fc42

This reverts commit acceaa8.

Remove uneeded parameter

67e71b3

dgaubert marked this pull request as ready for review April 10, 2020 11:05

dgaubert requested review from simon-contreras-deel and rafatower April 10, 2020 11:05

simon-contreras-deel reviewed Apr 13, 2020

dgaubert requested a review from simon-contreras-deel April 13, 2020 10:05

rafatower approved these changes Apr 13, 2020

dgaubert mentioned this pull request Apr 13, 2020

Add SQL filters while downloading a dataset CartoDB/carto-python#165

Merged

dgaubert added 4 commits April 13, 2020 18:42

Remove temporary changes

54ff6ec

Upgrade carto to version 1.11.0

7978091

Make condition even simpler

3ca590d

Fix mocks

e2f47e2

rafatower merged commit d7bc553 into develop Apr 15, 2020

rafatower deleted the dgaubert/ch58107/add-sql-filter-to-do-datasets branch April 15, 2020 11:59
