Add parameters to allow adding SQL filters while downloading a dataset #1604
Note: tests are failing because it needs to have
I've tried to install the devel version of
$ git show acceaa8b04fa5693022353c486f85b48a426d02c
commit acceaa8b04fa5693022353c486f85b48a426d02c
Author: Daniel García Aubert <danielgarciaaubert@gmail.com>
Date: Fri Apr 10 12:42:18 2020 +0200
Install carto-python from custom github branch
diff --git a/setup.py b/setup.py
index 7061b4c..6395772 100644
--- a/setup.py
+++ b/setup.py
@@ -25,7 +25,7 @@ def get_version():
REQUIRES = [
'appdirs>=1.4.3,<2.0',
- 'carto>=1.10.1,<2.0',
+ 'carto@git+https://github.com/cartodb/carto-python.git@dgaubert/ch58107/add-sql-filter-to-do-datasets#egg=carto',
'jinja2>=2.10.1,<3.0',
'geopandas>=0.6.0,<1.0',
'tqdm>=4.32.1,<5.0',
and then:
But the CI is failing due to:
Acceptance
So, if you are willing to test it locally, you must:
/path/to/carto-python$ git fetch origin
/path/to/carto-python$ git checkout dgaubert/ch58107/add-sql-filter-to-do-datasets
/path/to/carto-python$ cd /path/to/cartoframes
/path/to/cartoframes$ git fetch origin
/path/to/cartoframes$ git checkout dgaubert/ch58107/add-sql-filter-to-do-datasets
/path/to/cartoframes$ pip install -r requirements.txt
/path/to/cartoframes$ pip install -e /path/to/carto-python
Just a comment
auth_client = credentials.get_api_key_auth_client()
rows = DODataset(auth_client=auth_client).name(self.id).download_stream(limit=limit, order_by=order_by)

is_geography = None
What does is_geography = None mean? Why does it depend on sql_query?
I mean, I find it easier to pass is_geography=True from geography and is_geography=False from dataset, and let the backend work out all the options from sql_query and is_geography.
It's an internal param that is set to True when using the Geography class. We need to detect when the user wants to download the geography dataset, because the placeholder defined in the story is {geography} instead of {dataset} in the query. We can't know that from the sql_query param alone: we would need to parse it or use a regex, which is troublesome.
I preferred being explicit in the client rather than trying to be smart in the backend.
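To illustrate the point about being explicit, here is a minimal sketch of how a flag can pick the placeholder without parsing the query. The function name `render_query` and the table names are hypothetical and are not part of carto-python or the backend; only the {dataset} and {geography} placeholders come from the discussion above.

```python
def render_query(sql_query, table_name, is_geography=False):
    """Substitute the placeholder that matches the entity kind.

    Hypothetical helper: the real substitution happens in the backend,
    and its implementation is not shown in this PR.
    """
    # The explicit flag decides which placeholder is expected,
    # so the query text never has to be parsed or matched with a regex.
    placeholder = '{geography}' if is_geography else '{dataset}'
    if placeholder not in sql_query:
        raise ValueError('query must reference ' + placeholder)
    return sql_query.replace(placeholder, table_name)


dataset_sql = render_query('select * from {dataset} limit 2', 'some_dataset_table')
geography_sql = render_query('select * from {geography} limit 2',
                             'some_geography_table', is_geography=True)
```

With the flag, a query that uses the wrong placeholder fails early with a clear error instead of being guessed at.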
Agree with what you say, but to keep the code simple you could do:
is_geography = self.__class__.__name__ == 'Geography'
(whether there's a sql_query or not shouldn't matter, right?)
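A small sketch of what that suggestion implies, using stand-in classes rather than the real cartoframes ones (`CatalogEntity`, `CustomGeography` and both helper functions are hypothetical). It also shows one trade-off: a class-name comparison does not follow inheritance, while `isinstance` does.

```python
class CatalogEntity:
    """Stand-in base class, for illustration only."""


class Dataset(CatalogEntity):
    pass


class Geography(CatalogEntity):
    pass


class CustomGeography(Geography):
    """Hypothetical subclass, used to show the inheritance difference."""


def is_geography_by_name(entity):
    # The check suggested in the review: compare the class name.
    return entity.__class__.__name__ == 'Geography'


def is_geography_by_type(entity):
    # Alternative: isinstance also matches subclasses of Geography.
    return isinstance(entity, Geography)


checks = [
    (type(e).__name__, is_geography_by_name(e), is_geography_by_type(e))
    for e in (Dataset(), Geography(), CustomGeography())
]
```

Either check gives a plain boolean regardless of whether sql_query was passed, which is the simplification being proposed.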
LGTM, just left a couple minor comments.
We still need to test this thoroughly in staging.
sql_query = 'select * from {dataset} order by geoid limit 2'
add_geom = True
df = public_dataset.to_dataframe(self.credentials, sql_query=sql_query, add_geom=add_geom)
df.to_csv(self.tmp_file, index=False)
Why do you need to store df in a file, only to read it back and compare it with expected_df?
I just followed what's done in the rest of the tests.
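For context on this exchange, a CSV round trip is not a no-op: values come back in their serialized form, so comparing after the round trip is not the same as comparing in memory. The sketch below uses only the standard library and made-up rows; it does not reproduce the cartoframes tests, and whether this is the reason the existing tests write to a file is not stated in the thread.

```python
import csv
import io

# Made-up rows standing in for a downloaded result.
rows = [{'geoid': 1, 'value': 2.5}, {'geoid': 2, 'value': 3.0}]

# Write the rows to an in-memory CSV "file".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['geoid', 'value'])
writer.writeheader()
writer.writerows(rows)

# Read them back: every value is now a string.
buffer.seek(0)
round_tripped = [dict(row) for row in csv.DictReader(buffer)]

same_in_memory = rows == round_tripped
same_as_text = [{k: str(v) for k, v in r.items()} for r in rows] == round_tripped
```

Here `same_in_memory` is False while `same_as_text` is True, so a test that compares against an expected file read the same way is effectively comparing serialized values.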
Add 'sql_query' and 'add_geom' parameters to allow adding SQL filters while downloading a dataset