Feature/fix enrichment new catalogue #1083

Merged
27 commits merged into develop from feature/fix_enrichment_new_catalogue
Oct 23, 2019

alejandrohall
Contributor

No description provided.

@alrocar
Contributor

alrocar commented Oct 10, 2019

@alejandrohall there are some tests failing, could you take a look?

@alejandrohall
Contributor Author

alejandrohall commented Oct 15, 2019

@alrocar Done!

Contributor

@simon-contreras-deel simon-contreras-deel left a comment

It looks perfect, but I would replace the dataset names used in the tests with fake ones.

 FROM `carto-do-customers.{user_dataset}\
-.ags_demographics_crimerisk_usa_blockgroup_2015_yearly_2018` enrichment_table
+.view_ags_demographics_crimerisk_usa_blockgroup_2015_yearly_2018` enrichment_table
Contributor

I think we are trying to avoid using real names

Contributor Author

We need real names because the functions use the real catalog, so we need real examples of table names. Also, I don't see a problem here, because we already offer this dataset publicly through the website and the catalog.

@@ -162,14 +170,6 @@ def __process_agg_operators(agg_operators, variables):
return agg_operators_result
Contributor

@elenatorro elenatorro Oct 18, 2019

The __process_agg_operators method should also handle the case where this argument is a string, as we already do in the enrich_polygons method. If it's a string, it raises AttributeError: 'str' object has no attribute 'copy'
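
A minimal sketch of the guard requested here, assuming the argument in question is `agg_operators` and that it may arrive as `None`, a single operator name, or a dict keyed by variable. The helper name `normalize_agg_operators` and the sample variable names are hypothetical; this is not the library's actual code.

```python
def normalize_agg_operators(agg_operators, variables):
    """Return a dict mapping each variable name to an aggregation operator.

    Hypothetical helper: accepts None, a single operator name (str),
    or a dict, so that .copy() is never called on a str.
    """
    if agg_operators is None:
        return {}
    if isinstance(agg_operators, str):
        # Apply the same operator to every variable.
        return {variable: agg_operators for variable in variables}
    if isinstance(agg_operators, dict):
        # Shallow copy so the caller's dict is never mutated.
        return dict(agg_operators)
    raise TypeError('agg_operators must be None, a str, or a dict')


result = normalize_agg_operators('sum', ['population', 'income'])
```

Normalizing the input once, at the top of the function, keeps the rest of the logic free of type checks.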

Contributor Author

Fixed!

@elenatorro
Contributor

I've added a couple of comments. Although I made some changes locally so that the enrich_polygons method runs without errors, the variables are still not being added to the resulting DataFrame. We therefore need to fix these issues before merging this PR.

variables_underscored='_'.join(variables), enrichment_table=table,
enrichment_geo_table=table_to_geotable[table], user_dataset=user_dataset,
working_project=working_project, data_table=data_table,
'''.format(enrichment_id=enrichment_id, variables_underscored='_'.join(variables),
Contributor

variables_underscored is not being used
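
One likely reason this went unnoticed is that `str.format` silently ignores keyword arguments that have no matching placeholder. A small illustration, using a made-up template and placeholder values rather than the PR's real query:

```python
# Made-up template: note there is no {variables_underscored} placeholder.
template = 'SELECT * FROM `{working_project}.{user_dataset}.{data_table}`'

query = template.format(
    working_project='my-project',   # placeholder values, not real names
    user_dataset='my_dataset',
    data_table='my_table',
    variables_underscored='a_b',    # no matching placeholder, so it is ignored
)
```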

Contributor Author

Removed unused variable from format method!

@Jesus89
Member

Jesus89 commented Oct 18, 2019

Let's also take this into account before merging the PR: https://github.com/CartoDB/data-observatory/issues/188#issuecomment-540737126

@elenatorro
Contributor

I discovered another error, related to the enrich_points method: the aggregation column returns Decimal values. When reading a source, we have a method called encode_geodataframe that raises the following error: Object of type Decimal is not JSON serializable

If we convert the column type to float, it works. My question is: should we always return the agg column as float, or is this something we should change when encoding the GeoDataFrame?

cc @alejandrohall @Jesus89
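
The error can be reproduced with the standard library alone. The sketch below uses a made-up row (the `agg_value` key and its value are assumptions) and shows the cast-to-float option mentioned above; it does not use the library's own code.

```python
import json
from decimal import Decimal

row = {'id': 1, 'agg_value': Decimal('12.5')}  # made-up sample row

try:
    json.dumps(row)
    message = None
except TypeError as error:
    # The standard encoder has no rule for Decimal, so it raises TypeError.
    message = str(error)

# Cast Decimal values to float before encoding.
as_float = {
    key: float(value) if isinstance(value, Decimal) else value
    for key, value in row.items()
}
encoded = json.dumps(as_float)
```

Casting to float trades exact decimal precision for a type that JSON encoders handle natively.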

@Jesus89
Member

Jesus89 commented Oct 23, 2019

I would add this fix (convert to float) in the lib.

@Jesus89
Member

Jesus89 commented Oct 23, 2019

For some reason, after merging the enriched DataFrame into the main DataFrame, the pandas to_json method no longer works. However, to_json works on each DataFrame separately, so it's still a mystery why, after merging both DataFrames, an object type is converted to Decimal.

Fortunately, there is a solution. The fix consists of using a custom JSONEncoder for the to_json method when we convert a DF into a GDF.
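
The encoder itself is not shown in this conversation, so the following is only a sketch of what such a class might look like, assuming it needs to do nothing more than turn Decimal values into floats. The class name `DecimalJSONEncoder` and the sample feature are hypothetical, and `json.dumps` stands in for the `to_json(cls=...)` call.

```python
import json
from decimal import Decimal


class DecimalJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder that serializes Decimal values as floats."""

    def default(self, obj):
        if isinstance(obj, Decimal):
            return float(obj)
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


feature = {'type': 'Feature', 'properties': {'agg_value': Decimal('3.25')}}
encoded = json.dumps(feature, cls=DecimalJSONEncoder)
```

Handling the conversion in the encoder leaves the DataFrame's column types untouched and fixes serialization in one place.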

@Jesus89 Jesus89 force-pushed the feature/fix_enrichment_new_catalogue branch from bd9d001 to 3155cf8 Compare October 23, 2019 11:00
 def encode_geodataframe(data):
     filtered_geometries = _filter_null_geometries(data)
-    data = _set_time_cols_epoc(filtered_geometries).to_json()
+    data = _set_time_cols_epoc(filtered_geometries).to_json(cls=CustomJSONEncoder)
Contributor

Nice 👍

Contributor

@elenatorro elenatorro left a comment

LGTM 🚀

@Jesus89 Jesus89 merged commit 74dc81b into develop Oct 23, 2019
@Jesus89 Jesus89 deleted the feature/fix_enrichment_new_catalogue branch October 23, 2019 11:14
5 participants