Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO - enrichment] Column types problem #1243

Closed
simon-contreras-deel opened this issue Nov 25, 2019 · 6 comments
Closed

[DO - enrichment] Column types problem #1243

simon-contreras-deel opened this issue Nov 25, 2019 · 6 comments
Assignees
Labels

Comments

@simon-contreras-deel
Copy link
Contributor

Problems

Until now, I thought our datasets in the catalog only have numeric columns, but I have realized we have different types. And for the enrichment, this is something to take into account in the polygons enrichment.

Right now, if you use a variable with a different type of NUMERIC we are going to fail because for each variable, we are doing the following:

{AGGREGATION}(enrichment_table.{COLUMN NAME} *
 (ST_Area(ST_Intersection(enrichment_geo_table.geom, data_table.{GEOM COLUMN}))
 / ST_area(data_table.{GEOM COLUMN}))) AS {COLUMN NAME}

where we are basically getting a numeric value (getting the porportional value of the area intersected).

But, what happens if the column is a STRING? a date? a boolean? We can not apply the same logic in these cases.

Even more: there are 2 numeric types of variables: extensive (it grows with area, like population) and intensive (it doesn’t grow with size, like population density). Doing the previous query, we are working fine with extensive ones, but we are falling with the intensive ones.

Problems recap:

  • What to do with data types different to a numeric value
  • What to do with intensive and numeric values

data types different to a numeric value

I would say: right now we are only supporting numeric ones. And then, one by one, start adding support for the rest of the types. In enrichment, we can know the data type because it is the catalog.

intensive and numeric values

I am not sure, but at least, we need to add a new property in the catalog to set this type.

cc @alasarr @arredond @cmongut

@alasarr
Copy link
Contributor

alasarr commented Nov 25, 2019

I am not sure, but at least, we need to add a new property in the catalog to set this type.

It has been already done by @arredond by setting the agg_method to None in the catalog

@alasarr
Copy link
Contributor

alasarr commented Nov 25, 2019

Right now we're only supporting at enrichment by polygons (not points) aggregation by numeric datasets. Aggregation of non-numeric is a cool feature for the future.

The aim of this issue is to skip the variables with agg_method to None in the catalog, a warning message must be shown.

For example, for strings we can take the mode with an average weight, but not a priority right now, we'll include it in the roadmap

@alasarr
Copy link
Contributor

alasarr commented Nov 25, 2019

After a second thought, you're totally right @oleurud! agg_method to None means that there is no default aggregator, but we should do something (like a weighted average or just an avg) for intensive variables. We'll talk tomorrow about it, not sure if it's better to do it based on the aggregation function or using a catalog parameter.

@simon-contreras-deel
Copy link
Contributor Author

simon-contreras-deel commented Nov 26, 2019

After talking, the objectives are:

  1. skip variables without agg method (now we are adding a default aggregation to this fields)
  2. add agg method to the result column name (right now the result column name is named equals to the column, but we could have same variable aggregated several times)
  3. make the division only with SUM aggregations. Also, take into account the custom aggregation parameter.
  4. do the previous step well (weighted average)
  5. working with no numeric values TBD

The objective for this iteration is to do at least 1 and 2 (reduced scope and avoiding errrors), and 3 (doing it better) if posssible

@simon-contreras-deel
Copy link
Contributor Author

As we are not going to tackle the points 4 and 5 now, I have created a new issue for them #1259

@simon-contreras-deel
Copy link
Contributor Author

🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants