recipe procedure: align #3
Related issue: #3. The function works in some cases but needs more testing.
Example: this recipe replaces the country column in the datapoints so that it aligns with the GW geo entities.
How could we clean up the API? I think this can be joined with `translate_column`:

```js
{
  procedure: 'translate_column',
  ingredient: 'wdi-countries', // ingredient which will be translated
  column: 'country',           // column in ingredient which will be translated
  target_column: 'geo',        // column where translations will be written to.
                               // Overwrites when target_column already exists.
                               // Creates a new column when target_column does not exist.
                               // Defaults to the 'column' value.

  // Dictionary form 1: build the dictionary from an ingredient
  dictionary: {
    ingredient: 'gw-entities', // ingredient from which to build the dictionary
    key: 'name',               // (or `from`) column(s) which form the dictionary keys,
                               // either 'name' or ['name', 'alt1', 'alt2'].
                               // May default to the ingredient's value columns?
    value: 'geo'               // (or `to`) column which forms the dictionary values.
                               // May default to the ingredient's key column?
  },
  // Dictionary form 2: path to a JSON file containing a dictionary
  // dictionary: 'path/to/dictionary.json',
  // Dictionary form 3: inline dictionary
  // dictionary: { China: 'chn', Sweden: 'swe' },

  not_found: 'include'         // action on a row whose column value is not among the dictionary keys:
                               // 'error', 'drop' or 'include'. Defaults to 'include'.
}
```

Of course, only one of the dictionaries can be given.

Questions:
P.S. See also #28.
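To make the proposed `translate_column` semantics concrete, here is a minimal sketch in plain JavaScript. It assumes an ingredient is an array of row objects and that the dictionary has already been resolved to a plain `{ old: new }` object (the inline form). The function name `translateColumn` and the row representation are hypothetical; only the option names come from the proposal. It also assumes that under `not_found: 'include'` the untranslated value is written unchanged.

```javascript
// Hypothetical sketch: rows are plain objects, e.g. { country: 'China', gdp: 1 }.
// `dictionary` is an already-resolved plain object mapping old values to new values.
function translateColumn(rows, { column, target_column = column, dictionary, not_found = 'include' }) {
  const result = [];
  for (const row of rows) {
    const value = row[column];
    // hasOwnProperty avoids matching inherited keys such as 'toString'.
    const found = Object.prototype.hasOwnProperty.call(dictionary, value);
    if (!found) {
      if (not_found === 'error') {
        throw new Error(`translate_column: value "${value}" in column "${column}" not found in dictionary`);
      }
      if (not_found === 'drop') continue; // skip rows that cannot be translated
    }
    // Assumption: 'include' keeps the row and writes the untranslated value.
    // Writing to target_column overwrites it if present, or creates it otherwise.
    result.push({ ...row, [target_column]: found ? dictionary[value] : value });
  }
  return result;
}

const rows = [
  { country: 'China', gdp: 1 },
  { country: 'Sweden', gdp: 2 },
  { country: 'Atlantis', gdp: 3 },
];
const dictionary = { China: 'chn', Sweden: 'swe' };

// Logs the China and Sweden rows with a new `geo` column; the Atlantis row is dropped.
console.log(translateColumn(rows, { column: 'country', target_column: 'geo', dictionary, not_found: 'drop' }));
```

Each translated row is a new object, so the input ingredient stays untouched and the same source rows can feed several recipe steps.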
Yes, @jheeffer, your suggestion looks clear. I agree we can join this function into `translate_column`; this will also cover what we need for #28. My suggestions:
```js
{
  procedure: 'translate_column',
  ingredient: 'wdi-countries',     // ingredient which will be translated
  options: {
    column: 'country',             // column in ingredient which will be translated
    target_column: 'geo',          // column where translations will be written to.
                                   // Overwrites when target_column already exists.
                                   // Creates a new column when target_column does not exist.
                                   // Defaults to the 'column' value.
    type: 'ingredient',            // type of dictionary
    dictionary: {
      ingredient: 'gw-entities',   // ingredient from which to build the dictionary
      key: 'name',                 // column(s) which form the dictionary keys: 'name' or ['name', 'alt1', 'alt2']
      value: 'geo'                 // column which forms the dictionary values
    }
  }
}
```
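For the `type: 'ingredient'` dictionary, the dictionary has to be built from another ingredient's columns. A minimal sketch, assuming rows are plain objects; `buildDictionary` is a hypothetical helper, the sample rows are made up, and the first-match-wins rule for duplicate keys is an assumption the thread does not settle.

```javascript
// Hypothetical helper: build a { key: value } dictionary from an ingredient's rows.
// `key` is one column name or an array of column names (e.g. a name plus alternative names);
// every non-empty value in those columns becomes a key pointing at the row's `value` column.
function buildDictionary(rows, { key, value }) {
  const keyColumns = Array.isArray(key) ? key : [key];
  const dict = {};
  for (const row of rows) {
    for (const col of keyColumns) {
      const k = row[col];
      if (k === undefined || k === null || k === '') continue; // skip empty alternative names
      // Assumption: when the same key appears twice, the first occurrence wins.
      if (!Object.prototype.hasOwnProperty.call(dict, k)) {
        dict[k] = row[value];
      }
    }
  }
  return dict;
}

// Made-up 'gw-entities' rows for illustration.
const gwEntities = [
  { geo: 'chn', name: 'China', alt1: "People's Republic of China", alt2: '' },
  { geo: 'swe', name: 'Sweden', alt1: 'Kingdom of Sweden', alt2: 'Sverige' },
];

const dict = buildDictionary(gwEntities, { key: ['name', 'alt1', 'alt2'], value: 'geo' });
console.log(dict.Sverige); // → swe
```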
Alright, `options` is good then. I see you prefer key/value over from/to. I would call
Yes,
I think this is ready for implementation!
For now, we will parse the `options` object as described in #3 (comment).
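A possible shape for that parsing step, sketched under assumptions: `parseTranslateColumn` is a hypothetical helper, it only accepts the `ingredient` dictionary type from the agreed proposal, and it normalizes `key` to an array so later code can treat single and multiple key columns the same way.

```javascript
// Hypothetical helper: validate a translate_column recipe step and fill in defaults.
// Only the 'ingredient' dictionary type from the agreed proposal is handled.
function parseTranslateColumn(step) {
  const { procedure, ingredient, options = {} } = step;
  if (procedure !== 'translate_column') {
    throw new Error(`expected procedure 'translate_column', got '${procedure}'`);
  }
  if (!ingredient) throw new Error('translate_column: missing "ingredient"');
  if (!options.column) throw new Error('translate_column: missing "options.column"');
  if (options.type !== 'ingredient') {
    throw new Error(`translate_column: unsupported dictionary type '${options.type}'`);
  }

  const dictionary = options.dictionary || {};
  for (const field of ['ingredient', 'key', 'value']) {
    if (!dictionary[field]) throw new Error(`translate_column: missing "options.dictionary.${field}"`);
  }

  return {
    ingredient,
    column: options.column,
    target_column: options.target_column || options.column, // defaults to `column`
    type: options.type,
    dictionary: {
      ingredient: dictionary.ingredient,
      key: Array.isArray(dictionary.key) ? dictionary.key : [dictionary.key], // always an array
      value: dictionary.value,
    },
  };
}

const parsed = parseTranslateColumn({
  procedure: 'translate_column',
  ingredient: 'wdi-countries',
  options: {
    column: 'country',
    type: 'ingredient',
    dictionary: { ingredient: 'gw-entities', key: 'name', value: 'geo' },
  },
});
console.log(parsed.target_column, parsed.dictionary.key); // → country [ 'name' ]
```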
Reason
The same entity may have different keys in different datasets. For example, the key for the country entity may be the ISO code in one dataset, while in another dataset it may be an alphanumeric form of the country name. When creating a new dataset, we may want to use the datapoints from one dataset and the country keys from the other, so we need a procedure that changes the country column in the datapoints to match the other dataset.
EDIT: I found that this is an automated version of `translate_column`. `translate_column` takes a dictionary as input, while `align` generates the dictionary on the fly. I'm not sure whether it's a good idea to combine this function into `translate_column`.

API
The `align` procedure translates the index column of an ingredient so that it aligns with another ingredient.

The `align` procedure accepts the following options:

- `ingredients`: the target ingredient to translate
- `search_cols`: the columns in the base ingredient to search for values
- `to_find`: the column in the target ingredient that contains the values to search for
- `to_replace`: the column to replace with the new values once the search has finished
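A minimal sketch of how `align` could build its dictionary on the fly and apply it, assuming rows are plain objects. The issue does not say which base-ingredient column supplies the new values, so the `baseKey` parameter is hypothetical, as is passing the target ingredient's rows (the `ingredients` option) directly as `targetRows`.

```javascript
// Hypothetical sketch of `align`: build a lookup on the fly from the base ingredient,
// then rewrite the target ingredient's column with the matching base keys.
// `baseKey` (the base ingredient's key column) is an assumption; the issue does not name it.
function align(baseRows, targetRows, { search_cols, to_find, to_replace, baseKey }) {
  // 1. Every non-empty value found in search_cols maps to that base row's key (first match wins).
  const lookup = new Map();
  for (const row of baseRows) {
    for (const col of search_cols) {
      const v = row[col];
      if (v !== undefined && v !== null && v !== '' && !lookup.has(v)) lookup.set(v, row[baseKey]);
    }
  }
  // 2. Search each target row's to_find value and replace the to_replace column.
  //    Assumption: rows without a match keep their original to_replace value.
  return targetRows.map((row) => {
    const match = lookup.get(row[to_find]);
    return match === undefined ? { ...row } : { ...row, [to_replace]: match };
  });
}

// Made-up example data.
const gwEntities = [
  { geo: 'chn', name: 'China', iso3: 'CHN' },
  { geo: 'swe', name: 'Sweden', iso3: 'SWE' },
];
const datapoints = [
  { country: 'CHN', year: 2000, population: 1 },
  { country: 'SWE', year: 2000, population: 2 },
  { country: 'XYZ', year: 2000, population: 3 },
];

const aligned = align(gwEntities, datapoints, {
  search_cols: ['name', 'iso3'],
  to_find: 'country',
  to_replace: 'country',
  baseKey: 'geo',
});
console.log(aligned.map((r) => r.country)); // → [ 'chn', 'swe', 'XYZ' ]
```

Searching both `name` and `iso3` lets the same call work whichever form a dataset uses for its country key; unmatched values such as `XYZ` are left as they are.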