new procedure: filter_item #14
Comments
Isn't this better called
No, this is not filtering columns in datapoints. It's filtering keys in the ingredient. Internally, the data of an ingredient is stored as a dictionary. If we have an ingredient like this:
- id: example-datapoint-ingredient
  dataset: example-dataset
  key: geo,time
  value: [concept1, concept2]
When we try to get data from this ingredient, the result will be a dictionary
Then we create a new ingredient where
But in final DDF output, we only want
Another way is to add a filter option to each procedure, so that in each step we can drop the concepts we don't need later. I think either way is ok, but I prefer to do this in a procedure.
p.s. Thanks for asking about
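To make the dictionary view concrete, here is a minimal sketch in plain Python. It assumes each concept in the ingredient maps to its own table keyed by (geo, time). The sample values, the derived name `concept3`, and the use of plain dicts in place of Chef's real data structures are illustrative assumptions, not the actual implementation.

```python
# Hypothetical data: each concept maps to its own {(geo, time): value} table.
# The numbers are made up for illustration.
ingredient_data = {
    "concept1": {("swe", 2000): 1.0, ("swe", 2001): 2.0},
    "concept2": {("swe", 2000): 10.0, ("swe", 2001): 20.0},
}

# A later step derives a new concept, assumed here to be the sum of the two.
ingredient_data["concept3"] = {
    key: ingredient_data["concept1"][key] + ingredient_data["concept2"][key]
    for key in ingredient_data["concept1"]
}

# Keep only the wanted (key, value) pairs, i.e. items, of the dictionary.
wanted = ["concept2", "concept3"]
final_data = {
    name: table for name, table in ingredient_data.items() if name in wanted
}

print(sorted(final_data))
```

In this picture the temporary `concept1` is dropped as a whole dictionary item, while the geo and time keys inside the remaining tables are left untouched.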
Ok, help me out, cause I'm not getting the difference. If we take your example:
- id: example-datapoint-ingredient
  dataset: example-dataset
  key: geo,time
  value: [concept1, concept2]
and put it in a table:
Then do the sum you said
And then remove concept1 because we only want concept2 and concept3
Didn't we just remove a column? I'm not sure how this is not removing a column. Maybe it's safe if we say you can only remove columns which are values, not keys? Is that why you don't call it remove a column, cause it's too broad?
I don't see how that follows from your example. I just see a column/(concept-used-as-value) be removed. Not sure what you meant about
oh, I see your point. There is one difference between the table you used above and the actual representation of data in the Chef module. In chef, I am using a dictionary:
{
concept2:
}
Because we usually call a (key, value) pair an item of a dictionary, I name this procedure
Using your representation of data table, I chose the dictionary structure because:
Okay, so I understand your implementation details, thanks for explaining! Always good to know the rationale behind it (and have it documented here at least).
So: I'd have just this function, we don't need both this one and #33.
Depending also on the outcome of #2, we can see what the right naming of this function is.
Thanks for the theory link, always good to know more about theories :) I think we can safely remove a dimension column only when it has just one unique value. That's what will
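The condition above can be sketched as a small check. This is only an illustration under assumptions: rows are modeled as plain dicts, and `droppable_dimensions` is a hypothetical helper name, not something that exists in Chef.

```python
def droppable_dimensions(rows, key_columns):
    """Return the key columns that hold exactly one unique value across all rows."""
    return [
        col for col in key_columns
        if len({row[col] for row in rows}) == 1
    ]


# Made-up datapoints: geo is constant, time varies.
rows = [
    {"geo": "swe", "time": 2000, "concept2": 10.0},
    {"geo": "swe", "time": 2001, "concept2": 20.0},
]

print(droppable_dimensions(rows, ["geo", "time"]))
```

Here only `geo` qualifies, since dropping `time` would leave two rows that can no longer be told apart.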
Reason
While running the recipe cooking procedures, some temporary data will be created. It should not appear in the final result, so we need a procedure to remove it.
API
filter_item takes two parameters:
ingredient: the ingredient to filter
items: a list of the items we should keep. All items not in this list will be dropped.
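A minimal sketch of how such a procedure could behave, assuming an ingredient is modeled as a plain dict with a `data` dictionary inside it. That model, and the choice to raise an error for unknown items, are assumptions made for illustration; the real Chef ingredient type and error handling may differ.

```python
def filter_item(ingredient, items):
    """Return a copy of `ingredient` whose data keeps only the listed items."""
    data = ingredient["data"]
    # Assumed behavior: asking to keep an item that does not exist is an error.
    missing = [name for name in items if name not in data]
    if missing:
        raise KeyError(f"items not found in ingredient: {missing}")
    kept = {name: data[name] for name in items}
    # Build a new ingredient so the input is left unmodified.
    return {**ingredient, "data": kept}


# Hypothetical ingredient with one temporary concept (concept1).
ingredient = {
    "id": "example-datapoint-ingredient",
    "key": "geo,time",
    "data": {"concept1": [1, 2], "concept2": [10, 20], "concept3": [11, 22]},
}

result = filter_item(ingredient, ["concept2", "concept3"])
print(list(result["data"]))
```

Returning a new ingredient instead of mutating the input keeps earlier steps of a recipe reusable, which seems a reasonable default for temporary data that other procedures may still read.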