Option to disable Zero-filling of aggregateBy #189
Comments
What would be your expected result of:

```java
.timestamps("2017-01-01", "2019-01-01", Interval.YEARLY)
.aggregateByTimestamp()
.flatMap(snap -> {
    if (!snap.getTimestamp().equals(year2018)) {
        return Collections.singleton(rnd.nextBoolean()
            ? new Agg("A", "Y")
            : new Agg("B", "X"));
    }
    return Collections.emptyList();
})
.aggregateBy(Agg::level1)
.aggregateBy(Agg::level2)
.count()
```
in the current version the result would be:
sorry, I don't understand the question

IMO, this result is fine.
there was a branch for this, which unfortunately never made it into a PR, but the central idea could be picked up again: https://github.com/GIScience/oshdb/compare/optional-zerofilling
maybe the wording could be improved, because I'm not sure everyone understands what we mean by zero-filling. Maybe the term "sparse result" would be more intuitive? What do you think? Btw, we would also need to specify what should happen if a user manually specifies (at least) one
great you picked this up again! Yes, I like the wording. I tend more towards throwing an exception, but both solutions are fine.
Problem
The MapAggregator class provides zero-filled results. This may generate a lot of undesired "zero" data entries.
Describe the solution you'd like
Zero-filling should be optional, or at least possible to disable via a flag. For example:
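To illustrate the proposed semantics, here is a minimal, self-contained sketch. The method name `aggregate` and the `zerofill` flag are hypothetical illustrations, not part of the actual OSHDB API; the idea is simply that the caller chooses between the current zero-filled cross product and a sparse result:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ZerofillSketch {
    // Given the sparse counts actually produced by an aggregation and the full
    // set of expected group keys, either fill missing keys with 0 (current
    // behaviour) or return the sparse map unchanged (proposed opt-out).
    static Map<String, Long> aggregate(Map<String, Long> sparseCounts,
                                       Set<String> allKeys,
                                       boolean zerofill) {
        if (!zerofill) {
            return sparseCounts;
        }
        Map<String, Long> result = new LinkedHashMap<>();
        for (String key : allKeys) {
            result.put(key, sparseCounts.getOrDefault(key, 0L));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> sparse = Map.of("A/Y", 3L);
        Set<String> allKeys = Set.of("A/X", "A/Y", "B/X", "B/Y");
        // zero-filled: 4 entries, three of them zero
        System.out.println(aggregate(sparse, allKeys, true));
        // sparse: only the single nonzero entry
        System.out.println(aggregate(sparse, allKeys, false));
    }
}
```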
Additional context
When working with multiple `aggregateBy` steps, and/or relatively sparse datasets (e.g. aggregation by user ids), it's not uncommon that most entries in the zero-filled result are zero. In such cases, these zero entries are typically not of interest, and could in the worst case be a significant waste of CPU time and memory.