
Option to disable Zero-filling of aggregateBy #189

Open
SlowMo24 opened this issue Jul 9, 2019 · 7 comments
Labels
enhancement New feature or request priority:low Should be quite a far way down on the agenda user experience Enhances the usability of OSHDB

Comments

@SlowMo24
Contributor

SlowMo24 commented Jul 9, 2019

Problem
The MapAggregator class always returns zero-filled results, which can produce a large number of unwanted entries with the value zero.

Describe the solution you'd like
Zero-filling should be optional, e.g. disableable via a flag:

mapaggregator.zerofill(false)

Additional context
When working with multiple aggregateBy steps and/or relatively sparse datasets (e.g. aggregation by user IDs), it is not uncommon for most entries in the zero-filled result to be zero. Such zero entries are typically not of interest and can, in the worst case, waste a significant amount of CPU time and memory.
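A minimal plain-Java sketch of what such a flag could switch on and off, assuming zero-filling means adding a 0 entry for every combination of the keys observed at each aggregation level. The class and method names (`ZerofillSketch`, `count`) are hypothetical and not part of the OSHDB API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration, not the OSHDB API: counts two-level keys and,
// if requested, zero-fills every combination of the keys observed per level.
public class ZerofillSketch {

  /** Counts (level1, level2) pairs; with zerofill, unseen combinations get 0. */
  static Map<String, Integer> count(List<String[]> records, boolean zerofill) {
    Map<String, Integer> result = new TreeMap<>();
    Set<String> level1Keys = new TreeSet<>();
    Set<String> level2Keys = new TreeSet<>();
    for (String[] r : records) {
      level1Keys.add(r[0]);
      level2Keys.add(r[1]);
      result.merge(r[0] + "&" + r[1], 1, Integer::sum);
    }
    if (zerofill) {
      // cartesian product of all keys seen at each level
      for (String k1 : level1Keys) {
        for (String k2 : level2Keys) {
          result.putIfAbsent(k1 + "&" + k2, 0);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String[]> records = List.of(
        new String[] {"A", "Y"}, new String[] {"A", "Y"}, new String[] {"B", "X"});
    System.out.println(count(records, true));  // {A&X=0, A&Y=2, B&X=1, B&Y=0}
    System.out.println(count(records, false)); // {A&Y=2, B&X=1}
  }
}
```

With two levels of two keys each, half of the zero-filled entries are already zero here; every additional level multiplies the number of combinations.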

@tyrasd tyrasd added the user experience Enhances the usability of OSHDB label Feb 5, 2020
@rtroilo
Member

rtroilo commented Feb 10, 2020

What would be your expected result of the following?

.timestamps("2017-01-01", "2019-01-01", Interval.YEARLY)
.aggregateByTimestamp()
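.flatMap(snap -> {
  // year2018, rnd and Agg are defined elsewhere; snapshots from 2018 yield nothing
  if (!snap.getTimestamp().equals(year2018)) {
    // any other snapshot yields a single random Agg: either (A, Y) or (B, X)
    return Collections.singleton(rnd.nextBoolean()
        ? new Agg("A", "Y")
        : new Agg("B", "X"));
  }
  return Collections.emptyList();
})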
.aggregateBy(Agg::level1)
.aggregateBy(Agg::level2)
.count()

@rtroilo
Member

rtroilo commented Feb 10, 2020

In the current version, the result would be:

2017-01-01&A&X->0
2017-01-01&A&Y->6
2017-01-01&B&X->7
2017-01-01&B&Y->0
2018-01-01&A&X->0
2018-01-01&A&Y->0
2018-01-01&B&X->0
2018-01-01&B&Y->0
2019-01-01&A&X->0
2019-01-01&A&Y->32
2019-01-01&B&X->24
2019-01-01&B&Y->0

@SlowMo24
Contributor Author

Sorry, I don't understand the question.

@tyrasd
Member

tyrasd commented Feb 11, 2020

IMO, this result is fine.
If zero-filling were disabled (as per this feature request), I would expect the final result to contain only entries with values > 0.
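To make that expectation concrete, here is a plain-Java sketch that turns the zero-filled result listed above into a sparse one by keeping only entries with a value > 0. The helper `sparse` and the class name are hypothetical, and the string keys only mimic the `timestamp&level1&level2` notation of the listing; this is not how OSHDB represents its result keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of OSHDB: drops all zero entries from a
// zero-filled result, preserving the original key order.
public class SparseResultSketch {

  static <K> Map<K, Integer> sparse(Map<K, Integer> zerofilled) {
    Map<K, Integer> result = new LinkedHashMap<>();
    zerofilled.forEach((key, value) -> {
      if (value > 0) {
        result.put(key, value);
      }
    });
    return result;
  }

  public static void main(String[] args) {
    // The zero-filled result from the example above, keyed as timestamp&level1&level2.
    String[] keys = {
      "2017-01-01&A&X", "2017-01-01&A&Y", "2017-01-01&B&X", "2017-01-01&B&Y",
      "2018-01-01&A&X", "2018-01-01&A&Y", "2018-01-01&B&X", "2018-01-01&B&Y",
      "2019-01-01&A&X", "2019-01-01&A&Y", "2019-01-01&B&X", "2019-01-01&B&Y"};
    int[] values = {0, 6, 7, 0, 0, 0, 0, 0, 0, 32, 24, 0};
    Map<String, Integer> zerofilled = new LinkedHashMap<>();
    for (int i = 0; i < keys.length; i++) {
      zerofilled.put(keys[i], values[i]);
    }
    System.out.println(sparse(zerofilled));
    // {2017-01-01&A&Y=6, 2017-01-01&B&X=7, 2019-01-01&A&Y=32, 2019-01-01&B&X=24}
  }
}
```

Only 4 of the 12 entries survive; in particular, the year 2018 disappears entirely because no data was produced for it.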

@tyrasd
Member

tyrasd commented Jul 23, 2021

There was a branch for this which unfortunately never made it into a PR, but its central idea could be picked up again:

https://github.com/GIScience/oshdb/compare/optional-zerofilling

@tyrasd
Member

tyrasd commented Jul 23, 2021

Maybe the wording could be improved, because I'm not sure everyone understands what we mean by "zerofill". Would a term like "sparse result" be more intuitive? What do you think?

Btw, we would also need to specify what should happen if a user manually specifies zerofill keys for at least one aggregateBy and still requests the non-zero-filled output, since the two requests contradict each other. For example, in mapReducer.aggregateBy(Agg::level1, EnumSet.of(A, B)).zerofill(false).count(), should zerofill(false) take precedence over the manually specified zerofill keys (A, B)? Alternatively, an exception could be thrown. I'd prefer the first solution at the moment.

@SlowMo24
Contributor Author

Great that you picked this up again! Yes, I like the wording mapaggregator.sparseResult(true), with .sparseResult(false) being the default (the current behaviour).

I lean more towards throwing an exception, but both solutions are fine.
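The two options for resolving the conflict between explicit zerofill keys and a sparse result (let the sparse setting take precedence, or throw an exception) could be sketched as a small validation step. This is plain Java under assumed names: `ZerofillConflictSketch`, `ConflictPolicy` and `keysToZerofill` are hypothetical, and the `sparseResult` parameter only follows the naming proposed above; none of this is an existing OSHDB method.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch, not the OSHDB API: decides which explicitly requested keys
// of one aggregateBy level get zero-filled when a sparse result is also requested.
public class ZerofillConflictSketch {

  enum ConflictPolicy { SPARSE_TAKES_PRECEDENCE, THROW }

  enum Agg { A, B }

  static <K> Set<K> keysToZerofill(Set<K> explicitKeys, boolean sparseResult,
      ConflictPolicy policy) {
    if (!sparseResult) {
      return explicitKeys; // default: the explicit keys are zero-filled as requested
    }
    if (!explicitKeys.isEmpty() && policy == ConflictPolicy.THROW) {
      throw new IllegalStateException(
          "explicit zerofill keys contradict sparseResult(true): " + explicitKeys);
    }
    return Set.of(); // sparse result wins: no key is zero-filled
  }

  public static void main(String[] args) {
    Set<Agg> explicit = EnumSet.of(Agg.A, Agg.B);
    System.out.println(
        keysToZerofill(explicit, true, ConflictPolicy.SPARSE_TAKES_PRECEDENCE)); // []
    try {
      keysToZerofill(explicit, true, ConflictPolicy.THROW);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Precedence keeps existing pipelines running silently, while the exception surfaces the contradiction at the call site; which is preferable is exactly the open question in this thread.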

@Hagellach37 Hagellach37 added enhancement New feature or request priority:low Should be quite a far way down on the agenda labels Oct 20, 2022
Projects
None yet
Development

No branches or pull requests

4 participants