Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GeoTile and GeoHash Grid aggregations on GeoShapes. #5589

Merged
merged 3 commits into from
Jan 26, 2023

Conversation

navneet1v
Copy link
Contributor

@navneet1v navneet1v commented Dec 16, 2022

Description

  1. Add GeoTile and GeoHash Grid aggregations on GeoShapes.
  2. Added IT test for the GeoTile Aggregation.
  3. Added IT and Unit tests for new GeoShape Aggregations.

Issues Resolved

#4072 , #4071

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Copy link
Member

@andrross andrross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple minor comments, but as I'm not a subject matter expert here I unfortunately can't really give a good review on the core of this change.

* @return A list of the new registrar functions
*/
@Override
public List<Consumer<ValuesSourceRegistry.Builder>> getAggregationExtentions() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify why this registration has been removed? I do not 100% understand the consequences yet, but the comment documents the intention pretty well why it is needed: GEO_SHAPE is defined in core.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was removed because I move it to GeoBoundsAggregatorFactory.java. When I was working on GeoBounds Aggregation I was thinking that to add new aggregations I need to put it in getAggregationExtentions(). But while I am working on GeoTile and GeoHash I realized that these aggregations can be added in *AggregatorFactory.class . The PR contains that change.

I hope this clarifies.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, yeah, I saw that, really unsure about the consequences but the change (AggregatorFactory) looks good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the changes via IT and also manual testing it works. Also, AggregatorFactory is used to register the aggregation. Given that this is same aggregation but on a different SourceValue type I don't see any issue. The only thing I can think of is, is this the right place from a code abstraction standpoint. I will wait for @nknize to comment on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But as per my understanding of the Aggregation framework, I think this looks good, as aggregations are present in same module/plugin. So we are good. getAggregationExtentions : function is provided to extend the aggregations by the plugin for a different Source value type but having same aggregation name defined in core. Doc link: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/plugins/SearchPlugin.java#L174-L176

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      2 org.opensearch.geo.search.aggregations.bucket.GeoTileGridIT.testGeoShapes
      1 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testCancelPrimaryAllocation

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
Copy link

Codecov Report

Merging #5589 (719ced2) into main (db30439) will increase coverage by 0.06%.
The diff coverage is 4.34%.

@@             Coverage Diff              @@
##               main    #5589      +/-   ##
============================================
+ Coverage     70.66%   70.72%   +0.06%     
- Complexity    58526    58687     +161     
============================================
  Files          4769     4772       +3     
  Lines        280686   280776      +90     
  Branches      40533    40547      +14     
============================================
+ Hits         198333   198582     +249     
+ Misses        66093    65878     -215     
- Partials      16260    16316      +56     
Impacted Files Coverage Δ
.../main/java/org/opensearch/geo/GeoModulePlugin.java 100.00% <ø> (ø)
...cket/composite/GeoTileGridValuesSourceBuilder.java 91.07% <ø> (ø)
.../bucket/geogrid/GeoHashGridAggregationBuilder.java 100.00% <ø> (ø)
.../bucket/geogrid/GeoTileGridAggregationBuilder.java 100.00% <ø> (ø)
...ations/bucket/geogrid/cells/BoundedCellValues.java 100.00% <ø> (ø)
...ggregations/bucket/geogrid/cells/CellIdSource.java 69.23% <ø> (ø)
.../aggregations/bucket/geogrid/cells/CellValues.java 100.00% <ø> (ø)
...ons/bucket/geogrid/cells/GeoShapeCellIdSource.java 0.00% <0.00%> (ø)
...tions/bucket/geogrid/cells/GeoShapeCellValues.java 0.00% <0.00%> (ø)
...ions/bucket/geogrid/cells/UnboundedCellValues.java 100.00% <ø> (ø)
... and 533 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@navneet1v navneet1v requested a review from nknize January 19, 2023 03:29
@navneet1v
Copy link
Contributor Author

@nknize can you please take a look on the PR?

Copy link
Collaborator

@nknize nknize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful!!! One little nitpick.. can be done in a follow up PR. Thanks!

* methods which can be used across different bucket aggregations. If there is any common code that can be used
* across other integration test too then this is not the class. Use {@link GeoModulePluginIntegTestCase}
*/
public abstract class AbstractBucketAggregationIntegTest extends GeoModulePluginIntegTestCase {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe AbstractGeoBucketAggregationIntegTest to avoid confusion w/ other tests like AbstractBucketMetricsTestCase?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have fixed it. @nknize can you please merge in this PR after squashing the commits.

Signed-off-by: Navneet Verma <navneev@amazon.com>
…apes

Signed-off-by: Navneet Verma <navneev@amazon.com>
…t index preparation code.

Signed-off-by: Navneet Verma <navneev@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockWhenAllNodesExceededHighWatermark

@nknize nknize merged commit ee58457 into opensearch-project:main Jan 26, 2023
@nknize nknize added enhancement Enhancement or improvement to existing feature or request Geospatial backport 2.x Backport to 2.x branch labels Jan 26, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-5589-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ee58457bd5a7313bab201edd38195902f859027c
# Push it to GitHub
git push --set-upstream origin backport/backport-5589-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-5589-to-2.x.

@navneet1v
Copy link
Contributor Author

This PR is for 2.7 release. Will add this PR in 2.x branch once 2.6 is released

@andrross
Copy link
Member

andrross commented Feb 8, 2023

This PR is for 2.7 release. Will add this PR in 2.x branch once 2.6 is released

@navneet1v Just curious, why can't you backport this now for the next release? Is the code here somehow incomplete?

@navneet1v
Copy link
Contributor Author

The code is not incomplete. This feature has a UI part also which is not completed. Hence I am not merging it in 2.x Once it is merged in 2.x then it has to go in next release which is 2.6

@andrross
Copy link
Member

andrross commented Feb 8, 2023

I understand we might hold of on announcing the feature until the UI part is ready, but if the server portion is complete why not merge it when it is ready? In general early and continuous integration is better and it is one less thing to keep track of. It is probably not a big deal in this case but may be a useful example as a precedent.

@navneet1v
Copy link
Contributor Author

Help me understand this, if a feature is required to be launched in 2.7 can i merge it early like before 2.6 launch in 2.x branch?

@andrross
Copy link
Member

andrross commented Feb 9, 2023

From a purely technical point of view, if a backward compatible change is committed to main then it should be backported to 2.x. The risk of not doing so is that if another change comes in that touches the same part of code as this change did, then it won't be able to cleanly backport because of the divergent branches.

All committed code should be releasable. I believe the code here is in fact releasable (technically), but is being held back for other reasons, and that is what I'm pushing on. As a thought experiment, if we decided to scrap the 2.6 release and instead release 3.0, what would you do? Would you revert this code?

@navneet1v
Copy link
Contributor Author

From a purely technical point of view, if a backward compatible change is committed to main then it should be backported to 2.x. The risk of not doing so is that if another change comes in that touches the same part of code as this change did, then it won't be able to cleanly backport because of the divergent branches.

What you are saying about this is correct. If conflicting changes come it won't be a clean backport.

But here is my question with the current branching strategy if we are working on a change that requires multiple commits are we not supposed to merge in main? Where each commit can be built separately?

If I merge the changes for a release which is going to happen like 3 months down the road putting it in 2.x branch will make the cutting of 2.6 branches difficult. We need to revert some changes from 2.x which will add more burden on the release owner.

The strategy I am implementing here is keep on merging changes in main branch and only backport a feature when all other parts are fully ready.

If let's say I merge a big feature in 1 single commit, then approving that PR will be very difficult as it might contain like 100 files in it. I also cannot create a feature branch where I can put all the approved commits and then finally merge in main and 2.x because I don't have permission to create feature branch or keep it in sync with main.

On scrapping of all next 2.x releases and releasing 3.0 using main case, if that happens and we cut a 3.x branch. I am hoping maintainers will provide enough time to revert the commits which are not intended to be released.

Moreover, all this is my understanding based on what we do in our other repos that follow same branching strategy.

I am adding @dblock to comment here. As he originally proposed the branching strategy. He might have some thoughts here.

@andrross
Copy link
Member

andrross commented Feb 9, 2023

To be clear, the alternative I'm advocating here is that we should backport this PR and these server-side changes should be available to users in 2.6. Why is this not acceptable?

I am absolutely not advocating for developing a feature in a single big commit. But the incremental commits along the way should be releasable.

heemin32 pushed a commit to heemin32/OpenSearch that referenced this pull request Jun 27, 2023
…roject#5589)

Src files for GeoTile and GeoHash Aggregations on GeoShape with integration
tests.

Signed-off-by: Navneet Verma <navneev@amazon.com>
heemin32 pushed a commit to heemin32/OpenSearch that referenced this pull request Jun 27, 2023
…roject#5589)

Src files for GeoTile and GeoHash Aggregations on GeoShape with integration
tests.

Signed-off-by: Navneet Verma <navneev@amazon.com>
heemin32 pushed a commit to heemin32/OpenSearch that referenced this pull request Jun 28, 2023
…roject#5589)

Src files for GeoTile and GeoHash Aggregations on GeoShape with integration
tests.

Signed-off-by: Navneet Verma <navneev@amazon.com>
heemin32 pushed a commit to heemin32/OpenSearch that referenced this pull request Jun 28, 2023
…roject#5589)

Src files for GeoTile and GeoHash Aggregations on GeoShape with integration
tests.

Signed-off-by: Navneet Verma <navneev@amazon.com>
andrross pushed a commit that referenced this pull request Jun 30, 2023
* Add GeoTile and GeoHash Grid aggregations on GeoShapes. (#5589)

Src files for GeoTile and GeoHash Aggregations on GeoShape with integration
tests.

Signed-off-by: Navneet Verma <navneev@amazon.com>

* [opensearch-project/geospatial#212] Fixing the IT for GeoTilesAggrega… (#6120)

Fixing the IT for GeoTilesAggregation.

Signed-off-by: Navneet Verma <navneev@amazon.com>

* [#6187, #6222] Fixing the GeoShapes GeoHash and GeoTile Aggregations Integration tests. (#6242)

Changes done:
* Fixed the ArrayIndexOutOfBoundsException.
* Reduced the precision for GeoShapes Aggregation IT testing.

Signed-off-by: Navneet Verma <navneev@amazon.com>

* [#7101] Fixing the GeoTileIT#testMultivaluedGeoPointsAggregation test case. (#7166)

The issue was happening because we encode the GeoPoint as long and error comes in the precision due to that encoding. The error was not taken care while generating the exepected tiles count for execpected output.

Signed-off-by: Navneet Verma <navneev@amazon.com>

---------

Signed-off-by: Navneet Verma <navneev@amazon.com>
Signed-off-by: Heemin Kim <heemin@amazon.com>
Co-authored-by: Navneet Verma <navneev@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch enhancement Enhancement or improvement to existing feature or request Geospatial
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants