Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make GeoHash coordinate conversions ~40-100x faster #348

Merged
merged 38 commits into from
May 21, 2024

Conversation

PaulusParssinen
Copy link
Contributor

@PaulusParssinen PaulusParssinen commented Apr 30, 2024

Note

Credits for the quantization approach to https://mmcloughlin.com/posts/geohash-assembly (by @mmcloughlin)
and de-quantization method to https://github.com/georust/geohash, big thank you for everyone contributing to both efforts!

This PR originally was meant to be simply tiny optimization to avoid unnecessary heap allocations on the coordinate conversions but I had change of plans when I discovered an article about much faster (de)quantization method by Michael McLoughlin and this ended up being GeoHash class rewrite. Please see his great article and explanation.

Much of the commits after 564891a is me discovering new exciting details about IEEE-754 floating-point rounding, confusing reference implementations (it turns out, there's no single correct way to reference a location on earth 😅) and other fun Geohash corner-cases.

However, more notably there's tiny error in the quantization approach described in the McLoughing's excellent article which I describe in this comment and fix in f364a96c..b4ec75c. Gist of it is that in IEEE-754 double precision format, the exponent assumption of 1023 does not hold for 2.0, which the maximum coordinate values get rounded to.

There should be no other behavioral differences than the last character of textual representation, which one of the existing tests already ignored understandably due to nuances around the precision of it. The textual representation is should simply be base-32 encoded (custom geohash alphabet) of the integer that represents a specific GeoHash where more characters means more bits of precision. Every encoded character gives 5-bits of precision. Very simple right! I wish it was

Garnet (and Redis) chose 52-bit precision. Why? Because we want to store the geohash integer in a sorted set, which allows us to utilize the geohash structure and do really efficient range queries. Now because we want to reuse the existing sorted set we need to safely store the geohash integer in IEEE-754 double precision score which we can do that safely when our integer is in range [(-2)^53. 2^53].

Now, we can still encode this 52-bit precision integer in the base-32 alphabet just fine. The remaining 55-52=3 bits will be zero and that's what Garnet has done, but Redis seems to zero out the last 2 bits and output the last character of the textual representation as 0.

Redis made another choice too, which is that they chose to follow EPSG:3875 where:

  • Valid longitudes are from -180 to 180 degrees.
  • Valid latitudes are from -85.05112878 to 85.05112878 degrees.

This makes sense as the poles cause all sorts of problems in GIS applications.

Confusing history behind Web Mercator

image

These limits however do not apply to "standard" geohash format which valid latitudes are [-90, 90]. This difference also adds nuance to geohash implementations. Garnet currently accepts coordinates per the geohash limitations.

This PR also introduces AsciiUtils in Garnet.common as place to share common ASCII manipulation logic (acts more like polyfill for Ascii class in .NET 6 and a place where we can optimize further than what BCL's currently does).

This PR is strictly focused on GeoHash class alone. I'll remove more unnecessary allocations in the actual RESP protocol parsing & encoding in follow-up. There's some vectorization opportunities here and the distance math can be definitely improved too.

Benchmarks (Tiger Lake, i5-1135G7)

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
11th Gen Intel Core i5-1135G7 2.40GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 8.0.204
  [Host] : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  .NET 6 : .NET 6.0.29 (6.0.2924.17105), X64 RyuJIT AVX2
  .NET 8 : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Server=True  

main

Method Job Mean Error StdDev Gen0 Allocated
GeoToLongValue .NET 6 278.3 ns 0.51 ns 0.43 ns 0.0191 80 B
GeoToLongValue .NET 8 260.0 ns 0.85 ns 0.80 ns 0.0191 80 B
GetCoordinatesFromLong .NET 6 429.1 ns 1.30 ns 1.22 ns 0.0801 336 B
GetCoordinatesFromLong .NET 8 213.8 ns 0.85 ns 0.75 ns 0.0801 336 B
GetGeoHashCode .NET 6 810.1 ns 4.79 ns 4.24 ns 0.2537 1064 B
GetGeoHashCode .NET 8 619.1 ns 1.88 ns 1.76 ns 0.2537 1064 B

PR

Method Job Mean Error StdDev Gen0 Allocated
GeoToLongValue .NET 6 5.556 ns 0.0156 ns 0.0146 ns - -
GeoToLongValue .NET 8 1.888 ns 0.0156 ns 0.0130 ns - -
GetCoordinatesFromLong .NET 6 5.074 ns 0.0038 ns 0.0031 ns - -
GetCoordinatesFromLong .NET 8 1.215 ns 0.0038 ns 0.0034 ns - -
GetGeoHashCode .NET 6 15.944 ns 0.2487 ns 0.2077 ns 0.0014 48 B
GetGeoHashCode .NET 8 15.933 ns 0.0331 ns 0.0294 ns 0.0012 48 B

Benchmarks (Zen 2, Ryzen 7 3700X)

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4170/22H2/2022Update)
AMD Ryzen 7 3700X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.3.24204.13
  [Host] : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX2
  .NET 6 : .NET 6.0.27 (6.0.2724.6912), X64 RyuJIT AVX2
  .NET 8 : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX2

Server=True

main

Method Job Mean Error StdDev Gen0 Allocated
GeoToLongValue .NET 6 290.1 ns 3.08 ns 2.88 ns 0.0095 80 B
GeoToLongValue .NET 8 269.4 ns 2.14 ns 1.90 ns 0.0095 80 B
GetCoordinatesFromLong .NET 6 510.0 ns 9.71 ns 9.54 ns 0.0401 336 B
GetCoordinatesFromLong .NET 8 195.3 ns 3.94 ns 3.49 ns 0.0401 336 B
GetGeoHashCode .NET 6 877.3 ns 6.87 ns 6.42 ns 0.1268 1064 B
GetGeoHashCode .NET 8 525.6 ns 6.46 ns 6.04 ns 0.0982 824 B

PR

Method Job Mean Error StdDev Gen0 Allocated
GeoToLongValue .NET 6 5.646 ns 0.0306 ns 0.0255 ns - -
GeoToLongValue .NET 8 5.829 ns 0.0537 ns 0.0502 ns - -
GetCoordinatesFromLong .NET 6 5.507 ns 0.0583 ns 0.0545 ns - -
GetCoordinatesFromLong .NET 8 5.928 ns 0.0423 ns 0.0396 ns - -
GetGeoHashCode .NET 6 16.286 ns 0.0968 ns 0.0808 ns 0.0003 48 B
GetGeoHashCode .NET 8 15.721 ns 0.1790 ns 0.1587 ns 0.0003 48 B
Zen 2 PDEP/PEXT "erratum" in action
Method Job Mean Error StdDev Allocated
GeoToLongValue .NET 6 w/ UsePdepPext 64.98 ns 0.132 ns 0.103 ns -
GeoToLongValue .NET 8 w/ UsePdepPext 64.95 ns 0.383 ns 0.359 ns -
GetCoordinatesFromLong .NET 6 w/ UsePdepPext 62.87 ns 0.137 ns 0.128 ns -
GetCoordinatesFromLong .NET 8 w/ UsePdepPext 62.92 ns 0.090 ns 0.084 ns -

See that 48B heap allocation for the final string in GetGeoHashCode, we don't need that :)

@TalZaccai TalZaccai requested a review from yrajas April 30, 2024 22:40
@PaulusParssinen PaulusParssinen marked this pull request as draft May 1, 2024 20:02
@PaulusParssinen
Copy link
Contributor Author

PaulusParssinen commented May 2, 2024

CanEncodeAndDecodeCoordinates

All values computed values printed with ToString("F16") (except the updated row, which is F14..)

Label Latitude Longitude Lat Error (vs. input) Lon Error (vs. input)
Test input 30.5388942218 104.0555758833 - -
Test IEEE-754 repr. 30.5388942217999997 104.0555758832999942 - -
main 30.5388928949832916 104.0555772185325623 0.0000013268167081 -0.0000013352325681
PR (1b59a0a) 30.5388915538787842 104.0555745363235474 0.0000026679212155 0.0000013469764468
PR (calculate center) 30.538892894983292 104.05557721853256 0.0000013268167081 -0.0000013352325681

edit: and I finally realize, the de-quantized coordinate represents the minimum value and we get the same coordinate as main branch which represented the center of this error-bounding-box by:
$(\text{{minLatitude}} + \frac{{\text{{latitudeError}}}}{2.0}, \text{{minLongitude}} + \frac{{\text{{longitudeError}}}}{2.0})$

Funny in hindsight as I was earlier wondering why those error values were multiplies of each other..

@PaulusParssinen
Copy link
Contributor Author

PaulusParssinen commented May 2, 2024

Premature optimization is root of all evil fun!

Few test failures left, need to dive deeper into some rounding corner-cases.

The diff looks incomprehensible to review, it's pretty much a rewrite 😅

@PaulusParssinen
Copy link
Contributor Author

PaulusParssinen commented May 3, 2024

Added USE_PDEP_PEXT to allow those consumers who want to extract every bit of perf. for their Geo commands (and who know their CPU has the non-microcode-emulated version). SharpLab (Entire encoding/decoding is inlined into one method 😋)

@PaulusParssinen
Copy link
Contributor Author

PaulusParssinen commented May 4, 2024

adds "Investigate LightClient implementation" to todo list. It seems run some tests until timeout if it receives the expected amount of bytes

For some reason, I remembered/thought that the hash integer was bit-cast i.e. BitConverter.DoubleToUInt64Bits to the sorted set instead of just cast. That explains the 52-bits and I'll cancel the precision change plans..

I just realized the original implementation was not actually redis "52-bit" precision either 🤔 Redis seems to always "clear" out the last 2 bits for the textual representation output, so it always returns 0 as last character. Before this PR, there was already difference between the textual representation between Garnet and Redis.

@PaulusParssinen
Copy link
Contributor Author

PaulusParssinen commented May 6, 2024

There's small corner-case error in the McLoughin's quantization trick on which this PR is based.

The statement

Since y is in the range [1,2], the largest power of two less than or equal to y is 1.0 and the exponent field will always be 1023.

is wrong for the upper-bound of the range. Exponent bias is 1023 for [1.0, 2.0) and exponent for 2.0 flips over to 1024. This means that 90.0 and 180.0 that are clamped to the range maximum 2.0, have their signicand (and subsequently, the geohash integer repesentation) zeroed and dequantization will give equivalent output as -90.0 and -180.0 respectively.

There's a lot of problems with GeoHash in general around the corner values where implementations differ..

Humor

GCQ2uv5bwAEQgju

@PaulusParssinen
Copy link
Contributor Author

PaulusParssinen commented May 7, 2024

And to add to then confusion, the original latitude limits are "wrong". According EPSG:3857 (also known as EPSG:3785, also known as EPSG:900913.. etc.) the latitude limits are -85.05112878 to 85.05112878.. not -90 to 90. Fun! I'm starting to see why the implementation are all over the place as the GeoHash standard uses different limits than the most common web projection.

I'll won't touch the existing latitude limit in this PR.

@badrishc
Copy link
Contributor

badrishc commented May 8, 2024

Interesting stuff! Glad to see you are making progress here. :)

@PaulusParssinen PaulusParssinen marked this pull request as ready for review May 8, 2024 01:06
@PaulusParssinen
Copy link
Contributor Author

PaulusParssinen commented May 8, 2024

Added benchmarks from my laptop which has AVX512F by which we guard the PEXT/PDEP path. This made both encoding/decoding of geohashes integers go from ~5.5ns to ~1.5ns which was more than I expected. It may be worthwhile to enable more platforms with static readonly Zen 2 check by doing X86Base.CpuId(0, 0) and X86Base.CpuId(1, 0) and check if AuthenticAMD && family < 0x19 to guard Pre-Zen 3.

Also wrote little bit more about the PR to its description.

@PaulusParssinen
Copy link
Contributor Author

PaulusParssinen commented May 8, 2024

Added more background to the PR description about the nuances of geohash..

@PaulusParssinen PaulusParssinen changed the title Optimize GeoHash coordinate conversions Make GeoHash coordinate conversions ~40-150x faster May 9, 2024
@PaulusParssinen PaulusParssinen changed the title Make GeoHash coordinate conversions ~40-150x faster Make GeoHash coordinate conversions ~40-100x faster May 9, 2024
Garnet.sln Outdated Show resolved Hide resolved
* Keep them in registers (or atmost spill to stack)
* Also some other misc. simplification
* I will return to this method in a follow-up
* Abuse IEEE-754 binary representation in the encoding too
* Implement Z-curve decoding more efficiently
* Add GetGeoErrorByPrecision to calculate the error at given bit precision (52 for us).
* Optimize base32 string encoding.
* Test still fail. Will need to investigate more.
@PaulusParssinen
Copy link
Contributor Author

PaulusParssinen commented May 21, 2024

Out of curiosity, what required a force-push? 😅 (This broke the history experience in GitHub UI. I have still this branch in it's pre-force-push sync state if we want to restore that.)

@yrajas
Copy link
Contributor

yrajas commented May 21, 2024

Out of curiosity, what required a force-push? 😅 (This broke the history experience in GitHub UI. I have still this branch in it's pre-force-push sync state if we want to restore that.)

No changes, actually. I was updating the branch with rebase option to bring in latest changes from main.

@yrajas yrajas merged commit e0027d1 into microsoft:main May 21, 2024
23 checks passed
@PaulusParssinen PaulusParssinen deleted the opt-geohash branch May 22, 2024 09:19
chyin6 pushed a commit to jusjin-org/garnet that referenced this pull request Jul 2, 2024
* Refactor Haversine-distance calculation

* See https://github.com/dotnet/runtime/blob/333fb71d54bd84256e740aa08f8b836d4cd71d98/src/libraries/System.Private.CoreLib/src/System/Numerics/ITrigonometricFunctions.cs#L65-L113

* Do not spill the coordinate ranges to heap when encoding/decoding

* Keep them in registers (or atmost spill to stack)
* Also some other misc. simplification

* Add shared AsciiUtils to Garnet.common to simplify the unit conversions

* Slightly adjust GeoHash tests

* dotnet format

* Restore flag bit for  GetGeoHashCode for now

* I will return to this method in a follow-up

* Optimize GeoToLongValue to use float quantization trick and do the Z-curve encoding more directly

* Credits to https://mmcloughlin.com/posts/geohash-assembly for the quantization approach!

* Further optimize Geohash & Base32 encoding and decoding

* Abuse IEEE-754 binary representation in the encoding too
* Implement Z-curve decoding more efficiently
* Add GetGeoErrorByPrecision to calculate the error at given bit precision (52 for us).
* Optimize base32 string encoding.
* Test still fail. Will need to investigate more.

* typo

* Further clarify the quantization method

* Calculate the center of bounding-box

* Clarify dequantization method

* Make the bounding-box center fix-up use constants

* Add more test-data and restore original epsilon calc.

* tests: sqc8b49rnyt -> sqc8b49rnys

* tests: nsqdtr74hyu1 -> nsqdtr74hyu0

* Exponent is 1023, not 0

* Add USE_PDEP_PEXT switch for PDEP/PEXT Z-curve en/decode

* And define it to run the tests with it

* Use FusedMultiplyAdd to do (x+y)*z in one op to avoid intermediate rounding

* More accurate and faster, what not to love

* Remove #define USE_PDEP_PEXT, tests passed

* Move GeoHash specific unit tests own file

* Mark Z-curve encode/decode with MethodImpl.AI

* Little extra encouragement to JIT.

* Add GeoHash specific benchmarks

* oops

* format

* Add UsePdepPext build switch.

* And add GeoHashBenchmark job with it enabled

* Avoid shifting by using already shifted mask for the PDEP/PEXT

* Use AVX512 support to guard PDEP/PEXT usage.

* Add MemoryDiagnoser back

* Fix incorrect quantization approach

* Make it little bit more clear what happens in the corner-case guard

* We might not even wan't cmov/csel which might stall out-of-order execution. Doesn't matter what is emitted tbh.

* Remove not needed pow2 trick

* Let JIT do its thing, it's pretty good

* Adjust comments a bit

* Fix comment typos

* Adjust comments

* Adjust comments

* Add third-party notices to NOTICE.md

* The GeoHash class incorporates material from mmcloughlin/geohash and georust/geohash, both licensed under MIT License. Thank you for sharing!

---------

Co-authored-by: Yoganand Rajasekaran <60369795+yrajas@users.noreply.github.com>
@github-actions github-actions bot locked and limited conversation to collaborators Jul 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants