Make `GeoHash` coordinate conversions ~40-100x faster #348

PaulusParssinen · 2024-04-30T21:44:33Z

Note

Credits for the quantization approach go to https://mmcloughlin.com/posts/geohash-assembly (by @mmcloughlin)
and for the de-quantization method to https://github.com/georust/geohash. A big thank you to everyone contributing to both efforts!

This PR was originally meant to be a tiny optimization to avoid unnecessary heap allocations in the coordinate conversions, but I changed plans when I discovered an article by Michael McLoughlin about a much faster (de)quantization method, and this ended up being a rewrite of the GeoHash class. Please see his great article and explanation.

Many of the commits after 564891a are me discovering new exciting details about IEEE-754 floating-point rounding, confusing reference implementations (it turns out there's no single correct way to reference a location on earth 😅) and other fun Geohash corner-cases.

More notably, however, there's a tiny error in the quantization approach described in McLoughlin's excellent article, which I describe in this comment and fix in f364a96c..b4ec75c. The gist is that in the IEEE-754 double-precision format the assumption that the exponent field is 1023 does not hold for 2.0, which is the value the maximum coordinates map to.

There should be no behavioral differences other than the last character of the textual representation, which one of the existing tests already ignored, understandably, due to nuances around its precision. The textual representation should simply be the base-32 encoding (with the custom geohash alphabet) of the integer that represents a specific GeoHash, where more characters mean more bits of precision. Every encoded character gives 5 bits of precision. Very simple, right? (I wish it was.)

Garnet (and Redis) chose 52-bit precision. Why? Because we want to store the geohash integer in a sorted set, which lets us exploit the geohash structure and do really efficient range queries. Since we want to reuse the existing sorted set, we need to store the geohash integer in an IEEE-754 double-precision score, which is safe as long as the integer is in the range [-2^53, 2^53].
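To make the 2^53 bound concrete, here is a minimal Python sketch (not Garnet's C# code; `survives_double_cast` is a hypothetical helper name) that checks which integers survive a round trip through a double. A double has a 53-bit significand, so every 52-bit geohash integer is represented exactly, while 2^53 + 1 is not.

```python
MAX_SAFE = 2**53  # doubles carry a 53-bit significand (52 stored bits + 1 implicit)

def survives_double_cast(n: int) -> bool:
    """Return True if n converts to a double and back without changing."""
    return int(float(n)) == n

hash52 = (1 << 52) - 1  # largest 52-bit geohash integer

print(survives_double_cast(hash52))        # True
print(survives_double_cast(MAX_SAFE))      # True
print(survives_double_cast(MAX_SAFE + 1))  # False: rounds to 2**53

# A numeric cast also keeps neighbouring hashes ordered, which is what a
# sorted-set score needs for range queries.
print(float(hash52 - 1) < float(hash52))   # True
```

The point of the last line is that a plain numeric cast preserves ordering between adjacent hash values, so range queries over the score behave like range queries over the hash.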

Now, we can still encode this 52-bit precision integer in the base-32 alphabet just fine. An 11-character string carries 11 × 5 = 55 bits, so the remaining 55 - 52 = 3 bits will be zero, and that's what Garnet has done. Redis, however, seems to zero out the last 2 bits and output the last character of the textual representation as 0.
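As a rough illustration of the two textual variants described above, here is a Python sketch under stated assumptions: the 52-bit hash is left-aligned in 55 bits (three zero bits appended) and emitted as 11 characters of the geohash alphabet. The function names are hypothetical, and the "last character forced to 0" variant only mirrors the description in this PR; it has not been checked against the Redis source.

```python
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"
HASH_BITS = 52
CHARS = 11                               # 11 * 5 = 55 bits of text
PAD_BITS = CHARS * 5 - HASH_BITS         # 3 zero bits appended at the end

def encode_padded(hash52: int) -> str:
    """Encode a 52-bit hash as 11 base-32 characters, low 3 bits zero."""
    padded = hash52 << PAD_BITS
    chars = []
    for i in range(CHARS):
        shift = 5 * (CHARS - 1 - i)      # most significant group first
        chars.append(GEOHASH_ALPHABET[(padded >> shift) & 0x1F])
    return "".join(chars)

def encode_last_char_zeroed(hash52: int) -> str:
    """Variant described above for Redis: last character is always '0'."""
    return encode_padded(hash52)[:-1] + "0"

print(encode_padded((1 << 52) - 1))            # zzzzzzzzzzs
print(encode_last_char_zeroed((1 << 52) - 1))  # zzzzzzzzzz0
```

With three padding bits, the last character of the padded form can only be one of `0`, `8`, `h` or `s`, which is why the final character is where implementations tend to disagree.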

Redis made another choice too: they chose to follow EPSG:3857, where:

Valid longitudes are from -180 to 180 degrees.
Valid latitudes are from -85.05112878 to 85.05112878 degrees.

This makes sense as the poles cause all sorts of problems in GIS applications.
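For context on where the 85.05112878 figure comes from: it is the latitude at which the Web Mercator projection of the world becomes a square, i.e. where the projected y coordinate reaches the same extent as x at 180 degrees longitude. A small Python sketch of that arithmetic (the function name is just illustrative):

```python
import math

def web_mercator_max_latitude() -> float:
    """Latitude (degrees) where Mercator y equals pi, making the map square."""
    return math.degrees(math.atan(math.sinh(math.pi)))

print(round(web_mercator_max_latitude(), 8))  # 85.05112878
```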

Confusing history behind Web Mercator

These limits, however, do not apply to the "standard" geohash format, whose valid latitudes are [-90, 90]. This difference adds further nuance to geohash implementations. Garnet currently accepts coordinates per the geohash limits.

This PR also introduces AsciiUtils in Garnet.common as a place to share common ASCII manipulation logic (it acts mostly as a polyfill for the Ascii class on .NET 6, and as a place where we can optimize further than the BCL currently does).

This PR is strictly focused on the GeoHash class alone. I'll remove more unnecessary allocations in the actual RESP protocol parsing & encoding in a follow-up. There are some vectorization opportunities here, and the distance math can definitely be improved too.

Benchmarks (Tiger Lake, i5-1135G7)

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
11th Gen Intel Core i5-1135G7 2.40GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 8.0.204
  [Host] : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  .NET 6 : .NET 6.0.29 (6.0.2924.17105), X64 RyuJIT AVX2
  .NET 8 : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Server=True

`main`

| Method | Job | Mean | Error | StdDev | Gen0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- |
| GeoToLongValue | .NET 6 | 278.3 ns | 0.51 ns | 0.43 ns | 0.0191 | 80 B |
| GeoToLongValue | .NET 8 | 260.0 ns | 0.85 ns | 0.80 ns | 0.0191 | 80 B |
| GetCoordinatesFromLong | .NET 6 | 429.1 ns | 1.30 ns | 1.22 ns | 0.0801 | 336 B |
| GetCoordinatesFromLong | .NET 8 | 213.8 ns | 0.85 ns | 0.75 ns | 0.0801 | 336 B |
| GetGeoHashCode | .NET 6 | 810.1 ns | 4.79 ns | 4.24 ns | 0.2537 | 1064 B |
| GetGeoHashCode | .NET 8 | 619.1 ns | 1.88 ns | 1.76 ns | 0.2537 | 1064 B |

PR

| Method | Job | Mean | Error | StdDev | Gen0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- |
| GeoToLongValue | .NET 6 | 5.556 ns | 0.0156 ns | 0.0146 ns | - | - |
| GeoToLongValue | .NET 8 | 1.888 ns | 0.0156 ns | 0.0130 ns | - | - |
| GetCoordinatesFromLong | .NET 6 | 5.074 ns | 0.0038 ns | 0.0031 ns | - | - |
| GetCoordinatesFromLong | .NET 8 | 1.215 ns | 0.0038 ns | 0.0034 ns | - | - |
| GetGeoHashCode | .NET 6 | 15.944 ns | 0.2487 ns | 0.2077 ns | 0.0014 | 48 B |
| GetGeoHashCode | .NET 8 | 15.933 ns | 0.0331 ns | 0.0294 ns | 0.0012 | 48 B |

Benchmarks (Zen 2, Ryzen 7 3700X)

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4170/22H2/2022Update)
AMD Ryzen 7 3700X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.3.24204.13
  [Host] : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX2
  .NET 6 : .NET 6.0.27 (6.0.2724.6912), X64 RyuJIT AVX2
  .NET 8 : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX2

Server=True

`main`

| Method | Job | Mean | Error | StdDev | Gen0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- |
| GeoToLongValue | .NET 6 | 290.1 ns | 3.08 ns | 2.88 ns | 0.0095 | 80 B |
| GeoToLongValue | .NET 8 | 269.4 ns | 2.14 ns | 1.90 ns | 0.0095 | 80 B |
| GetCoordinatesFromLong | .NET 6 | 510.0 ns | 9.71 ns | 9.54 ns | 0.0401 | 336 B |
| GetCoordinatesFromLong | .NET 8 | 195.3 ns | 3.94 ns | 3.49 ns | 0.0401 | 336 B |
| GetGeoHashCode | .NET 6 | 877.3 ns | 6.87 ns | 6.42 ns | 0.1268 | 1064 B |
| GetGeoHashCode | .NET 8 | 525.6 ns | 6.46 ns | 6.04 ns | 0.0982 | 824 B |

PR

| Method | Job | Mean | Error | StdDev | Gen0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- |
| GeoToLongValue | .NET 6 | 5.646 ns | 0.0306 ns | 0.0255 ns | - | - |
| GeoToLongValue | .NET 8 | 5.829 ns | 0.0537 ns | 0.0502 ns | - | - |
| GetCoordinatesFromLong | .NET 6 | 5.507 ns | 0.0583 ns | 0.0545 ns | - | - |
| GetCoordinatesFromLong | .NET 8 | 5.928 ns | 0.0423 ns | 0.0396 ns | - | - |
| GetGeoHashCode | .NET 6 | 16.286 ns | 0.0968 ns | 0.0808 ns | 0.0003 | 48 B |
| GetGeoHashCode | .NET 8 | 15.721 ns | 0.1790 ns | 0.1587 ns | 0.0003 | 48 B |

Zen 2 PDEP/PEXT "erratum" in action

| Method | Job | Mean | Error | StdDev | Allocated |
| --- | --- | --- | --- | --- | --- |
| GeoToLongValue | .NET 6 w/ UsePdepPext | 64.98 ns | 0.132 ns | 0.103 ns | - |
| GeoToLongValue | .NET 8 w/ UsePdepPext | 64.95 ns | 0.383 ns | 0.359 ns | - |
| GetCoordinatesFromLong | .NET 6 w/ UsePdepPext | 62.87 ns | 0.137 ns | 0.128 ns | - |
| GetCoordinatesFromLong | .NET 8 w/ UsePdepPext | 62.92 ns | 0.090 ns | 0.084 ns | - |

See that 48 B heap allocation for the final string in GetGeoHashCode? We don't need that :)

PaulusParssinen · 2024-05-02T00:40:49Z

`CanEncodeAndDecodeCoordinates`

All computed values are printed with ToString("F16") (except the updated row, which is F14).

| Label | Latitude | Longitude | Lat Error (vs. input) | Lon Error (vs. input) |
| --- | --- | --- | --- | --- |
| Test input | 30.5388942218 | 104.0555758833 | - | - |
| Test IEEE-754 repr. | 30.5388942217999997 | 104.0555758832999942 | - | - |
| main | 30.5388928949832916 | 104.0555772185325623 | 0.0000013268167081 | -0.0000013352325681 |
| PR (`1b59a0a`) | 30.5388915538787842 | 104.0555745363235474 | 0.0000026679212155 | 0.0000013469764468 |
| PR (calculate center) | 30.538892894983292 | 104.05557721853256 | 0.0000013268167081 | -0.0000013352325681 |

edit: I finally realized that the de-quantized coordinate represents the minimum corner of the error bounding box. We get the same coordinate as the main branch, which returned the center of that box, by computing:
$(\text{minLatitude} + \frac{\text{latitudeError}}{2.0}, \text{minLongitude} + \frac{\text{longitudeError}}{2.0})$

Funny in hindsight, as I was earlier wondering why those error values were multiples of each other.
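The center fix-up can be checked against the table with a few lines of Python. This is a sketch assuming the 52-bit hash is split evenly into 26 bits per axis, so the cell height is 180 / 2^26 degrees and the cell width is 360 / 2^26 degrees; the constant and function names are hypothetical.

```python
BITS_PER_AXIS = 26  # assumption: 52-bit hash, 26 bits each for lat and lon

LAT_ERROR = 180.0 / (1 << BITS_PER_AXIS)  # height of one cell in degrees
LON_ERROR = 360.0 / (1 << BITS_PER_AXIS)  # width of one cell in degrees

def cell_center(min_lat: float, min_lon: float) -> tuple[float, float]:
    """Shift the minimum corner of a cell to the cell's center."""
    return (min_lat + LAT_ERROR / 2.0, min_lon + LON_ERROR / 2.0)

# Minimum corner reported in the PR (`1b59a0a`) row of the table
lat, lon = cell_center(30.5388915538787842, 104.0555745363235474)
print(f"{lat:.14f} {lon:.14f}")
```

Adding half a cell to the `1b59a0a` row lands on the `main` row to within floating-point noise, which matches the observation in the edit above.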

PaulusParssinen · 2024-05-02T02:42:29Z

Premature optimization is ~~root of all evil~~ fun!

Few test failures left, need to dive deeper into some rounding corner-cases.

The diff looks incomprehensible to review, it's pretty much a rewrite 😅


PaulusParssinen · 2024-05-03T00:29:25Z

Added USE_PDEP_PEXT so that consumers who want to extract every bit of perf for their Geo commands (and who know their CPU has the non-microcode-emulated version) can opt in. SharpLab (the entire encoding/decoding is inlined into one method 😋)
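For readers unfamiliar with what PDEP/PEXT buy here: Z-curve encoding interleaves the bits of the two quantized coordinates, and a single PDEP with an alternating-bit mask does that spreading in one instruction. Below is a Python sketch of the portable mask-and-shift equivalent, not the PR's C# code. The function names are hypothetical, and the bit order (latitude in even positions, longitude in odd positions) is an assumption for illustration.

```python
def spread_bits(v: int) -> int:
    """Move bit i of a 32-bit value to bit 2*i (what PDEP does with mask 0x5555...)."""
    v &= 0xFFFFFFFF
    v = (v | (v << 16)) & 0x0000FFFF0000FFFF
    v = (v | (v << 8)) & 0x00FF00FF00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0F
    v = (v | (v << 2)) & 0x3333333333333333
    v = (v | (v << 1)) & 0x5555555555555555
    return v

def squash_bits(v: int) -> int:
    """Inverse of spread_bits: gather the even bits (what PEXT does)."""
    v &= 0x5555555555555555
    v = (v | (v >> 1)) & 0x3333333333333333
    v = (v | (v >> 2)) & 0x0F0F0F0F0F0F0F0F
    v = (v | (v >> 4)) & 0x00FF00FF00FF00FF
    v = (v | (v >> 8)) & 0x0000FFFF0000FFFF
    v = (v | (v >> 16)) & 0x00000000FFFFFFFF
    return v

def interleave(lat_bits: int, lon_bits: int) -> int:
    # Assumed layout: latitude in even bit positions, longitude in odd ones.
    return spread_bits(lat_bits) | (spread_bits(lon_bits) << 1)

def deinterleave(hash_bits: int) -> tuple[int, int]:
    return squash_bits(hash_bits), squash_bits(hash_bits >> 1)

h = interleave(0b11, 0b00)
print(bin(h))            # 0b101
print(deinterleave(h))   # (3, 0)
```

On CPUs where PDEP/PEXT are implemented in microcode, a shift-and-mask sequence like this can be the faster option, which is consistent with the Zen 2 numbers in the benchmark section.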

PaulusParssinen · 2024-05-04T20:58:02Z

(Adds "Investigate LightClient implementation" to the todo list. It seems to run some tests until timeout if it receives the expected amount of bytes.)

For some reason, I remembered/thought that the hash integer was bit-cast i.e. BitConverter.DoubleToUInt64Bits to the sorted set instead of just cast. That explains the 52-bits and I'll cancel the precision change plans..

I just realized the original implementation did not actually match the Redis "52-bit" precision either 🤔 Redis seems to always clear the last 2 bits for the textual representation output, so it always returns 0 as the last character. Before this PR, there was already a difference in the textual representation between Garnet and Redis.

PaulusParssinen · 2024-05-06T14:04:16Z

There's a small corner-case error in McLoughlin's quantization trick, on which this PR is based.

The statement

Since y is in the range [1,2], the largest power of two less than or equal to y is 1.0 and the exponent field will always be 1023.

is wrong for the upper bound of the range. The biased exponent field is 1023 for [1.0, 2.0), and for 2.0 it flips over to 1024. This means that 90.0 and 180.0, which map to the range maximum 2.0, have their significand (and subsequently the geohash integer representation) zeroed, and dequantization gives the same output as for -90.0 and -180.0 respectively.

There are a lot of problems with GeoHash in general around the corner values, where implementations differ.
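The corner case is easy to reproduce outside of C#. The Python sketch below maps a coordinate range onto [1.0, 2.0], reads the top 26 significand bits as the quantized value, and shows the upper bound collapsing to 0. The `quantize_guarded` variant is only one hypothetical way to guard the corner (clamping to the largest double below 2.0); it is not necessarily how the PR fixes it.

```python
import struct

def double_bits(x: float) -> int:
    """Reinterpret a double as its raw 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def exponent_field(x: float) -> int:
    """Biased 11-bit exponent field of a double."""
    return (double_bits(x) >> 52) & 0x7FF

def quantize_naive(value: float, lo: float, hi: float, bits: int = 26) -> int:
    # Map [lo, hi] onto [1.0, 2.0] and read the top `bits` significand bits.
    y = (value - lo) / (hi - lo) + 1.0
    return (double_bits(y) & ((1 << 52) - 1)) >> (52 - bits)

LARGEST_BELOW_TWO = 2.0 - 2.0 ** -52  # exponent field is still 1023 here

def quantize_guarded(value: float, lo: float, hi: float, bits: int = 26) -> int:
    # Hypothetical guard: clamp so the upper bound keeps exponent 1023.
    y = min((value - lo) / (hi - lo) + 1.0, LARGEST_BELOW_TWO)
    return (double_bits(y) & ((1 << 52) - 1)) >> (52 - bits)

print(exponent_field(1.0), exponent_field(LARGEST_BELOW_TWO), exponent_field(2.0))
# 1023 1023 1024
print(quantize_naive(90.0, -90.0, 90.0), quantize_naive(-90.0, -90.0, 90.0))
# 0 0  (the maximum latitude aliases the minimum)
print(quantize_guarded(90.0, -90.0, 90.0) == (1 << 26) - 1)
# True
```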


PaulusParssinen · 2024-05-07T21:49:59Z

And to add to the confusion, the original latitude limits are "wrong". According to EPSG:3857 (also known as EPSG:3785, also known as EPSG:900913, etc.) the latitude limits are -85.05112878 to 85.05112878, not -90 to 90. Fun! I'm starting to see why the implementations are all over the place, as the GeoHash standard uses different limits than the most common web projection.

I won't touch the existing latitude limit in this PR.

badrishc · 2024-05-08T00:58:39Z

Interesting stuff! Glad to see you are making progress here. :)

PaulusParssinen · 2024-05-08T20:12:34Z

Added benchmarks from my laptop, which has AVX512F, the feature by which we guard the PEXT/PDEP path. This made both encoding and decoding of geohash integers go from ~5.5 ns to ~1.5 ns, which was more than I expected. It may be worthwhile to enable the path on more platforms with a static readonly Zen 2 check: call X86Base.CpuId(0, 0) and X86Base.CpuId(1, 0), and check for AuthenticAMD && family < 0x19 to exclude pre-Zen 3 CPUs.

Also wrote a little more about the PR in its description.
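The decoding behind such a check is plain bit arithmetic on the CPUID registers, sketched here in Python with hypothetical helper names. It assumes the usual x86 layout: leaf 0 returns the vendor string in EBX, EDX, ECX, and leaf 1 returns the base family in EAX bits 8-11 and the extended family in bits 20-27, with the extended family added only when the base family is 0xF. The register values in the usage lines are constructed from those fields for illustration and are not read from real hardware.

```python
def vendor_string(ebx: int, ecx: int, edx: int) -> str:
    """Assemble the 12-byte vendor string from CPUID leaf 0 (order: EBX, EDX, ECX)."""
    return b"".join(r.to_bytes(4, "little") for r in (ebx, edx, ecx)).decode("ascii")

def cpu_family(eax: int) -> int:
    """Display family from CPUID leaf 1 EAX."""
    base = (eax >> 8) & 0xF
    extended = (eax >> 20) & 0xFF
    return base + extended if base == 0xF else base

def has_slow_pdep_pext(vendor: str, eax: int) -> bool:
    """Mirror of the proposed guard: AMD parts before family 0x19 (Zen 3)."""
    return vendor == "AuthenticAMD" and cpu_family(eax) < 0x19

def make_eax(base_family: int, ext_family: int) -> int:
    """Build an illustrative leaf 1 EAX value from its family fields only."""
    return (ext_family << 20) | (base_family << 8)

amd = vendor_string(0x68747541, 0x444D4163, 0x69746E65)
print(amd)                                         # AuthenticAMD
print(has_slow_pdep_pext(amd, make_eax(0xF, 0x8)))  # True  (family 0x17, Zen 2)
print(has_slow_pdep_pext(amd, make_eax(0xF, 0xA)))  # False (family 0x19, Zen 3)
```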

PaulusParssinen · 2024-05-08T23:42:31Z

Added more background to the PR description about the nuances of geohash..


PaulusParssinen · 2024-05-21T10:23:36Z

Out of curiosity, what required a force-push? 😅 (This broke the history experience in the GitHub UI. I still have this branch in its pre-force-push sync state if we want to restore that.)

yrajas · 2024-05-21T15:53:03Z

Out of curiosity, what required a force-push? 😅 (This broke the history experience in the GitHub UI. I still have this branch in its pre-force-push sync state if we want to restore that.)

No changes, actually. I was updating the branch with rebase option to bring in latest changes from main.

* Refactor Haversine-distance calculation
  * See https://github.com/dotnet/runtime/blob/333fb71d54bd84256e740aa08f8b836d4cd71d98/src/libraries/System.Private.CoreLib/src/System/Numerics/ITrigonometricFunctions.cs#L65-L113
* Do not spill the coordinate ranges to heap when encoding/decoding
  * Keep them in registers (or atmost spill to stack)
  * Also some other misc. simplification
* Add shared AsciiUtils to Garnet.common to simplify the unit conversions
* Slightly adjust GeoHash tests
* dotnet format
* Restore flag bit for GetGeoHashCode for now
  * I will return to this method in a follow-up
* Optimize GeoToLongValue to use float quantization trick and do the Z-curve encoding more directly
  * Credits to https://mmcloughlin.com/posts/geohash-assembly for the quantization approach!
* Further optimize Geohash & Base32 encoding and decoding
  * Abuse IEEE-754 binary representation in the encoding too
  * Implement Z-curve decoding more efficiently
  * Add GetGeoErrorByPrecision to calculate the error at given bit precision (52 for us).
  * Optimize base32 string encoding.
  * Test still fail. Will need to investigate more.
* typo
* Further clarify the quantization method
* Calculate the center of bounding-box
* Clarify dequantization method
* Make the bounding-box center fix-up use constants
* Add more test-data and restore original epsilon calc.
* tests: sqc8b49rnyt -> sqc8b49rnys
* tests: nsqdtr74hyu1 -> nsqdtr74hyu0
* Exponent is 1023, not 0
* Add USE_PDEP_PEXT switch for PDEP/PEXT Z-curve en/decode
  * And define it to run the tests with it
* Use FusedMultiplyAdd to do (x+y)*z in one op to avoid intermediate rounding
  * More accurate and faster, what not to love
* Remove #define USE_PDEP_PEXT, tests passed
* Move GeoHash specific unit tests own file
* Mark Z-curve encode/decode with MethodImpl.AI
  * Little extra encouragement to JIT.
* Add GeoHash specific benchmarks
* oops
* format
* Add UsePdepPext build switch.
  * And add GeoHashBenchmark job with it enabled
* Avoid shifting by using already shifted mask for the PDEP/PEXT
* Use AVX512 support to guard PDEP/PEXT usage.
* Add MemoryDiagnoser back
* Fix incorrect quantization approach
* Make it little bit more clear what happens in the corner-case guard
  * We might not even wan't cmov/csel which might stall out-of-order execution. Doesn't matter what is emitted tbh.
* Remove not needed pow2 trick
* Let JIT do its thing, it's pretty good
* Adjust comments a bit
* Fix comment typos
* Adjust comments
* Adjust comments
* Add third-party notices to NOTICE.md
  * The GeoHash class incorporates material from mmcloughlin/geohash and georust/geohash, both licensed under MIT License. Thank you for sharing!

Co-authored-by: Yoganand Rajasekaran <60369795+yrajas@users.noreply.github.com>

TalZaccai requested a review from yrajas April 30, 2024 22:40

PaulusParssinen marked this pull request as draft May 1, 2024 20:02

PaulusParssinen commented May 2, 2024

libs/server/Objects/SortedSetGeo/GeoHash.cs (outdated)

PaulusParssinen force-pushed the opt-geohash branch from 30947c3 to 7da6041 Compare May 6, 2024 00:45

PaulusParssinen mentioned this pull request May 7, 2024

Small error in the quantization approach for maximum coordinate values georust/geohash#52

Open

PaulusParssinen force-pushed the opt-geohash branch from e3d3ccf to 80ebee4 Compare May 7, 2024 22:56

PaulusParssinen marked this pull request as ready for review May 8, 2024 01:06

PaulusParssinen force-pushed the opt-geohash branch from 23cebf1 to b4ec75c Compare May 8, 2024 19:22

PaulusParssinen changed the title ~~Optimize GeoHash coordinate conversions~~ Make GeoHash coordinate conversions ~40-150x faster May 9, 2024

PaulusParssinen changed the title ~~Make GeoHash coordinate conversions ~40-150x faster~~ Make GeoHash coordinate conversions ~40-100x faster May 9, 2024

mmcloughlin mentioned this pull request May 10, 2024

Potential corner-case error in optimized assembly mmcloughlin/geohash#39

Open

yrajas reviewed May 13, 2024

Garnet.sln (outdated)

PaulusParssinen added 9 commits May 20, 2024 14:56

Refactor Haversine-distance calculation

cb61c41

* See https://github.com/dotnet/runtime/blob/333fb71d54bd84256e740aa08f8b836d4cd71d98/src/libraries/System.Private.CoreLib/src/System/Numerics/ITrigonometricFunctions.cs#L65-L113

Do not spill the coordinate ranges to heap when encoding/decoding

16b67b2

* Keep them in registers (or atmost spill to stack) * Also some other misc. simplification

Add shared AsciiUtils to Garnet.common to simplify the unit conversions

85a09a5

Slightly adjust GeoHash tests

60fda90

dotnet format

f137c0d

Restore flag bit for GetGeoHashCode for now

0e1f0ce

* I will return to this method in a follow-up

Optimize GeoToLongValue to use float quantization trick and do the Z-…

40baf28

…curve encoding more directly * Credits to https://mmcloughlin.com/posts/geohash-assembly for the quantization approach!

typo

07862c7

PaulusParssinen added 21 commits May 20, 2024 14:56

Add USE_PDEP_PEXT switch for PDEP/PEXT Z-curve en/decode

463785b

* And define it to run the tests with it

Use FusedMultiplyAdd to do (x+y)*z in one op to avoid intermediate ro…

0fa699a

…unding * More accurate and faster, what not to love

Remove #define USE_PDEP_PEXT, tests passed

7bc0fba

Move GeoHash specific unit tests own file

e0384e6

Mark Z-curve encode/decode with MethodImpl.AI

dc4632f

* Little extra encouragement to JIT.

Add GeoHash specific benchmarks

ef64efb

oops

61faa37

format

10397c6

Add UsePdepPext build switch.

6ce1d36

* And add GeoHashBenchmark job with it enabled

Avoid shifting by using already shifted mask for the PDEP/PEXT

3a0c08b

Use AVX512 support to guard PDEP/PEXT usage.

cc86c31

Add MemoryDiagnoser back

6336e08

Fix incorrect quantization approach

8ecc8b9

Make it little bit more clear what happens in the corner-case guard

6dd973a

* We might not even wan't cmov/csel which might stall out-of-order execution. Doesn't matter what is emitted tbh.

Remove not needed pow2 trick

b35045c

Let JIT do its thing, it's pretty good

8f6a3e6

Adjust comments a bit

0e70b77

Fix comment typos

ddf3263

Adjust comments

a1897d3

Adjust comments

e50cdb0

Add third-party notices to NOTICE.md

3383ef0

* The GeoHash class incorporates material from mmcloughlin/geohash and georust/geohash, both licensed under MIT License. Thank you for sharing!

yrajas force-pushed the opt-geohash branch from dcbd4b4 to 3383ef0 Compare May 20, 2024 21:56

Merge branch 'main' into opt-geohash

3f39acc

yrajas approved these changes May 21, 2024


yrajas merged commit e0027d1 into microsoft:main May 21, 2024
23 checks passed

PaulusParssinen deleted the opt-geohash branch May 22, 2024 09:19

github-actions bot locked and limited conversation to collaborators Jul 22, 2024
