Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve read speed #361

Merged
merged 3 commits into from
Sep 18, 2024
Merged

Improve read speed #361

merged 3 commits into from
Sep 18, 2024

Conversation

klauspost
Copy link
Collaborator

Utilize philhofer/fwd#32 to peek bytes

Eliminate bounds checks, mostly on reading sizes, but also integers.

benchmark                            old MB/s     new MB/s     speedup
BenchmarkReadMapHeaderBytes-32       1136.75      1136.89      1.00x
BenchmarkReadArrayHeaderBytes-32     1133.52      1149.34      1.01x
BenchmarkTestReadBytesHeader-32      1200.18      1171.21      0.98x
BenchmarkReadNilByte-32              2907.50      2176.63      0.75x
BenchmarkReadFloat64Bytes-32         4308.86      3869.64      0.90x
BenchmarkReadFloat32Bytes-32         2354.55      2332.21      0.99x
BenchmarkReadBoolBytes-32            863.66       1124.07      1.30x
BenchmarkReadTimeBytes-32            3710.27      3668.07      0.99x
BenchmarkReadMapHeader-32            228.74       391.76       1.71x
BenchmarkReadArrayHeader-32          243.90       454.31       1.86x
BenchmarkReadNil-32                  127.42       191.55       1.50x
BenchmarkReadFloat64-32              1046.08      1142.49      1.09x
BenchmarkReadFloat32-32              595.67       669.65       1.12x
BenchmarkReadInt64-32                422.29       798.35       1.89x
BenchmarkReadUintWithInt64-32        186.59       371.28       1.99x
BenchmarkReadUint64-32               282.87       377.76       1.34x
BenchmarkReadIntWithUint64-32        465.30       684.49       1.47x
BenchmarkRead16Bytes-32              1021.15      1021.38      1.00x
BenchmarkRead256Bytes-32             5324.42      6166.06      1.16x
BenchmarkRead2048Bytes-32            8405.41      8382.61      1.00x
BenchmarkRead16StringAsBytes-32      1102.72      1321.71      1.20x
BenchmarkRead256StringAsBytes-32     6959.78      6627.49      0.95x
BenchmarkRead16String-32             320.75       378.46       1.18x
BenchmarkRead256String-32            2066.40      2242.78      1.09x
BenchmarkReadComplex64-32            900.13       1064.21      1.18x
BenchmarkReadComplex128-32           1809.88      2028.11      1.12x
BenchmarkReadTime-32                 1362.46      1425.79      1.05x

Apologies for the noisy benchmark. BenchmarkReadNilByte and BenchmarkReadFloat64Byte does not touch new code, so it seems to be "micro-bench-itis", where random deltas show up. Bench takes ~30m to run for whatever reason.

The compiler is not clever enough to track back to l := len(b), so it inserts a bounds check. Also len(b) < 3 and big.Uint16(b[1:]) causes a bounds check. So we rewrite those to avoid it.

This should cover the lowest hanging fruits.

Utilize philhofer/fwd#32 to peek bytes

Eliminate bounds checks, mostly on reading sizes, but also integers.

```
benchmark                            old MB/s     new MB/s     speedup
BenchmarkReadMapHeaderBytes-32       1136.75      1136.89      1.00x
BenchmarkReadArrayHeaderBytes-32     1133.52      1149.34      1.01x
BenchmarkTestReadBytesHeader-32      1200.18      1171.21      0.98x
BenchmarkReadNilByte-32              2907.50      2176.63      0.75x
BenchmarkReadFloat64Bytes-32         4308.86      3869.64      0.90x
BenchmarkReadFloat32Bytes-32         2354.55      2332.21      0.99x
BenchmarkReadBoolBytes-32            863.66       1124.07      1.30x
BenchmarkReadTimeBytes-32            3710.27      3668.07      0.99x
BenchmarkReadMapHeader-32            228.74       391.76       1.71x
BenchmarkReadArrayHeader-32          243.90       454.31       1.86x
BenchmarkReadNil-32                  127.42       191.55       1.50x
BenchmarkReadFloat64-32              1046.08      1142.49      1.09x
BenchmarkReadFloat32-32              595.67       669.65       1.12x
BenchmarkReadInt64-32                422.29       798.35       1.89x
BenchmarkReadUintWithInt64-32        186.59       371.28       1.99x
BenchmarkReadUint64-32               282.87       377.76       1.34x
BenchmarkReadIntWithUint64-32        465.30       684.49       1.47x
BenchmarkRead16Bytes-32              1021.15      1021.38      1.00x
BenchmarkRead256Bytes-32             5324.42      6166.06      1.16x
BenchmarkRead2048Bytes-32            8405.41      8382.61      1.00x
BenchmarkRead16StringAsBytes-32      1102.72      1321.71      1.20x
BenchmarkRead256StringAsBytes-32     6959.78      6627.49      0.95x
BenchmarkRead16String-32             320.75       378.46       1.18x
BenchmarkRead256String-32            2066.40      2242.78      1.09x
BenchmarkReadComplex64-32            900.13       1064.21      1.18x
BenchmarkReadComplex128-32           1809.88      2028.11      1.12x
BenchmarkReadTime-32                 1362.46      1425.79      1.05x
```

Apologies for the noisy benchmark. `BenchmarkReadNilByte` and `BenchmarkReadFloat64Byte` does not touch new code, so it seems to be "micro-bench-itis", where random deltas show up. Bench takes ~30m to run for whatever reason.

The compiler is not clever enough to track back to `l := len(b)`, so it inserts a bounds check. Also `len(b) < 3` and `big.Uint16(b[1:])` causes a bounds check. So we rewrite those to avoid it.

This should cover the lowest hanging fruits.
@klauspost klauspost requested a review from philhofer September 16, 2024 15:13
@klauspost klauspost marked this pull request as draft September 16, 2024 15:26
@klauspost klauspost marked this pull request as ready for review September 16, 2024 15:39
@klauspost klauspost merged commit be78ef3 into tinylib:master Sep 18, 2024
4 checks passed
@klauspost klauspost deleted the improve-read-speed branch September 18, 2024 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants