Better support for chunked parsers and streaming #62
Comments
I've had some success writing resumable, internally-length-delimited binary format parsers with `cereal` recently, although I did use `isolate`, since the chunks in question were meant to be small. We can always instrument our byte-parsing monad of choice with our own State monad or manual int-threading for isolation (i.e., before parsing each tag, check whether we have any bytes "remaining"). Then `isolate n subParser` is effectively `lift (evalStateT subParser n) >> modify (subtract n)`.
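To make that concrete, here is a minimal sketch of the State-threading trick; the name `Isolated` and the generic underlying parser monad `p` are illustrative, not part of any of the libraries discussed here:

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, modify)

-- 'p' is any underlying byte-parsing monad; the Int state is the number
-- of bytes remaining in the current isolated region.
type Isolated p = StateT Int p

-- Run a sub-parser with a budget of exactly n bytes, then charge those
-- n bytes against the enclosing region's budget.
isolate :: Monad p => Int -> Isolated p a -> Isolated p a
isolate n subParser = do
  r <- lift (evalStateT subParser n)
  modify (subtract n)
  return r
```

Byte-consuming primitives would also decrement the counter, so the "any bytes remaining" check before each tag is just a read of the current state.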
This will make it easier to replace the parser with something else, especially as we work on generated encodings. See also #62. This change also "unrolls" the `anyBits` function into specific functions `getFixed32` and `getFixed64`, to give a more unified API between parsing and building.
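The commit's actual definitions aren't reproduced here, but as a rough illustration of the symmetry this aims for, here is a hedged sketch of one fixed-width primitive written against attoparsec, paired with the corresponding little-endian primitive from `bytestring`'s builder:

```haskell
import qualified Data.Attoparsec.ByteString as A
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)

-- Parsing side: a 4-byte little-endian fixed32 (protobuf wire format).
getFixed32 :: A.Parser Word32
getFixed32 = do
  bs <- A.take 4
  -- Fold from the right so the first byte ends up least significant.
  return $! B.foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0 bs

-- Building side: the mirror-image primitive.
putFixed32 :: Word32 -> BB.Builder
putFixed32 = BB.word32LE
```

The parsing and building sides then pair up one-to-one (`getFixed32`/`putFixed32`, `getFixed64`/`putFixed64`), which is the unified API the change refers to.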
All decoding benchmarks show significant speedups after this change. The biggest improvement is to decoding packed data, which is 4-5x as fast as before. (See below for a full list of benchmark diffs.)

This parsing monad follows the approach of, e.g., the `store` and `persist` packages. It requires that all data be in a *strict* `ByteString`, and uses simple pointer arithmetic internally to walk through its bytes. This effectively works against #62 (streaming parsers) since it needs to read all the input data before starting the parse. However, that issue has already existed since the beginning of this library for, e.g., submessages; see that bug for more details. So this change doesn't appear to be a regression. We also have freedom to later try out different implementations without changing the API, since `Parser` is opaque as of #294.

The implementation of `Parser` differs from `store` and `persist` by using `ExceptT` to pass around errors internally, rather than exceptions (or closures, as in `attoparsec`). We may want to experiment with this later, but in my initial experiments I didn't see a significant improvement from those approaches.

Benchmark results (the "time" output from Criterion):

flat(602B)/decode/whnf: 13.14 μs (13.02 μs .. 13.29 μs) => 8.686 μs (8.514 μs .. 8.873 μs)
nested(900B)/decode/whnf: 26.35 μs (25.85 μs .. 26.86 μs) => 11.66 μs (11.36 μs .. 11.99 μs)
int32-packed(1003B)/decode/whnf: 36.23 μs (35.75 μs .. 36.69 μs) => 17.31 μs (17.11 μs .. 17.50 μs)
int32-unpacked(2000B)/decode/whnf: 65.18 μs (64.19 μs .. 66.68 μs) => 19.35 μs (19.13 μs .. 19.58 μs)
float-packed(4003B)/decode/whnf: 78.61 μs (77.53 μs .. 79.46 μs) => 19.56 μs (19.40 μs .. 19.76 μs)
float-unpacked(5000B)/decode/whnf: 108.9 μs (107.8 μs .. 110.3 μs) => 22.29 μs (22.00 μs .. 22.66 μs)
no-unused(10003B)/decode/whnf: 571.7 μs (560.0 μs .. 586.6 μs) => 356.5 μs (349.0 μs .. 365.0 μs)
with-unused(10003B)/decode/whnf: 786.6 μs (697.8 μs .. 875.5 μs) => 368.3 μs (361.8 μs .. 376.4 μs)

Also added `isolate` and used it for parsing messages and packed fields. This improved the nested benchmark a bit compared to without it:

nested(900B)/decode/whnf: 14.32 μs (14.08 μs .. 14.57 μs) => 11.66 μs (11.36 μs .. 11.99 μs)

It didn't make a significant difference in the packed benchmark, I think because the effects of using lists currently dominate everything else.
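For context, a rough sketch (not the library's actual definition) of the general shape of such a pointer-based parsing monad, with errors threaded via `ExceptT`:

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, throwE)
import Data.Word (Word8)
import Foreign.Ptr (Ptr, plusPtr)
import Foreign.Storable (peek)

-- The parser walks the region of memory between the current position and
-- the end of the strict input buffer.
newtype Parser a = Parser
  { runParser :: Ptr Word8  -- end of input
              -> Ptr Word8  -- current position
              -> ExceptT String IO (Ptr Word8, a)  -- new position + result
  }

-- Read one byte, failing cleanly at end of input.
anyByte :: Parser Word8
anyByte = Parser $ \end pos ->
  if pos >= end
    then throwE "anyByte: end of input"
    else do
      w <- lift (peek pos)
      return (pos `plusPtr` 1, w)
```

In a representation like this, `isolate n` can simply shrink the `end` pointer to `pos `plusPtr` n` for the duration of the sub-parser, which is consistent with the cheap `isolate` described above.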
Currently, when parsing messages we force a strict ByteString for every submessage. It might be better to parse in a more chunked fashion, e.g., for use with lazy ByteStrings or with conduits/pipes.
Counterpoint: regardless, parsing has to read the whole input into a Haskell object that's at least as big as the input data anyway. So it's not clear how much you'd gain from this change.
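For reference, `attoparsec` already exposes an incremental interface (`parse` and `feed` returning `Partial` continuations) that a lazy-`ByteString` or conduit/pipes layer could drive. A sketch, not library code, assuming the chunks are non-empty (feeding an empty chunk signals end of input to attoparsec):

```haskell
import qualified Data.Attoparsec.ByteString as A
import qualified Data.ByteString as B

parseChunks :: A.Parser a -> [B.ByteString] -> Either String a
parseChunks _ []       = Left "parseChunks: no input"
parseChunks p (c : cs) = go (A.parse p c) cs
  where
    -- Out of chunks: feed an empty string to signal end of input.
    go r [] = A.eitherResult (A.feed r B.empty)
    -- The parser wants more input and we have another chunk.
    go (A.Partial k) (c' : cs') = go (k c') cs'
    -- Already Done or Fail, with chunks left over.
    go r _ = A.eitherResult r
```

A lazy `ByteString` could be fed the same way via `Data.ByteString.Lazy.toChunks`. Note this streams the top-level input but doesn't address isolating submessages, which is the harder part here.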
If we did want this, it might be easier to move from `attoparsec` to another library. Protobufs are a little tricky to parse because the wire format is not naturally delimited; messages are just sequences of tagged field/value pairs, and sub-messages are encoded as a varint length followed by the message bytes (with no "ending" marker).

For example, from the `binary` package we could use `isolate :: Int -> Get a -> Get a`, which restricts a sub-parser to a specific number of bytes, and `isEmpty :: Get Bool`, which detects end-of-input correctly within a call to `isolate`. In comparison:

- `attoparsec` doesn't provide an `isolate` function, AFAIK; currently we mimic it by running a parser on the output of `take :: Int -> Parser ByteString`.
- `cereal` provides an `isolate` function, but it still reads the full span into a single `ByteString`.
- `store`'s `isolate` doesn't work yet for our use case (Functions for peeking sequence-like types with runtime known size, mgsloan/store#40), and the library also lacks support for architecture-independent serialization (Safe support for architecture independent serialization, mgsloan/store#36). See #5 for more discussion.
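A sketch of how the `binary`-based approach could express this framing; `getVarInt` and the `parseField` argument are hypothetical stand-ins, but `isolate` and `isEmpty` are the real `Data.Binary.Get` combinators:

```haskell
import Data.Binary.Get (Get, getWord8, isEmpty, isolate)
import Data.Bits (shiftL, testBit, (.&.), (.|.))
import Data.Word (Word64)

-- Decode a protobuf base-128 varint, least-significant group first.
getVarInt :: Get Word64
getVarInt = go 0 0
  where
    go shift acc = do
      b <- getWord8
      let acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` shift)
      -- The high bit of each byte says whether another group follows.
      if testBit b 7 then go (shift + 7) acc' else return acc'

-- A submessage is a varint length followed by that many bytes of fields,
-- with no end marker: isolate provides the delimiting, and isEmpty
-- reports end-of-input relative to the isolated region.
getDelimited :: Get a -> Get [a]
getDelimited parseField = do
  len <- getVarInt
  isolate (fromIntegral len) (go [])
  where
    go acc = do
      done <- isEmpty
      if done
        then return (reverse acc)
        else do
          f <- parseField
          go (f : acc)
```

Because `isEmpty` is scoped to the innermost `isolate`, nested submessages delimit correctly without any end marker.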