-
Notifications
You must be signed in to change notification settings - Fork 523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Java,C#,C++] Support field access order checking #948
[Java,C#,C++] Support field access order checking #948
Conversation
sbe-tool/src/test/java/uk/co/real_logic/sbe/FieldAccessOrderCheckTest.java
Outdated
Show resolved
Hide resolved
I plan to make a slight improvement today. As of the current revision (4eb8fd0), we do not expose any mechanism to check that a message is fully encoded, but we easily could. I plan to generate a method, e.g., It would be possible to do something similar for decoders, but it would rely on having |
As of c67e20a, we generate a |
7c1f56a
to
b3f809d
Compare
b3f809d
to
aa136ce
Compare
0365f16
to
d33e54b
Compare
SBE does not support arbitrary access to fields when encoding and decoding. In particular, accessing groups and variable length fields advances a `limit`, which marks where the start of the next group/variable length field is expected. We aim to introduce some checking (in debug mode - when bounds checks are enabled) that verifies field access order is consistent enough with the order defined in the schema to encode or decode a valid message.
Also, enable some tests around field order checking. Outline of the approach: - Using a FSM, model the states of an encoder during encoding. - Hopefully, this model will be language independent. - Generate code to perform state transitions upon field accesses. - And to throw when transitions are not in the model. Assumptions: - JIT compilation will remove `if (DEBUG_MODE) { ... }` blocks when the static boolean constant, `DEBUG_MODE`, is `false`. Questions: - Is the approach reasonable? - Is the newly disallowed behaviour reasonable? - For example, it prevents encoding block fields after a group or variable length field has been encoded. Further work: - Decode path - More tests (including tests for decoding old messages where some fields may not be present)
Use `-Dsbe.print.state.machine=$NAME_OF_MSG`.
This change builds upon the work to check field encoding order. It is a slightly more-complex problem than with encoding, as when decoding it is necessary to deal with multiple versions of encoded data. To handle this, we fuse together the state machines for decoding each possible version and branch on "decoder.wrap(...)". Further work: - More tests - Better error messages
"New" decoders (in Java at least) read null-like values for fields that are not present in "old" messages. There are no restrictions on the order in which these "new" fields are accessed when decoding "old" data. In Java, a version check returns a null-like value when a field is not present in the acting version. This behaviour matches the wiki documentation. C#, currently, has different behaviour: the application code needs to interrogate a boolean property, e.g., `IsFooInActingVersion`, before accessing the data property, e.g., `Foo`. The property itself assumes that the field is present. It will attempt to read it from the buffer and possibly advance a `limit` field. In this commit, I have moved the state transition beneath the existing version check in Java decoders. This change allows these fields to be accessed when decoding "old" messages without changing the state of decoding, which seems appropriate as there are no side-effects on internal state, e.g., the `limit` field. Also, in this commit, I have expanded the test coverage to include the getter and setter variants for fixed-size arrays and variable-length data.
Also, use an enum to represent codec states rather than a state number and generate (dot-based) state machine diagram in Javadoc for states enum.
Also, fix some spots where I had forgotten to move the state transition below the version check in decoders.
There is still work to do in this area. In particular: 1. States should be more descriptive. 1. Errors should contain full field paths. 2. Group error messages should be more clear, e.g., "cannot write group element before calling next()".
…ely, in some cases where it is not). This commit makes the field order checks more-lenient. The aim is to allow the application code to access fields in the root/top-level block or blocks inside groups whenever it is safe. However, the approach is broken and does not work in general. To (try to) achieve this aim, we rely on the fact that once a block is positioned correctly, it is safe to access during the remainder of encoding/decoding. For example, the following code is safe: ``` encoder.wrap(...); encoder.blockField1(42); encoder.varData("abc"); encoder.blockField2(43); ``` As soon as `wrap(...)` is called, the top-level block will be positioned over a valid region for the remainder of encoding. Groups are slightly trickier. The following code is _safe_: ``` encoder.wrap(...); Foo.BarEncoder bar = encoder.groupBarCount(2); bar.next() .x(1).y(2) .next(); .x(3).y(4); ``` But the following code is _unsafe_: ``` encoder.wrap(...); Foo.BarEncoder bar = encoder.groupBarCount(2); bar.x(1).y(2) .next(); .x(3).y(4); ``` And the following code is _unsafe_ too: ``` encoder.wrap(...) Foo.BarEncoder bar = encoder.groupBarCount(0); encoder.baz("abc"); bar.x(1).y(2); ``` The `offset` for the group encoder is only set when `next()` is called. Therefore, calling `x(1)` and `y(2)` will likely overwrite some data elsewhere in the buffer (probably near offset 0). We can still rely on the fact that once `next()` has been called - even if it is called multiple times, or too many times (causing it to throw) - the group's block fields will be safe to encode/decode. Thinking about this in terms of our state machine that models codec state and field accesses, we can put this another way. There is at least one transition that represents the `next()` method call. The states (transitively) reachable from these transitions can safely access the associated group block. As already mentioned, despite the passing tests, the approach in this commit doesn't always work. I had thought it would be possible to assign a number to each state as we find them using DFS and use this number as a shortcut to determining whether a block access was safe. For example, in each block field we would check: ``` if (currentState.number() >= firstStateAfterNextIsCalled.number()) { // ALLOW } else { // DENY } ``` However, (even ignoring the cycles due to repeating groups and the use of DFS) this solution makes the following assumptions: 1. There is a single state: `firstStateAfterNextIsCalled`, such that all subsequent transitively reachable states are safe. 2. It is possible to number the states in such a way that `numberOfState(b) >= numberOfState(a)` implies `b` is (transitively) reachable from `a`. Let's consider the following psuedo-schema w.r.t. our assumptions above: ``` group a { x: int32 } group b { y: int32 } ``` Can we set `x` if we've called `BEncoder#next`? Sometimes. The following code is invalid. ``` AEncoder a = encoder.aCount(0); BEncoder b = encoder.bCount(1).next(); b.y(2); a.x(1); // Whoops! We overwrote some data somewhere. ``` But this code is valid. ``` AEncoder a = encoder.aCount(1).next(); BEncoder b = encoder.bCount(1).next(); b.y(2); a.x(1); ``` Therefore, multiple transitions must exist for the `BEncoder#next` call with different exit states, which breaks Assumption 1. Assumption 2 is also problematic. As alluded to above, we must have a bifurcation of states depending on whether a group has elements or is empty. With arbitrary bifurcations, it is not possible to number the states in such a way that Assumption 2 holds. Consider the following state diagram: ``` state N / \ empty non-empty / \ state X state Y ``` Depending on whether a group is empty, e.g., `aCount(0)`, or not, e.g., `aCount(1)`, we transition to either `state X` or `state Y` from `state N`. For Assumption 2 to hold, we need `numberOf(state X) >= numberOf(state N)` (reachable) and `numberOf(state Y) >= numberOf(state N)` (reachable) and `numberOf(state X) < numberOf(state Y)` (unreachable) and `numberOf(state Y) < numberOf(state X)` (unreachable). Clearly, this is impossible. Prior to this commit, we avoided such problems by enforcing a more-strict ordering of field accesses. With this more-strict ordering, setting fields in a group block in later states was not allowed. Therefore, the state machine didn't have to have to split into mutually exclusive subgraphs of reachable states depending on whether or not the group is empty. For example, let's consider a psuedo-schema with two groups. ``` group a { x: int32 } group b { y: int32 } ``` Once the application code calls `encoder.bCount(2).next()`, our state machine will enter the `V0_B_N_BLOCK` state regardless of whether previously the application code called `encoder.aCount(0)` or `encoder.aCount(1).next()`. In `V_B_N_BLOCK` the application code is not allowed to call `AEncoder#x`; therefore, it is safe, i.e., it has no false positives. However, it is also too-strict, i.e., it has false negatives, e.g., the `encoder.aCount(1).next()` case. How likely are these false negatives to come up in the "wild"? If we think false negatives will often come up, is it possible to remove them whilst still using a state machine? We could consider generating more states to represent the reachable states under each fork of group emptiness. The number of states would grow exponentially w.r.t. the number of top-level groups, but let's ignore that for the moment and just consider whether or not it is possible. Continuing the example above with two groups, when the application code calls `encoder.bCount(2).next()`, the state it enters would depend on whether it had previously called `encoder.aCount(0)` or `encoder.aCount(n)...next()` for some value `n > 0`. To safely model this example, we'd need at least the following states: 1. has no `a` elements 2. has no `a` elements, and has no `b` elements 3. has `a` elements, but has not called `AEncoder#next` 4. has `a` elements, and has called `AEncoder#next` 5. has `a` elements, has called `AEncoder#next`, has `b` elements, but has not called `BEncoder#next` 6. has `a` elements, has called `AEncoder#next`, has `b` elements, and has called `BEncoder#next` 7. has no `a` elements, has `b` elements, but has not called `BEncoder#next` 8. has no `a` elements, has `b` elements, and has called `BEncoder#next` A trickier case is with nested groups: ``` group foo { x: int32 group bar { y: int32 } r: string s: string } ``` When is `Bar#y` safe to call? I believe it is safe to call in states where `FooEncoder#next` and `BarEncoder#next` have been called in-order more recently than `BarEncoder#count(0)`. Note, as `bar` is a group within `foo`, the `BarEncoder#next` and `BarEncoder#count` methods may be called repeatedly during the encoding of a single message. Implementing a more-permissive safety check, using this model, would entail checking that the current state of the encoder/decoder is in a state where this condition holds. To safely model this example, we'd need (at least) the following states: 1. has no `foo` elements 2. has `foo` elements, but has not called `FooEncoder#next` 3. has `foo` elements, and has called `FooEncoder#next` 4. has `foo` elements, has called `FooEncoder#next`, and has no `bar` elements 5. has `foo` elements, has called `FooEncoder#next`, has no `bar` elements, and has encoded `r` 6. has `foo` elements, has called `FooEncoder#next`, has no `bar` elements, and has encoded `s` 7. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, but has not called `BarEncoder#next` 8. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, has not called `BarEncoder#next`, and has encoded `r` 9. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, has not called `BarEncoder#next`, and has encoded `s` 10. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, and has called `BarEncoder#next` 11. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, and has called `BarEncoder#next`, and has encoded `r` 12. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, and has called `BarEncoder#next`, and has encoded `s` ``` FooEncoder foo = encoder.fooCount(3); // State 2 foo.next(); // State 3 foo.barCount(0); // State 4 foo.r("abc"); // State 5 foo.s("def"); // State 6 foo.next(); // State 4 BarEncoder bar = foo.barCount(1); // State 7 foo.r("abc"); // State 8 bar.next(); // State 11 bar.y(42); // State 11 foo.s("def"); // State 12 foo.next(); // State X? bar.y(43); // Weird but allowed. foo.barCount(0); // State Y? State X cannot be State 10, as State 10 should not allow _re-encoding_ of the bar group count ``` OK. We didn't have enough states. In the case where we "wrap back around", we need to model whether `bar` block access is allowed (State 13 vs State 3): 13. has `foo` elements, has called `FooEncoder#next`, has called `BarEncoder#next` for last iteration, but hasn't set `bar` group size in this iteration. An even trickier case is with doubly-nested groups: ``` group foo { x: int32 group bar { y: int32 group baz { z: int32 } } } ``` When is `Baz#z` safe to call? I believe it is safe to call in states where `FooEncoder#next`, `BarEncoder#next` and `BazEncoder#next` have been called in-order more recently than `BarEncoder#count(0)` or `BazEncoder#count(0). Encoding this in states is likely to result in more cases like State 13 above. Generalising: a group block field may be accessed only when all outer groups' and its own `#next` methods have been called in-order more recently than any outer groups' or its own `#count` method with a `0` argument. Can model "more-recently" exactly, i.e., with no false positives and no false negatives, using a state machine? Another way of looking at the state machines we generate, is as parsers for a language. Where our language, based on an SBE schema, comprises valid "strings" of calls to an encoder/decoder. Is this language a regular language? If not, representing it using a FSM will not work. Cue Pumping Lemma? Next steps: - Add more tests to show where this approach breaks - Revert the changes and see if it is acceptable that some valid uses of codecs will be prevented when enabling field order checking. I.e., it will give false negatives when testing the hypothesis "this encoding/decoding is valid". - Personally, I think false negatives are better than false positives. (False positives are possible as of this commit.) Having said that, it will discourage use of these checks in existing systems where valid code already exists that does not pass the checks.
(When testing the hypothesis "this decoding/encoding code is valid".) These tests show that there are problems with the approach in the last commit.
…fortunately, in some cases where it is not)." This reverts commit d9eb33d.
To reduce the number of false negatives the field access order checker generates, in this commit, we treat the top-level block fields differently. We know that once the top-level encoder is wrapped, these fields are always safe to access. As a result, we were able to enable several tests that previously failed.
Introduces the system property: ``` sbe.enable.access.order.checks ```
Some JIT compilers are able to eliminate dead code within branches of if-statements where the conditional is a static expression. Some JIT compilers perform inlining to some degree and use heuristics, e.g., method size. It isn't clear whether code size metrics will be calculated after dead code elimination is performed. In this commit, I've moved the code that performs state transitions when access order checks are enabled into separate methods that the encoders/decoders call upon field accesses, e.g., the getter for a field `xyz` might look like this: ``` int xyz() { if (ENABLED_ACCESS_ORDER_CHECKS) { onXyzAccessed(); } // ... } ``` This keeps the initial size of `xyz` smaller than when the code of `onXyzAccessed()` is "inlined" by the code generator. There is a lot of noise on my ThinkPad P14s, but benchmarks do show improvements after this change. Before: ``` $ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2 \# JMH version: 1.36 \# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08 \# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java \# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ... Benchmark Mode Cnt Score Error Units CarBenchmark.testDecode thrpt 6 18051405.792 ± 444296.898 ops/s CarBenchmark.testEncode thrpt 6 11964827.418 ± 221996.747 ops/s ``` After: ``` $ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2 \# JMH version: 1.36 \# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08 \# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java \# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ... Benchmark Mode Cnt Score Error Units CarBenchmark.testDecode thrpt 6 18132922.415 ± 1964596.044 ops/s CarBenchmark.testEncode thrpt 6 20018131.180 ± 671367.070 ops/s ``` Baseline (d2ec8f8): ``` $ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2 \# JMH version: 1.36 \# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08 \# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java \# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ... Benchmark Mode Cnt Score Error Units CarBenchmark.testDecode thrpt 6 19270366.357 ± 2618336.027 ops/s CarBenchmark.testEncode thrpt 6 22519718.535 ± 794169.180 ops/s ```
5fb5a18
to
cef3e1f
Compare
Removes extraneous space in Java output when access order checks are disabled. Removes unnecessary access modifiers in C++ when access order checks are disabled.
Hi, really good to see the access checks being added, definitely think this will help eliminate some rather scary bugs! I was a bit surprised that the checks weren't doing anything in C# until I realised despite generating them I had to preprocessor For what it's worth I tried out some obvious encoding blunders from C# (group after data, fewer items in group than I said, too many Next() calls, data fields in wrong order), and all got caught, the exception messages were pretty helpful 👍 |
Thanks for the feedback, @kieranelby. It was good to meet you all in 3D yesterday. IIRC, the other adjustments to make to this PR - i.e., non-DTO-related requests - were:
|
The aim of this change is to avoid naming collisions with other flags, particularly in financial domains where "order" is used frequently.
558df22
to
69d13b6
Compare
Hi @ZachBray sorry for the delay in getting back, one use case that occurred to us: Some teams package up a client library for talking to their system. They might want to include in the library two versions of the generated code: myawesomprotocol_unsafe (no checks) and myawesomeprotocol_checked (access check), but the schema (myawesomeprotocol) would be the same. Maybe to support this it would be good to use the output namespace in the name of the property that controls whether checks are enabled? Cheers, Kieran |
Hi @kieranelby,
They wouldn't generate once (with checks) and use flags to enable/disable at runtime?
Do I understand correctly that you are suggesting the enabled/disabled flag should be based on a fully qualified schema name? If so, I think this aligns with Point 2. Would you be happy with only namespaced flags? I suspect it will be simpler than having namespaced flags and a global flag. Happy to have a call to discuss, if it is easier. Thanks, Zach. |
Hi @ZachBray , I thought disabling/enabling at runtime only works in java - for C++ and C# it is build time? Yes agree perhaps the global flag not needed if we have per-namespace flags. |
@kieranelby, sorry, yes, you're right.
I'll make the change. |
I'm leaning to |
@mjpt777, I've already made the change to IIRC, you had some reservations about using flags named per-schema, i.e., around @kieranelby's request, is that still the case? |
Either is fine. I have a mild preference for "precedence" but happy to go with whatever you prefer.
I think I'm okay with this now. Per schema flags are fine. |
Some users would like the ability to enable precedence checks on some schemas but not others. Initially, I considered generating flags or system properties with the schema namespace; however, this forces the more-complex scheme on all users. It also might get unwieldly with long namespaces. It also assumes a one-to-one correspondence between namespaces and schemas. Instead, I have introduced two new system properties that can be applied at code generation time: - `sbe.precedence.checks.flagName`, which controls the symbol/macro used to enable precedence checks at build time in the generated C#/C++ code. It defaults to `"SBE_ENABLE_PRECEDENCE_CHECKS"`. - `sbe.precedence.checks.propName`, which controls the property name used to enable precedence checks at runtime in the generated Java code. It defaults to `"sbe.enable.precedence.checks"`. Now, users can run SbeTool with different values for these system properties to generate code with different enablement flags. Note above that the defaults are: - `SBE_ENABLE_PRECEDENCE_CHECKS` rather than `SBE_ENABLE_SEQUENCING_CHECKS`, and - `sbe.enable.precedence.checks` rather than `sbe.enable.sequencing.checks`. These changes and a change to another system property: - `sbe.generate.sequencing.checks` -> `sbe.generate.precedence.checks` address some feedback from Martin on the associated PR real-logic#948.
Some users would like the ability to enable precedence checks on some schemas but not others. Initially, I considered generating flags or system properties with the schema namespace; however, this forces the more-complex scheme on all users. It also might get unwieldly with long namespaces. It also assumes a one-to-one correspondence between namespaces and schemas. Instead, I have introduced two new system properties that can be applied at code generation time: - `sbe.precedence.checks.flagName`, which controls the symbol/macro used to enable precedence checks at build time in the generated C#/C++ code. It defaults to `"SBE_ENABLE_PRECEDENCE_CHECKS"`. - `sbe.precedence.checks.propName`, which controls the property name used to enable precedence checks at runtime in the generated Java code. It defaults to `"sbe.enable.precedence.checks"`. Now, users can run SbeTool with different values for these system properties to generate code with different enablement flags. Note above that the defaults are: - `SBE_ENABLE_PRECEDENCE_CHECKS` rather than `SBE_ENABLE_SEQUENCING_CHECKS`, and - `sbe.enable.precedence.checks` rather than `sbe.enable.sequencing.checks`. These changes and a change to another system property: - `sbe.generate.sequencing.checks` -> `sbe.generate.precedence.checks` address some feedback from Martin on the associated PR real-logic#948.
11afc92
to
08367e0
Compare
@mjpt777 and @kieranelby, the per-schema precedence checking flags should now be addressed by this commit: 08367e0 Ultimately, I opted to give the user control over the flags generated by SbeTool, rather than embedding the namespace name automatically. Hopefully, this keeps it simple for most users. Please let me know if you have any reservations about the approach. I have updated the docs to account for the change in my Wiki fork. |
Hi @ZachBray Yes that's a good idea to allow advanced use-cases to customise sbe.precedence.checks.flagName while still keeping it simple for the normal case 👍 Cheers, |
sbe-tool/src/main/java/uk/co/real_logic/sbe/generation/java/JavaGenerator.java
Outdated
Show resolved
Hide resolved
sbe-tool/src/main/java/uk/co/real_logic/sbe/generation/java/AccessOrderModel.java
Outdated
Show resolved
Hide resolved
sbe-tool/src/main/java/uk/co/real_logic/sbe/generation/java/JavaGenerator.java
Show resolved
Hide resolved
sbe-tool/src/main/java/uk/co/real_logic/sbe/generation/java/AccessOrderModel.java
Outdated
Show resolved
Hide resolved
Also, addresses Mike's feedback around the names of those system properties. In this commit, I've introduced a factory-like class: `PrecedenceChecks`. It creates models for message field access when the checks are enabled. It also provides a convenient place to group together some properties affecting the generation of field precedence checks. I have also renamed `AccessOrderModel` to `FieldPrecedenceModel` in this commit, as it aligns better with the system property names.
Previously, we only gave the (quite ambiguous) name of the inner class in the error message. Now, we give a qualified name that includes the codec/encoder/decoder class name (for C++, C#, and Java).
In this commit, I've removed the `PrecedenceChecks#Configuration` class in favour of a static method within `TargetCodeGeneratorLoader`. Unfortunately, C# behaves differently to all the other code generators. To avoid duplicating the default configuration for precedence checks in the `CSharp` class, I've made the method publicly visible, which is a bit ugly. It has also introduced a cyclic dependency between the code generators, e.g., `JavaGenerator`, and `TargetCodeGeneratorLoader`, as in order to preserve the existing public signatures I have kept the old constructors around and supplied a default `PrecedenceChecks`.
Move defaults out of property evaluation and into context.
a7733e1
to
8796e9e
Compare
The problem
The encoders and decoders that SBE generates in Java (and some other languages) require developers to follow a strict contract (that is not enforced by the type system):
next()
before encoding/decoding each repeating group element.When encoding, failure to follow the contract can produce invalid messages that do not conform to the format described in the associated SBE schema.
When decoding, failure to follow the contract can result in the misinterpretation of valid messages.
The goal
The goals of this PR are:
(Preferring Goal 2 to Goal 3 when there is a choice.)
This solution
This PR adds runtime checks based on a state machine that models interactions with encoders/decoders/codecs as transitions.
New system properties
This PR adds three new system properties:
sbe.generate.precedence.checks
controls whether to generate runtime checks at build timesbe.enable.precedence.checks
controls whether to run checks at runtimenot(agrona.disable.bounds.checks)
sbe.disable.implicit.copying
, which we'll discuss laterChanges in generated code
Car example
-Dsbe.generate.precedence.checks=true -Dsbe.cpp.disable.implicit.copying=true
-Dsbe.generate.precedence.checks=false -Dsbe.cpp.disable.implicit.copying=false
-Dsbe.generate.precedence.checks=false -Dsbe.cpp.disable.implicit.copying=true
Java changes commentary
(When
sbe.generate.precedence.checks
istrue
at build time.)Let's consider an example SBE schema fragment:
Each message encoder and decoder now has a static boolean field
SBE_ENABLE_PRECEDENCE_CHECKS
that controls whether runtime checks will be performed.Each message encoder and decoder also has a
codecState
field that represents the state of the codec according to our model. Benchmarks show better performance when runtime checks are enabled with anint
than an enum.Each field accessor now verifies that the field access is allowed and transitions the encoder's or decoder's state.
Verifying and transitioning the state machine is done within
onBodyAccessed()
rather than inline because benchmarks show performance improvements when running with runtime checks disabled.The states of the encoder or decoder, along with some lookup tables for constructing error messages, are generated within an inner class:
The dot-language diagram can be rendered using online or IDE-based tools.
Error messages
In the case of breaking the developer contract, the generated code produces an exception that contains:
Continuing our example, if we accidentally switch
body
andsubject
:Then we get the following error message:
Modelling notes:
Decoding different versions
The SBE specification says the following:
Java decoders provide a default value when a field is absent in the received version, but C# codecs do not. Instead, C# codecs rely on the developer querying presence properties, e.g.,
bool IsBodyPresent { get; }
. To support languages like C#, it seems sensible to model that absent fields may be skipped. Therefore, we model each version as a separate subgraph of the state machine. To allow the behaviour of decoders like Java's that return defaults, we perform the state machine transition only when a default is not returned.Restrictions to block field access
Our current model has some false negatives (Goal 3).
In particular, we do not model the ability to access top-level block fields or group block fields after a subsequent group or variable-level data field has been accessed.
I believe eliminating the false negatives around group block access would result in an explosion in state space where the number of states would be exponential w.r.t. the number of repeating groups. I reverted a commit where I attempted to achieve this in a naive manner. The commit message discusses some of the problems at length.
Eliminating the false negatives around top-level block access is easier. We just special case these fields in the code generator.
Representing repeated groups
To provide better error messages and to avoid unrelated fields interrogating fields private to group encoders/decoders, we model groups using several states that differentiate between empty/before-next/many-elements-remaining/last-element.
Questions:
MessageDecoder#GroupDecoder#wrap
public? I cannot think of a sensible way to model these interactions. I expect people usually wrap implicitly via.myGroupCount(N)
in encoders or.myGroup()
in decoders.Other changes
i. I've added some documentation to the Wiki in a separate patch.
ii. The semantics of copying flyweights seem non-obvious to me, and the presence of implicit copy constructors and copy assignment operators permit one to write (probably) less-efficient code (but I haven't measured it, and the additional work might be optimised away). For example:
and
vs.
MyMessage::MyGroup &grp = myMessage.myGroupCount(1);
and
Therefore, I've explicitly deleted the copy constructor and copy assignment operator from flyweights when the
sbe.cpp.disable.implicit.copying
is set totrue
.