Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java,C#,C++] Support field access order checking #948

Merged
merged 50 commits into from
Dec 6, 2023

Conversation

ZachBray
Copy link
Contributor

@ZachBray ZachBray commented Jun 21, 2023

The problem

The encoders and decoders that SBE generates in Java (and some other languages) require developers to follow a strict contract (that is not enforced by the type system):

  • Developers must encode/decode repeating groups and variable-length data fields in the order in which they appear in the schema.
  • Developers must explicitly encode/decode/skip all present groups and variable-length data, i.e., no implicit skipping.
  • Developers must call next() before encoding/decoding each repeating group element.

When encoding, failure to follow the contract can produce invalid messages that do not conform to the format described in the associated SBE schema.

When decoding, failure to follow the contract can result in the misinterpretation of valid messages.

The goal

The goals of this PR are:

  1. to support checking that the contract above is followed (at runtime) in non-production environments
  2. to minimise false positives
    • w.r.t. the classification: "this code safely uses encoders and decoders"
    • i.e., minimise cases where checks pass but the code is broken
  3. to minimise false negatives
    • i.e., minimise cases where checks fail but the code works

(Preferring Goal 2 to Goal 3 when there is a choice.)

This solution

This PR adds runtime checks based on a state machine that models interactions with encoders/decoders/codecs as transitions.

New system properties

This PR adds three new system properties:

  • sbe.generate.precedence.checks controls whether to generate runtime checks at build time
  • sbe.enable.precedence.checks controls whether to run checks at runtime
    • defaults to not(agrona.disable.bounds.checks)
  • sbe.disable.implicit.copying, which we'll discuss later

Changes in generated code

Car example
flags diff
-Dsbe.generate.precedence.checks=true -Dsbe.cpp.disable.implicit.copying=true diff vs master
-Dsbe.generate.precedence.checks=false -Dsbe.cpp.disable.implicit.copying=false diff vs master
-Dsbe.generate.precedence.checks=false -Dsbe.cpp.disable.implicit.copying=true diff vs master
Java changes commentary

(When sbe.generate.precedence.checks is true at build time.)

Let's consider an example SBE schema fragment:

    <sbe:message name="SendChatMessage" id="99">
        <field name="chatId" id="1" type="int64"/>
        <data name="subject" id="2" type="varDataEncoding"/>
        <data name="body" id="3" type="varDataEncoding"/>
    </sbe:message>

Each message encoder and decoder now has a static boolean field SBE_ENABLE_PRECEDENCE_CHECKS that controls whether runtime checks will be performed.

private static final boolean ENABLE_BOUNDS_CHECKS = !Boolean.getBoolean("agrona.disable.bounds.checks");

private static final boolean SBE_ENABLE_PRECEDENCE_CHECKS = Boolean.parseBoolean(System.getProperty(
        "sbe.enable.precedence.checks",
        Boolean.toString(ENABLE_BOUNDS_CHECKS)));

Each message encoder and decoder also has a codecState field that represents the state of the codec according to our model. Benchmarks show better performance when runtime checks are enabled with an int than an enum.

    private int codecState = CodecStates.NOT_WRAPPED;

Each field accessor now verifies that the field access is allowed and transitions the encoder's or decoder's state.

    public SendChatMessageEncoder body(final CharSequence value)
    {
        final int length = null == value ? 0 : value.length();
        if (length > 254)
        {
            throw new IllegalStateException("length > maxValue for type: " + length);
        }

+       if (SBE_ENABLE_PRECEDENCE_CHECKS)
+       {
+           onBodyAccessed();
+       }

        final int headerLength = 1;
        final int limit = parentMessage.limit();
        parentMessage.limit(limit + headerLength + length);
        buffer.putByte(limit, (byte)length);
        buffer.putStringWithoutLengthAscii(limit + headerLength, value);

        return this;
    }

Verifying and transitioning the state machine is done within onBodyAccessed() rather than inline because benchmarks show performance improvements when running with runtime checks disabled.

    private void onBodyAccessed()
    {
        switch (codecState())
        {
            case CodecStates.V0_SUBJECT_DONE:
                codecState(CodecStates.V0_BODY_DONE);
                break;
            default:
                throw new IllegalStateException("Illegal field access order. " +
                    "Cannot access field \"body\" in state: " + CodecStates.name(codecState()) +
                    ". Expected one of these transitions: [" + CodecStates.transitions(codecState()) +
                    "]. Please see the diagram in the Javadoc of the inner class #CodecStates.");
        }
    }

The states of the encoder or decoder, along with some lookup tables for constructing error messages, are generated within an inner class:

    /**
     * The states in which a encoder/decoder/codec can live.
     *
     * <p>The state machine diagram below, encoded in the dot language, describes
     * the valid state transitions according to the order in which fields may be
     * accessed safely. Tools such as PlantUML and Graphviz can render it.
     *
     * <pre>{@code
     *   digraph G {
     *       NOT_WRAPPED -> V0_BLOCK [label="  wrap(version=0)  "];
     *       V0_BLOCK -> V0_BLOCK [label="  chatId(?)  "];
     *       V0_BLOCK -> V0_SUBJECT_DONE [label="  subject(?)  "];
     *       V0_SUBJECT_DONE -> V0_BODY_DONE [label="  body(?)  "];
     *   }
     * }</pre>
     */
    private static class CodecStates
    {
        private static final int NOT_WRAPPED = 0;
        private static final int V0_BLOCK = 1;
        private static final int V0_SUBJECT_DONE = 2;
        private static final int V0_BODY_DONE = 3;

        private static final String[] STATE_NAME_LOOKUP =
        {
            "NOT_WRAPPED",
            "V0_BLOCK",
            "V0_SUBJECT_DONE",
            "V0_BODY_DONE",
        };

        private static final String[] STATE_TRANSITIONS_LOOKUP =
        {
            "\"wrap(version=0)\"",
            "\"chatId(?)\", \"subject(?)\"",
            "\"body(?)\"",
            "",
        };

        private static String name(final int state)
        {
            return STATE_NAME_LOOKUP[state];
        }

        private static String transitions(final int state)
        {
            return STATE_TRANSITIONS_LOOKUP[state];
        }
    }

The dot-language diagram can be rendered using online or IDE-based tools.

image

Error messages

In the case of breaking the developer contract, the generated code produces an exception that contains:

  • the action against the encoder or decoder that failed
  • the current state of the encoder or decoder
  • the expected actions that can safely be performed against the encoder or decoder in its current state

Continuing our example, if we accidentally switch body and subject:

        final SendChatMessageEncoder encoder = new SendChatMessageEncoder()
            .wrapAndApplyHeader(buffer, OFFSET, messageHeaderEncoder);

        encoder.chatId(1)
            .body("About 1 ft tall and furry.")
            .subject("Missing cat");

Then we get the following error message:

Illegal field access order.
Cannot access field "body" in state: V0_BLOCK.
Expected one of these transitions: ["chatId(?)", "subject(?)"].
Please see the diagram in the Javadoc of the inner class #CodecStates.

Modelling notes:

Decoding different versions

The SBE specification says the following:

If the received version is less than the decoder’s version (that is, the producer’s encoder is older than the consumer’s decoder), then only the fields of the older version may be parsed. This information is available through metadata as “sinceVersion” attribute of a field. If sinceVersion is greater than received schema version, then the field is not available. How a decoder signals an application that a field is unavailable is an implementation detail. One strategy is for an application to provide a default value for unavailable fields.

Java decoders provide a default value when a field is absent in the received version, but C# codecs do not. Instead, C# codecs rely on the developer querying presence properties, e.g., bool IsBodyPresent { get; }. To support languages like C#, it seems sensible to model that absent fields may be skipped. Therefore, we model each version as a separate subgraph of the state machine. To allow the behaviour of decoders like Java's that return defaults, we perform the state machine transition only when a default is not returned.

Restrictions to block field access

Our current model has some false negatives (Goal 3).
In particular, we do not model the ability to access top-level block fields or group block fields after a subsequent group or variable-level data field has been accessed.

I believe eliminating the false negatives around group block access would result in an explosion in state space where the number of states would be exponential w.r.t. the number of repeating groups. I reverted a commit where I attempted to achieve this in a naive manner. The commit message discusses some of the problems at length.

Eliminating the false negatives around top-level block access is easier. We just special case these fields in the code generator.

Representing repeated groups

To provide better error messages and to avoid unrelated fields interrogating fields private to group encoders/decoders, we model groups using several states that differentiate between empty/before-next/many-elements-remaining/last-element.

Questions:

  1. Why are methods like MessageDecoder#GroupDecoder#wrap public? I cannot think of a sensible way to model these interactions. I expect people usually wrap implicitly via .myGroupCount(N) in encoders or .myGroup() in decoders.

Other changes

i. I've added some documentation to the Wiki in a separate patch.

ii. The semantics of copying flyweights seem non-obvious to me, and the presence of implicit copy constructors and copy assignment operators permit one to write (probably) less-efficient code (but I haven't measured it, and the additional work might be optimised away). For example:

MyMessage::MyGroup grp = myMessage.myGroupCount(1); // calls copy constructor

and

MyMessage::MyOuterGroup::MyInnerGroup &inner = myOuterGroup.myInnerGroupCount(1);

// ...

inner = myOuterGroup.next().myInnerGroupCount(2); // calls copy assignment operator

vs.

MyMessage::MyGroup &grp = myMessage.myGroupCount(1);

and

MyMessage::MyOuterGroup::MyInnerGroup &inner1 = myOuterGroup.myInnerGroupCount(1);

// ...

MyMessage::MyOuterGroup::MyInnerGroup &inner2 = myOuterGroup.next().myInnerGroupCount(2);

Therefore, I've explicitly deleted the copy constructor and copy assignment operator from flyweights when the sbe.cpp.disable.implicit.copying is set to true.

@ZachBray
Copy link
Contributor Author

I plan to make a slight improvement today. As of the current revision (4eb8fd0), we do not expose any mechanism to check that a message is fully encoded, but we easily could.

I plan to generate a method, e.g., void checkFullyEncoded(), that will throw an exception that provides similar context to our existing errors in the case where the encoder is not in a terminal state.

It would be possible to do something similar for decoders, but it would rely on having numGroups and numVarDataFields in the message header.

@ZachBray
Copy link
Contributor Author

I plan to make a slight improvement today. As of the current revision (4eb8fd0), we do not expose any mechanism to check that a message is fully encoded, but we easily could.

I plan to generate a method, e.g., void checkFullyEncoded(), that will throw an exception that provides similar context to our existing errors in the case where the encoder is not in a terminal state.

As of c67e20a, we generate a void checkEncodingIsComplete() method on encoders when the sbe.generate.access.order.checks property is enabled at build time.

@ZachBray ZachBray changed the title [Java] Support field access order checking [Java,CSharp] Support field access order checking Jul 5, 2023
@ZachBray ZachBray force-pushed the feature/field-order-check branch 3 times, most recently from 7c1f56a to b3f809d Compare July 5, 2023 11:37
@ZachBray ZachBray force-pushed the feature/field-order-check branch from b3f809d to aa136ce Compare August 18, 2023 16:44
@ZachBray ZachBray changed the title [Java,CSharp] Support field access order checking [Java,C#,C++] Support field access order checking Aug 18, 2023
@ZachBray ZachBray force-pushed the feature/field-order-check branch 3 times, most recently from 0365f16 to d33e54b Compare August 22, 2023 15:15
SBE does not support arbitrary access to fields when encoding and
decoding. In particular, accessing groups and variable length fields
advances a `limit`, which marks where the start of the next
group/variable length field is expected.

We aim to introduce some checking (in debug mode - when bounds checks
are enabled) that verifies field access order is consistent enough with
the order defined in the schema to encode or decode a valid message.
Also, enable some tests around field order checking.

Outline of the approach:

- Using a FSM, model the states of an encoder during encoding.
    - Hopefully, this model will be language independent.
- Generate code to perform state transitions upon field accesses.
- And to throw when transitions are not in the model.

Assumptions:

- JIT compilation will remove `if (DEBUG_MODE) { ... }` blocks when the
static boolean constant, `DEBUG_MODE`, is `false`.

Questions:

- Is the approach reasonable?
- Is the newly disallowed behaviour reasonable?
    - For example, it prevents encoding block fields after a group or
variable length field has been encoded.

Further work:

- Decode path
- More tests (including tests for decoding old messages where some
fields may not be present)
Use `-Dsbe.print.state.machine=$NAME_OF_MSG`.
This change builds upon the work to check field encoding order.
It is a slightly more-complex problem than with encoding, as when
decoding it is necessary to deal with multiple versions of encoded data.
To handle this, we fuse together the state machines for decoding each
possible version and branch on "decoder.wrap(...)".

Further work:

- More tests
- Better error messages
"New" decoders (in Java at least) read null-like values for fields
that are not present in "old" messages. There are no restrictions on the
order in which these "new" fields are accessed when decoding "old" data.

In Java, a version check returns a null-like value when a field is not
present in the acting version. This behaviour matches the wiki
documentation.

C#, currently, has different behaviour: the application code needs to
interrogate a boolean property, e.g., `IsFooInActingVersion`, before
accessing the data property, e.g., `Foo`. The property itself assumes
that the field is present. It will attempt to read it from the buffer
and possibly advance a `limit` field.

In this commit, I have moved the state transition beneath the existing
version check in Java decoders. This change allows these fields to be
accessed when decoding "old" messages without changing the state of
decoding, which seems appropriate as there are no side-effects on
internal state, e.g., the `limit` field.

Also, in this commit, I have expanded the test coverage to include the
getter and setter variants for fixed-size arrays and variable-length
data.
Also, use an enum to represent codec states rather than a state number
and generate (dot-based) state machine diagram in Javadoc for states
enum.
Also, fix some spots where I had forgotten to move the state transition
below the version check in decoders.
There is still work to do in this area. In particular:

1. States should be more descriptive.
1. Errors should contain full field paths.
2. Group error messages should be more clear, e.g., "cannot write group
element before calling next()".
…ely, in some cases where it is not).

This commit makes the field order checks more-lenient. The aim is to
allow the application code to access fields in the root/top-level block
or blocks inside groups whenever it is safe. However, the approach is
broken and does not work in general.

To (try to) achieve this aim, we rely on the fact that once a block is
positioned correctly, it is safe to access during the remainder of
encoding/decoding. For example, the following code is safe:

```
encoder.wrap(...);
encoder.blockField1(42);
encoder.varData("abc");
encoder.blockField2(43);
```

As soon as `wrap(...)` is called, the top-level block will be positioned
over a valid region for the remainder of encoding.

Groups are slightly trickier. The following code is _safe_:

```
encoder.wrap(...);
Foo.BarEncoder bar = encoder.groupBarCount(2);
bar.next()
    .x(1).y(2)
    .next();
    .x(3).y(4);
```

But the following code is _unsafe_:

```
encoder.wrap(...);
Foo.BarEncoder bar = encoder.groupBarCount(2);
bar.x(1).y(2)
    .next();
    .x(3).y(4);
```

And the following code is _unsafe_ too:

```
encoder.wrap(...)
Foo.BarEncoder bar = encoder.groupBarCount(0);
encoder.baz("abc");
bar.x(1).y(2);
```

The `offset` for the group encoder is only set when `next()` is called.
Therefore, calling `x(1)` and `y(2)` will likely overwrite some data
elsewhere in the buffer (probably near offset 0).

We can still rely on the fact that once `next()` has been called - even
if it is called multiple times, or too many times (causing it to throw)
- the group's block fields will be safe to encode/decode.

Thinking about this in terms of our state machine that models codec
state and field accesses, we can put this another way. There is at least
one transition that represents the `next()` method call. The states
(transitively) reachable from these transitions can safely access the
associated group block.

As already mentioned, despite the passing tests, the approach in this
commit doesn't always work.

I had thought it would be possible to assign a number to each state as
we find them using DFS and use this number as a shortcut to determining
whether a block access was safe. For example, in each block field we
would check:

```
if (currentState.number() >= firstStateAfterNextIsCalled.number())
{
    // ALLOW
}
else
{
    // DENY
}
```

However, (even ignoring the cycles due to repeating groups and the use
of DFS) this solution makes the following assumptions:

1. There is a single state: `firstStateAfterNextIsCalled`, such that all
subsequent transitively reachable states are safe.

2. It is possible to number the states in such a way that
`numberOfState(b) >= numberOfState(a)` implies `b` is (transitively)
reachable from `a`.

Let's consider the following psuedo-schema w.r.t. our assumptions above:

```
group a {
  x: int32
}
group b {
  y: int32
}
```

Can we set `x` if we've called `BEncoder#next`? Sometimes.

The following code is invalid.

```
AEncoder a = encoder.aCount(0);
BEncoder b = encoder.bCount(1).next();
b.y(2);
a.x(1); // Whoops! We overwrote some data somewhere.
```

But this code is valid.

```
AEncoder a = encoder.aCount(1).next();
BEncoder b = encoder.bCount(1).next();
b.y(2);
a.x(1);
```

Therefore, multiple transitions must exist for the `BEncoder#next` call
with different exit states, which breaks Assumption 1.

Assumption 2 is also problematic. As alluded to above, we must have a
bifurcation of states depending on whether a group has elements or is
empty. With arbitrary bifurcations, it is not possible to number the
states in such a way that Assumption 2 holds.

Consider the following state diagram:

```
        state N
         /   \
     empty   non-empty
      /        \
state X        state Y
```

Depending on whether a group is empty, e.g., `aCount(0)`, or not, e.g.,
`aCount(1)`, we transition to either `state X` or `state Y` from
`state N`.

For Assumption 2 to hold, we need
`numberOf(state X) >= numberOf(state N)` (reachable) and
`numberOf(state Y) >= numberOf(state N)` (reachable) and
`numberOf(state X)  < numberOf(state Y)` (unreachable) and
`numberOf(state Y)  < numberOf(state X)` (unreachable).
Clearly, this is impossible.

Prior to this commit, we avoided such problems by enforcing a
more-strict ordering of field accesses. With this more-strict ordering,
setting fields in a group block in later states was not allowed.
Therefore, the state machine didn't have to have to split into mutually
exclusive subgraphs of reachable states depending on whether or not the
group is empty.

For example, let's consider a psuedo-schema with two groups.

```
group a {
  x: int32
}
group b {
  y: int32
}
```

Once the application code calls `encoder.bCount(2).next()`, our state
machine will enter the `V0_B_N_BLOCK` state regardless of whether
previously the application code called `encoder.aCount(0)` or
`encoder.aCount(1).next()`. In `V_B_N_BLOCK` the application code is not
allowed to call `AEncoder#x`; therefore, it is safe, i.e., it has no
false positives. However, it is also too-strict, i.e., it has false
negatives, e.g., the `encoder.aCount(1).next()` case.

How likely are these false negatives to come up in the "wild"?

If we think false negatives will often come up, is it possible to remove
them whilst still using a state machine?

We could consider generating more states to represent the reachable
states under each fork of group emptiness. The number of states would
grow exponentially w.r.t. the number of top-level groups, but let's
ignore that for the moment and just consider whether or not it is
possible.

Continuing the example above with two groups, when the application code
calls `encoder.bCount(2).next()`, the state it enters would depend on
whether it had previously called `encoder.aCount(0)` or
`encoder.aCount(n)...next()` for some value `n > 0`.

To safely model this example, we'd need at least the following states:

1. has no `a` elements
2. has no `a` elements, and has no `b` elements
3. has `a` elements, but has not called `AEncoder#next`
4. has `a` elements, and has called `AEncoder#next`
5. has `a` elements, has called `AEncoder#next`, has `b` elements, but
has not called `BEncoder#next`
6. has `a` elements, has called `AEncoder#next`, has `b` elements, and
has called `BEncoder#next`
7. has no `a` elements, has `b` elements, but has not called
`BEncoder#next`
8. has no `a` elements, has `b` elements, and has called
`BEncoder#next`

A trickier case is with nested groups:

```
group foo {
  x: int32
  group bar {
    y: int32
  }
  r: string
  s: string
}
```

When is `Bar#y` safe to call?

I believe it is safe to call in states where `FooEncoder#next` and
`BarEncoder#next` have been called in-order more recently than
`BarEncoder#count(0)`. Note, as `bar` is a group within `foo`, the
`BarEncoder#next` and `BarEncoder#count` methods may be called
repeatedly during the encoding of a single message.  Implementing a
more-permissive safety check, using this model, would entail checking
that the current state of the encoder/decoder is in a state where this
condition holds.

To safely model this example, we'd need (at least) the following states:

1. has no `foo` elements
2. has `foo` elements, but has not called `FooEncoder#next`
3. has `foo` elements, and has called `FooEncoder#next`
4. has `foo` elements, has called `FooEncoder#next`, and has no `bar`
elements
5. has `foo` elements, has called `FooEncoder#next`, has no `bar`
elements, and has encoded `r`
6. has `foo` elements, has called `FooEncoder#next`, has no `bar`
elements, and has encoded `s`
7. has `foo` elements, has called `FooEncoder#next`, has `bar` elements,
but has not called `BarEncoder#next`
8. has `foo` elements, has called `FooEncoder#next`, has `bar` elements,
has not called `BarEncoder#next`, and has encoded `r`
9. has `foo` elements, has called `FooEncoder#next`, has `bar` elements,
has not called `BarEncoder#next`, and has encoded `s`
10. has `foo` elements, has called `FooEncoder#next`, has `bar` elements,
and has called `BarEncoder#next`
11. has `foo` elements, has called `FooEncoder#next`, has `bar` elements,
and has called `BarEncoder#next`, and has encoded `r`
12. has `foo` elements, has called `FooEncoder#next`, has `bar` elements,
and has called `BarEncoder#next`, and has encoded `s`

```
FooEncoder foo = encoder.fooCount(3); // State 2
foo.next(); // State 3
foo.barCount(0); // State 4
foo.r("abc"); // State 5
foo.s("def"); // State 6
foo.next(); // State 4
BarEncoder bar = foo.barCount(1); // State 7
foo.r("abc"); // State 8
bar.next(); // State 11
bar.y(42); // State 11
foo.s("def"); // State 12
foo.next(); // State X?
bar.y(43); // Weird but allowed.
foo.barCount(0); // State Y? State X cannot be State 10, as State 10 should not allow _re-encoding_ of the bar group count
```

OK. We didn't have enough states. In the case where we "wrap back
around", we need to model whether `bar` block access is allowed (State 13 vs
State 3):

13. has `foo` elements, has called `FooEncoder#next`, has called
`BarEncoder#next` for last iteration, but hasn't set `bar` group size in
this iteration.

An even trickier case is with doubly-nested groups:

```
group foo {
  x: int32
  group bar {
    y: int32
    group baz {
      z: int32
    }
  }
}
```

When is `Baz#z` safe to call?

I believe it is safe to call in states where `FooEncoder#next`,
`BarEncoder#next` and `BazEncoder#next` have been called in-order
more recently than `BarEncoder#count(0)` or `BazEncoder#count(0).
Encoding this in states is likely to result in more cases like State 13
above.

Generalising: a group block field may be accessed only when all outer
groups' and its own `#next` methods have been called in-order more
recently than any outer groups' or its own `#count` method with a `0`
argument.

Can model "more-recently" exactly, i.e., with no false positives and no
false negatives, using a state machine?

Another way of looking at the state machines we generate, is as parsers
for a language. Where our language, based on an SBE schema, comprises
valid "strings" of calls to an encoder/decoder. Is this language
a regular language? If not, representing it using a FSM will not work.
Cue Pumping Lemma?

Next steps:

- Add more tests to show where this approach breaks
- Revert the changes and see if it is acceptable that some valid uses of
codecs will be prevented when enabling field order checking. I.e., it
will give false negatives when testing the hypothesis "this
encoding/decoding is valid".
    - Personally, I think false negatives are better than false
positives. (False positives are possible as of this commit.) Having said
that, it will discourage use of these checks in existing systems where
valid code already exists that does not pass the checks.
(When testing the hypothesis "this decoding/encoding code is valid".)

These tests show that there are problems with the approach in the last
commit.
…fortunately, in some cases where it is not)."

This reverts commit d9eb33d.
To reduce the number of false negatives the field access order checker
generates, in this commit, we treat the top-level block fields
differently.

We know that once the top-level encoder is wrapped, these fields are
always safe to access.

As a result, we were able to enable several tests that previously
failed.
Introduces the system property:

```
sbe.enable.access.order.checks
```
Some JIT compilers are able to eliminate dead code within branches of
if-statements where the conditional is a static expression. Some JIT
compilers perform inlining to some degree and use heuristics, e.g.,
method size. It isn't clear whether code size metrics will be calculated
after dead code elimination is performed.

In this commit, I've moved the code that performs state transitions when
access order checks are enabled into separate methods that the
encoders/decoders call upon field accesses, e.g., the getter for a field
`xyz` might look like this:

```
int xyz()
{
    if (ENABLED_ACCESS_ORDER_CHECKS)
    {
       onXyzAccessed();
    }

    // ...
}
```

This keeps the initial size of `xyz` smaller than when the code of
`onXyzAccessed()` is "inlined" by the code generator.

There is a lot of noise on my ThinkPad P14s, but benchmarks do show
improvements after this change.

Before:

```
$ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2

\# JMH version: 1.36
\# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08
\# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java
\# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false

...

Benchmark                 Mode  Cnt         Score        Error  Units
CarBenchmark.testDecode  thrpt    6  18051405.792 ± 444296.898  ops/s
CarBenchmark.testEncode  thrpt    6  11964827.418 ± 221996.747  ops/s
```

After:

```
$ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2
\# JMH version: 1.36
\# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08
\# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java
\# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false

...

Benchmark                 Mode  Cnt         Score         Error  Units
CarBenchmark.testDecode  thrpt    6  18132922.415 ± 1964596.044  ops/s
CarBenchmark.testEncode  thrpt    6  20018131.180 ±  671367.070  ops/s
```

Baseline (d2ec8f8):

```
$ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2
\# JMH version: 1.36
\# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08
\# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java
\# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false

...

Benchmark                 Mode  Cnt         Score         Error  Units
CarBenchmark.testDecode  thrpt    6  19270366.357 ± 2618336.027  ops/s
CarBenchmark.testEncode  thrpt    6  22519718.535 ±  794169.180  ops/s
```
@ZachBray ZachBray force-pushed the feature/field-order-check branch from 5fb5a18 to cef3e1f Compare September 20, 2023 12:03
Removes extraneous space in Java output when access order checks are
disabled.

Removes unnecessary access modifiers in C++ when access order checks are
disabled.
@kieranelby
Copy link

Hi, really good to see the access checks being added, definitely think this will help eliminate some rather scary bugs!

I was a bit surprised that the checks weren't doing anything in C# until I realised despite generating them I had to preprocessor #define ENABLE_ACCESS_ORDER_CHECKS too which is a bit unusual, but I guess the performance rationale makes sense, and it is documented on the Safe-Flyweight-Usage page.

For what it's worth I tried out some obvious encoding blunders from C# (group after data, fewer items in group than I said, too many Next() calls, data fields in wrong order), and all got caught, the exception messages were pretty helpful 👍

@ZachBray
Copy link
Contributor Author

ZachBray commented Oct 4, 2023

Thanks for the feedback, @kieranelby. It was good to meet you all in 3D yesterday.

IIRC, the other adjustments to make to this PR - i.e., non-DTO-related requests - were:

  1. To rename ENABLE_ACCESS_ORDER_CHECKS to avoid the homograph and to improve namespacing. Martin has suggested SBE_ENABLE_PRECEDENCE_CHECKS or SBE_ENABLE_SEQUENCING_CHECKS.

  2. To consider using a symbol/macro/system_property scoped to each schema. Would a global "flag" also be useful? If we had both, which should take precedence?

  3. To notify the user if generation is disabled but the runtime flag is enabled.

    In situations where generation is - intentionally - partially enabled, i.e., across some but not all schemas, and we use a global flag, I worry this could be annoying. But perhaps that scenario is unlikely?

  4. To consider only disabling specifiers, e.g., noexcept, based on the runtime flag rather than at code generation time. The only potential downside I can see here is that a method that interacts with a codec and is marked as noexcept would compile work when the runtime checks flag is disabled but fail to compile when the flag is enabled, but that will probably be an easy fix for the consumer throw a runtime exception when the flag is enabled (EDITED). I'll experiment with this further.

The aim of this change is to avoid naming collisions with other flags,
particularly in financial domains where "order" is used frequently.
@ZachBray ZachBray force-pushed the feature/field-order-check branch from 558df22 to 69d13b6 Compare October 20, 2023 13:37
@kieranelby
Copy link

Hi @ZachBray sorry for the delay in getting back, one use case that occurred to us:

Some teams package up a client library for talking to their system.

They might want to include in the library two versions of the generated code: myawesomprotocol_unsafe (no checks) and myawesomeprotocol_checked (access check), but the schema (myawesomeprotocol) would be the same.

Maybe to support this it would be good to use the output namespace in the name of the property that controls whether checks are enabled?

Cheers, Kieran

@ZachBray
Copy link
Contributor Author

Hi @kieranelby,

They might want to include in the library two versions of the generated code: myawesomprotocol_unsafe (no checks) and myawesomeprotocol_checked (access check), but the schema (myawesomeprotocol) would be the same.

They wouldn't generate once (with checks) and use flags to enable/disable at runtime?

Maybe to support this it would be good to use the output namespace in the name of the property that controls whether checks are enabled?

Do I understand correctly that you are suggesting the enabled/disabled flag should be based on a fully qualified schema name? If so, I think this aligns with Point 2. Would you be happy with only namespaced flags? I suspect it will be simpler than having namespaced flags and a global flag.

Happy to have a call to discuss, if it is easier.

Thanks, Zach.

@kieranelby
Copy link

Hi @ZachBray , I thought disabling/enabling at runtime only works in java - for C++ and C# it is build time? Yes agree perhaps the global flag not needed if we have per-namespace flags.

@ZachBray
Copy link
Contributor Author

I thought disabling/enabling at runtime only works in java - for C++ and C# it is build time?

@kieranelby, sorry, yes, you're right.

Yes agree perhaps the global flag not needed if we have per-namespace flags.

I'll make the change.

@mjpt777
Copy link
Contributor

mjpt777 commented Nov 7, 2023

To rename ENABLE_ACCESS_ORDER_CHECKS to avoid the homograph and to improve namespacing. Martin has suggested SBE_ENABLE_PRECEDENCE_CHECKS or SBE_ENABLE_SEQUENCING_CHECKS.

I'm leaning to SBE_ENABLE_PRECEDENCE_CHECKS when looking at examples.

@ZachBray
Copy link
Contributor Author

@mjpt777, I've already made the change to SBE_ENABLE_SEQUENCING_CHECKS, but happy to s/SBE_ENABLE_SEQUENCING_CHECKS/SBE_ENABLE_PRECEDENCE_CHECKS/.

IIRC, you had some reservations about using flags named per-schema, i.e., around @kieranelby's request, is that still the case?

@mjpt777
Copy link
Contributor

mjpt777 commented Nov 10, 2023

@mjpt777, I've already made the change to SBE_ENABLE_SEQUENCING_CHECKS, but happy to s/SBE_ENABLE_SEQUENCING_CHECKS/SBE_ENABLE_PRECEDENCE_CHECKS/.

Either is fine. I have a mild preference for "precedence" but happy to go with whatever you prefer.

IIRC, you had some reservations about using flags named per-schema, i.e., around @kieranelby's request, is that still the case?

I think I'm okay with this now. Per schema flags are fine.

ZachBray added a commit to ZachBray/simple-binary-encoding that referenced this pull request Nov 13, 2023
Some users would like the ability to enable precedence checks on some
schemas but not others.

Initially, I considered generating flags or system properties with the
schema namespace; however, this forces the more-complex scheme on all
users. It also might get unwieldly with long namespaces. It also assumes
a one-to-one correspondence between namespaces and schemas.

Instead, I have introduced two new system properties that can be applied
at code generation time:

- `sbe.precedence.checks.flagName`, which controls the symbol/macro used
to enable precedence checks at build time in the generated C#/C++ code.
It defaults to `"SBE_ENABLE_PRECEDENCE_CHECKS"`.

- `sbe.precedence.checks.propName`, which controls the property name
used to enable precedence checks at runtime in the generated Java code.
It defaults to `"sbe.enable.precedence.checks"`.

Now, users can run SbeTool with different values for these system
properties to generate code with different enablement flags.

Note above that the defaults are:

- `SBE_ENABLE_PRECEDENCE_CHECKS` rather than
`SBE_ENABLE_SEQUENCING_CHECKS`, and
- `sbe.enable.precedence.checks` rather than
`sbe.enable.sequencing.checks`.

These changes and a change to another system property:

- `sbe.generate.sequencing.checks` -> `sbe.generate.precedence.checks`

address some feedback from Martin on the associated PR real-logic#948.
Some users would like the ability to enable precedence checks on some
schemas but not others.

Initially, I considered generating flags or system properties with the
schema namespace; however, this forces the more-complex scheme on all
users. It also might get unwieldly with long namespaces. It also assumes
a one-to-one correspondence between namespaces and schemas.

Instead, I have introduced two new system properties that can be applied
at code generation time:

- `sbe.precedence.checks.flagName`, which controls the symbol/macro used
to enable precedence checks at build time in the generated C#/C++ code.
It defaults to `"SBE_ENABLE_PRECEDENCE_CHECKS"`.

- `sbe.precedence.checks.propName`, which controls the property name
used to enable precedence checks at runtime in the generated Java code.
It defaults to `"sbe.enable.precedence.checks"`.

Now, users can run SbeTool with different values for these system
properties to generate code with different enablement flags.

Note above that the defaults are:

- `SBE_ENABLE_PRECEDENCE_CHECKS` rather than
`SBE_ENABLE_SEQUENCING_CHECKS`, and
- `sbe.enable.precedence.checks` rather than
`sbe.enable.sequencing.checks`.

These changes and a change to another system property:

- `sbe.generate.sequencing.checks` -> `sbe.generate.precedence.checks`

address some feedback from Martin on the associated PR real-logic#948.
@ZachBray ZachBray force-pushed the feature/field-order-check branch from 11afc92 to 08367e0 Compare November 13, 2023 16:57
@ZachBray
Copy link
Contributor Author

@mjpt777 and @kieranelby, the per-schema precedence checking flags should now be addressed by this commit: 08367e0

Ultimately, I opted to give the user control over the flags generated by SbeTool, rather than embedding the namespace name automatically. Hopefully, this keeps it simple for most users. Please let me know if you have any reservations about the approach.

I have updated the docs to account for the change in my Wiki fork.

@kieranelby
Copy link

Hi @ZachBray

Yes that's a good idea to allow advanced use-cases to customise sbe.precedence.checks.flagName while still keeping it simple for the normal case 👍

Cheers,
Kieran

Also, addresses Mike's feedback around the names of those system
properties.

In this commit, I've introduced a factory-like class:
`PrecedenceChecks`. It creates models for message field access when the
checks are enabled. It also provides a convenient place to group
together some properties affecting the generation of field precedence
checks.

I have also renamed `AccessOrderModel` to `FieldPrecedenceModel` in this
commit, as it aligns better with the system property names.
Previously, we only gave the (quite ambiguous) name of the inner class
in the error message. Now, we give a qualified name that includes the
codec/encoder/decoder class name (for C++, C#, and Java).
@ZachBray ZachBray requested a review from mikeb01 December 4, 2023 18:54
In this commit, I've removed the `PrecedenceChecks#Configuration` class
in favour of a static method within `TargetCodeGeneratorLoader`.

Unfortunately, C# behaves differently to all the other code generators.
To avoid duplicating the default configuration for precedence checks in
the `CSharp` class, I've made the method publicly visible, which is a
bit ugly.

It has also introduced a cyclic dependency between the code generators,
e.g., `JavaGenerator`, and `TargetCodeGeneratorLoader`, as in order to
preserve the existing public signatures I have kept the old constructors
around and supplied a default `PrecedenceChecks`.
Move defaults out of property evaluation and into context.
@ZachBray ZachBray force-pushed the feature/field-order-check branch from a7733e1 to 8796e9e Compare December 6, 2023 11:17
@ZachBray ZachBray merged commit a259c7d into real-logic:master Dec 6, 2023
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants