Question on undefined behavior as branch condition #891

fruffy · 2020-09-16T11:24:07Z

I have a question on undefined behavior in conditions regarding P4. Assuming I have a block like this.

control ingress(inout Headers hdr, ...) {
    apply {
        bit<8> undef;
        if (undef == 1) {
            hdr.eth_hdr.src_addr = 1;
        } else {
            hdr.eth_hdr.src_addr = 2;
        }
    }
}

To what extent can the compiler transform this expression? Is turning the entire statement into a noop possible?

control ingress(inout Headers hdr, ...) {
    apply {
    }
}

The C++ standard says that if undefined behavior is on the path anything goes. Is the same true for P4?

jnfoster · 2020-09-16T13:51:49Z

My understanding of the spec is that either

control ingress(inout Headers hdr, ...) {
    apply {
       hdr.eth_hdr.src_addr = 1;
    }
}

or

control ingress(inout Headers hdr, ...) {
    apply {
       hdr.eth_hdr.src_addr = 2;
    }
}

would be something a compiler could produce, but not the version you posted.

vgurevich · 2020-09-16T15:32:01Z

I agree with @jnfoster

jafingerhut · 2020-09-16T15:58:18Z

The P4_16 language specification goes to some lengths to avoid undefined behavior that can occur in C/C++. For example, it defines bit-for-bit what the result of signed integer addition/subtraction is, even in conditions of overflow, as Java does.

For reading variables that have not been initialized, or fields of headers that are currently invalid, I believe most (or perhaps all?) of what the specification has to say on the behavior is in Section 8.22 "Reading uninitialized values and writing fields of invalid headers": https://p4.org/p4-spec/docs/P4-16-v1.2.1.html#sec-uninitialized-values-and-writing-invalid-headers

As an example illustrating one of the things it says there, which is that reading an uninitialized variable multiple times might result in getting back a different value, note that this program:

control ingress(inout Headers hdr, ...) {
    apply {
        bit<8> undef;
        if (undef == 1) {
            hdr.eth_hdr.src_addr = 1;
        } else {
            hdr.eth_hdr.src_addr = 2;
        }
        if (undef == 1) {
            hdr.eth_hdr.dst_addr = 3;
        } else {
            hdr.eth_hdr.dst_addr = 4;
        }
    }
}

would be legal to implement as the following program, because the first read of the uninitialized variable undef could get back 1, but the second read of undef (still uninitialized) could get back a different value than 1:

control ingress(inout Headers hdr, ...) {
    apply {
        hdr.eth_hdr.src_addr = 1;
        hdr.eth_hdr.dst_addr = 4;
    }
}

mihaibudiu · 2020-09-16T16:09:21Z

Today we don't treat undefined values as constants.
In principle you could compile if (a) x else y into if (a) x; if (!a) y; under appropriate circumstances, and then all combinations would be possible.

jafingerhut · 2020-09-16T16:42:58Z

@mbudiu-vmw

It seems to me that it would be incorrect for a compiler to transform if (a) x else y into if (a) x; if (!a) y;, if a includes reads of variables/fields that return unspecified values, because if (a) x; if (!a) y; could then execute both code x followed by y, which the original if (a) x else y could never do.

This makes me wonder whether this aspect of the specification undercuts assumptions one often uses to reason about code transformations. Often one thinks about duplicating expressions and evaluating them repeatedly (expressions with no side effects) will return the same result, but the P4_16 spec explicitly allows implementations to evaluate an uninitialized variable differently on different occurrences in the source code.

For example, it seems that the localCopyPropagation pass, by duplicating occurrences of expressions, might result in programs that the spec says have more possible final states than the original program did, if some of the variables read in those expressions are uninitialized?

jafingerhut · 2020-09-16T17:09:22Z

This is getting into esoteric details, I know, and I will try to refrain from introducing more problems than I solve, but this example seems scary to me.

Consider programs 1 and 2 below, which to just about anyone look like they should have the same behavior:

// program 1
control ingress(inout Headers hdr, ...) {
    apply {
        bit<8> undef;
        if (undef == 1) {
            hdr.eth_hdr.src_addr = 1;
        }
        if (! (undef == 1)) {
            hdr.eth_hdr.ether_type = 2;
        }
    }
}

// program 2
control ingress(inout Headers hdr, ...) {
    apply {
        bit<8> undef;
        bit<8> x;
        x = undef;   // x is initialized, but to some value that the spec does not guarantee what it is, other than it is some value of type bit<8>, i.e. in the range 0 to 255
        if (x == 1) {
            hdr.eth_hdr.src_addr = 1;
        }
        if (! (x == 1)) {
            hdr.eth_hdr.ether_type = 2;
        }
    }
}

According to my understanding of the spec, program 1 can execute the bodies of both if statements, because we read undef twice, and those reads might return different values.

Program 2 cannot execute the bodies of both if statements, because x is initialized. The spec cannot be used to guarantee which of the two if statement bodies will be executed, but it does imply that exactly one of them will be executed, not both, and not neither.

jnfoster · 2020-09-16T21:34:53Z

I agree with @jafingerhut's analysis of the last example... it's weird, but it's what we got.

fruffy · 2020-09-17T07:52:23Z

Interesting discussion, thank you for all the comments.

@jafingerhut
So the compiler indeed optimizes the example you gave to

control ingress(inout Headers h, inout Meta m, inout standard_metadata_t sm) {
    bit<8> undef_0;
    apply {
        if (undef_0 == 8w1) {
            h.eth_hdr.src_addr = 48w1;
        }
        if (undef_0 != 8w1) {
            h.eth_hdr.eth_type = 16w2;
        }
    }
}

in the mid end. Is this a problem?

I am (obviously) wondering about this from the angle of equality checks. Things are becoming quite unclear to me at what point a program with undefined behavior is equal to its transformed version and when it is not. Or rather, what the expectation of a programmer may be. Based on the discussion I can see that

bit<8> undef;
if (undef == 1) {
    hdr.eth_hdr.src_addr = 1;
} else {
    hdr.eth_hdr.src_addr = 2;
}

can be transformed into just one assignment of 1 or 2 but not anything else. So a transformation like @mbudiu-vmw mentioned should not be possible? My solution used to be to exclude any variable "tainted" by undefined behavior from the equality but this does not seem precise enough.

jafingerhut · 2020-09-17T15:15:23Z

I am curious myself to hear from other compiler experts whether:
(a) this transformation from a if (a) x else y to if (a) x; if (!a) y; is considered a correct one, in cases where a can evaluate to true twice, or false twice, because of its use of uninitialized variables/fields.
(b) whether any transformation, in general, is considered a correct one, if the resulting program can have more possible behaviors due to use of uninitialized variables than the original program.

Note that these transformations seem correct to me if no variables are uninitialized, or at least none of them are uninitialized that would increase the number of possible behaviors of the transformed code.

I see at least the following two alternatives:

(1) change p4c so that it never makes such transformations, or at least never when they increase the number of possible behaviors of the program.

(2) explicitly allow in the spec that a compiler can make such transformations, and thus the behavior of code reading uninitalized variables is truly "anything goes". At least, I do not see how to limit the behaviors to less than "anything goes" without making pretty tight restrictions like described in (1).

Thoughts?

jnfoster · 2020-09-17T15:19:38Z

(a) No
(b) No, a compiler should preserve the semantics of (or, to be more precise, refine) the original program

(1) This is the "right" answer"
(2) We could do this but

It's kinda weird for a specification to endorse particular syntactic transformations.
But otherwise we seem to be on a slippery slope toward C/C++ style undefined behavior, which I am strongly opposed to.

mihaibudiu · 2020-09-17T17:37:00Z

There are two kinds of "undefined":

definitely undefined. These should be errors.
maybe undefined. These should be warnings (as they are now).
I think they should be treated differently even in the verification process.
This is a classical question in compiler programming languages: what is the semantics of a program? (Some of the recent discussions of @vgurevich are related to this question as well). The compiler is supposed to preserve the semantics. Currently we define informally the semantics of programs with defined values, and we say some things about the semantics of some programs with undefined values (all these "blast radius" statements). But we don't have a clear definition of programs computing over undefined values, and it's unlikely we'll write one. Frankly, at this point we have enough bugs on well-defined programs that I am not worried about the other cases.

jafingerhut · 2020-09-17T19:13:31Z

@fruffy I do not know what others think, but Mihai's comment about relative priority of these kinds of issues, versus compiler bugs that are unrelated to uninitialized variables made me think of it.

Perhaps one suggestion for your work is that if p4c gives a warning about the use of a variable that might be uninitialized, and if p4c then transforms the input program into something that looks incorrect for the reasons related to that variable being used uninitialized, consider setting those programs aside for a while, or just skip them completely and look for others that have problems when no such warnings are given?

Rationale: If p4c currently detects that a variable might be used uninitialized, then it seems likely that one could enhance p4c passes some day to take that information into account. If p4c bugs are found where no such warnings are issued, or the bugs seem unrelated to the use of uninitialized values, then they are higher priority.

FYI: I have added some examples and discussion about issues around uninitialized variables here: https://github.com/jafingerhut/p4-guide/blob/master/formal-verification/README.md

fruffy · 2020-09-18T07:52:33Z

@jafingerhut
Thanks for those notes. They are a good reference.

To handle cases such as issue 2470 in p4c I recheck whether a given violation may have been caused by a change to undefined behavior (e.g., a refinement). If that is the case, the violation is classified as "undefined violation". However, that recheck requires a precise model what constitutes a refinement and what doesn't in order to avoid false negatives. For now I will go with a more lenient model for undefined behavior that still reliably catches well-defined violations. However, it may miss some of the cases discussed in this thread or in your examples.

fruffy closed this as completed Sep 18, 2020

fruffy-bfn mentioned this issue Jul 26, 2021

Tricky case with uninitialized values... #952

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question on undefined behavior as branch condition #891

Question on undefined behavior as branch condition #891

fruffy commented Sep 16, 2020 •

edited

Loading

jnfoster commented Sep 16, 2020

vgurevich commented Sep 16, 2020

jafingerhut commented Sep 16, 2020 •

edited

Loading

mihaibudiu commented Sep 16, 2020

jafingerhut commented Sep 16, 2020 •

edited

Loading

jafingerhut commented Sep 16, 2020

jnfoster commented Sep 16, 2020

fruffy commented Sep 17, 2020 •

edited

Loading

jafingerhut commented Sep 17, 2020

jnfoster commented Sep 17, 2020

mihaibudiu commented Sep 17, 2020

jafingerhut commented Sep 17, 2020 •

edited

Loading

fruffy commented Sep 18, 2020

Question on undefined behavior as branch condition #891

Question on undefined behavior as branch condition #891

Comments

fruffy commented Sep 16, 2020 • edited Loading

jnfoster commented Sep 16, 2020

vgurevich commented Sep 16, 2020

jafingerhut commented Sep 16, 2020 • edited Loading

mihaibudiu commented Sep 16, 2020

jafingerhut commented Sep 16, 2020 • edited Loading

jafingerhut commented Sep 16, 2020

jnfoster commented Sep 16, 2020

fruffy commented Sep 17, 2020 • edited Loading

jafingerhut commented Sep 17, 2020

jnfoster commented Sep 17, 2020

mihaibudiu commented Sep 17, 2020

jafingerhut commented Sep 17, 2020 • edited Loading

fruffy commented Sep 18, 2020

fruffy commented Sep 16, 2020 •

edited

Loading

jafingerhut commented Sep 16, 2020 •

edited

Loading

jafingerhut commented Sep 16, 2020 •

edited

Loading

fruffy commented Sep 17, 2020 •

edited

Loading

jafingerhut commented Sep 17, 2020 •

edited

Loading