Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question on undefined behavior as branch condition #891

Closed
fruffy opened this issue Sep 16, 2020 · 13 comments
Closed

Question on undefined behavior as branch condition #891

fruffy opened this issue Sep 16, 2020 · 13 comments

Comments

@fruffy
Copy link
Contributor

fruffy commented Sep 16, 2020

I have a question on undefined behavior in conditions regarding P4. Assuming I have a block like this.

control ingress(inout Headers hdr, ...) {
    apply {
        bit<8> undef;
        if (undef == 1) {
            hdr.eth_hdr.src_addr = 1;
        } else {
            hdr.eth_hdr.src_addr = 2;
        }
    }
}

To what extent can the compiler transform this expression? Is turning the entire statement into a noop possible?

control ingress(inout Headers hdr, ...) {
    apply {
    }
}

The C++ standard says that if undefined behavior is on the path anything goes. Is the same true for P4?

@jnfoster
Copy link
Collaborator

My understanding of the spec is that either

control ingress(inout Headers hdr, ...) {
    apply {
       hdr.eth_hdr.src_addr = 1;
    }
}

or

control ingress(inout Headers hdr, ...) {
    apply {
       hdr.eth_hdr.src_addr = 2;
    }
}

would be something a compiler could produce, but not the version you posted.

@vgurevich
Copy link
Contributor

I agree with @jnfoster

@jafingerhut
Copy link
Collaborator

jafingerhut commented Sep 16, 2020

The P4_16 language specification goes to some lengths to avoid undefined behavior that can occur in C/C++. For example, it defines bit-for-bit what the result of signed integer addition/subtraction is, even in conditions of overflow, as Java does.

For reading variables that have not been initialized, or fields of headers that are currently invalid, I believe most (or perhaps all?) of what the specification has to say on the behavior is in Section 8.22 "Reading uninitialized values and writing fields of invalid headers": https://p4.org/p4-spec/docs/P4-16-v1.2.1.html#sec-uninitialized-values-and-writing-invalid-headers

As an example illustrating one of the things it says there, which is that reading an uninitialized variable multiple times might result in getting back a different value, note that this program:

control ingress(inout Headers hdr, ...) {
    apply {
        bit<8> undef;
        if (undef == 1) {
            hdr.eth_hdr.src_addr = 1;
        } else {
            hdr.eth_hdr.src_addr = 2;
        }
        if (undef == 1) {
            hdr.eth_hdr.dst_addr = 3;
        } else {
            hdr.eth_hdr.dst_addr = 4;
        }
    }
}

would be legal to implement as the following program, because the first read of the uninitialized variable undef could get back 1, but the second read of undef (still uninitialized) could get back a different value than 1:

control ingress(inout Headers hdr, ...) {
    apply {
        hdr.eth_hdr.src_addr = 1;
        hdr.eth_hdr.dst_addr = 4;
    }
}

@mihaibudiu
Copy link
Contributor

Today we don't treat undefined values as constants.
In principle you could compile if (a) x else y into if (a) x; if (!a) y; under appropriate circumstances, and then all combinations would be possible.

@jafingerhut
Copy link
Collaborator

jafingerhut commented Sep 16, 2020

@mbudiu-vmw

It seems to me that it would be incorrect for a compiler to transform if (a) x else y into if (a) x; if (!a) y;, if a includes reads of variables/fields that return unspecified values, because if (a) x; if (!a) y; could then execute both code x followed by y, which the original if (a) x else y could never do.

This makes me wonder whether this aspect of the specification undercuts assumptions one often uses to reason about code transformations. Often one thinks about duplicating expressions and evaluating them repeatedly (expressions with no side effects) will return the same result, but the P4_16 spec explicitly allows implementations to evaluate an uninitialized variable differently on different occurrences in the source code.

For example, it seems that the localCopyPropagation pass, by duplicating occurrences of expressions, might result in programs that the spec says have more possible final states than the original program did, if some of the variables read in those expressions are uninitialized?

@jafingerhut
Copy link
Collaborator

This is getting into esoteric details, I know, and I will try to refrain from introducing more problems than I solve, but this example seems scary to me.

Consider programs 1 and 2 below, which to just about anyone look like they should have the same behavior:

// program 1
control ingress(inout Headers hdr, ...) {
    apply {
        bit<8> undef;
        if (undef == 1) {
            hdr.eth_hdr.src_addr = 1;
        }
        if (! (undef == 1)) {
            hdr.eth_hdr.ether_type = 2;
        }
    }
}
// program 2
control ingress(inout Headers hdr, ...) {
    apply {
        bit<8> undef;
        bit<8> x;
        x = undef;   // x is initialized, but to some value that the spec does not guarantee what it is, other than it is some value of type bit<8>, i.e. in the range 0 to 255
        if (x == 1) {
            hdr.eth_hdr.src_addr = 1;
        }
        if (! (x == 1)) {
            hdr.eth_hdr.ether_type = 2;
        }
    }
}

According to my understanding of the spec, program 1 can execute the bodies of both if statements, because we read undef twice, and those reads might return different values.

Program 2 cannot execute the bodies of both if statements, because x is initialized. The spec cannot be used to guarantee which of the two if statement bodies will be executed, but it does imply that exactly one of them will be executed, not both, and not neither.

@jnfoster
Copy link
Collaborator

I agree with @jafingerhut's analysis of the last example... it's weird, but it's what we got.

@fruffy
Copy link
Contributor Author

fruffy commented Sep 17, 2020

Interesting discussion, thank you for all the comments.

@jafingerhut
So the compiler indeed optimizes the example you gave to

control ingress(inout Headers h, inout Meta m, inout standard_metadata_t sm) {
    bit<8> undef_0;
    apply {
        if (undef_0 == 8w1) {
            h.eth_hdr.src_addr = 48w1;
        }
        if (undef_0 != 8w1) {
            h.eth_hdr.eth_type = 16w2;
        }
    }
}

in the mid end. Is this a problem?

I am (obviously) wondering about this from the angle of equality checks. Things are becoming quite unclear to me at what point a program with undefined behavior is equal to its transformed version and when it is not. Or rather, what the expectation of a programmer may be. Based on the discussion I can see that

bit<8> undef;
if (undef == 1) {
    hdr.eth_hdr.src_addr = 1;
} else {
    hdr.eth_hdr.src_addr = 2;
}

can be transformed into just one assignment of 1 or 2 but not anything else. So a transformation like @mbudiu-vmw mentioned should not be possible? My solution used to be to exclude any variable "tainted" by undefined behavior from the equality but this does not seem precise enough.

@jafingerhut
Copy link
Collaborator

I am curious myself to hear from other compiler experts whether:
(a) this transformation from a if (a) x else y to if (a) x; if (!a) y; is considered a correct one, in cases where a can evaluate to true twice, or false twice, because of its use of uninitialized variables/fields.
(b) whether any transformation, in general, is considered a correct one, if the resulting program can have more possible behaviors due to use of uninitialized variables than the original program.

Note that these transformations seem correct to me if no variables are uninitialized, or at least none of them are uninitialized that would increase the number of possible behaviors of the transformed code.

I see at least the following two alternatives:

(1) change p4c so that it never makes such transformations, or at least never when they increase the number of possible behaviors of the program.

(2) explicitly allow in the spec that a compiler can make such transformations, and thus the behavior of code reading uninitalized variables is truly "anything goes". At least, I do not see how to limit the behaviors to less than "anything goes" without making pretty tight restrictions like described in (1).

Thoughts?

@jnfoster
Copy link
Collaborator

(a) No
(b) No, a compiler should preserve the semantics of (or, to be more precise, refine) the original program

(1) This is the "right" answer"
(2) We could do this but

  • It's kinda weird for a specification to endorse particular syntactic transformations.
  • But otherwise we seem to be on a slippery slope toward C/C++ style undefined behavior, which I am strongly opposed to.

@mihaibudiu
Copy link
Contributor

There are two kinds of "undefined":

  • definitely undefined. These should be errors.
  • maybe undefined. These should be warnings (as they are now).
    I think they should be treated differently even in the verification process.
    This is a classical question in compiler programming languages: what is the semantics of a program? (Some of the recent discussions of @vgurevich are related to this question as well). The compiler is supposed to preserve the semantics. Currently we define informally the semantics of programs with defined values, and we say some things about the semantics of some programs with undefined values (all these "blast radius" statements). But we don't have a clear definition of programs computing over undefined values, and it's unlikely we'll write one. Frankly, at this point we have enough bugs on well-defined programs that I am not worried about the other cases.

@jafingerhut
Copy link
Collaborator

jafingerhut commented Sep 17, 2020

@fruffy I do not know what others think, but Mihai's comment about relative priority of these kinds of issues, versus compiler bugs that are unrelated to uninitialized variables made me think of it.

Perhaps one suggestion for your work is that if p4c gives a warning about the use of a variable that might be uninitialized, and if p4c then transforms the input program into something that looks incorrect for the reasons related to that variable being used uninitialized, consider setting those programs aside for a while, or just skip them completely and look for others that have problems when no such warnings are given?

Rationale: If p4c currently detects that a variable might be used uninitialized, then it seems likely that one could enhance p4c passes some day to take that information into account. If p4c bugs are found where no such warnings are issued, or the bugs seem unrelated to the use of uninitialized values, then they are higher priority.

FYI: I have added some examples and discussion about issues around uninitialized variables here: https://github.com/jafingerhut/p4-guide/blob/master/formal-verification/README.md

@fruffy
Copy link
Contributor Author

fruffy commented Sep 18, 2020

@jafingerhut
Thanks for those notes. They are a good reference.

To handle cases such as issue 2470 in p4c I recheck whether a given violation may have been caused by a change to undefined behavior (e.g., a refinement). If that is the case, the violation is classified as "undefined violation". However, that recheck requires a precise model what constitutes a refinement and what doesn't in order to avoid false negatives. For now I will go with a more lenient model for undefined behavior that still reliably catches well-defined violations. However, it may miss some of the cases discussed in this thread or in your examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants