Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

display or rename issue with partial eval result #2053

Closed
srenatus opened this issue Feb 2, 2020 · 4 comments · Fixed by #2261
Closed

display or rename issue with partial eval result #2053

srenatus opened this issue Feb 2, 2020 · 4 comments · Fixed by #2261
Assignees

Comments

@srenatus
Copy link
Contributor

srenatus commented Feb 2, 2020

(This is a bit of a spin-off of #2045.)

Running opa eval 'data.test.allow' --format=pretty -d test.rego -p with test.rego as

package test

allow {
  a = input.arr[_]
  a.foo == "bar"
  a.baz == "quz"
}

(it doesn't matter if you use a = input.arr[_] or a := input.arr[_]) -- the result is

+---------+--------------------------+
| Query 1 | input.arr[_].foo = "bar" |
|         | input.arr[_].baz = "quz" |
+---------+--------------------------+

Now, that would violate the property that eval(partial_eval(policy), input) == eval(policy, input), which one might expect to hold.

However, looking at the JSON output, we find

{
  "partial": {
    "queries": [
      [
        {
          "index": 0,
          "terms": [
            {
              "type": "ref",
              "value": [
                {
                  "type": "var",
                  "value": "eq"
                }
              ]
            },
            {
              "type": "ref",
              "value": [
                {
                  "type": "var",
                  "value": "input"
                },
                {
                  "type": "string",
                  "value": "arr"
                },
                {
                  "type": "var",
                  "value": "$01"
                },
                {
                  "type": "string",
                  "value": "foo"
                }
              ]
            },
            {
              "type": "string",
              "value": "bar"
            }
          ]
        },
        {
          "index": 1,
          "terms": [
            {
              "type": "ref",
              "value": [
                {
                  "type": "var",
                  "value": "eq"
                }
              ]
            },
            {
              "type": "ref",
              "value": [
                {
                  "type": "var",
                  "value": "input"
                },
                {
                  "type": "string",
                  "value": "arr"
                },
                {
                  "type": "var",
                  "value": "$01"
                },
                {
                  "type": "string",
                  "value": "baz"
                }
              ]
            },
            {
              "type": "string",
              "value": "quz"
            }
          ]
        }
      ]
    ]
  }
}

and this probably means that the two _ in the pretty output really are meant to be the same variable, i.e.

input.arr[$01].foo = "bar"
input.arr[$01].baz = "quz"

Long story short, if the pretty version of something like this is requested, OPA should not translate the $01 into _, as it breaks stuff. 😉

Also, if the rego is changed to a = input.arr[x], the pretty output becomes

+---------+---------------------------+
| Query 1 | input.arr[x1].foo = "bar" |
|         | input.arr[x1].baz = "quz" |
+---------+---------------------------+

This suggests that the right thing to do in this situation, facing $01, might be to introduce a fresh variable. What do you think?

@srenatus srenatus changed the title display issue with partial eval result display or rename issue with partial eval result Feb 2, 2020
@tsandall tsandall added the bug label Feb 22, 2020
@tsandall
Copy link
Member

Yep, this needs to be fixed. I'm not immediately sure what the best option is. If we change the String() function on variables, we lose the ability to roundtrip policies using tools like opa fmt. Alternatively, we could modify partial eval to not emit variables prefixed with "$" (which is how the String() function decides whether to use _ or not.) We should enumerate/explore the options before committing.

@patrick-east
Copy link
Contributor

Brainstorming some options:

  1. When partial eval uses generated variables it injects "real" ones for saved things

    • Pros:
      • Should be pretty simple to implement
      • No impact on fmt round tripping, and users looking at the result likely don't care if there is a generated name vs "_"
    • Cons:
      • Likely some performance overhead, depending on how this gets implemented
      • Does change the output for existing users
      • Might be hard to generate unique names (have to worry about globals)
  2. Just change the stringer to emit the vars

    • Pros:
      • Simple to implement (probably easiest)
    • Cons:
      • Can't round trip the output (opa fmt or even just pretty output -> parser) as "$" isn't valid for a var name and is unlikely what someone would want when doing a opa fmt.
  3. Change the stringer to emit generated vars (sort of like 1 and 2), special case opa fmt to swap in "_" when it can

    • Any time it sees a wildcard referenced >1 time it needs to use the "real" one, otherwise it can just write "_".
      Maybe just keep mapping of wildcards to location info in the buffer. Go back and "fix" them all that end up
      being seen >1 time.

Digging more into changes for option (3), my favorite so far...

  • Change wildcard variables to have stringers that look more like locals, like $x --> __VAR_x__ or whatever to minimize the odds that we have a collision with some other real variable.
    • For actual evaluation we can keep the $x to give better guarantees about the uniqueness of them.
  • Change the format package tooling to track when a wildcard var is being emitted.
    • Add its location in the output buffer to a list of locations kept in a map where the key is the var name
    • After we finish the first pass of formatting the policy go back over it again swapping in any spots with >1 reference
      • This should be a very small subset of cases, if ever(?)
  • End result:
    • The pretty format output which shows queries will always have the __VAR_x__ output, which favors correctness and computer over human grokking
    • Error messages that had _ will show the __VAR_x__ (could maybe update our error helpers to "do the right thing" here too?)
    • OPA fmt will likely be unchanged for normal policies. Only golang API users that call into OPA to do partial eval and then dump it
      out with the format package would notice that wildcards used in >1 place would have the variable names.
    • JSON API results (CLI or REST API) would have __VAR_x__ style rather than the $x ones they do today.

@tsandall Thoughts?

@patrick-east
Copy link
Contributor

Reading through the implementation a little more. It seems like we can maybe do a couple different things WRT what I outlined in my previous comment..

The presentation package calls into the formatter to display those queries. So if we were to update the formatter code to look for re-used wildcards and inject the parseable unique names we would be OK. We wouldn't even need to change the stringers for them, the formatter would just need to be updated to look for ast.Var's in more places and do a little bit more manual work when dumping out ast.Refs (rather than just using its stringer).

patrick-east added a commit to patrick-east/opa that referenced this issue Apr 6, 2020
The formatter would normally just use the stringer for ast.Var which
would swap in `_` for any wildcard variables (internally represented
with a `$xx` syntax). This works fine except for AST dumped into the
formatter that might have the same wildcard variable used multiple
times. This can be seen by using partial evaluation creating multiple
statements from a single original source. In the formatted output if
we swap in `_` it can affect the resulting logic if they were supposed
to be the same variable.

The formatter now will check for any wild cards that show up >1 time
in the AST passed in to be formatted. Any it finds will be assigned
a new variable name like `__wilcardxx__` and in the resulting output
will use that name instead of the `_` syntax.

Fixes: open-policy-agent#2053
Signed-off-by: Patrick East <east.patrick@gmail.com>
patrick-east added a commit that referenced this issue Apr 6, 2020
The formatter would normally just use the stringer for ast.Var which
would swap in `_` for any wildcard variables (internally represented
with a `$xx` syntax). This works fine except for AST dumped into the
formatter that might have the same wildcard variable used multiple
times. This can be seen by using partial evaluation creating multiple
statements from a single original source. In the formatted output if
we swap in `_` it can affect the resulting logic if they were supposed
to be the same variable.

The formatter now will check for any wild cards that show up >1 time
in the AST passed in to be formatted. Any it finds will be assigned
a new variable name like `__wilcardxx__` and in the resulting output
will use that name instead of the `_` syntax.

Fixes: #2053
Signed-off-by: Patrick East <east.patrick@gmail.com>
@srenatus
Copy link
Contributor Author

srenatus commented Apr 7, 2020

@patrick-east thank you! 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants