display or rename issue with partial eval result #2053

srenatus · 2020-02-02T16:06:39Z

(This is a bit of a spin-off of #2045.)

Running opa eval 'data.test.allow' --format=pretty -d test.rego -p with test.rego as

package test

allow {
  a = input.arr[_]
  a.foo == "bar"
  a.baz == "quz"
}

(it doesn't matter if you use a = input.arr[_] or a := input.arr[_]) -- the result is

+---------+--------------------------+
| Query 1 | input.arr[_].foo = "bar" |
|         | input.arr[_].baz = "quz" |
+---------+--------------------------+

Now, that would violate the property that eval(partial_eval(policy), input) == eval(policy, input), which one might expect to hold.

However, looking at the JSON output, we find

{
  "partial": {
    "queries": [
      [
        {
          "index": 0,
          "terms": [
            {
              "type": "ref",
              "value": [
                {
                  "type": "var",
                  "value": "eq"
                }
              ]
            },
            {
              "type": "ref",
              "value": [
                {
                  "type": "var",
                  "value": "input"
                },
                {
                  "type": "string",
                  "value": "arr"
                },
                {
                  "type": "var",
                  "value": "$01"
                },
                {
                  "type": "string",
                  "value": "foo"
                }
              ]
            },
            {
              "type": "string",
              "value": "bar"
            }
          ]
        },
        {
          "index": 1,
          "terms": [
            {
              "type": "ref",
              "value": [
                {
                  "type": "var",
                  "value": "eq"
                }
              ]
            },
            {
              "type": "ref",
              "value": [
                {
                  "type": "var",
                  "value": "input"
                },
                {
                  "type": "string",
                  "value": "arr"
                },
                {
                  "type": "var",
                  "value": "$01"
                },
                {
                  "type": "string",
                  "value": "baz"
                }
              ]
            },
            {
              "type": "string",
              "value": "quz"
            }
          ]
        }
      ]
    ]
  }
}

and this probably means that the two _ in the pretty output really are meant to be the same variable, i.e.

input.arr[$01].foo = "bar"
input.arr[$01].baz = "quz"

Long story short, if the pretty version of something like this is requested, OPA should not translate the $01 into _, as it breaks stuff. 😉

Also, if the rego is changed to a = input.arr[x], the pretty output becomes

+---------+---------------------------+
| Query 1 | input.arr[x1].foo = "bar" |
|         | input.arr[x1].baz = "quz" |
+---------+---------------------------+

This suggests that the right thing to do in this situation, facing $01, might be to introduce a fresh variable. What do you think?

The text was updated successfully, but these errors were encountered:

tsandall · 2020-02-24T16:17:23Z

Yep, this needs to be fixed. I'm not immediately sure what the best option is. If we change the String() function on variables, we lose the ability to roundtrip policies using tools like opa fmt. Alternatively, we could modify partial eval to not emit variables prefixed with "$" (which is how the String() function decides whether to use _ or not.) We should enumerate/explore the options before committing.

patrick-east · 2020-04-01T22:29:27Z

Brainstorming some options:

When partial eval uses generated variables it injects "real" ones for saved things
- Pros:
  - Should be pretty simple to implement
  - No impact on fmt round tripping, and users looking at the result likely don't care if there is a generated name vs "_"
- Cons:
  - Likely some performance overhead, depending on how this gets implemented
  - Does change the output for existing users
  - Might be hard to generate unique names (have to worry about globals)
Just change the stringer to emit the vars
- Pros:
  - Simple to implement (probably easiest)
- Cons:
  - Can't round trip the output (opa fmt or even just pretty output -> parser) as "$" isn't valid for a var name and is unlikely what someone would want when doing a opa fmt.
Change the stringer to emit generated vars (sort of like 1 and 2), special case opa fmt to swap in "_" when it can
- Any time it sees a wildcard referenced >1 time it needs to use the "real" one, otherwise it can just write "_".
  Maybe just keep mapping of wildcards to location info in the buffer. Go back and "fix" them all that end up
  being seen >1 time.

Digging more into changes for option (3), my favorite so far...

Change wildcard variables to have stringers that look more like locals, like $x --> __VAR_x__ or whatever to minimize the odds that we have a collision with some other real variable.
- For actual evaluation we can keep the $x to give better guarantees about the uniqueness of them.
Change the format package tooling to track when a wildcard var is being emitted.
- Add its location in the output buffer to a list of locations kept in a map where the key is the var name
- After we finish the first pass of formatting the policy go back over it again swapping in any spots with >1 reference
  - This should be a very small subset of cases, if ever(?)
End result:
- The pretty format output which shows queries will always have the __VAR_x__ output, which favors correctness and computer over human grokking
- Error messages that had _ will show the __VAR_x__ (could maybe update our error helpers to "do the right thing" here too?)
- OPA fmt will likely be unchanged for normal policies. Only golang API users that call into OPA to do partial eval and then dump it
  out with the format package would notice that wildcards used in >1 place would have the variable names.
- JSON API results (CLI or REST API) would have __VAR_x__ style rather than the $x ones they do today.

@tsandall Thoughts?

patrick-east · 2020-04-01T23:15:44Z

Reading through the implementation a little more. It seems like we can maybe do a couple different things WRT what I outlined in my previous comment..

The presentation package calls into the formatter to display those queries. So if we were to update the formatter code to look for re-used wildcards and inject the parseable unique names we would be OK. We wouldn't even need to change the stringers for them, the formatter would just need to be updated to look for ast.Var's in more places and do a little bit more manual work when dumping out ast.Refs (rather than just using its stringer).

The formatter would normally just use the stringer for ast.Var which would swap in `_` for any wildcard variables (internally represented with a `$xx` syntax). This works fine except for AST dumped into the formatter that might have the same wildcard variable used multiple times. This can be seen by using partial evaluation creating multiple statements from a single original source. In the formatted output if we swap in `_` it can affect the resulting logic if they were supposed to be the same variable. The formatter now will check for any wild cards that show up >1 time in the AST passed in to be formatted. Any it finds will be assigned a new variable name like `__wilcardxx__` and in the resulting output will use that name instead of the `_` syntax. Fixes: open-policy-agent#2053 Signed-off-by: Patrick East <east.patrick@gmail.com>

The formatter would normally just use the stringer for ast.Var which would swap in `_` for any wildcard variables (internally represented with a `$xx` syntax). This works fine except for AST dumped into the formatter that might have the same wildcard variable used multiple times. This can be seen by using partial evaluation creating multiple statements from a single original source. In the formatted output if we swap in `_` it can affect the resulting logic if they were supposed to be the same variable. The formatter now will check for any wild cards that show up >1 time in the AST passed in to be formatted. Any it finds will be assigned a new variable name like `__wilcardxx__` and in the resulting output will use that name instead of the `_` syntax. Fixes: #2053 Signed-off-by: Patrick East <east.patrick@gmail.com>

srenatus · 2020-04-07T09:08:01Z

@patrick-east thank you! 😃

srenatus changed the title ~~display issue with partial eval result~~ display or rename issue with partial eval result Feb 2, 2020

tsandall added the bug label Feb 22, 2020

tsandall mentioned this issue Mar 6, 2020

Compile API and negation #2045

Closed

patrick-east self-assigned this Apr 1, 2020

tsandall added the partial-eval label Apr 2, 2020

patrick-east mentioned this issue Apr 3, 2020

format: Print var if wildcard is used multiple times #2261

Merged

patrick-east closed this as completed in #2261 Apr 6, 2020

ashutosh-narkar added this to Open Policy Agent Aug 5, 2024

ashutosh-narkar moved this to Done in Open Policy Agent Aug 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

display or rename issue with partial eval result #2053

display or rename issue with partial eval result #2053

srenatus commented Feb 2, 2020

tsandall commented Feb 24, 2020

patrick-east commented Apr 1, 2020

patrick-east commented Apr 1, 2020

srenatus commented Apr 7, 2020

display or rename issue with partial eval result #2053

display or rename issue with partial eval result #2053

Comments

srenatus commented Feb 2, 2020

tsandall commented Feb 24, 2020

patrick-east commented Apr 1, 2020

patrick-east commented Apr 1, 2020

srenatus commented Apr 7, 2020