Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible regression for dynamic dispatch + nested comprehensions #2497

Closed
jaspervdj-luminal opened this issue Jun 26, 2020 · 5 comments · Fixed by #2515
Closed

Possible regression for dynamic dispatch + nested comprehensions #2497

jaspervdj-luminal opened this issue Jun 26, 2020 · 5 comments · Fixed by #2515
Labels

Comments

@jaspervdj-luminal
Copy link
Contributor

Expected Behavior

Through fugue/regula#51, I found a possible regression in
OPA 0.20 & 0.21. The test case involves using dynamic dispatch as well as nested
comprehensions. I've tried to reduce it as much as possible, but it's still somewhat
complex and seems to require the combination of these two features.

Using opa-0.19.2:

$ ./opa-0.19.2 run lib1.rego lib2.rego bug.rego
OPA 0.19.2 (commit 40f9c1fe, built at 2020-04-27T22:51:01Z)

Run 'help' to see a list of commands.

> data.bug.values
[
  [
    1
  ]
]

This is the correct result.

Actual Behavior

Using opa-0.21.0:

$ ./opa-0.21.0 run lib1.rego lib2.rego bug.rego
OPA 0.21.0 (commit 3807c481, built at 2020-06-16T15:40:07Z)

Run 'help' to see a list of commands and check for updates.

> data.bug.values
[
  [
    1,
    2
  ]
]

Steps to Reproduce the Problem

The following files were used in this bug:

$ cat lib1.rego
package lib.lib1

category = "lib"
value = 1

$ cat lib2.rego
package lib.lib2

category = "lib"
value = 2

$ cat bug.rego
package bug

samples = [
  {"category": "lib"},
  {"category": "lib"},
]

values = ret {
  pkg = "lib1"
  category = "lib"
  ret = { j |
    sample = samples[_]
    sample.category == category
    j = [v | v = data["lib"][pkg]["value"]]
  }
}

Additional Info

It's late here so I didn't spend that much time debugging this. My intuition says that
there is some issue with pkg being considered a free variable in the nested
comprehension. Fortunately the bug is easy to reproduce so I can easily figure
out where exactly the regression happened using git bisect. I'll try to take a look
at that next week.

@jaspervdj-luminal
Copy link
Contributor Author

I did a little bit of bisecting and it seems to have been introduced in 4dd119b, but I haven't been able to dig into the root cause yet.

@jaspervdj-luminal
Copy link
Contributor Author

So I realize what is happening -- the conditions around when an comprehension is
safe to index are perhaps not quite strict enough. Consider the query from the
original example, with emphasis on the pkg variable:

values = ret {
  pkg = "lib1"
  category = "lib"
  ret = { j |
    sample = samples[_]
    sample.category = category
    j = [v | v = data["lib"][pkg]["value"]]
                             ^^^
  }
}

It is possible to safely evaluate the inner comprehension, since pkg and
category are both out-vars (category in the inner comprehension, and
pkg in the 2nd-level inner comprehension).

{j |
    sample = samples[_]
    sample.category = category
    j = [v | v = data["lib"][pkg]["value"]]
}

However, if we do this, we change the semantics of the original code, since
pkg is now a free variable rather than restricted to "lib1".

I think there are two possible directions I can look at for a fix:

  1. We update the rules around index comprehensions to make them more strict.
    When considering an index, we collect the outvars for any (possibly deeply)
    nested comprehensions. These may not intersect with the "candidates".

  2. Outvars appearing in (possibly deeply) nested comprehensions should also
    be considered as keys for the index.

@tsandall
Copy link
Member

tsandall commented Jul 7, 2020

@jaspervdj-luminal thanks for digging into this. I'll take a look today.

@jaspervdj-luminal
Copy link
Contributor Author

@tsandall Thanks! I'm happy to help with the fix as well.

Here is a case that makes it simpler to reproduce, since it turns out that
dynamic dispatch in the original snippet I posted was a red herring,
we simply need the right shape of nested comprehensions:

package bug

english = {
  "one": 1,
  "two": 2,
  "three": 3
}

should_be_1 = ret {
  textual = "one"
  numeric = 1
  ret = {total |
    english[k] = v
    k = textual
    total = sum([numeric | english[_] = numeric])
  }
}

should_be_1 evaluates to 6 in OPA >= 0.20 and 1 in earlier versions.

@tsandall
Copy link
Member

tsandall commented Jul 7, 2020

@jaspervdj-luminal I spent a little bit of time looking at the issue this afternoon. I agree with your assessment.

When the comprehensions are evaluated to build the cache, the evaluator does not provide bindings for variables in the outer scope and the wrong value is produced for the nested comprehension.

Modifying the evaluator to provide bindings for variables in the outer scope would be difficult as the caches would need to be invalidated if there were multiple assignments to the variables that are closed over. IMO, this precludes the second solution because the nested comprehension would have to evaluated with the correct binding whereas today we just evaluate the comprehensions because we should be able to assume it's safe to do so.

This leaves the option of making the index identification step more strict. I think you've outlined it well though I think we could say intersection between candidates and any variable (not just outputs) in the nested comprehensions will disqualify.

I can work on this tomorrow morning. Shouldn't be too difficult to fix.

tsandall added a commit to tsandall/opa that referenced this issue Jul 8, 2020
In v0.20. we added support for indexing and caching of comprehensions
in certain cases. There was a bug in the index construction that
allowed for comprehensions to be indexed and cached
incorrectly. Specifically, if any of the candidate keys appear in
nested comprehensions, indexing should not be performed because the
evaluator does not close over variable bindings from the outer scope
when evaluating the comprehension (because this would require cache
invalidation if there were multiple assignments to the variables in
the outer scope).

This fix updates the compiler to check if the candidates appear inside
of any nested comprehensions. If a match is found, no indexing is
performed.

Fixes open-policy-agent#2497

Signed-off-by: Torin Sandall <torinsandall@gmail.com>
tsandall added a commit that referenced this issue Jul 8, 2020
In v0.20. we added support for indexing and caching of comprehensions
in certain cases. There was a bug in the index construction that
allowed for comprehensions to be indexed and cached
incorrectly. Specifically, if any of the candidate keys appear in
nested comprehensions, indexing should not be performed because the
evaluator does not close over variable bindings from the outer scope
when evaluating the comprehension (because this would require cache
invalidation if there were multiple assignments to the variables in
the outer scope).

This fix updates the compiler to check if the candidates appear inside
of any nested comprehensions. If a match is found, no indexing is
performed.

Fixes #2497

Signed-off-by: Torin Sandall <torinsandall@gmail.com>
tsandall added a commit to tsandall/opa that referenced this issue Jul 9, 2020
In v0.20. we added support for indexing and caching of comprehensions
in certain cases. There was a bug in the index construction that
allowed for comprehensions to be indexed and cached
incorrectly. Specifically, if any of the candidate keys appear in
nested comprehensions, indexing should not be performed because the
evaluator does not close over variable bindings from the outer scope
when evaluating the comprehension (because this would require cache
invalidation if there were multiple assignments to the variables in
the outer scope).

This fix updates the compiler to check if the candidates appear inside
of any nested comprehensions. If a match is found, no indexing is
performed.

Fixes open-policy-agent#2497

Signed-off-by: Torin Sandall <torinsandall@gmail.com>
(cherry picked from commit e9779eb)
tsandall added a commit that referenced this issue Jul 9, 2020
In v0.20. we added support for indexing and caching of comprehensions
in certain cases. There was a bug in the index construction that
allowed for comprehensions to be indexed and cached
incorrectly. Specifically, if any of the candidate keys appear in
nested comprehensions, indexing should not be performed because the
evaluator does not close over variable bindings from the outer scope
when evaluating the comprehension (because this would require cache
invalidation if there were multiple assignments to the variables in
the outer scope).

This fix updates the compiler to check if the candidates appear inside
of any nested comprehensions. If a match is found, no indexing is
performed.

Fixes #2497

Signed-off-by: Torin Sandall <torinsandall@gmail.com>
(cherry picked from commit e9779eb)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants