
fatal error: concurrent map writes #1962

Closed
yuvalamar opened this issue Dec 17, 2019 · 5 comments · Fixed by #1991

Assignees: patrick-east
Labels: investigating (Issues being actively investigated)

Comments

yuvalamar commented Dec 17, 2019

Expected Behavior

OPA should not panic and exit.

Actual Behavior

OPA panics with "fatal error: concurrent map writes" and exits.

Steps to Reproduce the Problem

I think the bug is in:
func (m *Manager) Client(name string) rest.Client { return m.services[name] }
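
For context, here is a minimal Go sketch of this kind of race (hypothetical code, not OPA's actual implementation): one goroutine reads the map through an unsynchronized getter while another goroutine writes to it, and the Go runtime aborts the process with a fatal "concurrent map ..." error.

```go
// Minimal sketch (not the actual OPA code) of the suspected race: one
// goroutine reads the services map through Client while another goroutine
// (e.g. a reconfigure) writes to it. The Go runtime aborts unsynchronized
// map access with a fatal "concurrent map ..." error.
package main

import "time"

type Client struct{ URL string }

type Manager struct {
	services map[string]Client // shared map, no lock around it
}

func (m *Manager) Client(name string) Client { return m.services[name] }

func main() {
	m := &Manager{services: map[string]Client{
		"bundle-svc": {URL: "https://example.internal"},
	}}

	// Writer: keeps replacing entries, like a discovery-driven reconfigure would.
	go func() {
		for {
			m.services["bundle-svc"] = Client{URL: "https://example.internal"}
		}
	}()

	// Reader: keeps looking the service up, like a plugin uploading status.
	go func() {
		for {
			_ = m.Client("bundle-svc")
		}
	}()

	time.Sleep(2 * time.Second) // usually crashes with a fatal map race well before this
}
```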

I've attached the full log with the panic stack trace; we had 19 OPA restarts in production in the last 24 hours.

panic.log

Additional Info

OPA version 0.15.0

@patrick-east patrick-east self-assigned this Dec 17, 2019
@patrick-east patrick-east added the investigating Issues being actively investigated label Dec 17, 2019
patrick-east (Contributor) commented:

Can you share more details about the OPA configuration and logs leading up to the panic? I only see the stack trace log.

yuvalamar (Author) commented:

This happens during everyday operation, but I think it relates to the Status API with multiple bundles (https://www.openpolicyagent.org/docs/latest/management/#status). We enabled it, and the panic then appeared in our logs.

patrick-east added a commit to patrick-east/opa that referenced this issue Jan 9, 2020
Previously it would pass a reference to the status map on the plugin,
which is potentially dangerous as the map can be changed (happens
explicitly on plugin reconfigure).

This now gives each bulk listener their own copy of the map.

Fixes: open-policy-agent#1962
Signed-off-by: Patrick East <east.patrick@gmail.com>
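
For readers following along, a rough Go sketch of the idea in that commit (assumed shape and names, not the actual #1991 diff): copy the status map once per bulk listener instead of handing out the plugin's live map.

```go
// Rough sketch (assumed shape and names, not the actual open-policy-agent#1991
// diff): hand each bulk listener its own copy of the status map, so a
// concurrent reconfigure that mutates the plugin's live map can no longer
// race with a listener reading it.
package status

// Status is a stand-in for whatever a per-bundle status entry holds.
type Status struct {
	Name    string
	Message string
}

// Plugin is a stand-in for the status plugin with registered bulk listeners.
type Plugin struct {
	status        map[string]*Status         // live map, mutated on reconfigure
	bulkListeners []func(map[string]*Status) // bulk listeners registered elsewhere
}

func (p *Plugin) notifyBulkListeners() {
	for _, listener := range p.bulkListeners {
		// Before the fix: listener(p.status) handed out a reference to the live map.
		// After the fix: each listener gets its own copy.
		cpy := make(map[string]*Status, len(p.status))
		for name, s := range p.status {
			cpy[name] = s
		}
		listener(cpy)
	}
}
```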
patrick-east (Contributor) commented:

#1991 makes some speculative changes (it at least fixes some issue).

@yuvalamar any chance you can try that change out? I haven't been able to repro the issue in my environments.

patrick-east added a commit that referenced this issue Jan 13, 2020
Previously it would pass a reference to the status map on the plugin,
which is potentially dangerous as the map can be changed (happens
explicitly on plugin reconfigure).

This now gives each bulk listener their own copy of the map.

Fixes: #1962
Signed-off-by: Patrick East <east.patrick@gmail.com>
patrick-east (Contributor) commented:

Re-opening until we get some verification that #1991 fixes it

@patrick-east patrick-east reopened this Jan 13, 2020
patrick-east (Contributor) commented:

Was able to reproduce this by using ~1000 bundles (each with their own unique service) and updating the discovery config to remove bundles (dropped it to ~980).

The trick appeared to be having enough bundles, and updating them at a high enough frequency, that there was constantly a status update message being sent; this pretty much guaranteed that the reconfiguration triggered by a discovery update would modify the status map while the status plugin was trying to read it.

With a build including the changes from #1991, the crash no longer reproduced.
