Copy the labels map before editing #4021

Merged (1 commit) on Aug 14, 2023


johnbelamaric
Contributor

Fixes #3958

The logs showed a crash while rendering the metadata.Labels field. The crash happened entirely outside the path of our own code, so extra locking won't help: we can't change the API server code to grab a lock. That code was serving the object from a cache, so we must have been modifying the cached data at the same time.

I found a couple of places where we tweak the Labels field, and those areas of the code were active in other runnable goroutines, so I am pretty sure they are the culprit. I have been running the new build for a few hours now with no crashes; the released version was crashing about once per hour. Let's let it steep over the weekend.
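
For readers following along, here is a minimal sketch of the copy-before-edit idea described above. It is not the actual diff from this PR: the helper name `withLabel` and the sample label values are hypothetical, and the real code paths in porch may look different.

```go
package main

import "fmt"

// withLabel (hypothetical helper) returns a new map holding every entry of
// labels plus key=value. The input map is only read, never written, so a
// goroutine iterating over the original map is not disturbed by this call.
func withLabel(labels map[string]string, key, value string) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	out[key] = value
	return out
}

func main() {
	// cached stands in for a labels map that another goroutine may be reading.
	cached := map[string]string{"app": "porch"}

	// Edit a private copy instead of writing into the shared map.
	updated := withLabel(cached, "kpt.dev/latest-revision", "true")

	fmt.Println(len(cached), len(updated))
}
```

The point of the sketch is only that the write lands on a map no other goroutine holds, which is what avoids the "concurrent map iteration and map write" panic.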

/hold

@johnbelamaric johnbelamaric requested a review from a team as a code owner August 11, 2023 19:16
Contributor

@mortent mortent left a comment


I'm OK with this change if it addresses the crashes. But if there is concurrent access to the labels map, we still have the potential for concurrent access to the Labels field of the PackageRevision objects. I'm not sure whether that will lead to concurrent access errors or whether we risk visibility problems.
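
To make the concern above concrete: copying the map protects readers of the old map, but assigning the new map back onto a shared object is still an unsynchronized write to that object's field. One way to avoid it, sketched here under assumptions, is to copy the whole object and edit only the copy. The `revision` type and `copyWithLabel` helper are hypothetical stand-ins, not porch's PackageRevision type or any code from this PR.

```go
package main

import "fmt"

// revision is a hypothetical stand-in for a cached API object with labels.
type revision struct {
	Name   string
	Labels map[string]string
}

// copyWithLabel (hypothetical helper) returns a private copy of r with
// key=value set. Neither r nor r.Labels is written to.
func copyWithLabel(r *revision, key, value string) *revision {
	out := *r // shallow struct copy; out.Labels still aliases r.Labels here
	out.Labels = make(map[string]string, len(r.Labels)+1)
	for k, v := range r.Labels {
		out.Labels[k] = v
	}
	out.Labels[key] = value
	return &out
}

func main() {
	// shared stands in for an object that lives in a cache read by others.
	shared := &revision{Name: "pkg-v1", Labels: map[string]string{"app": "porch"}}

	// All writes go to the private copy; the shared object stays untouched.
	mine := copyWithLabel(shared, "kpt.dev/latest-revision", "true")

	fmt.Println(len(shared.Labels), len(mine.Labels))
}
```

This only illustrates the distinction between copying the map and copying the object; it does not address the separate problems with the label itself that are tracked in #3672.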

Regardless, the usage of the kpt.dev/latest-revision label is problematic for other reasons as well, as described in #3672, so we should try to get rid of it.

@johnbelamaric
Contributor Author

Yeah, this fixes the crash but does not fix the issue described in #3672 - which is effectively what you are describing in your comment, @mortent.

After three different instances soaked for more than 2.5 days each (a total soak of about 8 days), we saw one crash in each, but none related to the one this PR is trying to fix.

@johnbelamaric johnbelamaric merged commit 36120be into kptdev:main Aug 14, 2023
15 checks passed
johnbelamaric added a commit to mortent/kpt that referenced this pull request Sep 18, 2023
Development

Successfully merging this pull request may close these issues.

porch: porch-server crash with "concurrent map iteration and map write"
2 participants