Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve consul plugin data race #6271

Merged
merged 5 commits into from
Apr 8, 2022

Conversation

sam-heilbron
Copy link
Contributor

@sam-heilbron sam-heilbron commented Apr 7, 2022

Description

Resolve a data race when resolving IP addresses

Context

Why fallback on errors?

In the consul plugin, we attempt to resolve IP addresses and then fallback to some previous state. This was introduced in #4563 and that PR/issue have the relevant context for why we want to support that.

What is the bug?

The data race occurs because we are accessing a map without synchronization

What is the solution?

We have a DnsResolver in the Consul plugin. It is responsible for taking an address and resolving it to return a list of IP Addresses or an error. Our decision to cache previous results can be wrapped into this resolver and let the plugin be unaware of its existence. Therefore, I introduced a new DnsResolver implementation, one that wraps the previous one but incorporates memoization. We inject the plugin with this new resolver so the plugin doesn't have to be aware of the change.

How did I test it?

We have an existing eds test which I updated to use our new dns resolver and confirmed the test still works as expected.

Checklist:

  • I included a concise, user-facing changelog (for details, see https://github.com/solo-io/go-utils/tree/master/changelogutils) which references the issue that is resolved.
  • If I updated APIs (our protos) or helm values, I ran make -B install-go-tools generated-code to ensure there will be no code diff
  • I followed guidelines laid out in the Gloo Edge contribution guide
  • I opened a draft PR or added the work in progress label if my PR is not ready for review
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works

@solo-changelog-bot
Copy link

Issues linked to changelog:
#4782

@github-actions github-actions bot added the keep pr updated signals bulldozer to keep pr up to date with base branch label Apr 7, 2022
}
ipAddresses = addresses
} else {
previousResolutions[address] = ipAddresses
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data race occurred because we were writing to the map from multiple places. One solution would have been to pass around the map and its lock (or some synchronized map). I opted instead to pull this caching behavior out of the plugin altogether.

changelog:
- type: FIX
issueLink: https://github.com/solo-io/gloo/issues/4782
resolvesIssue: false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leaving open for backports

@github-actions
Copy link

github-actions bot commented Apr 7, 2022

Visit the preview URL for this PR (updated for commit 5a212b5):

https://gloo-edge--pr6271-test-flakes-consul-p-5rbepwim.web.app

(expires Fri, 15 Apr 2022 03:57:09 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

Copy link
Contributor

@kdorosh kdorosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@sam-heilbron sam-heilbron changed the title [WIP] Resolve consul plugin data race Resolve consul plugin data race Apr 8, 2022
@sam-heilbron sam-heilbron marked this pull request as ready for review April 8, 2022 11:03
Copy link
Contributor

@nfuden nfuden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix! This one's been a pain for a while.

@soloio-bulldozer soloio-bulldozer bot merged commit 4f133dd into master Apr 8, 2022
@soloio-bulldozer soloio-bulldozer bot deleted the test-flakes/consul-plugin-data-race branch April 8, 2022 11:09
sam-heilbron added a commit that referenced this pull request Apr 11, 2022
* use dependency injectio to handle fallback resolution

* add changelog

* fixup dns resolvers and test
soloio-bulldozer bot added a commit that referenced this pull request Apr 11, 2022
* Reduce test flakes (#6211)

* First pass of cleanup

* more

* Fixup gateway e2e tests

* add changelog

* fix route options tests?

* cleanup comments in assertions

* Just after each to match just before

* re-add config

Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>

* Resolve panic in gateway validation client (#6260)

* add test

* add fix for panicing test

* unfocus test

* add changelog

* goimports -w

* Update changelog/v1.12.0-beta2/robust-client-panic.yaml

Co-authored-by: Kevin Dorosh <kevin.dorosh@solo.io>

Co-authored-by: Kevin Dorosh <kevin.dorosh@solo.io>
Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>

* Resolve consul plugin data race (#6271)

* use dependency injectio to handle fallback resolution

* add changelog

* fixup dns resolvers and test

* Resolve data race in setup syncer test (#6282)

* add changelogs

* delete duplicate changelogs

Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>
Co-authored-by: Kevin Dorosh <kevin.dorosh@solo.io>
sam-heilbron added a commit that referenced this pull request Apr 11, 2022
* Reduce test flakes (#6211)

* First pass of cleanup

* more

* Fixup gateway e2e tests

* add changelog

* fix route options tests?

* cleanup comments in assertions

* Just after each to match just before

* re-add config

Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>

* Resolve panic in gateway validation client (#6260)

* add test

* add fix for panicing test

* unfocus test

* add changelog

* goimports -w

* Update changelog/v1.12.0-beta2/robust-client-panic.yaml

Co-authored-by: Kevin Dorosh <kevin.dorosh@solo.io>

Co-authored-by: Kevin Dorosh <kevin.dorosh@solo.io>
Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>

* Resolve consul plugin data race (#6271)

* use dependency injectio to handle fallback resolution

* add changelog

* fixup dns resolvers and test

* Resolve data race in setup syncer test (#6282)

* add changelogs

* delete duplicate changelogs

Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>
Co-authored-by: Kevin Dorosh <kevin.dorosh@solo.io>
@sam-heilbron sam-heilbron mentioned this pull request Apr 11, 2022
8 tasks
sam-heilbron added a commit that referenced this pull request Apr 11, 2022
* use dependency injectio to handle fallback resolution

* add changelog

* fixup dns resolvers and test
soloio-bulldozer bot added a commit that referenced this pull request Apr 11, 2022
* Reduce test flakes (#6211)

* First pass of cleanup

* more

* Fixup gateway e2e tests

* add changelog

* fix route options tests?

* cleanup comments in assertions

* Just after each to match just before

* re-add config

Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>

* Resolve panic in gateway validation client (#6260)

* add test

* add fix for panicing test

* unfocus test

* add changelog

* goimports -w

* Update changelog/v1.12.0-beta2/robust-client-panic.yaml

Co-authored-by: Kevin Dorosh <kevin.dorosh@solo.io>

Co-authored-by: Kevin Dorosh <kevin.dorosh@solo.io>
Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>

* Resolve consul plugin data race (#6271)

* use dependency injectio to handle fallback resolution

* add changelog

* fixup dns resolvers and test

* Resolve data race in setup syncer test (#6282)

* add changelogs

* delete duplicate changelogs

Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>
Co-authored-by: Kevin Dorosh <kevin.dorosh@solo.io>

Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>
Co-authored-by: Kevin Dorosh <kevin.dorosh@solo.io>
soloio-bulldozer bot added a commit that referenced this pull request Apr 12, 2022
* Reduce test flakes (#6211)

* First pass of cleanup

* more

* Fixup gateway e2e tests

* add changelog

* fix route options tests?

* cleanup comments in assertions

* Just after each to match just before

* re-add config

Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>

* Resolve panic in gateway validation client (#6260)

* add test

* add fix for panicing test

* unfocus test

* add changelog

* goimports -w

* Update changelog/v1.12.0-beta2/robust-client-panic.yaml

Co-authored-by: Kevin Dorosh <kevin.dorosh@solo.io>

Co-authored-by: Kevin Dorosh <kevin.dorosh@solo.io>
Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>

* Resolve consul plugin data race (#6271)

* use dependency injectio to handle fallback resolution

* add changelog

* fixup dns resolvers and test

* Resolve data race in setup syncer test (#6282)

* move changelog

* fixup registry

* de duplicate changelogs

* delete dynamic forward proxy test

* delete hybrid test

* undo kube2e changes, panic occurring with gomega

* update changelog

* undo delete

Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>
Co-authored-by: Kevin Dorosh <kevin.dorosh@solo.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keep pr updated signals bulldozer to keep pr up to date with base branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants