Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of CSI: fix data race in plugin manager into release/1.1.x #12703

Conversation

hc-github-team-nomad-core
Copy link
Contributor

Backport

This PR is auto-generated from #12553 to be assessed for backporting due to the inclusion of the label backport/1.1.x.

WARNING automatic cherry-pick of commits failed. Commits will require human attention.

The below text is copied from the body of the original PR.


The plugin manager for CSI hands out instances of a plugin for callers
that need to mount a volume. The MounterForPlugin method accesses
the internal instances map without a lock, and can be called
concurrently from outside the plugin manager's main run-loop.

The original commit for the instances map included a warning that it
needed to be accessed only from the main loop but that comment was
unfortunately ignored shortly thereafter, so this bug has existed in
the code for a couple years without being detected until we ran tests
with -race in #12098. Lesson learned here: comments make for lousy
enforcement of invariants!

test outputs with -race

client RPC tests:

$ NOMAD_TEST_LOG_LEVEL=OFF go test -race -v ./client -run TestCSI
=== RUN   TestCSIController_AttachVolume
=== PAUSE TestCSIController_AttachVolume
=== RUN   TestCSIController_ValidateVolume
=== PAUSE TestCSIController_ValidateVolume
=== RUN   TestCSIController_DetachVolume
=== PAUSE TestCSIController_DetachVolume
=== RUN   TestCSIController_CreateVolume
=== PAUSE TestCSIController_CreateVolume
=== RUN   TestCSIController_DeleteVolume
=== PAUSE TestCSIController_DeleteVolume
=== RUN   TestCSIController_ListVolumes
=== PAUSE TestCSIController_ListVolumes
=== RUN   TestCSIController_CreateSnapshot
=== PAUSE TestCSIController_CreateSnapshot
=== RUN   TestCSIController_DeleteSnapshot
=== PAUSE TestCSIController_DeleteSnapshot
=== RUN   TestCSIController_ListSnapshots
=== PAUSE TestCSIController_ListSnapshots
=== RUN   TestCSINode_DetachVolume
=== PAUSE TestCSINode_DetachVolume
=== CONT  TestCSIController_AttachVolume
=== RUN   TestCSIController_AttachVolume/returns_plugin_not_found_errors
=== RUN   TestCSIController_AttachVolume/validates_volumeid_is_not_empty
=== RUN   TestCSIController_AttachVolume/validates_nodeid_is_not_empty
=== RUN   TestCSIController_AttachVolume/validates_AccessMode
=== RUN   TestCSIController_AttachVolume/validates_attachmentmode_is_not_empty
=== RUN   TestCSIController_AttachVolume/returns_transitive_errors
=== RUN   TestCSIController_AttachVolume/handles_nil_PublishContext
=== RUN   TestCSIController_AttachVolume/handles_non-nil_PublishContext
--- PASS: TestCSIController_AttachVolume (0.74s)
    --- PASS: TestCSIController_AttachVolume/returns_plugin_not_found_errors (0.10s)
    --- PASS: TestCSIController_AttachVolume/validates_volumeid_is_not_empty (0.10s)
    --- PASS: TestCSIController_AttachVolume/validates_nodeid_is_not_empty (0.09s)
    --- PASS: TestCSIController_AttachVolume/validates_AccessMode (0.09s)
    --- PASS: TestCSIController_AttachVolume/validates_attachmentmode_is_not_empty (0.09s)
    --- PASS: TestCSIController_AttachVolume/returns_transitive_errors (0.10s)
    --- PASS: TestCSIController_AttachVolume/handles_nil_PublishContext (0.08s)
    --- PASS: TestCSIController_AttachVolume/handles_non-nil_PublishContext (0.09s)
=== CONT  TestCSINode_DetachVolume
=== RUN   TestCSINode_DetachVolume/returns_plugin_not_found_errors
=== RUN   TestCSINode_DetachVolume/validates_volumeid_is_not_empty
=== RUN   TestCSINode_DetachVolume/validates_nodeid_is_not_empty
=== RUN   TestCSINode_DetachVolume/returns_transitive_errors
--- PASS: TestCSINode_DetachVolume (0.36s)
    --- PASS: TestCSINode_DetachVolume/returns_plugin_not_found_errors (0.09s)
    --- PASS: TestCSINode_DetachVolume/validates_volumeid_is_not_empty (0.08s)
    --- PASS: TestCSINode_DetachVolume/validates_nodeid_is_not_empty (0.09s)
    --- PASS: TestCSINode_DetachVolume/returns_transitive_errors (0.10s)
=== CONT  TestCSIController_ListSnapshots
=== RUN   TestCSIController_ListSnapshots/returns_plugin_not_found_errors
=== RUN   TestCSIController_ListSnapshots/returns_transitive_errors
=== RUN   TestCSIController_ListSnapshots/returns_volumes
--- PASS: TestCSIController_ListSnapshots (0.27s)
    --- PASS: TestCSIController_ListSnapshots/returns_plugin_not_found_errors (0.09s)
    --- PASS: TestCSIController_ListSnapshots/returns_transitive_errors (0.08s)
    --- PASS: TestCSIController_ListSnapshots/returns_volumes (0.09s)
=== CONT  TestCSIController_CreateSnapshot
=== RUN   TestCSIController_CreateSnapshot/returns_plugin_not_found_errors
=== RUN   TestCSIController_CreateSnapshot/returns_transitive_errors
=== RUN   TestCSIController_CreateSnapshot/returns_snapshot_on_success
--- PASS: TestCSIController_CreateSnapshot (0.26s)
    --- PASS: TestCSIController_CreateSnapshot/returns_plugin_not_found_errors (0.09s)
    --- PASS: TestCSIController_CreateSnapshot/returns_transitive_errors (0.09s)
    --- PASS: TestCSIController_CreateSnapshot/returns_snapshot_on_success (0.08s)
=== CONT  TestCSIController_DeleteSnapshot
=== RUN   TestCSIController_DeleteSnapshot/returns_plugin_not_found_errors
=== RUN   TestCSIController_DeleteSnapshot/returns_transitive_errors
--- PASS: TestCSIController_DeleteSnapshot (0.17s)
    --- PASS: TestCSIController_DeleteSnapshot/returns_plugin_not_found_errors (0.08s)
    --- PASS: TestCSIController_DeleteSnapshot/returns_transitive_errors (0.08s)
=== CONT  TestCSIController_DeleteVolume
=== RUN   TestCSIController_DeleteVolume/returns_plugin_not_found_errors
=== RUN   TestCSIController_DeleteVolume/returns_transitive_errors
--- PASS: TestCSIController_DeleteVolume (0.17s)
    --- PASS: TestCSIController_DeleteVolume/returns_plugin_not_found_errors (0.08s)
    --- PASS: TestCSIController_DeleteVolume/returns_transitive_errors (0.08s)
=== CONT  TestCSIController_ListVolumes
=== RUN   TestCSIController_ListVolumes/returns_plugin_not_found_errors
=== RUN   TestCSIController_ListVolumes/returns_transitive_errors
=== RUN   TestCSIController_ListVolumes/returns_volumes
--- PASS: TestCSIController_ListVolumes (0.26s)
    --- PASS: TestCSIController_ListVolumes/returns_plugin_not_found_errors (0.09s)
    --- PASS: TestCSIController_ListVolumes/returns_transitive_errors (0.09s)
    --- PASS: TestCSIController_ListVolumes/returns_volumes (0.09s)
=== CONT  TestCSIController_DetachVolume
=== RUN   TestCSIController_DetachVolume/returns_plugin_not_found_errors
=== RUN   TestCSIController_DetachVolume/validates_volumeid_is_not_empty
=== RUN   TestCSIController_DetachVolume/validates_nodeid_is_not_empty
=== RUN   TestCSIController_DetachVolume/returns_transitive_errors
--- PASS: TestCSIController_DetachVolume (0.35s)
    --- PASS: TestCSIController_DetachVolume/returns_plugin_not_found_errors (0.08s)
    --- PASS: TestCSIController_DetachVolume/validates_volumeid_is_not_empty (0.08s)
    --- PASS: TestCSIController_DetachVolume/validates_nodeid_is_not_empty (0.09s)
    --- PASS: TestCSIController_DetachVolume/returns_transitive_errors (0.09s)
=== CONT  TestCSIController_ValidateVolume
=== RUN   TestCSIController_ValidateVolume/validates_volumeid_is_not_empty
=== RUN   TestCSIController_ValidateVolume/returns_plugin_not_found_errors
=== RUN   TestCSIController_ValidateVolume/validates_attachmentmode
=== RUN   TestCSIController_ValidateVolume/validates_AccessMode
=== RUN   TestCSIController_ValidateVolume/returns_transitive_errors
--- PASS: TestCSIController_ValidateVolume (0.44s)
    --- PASS: TestCSIController_ValidateVolume/validates_volumeid_is_not_empty (0.08s)
    --- PASS: TestCSIController_ValidateVolume/returns_plugin_not_found_errors (0.08s)
    --- PASS: TestCSIController_ValidateVolume/validates_attachmentmode (0.10s)
    --- PASS: TestCSIController_ValidateVolume/validates_AccessMode (0.09s)
    --- PASS: TestCSIController_ValidateVolume/returns_transitive_errors (0.09s)
=== CONT  TestCSIController_CreateVolume
=== RUN   TestCSIController_CreateVolume/returns_plugin_not_found_errors
=== RUN   TestCSIController_CreateVolume/validates_AccessMode
=== RUN   TestCSIController_CreateVolume/validates_attachmentmode_is_not_empty
=== RUN   TestCSIController_CreateVolume/returns_transitive_errors
--- PASS: TestCSIController_CreateVolume (0.37s)
    --- PASS: TestCSIController_CreateVolume/returns_plugin_not_found_errors (0.11s)
    --- PASS: TestCSIController_CreateVolume/validates_AccessMode (0.09s)
    --- PASS: TestCSIController_CreateVolume/validates_attachmentmode_is_not_empty (0.08s)
    --- PASS: TestCSIController_CreateVolume/returns_transitive_errors (0.09s)
PASS
ok      github.com/hashicorp/nomad/client       3.520s

plugin manager tests:

$ NOMAD_TEST_LOG_LEVEL=warn go test -race -v ./client/pluginmanager/csimanager/
=== RUN   TestBuildBasicFingerprint_Node
=== RUN   TestBuildBasicFingerprint_Node/Minimal_successful_response
=== RUN   TestBuildBasicFingerprint_Node/Successful_response_with_capabilities_and_topologies
=== RUN   TestBuildBasicFingerprint_Node/PluginGetCapabilities_Failed
=== RUN   TestBuildBasicFingerprint_Node/NodeGetInfo_Failed
--- PASS: TestBuildBasicFingerprint_Node (0.00s)
    --- PASS: TestBuildBasicFingerprint_Node/Minimal_successful_response (0.00s)
    --- PASS: TestBuildBasicFingerprint_Node/Successful_response_with_capabilities_and_topologies (0.00s)
    --- PASS: TestBuildBasicFingerprint_Node/PluginGetCapabilities_Failed (0.00s)
    --- PASS: TestBuildBasicFingerprint_Node/NodeGetInfo_Failed (0.00s)
=== RUN   TestBuildControllerFingerprint
=== RUN   TestBuildControllerFingerprint/Minimal_successful_response
=== RUN   TestBuildControllerFingerprint/Successful_response_with_capabilities
=== RUN   TestBuildControllerFingerprint/ControllerGetCapabilities_Failed
--- PASS: TestBuildControllerFingerprint (0.00s)
    --- PASS: TestBuildControllerFingerprint/Minimal_successful_response (0.00s)
    --- PASS: TestBuildControllerFingerprint/Successful_response_with_capabilities (0.00s)
    --- PASS: TestBuildControllerFingerprint/ControllerGetCapabilities_Failed (0.00s)
=== RUN   TestBuildNodeFingerprint
=== RUN   TestBuildNodeFingerprint/Minimal_successful_response
=== RUN   TestBuildNodeFingerprint/Successful_response_with_capabilities_and_topologies
=== RUN   TestBuildNodeFingerprint/NodeGetCapabilities_Failed
--- PASS: TestBuildNodeFingerprint (0.00s)
    --- PASS: TestBuildNodeFingerprint/Minimal_successful_response (0.00s)
    --- PASS: TestBuildNodeFingerprint/Successful_response_with_capabilities_and_topologies (0.00s)
    --- PASS: TestBuildNodeFingerprint/NodeGetCapabilities_Failed (0.00s)
=== RUN   TestInstanceManager_Shutdown
--- PASS: TestInstanceManager_Shutdown (0.02s)
=== RUN   TestManager_RegisterPlugin
--- PASS: TestManager_RegisterPlugin (0.01s)
=== RUN   TestManager_DeregisterPlugin
--- PASS: TestManager_DeregisterPlugin (0.02s)
=== RUN   TestManager_MultiplePlugins
--- PASS: TestManager_MultiplePlugins (0.03s)
=== RUN   TestManager_ConcurrentPlugins
=== RUN   TestManager_ConcurrentPlugins/replacement_races_on_host_restart
=== RUN   TestManager_ConcurrentPlugins/interleaved_register_and_deregister
--- PASS: TestManager_ConcurrentPlugins (0.06s)
    --- PASS: TestManager_ConcurrentPlugins/replacement_races_on_host_restart (0.03s)
    --- PASS: TestManager_ConcurrentPlugins/interleaved_register_and_deregister (0.02s)
=== RUN   TestUsageTracker
=== RUN   TestUsageTracker/Register_and_deregister_all_allocs
=== RUN   TestUsageTracker/Register_all_and_deregister_partial_allocs
--- PASS: TestUsageTracker (0.00s)
    --- PASS: TestUsageTracker/Register_and_deregister_all_allocs (0.00s)
    --- PASS: TestUsageTracker/Register_all_and_deregister_partial_allocs (0.00s)
=== RUN   TestVolumeManager_ensureStagingDir
=== PAUSE TestVolumeManager_ensureStagingDir
=== RUN   TestVolumeManager_stageVolume
=== PAUSE TestVolumeManager_stageVolume
=== RUN   TestVolumeManager_unstageVolume
=== PAUSE TestVolumeManager_unstageVolume
=== RUN   TestVolumeManager_publishVolume
=== PAUSE TestVolumeManager_publishVolume
=== RUN   TestVolumeManager_unpublishVolume
=== PAUSE TestVolumeManager_unpublishVolume
=== RUN   TestVolumeManager_MountVolumeEvents
=== PAUSE TestVolumeManager_MountVolumeEvents
=== CONT  TestVolumeManager_stageVolume
=== RUN   TestVolumeManager_stageVolume/Returns_an_error_when_an_invalid_AttachmentMode_is_provided
=== RUN   TestVolumeManager_stageVolume/Returns_an_error_when_an_invalid_AccessMode_is_provided
=== RUN   TestVolumeManager_stageVolume/Returns_an_error_when_the_plugin_returns_an_error
=== RUN   TestVolumeManager_stageVolume/Happy_Path
--- PASS: TestVolumeManager_stageVolume (0.01s)
    --- PASS: TestVolumeManager_stageVolume/Returns_an_error_when_an_invalid_AttachmentMode_is_provided (0.00s)
    --- PASS: TestVolumeManager_stageVolume/Returns_an_error_when_an_invalid_AccessMode_is_provided (0.00s)
    --- PASS: TestVolumeManager_stageVolume/Returns_an_error_when_the_plugin_returns_an_error (0.00s)
    --- PASS: TestVolumeManager_stageVolume/Happy_Path (0.00s)
=== CONT  TestVolumeManager_MountVolumeEvents
--- PASS: TestVolumeManager_MountVolumeEvents (0.00s)
=== CONT  TestVolumeManager_unpublishVolume
=== RUN   TestVolumeManager_unpublishVolume/Returns_an_error_when_the_plugin_returns_an_error
=== RUN   TestVolumeManager_unpublishVolume/Happy_Path
--- PASS: TestVolumeManager_unpublishVolume (0.00s)
    --- PASS: TestVolumeManager_unpublishVolume/Returns_an_error_when_the_plugin_returns_an_error (0.00s)
    --- PASS: TestVolumeManager_unpublishVolume/Happy_Path (0.00s)
=== CONT  TestVolumeManager_unstageVolume
=== RUN   TestVolumeManager_unstageVolume/Returns_an_error_when_the_plugin_returns_an_error
=== RUN   TestVolumeManager_unstageVolume/Happy_Path
--- PASS: TestVolumeManager_unstageVolume (0.00s)
    --- PASS: TestVolumeManager_unstageVolume/Returns_an_error_when_the_plugin_returns_an_error (0.00s)
    --- PASS: TestVolumeManager_unstageVolume/Happy_Path (0.00s)
=== CONT  TestVolumeManager_publishVolume
=== RUN   TestVolumeManager_publishVolume/Returns_an_error_when_the_plugin_returns_an_error
=== RUN   TestVolumeManager_publishVolume/Happy_Path
=== RUN   TestVolumeManager_publishVolume/Mount_options_in_the_volume
=== RUN   TestVolumeManager_publishVolume/Mount_options_override_in_the_request
--- PASS: TestVolumeManager_publishVolume (0.00s)
    --- PASS: TestVolumeManager_publishVolume/Returns_an_error_when_the_plugin_returns_an_error (0.00s)
    --- PASS: TestVolumeManager_publishVolume/Happy_Path (0.00s)
    --- PASS: TestVolumeManager_publishVolume/Mount_options_in_the_volume (0.00s)
    --- PASS: TestVolumeManager_publishVolume/Mount_options_override_in_the_request (0.00s)
=== CONT  TestVolumeManager_ensureStagingDir
=== RUN   TestVolumeManager_ensureStagingDir/Creates_a_directory_when_one_does_not_exist
=== RUN   TestVolumeManager_ensureStagingDir/Does_not_fail_because_of_a_pre-existing_directory
=== RUN   TestVolumeManager_ensureStagingDir/Returns_negative_mount_info
=== RUN   TestVolumeManager_ensureStagingDir/Returns_positive_mount_info
    volume_test.go:92: TODO: Skipped because we don't detect bind mounts
--- PASS: TestVolumeManager_ensureStagingDir (0.00s)
    --- PASS: TestVolumeManager_ensureStagingDir/Creates_a_directory_when_one_does_not_exist (0.00s)
    --- PASS: TestVolumeManager_ensureStagingDir/Does_not_fail_because_of_a_pre-existing_directory (0.00s)
    --- PASS: TestVolumeManager_ensureStagingDir/Returns_negative_mount_info (0.00s)
    --- SKIP: TestVolumeManager_ensureStagingDir/Returns_positive_mount_info (0.00s)
PASS
ok      github.com/hashicorp/nomad/client/pluginmanager/csimanager      0.235s

@hashicorp-cla
Copy link

CLA assistant check

Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement

Learn more about why HashiCorp requires a CLA and what the CLA includes


temp seems not to be a GitHub user.
You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

Have you signed the CLA already but the status is still pending? Recheck it.

@lgfa29
Copy link
Contributor

lgfa29 commented Apr 20, 2022

Manually backported

@lgfa29 lgfa29 closed this Apr 20, 2022
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants