Merge pull request #1197 from mskanth972/csi_driver_taint
Add Startup Taint Removal Feature
mskanth972 committed Nov 30, 2023
2 parents 9530c7d + 8c2876a commit f93f899
Showing 12 changed files with 1,510 additions and 2 deletions.
@@ -20,7 +20,7 @@ metadata:
rules:
- apiGroups: [""]
resources: ["nodes"]
- verbs: ["get", "list", "watch"]
+ verbs: ["get", "list", "watch", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
2 changes: 2 additions & 0 deletions charts/aws-efs-csi-driver/values.yaml
@@ -80,6 +80,8 @@ controller:
tolerations:
- key: CriticalAddonsOnly
operator: Exists
- key: efs.csi.aws.com/agent-not-ready
operator: Exists
affinity: {}
# Specifies whether a service account should be created
serviceAccount:
2 changes: 2 additions & 0 deletions deploy/kubernetes/base/node-daemonset.yaml
@@ -37,6 +37,8 @@ spec:
priorityClassName: system-node-critical
tolerations:
- operator: Exists
- key: efs.csi.aws.com/agent-not-ready
operator: Exists
securityContext:
fsGroup: 0
runAsGroup: 0
2 changes: 1 addition & 1 deletion deploy/kubernetes/base/node-serviceaccount.yaml
@@ -16,7 +16,7 @@ metadata:
rules:
- apiGroups: [""]
resources: ["nodes"]
- verbs: ["get", "list", "watch"]
+ verbs: ["get", "list", "watch", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
5 changes: 5 additions & 0 deletions docs/README.md
@@ -156,6 +156,11 @@ Since Amazon EFS is an elastic file system, it doesn't really enforce any file s
+ Version `1.4.2` or later of this driver supports using FIPS for mounting file systems. For more information on how to enable FIPS, see [Helm](#-helm-).
+ Take note of the resource quotas for Amazon EFS. For example, there's a quota of 1000 access points that can be created for each Amazon EFS file system. For more information, see [https://docs.aws.amazon.com/efs/latest/ug/limits.html#limits-efs-resources-per-account-per-region](https://docs.aws.amazon.com/efs/latest/ug/limits.html#limits-efs-resources-per-account-per-region).

### Configure node startup taint
There are potential race conditions on node startup (especially when a node first joins the cluster) where pods or processes that rely on the EFS CSI Driver can act on a node before the driver has started up and become fully ready. To prevent this, the EFS CSI Driver can automatically remove a taint from its node once it has started. Users can taint their nodes when they join the cluster, on startup, or both, so that other pods are not scheduled or run on the node before the EFS CSI Driver is ready.

This feature is enabled by default. Cluster administrators should apply the taint `efs.csi.aws.com/agent-not-ready:NoExecute` to their nodes (any effect will work, but `NoExecute` is recommended). For example, EKS Managed Node Groups [support automatically tainting nodes](https://docs.aws.amazon.com/eks/latest/userguide/node-taints-managed-node-groups.html).
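
The tolerations added to the controller and node manifests in this commit are what allow the driver's own pods to run on a node that still carries the `efs.csi.aws.com/agent-not-ready` taint. The following is a minimal sketch of that matching logic, assuming the usual Kubernetes toleration semantics. The `Taint` and `Toleration` structs and the `tolerates` function are simplified, hypothetical stand-ins written for illustration, not the types from `k8s.io/api`.

```go
package main

import "fmt"

// Taint and Toleration are simplified, hypothetical stand-ins for the
// Kubernetes core/v1 types; only the fields needed for matching are kept.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates reports whether tol matches taint. An empty Effect or Key on the
// toleration acts as a wildcard; "Exists" ignores the taint's value, while
// "Equal" (also the default for an empty operator) requires equal values.
func tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "", "Equal":
		return tol.Value == taint.Value
	case "Exists":
		return true
	default:
		return false
	}
}

func main() {
	notReady := Taint{Key: "efs.csi.aws.com/agent-not-ready", Effect: "NoExecute"}

	// Toleration shaped like the one this commit adds to the driver manifests.
	driver := Toleration{Key: "efs.csi.aws.com/agent-not-ready", Operator: "Exists"}
	// A workload that only tolerates an unrelated taint key.
	workload := Toleration{Key: "CriticalAddonsOnly", Operator: "Exists"}

	fmt.Println("driver tolerates taint:", tolerates(driver, notReady))
	fmt.Println("workload tolerates taint:", tolerates(workload, notReady))
}
```

Under this model, a pod without a matching toleration stays off the node until the driver removes the taint, which is the ordering guarantee the feature is meant to provide.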

**Prerequisites**
+ An existing AWS Identity and Access Management \(IAM\) OpenID Connect \(OIDC\) provider for your cluster. To determine whether you already have one, or to create one, see [Creating an IAM OIDC provider for your cluster](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html).
+ The AWS CLI installed and configured on your device or AWS CloudShell. To install the latest version, see [Installing, updating, and uninstalling the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html) and [Quick configuration with `aws configure`](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html#cli-configure-quickstart-config) in the AWS Command Line Interface User Guide. The AWS CLI version installed in the AWS CloudShell may also be several versions behind the latest version. To update it, see [Installing AWS CLI to your home directory](https://docs.aws.amazon.com/cloudshell/latest/userguide/vm-specs.html#install-cli-software) in the AWS CloudShell User Guide.
4 changes: 4 additions & 0 deletions hack/update-gomock
@@ -21,6 +21,10 @@ mockgen -build_flags=--mod=mod -package=mocks -destination=./pkg/driver/mocks/mo
mockgen -build_flags=--mod=mod -package=mocks -destination=./pkg/cloud/mocks/mock_ec2metadata.go ${IMPORT_PATH}/pkg/cloud EC2Metadata
mockgen -build_flags=--mod=mod -package=mocks -destination=./pkg/cloud/mocks/mock_taskmetadata.go ${IMPORT_PATH}/pkg/cloud TaskMetadataService

# Reflection-based mocking for external dependencies
mockgen -build_flags=--mod=mod -package=mocks -destination=./pkg/driver/mocks/mock_k8s_client.go -mock_names='Interface=MockKubernetesClient' k8s.io/client-go/kubernetes Interface
mockgen -build_flags=--mod=mod -package=mocks -destination=./pkg/driver/mocks/mock_k8s_corev1.go k8s.io/client-go/kubernetes/typed/core/v1 CoreV1Interface,NodeInterface

# Fixes "Mounter Type cannot implement 'Mounter' as it has a non-exported method and is defined in a different package"
# See https://github.com/kubernetes/mount-utils/commit/a20fcfb15a701977d086330b47b7efad51eb608e for context.
sed -i 's/mount /mount_utils /g' pkg/driver/mocks/mock_mount.go
2 changes: 2 additions & 0 deletions pkg/cloud/metadata.go
@@ -42,6 +42,8 @@ type metadata struct {

var _ MetadataService = &metadata{}

type KubernetesAPIClient func() (kubernetes.Interface, error)

// GetInstanceID returns the instance identification.
func (m *metadata) GetInstanceID() string {
return m.instanceID
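
Declaring `KubernetesAPIClient` as a function type lets callers pass in the way a client is constructed, which is presumably what makes the mocks generated in `hack/update-gomock` usable in tests. The sketch below shows the general pattern only. `NodeClient`, `ClientFactory`, `fakeClient`, `failingFactory`, and `describeNode` are hypothetical names invented for this illustration, and `NodeClient` is a deliberately tiny stand-in for `kubernetes.Interface`.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeClient is a hypothetical, minimal stand-in for kubernetes.Interface.
type NodeClient interface {
	NodeName() string
}

// ClientFactory has the same shape as KubernetesAPIClient: a function that
// builds a client on demand and may fail while doing so.
type ClientFactory func() (NodeClient, error)

// fakeClient is a test double that needs no cluster.
type fakeClient struct{ name string }

func (f fakeClient) NodeName() string { return f.name }

// failingFactory simulates a client that cannot be constructed.
func failingFactory() (NodeClient, error) {
	return nil, errors.New("cluster unreachable")
}

// describeNode depends only on the factory, so callers decide whether a real
// or a fake client is used.
func describeNode(newClient ClientFactory) (string, error) {
	c, err := newClient()
	if err != nil {
		return "", fmt.Errorf("creating client: %w", err)
	}
	return "node=" + c.NodeName(), nil
}

func main() {
	fake := func() (NodeClient, error) { return fakeClient{name: "node-a"}, nil }

	fmt.Println(describeNode(fake))
	fmt.Println(describeNode(failingFactory))
}
```

In `driver.go`, `removeNotReadyTaint` receives `cloud.DefaultKubernetesAPIClient` in the same way, so a test could pass a factory that returns a mock instead.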
10 changes: 10 additions & 0 deletions pkg/driver/driver.go
@@ -31,6 +31,9 @@ import (

const (
driverName = "efs.csi.aws.com"

// AgentNotReadyNodeTaintKey contains the key of taints to be removed on driver startup
AgentNotReadyNodeTaintKey = "efs.csi.aws.com/agent-not-ready"
)

type Driver struct {
@@ -124,6 +127,13 @@ func (d *Driver) Run() error {
klog.Info("Starting reaper")
reaper.start()

// Remove taint from node to indicate driver startup success
// This is done at the last possible moment to prevent race conditions or false positive removals
err = removeNotReadyTaint(cloud.DefaultKubernetesAPIClient)
if err != nil {
klog.ErrorS(err, "Unexpected failure when attempting to remove node taint(s)")
}

klog.Infof("Listening for connections on address: %#v", listener.Addr())
return d.srv.Serve(listener)
}
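
The body of `removeNotReadyTaint` is not part of the hunks shown here. As a rough sketch of the core step such a function needs, the code below filters a node's taint list by key and reports whether anything changed, so that a caller could skip the API request when the taint is absent. The `Taint` struct and `withoutTaint` helper are hypothetical illustrations, not the driver's actual code, and the node patch itself (which the added `patch` RBAC verb suggests) is left out.

```go
package main

import "fmt"

// agentNotReadyTaintKey mirrors the key defined in pkg/driver/driver.go.
const agentNotReadyTaintKey = "efs.csi.aws.com/agent-not-ready"

// Taint is a simplified, hypothetical stand-in for the core/v1 Taint type.
type Taint struct {
	Key, Value, Effect string
}

// withoutTaint returns a copy of taints with every entry whose key equals key
// removed, plus a flag that is true when at least one entry was dropped.
func withoutTaint(taints []Taint, key string) ([]Taint, bool) {
	kept := make([]Taint, 0, len(taints))
	for _, t := range taints {
		if t.Key != key {
			kept = append(kept, t)
		}
	}
	return kept, len(kept) != len(taints)
}

func main() {
	taints := []Taint{
		{Key: agentNotReadyTaintKey, Effect: "NoExecute"},
		{Key: "dedicated", Value: "gpu", Effect: "NoSchedule"},
	}

	kept, changed := withoutTaint(taints, agentNotReadyTaintKey)
	if changed {
		// A real implementation would patch the node with the kept taints here.
		fmt.Printf("would patch node with %d remaining taint(s)\n", len(kept))
	} else {
		fmt.Println("taint not present, nothing to do")
	}
}
```

Matching on the key alone is consistent with the README's note that any taint effect will work.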