- Summary
- Motivation
- Proposal
- Production Readiness Review Questionnaire
- Alternatives
- Implementation History
This KEP proposes a way to obtain service account tokens for the pods that CSI
drivers are mounting volumes for. Since these tokens are valid only for a
limited period, this KEP also gives CSI drivers an option to re-execute
NodePublishVolume to mount volumes.
Currently, the only way for CSI drivers to acquire service account tokens is to read them directly from the file system. However, this approach has several drawbacks:
- It does not work for CSI drivers that run as a different non-root user than the pods. See file permission section for service account token.
- The CSI driver will have access to the secrets of pods that do not use it, because the CSI driver needs a hostPath volume for the pods subdirectory to read the token.
- The audience of the token defaults to kube-apiserver.
- The token is not guaranteed to be available (e.g. automountServiceAccountToken=false).
- HashiCorp Vault provider for secret store CSI driver requires the service account tokens of the pods it is mounting secrets for in order to authenticate to Vault. The provisioned secrets also have a given TTL in Vault, so it is necessary to get tokens after the initial mount.
- Cert manager CSI driver will create CertificateRequests on behalf of the pods.
- Amazon EFS CSI driver wants the service account tokens of pods to exchange for AWS credentials.
- Allow CSI drivers to request audience-bounded service account tokens of pods, passed from kubelet to NodePublishVolume.
- Provide an option to re-execute NodePublishVolume in a best-effort manner.
- Other CSI calls, e.g. NodeStageVolume, may not acquire pods' service account tokens via this feature.
- Failed re-execution of NodePublishVolume will not unmount volumes.
// CSIDriverSpec is the specification of a CSIDriver.
type CSIDriverSpec struct {
... // existing fields
RequiresRepublish *bool
TokenRequests []TokenRequest
}
// TokenRequest contains parameters of a token.
type TokenRequest struct {
Audience string
ExpirationSeconds *int64
}
These three fields are all optional:
- TokenRequest.Audience: will be set in TokenRequestSpec. This will default to the APIAudiences of kube-apiserver if it is empty. The storage provider of the CSI driver is supposed to send a TokenReview with at least one of the audiences specified.
- TokenRequest.ExpirationSeconds: will be set in TokenRequestSpec. The issued token may have a different duration, so the ExpirationTimestamp in TokenRequestStatus will be passed to the CSI driver.
- RequiresRepublish: should only be set when the volumes mounted by the CSI driver have a TTL and require re-validation of the token.
  - Note: Remount means re-execution of NodePublishVolume in the scope of CSI, with no intervening unmounts. If this option is used, NodePublishVolume should only change the contents rather than the mount, because the container will not be restarted to reflect a mount change. The period between remounts is 0.1s, which is hardcoded as reconcilerLoopSleepPeriod in volume manager. However, the rate of TokenRequest is not 10/s because the token will be cached until expiration.
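As a rough illustration of the defaulting rule for Audience described above, the following self-contained sketch redeclares only the two new fields locally and resolves the audiences a token would be requested for. The helper effectiveAudiences and the sample audience value are hypothetical and are not part of the Kubernetes API or the kubelet code.

```go
package main

import "fmt"

// TokenRequest mirrors the struct proposed in this KEP.
type TokenRequest struct {
	Audience          string
	ExpirationSeconds *int64
}

// CSIDriverSpec holds only the two fields this KEP adds; existing fields are omitted.
type CSIDriverSpec struct {
	RequiresRepublish *bool
	TokenRequests     []TokenRequest
}

// effectiveAudiences is a hypothetical helper: it returns the audiences a token
// would be requested for, falling back to apiAudiences when Audience is empty.
func effectiveAudiences(tr TokenRequest, apiAudiences []string) []string {
	if tr.Audience == "" {
		return apiAudiences
	}
	return []string{tr.Audience}
}

func main() {
	republish := true
	ttl := int64(3600)
	spec := CSIDriverSpec{
		RequiresRepublish: &republish,
		TokenRequests: []TokenRequest{
			{Audience: "vault", ExpirationSeconds: &ttl},
			{}, // empty Audience: falls back to the API server audiences
		},
	}
	// Illustrative value only; the real APIAudiences come from kube-apiserver configuration.
	apiAudiences := []string{"https://kubernetes.default.svc"}
	for _, tr := range spec.TokenRequests {
		fmt.Println(effectiveAudiences(tr, apiAudiences))
	}
	fmt.Println("republish:", *spec.RequiresRepublish)
}
```

All three fields are left as pointers or zero values when unset, which is what makes them optional in this sketch.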
The token will be bound to the pod that the CSI driver is mounting volumes for
and will be set in VolumeContext:
"csi.storage.k8s.io/serviceAccount.tokens": {
'audience': {
'token': token,
'expiry': expiry,
},
...
}
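On the driver side, reading this entry might look like the sketch below. It assumes the value is delivered as a JSON-encoded string (CSI volume context is a string-to-string map) and uses the token and expiry keys shown above; the type audienceToken, the function tokenForAudience, and the RFC 3339 expiry format in the sample are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokensKey is the VolumeContext key named in this KEP.
const tokensKey = "csi.storage.k8s.io/serviceAccount.tokens"

// audienceToken is a hypothetical type matching the per-audience entry shown above.
type audienceToken struct {
	Token  string `json:"token"`
	Expiry string `json:"expiry"`
}

// tokenForAudience decodes the tokens entry and returns the token issued for
// the given audience. It assumes the entry is a JSON object keyed by audience.
func tokenForAudience(volumeContext map[string]string, audience string) (audienceToken, error) {
	raw, ok := volumeContext[tokensKey]
	if !ok {
		return audienceToken{}, fmt.Errorf("volume context has no %q entry", tokensKey)
	}
	tokens := map[string]audienceToken{}
	if err := json.Unmarshal([]byte(raw), &tokens); err != nil {
		return audienceToken{}, fmt.Errorf("decoding %q: %w", tokensKey, err)
	}
	tok, ok := tokens[audience]
	if !ok {
		return audienceToken{}, fmt.Errorf("no token for audience %q", audience)
	}
	return tok, nil
}

func main() {
	// Sample data only; a real token and expiry would come from kubelet.
	volumeContext := map[string]string{
		tokensKey: `{"vault":{"token":"example-token","expiry":"2030-01-01T00:00:00Z"}}`,
	}
	tok, err := tokenForAudience(volumeContext, "vault")
	if err != nil {
		panic(err)
	}
	fmt.Println(tok.Token, tok.Expiry)
}
```

Returning an error when the key or audience is missing lets a driver fail NodePublishVolume cleanly if the CSIDriver object does not request the audience it expects.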
Take the Vault provider for secret store CSI driver as an example:
- Create a CSIDriver object with TokenRequests[0].Audience='vault' and RequiresRepublish=true.
- When the volume manager of kubelet sees a new volume, the pod object in mountedPods will have requiresRemount=true after MarkRemountRequired is called. MarkRemountRequired will call into RequiresRemount of the in-tree CSI plugin to fetch the CSIDriver object.
- Before the NodePublishVolume call, kubelet will request a token from the TokenRequest API with audiences=['vault'].
- The token will be specified in the VolumeContext passed to the NodePublishVolume call.
- Every 0.1 second, the reconciler component of volume manager will remount the volume in case the Vault secrets expire and re-login is required.
RequiresRepublish is useful when the mounted volumes can expire and the
availability and validity of volumes are continuously required. Those volumes
are most likely credentials that are rotated as a security best practice. There
are two options when a remount fails:
- Keep the container/pod running and use the old credentials.
  - The next NodePublishVolume may succeed if the failure was transient.
  - Given that stale credentials may be in use for some multiple of 0.1 second, it is critical for the credential provisioners to guarantee that the validity is revoked after expiry. In general, it is much harder to eliminate the sinks than the source.
  - The container/pod will also have better observability into usage of the stale credentials.
- Kill the container/pod and hopefully the new container/pod has the refreshed credentials.
  - This will reduce the stale volume exposure by one sink.
  - More likely to overcome fatal errors.
  - Container start-up cost is high.
Option 1 is adopted. See discussion here.
- Unit tests around all the added logic in kubelet.
- E2E tests around remount and token passing.
- Implemented the feature.
- Wrote all the unit and E2E tests.
- Deployed the feature in production and went through at least one minor k8s version.
- Fixed any bugs.
- Wrote stress/scale tests to make sure the feature still works when a large number of pods are running.
- How can this feature be enabled / disabled in a live cluster?
  - Feature gate name: CSIServiceAccountToken
  - Components depending on the feature gate: kubelet, kube-apiserver
  - Will enabling / disabling the feature require downtime of the control plane? no.
  - Will enabling / disabling the feature require downtime or reprovisioning of a node? yes.
- Does enabling the feature change any default behavior? no.
- Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? yes, as long as the new fields in CSIDriverSpec are not used.
- What happens if we reenable the feature if it was previously rolled back? nothing, as long as the new fields in CSIDriverSpec are not used.
- Are there any tests for feature enablement/disablement? yes, unit tests will cover this.
- How can a rollout fail? Can it impact already running workloads? Rollout will not fail because this change only exposes extra fields in CSIDriverSpec.
- What specific metrics should inform a rollback?
  - storage_operation_duration_seconds: if the corresponding CSI plugin shows high error rates when aggregating on status.
- Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? No. When a downgrade happens to a kube-apiserver that doesn't have the added fields, the existing volumes will continue to work as long as they don't rely on the acquired token being valid.
- Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? No.
- How can an operator determine if the feature is in use by workloads? Run kubectl get CSIDriver to see whether tokenRequests or requiresRepublish is specified.
- What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
  - Metrics
    - Metric name: storage_operation_duration_seconds
    - Aggregation method: volume_plugin, operation_name, status
    - Components exposing the metric: kubelet
- What are the reasonable SLOs (Service Level Objectives) for the above SLIs? For the particular CSI plugin, per-day percentage of failed storage operations <= 1%.
- Are there any missing metrics that would be useful to have to improve observability of this feature? None
- Does this feature depend on any specific services running in the cluster? There are no new components required, but it requires kubelets >= 1.12.
- Will enabling / using this feature result in any new API calls?
  - API call type: TokenRequest
    - estimated throughput: 1 (RequiresRepublish=false) or 1/ExpirationSeconds/s (RequiresRepublish=true) for each CSI driver using this feature.
    - originating component: kubelet
    - components listing and/or watching resources they didn't before: n/a.
    - API calls that may be triggered by changes of some Kubernetes resources: n/a.
    - periodic API calls to reconcile state (e.g. periodic fetching state, heartbeats, leader election, etc.): n/a.
- Will enabling / using this feature result in introducing new API types? no.
- Will enabling / using this feature result in any new calls to the cloud provider? no.
- Will enabling / using this feature result in increasing size or count of the existing API objects? no.
- Will enabling / using this feature result in increasing time taken by any operations covered by [existing SLIs/SLOs]? no.
- Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? no.
- How does this feature react if the API server and/or etcd is unavailable? RequiresRepublish will continue to function but TokenRequests will fail.
- What are other known failure modes?
  - Failed to fetch token
    - Detection: Check mount failure in Pod events or kubelet log.
    - Mitigations: Set TokenRequests=[]; subsequent NodePublishVolume calls will not have tokens in volume attributes. Tokens retrieved before will eventually expire.
    - Diagnostics: Search "mounter.SetUpAt failed to get service accoount token attributes"
    - Testing: E2E test
- What steps should be taken if SLOs are not being met to determine the problem? None.
- Instead of fetching tokens in kubelet, CSI drivers would be granted permission to call the TokenRequest API. This would require a non-trivial admission plugin to do the necessary validation, and every CSI driver would need to reimplement the same functionality.
- ALPHA: 1.20
- BETA: 1.21
- STABLE: 1.22