Support Data Repository Associations for Lustre 2.12 or newer filesystems (e.g. PERSISTENT_2 deployment type) #368

Open · wants to merge 9 commits into base: master
@@ -93,12 +93,14 @@ spec:
args:
- --csi-address=$(ADDRESS)
- --v={{ .Values.sidecars.provisioner.logLevel }}
- --timeout=5m
- --timeout=$(TIMEOUT)
- --extra-create-metadata
- --leader-election=true
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi.sock
- name: TIMEOUT
value: {{ .Values.controller.provisionerTimeout }}
volumeMounts:
- name: socket-dir
mountPath: /var/lib/csi/sockets/pluginproxy/
1 change: 1 addition & 0 deletions charts/aws-fsx-csi-driver/values.yaml
@@ -65,6 +65,7 @@ controller:
- effect: NoExecute
operator: Exists
tolerationSeconds: 300
provisionerTimeout: 5m
@everpeace (Author) commented on Dec 28, 2023:

Should we make the default provisioner timeout longer in the helm chart? It often takes more time to prepare an FSx filesystem when it has data repository associations.

A single FSx for Lustre filesystem can have up to 8 data repository associations.

In my experience, it usually takes around 7-10 minutes to make a single data repository association available, even for an empty S3 bucket.
Moreover, setting up data repository associations on a given filesystem appears to happen sequentially.

So I think 90 min = 10 min x 8 (data repository associations) + 5 min (FSx filesystem) + buffer would be safe, because the current CreateVolume operation is synchronous and is not safe when a timeout happens.

What do you think?

Contributor commented:

I think keeping the default timeout the same + clearly documenting the need to change it when using DRAs would be the correct move. This ensures consistent behavior for users who aren't using DRAs. Extending it is a one-way door (because reducing the timeout later would break compatibility for users who are using a large number of DRAs).

@everpeace (Author) replied:

> Extending it is a one way door (because reducing the timeout would break compatibility for users who are using a large number of DRAs)

It makes sense.

> the default timer the same + clearly documenting the need to change the timeout if using DRAs would be the correct move

OK, let me add the documentation.

@everpeace (Author) commented on Jan 18, 2024:

Addressed in the commits below:

Contributor commented:

I'm not sure about the information for users; it seems like users using DRAs will still be fine in most cases:
https://github.com/kubernetes-csi/external-provisioner?tab=readme-ov-file
https://github.com/kubernetes-csi/external-provisioner?tab=readme-ov-file#csi-error-and-timeout-handling
The CreateVolume call will time out, and subsequent calls will be retried with exponential backoff. It's only in the case of a large number of DRAs that this becomes an issue.
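For reference, a minimal sketch of the csi-provisioner sidecar flags involved here (flag names are from the external-provisioner README linked above; the values are only illustrative):

```yaml
# csi-provisioner sidecar args (illustrative values):
# --timeout bounds each CSI call such as CreateVolume, and the retry-interval
# flags shape the exponential backoff between retried calls after a timeout.
args:
  - --csi-address=$(ADDRESS)
  - --timeout=5m                # per-call timeout discussed in this thread
  - --retry-interval-start=1s   # initial backoff between retries
  - --retry-interval-max=5m     # cap on the backoff interval
  - --extra-create-metadata
  - --leader-election=true
```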


node:
mode: node
5 changes: 4 additions & 1 deletion deploy/kubernetes/base/controller-deployment.yaml
@@ -76,12 +76,15 @@ spec:
args:
- --csi-address=$(ADDRESS)
- --v=2
- --timeout=5m
- --timeout=$(CONTROLLER_PROVISIONER_TIMEOUT)
- --extra-create-metadata
- --leader-election=true
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi.sock
envFrom:
- configMapRef:
name: fsx-csi-controller
volumeMounts:
- name: socket-dir
mountPath: /var/lib/csi/sockets/pluginproxy/
1 change: 1 addition & 0 deletions deploy/kubernetes/base/controller.env
@@ -0,0 +1 @@
CONTROLLER_PROVISIONER_TIMEOUT=5m
Contributor commented:

What's the value of creating a separate file for this vs. putting it in the values.yaml:
https://github.com/kubernetes-sigs/aws-fsx-csi-driver/blob/master/charts/aws-fsx-csi-driver/values.yaml#L42-L67

@everpeace (Author) commented on Feb 28, 2024:

This file is for the kustomize manifests only; values.yaml is dedicated to the Helm chart. As I understand it, this driver supports both kustomize and Helm.

In kustomize, injecting a parameter while building manifests requires a bit of a hack. This env file is needed so that kustomize users can change the timeout value. I also updated install.md as below:

https://github.com/everpeace/aws-fsx-csi-driver/blob/suppor-dra/docs/install.md#deploy-driver

# To set the CSI controller's provisioner timeout,
# deploy via a local kustomization instead:
$ cd $(mktemp -d)
$ kustomize init
$ kustomize edit add resource "github.com/kubernetes-sigs/aws-fsx-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.1"
$ kustomize edit add configmap fsx-csi-controller --from-literal=CONTROLLER_PROVISIONER_TIMEOUT=30m --behavior=merge
$ kubectl apply -k .

Contributor commented:

I think we should avoid hacks when possible and this seems like an avoidable instance. If users want to configure their kustomize templates, they can download them, configure them, and deploy them freely. We should follow precedent in terms of implementation, which is to put it in the values.yaml.
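For illustration, a rough sketch of what that could look like with a plain kustomize overlay and a patch, without the extra env file. The Deployment name and the container/args indices below are assumptions; verify them against the manifests you actually deploy:

```yaml
# kustomization.yaml (sketch): override the provisioner --timeout directly.
resources:
  - github.com/kubernetes-sigs/aws-fsx-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.1
patches:
  - target:
      kind: Deployment
      name: fsx-csi-controller   # assumed controller Deployment name
    patch: |-
      # replace the csi-provisioner sidecar's --timeout argument
      - op: replace
        path: /spec/template/spec/containers/1/args/2   # assumed indices
        value: --timeout=60m
```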

4 changes: 4 additions & 0 deletions deploy/kubernetes/base/kustomization.yaml
@@ -10,3 +10,7 @@ resources:
- clusterrole-csi-node.yaml
- clusterrolebinding-csi-node.yaml

configMapGenerator:
- name: fsx-csi-controller
envs:
- controller.env
29 changes: 28 additions & 1 deletion docs/install.md
@@ -60,7 +60,11 @@ eksctl create iamserviceaccount \
"fsx:CreateFileSystem",
"fsx:DeleteFileSystem",
"fsx:DescribeFileSystems",
"fsx:TagResource"
"fsx:TagResource",
"fsx:CreateDataRepositoryAssociation",
"fsx:DescribeDataRepositoryAssociations",
"fsx:DeleteDataRepositoryAssociation",
"fsx:UpdateDataRepositoryAssociation"
],
"Resource": ["*"]
}
@@ -80,6 +84,10 @@ There are potential race conditions on node startup (especially when a node is f

This feature is activated by default, and cluster administrators should use the taint `fsx.csi.aws.com/agent-not-ready:NoExecute` (any effect will work, but `NoExecute` is recommended). For example, EKS Managed Node Groups [support automatically tainting nodes](https://docs.aws.amazon.com/eks/latest/userguide/node-taints-managed-node-groups.html).

### CSI controller's provisioner timeout

**WARNING** If you plan to use [Dynamic Provisioning with Data Repository Associations](../examples/kubernetes/dynamic_provisioning_s3_association/README.md), it is strongly recommended to set a longer provisioner timeout for the CSI controller (e.g. `60m`), because attaching Data Repository Associations to an FSx for Lustre filesystem can take a long time. Please see the next section for how to set the timeout.

### Deploy driver
You may deploy the FSx for Lustre CSI driver via Kustomize or Helm

@@ -88,6 +96,16 @@ You may deploy the FSx for Lustre CSI driver via Kustomize or Helm
kubectl apply -k "github.com/kubernetes-sigs/aws-fsx-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.1"
```

```sh
# To set the CSI controller's provisioner timeout,
# deploy via a local kustomization instead:
$ cd $(mktemp -d)
$ kustomize init
$ kustomize edit add resource "github.com/kubernetes-sigs/aws-fsx-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.1"
$ kustomize edit add configmap fsx-csi-controller --from-literal=CONTROLLER_PROVISIONER_TIMEOUT=30m --behavior=merge
$ kubectl apply -k .
```
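Optionally, a quick sanity check that the override landed (a sketch; kustomize generates the ConfigMap with a name suffix, so grep for the key rather than the exact name, and the driver is assumed to be installed in `kube-system`):

```sh
# The generated ConfigMap should now carry the value the csi-provisioner
# container consumes via envFrom.
kubectl -n kube-system get configmap -o yaml | grep CONTROLLER_PROVISIONER_TIMEOUT
```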

*Note: Using the master branch to deploy the driver is not supported as the master branch may contain upcoming features incompatible with the currently released stable version of the driver.*

#### Helm
@@ -104,6 +122,15 @@ helm upgrade --install aws-fsx-csi-driver \
aws-fsx-csi-driver/aws-fsx-csi-driver
```

```sh
# To set the CSI controller's provisioner timeout,
# pass the value when installing the chart:
helm upgrade --install aws-fsx-csi-driver \
--namespace kube-system \
--set-string "controller.provisionerTimeout=60m" \
aws-fsx-csi-driver/aws-fsx-csi-driver
```
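To double-check that the value is wired through, you can render the chart locally and look for the `TIMEOUT` env entry the chart template uses (a sketch, using the same repo and chart reference as above):

```sh
# The provisioner arg is --timeout=$(TIMEOUT), so the configured value shows up
# as the TIMEOUT env var on the csi-provisioner container.
helm template aws-fsx-csi-driver aws-fsx-csi-driver/aws-fsx-csi-driver \
  --namespace kube-system \
  --set-string "controller.provisionerTimeout=60m" \
  | grep -A 1 "name: TIMEOUT"
```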

Review the [configuration values](https://github.com/kubernetes-sigs/aws-fsx-openzfs-csi-driver/blob/master/charts/aws-fsx-csi-driver/values.yaml) for the Helm chart.

#### Once the driver has been deployed, verify the pods are running:
93 changes: 93 additions & 0 deletions examples/kubernetes/dynamic_provisioning_s3_association/README.md
@@ -0,0 +1,93 @@
## Dynamic Provisioning with Data Repository Associations
This example shows how to create an FSx for Lustre filesystem using a persistent volume claim (PVC) with data repository association integration.

Please note that data repository associations are supported on FSx for Lustre 2.12 and 2.15 file systems, excluding the SCRATCH_1 deployment type.

This integration means that you can seamlessly access the objects stored in your Amazon S3 buckets from applications mounting your Amazon FSx for Lustre file system. Please check [Using Data Repositories](https://docs.aws.amazon.com/fsx/latest/LustreGuide/fsx-data-repositories.html) for details.

**WARNING** If you plan to use this feature, it is strongly recommended to set a longer provisioner timeout for the CSI controller (e.g. `60m`). See [here](../../../docs/install.md#csi-controllers-provisioner-timeout).

### Edit [StorageClass](./specs/storageclass.yaml)
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
  subnetId: subnet-056da83524edbe641
  securityGroupIds: sg-086f61ea73388fb6b
  deploymentType: PERSISTENT_2
  perUnitStorageThroughput: "125"
  dataRepositoryAssociations: |
    - batchImportMetaDataOnCreate: true
      dataRepositoryPath: s3://ml-training-data-000
      fileSystemPath: /ml-training-data-000
      s3:
        autoExportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
        autoImportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
    - batchImportMetaDataOnCreate: true
      dataRepositoryPath: s3://ml-training-data-001
      fileSystemPath: /ml-training-data-001
      s3:
        autoExportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
        autoImportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
```
* subnetId - the subnet ID that the FSx for Lustre filesystem should be created inside.
* securityGroupIds - a comma-separated list of security group IDs that should be attached to the filesystem.
* dataRepositoryAssociations - a list of data repository association configurations, in YAML, to associate with the filesystem. See [./specs/storageclass.yaml](./specs/storageclass.yaml) for details.
* deploymentType (Optional) - FSx for Lustre supports four deployment types: SCRATCH_1, SCRATCH_2, PERSISTENT_1, and PERSISTENT_2. Default: SCRATCH_1. However, data repository associations can't be used with the SCRATCH_1 deployment type.
* kmsKeyId (Optional) - for deployment types PERSISTENT_1 and PERSISTENT_2, customers can specify a KMS key to use.
* perUnitStorageThroughput (Optional) - for deployment types PERSISTENT_1 and PERSISTENT_2, customers can specify the storage throughput. Default: "200". Note that the value must be specified as a string, e.g. "200" or "100". For PERSISTENT_2 SSD storage, valid values are 125, 250, 500, and 1000.
* storageType (Optional) - for deployment type PERSISTENT_1, customers can specify the storage type, either SSD or HDD. Default: "SSD". For PERSISTENT_2 SSD storage, only "SSD" is allowed.
* driveCacheType (Required if storageType is "HDD") - for HDD PERSISTENT_1, specify the type of drive cache, either NONE or READ.
* dataCompressionType (Optional) - FSx for Lustre supports data compression via the LZ4 algorithm. Compression is disabled when the value is set to NONE. The default value is NONE.

Note:
- `dataRepositoryAssociations` cannot be used with `s3ImportPath`, `s3ExportPath`, or `autoImportPolicy` described in [Dynamic Provisioning with Data Repository](../dynamic_provisioning_s3/)

### Edit [Persistent Volume Claim Spec](./specs/claim.yaml)
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: fsx-sc
  resources:
    requests:
      storage: 1200Gi
```
Update `spec.resources.requests.storage` with the storage capacity to request. The storage capacity value will be rounded up to 1200 GiB, 2400 GiB, or a multiple of 3600 GiB for SSD. If the storageType is specified as HDD, the storage capacity will be rounded up to 6000 GiB or a multiple of 6000 GiB if the perUnitStorageThroughput is 12, or rounded up to 1800 GiB or a multiple of 1800 GiB if the perUnitStorageThroughput is 40.
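As a concrete illustration of the SSD rounding rule above (illustrative only; the exact provisioned size is determined by FSx at creation time):

```yaml
# A 2000Gi request does not match a valid SSD capacity, so it is rounded up
# to the next valid size listed above, 2400 GiB.
resources:
  requests:
    storage: 2000Gi   # provisioned FSx filesystem capacity: 2400 GiB
```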

### Deploy the Application
Create the PVC, storage class, and the pod that consumes the PV:
```sh
>> kubectl apply -f examples/kubernetes/dynamic_provisioning_s3_association/specs/storageclass.yaml
>> kubectl apply -f examples/kubernetes/dynamic_provisioning_s3_association/specs/claim.yaml
>> kubectl apply -f examples/kubernetes/dynamic_provisioning_s3_association/specs/pod.yaml
```

### Use Case 1: Access S3 files from Lustre filesystem
Use this if you only want to import data from S3 and read it without modifying it or creating new files.

You can see that the S3 objects are visible in the persistent volume.

```
>> kubectl exec -it fsx-app -- ls /data
```

### Use case 2: Export changes to S3
New, modified, and deleted files are automatically exported to the linked S3 bucket.

Pod `fsx-app` creates a file `out.txt` in the mounted volume; you can then see the file in the S3 bucket:

```sh
>> kubectl exec -ti fsx-app -- sh -c 'echo "test" > /data/out.txt'
```
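One rough way to confirm the export from the S3 side (a sketch: the bucket name comes from the example StorageClass above, auto-export can take a little while, and the file must be under a path covered by a data repository association):

```sh
aws s3 ls s3://ml-training-data-000/ --recursive | grep out.txt
```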
examples/kubernetes/dynamic_provisioning_s3_association/specs/claim.yaml
@@ -0,0 +1,11 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: fsx-sc
  resources:
    requests:
      storage: 1200Gi
examples/kubernetes/dynamic_provisioning_s3_association/specs/pod.yaml
@@ -0,0 +1,19 @@
apiVersion: v1
kind: Pod
metadata:
  name: fsx-app
spec:
  containers:
    - name: app
      image: amazonlinux:2
      command: ["/bin/sh"]
      securityContext:
        privileged: true
      args: ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: fsx-claim
examples/kubernetes/dynamic_provisioning_s3_association/specs/storageclass.yaml
@@ -0,0 +1,92 @@
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
  subnetId: subnet-0d7b5e117ad7b4961
  securityGroupIds: sg-05a37bfe01467059a
  deploymentType: PERSISTENT_2
  dataRepositoryAssociations: |
    - batchImportMetaDataOnCreate: true
      dataRepositoryPath: s3://ml-training-data-000
      fileSystemPath: /ml-training-data-000
      s3:
        autoExportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
        autoImportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
    - batchImportMetaDataOnCreate: true
      dataRepositoryPath: s3://ml-training-data-001
      fileSystemPath: /ml-training-data-001
      s3:
        autoExportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
        autoImportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
# - # Set to true to run an import data repository task to import metadata from
# # the data repository to the file system after the data repository association
# # is created. Default is false.
# batchImportMetaDataOnCreate: true
#
# # The path to the Amazon S3 data repository that will be linked to the file
# # system. The path can be an S3 bucket or prefix in the format s3://myBucket/myPrefix/.
# # This path specifies where in the S3 data repository files will be imported
# # from or exported to.
# # DataRepositoryPath is a required field
# dataRepositoryPath: s3://ml-training-data-000
#
# # A path on the file system that points to a high-level directory (such as
# # /ns1/) or subdirectory (such as /ns1/subdir/) that will be mapped 1-1 with
# # DataRepositoryPath. The leading forward slash in the name is required. Two
# # data repository associations cannot have overlapping file system paths. For
# # example, if a data repository is associated with file system path /ns1/,
# # then you cannot link another data repository with file system path /ns1/ns2.
# #
# # This path specifies where in your file system files will be exported from
# # or imported to. This file system directory can be linked to only one Amazon
# # S3 bucket, and no other S3 bucket can be linked to the directory.
# #
# # If you specify only a forward slash (/) as the file system path, you can
# # link only one data repository to the file system. You can only specify "/"
# # as the file system path for the first data repository associated with a file
# # system.
# fileSystemPath: /
#
# # The configuration for an Amazon S3 data repository linked to an Amazon FSx
# # Lustre file system with a data repository association. The configuration
# # defines which file events (new, changed, or deleted files or directories)
# # are automatically imported from the linked data repository to the file system
# # or automatically exported from the file system to the data repository.
# s3:
# # Specifies the type of updated objects (new, changed, deleted) that will be
# # automatically exported from your file system to the linked S3 bucket.
# autoExportPolicy:
# # The AutoExportPolicy can have the following event values:
# # * NEW - New files and directories are automatically exported to the data
# # repository as they are added to the file system.
# # * CHANGED - Changes to files and directories on the file system are automatically
# # exported to the data repository.
# # * DELETED - Files and directories are automatically deleted on the data
# # repository when they are deleted on the file system.
# # You can define any combination of event types for your AutoExportPolicy.
# events: ["NEW", "CHANGED", "DELETED" ]
# # Specifies the type of updated objects (new, changed, deleted) that will be
# # automatically imported from the linked S3 bucket to your file system.
# autoImportPolicy:
# # The AutoImportPolicy can have the following event values:
# #
# # * NEW - Amazon FSx automatically imports metadata of files added to the
# # linked S3 bucket that do not currently exist in the FSx file system.
# #
# # * CHANGED - Amazon FSx automatically updates file metadata and invalidates
# # existing file content on the file system as files change in the data repository.
# #
# # * DELETED - Amazon FSx automatically deletes files on the file system
# # as corresponding files are deleted in the data repository.
# #
# # You can define any combination of event types for your AutoImportPolicy.
# events: ["NEW", "CHANGED", "DELETED" ]
mountOptions:
- flock
6 changes: 5 additions & 1 deletion hack/kops-patch.yaml
@@ -18,7 +18,11 @@ spec:
"fsx:CreateFileSystem",
"fsx:DeleteFileSystem",
"fsx:DescribeFileSystems",
"fsx:TagResource"
"fsx:TagResource",
"fsx:CreateDataRepositoryAssociation",
"fsx:DescribeDataRepositoryAssociations",
"fsx:DeleteDataRepositoryAssociation",
"fsx:UpdateDataRepositoryAssociation"
],
"Resource": ["*"]
}
1 change: 1 addition & 0 deletions hack/values.yaml
@@ -1,2 +1,3 @@
controller:
replicaCount: 1
provisionerTimeout: 45m