Add an example for Data Repository Associations support
everpeace committed Dec 27, 2023
1 parent d8e002c commit d7260f4
Showing 4 changed files with 212 additions and 0 deletions.
90 changes: 90 additions & 0 deletions examples/kubernetes/dynamic_provisioning_s3_association/README.md
@@ -0,0 +1,90 @@
## Dynamic Provisioning with Data Repository Associations
This example shows how to create an FSx for Lustre file system using a persistent volume claim (PVC) with data repository associations.

Please note that data repository associations are supported on FSx for Lustre 2.12 and 2.15 file systems, excluding the `SCRATCH_1` deployment type.

This integration means that you can seamlessly access the objects stored in your Amazon S3 buckets from applications mounting your Amazon FSx for Lustre file system. Please check [Using Data Repositories](https://docs.aws.amazon.com/fsx/latest/LustreGuide/fsx-data-repositories.html) for details.

### Edit [StorageClass](./specs/storageclass.yaml)
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
  subnetId: subnet-056da83524edbe641
  securityGroupIds: sg-086f61ea73388fb6b
  deploymentType: PERSISTENT_2
  dataRepositoryAssociations: |
    - batchImportMetaDataOnCreate: true
      dataRepositoryPath: s3://ml-training-data-000
      fileSystemPath: /ml-training-data-000
      s3:
        autoExportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
        autoImportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
    # Second association: placeholder bucket name. File system paths
    # of different associations must not overlap.
    - batchImportMetaDataOnCreate: true
      dataRepositoryPath: s3://ml-training-data-001
      fileSystemPath: /ml-training-data-001
      s3:
        autoExportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
        autoImportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
```
* subnetId - the ID of the subnet in which the FSx for Lustre file system is created.
* securityGroupIds - a comma-separated list of security group IDs to attach to the file system.
* dataRepositoryAssociations - a YAML list of data repository association configurations to associate with the file system. See [./specs/storageclass.yaml](./specs/storageclass.yaml) for details.
* deploymentType (Optional) - FSx for Lustre supports four deployment types: SCRATCH_1, SCRATCH_2, PERSISTENT_1, and PERSISTENT_2. Default: SCRATCH_1. Data repository associations cannot be used with SCRATCH_1, so set this parameter explicitly.
* kmsKeyId (Optional) - for deployment types PERSISTENT_1 and PERSISTENT_2, you can specify a KMS key to use.
* perUnitStorageThroughput (Optional) - for deployment types PERSISTENT_1 and PERSISTENT_2, you can specify the storage throughput. Default: "200". The value must be given as a string, such as "200" or "100". For PERSISTENT_2 SSD storage, valid values are 125, 250, 500, and 1000.
* storageType (Optional) - for deployment type PERSISTENT_1, you can specify the storage type, either SSD or HDD. Default: "SSD". For PERSISTENT_2, only "SSD" is allowed.
* driveCacheType (Required if storageType is "HDD") - for HDD PERSISTENT_1, specify the type of drive cache, either NONE or READ.
* dataCompressionType (Optional) - FSx for Lustre supports data compression via the LZ4 algorithm. Compression is disabled when the value is set to NONE. Default: NONE.

Note:
- `dataRepositoryAssociations` cannot be used together with `s3ImportPath`, `s3ExportPath`, or `autoImportPolicy`, which are described in [Dynamic Provisioning with Data Repository](../dynamic_provisioning_s3/).
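
The comments in [./specs/storageclass.yaml](./specs/storageclass.yaml) state that two data repository associations cannot have overlapping file system paths, and that specifying `/` allows only one association. As a rough pre-flight check before applying the StorageClass, the sketch below flags overlapping `fileSystemPath` values. The function name `overlapping_paths` is hypothetical and not part of the CSI driver; the FSx API remains the authority on what is accepted.

```python
from itertools import combinations
from pathlib import PurePosixPath


def overlapping_paths(file_system_paths):
    """Return pairs of paths where one equals, or is nested inside, the other.

    Hypothetical helper for illustration only; it mirrors the overlap rule
    described in the storageclass.yaml comments.
    """
    # PurePosixPath drops trailing slashes, so "/ns1/" and "/ns1" compare equal.
    normalized = [PurePosixPath(p) for p in file_system_paths]
    clashes = []
    for a, b in combinations(normalized, 2):
        if a == b or a in b.parents or b in a.parents:
            clashes.append((str(a), str(b)))
    return clashes


if __name__ == "__main__":
    # Paths taken from the examples in the storageclass.yaml comments.
    print(overlapping_paths(["/ns1/", "/ns1/ns2"]))
    print(overlapping_paths(["/ml-training-data-000", "/other-directory"]))
```

An empty result means no pair of paths overlaps. Because `/` is a parent of every other path, it is reported as clashing with any second association.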

### Edit [Persistent Volume Claim Spec](./specs/claim.yaml)
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: fsx-sc
  resources:
    requests:
      storage: 1200Gi
```
Update `spec.resources.requests.storage` with the storage capacity to request. For SSD, the capacity is rounded up to 1200 GiB, 2400 GiB, or a multiple of 3600 GiB. If storageType is HDD, the capacity is rounded up to a multiple of 6000 GiB when perUnitStorageThroughput is 12, or to a multiple of 1800 GiB when perUnitStorageThroughput is 40.
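
To make the rounding concrete, here is a minimal sketch that follows the rule exactly as described in the paragraph above. It is not the driver's actual implementation; the function name `round_up_capacity_gib` and its parameters are hypothetical.

```python
def round_up_capacity_gib(requested_gib, storage_type="SSD", per_unit_throughput=None):
    """Round a requested size (GiB) up according to the rule in this README.

    Illustration only: hypothetical helper, not the CSI driver's code.
    """
    if storage_type == "SSD":
        # SSD: 1200 GiB, 2400 GiB, then multiples of 3600 GiB.
        if requested_gib <= 1200:
            return 1200
        if requested_gib <= 2400:
            return 2400
        step = 3600
    elif storage_type == "HDD":
        # HDD: the step depends on perUnitStorageThroughput (12 or 40).
        steps = {12: 6000, 40: 1800}
        if per_unit_throughput not in steps:
            raise ValueError("HDD requires per_unit_throughput of 12 or 40")
        step = steps[per_unit_throughput]
    else:
        raise ValueError(f"unknown storage type: {storage_type}")
    # Ceiling division to the next multiple of `step`, never below one step.
    return max(step, -(-requested_gib // step) * step)


if __name__ == "__main__":
    for size in (1000, 2000, 3000, 5000):
        print(size, "->", round_up_capacity_gib(size))
```

With this rule, a request slightly above a boundary jumps to the next capacity step, so it is worth choosing a request that already matches one of the valid sizes.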

### Deploy the Application
Create the StorageClass, the PVC, and the pod that consumes the PV:
```sh
>> kubectl apply -f examples/kubernetes/dynamic_provisioning_s3_association/specs/storageclass.yaml
>> kubectl apply -f examples/kubernetes/dynamic_provisioning_s3_association/specs/claim.yaml
>> kubectl apply -f examples/kubernetes/dynamic_provisioning_s3_association/specs/pod.yaml
```

### Use Case 1: Access S3 files from Lustre filesystem
If you only need to import data and read it, without modifying or creating files, the S3 objects are visible in the persistent volume under the associated directory:

```sh
>> kubectl exec -it fsx-app -- ls /data/ml-training-data-000
```

### Use Case 2: Export changes to S3
New, modified, and deleted files under the associated directory are automatically exported to the linked bucket.

For example, create a file `out.txt` under `/data/ml-training-data-000` from the `fsx-app` pod; it then appears in the S3 bucket:

```sh
>> kubectl exec -ti fsx-app -- sh -c 'echo "test" > /data/ml-training-data-000/out.txt'
```
11 changes: 11 additions & 0 deletions examples/kubernetes/dynamic_provisioning_s3_association/specs/claim.yaml
@@ -0,0 +1,11 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: fsx-sc
  resources:
    requests:
      storage: 1200Gi
19 changes: 19 additions & 0 deletions examples/kubernetes/dynamic_provisioning_s3_association/specs/pod.yaml
@@ -0,0 +1,19 @@
apiVersion: v1
kind: Pod
metadata:
  name: fsx-app
spec:
  containers:
  - name: app
    image: amazonlinux:2
    command: ["/bin/sh"]
    securityContext:
      privileged: true
    args: ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: fsx-claim
92 changes: 92 additions & 0 deletions examples/kubernetes/dynamic_provisioning_s3_association/specs/storageclass.yaml
@@ -0,0 +1,92 @@
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
  subnetId: subnet-0d7b5e117ad7b4961
  securityGroupIds: sg-05a37bfe01467059a
  deploymentType: PERSISTENT_2
  dataRepositoryAssociations: |
    - batchImportMetaDataOnCreate: true
      dataRepositoryPath: s3://ml-training-data-000
      fileSystemPath: /ml-training-data-000
      s3:
        autoExportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
        autoImportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
    # Second association: placeholder bucket name. File system paths
    # of different associations must not overlap.
    - batchImportMetaDataOnCreate: true
      dataRepositoryPath: s3://ml-training-data-001
      fileSystemPath: /ml-training-data-001
      s3:
        autoExportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
        autoImportPolicy:
          events: ["NEW", "CHANGED", "DELETED"]
    # - # Set to true to run an import data repository task to import metadata from
    #   # the data repository to the file system after the data repository association
    #   # is created. Default is false.
    #   batchImportMetaDataOnCreate: true
    #
    #   # The path to the Amazon S3 data repository that will be linked to the file
    #   # system. The path can be an S3 bucket or prefix in the format s3://myBucket/myPrefix/.
    #   # This path specifies where in the S3 data repository files will be imported
    #   # from or exported to.
    #   # DataRepositoryPath is a required field
    #   dataRepositoryPath: s3://ml-training-data-000
    #
    #   # A path on the file system that points to a high-level directory (such as
    #   # /ns1/) or subdirectory (such as /ns1/subdir/) that will be mapped 1-1 with
    #   # DataRepositoryPath. The leading forward slash in the name is required. Two
    #   # data repository associations cannot have overlapping file system paths. For
    #   # example, if a data repository is associated with file system path /ns1/,
    #   # then you cannot link another data repository with file system path /ns1/ns2.
    #   #
    #   # This path specifies where in your file system files will be exported from
    #   # or imported to. This file system directory can be linked to only one Amazon
    #   # S3 bucket, and no other S3 bucket can be linked to the directory.
    #   #
    #   # If you specify only a forward slash (/) as the file system path, you can
    #   # link only one data repository to the file system. You can only specify "/"
    #   # as the file system path for the first data repository associated with a file
    #   # system.
    #   fileSystemPath: /
    #
    #   # The configuration for an Amazon S3 data repository linked to an Amazon FSx
    #   # Lustre file system with a data repository association. The configuration
    #   # defines which file events (new, changed, or deleted files or directories)
    #   # are automatically imported from the linked data repository to the file system
    #   # or automatically exported from the file system to the data repository.
    #   s3:
    #     # Specifies the type of updated objects (new, changed, deleted) that will be
    #     # automatically exported from your file system to the linked S3 bucket.
    #     autoExportPolicy:
    #       # The AutoExportPolicy can have the following event values:
    #       #   * NEW - New files and directories are automatically exported to the data
    #       #     repository as they are added to the file system.
    #       #   * CHANGED - Changes to files and directories on the file system are automatically
    #       #     exported to the data repository.
    #       #   * DELETED - Files and directories are automatically deleted on the data
    #       #     repository when they are deleted on the file system.
    #       # You can define any combination of event types for your AutoExportPolicy.
    #       events: ["NEW", "CHANGED", "DELETED"]
    #     # Specifies the type of updated objects (new, changed, deleted) that will be
    #     # automatically imported from the linked S3 bucket to your file system.
    #     autoImportPolicy:
    #       # The AutoImportPolicy can have the following event values:
    #       #
    #       #   * NEW - Amazon FSx automatically imports metadata of files added to the
    #       #     linked S3 bucket that do not currently exist in the FSx file system.
    #       #
    #       #   * CHANGED - Amazon FSx automatically updates file metadata and invalidates
    #       #     existing file content on the file system as files change in the data repository.
    #       #
    #       #   * DELETED - Amazon FSx automatically deletes files on the file system
    #       #     as corresponding files are deleted in the data repository.
    #       #
    #       # You can define any combination of event types for your AutoImportPolicy.
    #       events: ["NEW", "CHANGED", "DELETED"]
mountOptions:
- flock
