docs: update README
ronjaquensel committed Dec 18, 2024
1 parent d8a759d commit a077837
Showing 1 changed file with 73 additions and 7 deletions.
80 changes: 73 additions & 7 deletions extensions/data-plane/data-plane-aws-s3/README.md
@@ -6,9 +6,9 @@ This module contains a Data Plane extension to copy data to and from Aws S3.

When used as a source, it supports copying a single object or multiple objects.

### DataAddress Schema
## DataAddress Schema

#### Properties
### Properties

| Key | Description | Applies at | Mandatory |
|:-------------------|:-----------------------------------------------------------------------|-------------------------|-------------------------------------------------------|
@@ -23,7 +23,7 @@ When as a source, it supports copying a single or multiple objects.
| `accessKeyId` | Defines the access key id to access S3 Bucket/Object | `source`, `destination` | `false` |
| `secretAccessKey` | Defines the secret access key to access S3 Bucket/Object | `source`, `destination` | `false` |

#### S3DataSource Properties and behavior
### S3DataSource Properties and behavior

The behavior of object transfers can be customized using `DataAddress` properties.

@@ -34,7 +34,7 @@ The behavior of object transfers can be customized using `DataAddress` propertie

> Note: Using `objectPrefix` introduces an additional step to list all objects whose keys match the specified prefix.
#### S3DataSink Properties and behavior
### S3DataSink Properties and behavior

The destination's object naming can be customized further using `DataAddress` properties.

@@ -44,7 +44,7 @@ The destination's object naming can be tailored further through the utilization
- The `folderName` property can consistently group objects in the destination, whether there is a single object or
multiple objects.

#### Secret Resolution
### Secret Resolution

The `keyName` property should point to a `vault` entry that contains a JSON-serialized `SecretToken` object. The
possible values are:
@@ -88,7 +88,7 @@ Example:
}
```

#### Plain text credentials
### Plain text credentials

This feature provides flexibility by not mandating the use of a `vault`. However, it is not recommended for
production environments.
@@ -109,6 +109,8 @@ Example:
}
```

### Data Address Examples

#### Source - Data Address Example

- Single object:
@@ -165,7 +167,71 @@ Example:
}
```

## Required AWS permissions

The secrets described above should contain credentials for a user/role with the following permissions.

### Source

To use an S3 bucket as the source of a transfer, the user/role needs the following permissions:
- `s3:ListBucket` on the bucket: `"arn:aws:s3:::<bucket-name>"` (only applies if `objectPrefix` is used)
- `s3:GetObject` on the object(s) in the bucket, e.g.:
  - `"arn:aws:s3:::<bucket-name>/*"`
  - `"arn:aws:s3:::<bucket-name>/<object-name>"`
  - `"arn:aws:s3:::<bucket-name>/<object-prefix>/*"`

Example:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "my-statement",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket-name>",
        "arn:aws:s3:::<bucket-name>/*"
      ]
    }
  ]
}
```
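
The example above grants access to every object in the bucket. As a narrower variant, the following Python sketch assembles a source policy from the rules listed in this section: `s3:GetObject` on a single object or on a prefix, plus `s3:ListBucket` only when `objectPrefix` is used. The function `source_policy` and its parameters are hypothetical names chosen for illustration and are not part of this extension. The sketch also assumes that the prefix is folder-like, so that the `<object-prefix>/*` resource form applies.

```python
import json


def source_policy(bucket, object_name=None, object_prefix=None):
    """Build a narrowly scoped IAM policy document for an S3 transfer source.

    Hypothetical helper for illustration; not part of the extension.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    statements = []

    if object_prefix is not None:
        # Assumption: the prefix is folder-like, so the object ARN takes
        # the "<object-prefix>/*" form listed above.
        object_arn = f"{bucket_arn}/{object_prefix.rstrip('/')}/*"
        # Listing the bucket is only needed when objectPrefix is used.
        statements.append(
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": [bucket_arn]}
        )
    elif object_name is not None:
        object_arn = f"{bucket_arn}/{object_name}"
    else:
        object_arn = f"{bucket_arn}/*"

    statements.append(
        {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": [object_arn]}
    )
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    policy = source_policy("<bucket-name>", object_prefix="<object-prefix>")
    print(json.dumps(policy, indent=2))
```

Splitting the bucket-level and object-level actions into separate statements keeps each action paired with the resource type it applies to.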

### Destination

To use an S3 bucket as the destination of a transfer, the user/role needs the following permissions:
- `s3:PutObject` on the object(s) in the bucket, e.g.:
  - `"arn:aws:s3:::<bucket-name>/*"`
  - `"arn:aws:s3:::<bucket-name>/<object-name>"`
  - `"arn:aws:s3:::<bucket-name>/<folder-name>/*"`
  - `"arn:aws:s3:::<bucket-name>/<folder-name>/<object-name>"`

Example:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "my-statement",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket-name>/*"
      ]
    }
  ]
}
```
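
To restrict write access to one of the narrower resource forms listed above, the resource ARN can be derived from the folder and object names. The Python sketch below is one way to do that. `destination_policy` and its parameters are hypothetical names used only for illustration, and the sketch assumes the destination key is simply `<folder-name>/<object-name>`.

```python
def destination_policy(bucket, folder_name=None, object_name=None):
    """Build an IAM policy document for writing to an S3 transfer destination.

    Hypothetical helper for illustration; not part of the extension.
    """
    # Assemble the key pattern from the optional folder and object name.
    # A missing object name falls back to the "*" wildcard.
    parts = [p.strip("/") for p in (folder_name, object_name or "*") if p]
    resource = f"arn:aws:s3:::{bucket}/" + "/".join(parts)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:PutObject"], "Resource": [resource]}
        ],
    }
```

For example, `destination_policy("<bucket-name>", folder_name="<folder-name>")` yields the resource `arn:aws:s3:::<bucket-name>/<folder-name>/*`.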

## Configuration

### AmazonS3 Chunk size Configuration

By default, the maximum chunk size read from the stream is 500 MB. It can be changed in the EDC config file
via `edc.dataplane.aws.sink.chunk.size.mb` or through the environment variable `EDC_DATAPLANE_AWS_SINK_CHUNK_SIZE_MB`.
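
The two setting names differ only in case and separator. The Python sketch below illustrates that relationship and the fallback to the default of 500. It is not how EDC resolves configuration internally: `env_name` and `chunk_size_mb` are hypothetical helpers, and treating an unset or empty variable as "use the default" is an assumption.

```python
import os

PROPERTY_KEY = "edc.dataplane.aws.sink.chunk.size.mb"
DEFAULT_CHUNK_SIZE_MB = 500  # default stated in this section


def env_name(property_key):
    """Derive the env variable name: dots become underscores, upper-cased."""
    return property_key.replace(".", "_").upper()


def chunk_size_mb(environ=os.environ):
    """Return the configured chunk size in MB, or the default if unset.

    Hypothetical helper for illustration; not part of the extension.
    """
    raw = environ.get(env_name(PROPERTY_KEY))
    # Assumption: an unset or empty variable means "use the default".
    return int(raw) if raw else DEFAULT_CHUNK_SIZE_MB
```

Passing a plain dictionary instead of `os.environ` makes the lookup easy to try out, e.g. `chunk_size_mb({"EDC_DATAPLANE_AWS_SINK_CHUNK_SIZE_MB": "64"})`.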
