Update docs
ChristopheDuong committed Apr 4, 2022
1 parent 665e6db commit e2a8000
Showing 3 changed files with 24 additions and 8 deletions.
```diff
@@ -100,13 +100,13 @@ private OnStartFunction onStartFunction(final BlobStorageOperations storageOpera
         if (writeConfig.getSyncMode().equals(DestinationSyncMode.OVERWRITE)) {
           final String namespace = writeConfig.getNamespace();
           final String stream = writeConfig.getStreamName();
-          final String bucketPath = writeConfig.getOutputBucketPath();
-          LOGGER.info("Clearing storage area in destination started for namespace {} stream {} bucketObject {}", namespace, stream, bucketPath);
+          final String outputBucketPath = writeConfig.getOutputBucketPath();
+          LOGGER.info("Clearing storage area in destination started for namespace {} stream {} bucketObject {}", namespace, stream, outputBucketPath);
           AirbyteSentry.executeWithTracing("PrepareStreamStorage",
-              () -> storageOperations.dropBucketObject(bucketPath),
-              Map.of("namespace", Objects.requireNonNullElse(namespace, "null"), "stream", stream, "storage", bucketPath));
+              () -> storageOperations.dropBucketObject(outputBucketPath),
+              Map.of("namespace", Objects.requireNonNullElse(namespace, "null"), "stream", stream, "storage", outputBucketPath));
           LOGGER.info("Clearing storage area in destination completed for namespace {} stream {} bucketObject {}", namespace, stream,
-              bucketPath);
+              outputBucketPath);
         }
       }
       LOGGER.info("Preparing storage area in destination completed.");
```
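
The hunk only renames `bucketPath` to `outputBucketPath`; the behavior is unchanged. For each stream whose sync mode is `DestinationSyncMode.OVERWRITE`, the objects under that stream's output bucket path are dropped when the destination starts, and other streams are left alone. A minimal sketch of that control flow in isolation follows. `SyncMode`, `StreamConfig`, `Storage`, and `OnStartSketch.clearOverwriteStreams` are hypothetical stand-ins for `DestinationSyncMode`, `WriteConfig`, and `BlobStorageOperations`, not their real signatures, and the `AirbyteSentry` tracing and logging are omitted.

```java
import java.util.List;

// Hypothetical stand-in for DestinationSyncMode.
enum SyncMode { APPEND, OVERWRITE }

// Hypothetical stand-in for WriteConfig, reduced to the fields the cleanup step reads.
record StreamConfig(String streamName, String outputBucketPath, SyncMode syncMode) {}

// Hypothetical stand-in for BlobStorageOperations, reduced to the one call used here.
interface Storage {
  void dropBucketObject(String objectPath);
}

final class OnStartSketch {
  private OnStartSketch() {}

  // Clears the output path of every OVERWRITE stream; APPEND streams keep their existing objects.
  static void clearOverwriteStreams(Storage storage, List<StreamConfig> configs) {
    for (StreamConfig config : configs) {
      if (config.syncMode() == SyncMode.OVERWRITE) {
        storage.dropBucketObject(config.outputBucketPath());
      }
    }
  }
}
```

Passing a list's `add` method as the `Storage` (for example `dropped::add`) is a convenient way to check which paths would be cleared without touching a real bucket.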
```diff
@@ -41,7 +41,7 @@
       "description": "Format string on how data will be organized inside the S3 bucket directory",
       "type": "string",
       "examples": [
-        "${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_${PART_ID}"
+        "${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_"
       ],
       "order": 3
     },
```
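
The `examples` value above is a template: each `${...}` placeholder is replaced with a value computed at sync time, and the S3 docs in this commit note that repeated `/` characters are collapsed. With `${PART_ID}` dropped from the default, the full path in those docs shows the partition id and file extension following the expanded prefix. A minimal sketch of the substitution follows, assuming the placeholder values are already computed and passed in a map. `BucketFormat.expand` is a hypothetical helper, not the connector's implementation, and treating an unknown placeholder as an empty string is an assumption of this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BucketFormat {
  // Matches placeholders such as ${NAMESPACE} or ${EPOCH}; group 1 is the variable name.
  private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Z_]+)\\}");

  private BucketFormat() {}

  // Substitutes each ${NAME} with values.get(NAME) (empty string if absent),
  // then collapses runs of '/' into a single '/'.
  static String expand(String template, Map<String, String> values) {
    Matcher matcher = PLACEHOLDER.matcher(template);
    StringBuilder out = new StringBuilder();
    while (matcher.find()) {
      String value = values.getOrDefault(matcher.group(1), "");
      // quoteReplacement keeps '$' or '\' inside a value from being read as group references.
      matcher.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    matcher.appendTail(out);
    return out.toString().replaceAll("/+", "/");
  }
}
```

For example, expanding the new default with `NAMESPACE=public`, `STREAM_NAME=users`, `YEAR=2021`, `MONTH=01`, `DAY=01`, and `EPOCH=1234567890` yields `public/users/2021_01_01_1234567890_`; appending part id `0` and the `.csv.gz` extension gives the `public/users/2021_01_01_1234567890_0.csv.gz` portion of the example path in the docs.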
20 changes: 18 additions & 2 deletions docs/integrations/destinations/s3.md
````diff
@@ -22,15 +22,15 @@ Check out common troubleshooting issues for the S3 destination connector on our
 | S3 Endpoint | string | URL to S3. If using AWS S3, leave blank. |
 | S3 Bucket Name | string | Name of the bucket to sync data into. |
 | S3 Bucket Path | string | Subdirectory under the above bucket to sync the data into. |
-| S3 Bucket Format | string | Additional string format under S3 Bucket Path. Default value is `${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_${PART_ID}`. |
+| S3 Bucket Format | string | Format string describing how data is organized under S3 Bucket Path. Default value is `${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_`. |
 | S3 Region | string | See [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions) for all region codes. |
 | Access Key ID | string | AWS/Minio credential. |
 | Secret Access Key | string | AWS/Minio credential. |
 | Format | object | Format-specific configuration. See the [spec](/airbyte-integrations/connectors/destination-s3/src/main/resources/spec.json) for details. |
 
 ⚠️ Please note that under "Full Refresh Sync" mode, data in the configured bucket and path will be wiped out before each sync. We recommend provisioning a dedicated S3 resource for this sync to prevent unexpected data deletion caused by misconfiguration. ⚠️
 
-The full path of the output data with S3 path format `${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_${PART_ID}` is:
+The full path of the output data with S3 path format `${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_` is:
 
 ```text
 <bucket-name>/<source-namespace-if-exists>/<stream-name>/<upload-date>_<epoch>_<partition-id>.<format-extension>
````
````diff
@@ -50,6 +50,22 @@ testing_bucket/data_output_path/public/users/2021_01_01_1234567890_0.csv.gz
 | bucket path
 bucket name
 ```
+Available variables for a custom S3 path format are:
+- `${NAMESPACE}`: Namespace the stream comes from, or the namespace configured in the connection's namespace fields.
+- `${STREAM_NAME}`: Name of the stream.
+- `${YEAR}`: Year in which the sync wrote the output data.
+- `${MONTH}`: Month in which the sync wrote the output data.
+- `${DAY}`: Day in which the sync wrote the output data.
+- `${HOUR}`: Hour in which the sync wrote the output data.
+- `${MINUTE}`: Minute in which the sync wrote the output data.
+- `${SECOND}`: Second in which the sync wrote the output data.
+- `${MILLISECOND}`: Millisecond in which the sync wrote the output data.
+- `${EPOCH}`: Milliseconds since the Unix epoch at which the sync wrote the output data.
+- `${UUID}`: Random UUID string.
+
+Notes:
+- Multiple `/` characters in the S3 path are collapsed into a single `/` character.
+- The part ID is sequential by default; if the output bucket contains too many files, a `UUID` is used instead.
 
 Note that the stream name may contain a prefix if one is configured on the connection.
 A data sync may create multiple files, as the output files can be partitioned by size (targeting a size of 200MB compressed or lower).
````
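
Partitioning by size means records are buffered into a part, and a new part is started once adding the next record would push the current one past a size target, so a single stream can produce several files per sync. The rollover rule can be sketched as below, assuming sequential part ids. `SizePartitioner` is hypothetical, and for simplicity it counts uncompressed UTF-8 bytes against a caller-supplied limit, whereas the docs describe a target of roughly 200MB compressed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical: groups serialized records into parts of at most maxBytes
// (a single record larger than maxBytes still gets a part of its own).
final class SizePartitioner {
  private final long maxBytes;
  private final List<List<String>> parts = new ArrayList<>();
  private List<String> current = new ArrayList<>();
  private long currentBytes = 0;

  SizePartitioner(long maxBytes) {
    this.maxBytes = maxBytes;
  }

  void accept(String record) {
    long size = record.getBytes(StandardCharsets.UTF_8).length;
    // Roll over to a new part if this record would push the current part past the limit.
    if (!current.isEmpty() && currentBytes + size > maxBytes) {
      flush();
    }
    current.add(record);
    currentBytes += size;
  }

  // Closes the open part (if any) and returns all parts; with sequential ids,
  // part i would map to an output file ending in "_<i>.<extension>".
  List<List<String>> finish() {
    if (!current.isEmpty()) {
      flush();
    }
    return parts;
  }

  private void flush() {
    parts.add(current);
    current = new ArrayList<>();
    currentBytes = 0;
  }
}
```

With a 10-byte limit, the records `aaaa`, `bbbb`, `cccc`, `dd` land in two parts, `[aaaa, bbbb]` and `[cccc, dd]`, because adding `cccc` to the first part would bring it to 12 bytes.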

1 comment on commit e2a8000

@github-actions
Contributor


SonarQube Report

SonarQube report for Airbyte Connectors Destination S3 (#11666)

Measures

| Name | Value | Name | Value | Name | Value |
|---|---|---|---|---|---|
| Vulnerabilities | 0 | Duplicated Lines (%) | 0.0 | Code Smells | 45 |
| Security Rating | A | Lines to Cover | 24 | Bugs | 1 |
| Quality Gate Status | ERROR | Coverage | 0.0 | Duplicated Blocks | 3 |
| Lines of Code | 2763 | Reliability Rating | A | Blocker Issues | 0 |
| Critical Issues | 3 | Major Issues | 40 | Minor Issues | 3 |

Detected Issues

| Rule | File | Description | Message |
|---|---|---|---|
| java:S1172 (MAJOR) | s3/S3ConsumerFactory.java:76 | Unused method parameters should be removed | Remove this unused method parameter "namingResolver". |
| java:S1611 (MINOR) | s3/S3ConsumerFactory.java:152 | Parentheses should be removed from a single lambda input parameter when its type is inferred | Remove the parentheses around the "hasFailed" parameter (sonar.java.source not set. Assuming 8 or greater.) |
| java:S1118 (MAJOR) | s3/SerializedBufferFactory.java:26 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S112 (MAJOR) | s3/SerializedBufferFactory.java:71 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | avro/AvroSerializedBuffer.java:32 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| common-java:DuplicatedBlocks (MAJOR) | avro/S3AvroWriter.java | Source files should not have any duplicated blocks | 1 duplicated blocks of code must be removed. |
| common-java:DuplicatedBlocks (MAJOR) | csv/S3CsvWriter.java | Source files should not have any duplicated blocks | 1 duplicated blocks of code must be removed. |
| java:S112 (MAJOR) | jsonl/JsonLSerializedBuffer.java:31 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S1172 (MAJOR) | jsonl/JsonLSerializedBuffer.java:61 | Unused method parameters should be removed | Remove this unused method parameter "config". |
| common-java:DuplicatedBlocks (MAJOR) | jsonl/S3JsonlWriter.java | Source files should not have any duplicated blocks | 1 duplicated blocks of code must be removed. |
| java:S1118 (MAJOR) | util/StreamTransferManagerHelper.java:12 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S3358 (MAJOR) | avro/JsonToAvroSchemaConverter.java:179 | Ternary operators should not be nested | Extract this nested ternary operation into an independent statement. |
| java:S112 (MAJOR) | s3/BlobStorageOperations.java:20 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | s3/BlobStorageOperations.java:27 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | s3/BlobStorageOperations.java:32 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | s3/S3StorageOperations.java:103 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | s3/S3StorageOperations.java:124 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | s3/S3StorageOperations.java:134 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | csv/CsvSerializedBuffer.java:33 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | csv/CsvSerializedBuffer.java:47 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S107 (MAJOR) | s3/S3DestinationConfig.java:80 | Methods should not have too many parameters | Constructor has 8 parameters, which is greater than 7 authorized. |
| java:S6213 (MAJOR) | avro/S3AvroWriter.java:106 | Restricted Identifiers should not be used as Identifiers | Rename this variable to not match a restricted identifier. |
| java:S112 (MAJOR) | s3/S3DestinationConfig.java:175 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S2259 (MAJOR) | avro/JsonToAvroSchemaConverter.java:164 | Null pointers should not be dereferenced | A "NullPointerException" could be thrown; "properties" is nullable here. |
| java:S3776 (CRITICAL) | avro/JsonToAvroSchemaConverter.java:338 | Cognitive Complexity of methods should not be too high | Refactor this method to reduce its Cognitive Complexity from 22 to the 15 allowed. |
| java:S3252 (CRITICAL) | avro/JsonToAvroSchemaConverter.java:372 | "static" base class members should not be accessed via derived types | Use static access with "!Unknown!" for "Entry". |
| java:S1121 (MAJOR) | avro/JsonToAvroSchemaConverter.java:261 | Assignments should not be made from within sub-expressions | Extract the assignment out of this expression. |
| java:S107 (MAJOR) | s3/S3DestinationConfig.java:60 | Methods should not have too many parameters | Constructor has 9 parameters, which is greater than 7 authorized. |
| java:S107 (MAJOR) | csv/S3CsvWriter.java:40 | Methods should not have too many parameters | Constructor has 9 parameters, which is greater than 7 authorized. |
| java:S3358 (MAJOR) | s3/S3Destination.java:136 | Ternary operators should not be nested | Extract this nested ternary operation into an independent statement. |
| java:S5361 (CRITICAL) | s3/S3Destination.java:137 | "String#replace" should be preferred to "String#replaceAll" | Replace this call to "replaceAll()" by a call to the "replace()" method. |
| java:S1121 (MAJOR) | avro/JsonToAvroSchemaConverter.java:217 | Assignments should not be made from within sub-expressions | Extract the assignment out of this expression. |
| java:S1118 (MAJOR) | avro/AvroConstants.java:10 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S125 (MAJOR) | avro/JsonToAvroSchemaConverter.java:209 | Sections of code should not be commented out | This block of commented-out lines of code should be removed. |
| java:S1068 (MAJOR) | jsonl/S3JsonlWriter.java:37 | Unused "private" fields should be removed | Remove this unused "WRITER" private field. |
| java:S1118 (MAJOR) | util/AvroRecordHelper.java:17 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S1700 (MAJOR) | avro/JsonSchemaType.java:23 | A field should not duplicate the name of its containing class | Rename field "jsonSchemaType" |
| java:S1118 (MAJOR) | parquet/S3ParquetConstants.java:9 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S1118 (MAJOR) | util/S3OutputPathHelper.java:13 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S112 (MAJOR) | writer/ProductionWriterFactory.java:59 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | writer/S3WriterFactory.java:21 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S1118 (MAJOR) | s3/S3FormatConfigs.java:16 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S112 (MAJOR) | s3/S3FormatConfigs.java:39 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S1118 (MAJOR) | csv/CsvSheetGenerator.java:25 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S2094 (MINOR) | csv/CsvSheetGenerators.java:7 | Classes should not be empty | Remove this empty class, write its code or make it an "interface". |
| java:S1116 (MINOR) | csv/RootLevelFlatteningSheetGenerator.java:25 | Empty statements should be removed | Remove this empty statement. |
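
Many of the findings above are repeated instances of a few rules whose messages spell out a mechanical fix. The sketch below applies three of them: java:S1118 (a private constructor on a utility class), java:S112 (a dedicated exception type instead of a generic one), and java:S5361 (`String#replace` for a literal, non-regex substitution). `PathUtils`, `UnsupportedFormatException`, and both methods are illustrative names, not code from the connector.

```java
import java.util.Locale;

// Dedicated exception type rather than throwing a generic RuntimeException (java:S112).
final class UnsupportedFormatException extends RuntimeException {
  UnsupportedFormatException(String format) {
    super("Unsupported output format: " + format);
  }
}

// Utility class: final, with a private constructor hiding the implicit public one (java:S1118).
final class PathUtils {
  private PathUtils() {}

  // Maps a format name to a file extension; unknown formats raise the dedicated exception.
  static String extensionFor(String format) {
    return switch (format.toUpperCase(Locale.ROOT)) {
      case "CSV" -> "csv";
      case "JSONL" -> "jsonl";
      case "AVRO" -> "avro";
      case "PARQUET" -> "parquet";
      default -> throw new UnsupportedFormatException(format);
    };
  }

  // Literal substitution needs no regex, so String#replace is preferred over replaceAll (java:S5361).
  static String dashesToUnderscores(String name) {
    return name.replace("-", "_");
  }
}
```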

Coverage (0.0%)

| File | Coverage | File | Coverage |
|---|---|---|---|
| src/main/java/io/airbyte/integrations/destination/s3/avro/AvroConstants.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/avro/AvroNameTransformer.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/avro/AvroRecordFactory.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/avro/AvroSerializedBuffer.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/avro/JsonFieldNameUpdater.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/avro/JsonSchemaType.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/avro/JsonToAvroSchemaConverter.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/avro/S3AvroFormatConfig.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/avro/S3AvroWriter.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/csv/BaseSheetGenerator.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/csv/CsvSerializedBuffer.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/csv/CsvSheetGenerator.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/csv/NoFlatteningSheetGenerator.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/csv/RootLevelFlatteningSheetGenerator.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/csv/S3CsvFormatConfig.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/csv/S3CsvWriter.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/csv/StagingDatabaseCsvSheetGenerator.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/jsonl/JsonLSerializedBuffer.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/jsonl/S3JsonlFormatConfig.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/jsonl/S3JsonlWriter.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/parquet/ParquetSerializedBuffer.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/parquet/S3ParquetConstants.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/parquet/S3ParquetFormatConfig.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/parquet/S3ParquetWriter.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/S3ConsumerFactory.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/S3Destination.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/S3DestinationConfig.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/S3DestinationConfigFactory.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/S3DestinationConstants.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/S3Format.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/S3FormatConfig.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/S3FormatConfigs.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/S3StorageOperations.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/SerializedBufferFactory.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/util/AvroRecordHelper.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/util/S3NameTransformer.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/util/S3OutputPathHelper.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/util/StreamTransferManagerHelper.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/WriteConfig.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/writer/BaseS3Writer.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/writer/ProductionWriterFactory.java | 0.0 | | |
