🐛 Destination S3: fix minio output for parquet format #4666

Merged
merged 1 commit into from
Jul 13, 2021

Conversation

varunbpatil
Contributor

What

Fix #4665.

How

Set the appropriate Hadoop configuration properties (ENDPOINT and PATH_STYLE_ACCESS) in the Parquet writer.
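For readers unfamiliar with these settings, here is a rough sketch of the idea, not the actual diff. It assumes ENDPOINT and PATH_STYLE_ACCESS refer to Hadoop's S3A properties `fs.s3a.endpoint` and `fs.s3a.path.style.access`. The class and method names are hypothetical, and a plain `Map` stands in for Hadoop's `Configuration` so the snippet runs without extra dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the connector's S3ParquetWriter code.
public class S3aMinioConfigSketch {

  // Property names used by Hadoop's S3A file system.
  static final String ACCESS_KEY = "fs.s3a.access.key";
  static final String SECRET_KEY = "fs.s3a.secret.key";
  static final String ENDPOINT = "fs.s3a.endpoint";
  static final String PATH_STYLE_ACCESS = "fs.s3a.path.style.access";

  // Builds the S3A properties. The endpoint and path-style settings are added
  // only when a custom endpoint (for example, a MinIO server) is configured.
  static Map<String, String> buildS3aProperties(String accessKey, String secretKey, String endpoint) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put(ACCESS_KEY, accessKey);
    props.put(SECRET_KEY, secretKey);
    if (endpoint != null && !endpoint.trim().isEmpty()) {
      props.put(ENDPOINT, endpoint);
      // Address buckets as <endpoint>/<bucket> instead of <bucket>.<endpoint>.
      props.put(PATH_STYLE_ACCESS, "true");
    }
    return props;
  }

  public static void main(String[] args) {
    // Placeholder credentials and a local MinIO address, both assumptions.
    Map<String, String> props =
        buildS3aProperties("minio-access-key", "minio-secret-key", "http://localhost:9000");
    props.forEach((key, value) -> System.out.println(key + " = " + value));
  }
}
```

In a real writer, each entry would presumably be applied to a Hadoop `Configuration` through `Configuration.set(key, value)`. Path-style access matters for MinIO because a default deployment usually does not resolve bucket names as subdomains of the endpoint.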

Recommended reading order

  1. S3ParquetWriter.java

Pre-merge Checklist

Expand the checklist that is relevant for this PR.

Connector checklist

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • Secrets are annotated with airbyte_secret in the connector's spec
  • Credentials added to GitHub CI if needed and not already present. See the instructions for injecting secrets into CI.
  • Unit & integration tests added as appropriate (and are passing)
    • Community members: please provide proof of this succeeding locally, e.g., a screenshot or copy-pasted acceptance test output. To run acceptance tests for a Python connector, follow the instructions in the README. For Java connectors, run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • /test connector=connectors/<name> command as documented here is passing.
    • Community members can skip this, Airbyters will run this for you.
  • Code reviews completed
  • Documentation updated
    • README.md
    • docs/SUMMARY.md if it's a new connector
    • Created or updated reference docs in docs/integrations/<source or destination>/<name>.
    • Changelog in the appropriate page in docs/integrations/.... See changelog example
    • docs/integrations/README.md contains a reference to the new connector
    • Build status added to build page
  • Build is successful
  • Connector version bumped as described here
  • New connector version released on Docker Hub by running the /publish command described here
  • No major blockers
  • PR merged into master branch
  • Follow up tickets have been created
  • Associated tickets have been closed & stakeholders notified

Connector Generator checklist

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed.

@github-actions github-actions bot added the area/connectors Connector related issues label Jul 9, 2021
@varunbpatil
Contributor Author

Integration test results:

> Task :airbyte-integrations:connectors:destination-s3:integrationTestJava

S3CsvDestinationAcceptanceTest > testEntrypointEnvVar() PASSED

S3CsvDestinationAcceptanceTest > testCheckConnectionInvalidCredentials() PASSED

S3CsvDestinationAcceptanceTest > testSyncVeryBigRecords() PASSED

S3CsvDestinationAcceptanceTest > [1] exchange_rate_messages.txt, exchange_rate_catalog.json PASSED

S3CsvDestinationAcceptanceTest > [2] edge_case_messages.txt, edge_case_catalog.json PASSED

S3CsvDestinationAcceptanceTest > [1] exchange_rate_messages.txt, exchange_rate_catalog.json PASSED

S3CsvDestinationAcceptanceTest > [2] edge_case_messages.txt, edge_case_catalog.json PASSED

S3CsvDestinationAcceptanceTest > testIncrementalDedupeSync() PASSED

S3CsvDestinationAcceptanceTest > testGetSpec() PASSED

S3CsvDestinationAcceptanceTest > testSecondSync() PASSED

S3CsvDestinationAcceptanceTest > testCustomDbtTransformationsFailure() PASSED

S3CsvDestinationAcceptanceTest > testIncrementalSync() PASSED

S3CsvDestinationAcceptanceTest > testCheckConnection() PASSED

S3CsvDestinationAcceptanceTest > specDBTValueShouldBeCorrect() PASSED

S3CsvDestinationAcceptanceTest > testSyncWriteSameTableNameDifferentNamespace() PASSED

S3CsvDestinationAcceptanceTest > specNormalizationValueShouldBeCorrect() PASSED

S3CsvDestinationAcceptanceTest > testLineBreakCharacters() PASSED

S3CsvDestinationAcceptanceTest > testSyncUsesAirbyteStreamNamespaceIfNotNull() PASSED

S3CsvDestinationAcceptanceTest > testCustomDbtTransformations() PASSED

S3AvroDestinationAcceptanceTest > testEntrypointEnvVar() PASSED

S3AvroDestinationAcceptanceTest > testCheckConnectionInvalidCredentials() PASSED

S3AvroDestinationAcceptanceTest > testSyncVeryBigRecords() PASSED

S3AvroDestinationAcceptanceTest > [1] exchange_rate_messages.txt, exchange_rate_catalog.json PASSED

S3AvroDestinationAcceptanceTest > [2] edge_case_messages.txt, edge_case_catalog.json PASSED

S3AvroDestinationAcceptanceTest > [1] exchange_rate_messages.txt, exchange_rate_catalog.json PASSED

S3AvroDestinationAcceptanceTest > [2] edge_case_messages.txt, edge_case_catalog.json PASSED

S3AvroDestinationAcceptanceTest > testIncrementalDedupeSync() PASSED

S3AvroDestinationAcceptanceTest > testGetSpec() PASSED

S3AvroDestinationAcceptanceTest > testSecondSync() PASSED

S3AvroDestinationAcceptanceTest > testCustomDbtTransformationsFailure() PASSED

S3AvroDestinationAcceptanceTest > testIncrementalSync() PASSED

S3AvroDestinationAcceptanceTest > testCheckConnection() PASSED

S3AvroDestinationAcceptanceTest > specDBTValueShouldBeCorrect() PASSED

S3AvroDestinationAcceptanceTest > testSyncWriteSameTableNameDifferentNamespace() PASSED

S3AvroDestinationAcceptanceTest > specNormalizationValueShouldBeCorrect() PASSED

S3AvroDestinationAcceptanceTest > testLineBreakCharacters() PASSED

S3AvroDestinationAcceptanceTest > testSyncUsesAirbyteStreamNamespaceIfNotNull() PASSED

S3AvroDestinationAcceptanceTest > testCustomDbtTransformations() PASSED

S3JsonlDestinationAcceptanceTest > testEntrypointEnvVar() PASSED

S3JsonlDestinationAcceptanceTest > testCheckConnectionInvalidCredentials() PASSED

S3JsonlDestinationAcceptanceTest > testSyncVeryBigRecords() PASSED

S3JsonlDestinationAcceptanceTest > [1] exchange_rate_messages.txt, exchange_rate_catalog.json PASSED

S3JsonlDestinationAcceptanceTest > [2] edge_case_messages.txt, edge_case_catalog.json PASSED

S3JsonlDestinationAcceptanceTest > [1] exchange_rate_messages.txt, exchange_rate_catalog.json PASSED

S3JsonlDestinationAcceptanceTest > [2] edge_case_messages.txt, edge_case_catalog.json PASSED

S3JsonlDestinationAcceptanceTest > testIncrementalDedupeSync() PASSED

S3JsonlDestinationAcceptanceTest > testGetSpec() PASSED

S3JsonlDestinationAcceptanceTest > testSecondSync() PASSED

S3JsonlDestinationAcceptanceTest > testCustomDbtTransformationsFailure() PASSED

S3JsonlDestinationAcceptanceTest > testIncrementalSync() PASSED

S3JsonlDestinationAcceptanceTest > testCheckConnection() PASSED

S3JsonlDestinationAcceptanceTest > specDBTValueShouldBeCorrect() PASSED

S3JsonlDestinationAcceptanceTest > testSyncWriteSameTableNameDifferentNamespace() PASSED

S3JsonlDestinationAcceptanceTest > specNormalizationValueShouldBeCorrect() PASSED

S3JsonlDestinationAcceptanceTest > testLineBreakCharacters() PASSED

S3JsonlDestinationAcceptanceTest > testSyncUsesAirbyteStreamNamespaceIfNotNull() PASSED

S3JsonlDestinationAcceptanceTest > testCustomDbtTransformations() PASSED

S3ParquetDestinationAcceptanceTest > testEntrypointEnvVar() PASSED

S3ParquetDestinationAcceptanceTest > testCheckConnectionInvalidCredentials() PASSED

S3ParquetDestinationAcceptanceTest > testSyncVeryBigRecords() PASSED

S3ParquetDestinationAcceptanceTest > [1] exchange_rate_messages.txt, exchange_rate_catalog.json PASSED

S3ParquetDestinationAcceptanceTest > [2] edge_case_messages.txt, edge_case_catalog.json PASSED

S3ParquetDestinationAcceptanceTest > [1] exchange_rate_messages.txt, exchange_rate_catalog.json PASSED

S3ParquetDestinationAcceptanceTest > [2] edge_case_messages.txt, edge_case_catalog.json PASSED

S3ParquetDestinationAcceptanceTest > testIncrementalDedupeSync() PASSED

S3ParquetDestinationAcceptanceTest > testGetSpec() PASSED

S3ParquetDestinationAcceptanceTest > testSecondSync() PASSED

S3ParquetDestinationAcceptanceTest > testCustomDbtTransformationsFailure() PASSED

S3ParquetDestinationAcceptanceTest > testIncrementalSync() PASSED

S3ParquetDestinationAcceptanceTest > testCheckConnection() PASSED

S3ParquetDestinationAcceptanceTest > specDBTValueShouldBeCorrect() PASSED

S3ParquetDestinationAcceptanceTest > testSyncWriteSameTableNameDifferentNamespace() PASSED

S3ParquetDestinationAcceptanceTest > specNormalizationValueShouldBeCorrect() PASSED

S3ParquetDestinationAcceptanceTest > testLineBreakCharacters() PASSED

S3ParquetDestinationAcceptanceTest > testSyncUsesAirbyteStreamNamespaceIfNotNull() PASSED

S3ParquetDestinationAcceptanceTest > testCustomDbtTransformations() PASSED

NoFlatteningSheetGeneratorTest > testGetHeaderRow() PASSED

NoFlatteningSheetGeneratorTest > testGetRecordColumns() PASSED

S3CsvFormatConfig > Flattening enums can be created from value string PASSED

RootLevelFlatteningSheetGeneratorTest > testGetHeaderRow() PASSED

RootLevelFlatteningSheetGeneratorTest > testGetRecordColumns() PASSED

S3OutputPathHelperTest > getOutputPrefix PASSED

JsonFieldNameUpdaterTest > testFieldNameUpdate() PASSED

JsonToAvroSchemaConverterTest > [1] string_field, {"type":"string"}, ["null","string"] PASSED

JsonToAvroSchemaConverterTest > [2] integer_field, {"type":"integer"}, ["null","int"] PASSED

JsonToAvroSchemaConverterTest > [3] number_field, {"type":"number"}, ["null","double"] PASSED

JsonToAvroSchemaConverterTest > [4] null_field, {"type":"null"}, "null" PASSED

JsonToAvroSchemaConverterTest > [5] union_field, {"type":["null","number","string"]}, ["null","double","string"] PASSED

JsonToAvroSchemaConverterTest > [6] array_field_single_type, {"type":"array","items":{"type":"string"}}, ["null",{"type":"array","items":["null","string"]}] PASSED

JsonToAvroSchemaConverterTest > [7] array_field_multiple_types, {"type":"array","items":[{"type":"string"},{"type":"number"}]}, ["null",{"type":"array","items":["null","string","double"]}] PASSED

JsonToAvroSchemaConverterTest > [8] object_field, {"type":"object","properties":{"id":{"type":"integer"},"node_id":{"type":["null","string"]}}}, ["null",{"type":"record","name":"object_field","fields":[{"name":"id","type":["null","int"],"default":null},{"name":"node_id","type":["null","string"],"default":null}]}] PASSED

JsonToAvroSchemaConverterTest > [9] any_of_field, {"anyOf":[{"type":"string"},{"type":"integer"}]}, ["null","string","int"] PASSED

JsonToAvroSchemaConverterTest > [10] all_of_field, {"allOf":[{"type":"string"},{"type":"integer"}]}, ["null","string","int"] PASSED

JsonToAvroSchemaConverterTest > [11] one_of_field, {"oneOf":[{"type":"string"},{"type":"integer"}]}, ["null","string","int"] PASSED

JsonToAvroSchemaConverterTest > testNoCombinedRestriction() PASSED

JsonToAvroSchemaConverterTest > testGetUnionTypes() PASSED

JsonToAvroSchemaConverterTest > [1] simple_schema, namespace1, false, {"type":"object","properties":{"node_id":{"type":["null","string"]}}}, {"type":"record","name":"simple_schema","namespace":"namespace1","fields":[{"name":"node_id","type":["null","string"],"default":null}]} PASSED

JsonToAvroSchemaConverterTest > [2] nested_record, namespace2, false, {"type":"object","properties":{"node_id":{"type":["null","string"]},"user":{"type":["null","object"],"properties":{"first_name":{"type":"string"},"last_name":{"type":"string"}}}}}, {"type":"record","name":"nested_record","namespace":"namespace2","fields":[{"name":"node_id","type":["null","string"],"default":null},{"name":"user","type":["null",{"type":"record","name":"user","namespace":"","fields":[{"name":"first_name","type":["null","string"],"default":null},{"name":"last_name","type":["null","string"],"default":null}]}],"default":null}]} PASSED

JsonToAvroSchemaConverterTest > [3] record_with_airbyte_fields, namespace3, true, {"type":"object","properties":{"node_id":{"type":["null","string"]}}}, {"type":"record","name":"record_with_airbyte_fields","namespace":"namespace3","fields":[{"name":"_airbyte_ab_id","type":{"type":"string","logicalType":"uuid"}},{"name":"_airbyte_emitted_at","type":{"type":"long","logicalType":"timestamp-millis"}},{"name":"node_id","type":["null","string"],"default":null}]} PASSED

JsonToAvroSchemaConverterTest > [4] name_with:spécial:characters, namespace4, false, {"type":"object","properties":{"node:id":{"type":["null","string"]}}}, {"type":"record","name":"name_with_special_characters","namespace":"namespace4","doc":"_airbyte_original_name:name_with:spécial:characters","fields":[{"name":"node_id","doc":"_airbyte_original_name:node:id","type":["null","string"],"default":null}]} PASSED

JsonToAvroSchemaConverterTest > [5] record_with_union_type, namespace5, false, {"type":"object","properties":{"identifier":{"type":["null","number","string"]}}}, {"type":"record","name":"record_with_union_type","namespace":"namespace5","fields":[{"name":"identifier","type":["null","double","string"],"default":null}]} PASSED

JsonToAvroSchemaConverterTest > [6] array_with_same_type, namespace6, false, {"type":"object","properties":{"identifier":{"type":"array","items":{"type":"string"}}}}, {"type":"record","name":"array_with_same_type","namespace":"namespace6","fields":[{"name":"identifier","type":["null",{"type":"array","items":["null","string"]}],"default":null}]} PASSED

JsonToAvroSchemaConverterTest > [7] array_with_union_type, namespace7, false, {"type":"object","properties":{"identifiers":{"type":"array","items":[{"type":"string"},{"type":"integer"},{"type":"string"},{"type":"boolean"}]}}}, {"type":"record","name":"array_with_union_type","namespace":"namespace7","fields":[{"name":"identifiers","type":["null",{"type":"array","items":["null","string","int","boolean"]}],"default":null}]} PASSED

JsonToAvroSchemaConverterTest > [8] field_with_combined_restriction, namespace8, false, {"properties":{"created_at":{"anyOf":[{"type":"string","format":"date-time"},{"type":["null","string"]},{"type":"integer"}]}}}, {"type":"record","name":"field_with_combined_restriction","namespace":"namespace8","fields":[{"name":"created_at","type":["null","string","int"],"default":null}]} PASSED

JsonToAvroSchemaConverterTest > [9] record_with_combined_restriction_field, namespace9, false, {"properties":{"user":{"type":"object","properties":{"created_at":{"anyOf":[{"type":"string","format":"date-time"},{"type":["null","string"]},{"type":"integer"}]}}}}}, {"type":"record","name":"record_with_combined_restriction_field","namespace":"namespace9","fields":[{"name":"user","type":["null",{"type":"record","name":"user","namespace":"","fields":[{"name":"created_at","type":["null","string","int"],"default":null}]}],"default":null}]} PASSED

JsonToAvroSchemaConverterTest > [10] array_with_combined_restriction_field, namespace10, false, {"properties":{"identifiers":{"type":"array","items":[{"oneOf":[{"type":"integer"},{"type":"string"}]},{"type":"boolean"}]}}}, {"type":"record","name":"array_with_combined_restriction_field","namespace":"namespace10","fields":[{"name":"identifiers","type":["null",{"type":"array","items":["null","int","string","boolean"]}],"default":null}]} PASSED

JsonToAvroSchemaConverterTest > testGetSingleTypes() PASSED

JsonToAvroSchemaConverterTest > testWithCombinedRestriction() PASSED

S3AvroFormatConfigTest > testParseCodecConfigNull() PASSED

S3AvroFormatConfigTest > testParseCodecConfigXz() PASSED

S3AvroFormatConfigTest > testParseCodecConfigBzip2() PASSED

S3AvroFormatConfigTest > testParseCodecConfigZstandard() PASSED

S3AvroFormatConfigTest > testParseCodecConfigDeflate() PASSED

S3AvroFormatConfigTest > testParseCodecConfigInvalid() PASSED

S3AvroFormatConfigTest > testParseCodecConfigSnappy() PASSED

BaseS3WriterTest > testGetOutputFilename() PASSED

S3ParquetFormatConfigTest > testConfigConstruction() PASSED

S3FormatConfigs > When CSV format is specified, it returns CSV format config PASSED

Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/6.7.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 8m 15s
44 actionable tasks: 2 executed, 42 up-to-date

@sherifnada sherifnada requested a review from tuliren July 9, 2021 14:54
@sherifnada
Contributor

brilliant, thank you @varunbpatil!

@tuliren can you take a look and push to the finish line?

@tuliren
Contributor

tuliren commented Jul 10, 2021

Got it. Will do.

@tuliren tuliren changed the title from "Allow S3 destination with parquet format to write to MinIO (#4665)" to "🐛 Destination S3: fix minio output for parquet format" Jul 13, 2021
@tuliren tuliren merged commit 52e098e into airbytehq:master Jul 13, 2021
Labels
area/connectors Connector related issues
Development

Successfully merging this pull request may close these issues.

S3 destination + Parquet doesn't work with MinIO
4 participants