Update Airbyte S3-Glue Destination Connector #1315

Open
blarghmatey opened this issue Oct 1, 2024 · 1 comment
Labels
Data Engineering product:data-platform Issues related to the Data Platform product

Comments

@blarghmatey
Member

Description/Context

The base S3 destination connector has had substantial updates made to it since the last time that we ran a build of the S3-Glue destination. There are also pull requests that were never merged upstream which we are relying on in our build of the connector. We need to generate an updated build that incorporates the improvements from the base S3 destination to increase the performance of our syncs.

Plan/Design

The unmerged upstream changes need to be re-created against the current state of the repository:

- destination-s3
  - Added a new "stringify" argument to the JsonLSerialized buffer. When it is true, conditional logic collects root-level objects as strings, including airbyte_data.
  - Added a new Stringify utility.
  - Threaded the new stringify argument through the necessary classes and methods, including S3JsonlFormatConfig.
- destination-s3-glue
  - Added new s3_glue interfaces MetastoreFormatConfig and MetastoreJsonlFormatConfig, with methods that return the input format, the output format, and the serializationLibrary.
  - Passed an S3FormatConfig object into the GlueDestinationConfig.
  - Passed a MetastoreFormatConfig object instead of the serializationLibrary in the operations code. This config is passed as an additional argument to transformSchemaRecursive in GlueOperations or to upsertTable in MetastoreOperations.
  - Abstracted the default values for the Glue database, serialization library, and text input and output formats into MetastoreConstants.
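A minimal, dependency-free Java sketch of the two ideas in the plan above, for orientation only. It assumes a record is a Map<String, Object> (a stand-in for whatever JSON type the connector actually uses), and the method names (rootObjectsAsStrings, apply, toJson), the getter names on the Metastore interfaces, and every value in MetastoreConstants are hypothetical. The input/output format and SerDe strings are real Hive class names, but which ones the connector uses is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of the "stringify" behavior: root-level object/array
// values (e.g. airbyte_data) are replaced by their JSON text.
final class Stringify {
    private Stringify() {}

    /** Hypothetical hook for the stringify flag threaded through the JSONL buffer. */
    static Map<String, Object> apply(Map<String, Object> record, boolean stringify) {
        return stringify ? rootObjectsAsStrings(record) : record;
    }

    /** Returns a copy of the record where root-level maps/lists become JSON strings. */
    static Map<String, Object> rootObjectsAsStrings(Map<String, Object> record) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : record.entrySet()) {
            Object v = e.getValue();
            boolean nested = v instanceof Map || v instanceof List;
            out.put(e.getKey(), nested ? toJson(v) : v);
        }
        return out;
    }

    /** Minimal JSON writer for maps, lists, strings, numbers, booleans and null. */
    static String toJson(Object v) {
        if (v == null) return "null";
        if (v instanceof String) return quote((String) v);
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        if (v instanceof Map) {
            StringJoiner j = new StringJoiner(",", "{", "}");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                j.add(quote(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()));
            }
            return j.toString();
        }
        if (v instanceof List) {
            StringJoiner j = new StringJoiner(",", "[", "]");
            for (Object item : (List<?>) v) j.add(toJson(item));
            return j.toString();
        }
        throw new IllegalArgumentException("Unsupported type: " + v.getClass());
    }

    /** Quotes a string and escapes the characters JSON requires. */
    private static String quote(String s) {
        StringBuilder b = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': b.append("\\\""); break;
                case '\\': b.append("\\\\"); break;
                case '\n': b.append("\\n"); break;
                case '\r': b.append("\\r"); break;
                case '\t': b.append("\\t"); break;
                default:
                    if (c < 0x20) b.append(String.format("\\u%04x", (int) c));
                    else b.append(c);
            }
        }
        return b.append('"').toString();
    }
}

// Assumed defaults; the real values in MetastoreConstants may differ.
final class MetastoreConstants {
    private MetastoreConstants() {}
    static final String DEFAULT_GLUE_DATABASE = "default"; // assumption
    static final String SERIALIZATION_LIBRARY = "org.openx.data.jsonserde.JsonSerDe";
    static final String TEXT_INPUT_FORMAT = "org.apache.hadoop.mapred.TextInputFormat";
    static final String TEXT_OUTPUT_FORMAT =
        "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
}

// Hypothetical getter names for the interfaces named in the plan.
interface MetastoreFormatConfig {
    String getInputFormat();
    String getOutputFormat();
    String getSerializationLibrary();
}

// JSONL variant supplies the text-format defaults via default methods.
interface MetastoreJsonlFormatConfig extends MetastoreFormatConfig {
    @Override default String getInputFormat() { return MetastoreConstants.TEXT_INPUT_FORMAT; }
    @Override default String getOutputFormat() { return MetastoreConstants.TEXT_OUTPUT_FORMAT; }
    @Override default String getSerializationLibrary() { return MetastoreConstants.SERIALIZATION_LIBRARY; }
}
```

In this sketch, passing a MetastoreFormatConfig (rather than a bare serializationLibrary string) lets operations such as upsertTable read all three table-format settings from one object, which is the motivation the plan implies for the refactor.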

Once these changes are implemented and any other structural changes required in the s3-glue code are made, we will build a new version of the s3-glue connector for testing in our QA environment.

@blarghmatey blarghmatey added Data Engineering product:data-platform Issues related to the Data Platform product labels Oct 1, 2024
@pdpinch
Member

pdpinch commented Dec 2, 2024

What's the status of this?
