-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 fixed OOM error when splitting a stream into several files #7074
Conversation
/test connector=connectors/destination-snowflake
|
/test connector=connectors/destination-redshift
|
...tion-jdbc/src/main/java/io/airbyte/integrations/destination/jdbc/copy/s3/S3StreamCopier.java
Outdated
Show resolved
Hide resolved
@andriikorotkov, the previous PR #6949 is titled as a fix to the bug that writes multiple streams to the same table. However, does it actually fix this bug? There was no unit test in that PR to show that the bug had been fixed. Could you add unit tests in this PR? |
@tuliren, you are right, this is not entirely true. I saw that contributors in the troubleshooting group had an OOM error at this point. I tested this again with a lot of data (as you can see it in the screenshots) and fixed it in no time. If you have time, I would like to know your opinion on this fix and get any remarks from you. |
/test connector=connectors/destination-redshift
|
/test connector=connectors/destination-snowflake
|
...tion-jdbc/src/main/java/io/airbyte/integrations/destination/jdbc/copy/s3/S3StreamCopier.java
Show resolved
Hide resolved
/test connector=connectors/destination-snowflake
|
...tion-jdbc/src/main/java/io/airbyte/integrations/destination/jdbc/copy/s3/S3StreamCopier.java
Outdated
Show resolved
Hide resolved
...tion-jdbc/src/main/java/io/airbyte/integrations/destination/jdbc/copy/s3/S3StreamCopier.java
Outdated
Show resolved
Hide resolved
...tion-jdbc/src/main/java/io/airbyte/integrations/destination/jdbc/copy/s3/S3StreamCopier.java
Outdated
Show resolved
Hide resolved
...tion-jdbc/src/main/java/io/airbyte/integrations/destination/jdbc/copy/s3/S3StreamCopier.java
Show resolved
Hide resolved
Thanks for adding the |
/test connector=connectors/destination-snowflake
|
/test connector=connectors/destination-redshift
|
@tuliren I marked a test with a lot of records as disabled. |
/test connector=connectors/destination-snowflake
|
/test connector=connectors/destination-redshift
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I separated the filename generation logic to its own class, StagingFilenameGenerator
, so that the logics can be unit tested. It turns out that there was an off-by-one bug in the original code, which has been fixed now.
/publish connector=connectors/destination-snowflake
|
/publish connector=connectors/destination-redshift
|
…hq#7074) * UPDATED DESTINATION JDBC * fixed remarks * fixed remarks * added new test for testing largeNumberRecords * updated test * fixed remarks * Separate staging filename generator and add tests Co-authored-by: Liren Tu <tuliren.git@outlook.com>
What
AWS S3 Staging COPY is writing records from different table in the same raw table
How
It is optimal to write every 10,000,000 records to a new file. This will make it easier to work with files and speed up the recording of large amounts of data. In addition, for a large number of records, we will not get a drop in the copy request to QUERY_TIMEOUT when the records from the file are copied to the staging table.
Recommended reading order
x.java
y.python
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/SUMMARY.md
docs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampledocs/integrations/README.md
airbyte-integrations/builds.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing./publish
command described hereUpdating a connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing./publish
command described hereConnector Generator
-scaffold
in their name) have been updated with the latest scaffold by running./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates
then checking in your changes