Postgres to Snowflake Sync: unsafe memory access exception #3251

Closed
royt-via opened this issue May 5, 2021 · 10 comments

Comments

@royt-via

royt-via commented May 5, 2021

Expected Behavior

Data should be streamed from source (Postgres) to destination (Snowflake)

Current Behavior

About 80 minutes after the sync starts, it fails with the error message below.

Logs

2021-05-05 16:39:06 ERROR (/tmp/workspace/12/0) LineGobbler(voidCall):69 - Exception in thread "main" java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
2021-05-05 16:39:06 ERROR (/tmp/workspace/12/0) LineGobbler(voidCall):69 - at java.base/java.lang.StringUTF16.compress(StringUTF16.java:161)
2021-05-05 16:39:06 ERROR (/tmp/workspace/12/0) LineGobbler(voidCall):69 - at java.base/java.lang.String.<init>(String.java:3651)
2021-05-05 16:39:06 ERROR (/tmp/workspace/12/0) LineGobbler(voidCall):69 - at java.base/java.lang.String.<init>(String.java:294)
2021-05-05 16:39:06 ERROR (/tmp/workspace/12/0) LineGobbler(voidCall):69 - at java.base/java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:688)
2021-05-05 16:39:06 ERROR (/tmp/workspace/12/0) LineGobbler(voidCall):69 - at java.base/java.nio.CharBuffer.toString(CharBuffer.java:1626)
2021-05-05 16:39:06 ERROR (/tmp/workspace/12/0) LineGobbler(voidCall):69 - at java.base/java.util.regex.Matcher.toMatchResult(Matcher.java:274)
2021-05-05 16:39:06 ERROR (/tmp/workspace/12/0) LineGobbler(voidCall):69 - at java.base/java.util.Scanner.match(Scanner.java:1399)
2021-05-05 16:39:06 ERROR (/tmp/workspace/12/0) LineGobbler(voidCall):69 - at java.

(If you need the full logs, let me know and I'll upload them somewhere.)

Steps to Reproduce

Sync data (some tables are big) from Postgres to Snowflake

Severity of the bug for you

Critical
Can't use Airbyte

Airbyte Version

0.22.0-alpha

Connector Version (if applicable)

Postgres (source) connector - 0.3.1

Additional context

t3a.large EC2 machine

@royt-via royt-via added the type/bug Something isn't working label May 5, 2021
@marcosmarxm
Member

Hey @royt-via, do you mind sharing the full log and the size of the tables (rows/GB)?

@marcosmarxm marcosmarxm added the priority/critical Critical priority! label May 6, 2021
@royt-via
Author

royt-via commented May 6, 2021

hey @marcosmarxm, sure!
this is the full log.
I was trying to sync 2 tables - one with 23,130,763 rows (~6.3 GB) and the other with 1,466 rows (~6 MB).

@momer

momer commented May 6, 2021

Hey all, we encountered this too. Specifically, we hit an issue that we believed was related to a JSONB column. @royt-via, do you have JSONB or JSON columns on that table?

@marcosmarxm
Member

@momer just checking, are you using Postgres => Snowflake too? If possible, please share your logs as well so we can debug this better, and if the instance is still online, can you check the disk storage usage?

@royt-via
Author

royt-via commented May 6, 2021

@momer I am using JSONB columns in one of the 2 tables I'm trying to sync.
How did you solve this issue?
@marcosmarxm, if momer's solution works for me too, I guess that will pinpoint the problem.

@michel-tricot
Contributor

I don't think the root cause is

Exception in thread "main" java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code

but instead:

2021-05-04 17:22:12 ERROR (/tmp/workspace/2/0) LineGobbler(voidCall):69 - at [Source: (String)"{"stream":"my_prefix_my_table","data":{"id":"00000000-0000-0000-0000-000000000000","created":"2020-07-18T04:10:00Z","modified":"2020-07-18T04:10:00Z","timestamp":1595045400,"reference_id":"00000000-0000-0000-0000-000000000001","other_ref_id":"ABCDEFGHIJK","Status":"ExampleStatus","ExampleDetail":"{\"my_key\": [{\"MyKey\": \"45\", \"OtherKey\": \"OtherValue\", \"AnotherKey\": {\"AnestedKey\": \"AnestedValue\"}, \"ANestedKey\": {\"ANestedKey\": \"ANestedValue\"[truncated 9404 chars]; line: 1, column: 834] (through reference chain: io.airbyte.protocol.models.AirbyteRecordMessage["data"])
2021-05-04 17:22:12 ERROR (/tmp/workspace/2/0) LineGobbler(voidCall):69 - at io.airbyte.integrations.destination.buffered_stream_consumer.BufferedStreamConsumer.writeStreamsWithNRecords(BufferedStreamConsumer.java:198)
2021-05-04 17:22:12 ERROR (/tmp/workspace/2/0) LineGobbler(voidCall):69 - at io.airbyte.integrations.destination.buffered_stream_consumer.BufferedStreamConsumer.close(BufferedStreamConsumer.java:172)
2021-05-04 17:22:12 ERROR (/tmp/workspace/2/0) LineGobbler(voidCall):69 - at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.close(FailureTrackingAirbyteMessageConsumer.java:82)
2021-05-04 17:22:12 ERROR (/tmp/workspace/2/0) LineGobbler(voidCall):69 - at io.airbyte.integrations.base.IntegrationRunner.consumeWriteStream(IntegrationRunner.java:126)
2021-05-04 17:22:12 ERROR (/tmp/workspace/2/0) LineGobbler(voidCall):69 - ... 2 more
2021-05-04 17:22:12 ERROR (/tmp/workspace/2/0) LineGobbler(voidCall):69 - Caused by: java.lang.RuntimeException: com.fasterxml.jackson.databind.JsonMappingException: Illegal unquoted character ((CTRL-CHAR, code 0)): has to be escaped using backslash to be included in string value
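
To make the point about the root cause concrete, here is a minimal sketch of pulling the "Caused by:" lines out of a log in the format shown above. The function name `caused_by_lines` and the regular expression are illustrative assumptions based only on the lines quoted in this thread, not an Airbyte utility.

```python
import re

# Prefix seen in the quoted logs: timestamp, level, workspace path,
# then "LineGobbler(voidCall):<n> - " followed by the forwarded message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) \((?P<workspace>[^)]*)\) "
    r"LineGobbler\(voidCall\):\d+ - (?P<msg>.*)$"
)


def caused_by_lines(log_text):
    """Return (timestamp, message) for every 'Caused by:' line in the log."""
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group("msg").startswith("Caused by:"):
            hits.append((m.group("ts"), m.group("msg")))
    return hits


# Two lines copied from the log excerpt above.
sample = (
    "2021-05-04 17:22:12 ERROR (/tmp/workspace/2/0) LineGobbler(voidCall):69 - ... 2 more\n"
    "2021-05-04 17:22:12 ERROR (/tmp/workspace/2/0) LineGobbler(voidCall):69 - "
    "Caused by: java.lang.RuntimeException: "
    "com.fasterxml.jackson.databind.JsonMappingException: "
    "Illegal unquoted character ((CTRL-CHAR, code 0)): has to be escaped "
    "using backslash to be included in string value\n"
)

for ts, msg in caused_by_lines(sample):
    print(ts, msg)
```

Filtering on "Caused by:" is only a heuristic: it surfaces nested exceptions such as the `JsonMappingException` here, which can be easy to miss next to the later `InternalError`.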

My theory is the following:

  • we don't support jsonb in the destination, so we try to write it as a string
  • unfortunately, converting it to a string applies the usual string encoding/decoding logic to binary data. That data cannot be encoded as a string, so it fails.

I think the next step is to use a blob type for values that are binary and that we don't otherwise know how to write.
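
The parser-level failure in the log ("Illegal unquoted character ((CTRL-CHAR, code 0))") can be illustrated with a small sketch. It uses Python's standard `json` module as a stand-in for Jackson, on the assumption that both reject a raw, unescaped control character inside a JSON string by default; the payload and field names are made up. It does not confirm the theory above about where the NUL character comes from.

```python
import json

# A record whose string value contains a raw NUL (code 0). The JSON text is
# assembled by hand so the control character is NOT escaped.
raw = '{"data": {"detail": "abc' + "\x00" + 'def"}}'

try:
    json.loads(raw)  # strict=True by default: raw control chars are rejected
    outcome = "parsed"
except json.JSONDecodeError as exc:
    outcome = "rejected: " + exc.msg

print(outcome)

# Serializing the same value through a JSON library escapes the character,
# and the escaped form round-trips without error.
escaped = json.dumps({"data": {"detail": "abc\x00def"}})
print(escaped)
assert json.loads(escaped)["data"]["detail"] == "abc\x00def"
```

If the same holds on the Java side, the record text was probably produced without escaping the control character, which would point at the serialization step rather than at the parser.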

Full-redacted log:
airbyte_issue_2021-05-05.log

@royt-via
Author

royt-via commented May 7, 2021

Hi @michel-tricot, since we're talking about Snowflake-connector (destination) why not store JSON/JSONB as VARIANT?
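
As a rough sketch of what that suggestion could look like, the snippet below builds an `INSERT ... SELECT PARSE_JSON(...)` statement, since Snowflake's `PARSE_JSON` function returns a VARIANT. The helper `variant_insert`, the table and column names, and the `%s` placeholder (which assumes a pyformat-style database driver) are hypothetical; this is not how the Airbyte Snowflake destination is implemented.

```python
import json


def variant_insert(table, column, value):
    """Build an INSERT ... SELECT PARSE_JSON(...) statement and its bind parameter.

    `table` and `column` are hypothetical names; they are interpolated as
    identifiers, so they must come from trusted input only.
    """
    sql = f"INSERT INTO {table} ({column}) SELECT PARSE_JSON(%s)"
    # json.dumps escapes control characters, so the bound text is valid JSON.
    return sql, (json.dumps(value),)


sql, params = variant_insert(
    "my_table", "example_detail", {"my_key": [{"MyKey": "45"}]}
)
print(sql)
print(params[0])
```

The statement uses `INSERT ... SELECT` rather than a `VALUES` clause because the value is produced by a function call; nothing here connects to a database, so the pair would still need to be passed to a real cursor's `execute`.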

@sherifnada
Contributor

sherifnada commented May 7, 2021

@royt-via that's a pretty good idea. I've created a follow-up issue here: #3283. We'll pick this up first thing on Monday and keep you updated. @momer, any context you can provide about which source/destination you're using would be superb as well.

@royt-via
Author

royt-via commented May 8, 2021

Thank you, @sherifnada !

@davinchia
Contributor

This is working now, based on the work from #3283. Apart from investigation, no actual work was done. We suspect this might have been fixed with #3327.

Closing this for now. Please reopen if this resurfaces.
