Iterable (certified) -> BigQuery connection | Schema detection fails #30966
Unanswered · PoulpiFr asked this question in Connector Questions
Replies: 3 comments
-
I'm also getting this exact same error message, but only intermittently for Facebook Marketing.
-
Hey @PoulpiFr, were you able to solve this?
-
Not really.
Moving the schedule to early morning helped, as the Iterable API is faster during the night, and the large lists on which the API was hanging were tuned down in size.
But these are all workarounds; the underlying issue was never addressed, and I have since switched to another project.
Good luck 🤞
-
Hello Folks 👋
I’m having an issue on my Iterable (certified) -> BigQuery connection 😢
My Iterable -> BigQuery connection was working fine until a few days ago.
I started having failed sync jobs with those logs:
And with the error message at the top:
Failed to launch the refresh schema activity because of: scheduledEventId=17, startedEventId=18, activityType='RefreshSchema', activityId='186be4db-9c3e-3a24-b87f-b5e3d2d24f39', identity='', retryState=RETRY_STATE_NON_RETRYABLE_FAILURE
Another thing that is broken is that I can no longer:
In both cases, a loader appears at the “schema discovery” step and runs for a long time until I get those errors:
So, my understanding is that the connector source-iterable is throwing the exception “Connection broken: InvalidChunkLength(got length b'', 0 bytes read)” when trying to run discovery on my Iterable project.
I could reproduce that by running manually:
Which gives:
Now, I have a few hypotheses/ideas on why this could be happening (or I could be dead wrong 😓)
The Iterable project I'm connected to is a large one, with a lot of data (600k users).
The source code seems to run this part during discovery:
https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-iterable/source_iterable/source.py#L80
This has the side effect of calling:
all_streams_accessible() -> read_full_refresh(access_check_stream)
And if we dive a bit deeper to understand what this does:
https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-iterable/source_iterable/utils.py#L29
It iterates on ALL slices of the AccessCheck stream.
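To make that concrete, here is a minimal sketch (not the actual Airbyte code) of what this iteration amounts to: walk EVERY slice of the stream and pull every record from each slice. `FakeStream` is a hypothetical stand-in for the real AccessCheck stream.

```python
# Hypothetical stand-in for the AccessCheck stream: one slice per list,
# one (potentially slow) HTTP call per slice in the real connector.
class FakeStream:
    def stream_slices(self):
        # The real stream yields one slice per Iterable list id.
        return [{"list_id": i} for i in range(3)]

    def read_records(self, stream_slice):
        # The real stream performs an HTTP call here; with large lists,
        # each call can take minutes.
        yield {"list_id": stream_slice["list_id"]}

def read_full_refresh(stream):
    # Iterate ALL slices, reading every record from each one.
    for stream_slice in stream.stream_slices():
        yield from stream.read_records(stream_slice=stream_slice)

records = list(read_full_refresh(FakeStream()))  # touches all 3 slices
```

So the "access check" cost scales with the total number of slices (lists) and records, not with some fixed probe.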
Let’s have a look at AccessCheck :
https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-iterable/source_iterable/streams.py#L697
It’s a modified version of the ListUsers stream.
What does the ListUsers stream do?
https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-iterable/source_iterable/streams.py#L352
So, it does two iterations: first it downloads the list of ListIds from one Iterable endpoint (https://api.iterable.com/api/docs#lists_getLists), and then for each list it uses this endpoint to fetch the users inside it (https://api.iterable.com/api/docs#lists_getLists_0)
Some lists on my Iterable project are very large (>600k users) and the call to https://api.iterable.com/api/docs#lists_getLists_0 is quite long -> Iterable offers neither pagination on those endpoints nor an async-job-style API 🙃
We see an 11-minute-long call returning a 14 MB response for a single list of the project.
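The two-phase fetch described above can be sketched roughly as follows (not the connector's actual code): one GET for the list ids, then one unpaginated GET per list. `get` stands in for an HTTP client such as `requests.get`; the fake client below lets the sketch run offline.

```python
API_BASE = "https://api.iterable.com/api"

def iter_list_users(api_key, get):
    headers = {"Api-Key": api_key}
    # Phase 1: fetch every list id (GET /lists).
    lists = get(f"{API_BASE}/lists", headers=headers).json()["lists"]
    # Phase 2: one call per list; a >600k-user list means one huge,
    # multi-minute response with no pagination.
    for lst in lists:
        resp = get(f"{API_BASE}/lists/getUsers",
                   params={"listId": lst["id"]}, headers=headers)
        yield lst["id"], resp.text

# Hypothetical offline fake of the HTTP client, so the sketch is runnable.
class _Resp:
    def __init__(self, payload):
        self.payload = payload
    def json(self):
        return self.payload
    @property
    def text(self):
        return self.payload

def _fake_get(url, headers=None, params=None):
    if url.endswith("/lists"):
        return _Resp({"lists": [{"id": 1}, {"id": 2}]})
    return _Resp(f"user{params['listId']}@example.com\n")

result = list(iter_list_users("API_KEY", _fake_get))
```

With real lists of 600k users, each phase-2 call is the 11-minute, multi-megabyte response described above.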
So my hypothesis is that this API call is, for some reason, making the request fail, which produces the
requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
error. However, I'm missing the necessary skills to update the Airbyte connector code to validate this (is this a timeout issue of the client lib?) and/or to update the connector logic, which seems crazy at the moment -> why do we need to iterate on a full stream to do discovery?!
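As a purely client-side mitigation idea (a hedged sketch, not current connector code), one could retry when the server closes a chunked response early, which the requests library surfaces as `ChunkedEncodingError`. The `_FlakySession` below is a hypothetical session that fails twice before succeeding, just to exercise the retry loop.

```python
from requests.exceptions import ChunkedEncodingError

def get_with_retries(session, url, retries=3, **kwargs):
    # Retry the GET when the chunked response is cut short mid-transfer.
    for attempt in range(retries):
        try:
            return session.get(url, **kwargs)
        except ChunkedEncodingError:
            if attempt == retries - 1:
                raise  # out of retries: let the error propagate

# Hypothetical session: raises twice, then succeeds on the third call.
class _FlakySession:
    def __init__(self):
        self.calls = 0

    def get(self, url, **kwargs):
        self.calls += 1
        if self.calls < 3:
            raise ChunkedEncodingError("Connection broken: InvalidChunkLength")
        return "ok"

sess = _FlakySession()
resp = get_with_retries(sess, "https://api.iterable.com/api/lists/getUsers")
```

Of course, retrying an 11-minute call only papers over the problem; it doesn't fix the underlying discovery design.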
The access check could be done on a single slice, or on an endpoint that doesn't trigger 11-minute API calls 🤯
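A hedged sketch of that single-slice idea (not connector code): probe only the first record of the first slice instead of reading the whole stream. The `stream_slices`/`read_records` interface mirrors the sketch above; both stream classes here are hypothetical stand-ins.

```python
def stream_accessible(stream):
    # Probe just the first record of the first slice: one API call,
    # instead of one per list.
    first_slice = next(iter(stream.stream_slices()), None)
    if first_slice is None:
        return True  # nothing to check
    try:
        next(iter(stream.read_records(stream_slice=first_slice)), None)
        return True
    except Exception:
        return False

# Hypothetical stand-ins for an accessible and an inaccessible stream.
class OkStream:
    def stream_slices(self):
        return [{"list_id": 1}, {"list_id": 2}]

    def read_records(self, stream_slice):
        yield {"email": "a@example.com"}

class ForbiddenStream:
    def stream_slices(self):
        return [{"list_id": 1}]

    def read_records(self, stream_slice):
        raise PermissionError("401 from the API")
        yield  # makes this a generator, like a real stream
```

This trades exhaustiveness for speed: a permission problem on a later list would slip through, but discovery would no longer hang on multi-minute calls.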