Iterable (certified) -> BigQuery connection | Schema detection fails #30966
Unanswered · PoulpiFr asked this question in Connector Questions
Replies: 3 comments
-
I'm also getting this exact same error message, but only intermittently for Facebook Marketing.
-
Hey @PoulpiFr, were you able to solve this?
-
Not really.
Moving the schedule to early morning helped, as the Iterable API is faster during the night, and the large lists on which the API was hanging were tuned down in size.
But these are all workarounds; the underlying issue was never addressed, and I have since switched to another project.
Good luck 🤞
-
Hello Folks 👋
I’m having an issue on my Iterable (certified) -> BigQuery connection 😢
My Iterable -> BigQuery connection was working fine until a few days ago.
I started having failed sync jobs with those logs:
And with the error message at the top:
Failed to launch the refresh schema activity because of: scheduledEventId=17, startedEventId=18, activityType='RefreshSchema', activityId='186be4db-9c3e-3a24-b87f-b5e3d2d24f39', identity='', retryState=RETRY_STATE_NON_RETRYABLE_FAILURE
Another thing that is broken is that I can no longer:
In both cases, a loader appears at the “schema discovery” step and runs for a long time until I get those errors:
So, my understanding is that the connector source-iterable is throwing the exception “Connection broken: InvalidChunkLength(got length b'', 0 bytes read)” when trying to run discovery on my Iterable project.
I could reproduce that by running manually:
Which gives:
Now, I have a few hypotheses/ideas on why this could be happening (or I could be dead wrong 😓)
The Iterable project I'm connected to is a large one, with a lot of data (600k users).
The source code seems to run this part during discovery:
https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-iterable/source_iterable/source.py#L80
This has the side effect of calling:
all_streams_accessible() -> read_full_refresh(access_check_stream)
And if we dive a bit deeper to understand what this does:
https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-iterable/source_iterable/utils.py#L29
It iterates on ALL slices of the AccessCheck stream.
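To make that concrete, here is a minimal sketch (not the actual Airbyte code) of what this iteration amounts to: walk EVERY slice of the stream and pull every record from each slice. `FakeStream` is a hypothetical stand-in for the real AccessCheck stream.

```python
# Hypothetical stand-in for the AccessCheck stream: one slice per list,
# one (potentially slow) HTTP call per slice in the real connector.
class FakeStream:
    def stream_slices(self):
        # The real stream yields one slice per Iterable list id.
        return [{"list_id": i} for i in range(3)]

    def read_records(self, stream_slice):
        # The real stream performs an HTTP call here; with large lists,
        # each call can take minutes.
        yield {"list_id": stream_slice["list_id"]}

def read_full_refresh(stream):
    # Iterate ALL slices, reading every record from each one.
    for stream_slice in stream.stream_slices():
        yield from stream.read_records(stream_slice=stream_slice)

records = list(read_full_refresh(FakeStream()))  # touches all 3 slices
```

So the "access check" cost scales with the total number of slices (lists) and records, not with some fixed probe.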
Let’s have a look at AccessCheck :
https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-iterable/source_iterable/streams.py#L697
It’s a modified version of the ListUsers stream.
What does the ListUsers stream do?
https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-iterable/source_iterable/streams.py#L352
So, it does two iterations: first it downloads the list of ListIds from one Iterable endpoint (https://api.iterable.com/api/docs#lists_getLists), and then for each list it uses this endpoint to fetch the users inside it (https://api.iterable.com/api/docs#lists_getLists_0)
Some lists on my Iterable project are very large (>600k users) and the call to https://api.iterable.com/api/docs#lists_getLists_0 is quite long -> Iterable offers neither pagination on those endpoints nor an async-job-style API 🙃
We see an 11-minute-long call returning a 14 MB response for a single list of the project.
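The two-phase fetch described above can be sketched roughly as follows (not the connector's actual code): one GET for the list ids, then one unpaginated GET per list. `get` stands in for an HTTP client such as `requests.get`; the fake client below lets the sketch run offline.

```python
API_BASE = "https://api.iterable.com/api"

def iter_list_users(api_key, get):
    headers = {"Api-Key": api_key}
    # Phase 1: fetch every list id (GET /lists).
    lists = get(f"{API_BASE}/lists", headers=headers).json()["lists"]
    # Phase 2: one call per list; a >600k-user list means one huge,
    # multi-minute response with no pagination.
    for lst in lists:
        resp = get(f"{API_BASE}/lists/getUsers",
                   params={"listId": lst["id"]}, headers=headers)
        yield lst["id"], resp.text

# Hypothetical offline fake of the HTTP client, so the sketch is runnable.
class _Resp:
    def __init__(self, payload):
        self.payload = payload
    def json(self):
        return self.payload
    @property
    def text(self):
        return self.payload

def _fake_get(url, headers=None, params=None):
    if url.endswith("/lists"):
        return _Resp({"lists": [{"id": 1}, {"id": 2}]})
    return _Resp(f"user{params['listId']}@example.com\n")

result = list(iter_list_users("API_KEY", _fake_get))
```

With real lists of 600k users, each phase-2 call is the 11-minute, multi-megabyte response described above.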
So my hypothesis is that this API call is, for some reason, making the request fail, which produces the
requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
error. However, I'm missing the necessary skills to update the Airbyte connector code to validate this (is this a timeout issue of the client lib?) and/or to update the connector logic, which seems crazy at the moment -> why do we need to iterate on a full stream to do discovery?!
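As a purely client-side mitigation idea (a hedged sketch, not current connector code), one could retry when the server closes a chunked response early, which the requests library surfaces as `ChunkedEncodingError`. The `_FlakySession` below is a hypothetical session that fails twice before succeeding, just to exercise the retry loop.

```python
from requests.exceptions import ChunkedEncodingError

def get_with_retries(session, url, retries=3, **kwargs):
    # Retry the GET when the chunked response is cut short mid-transfer.
    for attempt in range(retries):
        try:
            return session.get(url, **kwargs)
        except ChunkedEncodingError:
            if attempt == retries - 1:
                raise  # out of retries: let the error propagate

# Hypothetical session: raises twice, then succeeds on the third call.
class _FlakySession:
    def __init__(self):
        self.calls = 0

    def get(self, url, **kwargs):
        self.calls += 1
        if self.calls < 3:
            raise ChunkedEncodingError("Connection broken: InvalidChunkLength")
        return "ok"

sess = _FlakySession()
resp = get_with_retries(sess, "https://api.iterable.com/api/lists/getUsers")
```

Of course, retrying an 11-minute call only papers over the problem; it doesn't fix the underlying discovery design.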
The access check could be done on a single slice, or on an endpoint that doesn't trigger 11-minute API calls 🤯
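A hedged sketch of that single-slice idea (not connector code): probe only the first record of the first slice instead of reading the whole stream. The `stream_slices`/`read_records` interface mirrors the sketch above; both stream classes here are hypothetical stand-ins.

```python
def stream_accessible(stream):
    # Probe just the first record of the first slice: one API call,
    # instead of one per list.
    first_slice = next(iter(stream.stream_slices()), None)
    if first_slice is None:
        return True  # nothing to check
    try:
        next(iter(stream.read_records(stream_slice=first_slice)), None)
        return True
    except Exception:
        return False

# Hypothetical stand-ins for an accessible and an inaccessible stream.
class OkStream:
    def stream_slices(self):
        return [{"list_id": 1}, {"list_id": 2}]

    def read_records(self, stream_slice):
        yield {"email": "a@example.com"}

class ForbiddenStream:
    def stream_slices(self):
        return [{"list_id": 1}]

    def read_records(self, stream_slice):
        raise PermissionError("401 from the API")
        yield  # makes this a generator, like a real stream
```

This trades exhaustiveness for speed: a permission problem on a later list would slip through, but discovery would no longer hang on multi-minute calls.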