
Source Iterable hits rate limit on initial sync, even after greatly reducing the total count & size of streams #17654

Closed
marcosmarxm opened this issue Oct 6, 2022 · 15 comments · Fixed by #23821

Comments

@marcosmarxm
Member

This GitHub issue is synchronized with Zendesk:

Ticket ID: #2505
Priority: normal
Group: Community Assistance Engineer
Assignee: Sunny

Original ticket description:

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu
  • Memory / Disk: 64GB disk, t3.2xlarge, 32 GiB of Memory, 8 vCPUs, 64-bit platform
  • Deployment: Docker
  • Airbyte Version: 0.40.10
  • Source name/version: Iterable
  • Destination name/version: Snowflake
  • Step: Initial sync
  • Description: Iterable fails on initial sync, even after greatly reducing the total count & size of streams

I’ve tried several times and have not been able to finish the initial sync of Iterable data, on both the SaaS and the self-hosted versions of Airbyte. The problem appears to be related to rate limiting by Iterable, with Airbyte eventually being unable to handle so many rejections.

I have turned off all but 11 (out of 44) streams, set most of the active streams to full sync with overwrite, and set the time window to just 5 days. All to no avail.

It seems that the larger streams, e.g., user and list-user, are the main culprits. I’ve attached the log from the most recent failure (on self-hosted Docker-based Airbyte).
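One plausible reason the reductions did not help: disabling streams shrinks the total amount of data, but a single large stream such as list-user still issues its requests back to back. A common client-side mitigation is to space requests out. Below is a minimal sketch, assuming a fixed requests-per-second budget; `MinIntervalThrottle` and the budget value are hypothetical, and the thread does not state Iterable's actual limits.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Hypothetical client-side throttle that spaces requests evenly.

    max_per_second is an assumed budget for illustration; the thread does not
    state Iterable's actual per-endpoint limits.
    """

    def __init__(
        self,
        max_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        # Earliest time the next request may go out (None until the first call).
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        # Reserve the slot for the request after this one.
        self._next_allowed = now + self.min_interval
        return delay
```

Calling `throttle.wait()` immediately before each request caps the rate at `max_per_second`. The injectable `clock` and `sleep` arguments exist only so the timing can be checked without real delays.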

Here is a snippet of the latest failure from the logs:

2022-09-30 06:05:54 INFO i.a.w.g.DefaultReplicationWorker(run):301 - failures: [ {
  "failureOrigin" : "source",
  "failureType" : "system_error",
  "internalMessage" : "Request URL: https://api.iterable.com/api/lists/getUsers?listId=1659364, Response Code: 500, Response Text: {\"msg\":\"An error occurred. Please try again later. If problem persists, please contact your CSM\",\"code\":\"GenericError\",\"params\":null}",
  "externalMessage" : "Something went wrong in the connector. See the logs for more details.",
  "metadata" : {
    "attemptNumber" : 0,
    "jobId" : 2,
    "from_trace_message" : true,
    "connector_command" : "read"
  },
  "stacktrace" : "Traceback (most recent call last):\n  File \"/airbyte/integration_code/main.py\", line 13, in <module>\n    launch(source, sys.argv[1:])\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py\", line 123, in launch\n    for message in source_entrypoint.run(parsed_args):\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py\", line 114, in run\n    for message in generator:\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py\", line 128, in read\n    raise e\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py\", line 114, in read\n    yield from self._read_stream(\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py\", line 179, in _read_stream\n    for record in record_iterator:\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py\", line 277, in _read_full_refresh\n    for record in records:\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py\", line 421, in read_records\n    response = self._send_request(request, request_kwargs)\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py\", line 339, in _send_request\n    return backoff_handler(user_backoff_handler)(request, request_kwargs)\n  File \"/usr/local/lib/python3.9/site-packages/backoff/_sync.py\", line 105, in retry\n    ret = target(*args, **kwargs)\n  File \"/usr/local/lib/python3.9/site-packages/backoff/_sync.py\", line 105, in retry\n    ret = target(*args, **kwargs)\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py\", line 297, in _send\n    raise UserDefinedBackoffException(backoff=custom_backoff_time, request=request, response=response)\nairbyte_cdk.sources.streams.http.exceptions.UserDefinedBackoffException: Request URL: https://api.iterable.com/api/lists/getUsers?listId=1659364, Response Code: 500, Response Text: {\"msg\":\"An error occurred. Please try again later. If problem persists, please contact your CSM\",\"code\":\"GenericError\",\"params\":null}\n",
  "timestamp" : 1664517894149
}

c48fb6ce_7a0b_45f8_a009_c62667781496_logs_2_txt.txt (2.7 MB)
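The traceback shows `_send_request` wrapping the call in `backoff_handler(user_backoff_handler)`, the request passing through two `backoff` retry frames, and `_send` finally raising `UserDefinedBackoffException` on the 500 response. A library-free sketch of that retry-then-give-up pattern follows. `send_with_backoff`, `RetriesExhausted`, the retryable status set, and the delay values are all hypothetical; they are not the CDK's actual implementation or defaults.

```python
import time
from typing import Callable, Tuple

# Assumption: statuses treated as transient. 429 is "Too Many Requests";
# the 500 in this thread's log went through the CDK's backoff wrappers before failing.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class RetriesExhausted(Exception):
    """Raised when every attempt returned a retryable status."""


def send_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_tries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical stand-in for one HTTP request and returns
    (status_code, body). The wait doubles after each failed attempt
    (base_delay, 2*base_delay, 4*base_delay, ...) and is capped at max_delay.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(max_tries):
        status, body = send()
        if status not in RETRYABLE_STATUS:
            return status, body
        # Sleep only if another attempt will follow.
        if attempt < max_tries - 1:
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise RetriesExhausted(f"gave up after {max_tries} attempts; last status {status}: {body}")
```

Once the attempts are used up the error propagates, which matches the shape of the failure above: the whole sync stops on a single request that never succeeds.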

[Discourse post]

@marcosmarxm
Member Author

Comment made from Zendesk by Marcos Marx on 2022-10-01 at 00:01:

Hello there! You are receiving this message because none of your fellow community members has stepped in to respond to your topic post. (If you are a community member and you are reading this response, feel free to jump in if you have the answer!) As a result, the Community Assistance Team has been made aware of this topic and will be investigating and responding as quickly as possible.
Some important considerations that will help you get your issue solved faster:
* It is best to use our topic creation template; if you haven’t yet, we recommend posting a followup with the requested information. With that information, the team can more quickly search for similar issues with connectors and the platform and troubleshoot your specific question or problem.
* Make sure to upload the complete log file; a common investigation roadblock is that sometimes the error for the issue happens well before the problem is surfaced to the user, and so having the tail of the log is less useful than having the whole log to scan through.
* Be as descriptive and specific as possible; when investigating it is extremely valuable to know what steps were taken to encounter the issue, what version of connector / platform / Java / Python / docker / k8s was used, etc. The more context supplied, the quicker the investigation can start on your topic and the faster we can drive towards an answer.
* We in the Community Assistance Team are glad you’ve made yourself part of our community, and we’ll do our best to answer your questions and resolve the problems as quickly as possible. Expect to hear from a specific team member as soon as possible.

Thank you for your time and attention.
Best,
The Community Assistance Team

@marcosmarxm
Member Author

Comment made from Zendesk by Sunny on 2022-10-06 at 00:45:

I've created a GitHub issue to request improvement around this: #17654

It sounds like you have taken many steps to try and get around this, but I'm going to see if there is anything else that can be done. Maybe restricting by start_date?

@sh4sh sh4sh changed the title Iterable fails on initial sync, even after greatly reducing the total count & size of streams Iterable hits rate limit on initial sync, even after greatly reducing the total count & size of streams Oct 6, 2022
@sh4sh sh4sh changed the title Iterable hits rate limit on initial sync, even after greatly reducing the total count & size of streams Source Iterable hits rate limit on initial sync, even after greatly reducing the total count & size of streams Oct 6, 2022
@marcosmarxm
Member Author

Comment made from Zendesk by Sunny on 2022-10-06 at 00:47:

It looks like an end_date option was also added in v0.40.11 (#17573).
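Restricting start_date (and, once available, end_date) bounds the total range a sync covers; that range can be split further into smaller windows so that each request asks for less data at once. A minimal sketch, assuming fixed-size windows; `date_windows` and the window size are hypothetical and not the connector's actual slicing logic.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def date_windows(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive [window_start, window_end) pairs covering start..end.

    The last window is truncated at `end`; an empty range yields nothing.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


# Example: an 11-day range split into windows of at most 5 days gives three windows.
utc = timezone.utc
windows = list(
    date_windows(datetime(2022, 9, 1, tzinfo=utc), datetime(2022, 9, 12, tzinfo=utc), timedelta(days=5))
)
```

Smaller windows mean more, shorter requests, so this trades fewer large responses for a higher request count; it pairs naturally with a throttle or backoff when the limit is per request.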

@bazarnov
Collaborator

bazarnov commented Feb 27, 2023

@marcosmarxm
It looks like the request to https://api.iterable.com/api/lists/getUsers?listId=1659364 is invalid, since there is no list with the requested ID (1659364). In that case, should we bypass the records for that request when it returns status_code = 500? The API could also return 400 (Bad Request) here.

I've managed to reproduce the issue, but I'm not sure whether that's the best workaround.
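The proposed workaround is to bypass a list whose getUsers request fails with 500 (or possibly 400) instead of failing the whole sync. A sketch of that idea, assuming a hypothetical `fetch` callable that stands in for the GET request and treating 400/500 as "skip this list"; `read_list_users` and `SKIPPABLE_STATUS` are illustrative names, not the connector's code.

```python
import logging
from typing import Callable, Iterable, Iterator, Tuple

logger = logging.getLogger("list_users_sketch")

# Assumption: statuses treated as "this list cannot be read, move on".
SKIPPABLE_STATUS = {400, 500}


def read_list_users(
    list_ids: Iterable[int],
    fetch: Callable[[int], Tuple[int, str]],
) -> Iterator[Tuple[int, str]]:
    """Yield (list_id, body) for each readable list; skip lists whose request fails.

    `fetch` is a hypothetical stand-in for GET /api/lists/getUsers?listId=<id>
    and returns (status_code, body).
    """
    for list_id in list_ids:
        status, body = fetch(list_id)
        if status in SKIPPABLE_STATUS:
            # Log and continue so one bad list does not abort the whole stream.
            logger.warning("Skipping listId=%s: status %s", list_id, status)
            continue
        if status >= 400:
            # Any other error is still treated as fatal.
            raise RuntimeError(f"listId={list_id}: unexpected status {status}")
        yield list_id, body
```

The trade-off, and the reason for hesitating over this workaround, is that a 500 can also be a genuine transient server error; retrying a bounded number of times before skipping would lower the risk of silently dropping a list that is valid.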

@bazarnov bazarnov self-assigned this Feb 27, 2023
@bazarnov
Collaborator

bazarnov commented Mar 1, 2023

Update: Working on it.

@bazarnov
Collaborator

bazarnov commented Mar 7, 2023

Updates:
The retry fix is in progress. I'll attach the PR as soon as possible.

@marcosmarxm
Member Author

Comment made from Zendesk by Marcos Marx on 2023-04-03 at 23:13:

Closed due to no response from requester.
