Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Source Intercom: Fix conversations incremental pagination slowness #11208

Merged
merged 6 commits into from
Mar 21, 2022

Conversation

bkrausz
Copy link
Contributor

@bkrausz bkrausz commented Mar 16, 2022

What

Conversation sync, usually the largest set of objects in Intercom (and the most important for ETL), is incredibly slow, even for incremental sync. The reason for this is that a sync fetches every single page, ignoring the records but still issuing the HTTP requests (other users are also seeing this issue: #9572 (comment)).

How

Sort by descending and once we cross the updated_at boundary stop syncing new pages.

🚨 User Impact 🚨

Faster syncs

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Tests

Unit
Results (1.18s):
       4 passed
Integration

Integration tests will only pass with a specific access token that I don't have access to.

Acceptance

Getting a socket timeout error on TestBasicRead.test_read when running these, not sure why, but I believe they are passing.

@github-actions github-actions bot added the area/connectors Connector related issues label Mar 16, 2022
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Mar 16, 2022
@alafanechere alafanechere self-assigned this Mar 18, 2022
@alafanechere
Copy link
Contributor

alafanechere commented Mar 18, 2022

/test connector=connectors/source-intercom

🕑 connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/2003446099
❌ connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/2003446099
🐛 https://gradle.com/s/3wqzaxhcthy52

@alafanechere
Copy link
Contributor

Hey @bkrausz thank you for this improvement attempt. I'm running the acceptance tests and will go for a first review asap.

@alafanechere
Copy link
Contributor

alafanechere commented Mar 18, 2022

/test connector=connectors/source-intercom

🕑 connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/2003588271
✅ connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/2003588271
Python tests coverage:

Name                                                 Stmts   Miss  Cover
------------------------------------------------------------------------
source_acceptance_test/utils/__init__.py                 6      0   100%
source_acceptance_test/tests/__init__.py                 4      0   100%
source_acceptance_test/__init__.py                       2      0   100%
source_acceptance_test/tests/test_full_refresh.py       52      2    96%
source_acceptance_test/utils/asserts.py                 37      2    95%
source_acceptance_test/config.py                        74      6    92%
source_acceptance_test/utils/json_schema_helper.py     105     13    88%
source_acceptance_test/utils/common.py                  70     17    76%
source_acceptance_test/utils/compare.py                 62     23    63%
source_acceptance_test/tests/test_core.py              275    106    61%
source_acceptance_test/base.py                          10      4    60%
source_acceptance_test/utils/connector_runner.py       110     48    56%
source_acceptance_test/tests/test_incremental.py        69     38    45%
------------------------------------------------------------------------
TOTAL                                                  876    259    70%
Name                          Stmts   Miss  Cover
-------------------------------------------------
source_intercom/__init__.py       2      0   100%
source_intercom/source.py       222     44    80%
-------------------------------------------------
TOTAL                           224     44    80%
Name                          Stmts   Miss  Cover
-------------------------------------------------
source_intercom/__init__.py       2      0   100%
source_intercom/source.py       222     57    74%
-------------------------------------------------
TOTAL                           224     57    75%

Copy link
Contributor

@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bkrausz I ran ./gradlew format: there was an unused import in integration_test.py.
I also made the has_old_records an instance attribute instead of a class attribute: this makes the mutation of this variable safer.

You mentioned:

We're sorting by desc. Once we hit the first page with an out-of-date result we can stop.

You removed the ordering related parameters. Is it because the default endpoint behavior is to return descending ordering? Could you please explicitly set the params.update({"order": "desc", "sort": self.cursor_field}) for safety then?

Comment on lines 283 to 286
def request_params(self, next_page_token: Mapping[str, Any] = None, **kwargs) -> MutableMapping[str, Any]:
params = super().request_params(next_page_token, **kwargs)
params.update({"order": "asc", "sort": self.cursor_field})
return params
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guessed you removed this because default ordering is {"order": "desc", "sort": "updated_at"}. I think you should rather explicitly set params.update({"order": "desc", "sort": self.cursor_field}) for safety in case of API changes and to improve the understanding of the code.

@bkrausz
Copy link
Contributor Author

bkrausz commented Mar 18, 2022

Thanks for the help getting this to a good state! I haven't touched python in many years, so my toolchain and comfort is pretty minimal.

@alafanechere
Copy link
Contributor

alafanechere commented Mar 21, 2022

/test connector=connectors/source-intercom

🕑 connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/2015363714
✅ connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/2015363714
Python tests coverage:

Name                                                 Stmts   Miss  Cover
------------------------------------------------------------------------
source_acceptance_test/utils/__init__.py                 6      0   100%
source_acceptance_test/tests/__init__.py                 4      0   100%
source_acceptance_test/__init__.py                       2      0   100%
source_acceptance_test/tests/test_full_refresh.py       52      2    96%
source_acceptance_test/utils/asserts.py                 37      2    95%
source_acceptance_test/config.py                        74      6    92%
source_acceptance_test/utils/json_schema_helper.py     105     13    88%
source_acceptance_test/utils/common.py                  70     17    76%
source_acceptance_test/utils/compare.py                 62     23    63%
source_acceptance_test/tests/test_core.py              275    106    61%
source_acceptance_test/base.py                          10      4    60%
source_acceptance_test/utils/connector_runner.py       110     48    56%
source_acceptance_test/tests/test_incremental.py        69     38    45%
------------------------------------------------------------------------
TOTAL                                                  876    259    70%
Name                          Stmts   Miss  Cover
-------------------------------------------------
source_intercom/__init__.py       2      0   100%
source_intercom/source.py       226     44    81%
-------------------------------------------------
TOTAL                           228     44    81%
Name                          Stmts   Miss  Cover
-------------------------------------------------
source_intercom/__init__.py       2      0   100%
source_intercom/source.py       226     60    73%
-------------------------------------------------
TOTAL                           228     60    74%

@alafanechere
Copy link
Contributor

alafanechere commented Mar 21, 2022

/publish connector=connectors/source-intercom

🕑 connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/2015952837
✅ connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/2015952837

Copy link
Contributor

@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changes! Publishing the new connector version now and will merge afterward.

@alafanechere alafanechere merged commit ea92a24 into airbytehq:master Mar 21, 2022
@bkrausz bkrausz deleted the bk-convo-pagination branch March 21, 2022 19:50
@bkrausz
Copy link
Contributor Author

bkrausz commented Mar 21, 2022

Beautiful...what an improvement! Thanks :)

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation community
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants