Source Hubspot: add incremental streams #2425
Conversation
LGTM
/test connector=source-hubspot
/test connector=source-hubspot
/test connector=source-hubspot
```json
},
"updatedAt": {
  "type": "string"
},
```
The `cursor_field` and `default_cursor_field` should be set on any stream that is using incremental. Is discover not outputting a default cursor field?
But if `source_defined_cursor` is `True`, why should `cursor_field` be in the catalog? I thought this one comes from the user. The same question about `default_cursor_field`: I thought that one is used in case `source_defined_cursor` is `False` and the user didn't provide a `cursor_field`.
The current implementation of the `BaseClient` returns only the JSON schema.
`cursor_field` is used by destinations when deduping records. For example, if the records `user=eugene,score=300,updatedAt=1` and `user=eugene,score=200,updatedAt=2` appear in the destination, and assuming `user` is the primary key, then we need to know which field to dedupe on, i.e. the cursor field.
This will also be used by the standard tests to fix the incremental problem you are running into, by verifying that all records have a cursor value >= the state.
But these are two completely different use cases. I don't see how this will work in case there is no `primary_key`, or the `primary_key` is composite (multiple keys). `updated_at`/`bookmark_field` can't be used instead of `primary_key` in many cases because it means a different thing (it could have duplicates, etc.).
```diff
@@ -98,16 +98,15 @@
 private Set<String> IMAGES_TO_SKIP_SECOND_INCREMENTAL_READ = Sets.newHashSet(
     "airbyte/source-intercom-singer",
     "airbyte/source-exchangeratesapi-singer",
     "airbyte/source-hubspot-singer",
     "airbyte/source-hubspot",
```
Is the reason we include this here that the cursor comparison logic is inclusive? How can we verify that incremental is working? Can we compare cursor values in a custom test?
Yes, it is inclusive, as we use the `startTimestamp` param to filter records.
We could verify that we don't produce records older than the state value in a custom test.
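Such a check could be sketched as follows (the helper name and record shape are hypothetical, not the actual test-suite API): after a sync that starts from a saved state, assert that no emitted record has a cursor value older than that state.

```python
from typing import Any, Iterable, Mapping


def no_records_older_than_state(
    records: Iterable[Mapping[str, Any]], cursor_field: str, state_value: Any
) -> bool:
    """True iff every record's cursor value is at least the saved state value."""
    return all(record[cursor_field] >= state_value for record in records)


# ISO-8601 timestamps compare correctly as strings, so a plain >= works here.
records = [
    {"id": 1, "updatedAt": "2021-03-01T00:00:00Z"},
    {"id": 2, "updatedAt": "2021-03-02T00:00:00Z"},
]
assert no_records_older_than_state(records, "updatedAt", "2021-03-01T00:00:00Z")
```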
can we do that?
I have added a custom test for incremental. Unfortunately, the `subscription_changes` stream doesn't have data, and all my attempts to subscribe and unsubscribe from emails didn't trigger any event there. I remember I was using demo credentials during development, but obviously demo creds can't be used in the test, as they mostly give you a random response.
no problem.
```python
def read(self, getter: Callable, params: Mapping[str, Any] = None) -> Iterator:
    """Apply state filter to set of records, update cursor(state) if necessary in the end"""
    latest_cursor = None
    for record in self.read_chunked(getter, params):
```
Why do we read chunked instead of just paginating over a single request on the entire date range? (I assume it's for a good reason, it's just not obvious. Can you leave a comment?)
To track state: there is no guarantee that the returned records are sorted in ascending order. Having an exact boundary, we can always ensure we don't miss records between states. In the future, if we would like to save the state more often, we can do this every batch.
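A rough illustration of this idea (not the connector's actual code; `fetch` and the window size are assumptions): reading in bounded date windows means each completed window's upper bound is a safe state checkpoint, even when records within a window arrive unsorted.

```python
from datetime import datetime, timedelta
from typing import Any, Callable, Iterator, List, Mapping


def read_chunked(
    fetch: Callable[[datetime, datetime], List[Mapping[str, Any]]],
    start: datetime,
    end: datetime,
    chunk: timedelta = timedelta(days=30),
) -> Iterator[Mapping[str, Any]]:
    """Read the range [start, end) in bounded windows of size `chunk`."""
    lower = start
    while lower < end:
        upper = min(lower + chunk, end)
        # Records inside [lower, upper) may come back in any order, but once
        # the window completes we know nothing before `upper` was missed, so
        # `upper` could be persisted as state after each batch.
        yield from fetch(lower, upper)
        lower = upper
```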
```diff
@@ -300,7 +365,7 @@ class CRMObjectStream(Stream):

     entity: Optional[str] = None
     associations: List[str] = []
     updated_field = "updatedAt"
     updated_at_fields = ["updatedAt", "createdAt"]
```
Why are we doing this? This is really dangerous, because we are not using a consistent field for the cursor value. This means we could miss records.
We use a fallback because `updatedAt` can be null if there are no updates yet.
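That fallback might look like the following sketch (a hypothetical helper, not the connector's exact code): prefer `updatedAt`, and fall back to `createdAt` when `updatedAt` is null.

```python
from typing import Any, Mapping, Optional

# Fields to try in order, mirroring the updated_at_fields list in the diff above.
updated_at_fields = ["updatedAt", "createdAt"]


def cursor_value(record: Mapping[str, Any]) -> Optional[Any]:
    """Return the first non-null candidate cursor field value, or None."""
    for field in updated_at_fields:
        value = record.get(field)
        if value is not None:
            return value
    return None


print(cursor_value({"updatedAt": None, "createdAt": 5}))  # falls back to createdAt -> 5
```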
nvm, I thought more about it; this makes sense. `created_at` is always <= `updated_at`, and as long as we only persist state at the very end of syncing a stream, then we should be fine.
@keu lmk when this is ready for another look
* #2150 Issue: created native connector with schema folder populated
* #2150 Issue: make format code
* first version
* fix few issues
* fix issues
* fix read issue
* format
* docs
* docker tags
* extend configured catalog for testing
* fix source definitions
* format
* fix call rate issue, add backoff for retry after
* add general backoff
* write secrets for new connector
* drop singer connector registration
* refactor streams, resolve properties in schemas at runtime
* replace deprecated endpoint for company contacts
* replace deprecated pipeline endpoint
* update comments
* update docs
* fix typo
* fix stream contact lists
* fix pagination and forms result fetching
* fix health_check
* format and update catalog
* revert changes
* drop singer based hubspot
* fix company contacts substream
* move deals to separate test
* fix deals tests
* remove dynamic fields from records
* move deals to catalog again
* extend CRMObjectStream with associations
* format
* update schemas with updated field, change engagement layout
* fix Campaign stream
* remove custom tests
* remove dependency
* remove oauth
* Source Hubspot: add incremental streams (#2425)
  * add incremental
  * add incremental
  * polishing
  * update docs
  * fix docstring
  * clean up
  * fix incremental bookmark access
  * fix incremental tests
  * clean up
  * add custom test for incremental, improve logging
  * format
  * Update airbyte-integrations/connectors/source-hubspot/source_hubspot/api.py

  Co-authored-by: Eugene Kulak <kulak.eugene@gmail.com>
  Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
* Source Hubspot: best practices (#2537)
  * fix error reporting and add unit tests
  * fix test and refactor cursor fields
  * format

  Co-authored-by: Eugene Kulak <kulak.eugene@gmail.com>
* restored configured_catalog.json

Co-authored-by: ykurochkin <y.kurochkin@zazmic.com>
Co-authored-by: Eugene Kulak <widowmakerreborn@gmail.com>
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
What
#2222