🐛 Source Google Search Console: incremental stream not working incrementally #8798

Closed
danieldiamond opened this issue Dec 14, 2021 · 1 comment · Fixed by #9194

Comments

@danieldiamond
Contributor

Environment

  • Airbyte version: 0.33.11-alpha
  • OS Version / Instance: AWS EC2
  • Deployment: Docker
  • Source Connector and version: GSC 0.1.7
  • Destination Connector and version: Snowflake 0.3.20
  • Severity: High
  • Step where error happened: Sync Job

Current Behavior

Incremental stream performs full refresh

Expected Behavior

Incremental stream performs incremental sync

Logs

If applicable, please upload the logs from the failing operation.
For sync jobs, you can download the full logs from the UI by going to the sync attempt page and
clicking the download logs button at the top right of the logs display window.

LOG

second_sync.txt
first_sync.txt

Screen Shot 2021-12-15 at 8 47 09 am

Screen Shot 2021-12-14 at 10 36 52 pm

Steps to Reproduce

Are you willing to submit a PR?

No

Possibly related: #7479

@danieldiamond danieldiamond added needs-triage type/bug Something isn't working labels Dec 14, 2021
@yevhenii-ldv yevhenii-ldv added this to the Connectors Dec 24 2021 milestone Dec 15, 2021
@alafanechere alafanechere added area/connectors Connector related issues and removed needs-triage labels Dec 15, 2021
@alafanechere alafanechere changed the title Google Search Console (GSC) Incremental stream not working incrementally 🐛 Source Google Search Console: incremental stream not working incrementally Dec 15, 2021
@sherifnada sherifnada removed this from the Connectors Dec 24 2021 milestone Dec 17, 2021
@grubberr
Contributor

Hey @danieldiamond, yes, you are probably right!
The current implementation of the connector did NOT pass test_incremental.
To me, the current implementation of "incremental streaming" looks buggy.

It does not fully comply with CDK concepts:

  1. It slices over a conceptually wrong pair: [url, type],
    but the CDK assumes you slice over something comparable, such as a date or datetime.

  2. For a single stream it keeps a fairly complex state object:

{
  "url1": {
    "web": {"date": "date"},
    "news": {"date": "date"}
  },
  "url2": {
    "web": {"date": "date"},
    "news": {"date": "date"}
  },
  "date": "date"  <- PUT HERE ONLY TO KEEP THE TESTS QUIET
}

and it ignores the concept of cursor_field.

I bet all of this can produce incorrect results.
The slicing and the stored state have to be re-implemented using the right approach.
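
To illustrate what "slice over something comparable" could mean, here is a minimal sketch in plain Python that does not use the Airbyte CDK. Everything in it is an assumption made for illustration: the helper names (`date_slices`, `next_state`, `resume_from`), the `date` cursor field, the three-day window, and the flat state shape are hypothetical and are not taken from the connector's code. The idea is that slices are date windows, and the state holds only the largest cursor value seen so far.

```python
from datetime import date, timedelta
from typing import Any, Dict, Iterator, Mapping

CURSOR_FIELD = "date"  # assumed cursor field name, for illustration only


def date_slices(start: date, end: date, step_days: int = 3) -> Iterator[Dict[str, str]]:
    """Yield consecutive, non-overlapping date windows covering [start, end]."""
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=step_days - 1), end)
        yield {"start_date": current.isoformat(), "end_date": window_end.isoformat()}
        current = window_end + timedelta(days=1)


def next_state(current_state: Mapping[str, Any], latest_record: Mapping[str, Any]) -> Dict[str, str]:
    """Keep only the largest cursor value seen so far.

    ISO-8601 date strings sort lexicographically in date order,
    so a plain max() on the strings is enough here.
    """
    seen = latest_record[CURSOR_FIELD]
    previous = current_state.get(CURSOR_FIELD, seen)
    return {CURSOR_FIELD: max(previous, seen)}


def resume_from(state: Mapping[str, Any], default_start: date) -> date:
    """Pick the start date for the next sync from the saved state, if any."""
    value = state.get(CURSOR_FIELD)
    return date.fromisoformat(value) if value else default_start


# Usage: a second sync starts from the saved cursor instead of the beginning.
state: Dict[str, str] = {}
for record in [{"date": "2021-12-03"}, {"date": "2021-12-05"}, {"date": "2021-12-04"}]:
    state = next_state(state, record)
slices = list(date_slices(resume_from(state, date(2021, 1, 1)), date(2021, 12, 10)))
```

With this shape, url and type would become request parameters inside each date window rather than slice keys. Whether a single cursor is enough, or each site still needs its own cursor, depends on how the API reports data and is left open here.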
