Google Search Console (GSC) Incremental stream isn't incremental #7479
Thanks for creating this issue; it helps us track them. The team will look into it soon.
Hi, @danieldiamond.
Could you please clarify what exactly the problem is so that I can take a deeper look at the connector code? At the moment I am unable to reproduce your problem.
Hi @yevhenii-ldv, thanks for investigating this! To confirm though, yes
May I ask you to check whether this was simply a transient connector failure? Please create a new GSC source and try to read the data incrementally 2 or 3 times (setting the start date to, for example, the beginning of autumn, so that it does not take too much of your time), and let me know what happens with the connector.
there are now ~30M records in the
Regarding this separate issue of no records in the final table, @ChristopheDuong, I wonder if this is due to the new incremental normalization process.
Hello @danieldiamond,
The last thing left was to reproduce your configuration, so I created a connection with GSC as the source and Redshift as the destination. Could you make sure that the source and destination connector versions are updated to the latest and re-create the connection? At the moment I am simply unable to recreate your problem.
Hi @yevhenii-ldv, thank you so much for the further investigation. Can I please confirm that you checked this using the service account configuration? Or at least, can you confirm without any doubt that there is no difference between the two authentication methods when syncing?
@danieldiamond I created a connection -> started syncing -> waited 1 minute after it finished -> started a second sync. |
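One way to sanity-check whether that second sync actually behaved incrementally is to compare per-sync record counts in the raw destination table: Airbyte stamps every raw record with `_airbyte_emitted_at`, so records can be grouped into sync batches by timestamp. A rough sketch (the 5-minute batch-gap threshold is an assumption, not anything Airbyte defines):

```python
from datetime import datetime, timedelta

def records_per_sync(emitted_at_timestamps, gap=timedelta(minutes=5)):
    """Group raw-record timestamps into sync batches.

    Records emitted within `gap` of the previous one are assumed to
    belong to the same sync attempt; returns the record count per batch.
    """
    counts = []
    previous = None
    for ts in sorted(emitted_at_timestamps):
        if previous is None or ts - previous > gap:
            counts.append(0)  # a new sync batch begins
        counts[-1] += 1
        previous = ts
    return counts

# Two bursts of records one hour apart -> two sync batches.
base = datetime(2021, 10, 1, 12, 0)
stamps = [base + timedelta(seconds=i) for i in range(100)]            # first sync
stamps += [base + timedelta(hours=1, seconds=i) for i in range(100)]  # second sync
print(records_per_sync(stamps))  # [100, 100]
```

If the second batch is roughly the same size as the first over an unchanged date range, the stream re-read everything, i.e. it behaved like a full refresh rather than an incremental sync.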
@yevhenii-ldv looking at your screenshot, I see
Also posting the connection schema below to confirm it's configured correctly:
{
"connectionId": "409bd766-2f56-4be7-95b5-c96a160f624b",
"name": "default",
"namespaceDefinition": "customformat",
"namespaceFormat": "google_search_console",
"prefix": "",
"sourceId": "f1063ef2-af11-412c-8158-538e7154f755",
"destinationId": "744cf008-8d04-4d35-b4b8-daa5e2f4130b",
"operationIds": [
"c123bff8-264e-4074-9707-a7f406385fef"
],
"syncCatalog": {
"streams": [
{
"stream": {
"name": "search_analytics_by_page",
"jsonSchema": {
"type": "object",
"$schema": "http://json-schema.org/draft-07/schema#",
"properties": {
"ctr": {
"type": [
"null",
"number"
],
"multipleOf": 1e-25
},
"date": {
"type": [
"null",
"string"
],
"format": "date"
},
"page": {
"type": [
"null",
"string"
]
},
"clicks": {
"type": [
"null",
"integer"
]
},
"position": {
"type": [
"null",
"number"
],
"multipleOf": 1e-25
},
"site_url": {
"type": [
"null",
"string"
]
},
"impressions": {
"type": [
"null",
"integer"
]
},
"search_type": {
"type": [
"null",
"string"
]
}
}
},
"supportedSyncModes": [
"full_refresh",
"incremental"
],
"sourceDefinedCursor": true,
"defaultCursorField": [
"date"
],
"sourceDefinedPrimaryKey": [],
"namespace": null
},
"config": {
"syncMode": "incremental",
"cursorField": [
"date"
],
"destinationSyncMode": "append",
"primaryKey": [],
"aliasName": "search_analytics_by_page",
"selected": true
}
},
{
"stream": {
"name": "sites",
"jsonSchema": {
"type": "object",
"$schema": "http://json-schema.org/draft-07/schema#",
"properties": {
"siteUrl": {
"type": [
"null",
"string"
]
},
"permissionLevel": {
"type": [
"null",
"string"
]
}
}
},
"supportedSyncModes": [
"full_refresh"
],
"sourceDefinedCursor": null,
"defaultCursorField": [],
"sourceDefinedPrimaryKey": [],
"namespace": null
},
"config": {
"syncMode": "full_refresh",
"cursorField": [],
"destinationSyncMode": "overwrite",
"primaryKey": [],
"aliasName": "sites",
"selected": true
}
}
]
},
"schedule": null,
"status": "active",
"resourceRequirements": {
"cpu_request": null,
"cpu_limit": null,
"memory_request": null,
"memory_limit": null
}
}
@danieldiamond In my last testing, I used the following versions:
I'm thinking the connector might still be reading all the records (i.e., not doing an incremental load), but then normalization only processes the latest records via the new incremental normalization process and the "_airbyte_ab_id" column, which would mask the underlying issue.
a new value for that column on every read. Thus, I don't think incremental normalization would be able to mask these based on the
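To illustrate the point about `_airbyte_ab_id`: Airbyte assigns each raw record a fresh random UUID when it is emitted, so a record re-read by a full refresh gets a different id, and deduplication keyed on that column would keep both copies rather than hide the duplicate. A minimal sketch (the `emit` helper is hypothetical; the column names match Airbyte's raw tables):

```python
import uuid

def emit(record):
    """Wrap a source record the way Airbyte stores it in a raw table."""
    return {"_airbyte_ab_id": str(uuid.uuid4()), "_airbyte_data": record}

row = {"date": "2021-10-01", "page": "/home", "clicks": 3}
first_sync = emit(row)
second_sync = emit(row)  # same source record, re-read by a full refresh

# Identical payloads, but distinct ids -> dedup on _airbyte_ab_id keeps both.
print(first_sync["_airbyte_data"] == second_sync["_airbyte_data"])   # True
print(first_sync["_airbyte_ab_id"] == second_sync["_airbyte_ab_id"])  # False
```

So if the raw table is growing by the full record count on every sync, that growth should be visible regardless of what incremental normalization does downstream.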
Hi @danieldiamond!
Okay, thanks @yevhenii-ldv, but to be sure, this wasn't troubleshot on pre- versions? I'm not seeing much point in trying to get this resolved on previous versions; a quick search of @ChristopheDuong, am I able to upgrade to
Cannot reproduce the issue using Airbyte
@yevhenii-ldv I'm hoping I'm just doing something silly here that you can help with.
I checked the destination and max(date) is
Environment
Is this your first time deploying Airbyte: no
OS Version / Instance: Linux EC2 m5.2xlarge
Deployment: Docker
Airbyte Version: 0.30.20-alpha
Source name: GSC 0.1.6
Destination: Redshift 0.3.19
Description: Finally able to try out the GSC connector. Unfortunately, incremental syncs are not working properly; I think it could be linked to improper setting of state.
Current Behavior
Full refresh on streams configured to be incremental
Expected Behavior
Incremental sync
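For reference, the expected incremental behavior can be sketched as a simple cursor loop: the connector keeps a `date` cursor in its saved state and, on the next sync, emits only records newer than that cursor before advancing it. This is a simplified, hypothetical model of a source-defined cursor, not the actual GSC connector code:

```python
def incremental_read(records, state):
    """Emit only records newer than the saved cursor, then advance it.

    `state` is a dict like {"date": "2021-09-01"}; ISO dates compare
    correctly as strings, so no date parsing is needed here.
    """
    cursor = state.get("date", "")
    emitted = [r for r in records if r["date"] > cursor]
    if emitted:
        state["date"] = max(r["date"] for r in emitted)
    return emitted, state

records = [{"date": "2021-09-01"}, {"date": "2021-09-02"}, {"date": "2021-09-03"}]
out1, state = incremental_read(records, {})     # first sync: everything
out2, state = incremental_read(records, state)  # second sync: nothing new
print(len(out1), len(out2))  # 3 0
```

The reported behavior corresponds to the cursor never being saved or honored, so every sync starts from an empty state and re-emits all records.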
Logs
LOG
Logs are too large; I stored them here: https://airbytehq.slack.com/archives/C01MFR03D5W/p1635482528310300
Steps to Reproduce
Are you willing to submit a PR?