🐛 Source Salesforce: calls for distinct timestamp capped at 400 records (leads stream) #9883

Closed
nwayne94 opened this issue Jan 28, 2022 · 13 comments

Comments

@nwayne94

nwayne94 commented Jan 28, 2022

Environment

  • OS Version / Instance: Mac OS 11.5.2
  • Memory / Disk: 32 GB
  • Deployment: Docker
  • Airbyte Version: 0.35.10-alpha
  • Source name/version: Salesforce 0.1.23
  • Destination name/version: S3 0.2.7
  • Severity: Critical

Current Behavior

It appears that records from the Leads table are capped at 400 for each distinct timestamp. We already have a Salesforce connection through AppFlow, and we used it to confirm that certain timestamps should indeed have thousands of records rather than 400.

Screenshots
Screen Shot 2022-01-28 at 2 41 43 PM
Screen Shot 2022-01-28 at 2 41 23 PM
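
One way to check for the cap described above is to group the synced records by timestamp and look for groups that land exactly on 400. The snippet below is only a sketch: the field name `SystemModstamp`, the helper names, and the sample data are assumptions for illustration and are not taken from this issue.

```python
from collections import Counter


def records_per_timestamp(records, cursor_field="SystemModstamp"):
    """Count how many records share each distinct timestamp value.

    `cursor_field` is an assumed field name; use whichever timestamp
    column the synced Lead data actually carries.
    """
    return Counter(r[cursor_field] for r in records)


def suspicious_timestamps(counts, cap=400):
    """Return the timestamps whose record count is exactly the suspected cap."""
    return sorted(ts for ts, n in counts.items() if n == cap)


# Illustrative data only: 400 records on one timestamp, 3 on another.
sample = (
    [{"Id": f"A{i}", "SystemModstamp": "2022-01-01T00:00:00Z"} for i in range(400)]
    + [{"Id": f"B{i}", "SystemModstamp": "2022-01-02T00:00:00Z"} for i in range(3)]
)

counts = records_per_timestamp(sample)
flagged = suspicious_timestamps(counts)
print(flagged)
```

Many timestamps sitting at exactly 400 would support the cap theory, while a spread of counts around 400 would point elsewhere.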

@sherifnada sherifnada added the area/connectors, connectors/source/salesforce, and type/bug labels Jan 28, 2022
@alafanechere alafanechere changed the title from "Salesforce calls for distinct timestamp capped at 400 records" to "🐛 Source Salesforce: calls for distinct timestamp capped at 400 records" Jan 31, 2022
@alafanechere alafanechere changed the title from "🐛 Source Salesforce: calls for distinct timestamp capped at 400 records" to "🐛 Source Salesforce: calls for distinct timestamp capped at 400 records (leads stream)" Jan 31, 2022
@alafanechere
Contributor

alafanechere commented Jan 31, 2022

Hi @nwayne94, is your replication successful according to the Airbyte UI?

@nwayne94
Author

@alafanechere yes.

@bazarnov
Collaborator

bazarnov commented Feb 14, 2022

@nwayne94 Let me clarify some parts here:

  1. Have you tried updating source-salesforce to 0.1.23 (the latest version)?
  2. Have you tried another destination (not Redshift)?
  3. Have you checked the data from other streams, such as 'Account'?

So far I couldn't reproduce the issue on our side with the source connector. The best way to isolate the problem is to load the Lead stream into a simple Postgres instance, then check the result by querying the data.

@nwayne94
Author

@bazarnov

  1. We just updated to the latest version and are still seeing the same problem
  2. I misspoke; we are using S3 as the destination
  3. All the other tables we pull in are working fine

@bazarnov
Collaborator

@nwayne94
4. Did you use the REST or Bulk option to call the SF API from the connector?
5. Could you please run a distinct count of Id for the Lead object only in Salesforce, before and after the replication, to make sure all the Leads were pulled correctly? (Test this on the Lead stream only.)
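
The distinct-count check in point 5 can be sketched as a set comparison between the Ids on the Salesforce side and the Ids that reached the destination. On the Salesforce side the total could come from, for example, a SOQL query such as `SELECT COUNT() FROM Lead`. The function name `compare_ids` and the sample Ids are hypothetical, and how the two Id lists are exported is left open.

```python
def compare_ids(source_ids, synced_ids):
    """Compare distinct record Ids between the source and the synced output."""
    source, synced = set(source_ids), set(synced_ids)
    return {
        "source_distinct": len(source),
        "synced_distinct": len(synced),
        # Ids present in Salesforce but absent from the destination.
        "missing_from_sync": sorted(source - synced),
        # Ids in the destination that the source export did not contain.
        "unexpected_in_sync": sorted(synced - source),
    }


# Illustrative Ids only; a duplicate in the source list is collapsed by set().
report = compare_ids(["001", "002", "003", "003"], ["001", "003"])
print(report)
```

A non-empty `missing_from_sync` list also shows which records were dropped, so their timestamps can be checked for a shared value.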

@nwayne94
Author

nwayne94 commented Feb 17, 2022

@bazarnov
4. We are using whatever the default call option is in Airbyte. The UI version we are on (0.35.10-alpha) no longer provides the option to choose between the two
5. Screen Shot 2022-02-17 at 11 41 09 AM

Could this just be a rate-limiting problem? I ran full syncs starting from 2020-08-01, 2021-06-01, and 2021-08-01, and each generated ~200,000 records on the initial sync.
Screen Shot 2022-02-17 at 11 43 28 AM

@bazarnov
Collaborator

This could be a rate-limit issue. Please try syncing only the Lead stream and check the output, if you haven't done so already.

@nwayne94
Author

@bazarnov this is only with the Lead stream, sorry for not clarifying that

@bazarnov
Collaborator

@nwayne94 Just to be clear, does the issue still persist? Should we dig more into it?

@nwayne94
Author

@bazarnov yes, it is still an issue. When we initially talked to @marcosmarxm, he mentioned that it could be a pagination problem with the Lead stream 🤷
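
The thread never confirms the cause, but the pagination theory can be illustrated with a toy model. If a sync pages through records with a fixed page size and advances its cursor with a strict "timestamp greater than last seen" filter, any timestamp shared by more records than fit on one page gets truncated at the page size. Everything below (`fetch_page`, the page size of 400, the in-memory rows) is a hypothetical simulation and not the connector's actual code.

```python
def fetch_page(rows, after, limit, key):
    """Hypothetical paged API: rows whose key is strictly greater than
    `after`, ordered by (ts, id), truncated to `limit` rows."""
    matching = sorted(
        (r for r in rows if key(r) > after),
        key=lambda r: (r["ts"], r["id"]),
    )
    return matching[:limit]


def sync(rows, limit, key, start):
    """Page through `rows`, advancing the cursor to the last row's key."""
    out, cursor = [], start
    while True:
        page = fetch_page(rows, cursor, limit, key)
        if not page:
            return out
        out.extend(page)
        cursor = key(page[-1])


# Illustrative data: 1000 rows share one timestamp, 5 rows share a later one.
rows = (
    [{"id": f"{i:05d}", "ts": "2022-01-01T00:00:00Z"} for i in range(1000)]
    + [{"id": f"{i:05d}", "ts": "2022-01-02T00:00:00Z"} for i in range(1000, 1005)]
)

# Cursor on timestamp alone: the second page skips the rest of the first timestamp.
lossy = sync(rows, limit=400, key=lambda r: r["ts"], start="")

# Cursor on (timestamp, id): ties on the timestamp are broken by id, so nothing is skipped.
complete = sync(rows, limit=400, key=lambda r: (r["ts"], r["id"]), start=("", ""))

print(len(lossy), len(complete))
```

In this model the timestamp-only cursor returns exactly 400 rows for the crowded timestamp, which matches the symptom reported here, while the compound cursor returns every row. Whether the connector behaved this way would still need to be verified against its source.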

@bazarnov
Collaborator

bazarnov commented Mar 1, 2022

@nwayne94 Please sync the Leads stream only and share the full sync log here. Thank you.
I haven't been able to reproduce the issue yet. I'm trying to pin it down and need your help:

  1. Count the actual number of records on the Salesforce side using the Reporting functionality: generate a report of leads by Id, count the total number of records, and take a screenshot showing the number.
  2. Run a full sync and check the number of records in the output.
  3. Also share a screenshot of the connector configuration you use, specifically the "Salesforce Object filtering criteria (Optional)" values.

@misteryeo
Contributor

Following up here @nwayne94

@bazarnov bazarnov removed their assignment Oct 18, 2022
@marcosmarxm
Member

@nwayne94 I'm closing this in favor of #13658. Please open a new issue if the problem persists.

Projects
No open projects
Status: On Hold
Development

No branches or pull requests

8 participants