Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Source Salesforce: Write binary file without decoding #16684

Closed
wants to merge 6 commits into from

Conversation

Nakachi-S
Copy link
Contributor

@Nakachi-S Nakachi-S commented Sep 14, 2022

What

#15950
#14659

https://discuss.airbyte.io/t/problem-in-salesforce-chunk-decoding/1757

How

In #11692 PR, the download_data method implements saving to a temporary file by chunk size.

I propose a way to handle multibyte strings by saving the data as a binary file instead of decoding by chunk size.

Recommended reading order

  1. streams.py

🚨 User Impact 🚨

No impact expected.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here

Tests

Unit

スクリーンショット 2022-09-14 11 05 57

Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

@github-actions github-actions bot added the area/connectors Connector related issues label Sep 14, 2022
@Nakachi-S Nakachi-S changed the title 🐛 Write binary file without decoding. 🐛 Write binary file without decoding on Salesforce Sep 14, 2022
@Nakachi-S Nakachi-S marked this pull request as draft September 14, 2022 01:24
@Nakachi-S Nakachi-S changed the title 🐛 Write binary file without decoding on Salesforce 🐛 Source Salesforce: Write binary file without decoding Sep 14, 2022
@Nakachi-S Nakachi-S marked this pull request as ready for review September 14, 2022 02:03
@sajarin
Copy link
Contributor

sajarin commented Sep 14, 2022

/test connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/3054866558
✅ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/3054866558
Python tests coverage:

Name                                 Stmts   Miss  Cover
--------------------------------------------------------
source_salesforce/__init__.py            2      0   100%
source_salesforce/exceptions.py          8      1    88%
source_salesforce/api.py               151     19    87%
source_salesforce/streams.py           307     71    77%
source_salesforce/rate_limiting.py      22      6    73%
source_salesforce/source.py             91     35    62%
source_salesforce/utils.py               8      7    12%
--------------------------------------------------------
TOTAL                                  589    139    76%
Name                                 Stmts   Miss  Cover
--------------------------------------------------------
source_salesforce/utils.py               8      0   100%
source_salesforce/__init__.py            2      0   100%
source_salesforce/source.py             91      8    91%
source_salesforce/api.py               151     14    91%
source_salesforce/exceptions.py          8      1    88%
source_salesforce/rate_limiting.py      22      3    86%
source_salesforce/streams.py           307     42    86%
--------------------------------------------------------
TOTAL                                  589     68    88%
	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          10      4    60%   15-18
	 source_acceptance_test/config.py                        83      6    93%   78-80, 84-86
	 source_acceptance_test/conftest.py                     164    164     0%   6-282
	 source_acceptance_test/plugin.py                        48     48     0%   6-104
	 source_acceptance_test/tests/test_core.py              329    111    66%   39, 50-58, 63-70, 74-75, 79-80, 164, 202-219, 228-236, 240-245, 251, 284-289, 327-334, 374-376, 379, 439-448, 477-478, 484, 487, 520-530, 543-568, 573-577
	 source_acceptance_test/tests/test_full_refresh.py       52      2    96%   34, 65
	 source_acceptance_test/tests/test_incremental.py       121     25    79%   21-23, 29-31, 36-43, 48-61, 208-216
	 source_acceptance_test/utils/asserts.py                 37      2    95%   57-58
	 source_acceptance_test/utils/common.py                  77     17    78%   15-16, 24-30, 47-54, 64, 67
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       110     48    56%   23-26, 32, 36, 39-64, 67-69, 72-74, 77-79, 82-84, 87-89, 92-110, 144-146
	 source_acceptance_test/utils/json_schema_helper.py     105     13    88%   30-31, 38, 41, 65-68, 96, 120, 190-192
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1325    463    65%

Build Passed

Test summary info:

All Passed

@Nakachi-S
Copy link
Contributor Author

@sajarin
Thanks for running the tests!

@grishick @marcosmarxm @tuliren
Could you review this PR?

@sajarin
Copy link
Contributor

sajarin commented Sep 19, 2022

assigned to @koconder

@sajarin sajarin added the bounty-S Maintainer program: claimable small bounty PR label Sep 19, 2022
Copy link
Contributor

@vincentkoc vincentkoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Nakachi-S I'll be taking a look at this PR for you

Copy link
Contributor

@vincentkoc vincentkoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nakachi-S few quick things:

  • Have you tested this directly yourself? The unit tests have no encoding or read/write unit tests, not sure if you are able to write a unit test on the read_with_chunks
  • See feedback on how chunks are written, based on the code i believe only the last chunk will be written
  • You need to update the changelog in docs/integrations//.md including changelog. See changelog example
  • You need to bump the docker image version for your change using semantic versioning, so this change would be 1.0.16

for chunk in response.iter_content(chunk_size=chunk_size):
data_file.writelines(self.filter_null_bytes(self.decode(chunk)))
data_file.write(self.filter_null_bytes(chunk))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nakachi-S just worried that since the file was opened as wb and not ab the function writelines() that now needs to use write() which is what’s required for bytes might overwrite the file at each chunk. Other than that everything looks OK here for this particular file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@koconder
Thanks your review.

A temporary file is created for each URL, and chunks are continuously written to the file.
Therefore, it is not necessary to use "aw," which is an append mode.

Also about

See feedback on how chunks are written, based on the code i believe only the last chunk will be written

if you write "write" in the with clause as shown below, the result is not overwriting the file, but appending to it.

with open("tmp.txt", "wb") as f:
    f.wirte("Hello\n")
    f.write("World.)

this result is

Hello
World

@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Sep 20, 2022
@Nakachi-S
Copy link
Contributor Author

@koconder

Have you tested this directly yourself?

Yes, I tested this method and there were no problems.

You need to update the changelog in docs/integrations//.md including changelog. See changelog example
You need to bump the docker image version for your change

I bump docker image version and updated document.
Also, I have given the version of the seed file as shown in the following commit, is this necessary?
93c66e8

@vincentkoc
Copy link
Contributor

@Nakachi-S please revert the seed file changes, only keep to changes in the connector folder

@Nakachi-S
Copy link
Contributor Author

@koconder
I have reverted the seed file changes.

@vincentkoc
Copy link
Contributor

PR approved, pending release.

@sajarin sajarin changed the title 🐛 Source Salesforce: Write binary file without decoding 🐛 Source Salesforce: Write binary file without decoding Sep 21, 2022
@sajarin
Copy link
Contributor

sajarin commented Sep 21, 2022

/test connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/3098949493
❌ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/3098949493
🐛 https://gradle.com/s/xq33jx6uszdvs

Build Failed

Test summary info:

	 =========================== short test summary info ============================
	 FAILED unit_tests/api_test.py::test_rate_limit_bulk - TypeError: _read_stream...
	 FAILED unit_tests/api_test.py::test_rate_limit_rest - TypeError: _read_stream...
	 �[31m============ �[31m�[1m2 failed�[0m, �[32m50 passed�[0m, �[33m153 warnings�[0m�[31m in 94.35s (0:01:34)�[0m�[31m =============�[0m

@vincentkoc
Copy link
Contributor

@Nakachi-S please resolve conflicts, changes made in #17001

@sajarin sajarin requested review from sherifnada and removed request for sherifnada September 23, 2022 15:46
@sajarin
Copy link
Contributor

sajarin commented Sep 23, 2022

Hey @Nakachi-S.

Thank you for making this PR. One of our engineers has pushed a change that should fix the underlying issue; #17001. We look forward to more PRs in the future!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation bounty bounty-S Maintainer program: claimable small bounty PR community connectors/source/salesforce reward-50
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

5 participants