🐛 Source Salesforce: Write binary file without decoding #16684
/test connector=connectors/source-salesforce
Build Passed
Test summary info:
@sajarin @grishick @marcosmarxm @tuliren
assigned to @koconder
Hi @Nakachi-S I'll be taking a look at this PR for you
@Nakachi-S a few quick things:
- Have you tested this directly yourself? The unit tests have no encoding or read/write tests, and I'm not sure whether you are able to write a unit test for read_with_chunks.
- See my feedback on how chunks are written: based on the code, I believe only the last chunk will be written.
- You need to update the changelog in docs/integrations/.
- You need to bump the Docker image version for your change.
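A unit test along the lines requested could look like the minimal sketch below. It is hypothetical: FakeResponse, download_to_file and strip_null_bytes are illustrative stand-ins, not the connector's real classes or methods, and the null-byte filter is assumed to simply drop b"\x00". It checks the two concerns raised here: that every chunk ends up in the file, and that multibyte characters survive arbitrary chunk boundaries.

```python
# Hypothetical unit-test sketch for a binary, chunked download.
# FakeResponse, download_to_file and strip_null_bytes are illustrative
# stand-ins; they are NOT the connector's real classes or methods.
import os
import tempfile
import unittest


class FakeResponse:
    """Mimics requests.Response.iter_content() for a fixed byte payload."""

    def __init__(self, payload: bytes):
        self.payload = payload

    def iter_content(self, chunk_size: int = 1):
        for start in range(0, len(self.payload), chunk_size):
            yield self.payload[start:start + chunk_size]


def strip_null_bytes(chunk: bytes) -> bytes:
    # Assumed behaviour of the connector's null-byte filter: drop b"\x00".
    return chunk.replace(b"\x00", b"")


def download_to_file(response, path: str, chunk_size: int) -> None:
    # Mirrors the loop in the diff: one "wb" open, raw bytes, no per-chunk decode.
    with open(path, "wb") as data_file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            data_file.write(strip_null_bytes(chunk))


class DownloadToFileTest(unittest.TestCase):
    def test_multibyte_text_survives_small_chunks(self):
        text = "Id,Name\n1,Café 日本\n"
        payload = text.encode("utf-8")
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "data.csv")
            # Chunk sizes 1, 2 and 3 split the 2- and 3-byte characters.
            for chunk_size in (1, 2, 3, 5, 1024):
                download_to_file(FakeResponse(payload), path, chunk_size)
                with open(path, encoding="utf-8") as f:
                    self.assertEqual(f.read(), text)

    def test_null_bytes_are_removed_and_all_chunks_kept(self):
        payload = b"a\x00b,c\nd,e\x00\n"
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "data.csv")
            download_to_file(FakeResponse(payload), path, chunk_size=2)
            with open(path, "rb") as f:
                # Every chunk is present; only the null bytes are gone.
                self.assertEqual(f.read(), b"ab,c\nd,e\n")
```

Run it with python -m unittest. Filtering null bytes chunk by chunk is safe on raw UTF-8 bytes, because the byte 0x00 never occurs inside a multibyte UTF-8 sequence.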
-LABEL io.airbyte.version=1.0.15
+LABEL io.airbyte.version=1.0.16
 for chunk in response.iter_content(chunk_size=chunk_size):
-    data_file.writelines(self.filter_null_bytes(self.decode(chunk)))
+    data_file.write(self.filter_null_bytes(chunk))
@Nakachi-S I'm just worried that, since the file was opened as "wb" and not "ab", switching from writelines() to write() (which is what bytes require) might overwrite the file at each chunk. Other than that, everything looks OK for this particular file.
@koconder
Thanks for your review.
A temporary file is created for each URL, and the chunks are written to it one after another.
Therefore, it is not necessary to use "ab", which is the append mode.
Also, regarding "See feedback on how chunks are written, based on the code I believe only the last chunk will be written": if you call write() repeatedly inside the with block, as shown below, the file is not overwritten; each call appends to it.
with open("tmp.txt", "wb") as f:
    f.write(b"Hello\n")
    f.write(b"World\n")
The resulting file contains:
Hello
World
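To make the distinction concrete, here is a minimal sketch (the function names are hypothetical) contrasting the pattern used in the PR, a single "wb" open with repeated write() calls, with reopening the file in "wb" mode for every chunk. Only the second pattern truncates the file each time, which is the overwrite the review was worried about.

```python
# Sketch: "wb" truncates when the file is OPENED, not on each write().
# Function names are hypothetical, for illustration only.
import os
import tempfile
from typing import Iterable


def write_chunks_single_handle(path: str, chunks: Iterable[bytes]) -> bytes:
    # One "wb" open: truncated once, then every write() appends to the file.
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    with open(path, "rb") as f:
        return f.read()


def write_chunks_reopening(path: str, chunks: Iterable[bytes]) -> bytes:
    # Re-opening with "wb" for every chunk truncates each time,
    # so only the last chunk survives.
    for chunk in chunks:
        with open(path, "wb") as f:
            f.write(chunk)
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_path = os.path.join(tmp_dir, "tmp.txt")
    print(write_chunks_single_handle(tmp_path, [b"Hello\n", b"World\n"]))  # b'Hello\nWorld\n'
    print(write_chunks_reopening(tmp_path, [b"Hello\n", b"World\n"]))      # b'World\n'
```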
@koconder
Yes, I tested this method and there were no problems.
I bumped the Docker image version and updated the documentation.
@Nakachi-S please revert the seed file changes and only keep the changes in the connector folder.
This reverts commit 93c66e8.
@koconder
airbyte-config/init/src/main/resources/seed/source_definitions.yaml (outdated, resolved)
PR approved, pending release.
/test connector=connectors/source-salesforce
Build Failed
Test summary info:
@Nakachi-S please resolve the conflicts with the changes made in #17001.
Hey @Nakachi-S. Thank you for making this PR. One of our engineers has pushed a change that should fix the underlying issue: #17001. We look forward to more PRs in the future!
What
#15950
#14659
https://discuss.airbyte.io/t/problem-in-salesforce-chunk-decoding/1757
How
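In PR #11692, the download_data method was changed to save the response to a temporary file in chunks of chunk_size bytes, decoding each chunk separately. A multibyte character that straddles a chunk boundary is then split across two decode calls and cannot be decoded correctly. I propose handling multibyte strings by saving the data as a binary file instead of decoding each chunk.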
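The failure mode can be reproduced without Salesforce. In this minimal sketch, the 4-byte chunk size and the sample CSV payload are illustrative assumptions, chunks() is a stand-in for response.iter_content(), and errors="replace" is used so the corruption shows up as U+FFFD instead of a UnicodeDecodeError (how the connector's own decode step handles errors is not modelled here).

```python
# Sketch: per-chunk decoding vs. writing raw bytes and decoding once.
# CHUNK_SIZE and the payload are illustrative assumptions.
import os
import tempfile

CHUNK_SIZE = 4
payload = "id,name\n1,日本\n".encode("utf-8")  # each CJK character is 3 bytes


def chunks(data: bytes, size: int):
    """Yield fixed-size byte chunks, like response.iter_content(chunk_size=size)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def decode_per_chunk(data: bytes, size: int) -> str:
    # Old approach: each chunk is decoded on its own, so a character
    # split across two chunks turns into replacement characters.
    return "".join(c.decode("utf-8", errors="replace") for c in chunks(data, size))


def write_binary_then_decode(data: bytes, size: int) -> str:
    # Approach proposed in this PR: write raw bytes to a temporary file,
    # then decode the complete file when it is read back.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as data_file:
            for chunk in chunks(data, size):
                data_file.write(chunk)
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)


print(write_binary_then_decode(payload, CHUNK_SIZE) == payload.decode("utf-8"))  # True
print(decode_per_chunk(payload, CHUNK_SIZE) == payload.decode("utf-8"))          # False
```

An incremental decoder (codecs.getincrementaldecoder("utf-8")()) would be another way to decode chunk by chunk without splitting characters; the PR instead defers decoding until the whole file is read.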
Recommended reading order
streams.py
🚨 User Impact 🚨
No impact expected.
Pre-merge Checklist
Updating a connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
README.md
bootstrap.md. See description and examples
docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name> command is passing
/publish command described here
Tests
Unit
Integration
Put your integration tests output here.
Acceptance
Put your acceptance tests output here.