Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

B2-40 support 100 Continue #300

Closed
wants to merge 34 commits into from
Closed

B2-40 support 100 Continue #300

wants to merge 34 commits into from

Conversation

emnoor-reef
Copy link

@emnoor-reef emnoor-reef commented Aug 9, 2023

The b2sdk/urllib3/_connection.py file is mostly copied from botocore. I have updated at some places to support urllib3>=2 (botocore pins urllib3<1.27).

@mjurbanski-reef mjurbanski-reef changed the base branch from master to AddApiToFolder August 9, 2023 15:00
@mjurbanski-reef mjurbanski-reef changed the base branch from AddApiToFolder to master August 9, 2023 15:00
@emnoor-reef emnoor-reef marked this pull request as ready for review August 12, 2023 16:45
@emnoor-reef emnoor-reef changed the title 100 continue B2-40 support 100 Continue Aug 12, 2023
Copy link

@mjurbanski-reef mjurbanski-reef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like reusing the opensource components here 💪
this is why I left most of the comments on License-related files - nitpicks mostly, but lets make sure we are playing by the rules correctly :)

b2sdk/requests/NOTICE Outdated Show resolved Hide resolved
b2sdk/urllib3/NOTICE Outdated Show resolved Hide resolved
b2sdk/urllib3/included_source_meta.py Outdated Show resolved Hide resolved
b2sdk/requests/included_source_meta.py Outdated Show resolved Hide resolved
b2sdk/urllib3/_connection.py Outdated Show resolved Hide resolved
Comment on lines 6 to 8
The following classes have been overriden for support HTTP 100-continue:
urllib3.connection.HTTPConnection, urllib3.connection.VerifiedHTTPSConnection,
urllib3.connectionpool.HTTPConnectionPool, urllib3.connectionpool.HTTPSConnectionPool

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are not our (Backblaze) changes as you are suggesting thru the header above!

IMO we should just say we are vendoring part of botocore that is relavant to supporting 100 Continue and list our changes to it below it.

b2sdk/b2http.py Outdated Show resolved Hide resolved
test/integration/test_raw_api.py Outdated Show resolved Hide resolved
Comment on lines 121 to 122
# Wait for 1 second for the server to send a response.
if urllib3.util.wait_for_read(self.sock, 2):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, this is a good case showing why putting an obvious is decremental
it says 1 second, but wait is actually 2 ;p

as to why I'm commenting on this: we may need this to be user-configurable
https://github.com/reef-technologies/b2-sdk-python/pull/252/files#diff-ac825af986ae146afcaa98597de5a9c07399afb45b45edbba3d2b7ab68db124eR26

Please for now just expose it so it is controllable for HTTPAdapterWithContinue user and use 10 as default since that value was chosen with pycurl for some reason.

Comment on lines 562 to 564
headers.update({
'Expect': '100-continue',
})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, what, we are not setting this header by default during upload etc?
I wonder what is the purpose of adding this in if we are not using it

please consult this with @pawelpolewicz if this should be enabled by default or turned off completely for now

I'm leaving out the "optional" option since IMO having it is for "greater good" (infrastructure wise ;) ) and having it always on may only negatively affect very small file uploads which are not the main use case of b2 AFAIK.

@@ -82,6 +83,11 @@ def request(self, method, url, body=None, headers=None, *args, **kwargs):
self._response_received = False
if headers.get('Expect', b'') in [b'100-continue', '100-continue']:
self._expect_header_set = True
timeout = headers.pop('X-Expect-100-Continue-Timeout-Seconds', self._continue_timeout)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: drop -Seconds part of the name; AFAIK there are already timeout settigns http headers, none of which adds -seconds (Cache-Control, Keep Alive etc)

Comment on lines 89 to 90
except (ValueError, TypeError):
pass

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally, I would prefer it explode; name collision seems unlikely for it to be breaking change

Copy link

@mjurbanski-reef mjurbanski-reef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solution looks good now, but I'm pretty sure there is still something shady going on in the test

also make sure default is set properly & add it to changelog please

decode_content: bool = False
decode_content: bool = False,
expect_100_continue: bool = True,
expect_100_continue_timeout_seconds: float = 10.0,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please rename to expect_100_timeout across the whole code base and make sure you added it to docstrings to methods which already have params in docstrings

Lets not do "expect 100 continue timeout" in one place and "expect 100 timeout" in another. Lets have one and stick with it.

@@ -351,6 +351,7 @@ def _get_upload_headers(self) -> bytes:
file_retention=self.file_retention,
legal_hold=self.legal_hold,
cache_control=self.cache_control,
expect_100_continue=False,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasn't it supposed to be ON by default with option of disabling?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't file_version a separate request than an upload request?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry my bad:)

also this isn't used in any request but in has_large_header from what I see

NOTES Outdated
@@ -0,0 +1,5 @@
response status line = b'HTTP/1.1 401 \r\n'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗑️

# This is our custom behavior. If the Expect header was
# set, it will trigger this custom behavior.
logger.debug("Waiting for 100 Continue response.")
# Wait for 10 seconds for the server to send a response.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this comment :) it is not accurate depending on configuration

@@ -351,6 +351,7 @@ def _get_upload_headers(self) -> bytes:
file_retention=self.file_retention,
legal_hold=self.legal_hold,
cache_control=self.cache_control,
expect_100_continue=False,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry my bad:)

also this isn't used in any request but in has_large_header from what I see

@emnoor-reef
Copy link
Author

I agree the test is a bit shady.

In the last commit I moved the test into a separate file. I added two more tests for the the configurations. Though I am still not happy with the tests.

Also, the fixture could be broken up into reusable parts. Currently it borrows all of it's code form test_raw_api. But I think that could be refactored with test_raw_api when that is done.

Comment on lines 94 to 96
@pytest.fixture(autouse=True)
def rest_sent_data(expect_100_setup):
expect_100_setup["sent_data"].clear()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it kinda seems to me like this test will work even if this was deleted

Comment on lines 82 to 90
"raw_api": raw_api,
"upload_url": upload_url,
"upload_auth_token": upload_auth_token,
"sse_b2_aes": sse_b2_aes,
"sent_data": sent_data,
"file_name": file_name,
"file_contents": file_contents,
"file_length": file_length,
"file_sha1": file_sha1,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like you crammed bunch of fixtures into single one
this doesn't encourage reusability

Comment on lines +962 to +963
:param expect_100_continue: whether to use 'Expect: 100-continue' header
:param expect_100_timeout: timeout of 100 response when expect_100_continue is True

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for another change here, but now that I look at it - wouldn't expect_100_timeout being set to 0 be basically the same think and we could skip adding expect_100_continue param altogether?

assert expect_100_setup["file_contents"] not in expect_100_setup["sent_data"]


def test_expect_100_timeout(expect_100_setup):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and where is the test that we actually don't send data before 100 Continue is received from the server, but after we get it, we do send data? That is the most important test; The "ignoring 100 Continue" is what we did so far without modification to urrlib&requests

Comment on lines 50 to 57
bucket_dict = raw_api.create_bucket(
api_url,
account_auth_token,
account_id,
bucket_name,
'allPublic',
is_file_lock_enabled=True,
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please reuse the single bucket for all tests here - there is not reason not to, and it will reduce the overall API calls i.e. make tests faster

Copy link

@mjurbanski-reef mjurbanski-reef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test/unit/botocore/test_awsrequest.py Outdated Show resolved Hide resolved
)
assert file_contents in http_sent_data
assert wait_mock.call_count == 1
args, _ = wait_mock.call_args

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats the purpose of this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot to remove that line :}

test/unit/botocore/test_awsrequest.py Outdated Show resolved Hide resolved
@emnoor-reef emnoor-reef closed this by deleting the head repository Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants