PycURL as an alternative to requests #252

Closed
53 commits
2d86ba0
Pycurl PoC
agoncharov-reef Aug 31, 2022
74dea2d
Add pycurl to requirements
agoncharov-reef Aug 31, 2022
e772660
Fix empty headers
agoncharov-reef Aug 31, 2022
accf798
Compression & other fixes
agoncharov-reef Aug 31, 2022
814cbcc
Add Expect: 100-continue test
agoncharov-reef Sep 5, 2022
89098e1
Fix lint
agoncharov-reef Sep 5, 2022
172c4c2
Add expect-100 timeout
agoncharov-reef Sep 5, 2022
4149a15
SessionProtocol and curl as optional dependency
kkalinowski-reef Nov 22, 2022
315f0bc
Additional CurlSession variables for configuration
kkalinowski-reef Nov 22, 2022
8e9f5d8
Streaming added
kkalinowski-reef Nov 23, 2022
cff2fad
Curl cleanup
kkalinowski-reef Nov 24, 2022
34c5a6b
Using email.parser to parse http headers, allowing for streamed HEAD …
kkalinowski-reef Nov 25, 2022
07ded05
Fix for improper Curl.getinfo during CurlMulti.perform and documentation
kkalinowski-reef Nov 25, 2022
5782c65
Removed curl from general requirements
kkalinowski-reef Nov 25, 2022
5750128
Merge branch 'master' into pycurl
kkalinowski-reef Nov 25, 2022
d080fa2
Changelog updated
kkalinowski-reef Nov 25, 2022
636eb30
CurlResponse url added
kkalinowski-reef Nov 25, 2022
77333b5
Environmental variables to control used http backend
kkalinowski-reef Nov 28, 2022
2c7512e
CI fix - sybmols in condition renamed, renamed http-protocol to http-…
kkalinowski-reef Nov 28, 2022
d1de188
Reformat for linter
kkalinowski-reef Nov 28, 2022
152b61c
Nox no longer installs curl on CI without actual libcurl env variable.
kkalinowski-reef Nov 28, 2022
7769d36
libcurl enabled only on ubuntu in CI, installation fixed
kkalinowski-reef Nov 28, 2022
fedefb4
Fixed issues with non-libcurl versions
kkalinowski-reef Nov 28, 2022
de7dbc8
Replaced functool.cache with functool.lru_cache available from 3.2
kkalinowski-reef Nov 28, 2022
7638c16
Added maxsize to functools.lru_cache to ensure backward compatibility
kkalinowski-reef Nov 28, 2022
c3589fe
Removed typing.Literal, as unavailable on 3.7
kkalinowski-reef Nov 28, 2022
e9001a1
Debug piece of code to check what breaks the CI
kkalinowski-reef Nov 28, 2022
389edd8
Fix for problems with downloading files in parallel
kkalinowski-reef Nov 28, 2022
5c86d48
Disabled tests of libcurl with pypy on ubuntu
kkalinowski-reef Nov 28, 2022
3ba5c48
Comment on imported modules for session backends updated
kkalinowski-reef Nov 28, 2022
a526d14
Silencing warning about an operation on closed file
kkalinowski-reef Nov 28, 2022
0806069
PR fixes
kkalinowski-reef Dec 13, 2022
93ff92a
PR fixes - additional comments and explanations
kkalinowski-reef Dec 13, 2022
b19eba1
Unit tests for core concepts of curl
kkalinowski-reef Dec 14, 2022
647eea9
Merge branch 'master' into pycurl
kkalinowski-reef Dec 14, 2022
149a9af
Extracted streamed bytes and friends, ensured that CurlManager tests …
kkalinowski-reef Dec 14, 2022
8c0e2ff
PR fixes
kkalinowski-reef Dec 14, 2022
77ec3d1
PR fixes
kkalinowski-reef Dec 14, 2022
0d3ffb4
Documentation for session protocol
kkalinowski-reef Dec 15, 2022
14aae11
Handling adapters
kkalinowski-reef Dec 15, 2022
d3d3721
Missing typing added
kkalinowski-reef Dec 15, 2022
0e52e11
Linter fixes
kkalinowski-reef Dec 15, 2022
8a66599
Handling cookies across all requests sent via CurlSession
kkalinowski-reef Dec 15, 2022
67ff8fb
Proper adapter matching, including `''` scheme adapter.
kkalinowski-reef Dec 15, 2022
caf16fe
Linter fixes
kkalinowski-reef Dec 15, 2022
836f44e
100-continue test confirming that no data is read
kkalinowski-reef Dec 15, 2022
e672dc5
Merge branch 'master' into pycurl
kkalinowski-reef Dec 19, 2022
4d470af
Caching Curl handles
kkalinowski-reef Dec 19, 2022
51dd531
Configuration upgrade
kkalinowski-reef Dec 19, 2022
b9b3278
ApiVer handling of a new ApiConfig format
kkalinowski-reef Dec 19, 2022
2240c25
Linter fix
kkalinowski-reef Dec 19, 2022
d5089c9
Merge branch 'master' into pycurl
kkalinowski-reef Jan 13, 2023
c175275
Clarification on when the warning appears for a closed stream being f…
kkalinowski-reef Jan 13, 2023
16 changes: 15 additions & 1 deletion .github/workflows/ci.yml
@@ -84,7 +84,8 @@ jobs:
       fail-fast: false
       matrix:
         os: ["ubuntu-latest", "macos-latest", "windows-latest"]
-        python-version: ["3.7", "3.8", "3.9", "3.10", "3.11.0", "pypy-3.7", "pypy-3.8"]
+        python-version: ["3.7", "3.8", "3.9", "3.10", "3.11", "pypy-3.7", "pypy-3.8"]
+        http-backend: ["B2_USE_LIBCURL", "B2_USE_REQUESTS"]
         exclude:
           - os: "macos-latest"
             python-version: "pypy-3.7"
@@ -94,6 +95,13 @@ jobs:
             python-version: "pypy-3.8"
           - os: "windows-latest"
             python-version: "pypy-3.8"
+          - os: "windows-latest"
+            http-backend: "B2_USE_LIBCURL"
+          - os: "macos-latest"
+            http-backend: "B2_USE_LIBCURL"
+          - os: "ubuntu-latest"
+            python-version: "pypy-3.8"
+            http-backend: "B2_USE_LIBCURL"
     steps:
       - uses: actions/checkout@v3
         with:
@@ -105,13 +113,19 @@ jobs:
           cache: "pip"
       - name: Install dependencies
         run: python -m pip install --upgrade nox pip setuptools
+      - name: Install libcurl on Ubuntu
+        if: ${{ matrix.os == 'ubuntu-latest' && matrix.http-backend == 'B2_USE_LIBCURL' }}
+        run: sudo apt-get install -y libcurl4-openssl-dev
       - name: Run unit tests
         run: nox -vs unit
         env:
           SKIP_COVERAGE: ${{ startsWith(matrix.python-version, env.SKIP_COVERAGE_PYTHON_VERSION_PREFIX) }}
+          ${{ matrix.http-backend}}: "1"
       - name: Run integration tests
         if: ${{ env.B2_TEST_APPLICATION_KEY != '' && env.B2_TEST_APPLICATION_KEY_ID != '' }}
         run: nox -vs integration -- --dont-cleanup-old-buckets
+        env:
+          ${{ matrix.http-backend}}: "1"
   doc:
     needs: build
     runs-on: ubuntu-latest
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -14,6 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 * Raising `PotentialS3EndpointPassedAsRealm` when a specific misconfiguration is suspected
 * Add `large_file_sha1` support
 * Add support for incremental upload and sync
+* Ability to use libcurl (via pycurl) as http backend (experimental)

 ### Fixed
 * Removed information about replication being in closed beta
2 changes: 1 addition & 1 deletion README.md
@@ -27,7 +27,7 @@ b2sdk follows [Semantic Versioning](https://semver.org/) policy, so in essence t
 Therefore when setting up b2sdk as a dependency, please make sure to match the version appropriately, for example you could put this in your `requirements.txt` to make sure your code is compatible with the `b2sdk` version your user will get from pypi:

 ```
-b2sdk>=0.0.0,<1.0.0
+b2sdk>=1.0.0,<2.0.0
 ```

 # Release History
22 changes: 15 additions & 7 deletions b2sdk/api_config.py
@@ -2,16 +2,21 @@
 #
 # File: b2sdk/api_config.py
 #
-# Copyright 2021 Backblaze Inc. All Rights Reserved.
+# Copyright 2022 Backblaze Inc. All Rights Reserved.
 #
 # License https://www.backblaze.com/using_b2_code.html
 #
 ######################################################################

-from typing import Optional, Callable, Type
-import requests
+from typing import Optional, Type

 from .raw_api import AbstractRawApi, B2RawHTTPApi
+from .utils.session_config import SESSION_CONFIG, SessionConfig
+from .utils.session_protocol import SessionProtocolFactory, get_session_protocol_factories
+
+SESSION_FACTORIES = get_session_protocol_factories()
+assert SESSION_FACTORIES.enabled, f'There are no session protocols available. Errors: {SESSION_FACTORIES.disabled}'
+DEFAULT_SESSION_FACTORY = SESSION_FACTORIES.enabled[0]


 class B2HttpApiConfig:
@@ -20,23 +25,26 @@ class B2HttpApiConfig:

     def __init__(
         self,
-        http_session_factory: Callable[[], requests.Session] = requests.Session,
+        http_session_factory_base: SessionProtocolFactory = DEFAULT_SESSION_FACTORY,
         install_clock_skew_hook: bool = True,
         user_agent_append: Optional[str] = None,
         _raw_api_class: Optional[Type[AbstractRawApi]] = None,
-        decode_content: bool = False
+        decode_content: bool = False,
+        session_config: SessionConfig = SESSION_CONFIG,
     ):
         """
         A structure with params to be passed to low level API.

-        :param http_session_factory: a callable that returns a requests.Session object (or a compatible one)
+        :param http_session_factory_base: a callable that returns a requests.Session object (or a compatible one)
+                                          conforming to a provided SessionConfig input.
         :param install_clock_skew_hook: if True, install a clock skew hook
         :param user_agent_append: if provided, the string will be appended to the User-Agent
         :param _raw_api_class: AbstractRawApi-compliant class
         :param decode_content: If true, the underlying http backend will try to decode encoded files when downloading,
                                based on the response headers
+        :param session_config: Configuration for given session factory.
         """
-        self.http_session_factory = http_session_factory
+        self.http_session_factory = lambda: http_session_factory_base(session_config)
         self.install_clock_skew_hook = install_clock_skew_hook
         self.user_agent_append = user_agent_append
         self.raw_api_class = _raw_api_class or self.DEFAULT_RAW_API_CLASS
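Note on the `lambda` above: it turns a one-argument factory (taking a session config) into the zero-argument factory the rest of the code already expects. A minimal sketch of that binding follows. The `SessionConfig` fields, `DummySession`, and `bind_factory` are hypothetical stand-ins for illustration, since the real `SessionConfig` and `SessionProtocolFactory` are not shown in this diff.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SessionConfig:
    # Hypothetical field; the real SessionConfig is not part of the visible diff.
    timeout_seconds: float = 30.0


class DummySession:
    # Stand-in for a requests.Session-like object built from a config.
    def __init__(self, config: SessionConfig):
        self.config = config


def bind_factory(
    factory_base: Callable[[SessionConfig], DummySession],
    config: SessionConfig,
) -> Callable[[], DummySession]:
    # Same idea as `lambda: http_session_factory_base(session_config)`:
    # capture the config once, build a fresh session on every call.
    return lambda: factory_base(config)


make_session = bind_factory(DummySession, SessionConfig(timeout_seconds=5.0))
first, second = make_session(), make_session()
```

Each call yields a new session object, while all sessions share the single config captured at construction time.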
13 changes: 1 addition & 12 deletions b2sdk/b2http.py
@@ -16,7 +16,6 @@

 import arrow
 import requests
-from requests.adapters import HTTPAdapter
 import time

 from typing import Any, Dict, Optional
@@ -27,7 +26,7 @@
     InvalidJsonResponse, PotentialS3EndpointPassedAsRealm
 )
 from .api_config import B2HttpApiConfig, DEFAULT_HTTP_API_CONFIG
-from .requests import NotDecompressingResponse
+from .utils.not_decompresing_http_adapter import NotDecompressingHTTPAdapter
 from .version import USER_AGENT

 logger = logging.getLogger(__name__)
@@ -502,16 +501,6 @@ def _translate_and_retry(cls, fcn, try_count, post_params=None):
         return cls._translate_errors(fcn, post_params)


-class NotDecompressingHTTPAdapter(HTTPAdapter):
-    """
-    HTTP adapter that uses :class:`b2sdk.requests.NotDecompressingResponse` instead of the default
-    :code:`requests.Response` class.
-    """
-
-    def build_response(self, req, resp):
-        return NotDecompressingResponse.from_builtin_response(super().build_response(req, resp))
-
-
 def test_http():
     """
     Run a few tests on error diagnosis.
10 changes: 9 additions & 1 deletion b2sdk/stream/wrapper.py
@@ -51,7 +51,15 @@ def flush(self):
         """
         Flush the stream.
         """
-        self.stream.flush()
+        # By default, the io.IOBase finalizer performs a flush and then closes a stream.
+        # Since this class is not expected to manage the lifetime of a provided stream object,
+        # the underlying stream can already be closed by the time `__del__` is called. That led
+        # to a warning being printed on Python 3.11 in certain cases.
+        # Since this class is expected to behave like a stream, it's better to provide
+        # this check than to re-implement required features (context manager as of
+        # the moment of writing this).
+        if not self.stream.closed:
+            self.stream.flush()

     def readable(self):
         return self.stream.readable()
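To illustrate the guard in `flush()`, here is a minimal sketch with a simplified wrapper. `StreamWrapper` below is a reduced stand-in written for this note, not the class from `b2sdk/stream/wrapper.py`; it only assumes the wrapper derives from `io.IOBase` and does not own the wrapped stream.

```python
import io


class StreamWrapper(io.IOBase):
    """Simplified stand-in: wraps a stream without managing its lifetime."""

    def __init__(self, stream):
        super().__init__()
        self.stream = stream

    def flush(self):
        # Skip the flush when the owner has already closed the wrapped stream;
        # flushing a closed io.BytesIO would raise ValueError.
        if not self.stream.closed:
            self.stream.flush()


inner = io.BytesIO(b'data')
wrapper = StreamWrapper(inner)
inner.close()    # the owner closes the stream first
wrapper.flush()  # safe: the guard turns this into a no-op
wrapper.close()  # IOBase.close() calls flush() internally, also safe
```

Without the `closed` check, both the explicit `flush()` and the implicit one performed by `close()` (or by the finalizer) would hit the already-closed inner stream.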
6 changes: 6 additions & 0 deletions b2sdk/utils/__init__.py
@@ -501,3 +501,9 @@ def iterator_peek(iterator: Iterator[T], count: int) -> Tuple[List[T], Iterator[
 assert disable_trace
 assert limit_trace_arguments
 assert trace_call
+
+
+def str_to_bool(in_str: str) -> bool:
+    in_str_lower = in_str.lower()
+    yes_values = {'y', 'yes', '1', 't', 'true'}
+    return in_str_lower in yes_values
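A short sketch of how `str_to_bool` behaves, and how a flag such as the `B2_USE_LIBCURL` variable set in the CI workflow might be read with it. The `env_flag` helper and its default handling are assumptions made for illustration; the code that actually reads the backend variables is not part of the visible diff.

```python
import os


def str_to_bool(in_str: str) -> bool:
    # Same logic as the helper added in this diff.
    return in_str.lower() in {'y', 'yes', '1', 't', 'true'}


def env_flag(name: str, default: bool = False) -> bool:
    # Hypothetical helper: an unset variable falls back to `default`,
    # anything else is interpreted by str_to_bool.
    value = os.environ.get(name)
    return default if value is None else str_to_bool(value)


use_libcurl = env_flag('B2_USE_LIBCURL')
```

Note that the comparison is case-insensitive but does not strip whitespace, and any value outside the accepted set (including `'on'` or an empty string) is treated as false.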
129 changes: 129 additions & 0 deletions b2sdk/utils/cookie_jar.py
@@ -0,0 +1,129 @@
+######################################################################
+#
+# File: b2sdk/utils/cookie_jar.py
+#
+# Copyright 2022 Backblaze Inc. All Rights Reserved.
+#
+# License https://www.backblaze.com/using_b2_code.html
+#
+######################################################################
+import email.parser
+import threading
+import urllib.parse
+from dataclasses import (
+    dataclass,
+    field,
+)
+from http.cookiejar import CookieJar as HttpCookieJar
+from typing import (
+    Iterator,
+    List,
+    Tuple,
+)
+
+from requests.structures import CaseInsensitiveDict
+
+
+class CookieJar:
+    """
+    Wrapper that handles cookies in an interface-agnostic way.
+    All you need is to provide url and headers on one end,
+    and you'll receive strings on another end.
+
+    This class is thread-safe.
+    """
+
+    # Mocks urllib.request.Request interface required by
+    # http.cookiejar.CookieJar.extract_cookies and http.cookiejar.CookieJar.add_cookie_header
+    @dataclass
+    class _Request:
+        url: str
+        origin_req_host: str
+
+        headers: CaseInsensitiveDict = field(default_factory=CaseInsensitiveDict)
+        unverifiable: bool = False
+        cookies: List[str] = field(default_factory=list)
+
+        parsed_url = None
+
+        def __post_init__(self):
+            self.parsed_url = urllib.parse.urlparse(self.url)
+
+        @property
+        def type(self) -> str:
+            return self.parsed_url.scheme
+
+        @property
+        def host(self) -> str:
+            return self.parsed_url.hostname
+
+        def has_header(self, header_name: str) -> bool:
+            return header_name in self.headers
+
+        def get_header(self, header_name: str) -> str:
+            return self.headers[header_name]
+
+        def header_items(self) -> Iterator[Tuple[str, str]]:
+            for pair in self.headers.items():
+                yield pair
+
+        def add_unredirected_header(self, header_name, header_value) -> None:
+            # It is assumed that CookieJar will only touch cookies in this way.
+            self.cookies.append(header_value)
+            self.headers[header_name] = header_value
+
+        def get_full_url(self) -> str:
+            return self.url
+
+    # Mocks urllib.request.HTTPResponse interface
+    # required by http.cookiejar.CookieJar.extract_cookies
+    @dataclass
+    class _Response:
+        headers: List[Tuple[str, str]]
+
+        def info(self) -> email.message.EmailMessage:
+            message = email.message.EmailMessage()
+            for key, value in self.headers:
+                message.add_header(key, value)
+            return message
+
+    def __init__(self):
+        self.lock = threading.Lock()
+        # Default policy is to support netscape and to not support Set-Cookie2.
+        # This way we know that we should receive only one set of cookies.
+        self.jar = HttpCookieJar()
+        self.original_host = None
+
+    def add_headers(self, url: str, headers: List[Tuple[str, str]]) -> None:
+        """
+        Add Cookies
+
+        Headers are filtered and cookies are assigned to the url.
+        If this is the very first query, host is also assumed
+        to be the original host for purposes of future requests.
+        """
+        with self.lock:
+            response = self._Response(headers)
+            request = self._Request(url, self.original_host)
+
+            if self.original_host is None:
+                self.original_host = request.host
+
+            # Provided classes meet minimal interface requirements
+            self.jar.extract_cookies(response, request)  # noqa
+
+    def iter_cookies(self, url: str) -> Iterator[str]:
+        """
+        Fetches all the cookies from the jar for given url.
+        """
+        with self.lock:
+            request = self._Request(url, self.original_host)
+            # Provided class meets minimal interface requirements
+            self.jar.add_cookie_header(request)  # noqa
+            for cookie in request.cookies:
+                yield cookie
+
+    def clear(self) -> None:
+        with self.lock:
+            self.original_host = None
+            self.jar.clear()
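For readers unfamiliar with the standard-library interface that `_Request` and `_Response` mimic, here is a minimal sketch of the same round trip using only `http.cookiejar`. It is not the PR's wrapper: it uses a real `urllib.request.Request` instead of the `_Request` mock, and `FakeResponse`, the URL, and the cookie value are made up for illustration.

```python
import email.message
import urllib.request
from http.cookiejar import CookieJar


class FakeResponse:
    """Minimal response object: CookieJar.extract_cookies only calls info()."""

    def __init__(self, headers):
        self._headers = headers

    def info(self):
        # http.cookiejar reads Set-Cookie headers via message.get_all().
        message = email.message.Message()
        for key, value in self._headers:
            message.add_header(key, value)
        return message


jar = CookieJar()
url = 'https://example.com/api'

# Step 1: store the cookie announced by a (fake) server response.
jar.extract_cookies(
    FakeResponse([('Set-Cookie', 'session=abc123; Path=/')]),
    urllib.request.Request(url),
)

# Step 2: let the jar attach matching cookies to a follow-up request.
follow_up = urllib.request.Request(url)
jar.add_cookie_header(follow_up)
cookie_header = follow_up.get_header('Cookie')
```

The wrapper in this diff follows the same two steps, but collects the values passed to `add_unredirected_header` so that they can be handed to a non-`urllib` backend as plain strings.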