
Use the new HttpResponse that replaces ResponseData in web_poet #67

Merged
5 commits merged into master on May 7, 2022

Conversation

BurnzZ (Contributor) commented Mar 28, 2022

Reference PR from web_poet: scrapinghub/web-poet#30.

This also uses the small enhancement in scrapinghub/web-poet#33.

Checklist before release:

  • Remove the repo references in setup.py and tox.ini that were added to keep CI from breaking on this PR

@BurnzZ BurnzZ requested a review from kmike March 28, 2022 06:54
codecov bot commented Mar 28, 2022

Codecov Report

Merging #67 (a254ba4) into master (0965c68) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master      #67   +/-   ##
=======================================
  Coverage   95.96%   95.96%           
=======================================
  Files           9        9           
  Lines         372      372           
=======================================
  Hits          357      357           
  Misses         15       15           
Impacted Files                         Coverage Δ
scrapy_poet/__init__.py                100.00% <ø> (ø)
scrapy_poet/middleware.py              100.00% <100.00%> (ø)
scrapy_poet/page_input_providers.py    94.00% <100.00%> (ø)


    """Build a ``web_poet.HttpResponse`` instance using a Scrapy ``Response``"""
    return [
        HttpResponse(
Member:

Suggested change:

    -        HttpResponse(
    +        web_poet.HttpResponse(

            response_data["url"],
            response_data["body"],
            status=response_data["status"],
            headers=HttpResponseHeaders.from_bytes_dict(response_data["headers"]),
kmike (Member):

Why is HttpResponseHeaders.from_bytes_dict needed here?

BurnzZ (Contributor, author):

@kmike It's because Scrapy headers look like this:

scrapy_headers = {
    b"Content-Encoding": [b"gzip", b"br"],
    b"Content-Type": [b"text/html"],
    b"content-length": b"648",
}

Reference for the motivation: scrapinghub/web-poet#33 (comment)
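
For illustration, here is a minimal standalone sketch of the kind of normalization ``HttpResponseHeaders.from_bytes_dict`` performs on such a bytes dict (this is not web_poet's actual implementation, just an approximation of the behavior described above):

```python
def normalize_bytes_headers(raw):
    """Flatten a Scrapy-style bytes dict into (name, value) string pairs.

    Each key maps to either a single bytes value or a list of bytes
    values, as in the scrapy_headers example above; multi-valued
    headers expand into one pair per value.
    """
    pairs = []
    for key, value in raw.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            pairs.append((key.decode("utf-8"), v.decode("utf-8")))
    return pairs

scrapy_headers = {
    b"Content-Encoding": [b"gzip", b"br"],
    b"Content-Type": [b"text/html"],
    b"content-length": b"648",
}
```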

kmike (Member):

If I'm not wrong, the result here is not coming from Scrapy; it's part of the serialization / deserialization for the cache.
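
As a rough illustration of that cache path (the function names here are hypothetical, not scrapy-poet's actual serializer), a bytes header dict cannot go into a JSON cache entry directly, so it gets converted to strings on write and back to bytes on read:

```python
import json

def serialize_headers(raw):
    """Encode a Scrapy-style bytes header dict into a JSON-safe dict.

    Bytes are decoded with latin-1, which round-trips any byte value;
    single values are normalized to one-element lists.
    """
    out = {}
    for key, value in raw.items():
        values = value if isinstance(value, list) else [value]
        out[key.decode("latin-1")] = [v.decode("latin-1") for v in values]
    return out

def deserialize_headers(data):
    """Rebuild the bytes header dict from a cached JSON-safe dict."""
    return {
        k.encode("latin-1"): [v.encode("latin-1") for v in vs]
        for k, vs in data.items()
    }

raw = {b"Content-Type": [b"text/html"], b"content-length": b"648"}
restored = deserialize_headers(json.loads(json.dumps(serialize_headers(raw))))
```

After a round trip, ``restored`` holds the same headers with values normalized to lists, which is why a conversion step like ``from_bytes_dict`` belongs on the cache-read side rather than on live Scrapy responses.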

BurnzZ (Contributor, author):

Ahhh, I see what you mean. Nice catch @kmike! 🙌 Fixed this in #72.

kmike (Member) left a review comment:

The PR looks good @BurnzZ!

Co-authored-by: Mikhail Korobov <kmike84@gmail.com>
BurnzZ (Contributor, author) commented May 7, 2022

Thanks for the review @kmike! 🙏 I haven't ticked off the TODO list on this PR regarding updating the setup.py and tox.ini deps. In any case, the PR in #62 is built on top of this one, so we can look into it again there.

@BurnzZ BurnzZ merged commit 67a788b into master May 7, 2022
@BurnzZ BurnzZ deleted the responsedata-to-httpresponse branch May 7, 2022 08:52