`_Url` to inherit from `str` #187

BurnzZ · 2023-08-28T15:03:23Z

This was discussed previously in one of the PRs.

I'm re-opening this for tracking because the type check in this part of `w3lib.util.to_unicode` fails: https://github.com/scrapy/w3lib/blob/master/w3lib/util.py#L46-L49

In particular, doing something like:

from scrapy.linkextractors import LinkExtractor

link_extractor = LinkExtractor()
link_extractor.extract_links(response)

where response is a web_poet.page_inputs.http.HttpResponse instance and not scrapy.http.Response.

The full stack trace is:

  File "/usr/local/lib/python3.10/site-packages/scrapy/linkextractors/lxmlhtml.py", line 239, in extract_links
    base_url = get_base_url(response)
  File "/usr/local/lib/python3.10/site-packages/scrapy/utils/response.py", line 27, in get_base_url
    _baseurl_cache[response] = html.get_base_url(
  File "/usr/local/lib/python3.10/site-packages/w3lib/html.py", line 323, in get_base_url
    return safe_url_string(baseurl)
  File "/usr/local/lib/python3.10/site-packages/w3lib/url.py", line 141, in safe_url_string
    decoded = to_unicode(url, encoding=encoding, errors="percentencode")
  File "/usr/local/lib/python3.10/site-packages/w3lib/util.py", line 47, in to_unicode
    raise TypeError(
TypeError: to_unicode must receive bytes or str, got ResponseUrl

An alternative would be to adjust the Scrapy code instead, casting with `str(response.url)` at every use.
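
To make the two options concrete, here is a minimal sketch of why the check fails and how each fix would get past it. `to_unicode_sketch` is a simplified stand-in written for this example, modeled only on the error message in the traceback above; it is not the actual w3lib code. `PlainUrl` and `StrUrl` are hypothetical classes, not the real web-poet `_Url` or `ResponseUrl`.

```python
def to_unicode_sketch(text, encoding="utf-8", errors="strict"):
    """Simplified stand-in for the type check in w3lib.util.to_unicode (assumption)."""
    if isinstance(text, str):
        return text
    if not isinstance(text, bytes):
        raise TypeError(
            f"to_unicode must receive bytes or str, got {type(text).__name__}"
        )
    return text.decode(encoding, errors)


class PlainUrl:
    """Hypothetical URL wrapper that does NOT inherit from str."""

    def __init__(self, url):
        self._url = str(url)

    def __str__(self):
        return self._url


class StrUrl(str):
    """Hypothetical URL wrapper that inherits from str."""


url = "https://example.com/page"

# A wrapper that is neither str nor bytes is rejected by the isinstance check.
try:
    to_unicode_sketch(PlainUrl(url))
except TypeError as exc:
    plain_error = str(exc)

# Option 1: cast explicitly at the call site, as in str(response.url).
cast_result = to_unicode_sketch(str(PlainUrl(url)))

# Option 2: make the URL class a str subclass, so isinstance(..., str) holds.
subclass_result = to_unicode_sketch(StrUrl(url))
```

With a `str` subclass, callers that check `isinstance(url, str)` keep working without changes, whereas the cast has to be repeated wherever the URL is passed to code expecting a plain string.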


BurnzZ added the discuss label Aug 28, 2023
