I have an HTML document that contains a malformed link with a ] character:
<p><a href="http://sample.com] ">My link</a>
When converting to PDF, I get the following error:
weasyprint/__init__.py:257: in write_pdf
self.render(font_config, counter_style, **options)
weasyprint/__init__.py:214: in render
return Document._render(self, font_config, counter_style, options)
weasyprint/document.py:262: in _render
[Page(page_box) for page_box in page_boxes],
weasyprint/document.py:262: in <listcomp>
[Page(page_box) for page_box in page_boxes],
weasyprint/document.py:76: in __init__
gather_anchors(
weasyprint/anchors.py:119: in gather_anchors
gather_anchors(child, anchors, links, bookmarks, inputs, matrix)
weasyprint/anchors.py:119: in gather_anchors
gather_anchors(child, anchors, links, bookmarks, inputs, matrix)
weasyprint/anchors.py:119: in gather_anchors
gather_anchors(child, anchors, links, bookmarks, inputs, matrix)
weasyprint/anchors.py:119: in gather_anchors
gather_anchors(child, anchors, links, bookmarks, inputs, matrix)
weasyprint/anchors.py:119: in gather_anchors
gather_anchors(child, anchors, links, bookmarks, inputs, matrix)
weasyprint/anchors.py:83: in gather_anchors
link = box.style['link']
weasyprint/css/__init__.py:792: in __missing__
value = COMPUTER_FUNCTIONS[key](self, key, value)
weasyprint/css/computed_values.py:563: in link
return get_link_attribute(style.element, value, style.base_url)
weasyprint/urls.py:154: in get_link_attribute
parsed = urlsplit(uri)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
url = '', scheme = 'https', allow_fragments = True
def urlsplit(url, scheme='', allow_fragments=True):
"""Parse a URL into 5 components:
<scheme>://<netloc>/<path>?<query>#<fragment>
Return a 5-tuple: (scheme, netloc, path, query, fragment).
Note that we don't break the components up in smaller bits
(e.g. netloc is a single string) and we don't expand % escapes."""
url, scheme, _coerce_result = _coerce_args(url, scheme)
url = _remove_unsafe_bytes_from_url(url)
scheme = _remove_unsafe_bytes_from_url(scheme)
allow_fragments = bool(allow_fragments)
key = url, scheme, allow_fragments, type(url), type(scheme)
cached = _parse_cache.get(key, None)
if cached:
return _coerce_result(cached)
if len(_parse_cache) >= MAX_CACHE_SIZE: # avoid runaway growth
clear_cache()
netloc = query = fragment = ''
i = url.find(':')
if i > 0:
if url[:i] == 'http': # optimize the common case
url = url[i+1:]
if url[:2] == '//':
netloc, url = _splitnetloc(url, 2)
if (('[' in netloc and ']' not in netloc) or
(']' in netloc and '[' not in netloc)):
raise ValueError("Invalid IPv6 URL")
if allow_fragments and '#' in url:
url, fragment = url.split('#', 1)
if '?' in url:
url, query = url.split('?', 1)
_checknetloc(netloc)
v = SplitResult('http', netloc, url, query, fragment)
_parse_cache[key] = v
return _coerce_result(v)
for c in url[:i]:
if c not in scheme_chars:
break
else:
# make sure "url" is not actually a port number (in which case
# "scheme" is really part of the path)
rest = url[i+1:]
if not rest or any(c not in '0123456789' for c in rest):
# not a port number
scheme, url = url[:i].lower(), rest
if url[:2] == '//':
netloc, url = _splitnetloc(url, 2)
if (('[' in netloc and ']' not in netloc) or
(']' in netloc and '[' not in netloc)):
> raise ValueError("Invalid IPv6 URL")
E ValueError: Invalid IPv6 URL
/usr/local/lib/python3.8/urllib/parse.py:474: ValueError
This causes the conversion to fail because urlsplit treats the link as a malformed IPv6 URL.
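The failure can be reproduced without WeasyPrint, using only the standard library. The sketch below assumes nothing beyond `urllib.parse.urlsplit`; the helper name `raises_value_error` is made up for illustration.

```python
from urllib.parse import urlsplit


def raises_value_error(url):
    """Return True if urlsplit() rejects the URL with a ValueError."""
    try:
        urlsplit(url)
    except ValueError:
        return True
    return False


# The host part contains ']' without a matching '[', which trips the
# bracket check shown in the traceback.
print(raises_value_error("http://sample.com] "))

# A well-formed IPv6 literal has balanced brackets and is accepted.
print(raises_value_error("http://[::1]/"))
```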
I think that the library should still generate a file with the original URL, even if it is malformed, and possibly print a warning to the console.
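As a rough illustration of that suggestion, the parse could be wrapped so that a bad URL produces a warning instead of aborting the render. This is only a sketch under assumptions: `split_or_none` and the logger name are hypothetical and are not part of WeasyPrint's code or its eventual fix.

```python
import logging
from urllib.parse import urlsplit

# Hypothetical logger name, used only for this example.
LOGGER = logging.getLogger("example.urls")


def split_or_none(uri):
    """Return urlsplit(uri), or None (after logging a warning) if it fails."""
    try:
        return urlsplit(uri)
    except ValueError as exc:
        # Malformed URLs such as "http://sample.com] " end up here.
        LOGGER.warning("Could not parse URL %r: %s", uri, exc)
        return None
```

A caller could then fall back to the raw href string when the result is None, so the link is still written to the PDF unchanged.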
edugonza added a commit to edugonza/WeasyPrint that referenced this issue on Jan 15, 2024