replace calls to werkzeug.urls with urllib.parse #2608
Python doesn't treat characters from RFC 3986 as safe, so a small wrapper is used
need a wrapper to handle MultiDict and drop None
using urllib.parse results in a ~35% speedup
uri_to_iri unquotes as much as possible without changing urlsplit meaning
iri_to_uri quotes as little as possible without changing urlsplit meaning
WHATWG URL Standard, and RFC 5987 for send_file
use keyword arg safe= to make searching easy
apply different safe sets to different parts of URL
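The note about needing a wrapper to handle `MultiDict` and drop `None` can be illustrated with a minimal sketch. The names `urlencode_dropping_none` and `_iter_pairs` are hypothetical, not Werkzeug's internal helper. The sketch assumes a MultiDict-like object can be recognized by a `lists()` method (Werkzeug's `MultiDict` has one) and that a list or tuple value in a plain `dict` means a repeated key.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator, Mapping
from typing import Any
from urllib.parse import urlencode


def _iter_pairs(query: Mapping[str, Any] | Iterable[tuple[str, Any]]) -> Iterator[tuple[str, Any]]:
    """Yield (key, value) pairs, expanding keys that carry several values."""
    if isinstance(query, Mapping):
        lists = getattr(query, "lists", None)
        if callable(lists):
            # MultiDict-like object (assumption): lists() yields (key, [value, ...]).
            for key, values in lists():
                for value in values:
                    yield key, value
        else:
            # Plain mapping: assume a list/tuple value means a repeated key.
            for key, value in query.items():
                if isinstance(value, (list, tuple)):
                    for item in value:
                        yield key, item
                else:
                    yield key, value
    else:
        # Already an iterable of (key, value) pairs.
        yield from query


def urlencode_dropping_none(query: Mapping[str, Any] | Iterable[tuple[str, Any]]) -> str:
    """Hypothetical wrapper: discard None values, then defer to urllib.parse.urlencode."""
    return urlencode([(k, v) for k, v in _iter_pairs(query) if v is not None])
```

For example, `urlencode_dropping_none({"q": ["a b", None], "page": 2, "debug": None})` returns `"q=a+b&page=2"`: the `None` entries disappear and the remaining values are encoded by `urlencode` as usual.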
As mentioned on Discord, I feel like this change is actually a bug that should possibly be reverted:
Yep, I thought it would make sense to apply it consistently, but it sounds like only …
Use `urllib.parse` functions instead of our own implementation. Deprecate all of `werkzeug.urls` except for `uri_to_iri` and `iri_to_uri`. My benchmark shows a 35% speedup in routing and responses: 8% from replacing most calls, the rest from refactoring the implementations of the `iri` functions. Fixes #2600, fixes #2406.

The only thing that still needed an (internal) wrapper was `urlencode`, since the router and test client might pass `MultiDict` or `dict` to it, and also expect `None` values to be discarded.

Since I was replacing all uses of `quote`, I also took the opportunity to review which characters are treated as safe from percent-encoding. We were not being particularly consistent or correct about it. Now all uses of `quote` for URLs use the safe characters for the specific part of the URL being quoted, based on the WHATWG URL Standard, which fixes #2601. For quoting the `filename*` option in `send_file`, use the RFC 5987 `attr-char` set, which fixes #2598.

`iri_to_uri` avoids quoting any ASCII printables, since it's assumed they're intentional at that stage. `uri_to_iri` unquotes as much as possible without changing how `urllib.parse.urlsplit` will split the URL.

As a start to #2602, deprecate passing a tuple or bytes to the `iri` functions.

Another side effect of inlining some helper functions is that parsing `application/x-www-form-urlencoded` form data now uses `max_form_parts`, like `multipart/form-data`. Not as important in this case, but may as well be consistent.
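Applying a different safe set to each part of the URL comes down to passing a different `safe=` string to `urllib.parse.quote` per component. The sketch below is illustrative only: the safe strings are assumptions loosely based on RFC 3986 `pchar`, not the WHATWG-derived sets used in the PR, and `build_url` is a hypothetical helper.

```python
from __future__ import annotations

from urllib.parse import quote, urlunsplit

# quote() never encodes letters, digits, or "_.-~"; `safe` adds to that set.
# Illustrative sets (assumptions, not Werkzeug's constants):
PATH_SAFE = "!$&'()*+,;=:@/"       # keep "/" so path segments survive
QUERY_VALUE_SAFE = ""              # encode "&", "=", "/", ... inside one key or value
FRAGMENT_SAFE = "!$&'()*+,;=:@/?"  # "?" has no special meaning inside a fragment


def build_url(host: str, path: str, params: dict[str, str], fragment: str = "") -> str:
    """Hypothetical helper: quote each URL component with its own safe set."""
    query = "&".join(
        f"{quote(key, safe=QUERY_VALUE_SAFE)}={quote(value, safe=QUERY_VALUE_SAFE)}"
        for key, value in params.items()
    )
    return urlunsplit((
        "https",
        host,
        quote(path, safe=PATH_SAFE),  # "?" in a path must be encoded
        query,
        quote(fragment, safe=FRAGMENT_SAFE),
    ))
```

With this sketch, `build_url("example.com", "/docs/a b?.txt", {"q": "x&y=z/1"}, "sec 2")` gives `https://example.com/docs/a%20b%3F.txt?q=x%26y%3Dz%2F1#sec%202`: the same `/` is kept in the path but encoded inside the query value. Writing `safe=` as a keyword at every call site also makes the choices easy to search for.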
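For the `filename*` parameter, RFC 5987 defines `attr-char` as ALPHA, DIGIT, and `!#$&+-.^_`|~`; everything else in the UTF-8 encoded value must be percent-encoded. A minimal sketch, where `content_disposition` is a hypothetical helper that emits only the `filename*` parameter (it does not claim to reproduce the full header `send_file` builds):

```python
from urllib.parse import quote

# RFC 5987 attr-char = ALPHA / DIGIT / "!" / "#" / "$" / "&" / "+" / "-" / "."
#                      / "^" / "_" / "`" / "|" / "~"
# quote() already leaves ALPHA, DIGIT and "-._~" alone, so only the rest go in `safe`.
ATTR_CHAR_SAFE = "!#$&+^`|"


def content_disposition(filename: str) -> str:
    """Hypothetical helper: encode a filename as an RFC 5987 ext-value."""
    encoded = quote(filename, safe=ATTR_CHAR_SAFE, encoding="utf-8")
    return f"attachment; filename*=UTF-8''{encoded}"
```

For `"résumé (v2).pdf"` this yields `attachment; filename*=UTF-8''r%C3%A9sum%C3%A9%20%28v2%29.pdf`: parentheses and the space are not `attr-char`, so they are encoded, while `+` and `&` would pass through.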
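The `iri_to_uri` behavior described above, quoting as little as possible, can be sketched by treating every printable ASCII character as safe and only percent-encoding what remains (non-ASCII text, controls). This is a simplified sketch, not Werkzeug's implementation: `iri_to_uri_sketch` is hypothetical, encoding the space is an assumption, and userinfo and IPv6 literals are ignored to keep it short.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Printable ASCII except space (0x21-0x7E). "%" is included, so escapes that
# are already present pass through untouched. Space is still encoded here.
_PRINTABLE_ASCII = "".join(chr(c) for c in range(0x21, 0x7F))


def iri_to_uri_sketch(iri: str) -> str:
    """Hypothetical sketch: percent-encode only what is not printable ASCII."""
    parts = urlsplit(iri)
    netloc = parts.netloc
    if parts.hostname:
        # IDNA-encode the host; userinfo is dropped in this simplified sketch.
        host = parts.hostname.encode("idna").decode("ascii")
        netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=_PRINTABLE_ASCII),
        quote(parts.query, safe=_PRINTABLE_ASCII),
        quote(parts.fragment, safe=_PRINTABLE_ASCII),
    ))
```

Because `urlsplit` has already separated the components, leaving characters such as `?` or `#` unquoted inside the query or fragment does not change how the resulting URI splits.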
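The reverse direction, `uri_to_iri` unquoting as much as possible without changing how `urlsplit` splits the URL, can be sketched by decoding each run of `%XX` escapes as UTF-8 and re-escaping anything that would be ambiguous. The per-component "keep escaped" sets are assumptions: `?` and `#` in the path and `#` in the query would change the split, while `/` in the path and `&`, `=`, `+` in the query are kept escaped because decoding them would change how the path or query is interpreted. `%` itself always stays escaped. Controls, space, and invalid UTF-8 bytes also stay encoded. `uri_to_iri_sketch` is hypothetical and leaves the host untouched (no IDNA decoding).

```python
from __future__ import annotations

import re
from urllib.parse import quote, urlsplit, urlunsplit

# Runs of consecutive %XX escapes, so multi-byte UTF-8 sequences decode together.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _unquote_except(value: str, keep: str) -> str:
    """Decode percent-escapes, but leave characters in `keep`, ASCII controls,
    space, and bytes that are not valid UTF-8 percent-encoded."""

    def decode_run(match: re.Match[str]) -> str:
        raw = bytes.fromhex(match.group().replace("%", ""))
        out = []
        for ch in raw.decode("utf-8", errors="surrogateescape"):
            if "\udc80" <= ch <= "\udcff":
                # surrogateescape maps an undecodable byte 0xNN to U+DCNN.
                out.append(f"%{ord(ch) - 0xDC00:02X}")
            elif ch in keep or ord(ch) <= 0x20 or ord(ch) == 0x7F:
                out.append(quote(ch, safe=""))
            else:
                out.append(ch)
        return "".join(out)

    return _ESCAPE_RUN.sub(decode_run, value)


def uri_to_iri_sketch(uri: str) -> str:
    """Hypothetical sketch: unquote path, query and fragment conservatively."""
    parts = urlsplit(uri)
    return urlunsplit((
        parts.scheme,
        parts.netloc,  # host left as-is in this sketch
        _unquote_except(parts.path, keep="%/?#"),
        _unquote_except(parts.query, keep="%#&=+"),
        _unquote_except(parts.fragment, keep="%"),
    ))
```

For example, `uri_to_iri_sketch("http://example.com/%C3%BCn%2Fi%3F?q=%C3%A4%26b#%C3%A9%20x")` returns `http://example.com/ün%2Fi%3F?q=ä%26b#é%20x`, and a stray `%FF` byte stays encoded.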
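Limiting the number of fields in `application/x-www-form-urlencoded` data can be sketched with `urllib.parse.parse_qsl`, whose `max_num_fields` argument makes it count the separators up front and raise `ValueError` before parsing an oversized body. The helper name, keeping blank values, and the default limit of 1000 are assumptions for illustration. How the application reports the error is left out of the sketch.

```python
from __future__ import annotations

from urllib.parse import parse_qsl


def parse_urlencoded_form(body: str, max_form_parts: int | None = 1000) -> list[tuple[str, str]]:
    """Hypothetical helper: parse urlencoded form data with a field limit.

    parse_qsl raises ValueError when the number of fields exceeds
    max_num_fields; passing None disables the limit.
    """
    return parse_qsl(body, keep_blank_values=True, max_num_fields=max_form_parts)
```

`parse_urlencoded_form("a=1&b=&c=x+y")` returns `[("a", "1"), ("b", ""), ("c", "x y")]`, while `parse_urlencoded_form("a=1&b=2&c=3", max_form_parts=2)` raises `ValueError`.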