-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed content-handler response-headers/object-storage collision. #4251
Conversation
dc291cf
to
0debecc
Compare
pulpcore/content/handler.py
Outdated
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/get_object.html | ||
# response-headers S3 respects, and what they map to in an S3 object | ||
_S3_RESPONSE_HEADER_MAP = { | ||
"Content-Disposition": "ResponseContentDisposition", | ||
"Content-Type": "ResponseContentType", | ||
"Cache-Control": "ResponseCacheControl", | ||
"Content-Language": "ResponseContentLanguage", | ||
"Expires": "ResponseExpires", | ||
"Content-Encoding": "ResponseContentEncoding", | ||
} | ||
# https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.contentsettings?view=azure-python | ||
# response-headers azure respects, and what they map to in an azure object | ||
_AZURE_RESPONSE_HEADER_MAP = { | ||
"Content-Disposition": "content_disposition", | ||
"Content-Type": "content_type", | ||
"Cache-Control": "cache_control", | ||
"Content-Language": "content_language", | ||
"Content-Encoding": "content_encoding", | ||
} | ||
# https://gcloud.readthedocs.io/en/latest/storage-blobs.html | ||
# response-headers Google Cloud Storage respects, and what they map to in a GCS object | ||
_GCS_RESPONSE_HEADER_MAP = { | ||
"Content-Disposition": "content_disposition", | ||
"Content-Type": "content_type", | ||
"Cache-Control": "cache_control", | ||
"Content-Language": "content_language", | ||
"Content-Encoding": "content_encoding", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be more favourable to store in the constants.py
file. Is it worth caching the return value of _set_params_from_headers
(the parameters
variable will be returned instead of overwriting it inside the function)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These both sound like fine ideas, will do!
0debecc
to
e175c1f
Compare
e175c1f
to
cbc799f
Compare
@@ -894,6 +895,16 @@ async def _serve_content_artifact(self, content_artifact, headers, request): | |||
Returns: | |||
The :class:`aiohttp.web.FileResponse` for the file. | |||
""" | |||
|
|||
def _set_params_from_headers(hdrs, storage_domain): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we call this function, we have identified the storage class already, so i think we can spare ourselves one if and dict lookup by providing the response map instead of the storage class name.
def _set_params_from_headers(headers, response_header_map):
params = {}
for key, new_key in response_header_map.items():
if key in headers:
params[new_key] = headers[key]
or even:
def _set_params_from_headers(headers, response_header_map):
return {new_key: headers[key] for key, new_key in response_header_map.items() if key in headers}
We'd call it by, e.g. in the s3 branch:
_set_params_from_headers(headers, S3_RESPONSE_HEADER_MAP)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing the mapping-dictionary "assumes" that a simple lookup is all we're ever going to want to do. If we want/need to do anything more complicated in the future, the current approach lets us do so in a single place.
In terms of performance - given the current state of the content-handler, sparing "one dictionary lookup" is not convincing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about my performance in understanding the code. But it's not going to block this contribution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the change!
Optionally, we could remove the if storage_domain in STORAGE_RESPONSE_MAP
condition and convert it to a try/except block for the for loop (https://www.oreilly.com/library/view/python-essentials/9781784390341/ch09s08.html).
I've heard that argument before, but personally feel exceptions should be for exceptional issues - not a main-flow "is this one of the things we know how to deal with?" path. Speed doesn't impress me either - no matter how fast python is at exceptions, building one, executing the jump, and then processing it, isn't as fast as "single key lookup into a (tiny) dictionary". "try it and see" makes more sense the more complicated the possibilities are. (Or it makes sense until you start fielding "OMG FIX THIS ERROR" issues from users seeing database-logging saying "you can't do that" for things you handle via "ask forgiveness" processing...) |
Backport to 3.28: 💚 backport PR created✅ Backport PR branch: Backported as #4265 🤖 @patchback |
Backport to 3.29: 💚 backport PR created✅ Backport PR branch: Backported as #4266 🤖 @patchback |
fixes #4028.