Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed content-handler response-headers/object-storage collision. #4251

Merged
merged 1 commit into from
Aug 9, 2023

Conversation

ggainey
Copy link
Contributor

@ggainey ggainey commented Aug 3, 2023

fixes #4028.

Comment on lines 897 to 924
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/get_object.html
# response-headers S3 respects, and what they map to in an S3 object
_S3_RESPONSE_HEADER_MAP = {
"Content-Disposition": "ResponseContentDisposition",
"Content-Type": "ResponseContentType",
"Cache-Control": "ResponseCacheControl",
"Content-Language": "ResponseContentLanguage",
"Expires": "ResponseExpires",
"Content-Encoding": "ResponseContentEncoding",
}
# https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.contentsettings?view=azure-python
# response-headers azure respects, and what they map to in an azure object
_AZURE_RESPONSE_HEADER_MAP = {
"Content-Disposition": "content_disposition",
"Content-Type": "content_type",
"Cache-Control": "cache_control",
"Content-Language": "content_language",
"Content-Encoding": "content_encoding",
}
# https://gcloud.readthedocs.io/en/latest/storage-blobs.html
# response-headers Google Cloud Storage respects, and what they map to in a GCS object
_GCS_RESPONSE_HEADER_MAP = {
"Content-Disposition": "content_disposition",
"Content-Type": "content_type",
"Cache-Control": "cache_control",
"Content-Language": "content_language",
"Content-Encoding": "content_encoding",
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be more favourable to store in the constants.py file. Is it worth caching the return value of _set_params_from_headers (the parameters variable will be returned instead of overwriting it inside the function)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These both sound like fine ideas, will do!

@@ -894,6 +895,16 @@ async def _serve_content_artifact(self, content_artifact, headers, request):
Returns:
The :class:`aiohttp.web.FileResponse` for the file.
"""

def _set_params_from_headers(hdrs, storage_domain):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we call this function, we have identified the storage class already, so i think we can spare ourselves one if and dict lookup by providing the response map instead of the storage class name.

def _set_params_from_headers(headers, response_header_map):
    params = {}
    for key, new_key in response_header_map.items():
        if key in headers:
            params[new_key] = headers[key]

or even:

def _set_params_from_headers(headers, response_header_map):
    return {new_key: headers[key] for key, new_key in response_header_map.items() if key in headers}

We'd call it by, e.g. in the s3 branch:

_set_params_from_headers(headers, S3_RESPONSE_HEADER_MAP)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing the mapping-dictionary "assumes" that a simple lookup is all we're ever going to want to do. If we want/need to do anything more complicated in the future, the current approach lets us do so in a single place.

In terms of performance - given the current state of the content-handler, sparing "one dictionary lookup" is not convincing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about my performance in understanding the code. But it's not going to block this contribution.

Copy link
Member

@lubosmj lubosmj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the change!

Optionally, we could remove the if storage_domain in STORAGE_RESPONSE_MAP condition and convert it to a try/except block for the for loop (https://www.oreilly.com/library/view/python-essentials/9781784390341/ch09s08.html).

@ggainey
Copy link
Contributor Author

ggainey commented Aug 9, 2023

I like the change!

Optionally, we could remove the if storage_domain in STORAGE_RESPONSE_MAP condition and convert it to a try/except block for the for loop (https://www.oreilly.com/library/view/python-essentials/9781784390341/ch09s08.html).

I've heard that argument before, but personally feel exceptions should be for exceptional issues - not a main-flow "is this one of the things we know how to deal with?" path. Speed doesn't impress me either - no matter how fast python is at exceptions, building one, executing the jump, and then processing it, isn't as fast as "single key lookup into a (tiny) dictionary".

"try it and see" makes more sense the more complicated the possibilities are.

(Or it makes sense until you start fielding "OMG FIX THIS ERROR" issues from users seeing database-logging saying "you can't do that" for things you handle via "ask forgiveness" processing...)

@ggainey ggainey merged commit 4dd2edf into pulp:main Aug 9, 2023
@patchback
Copy link

patchback bot commented Aug 9, 2023

Backport to 3.28: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.28/4dd2edf6edaf2cdc8a0cb2881e00f6c0bc9007b9/pr-4251

Backported as #4265

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

@patchback
Copy link

patchback bot commented Aug 9, 2023

Backport to 3.29: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.29/4dd2edf6edaf2cdc8a0cb2881e00f6c0bc9007b9/pr-4251

Backported as #4266

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Content-app doesn't respect response-headers specified by plugins
3 participants