Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert caching of Content-Type headers #121831

Closed

Conversation

steverep
Copy link
Member

Proposed change

Reverts #103281 as it propagates an error for compressed files being fixed in aio-libs/aiohttp#8485. Rather than also fix it here, IMO just eliminating the copied code is the better solution as I think the logic for caching it is flawed.

#103281 guarantees that the content type is computed for each URI whenever the cache is clear on restart or there's a miss. However, many of those responses may be 304s (not modified) where the header isn't even needed. By contrast, aiohttp only computes it when it's actually going to send the file, in which case this little string operation would just be a drop in the bucket.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.

To help with the load of incoming pull requests:

@home-assistant
Copy link

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (http) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of http can trigger bot actions by commenting:

  • @home-assistant close Closes the pull request.
  • @home-assistant rename Awesome new title Renames the pull request.
  • @home-assistant reopen Reopen the pull request.
  • @home-assistant unassign http Removes the current integration label and assignees on the pull request, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the pull request.

@bdraco
Copy link
Member

bdraco commented Jul 12, 2024

I wasn't expecting the solution to be to revert the caching. I'll take a look this weekend to see what the impact is.

Copy link
Member

@bdraco bdraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than also fix it here, IMO just eliminating the copied code is the better solution as I think the logic for caching it is flawed.

I disagree that the logic for caching it is flawed. The caching is based on observing real world requests and profiling the server. The repeated guess mime type calls show up in the profile which was the original reason for adding the cache.

I agree ignoring the encoding is a problem. That's even more complicated because https://github.com/aio-libs/aiohttp/pull/8485/files#diff-7eeb7201d1ad679d4091189dc43b7a7b843c1b2fc0f0914301addcb25179e6b1L279 seeks to eliminate sending it, but we should fix the cache instead of eliminating it.

https://github.com/aio-libs/aiohttp/pull/8485/files#r1675359145 would eliminate the need to look at encoding, achieve the desired outcome, and lets stdlib mimetypes handle more of the complexity.

If we went that direction, we could use the MimeTypes object from aiohttp in the future here as well and eliminate most of the duplication.

@home-assistant home-assistant bot marked this pull request as draft July 12, 2024 05:38
@home-assistant
Copy link

Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍

Learn more about our pull request process.

@bdraco

This comment was marked as duplicate.

@steverep
Copy link
Member Author

I disagree that the logic for caching it is flawed. The caching is based on observing real world requests and profiling the server. The repeated guess mime type calls show up in the profile which was the original reason for adding the cache.

We can figure out the best approach on the aiohttp PR. In the meantime, it would be nice to see that profiling data. In particular, how much CPU reduction was there after caching? And was that reduction just because of a benefit to multiple clients, or are clients also making many requests for the same file? The former is where server caching should live, and the latter needs a different solution.

@bdraco
Copy link
Member

bdraco commented Jul 12, 2024

As its been a few months since #103281, I have already deleted the profiles from the user that provided them where the issue was discovered.

I do remember that they had multiple wall tablets so it should be possible to recreate the data by setting up a few tablets pointing to a dev instance, with #103281 reverted, and patching bootstrap with the attached diff. Be sure to set RUN_PROFILE = True

At the time, it was easier to find with RUN_PY_SPY but since py-spy doesn't have python 3.12 benfred/py-spy#642 support yet, we are stuck with using cProfile.

profiler_at_startup.diff.zip

@MartinHjelmare MartinHjelmare changed the title Revert caching of Content-Type headers from #103281 Revert caching of Content-Type headers Jul 13, 2024
@steverep steverep mentioned this pull request Jul 23, 2024
19 tasks
@steverep
Copy link
Member Author

Closing this since the 3.10 changes make this obsolete anyway, regardless of caching.

Probably the easiest thing to do to make best use of 3.10 code and keep the caching is to just cache the Response objects. That would result in almost no code duplication.

@steverep steverep closed this Jul 23, 2024
@steverep steverep deleted the revert-cache-content-type branch July 23, 2024 17:42
@github-actions github-actions bot locked and limited conversation to collaborators Jul 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants