Fix infinite callback loop when time is not moving forward #10151

Merged (5 commits), Dec 17, 2024

@bmerry (Contributor) commented Dec 10, 2024

What do these changes do?

If the keepalive handler is called too soon, it reschedules itself. The check used `now <= close_time`, which means that an exactly on-time notification is treated as "too soon", causing an automatic rescheduling. For real systems the time will eventually advance and break the loop, but with async-solipsism, time doesn't advance until there is some reason to sleep, so the loop is infinite.
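
To make the off-by-one concrete, here is a small simulation. It is a sketch, not aiohttp's code: `run_keepalive` and the clock functions are hypothetical, and it assumes the fix simply turns the comparison into a strict `now < close_time`. A frozen clock stands in for async-solipsism, where time only moves when something sleeps.

```python
import operator
from typing import Callable, Optional


def run_keepalive(
    close_time: float,
    clock: Callable[[], float],
    too_soon: Callable[[float, float], bool],
    max_calls: int = 1000,
) -> Optional[int]:
    """Simulate a keepalive handler that reschedules itself while it is "too soon".

    Returns how many times the handler ran before closing the connection,
    or None if it was still rescheduling itself after max_calls runs.
    """
    for calls in range(1, max_calls + 1):
        now = clock()
        if too_soon(now, close_time):
            # Stand-in for rescheduling at close_time: the handler simply runs again.
            continue
        return calls  # deadline reached: the connection would be closed here
    return None


def frozen_clock() -> float:
    # Like async-solipsism: time only advances when something actually sleeps.
    return 10.0


# operator.le is the old check (now <= close_time); operator.lt is the assumed fix.
print(run_keepalive(10.0, frozen_clock, operator.le))  # None (capped; would loop forever)
print(run_keepalive(10.0, frozen_clock, operator.lt))  # 1

# With a clock that moves forward, the old check terminates but costs an extra call.
print(run_keepalive(10.0, iter([9.0, 10.0, 11.0]).__next__, operator.le))  # 3
print(run_keepalive(10.0, iter([9.0, 10.0, 11.0]).__next__, operator.lt))  # 2
```

The last two lines mirror the "real systems" case: an advancing clock eventually breaks the loop, but the non-strict check still runs the handler once more than necessary.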

Are there changes in behavior for the user?

This will fix infinite loops when using async-solipsism.

Is it a substantial burden for the maintainers to support this?

No. This does not increase the amount of code at all.

Related issue number

Fixes #10149.

Checklist

  • I think the code is well written
  • Unit tests for the changes exist (I am assuming that the keepalive functionality has existing test coverage)
  • Documentation reflects the changes (no documentation change needed)
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt (already there)
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the CHANGES/ folder
    • name it <issue_or_pr_num>.<type>.rst (e.g. 588.bugfix.rst)

    • if you don't have an issue number, change it to the pull request
      number after creating the PR

      • .bugfix: A bug fix for something the maintainers deemed an
        improper undesired behavior that got corrected to match
        pre-agreed expectations.
      • .feature: A new behavior, public APIs. That sort of stuff.
      • .deprecation: A declaration of future API removals and breaking
        changes in behavior.
      • .breaking: When something public is removed in a breaking way.
        Could be deprecated in an earlier release.
      • .doc: Notable updates to the documentation structure or build
        process.
      • .packaging: Notes for downstreams about unobvious side effects
        and tooling. Changes in the test invocation considerations and
        runtime assumptions.
      • .contrib: Stuff that affects the contributor experience. e.g.
        Running tests, building the docs, setting up the development
        environment.
      • .misc: Changes that are hard to assign to any of the above
        categories.
    • Make sure to use full sentences with correct case and punctuation,
      for example:

      Fixed issue with non-ascii contents in doctest text files
      -- by :user:`contributor-gh-handle`.

      Use the past tense or the present tense in a non-imperative mood,
      referring to what's changed compared to the last released version
      of this project.
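
The fragment naming rule above can be checked mechanically. This is a minimal sketch that assumes only the convention spelled out in this checklist (`<issue_or_pr_num>.<type>.rst` with one of the listed types); `parse_fragment_name` is a hypothetical helper, and the project's actual changelog tooling may accept other forms.

```python
import re
from pathlib import PurePath

# Fragment types listed in the checklist above.
FRAGMENT_TYPES = (
    "bugfix", "feature", "deprecation", "breaking",
    "doc", "packaging", "contrib", "misc",
)

# <issue_or_pr_num>.<type>.rst, e.g. 588.bugfix.rst
_FRAGMENT_RE = re.compile(
    r"^(?P<number>\d+)\.(?P<type>" + "|".join(FRAGMENT_TYPES) + r")\.rst$"
)


def parse_fragment_name(path: str) -> tuple[int, str]:
    """Return (number, type) for a CHANGES/ fragment name, or raise ValueError."""
    name = PurePath(path).name  # ignore the CHANGES/ directory part
    match = _FRAGMENT_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid change fragment name: {name!r}")
    return int(match["number"]), match["type"]
```

For example, `parse_fragment_name("CHANGES/588.bugfix.rst")` returns `(588, 'bugfix')`, while a name with an unlisted type such as `588.fix.rst` raises `ValueError`.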

@psf-chronographer bot added the bot:chronographer:provided label (There is a change note present in this PR) Dec 10, 2024
codecov bot commented Dec 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.75%. Comparing base (6200513) to head (03f7a3c).
Report is 5 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #10151   +/-   ##
=======================================
  Coverage   98.75%   98.75%           
=======================================
  Files         122      122           
  Lines       36954    36997   +43     
  Branches     4411     4413    +2     
=======================================
+ Hits        36494    36538   +44     
  Misses        313      313           
+ Partials      147      146    -1     
Flag           Coverage Δ
CI-GHA         98.64% <100.00%> (+<0.01%) ⬆️
OS-Linux       98.33% <95.45%> (+<0.01%) ⬆️
OS-Windows     96.18% <100.00%> (+<0.01%) ⬆️
OS-macOS       97.44% <95.45%> (+<0.01%) ⬆️
Py-3.10.11     97.28% <100.00%> (+<0.01%) ⬆️
Py-3.10.15     97.82% <95.45%> (-0.05%) ⬇️
Py-3.11.10     ?
Py-3.11.11     97.85% <95.45%> (?)
Py-3.11.9      97.33% <100.00%> (-0.01%) ⬇️
Py-3.12.7      ?
Py-3.12.8      98.38% <100.00%> (?)
Py-3.13.0      ?
Py-3.13.1      98.38% <95.45%> (+0.02%) ⬆️
Py-3.9.13      97.20% <100.00%> (+<0.01%) ⬆️
Py-3.9.20      97.78% <95.45%> (+0.04%) ⬆️
Py-pypy7.3.16  97.35% <95.45%> (+<0.01%) ⬆️
VM-macos       97.44% <95.45%> (+<0.01%) ⬆️
VM-ubuntu      98.33% <95.45%> (+<0.01%) ⬆️
VM-windows     96.18% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


codspeed-hq bot commented Dec 10, 2024

CodSpeed Performance Report

Merging #10151 will not alter performance

Comparing bmerry:fix-repeated-keepalive (03f7a3c) with master (0070045)

Summary

✅ 47 untouched benchmarks

@bdraco changed the title from "Fix infinite callback loop when used with async-solipsism" to "Fix infinite callback loop when time is not moving forward" Dec 10, 2024
@bdraco (Member) commented Dec 10, 2024

We need to make this very clear it's not a production problem so we don't end up with unexpected questions or discussions.

Additionally we haven't committed to supporting async-solipsism at this time so I changed the title to better reflect that.

@bdraco added the labels backport-3.11 and backport-3.12 (trigger automatic backporting to the 3.11 and 3.12 release branches by Patchback robot) Dec 10, 2024
@bdraco (Member) commented Dec 10, 2024

I think we should change the fragment to packaging instead of bugfix since it's for testing and doesn't affect production

@Dreamsorcerer (Member) commented

Is there any chance of a regression test without adding a new dependency?

@bdraco (Member) commented Dec 10, 2024

I think a test could be added using freezegun

@bmerry (Contributor, author) commented Dec 10, 2024

> I think we should change the fragment to packaging instead of bugfix since it's for testing and doesn't affect production

Sure, I'll do that. But are you sure aiohttp isn't used in any environment in which the timer precision might be low enough that it causes the event to be re-scheduled unnecessarily (which won't be an infinite loop, but will cause more work than necessary)? I'm thinking of environments that deliberately degrade timer precision to prevent Spectre-type side-channel attacks.

> I think a test could be added using freezegun

From what I've seen, freezegun only mocks queries of time, but doesn't provide any mechanisms to fake time passing in select/epoll and similar functions. I don't know how easy it'll be to mock that reliably without falling down a rabbit-hole and re-inventing async-solipsism. It's certainly not something I'll have time to do before I go on leave until January.

I've never tested async-solipsism on non-Linux OSes, so it might not be something you want the test suite to depend on at this point.

@bdraco (Member) commented Dec 10, 2024

> > I think we should change the fragment to packaging instead of bugfix since it's for testing and doesn't affect production

> Sure, I'll do that. But are you sure aiohttp isn't used in any environment in which the timer precision might be low enough that it causes the event to be re-scheduled unnecessarily (which won't be an infinite loop, but will cause more work than necessary)? I'm thinking of environments that deliberately degrade timer precision to prevent Spectre-type side-channel attacks.

That seems unlikely as we haven't had any issue reports, and even CLOCK_MONOTONIC_COARSE should tick forward enough that it would not be a problem.
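
One way to look at the timer-precision question is to measure it directly. The sketch below (the helper name `measure_lateness` is made up for this example) prints the resolution Python reports for the monotonic clock, which backs `loop.time()` in CPython's default asyncio event loop, then schedules a callback with `loop.call_at()` and reports how late, or early, it actually ran. CPython's default loop appears to treat a timer as ready when its deadline is within one clock-resolution step of the current time, so a small negative lateness is the "called too soon" case that makes the keepalive handler reschedule itself; other loop implementations (uvloop, for example) may behave differently.

```python
import asyncio
import time


async def measure_lateness(delay: float = 0.01) -> float:
    """Schedule a callback with loop.call_at() and return how late it ran, in seconds.

    A negative value would mean the callback ran before its deadline, which is
    the situation in which a keepalive handler decides it was called too soon.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + delay
    fired: "asyncio.Future[float]" = loop.create_future()
    loop.call_at(deadline, lambda: fired.set_result(loop.time() - deadline))
    return await fired


# In CPython, the default event loop's loop.time() is time.monotonic().
info = time.get_clock_info("monotonic")
print(f"monotonic clock: {info.implementation}, resolution {info.resolution} s")
print(f"callback lateness: {asyncio.run(measure_lateness()):+.6f} s")
```

The output is machine-dependent; on a typical Linux system the reported resolution is around a nanosecond, which supports the point that an on-time check should rarely be affected in production.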

> > I think a test could be added using freezegun

> From what I've seen, freezegun only mocks queries of time, but doesn't provide any mechanisms to fake time passing in select/epoll and similar functions. I don't know how easy it'll be to mock that reliably without falling down a rabbit-hole and re-inventing async-solipsism. It's certainly not something I'll have time to do before I go on leave until January.

> I've never tested async-solipsism on non-Linux OSes, so it might not be something you want the test suite to depend on at this point.

We don't have any maintainers familiar with it either, so it's not something we could effectively troubleshoot, and we wouldn't want to depend on it right now.

@bdraco (Member) commented Dec 10, 2024

> From what I've seen, freezegun only mocks queries of time, but doesn't provide any mechanisms to fake time passing in select/epoll and similar functions. I don't know how easy it'll be to mock that reliably without falling down a rabbit-hole and re-inventing async-solipsism. It's certainly not something I'll have time to do before I go on leave until January.

I think we only need to patch `loop.time` for this case.
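
A minimal sketch of the "patch `loop.time`" idea, under stated assumptions: `KeepaliveHandler` is a hypothetical stand-in that mirrors only the comparison described in this PR. It is not aiohttp's `RequestHandler` and not the test that was eventually merged (which, as noted later in the thread, reaches into `RequestHandler` internals). The clock is frozen exactly at the deadline with `unittest.mock.patch.object`, so with a strict comparison the handler must close without rescheduling.

```python
import asyncio
from unittest import mock


class KeepaliveHandler:
    """Hypothetical stand-in for the keepalive check; not aiohttp's RequestHandler."""

    def __init__(self, loop: asyncio.AbstractEventLoop, close_time: float) -> None:
        self.loop = loop
        self.close_time = close_time
        self.closed = False
        self.reschedules = 0

    def process_keepalive(self) -> None:
        now = self.loop.time()
        if now < self.close_time:
            # Fired too early: try again at the deadline. The comparison is strict,
            # so an exactly on-time call falls through and closes instead.
            self.reschedules += 1
            self.loop.call_at(self.close_time, self.process_keepalive)
            return
        self.closed = True  # a real handler would close the connection here


async def check_on_time_keepalive_closes() -> None:
    loop = asyncio.get_running_loop()
    handler = KeepaliveHandler(loop, close_time=100.0)
    # Freeze loop.time() exactly at the deadline, as a frozen-clock loop would.
    with mock.patch.object(loop, "time", return_value=100.0):
        handler.process_keepalive()
    assert handler.closed
    assert handler.reschedules == 0


asyncio.run(check_on_time_keepalive_closes())
```

Because nothing is awaited while the patch is active, the event loop's own scheduling never sees the frozen clock; only the handler's comparison does.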

@webknjaz (Member) commented

> I think we should change the fragment to packaging

I'm not so sure: it doesn't touch any packaging metadata / mechanisms / downstream expectations. This would be misleading, in my opinion. I suppose we could go for misc instead. If not, then it might be contrib.

But yeah, a regression test would be most welcome here...

@bdraco (Member) commented Dec 10, 2024

misc works for me

@bmerry (Contributor, author) commented Dec 11, 2024

Ok, renamed to misc. I might know how to write a unit test using only freezegun, but I'll probably only have time to experiment with it next year.

@bdraco (Member) commented Dec 14, 2024

I added a test to ensure the keep-alive expires on time. I don't love it, but since RequestHandler isn't patchable, I couldn't come up with something that didn't access the internals.

@bdraco merged commit 7c12b1a into aio-libs:master Dec 17, 2024
40 checks passed
patchback bot (Contributor) commented Dec 17, 2024

Backport to 3.11: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.11/7c12b1a9c8b2a9e33fb559229a4c4695de39f08c/pr-10151

Backported as #10173

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

patchback bot pushed a commit that referenced this pull request Dec 17, 2024
Co-authored-by: J. Nick Koston <nick@koston.org>
(cherry picked from commit 7c12b1a)
patchback bot (Contributor) commented Dec 17, 2024

Backport to 3.12: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.12/7c12b1a9c8b2a9e33fb559229a4c4695de39f08c/pr-10151

Backported as #10174

patchback bot pushed a commit that referenced this pull request Dec 17, 2024
Co-authored-by: J. Nick Koston <nick@koston.org>
(cherry picked from commit 7c12b1a)
bdraco pushed a commit that referenced this pull request Dec 17, 2024
…ime is not moving forward (#10173)

Co-authored-by: Bruce Merry <1963944+bmerry@users.noreply.github.com>
Fixes #10149.
bdraco pushed a commit that referenced this pull request Dec 17, 2024
…ime is not moving forward (#10174)

Co-authored-by: Bruce Merry <1963944+bmerry@users.noreply.github.com>
Fixes #10149.
Labels
  • backport-3.11: Trigger automatic backporting to the 3.11 release branch by Patchback robot
  • backport-3.12: Trigger automatic backporting to the 3.12 release branch by Patchback robot
  • bot:chronographer:provided: There is a change note present in this PR
Development: successfully merging this pull request may close the issue "Infinite loop when used with async_solipsism".
4 participants