Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tests for libexpat >=2.6.0 (or with CVE-2023-52425 fixes backported) #556

Merged
merged 2 commits into from
Sep 9, 2024

Conversation

hartwork
Copy link
Contributor

@hartwork hartwork commented Mar 3, 2024

Hi! 👋

It was reported that the clustershell test suite breaks with Expat >=2.6.0 (https://bugs.gentoo.org/924601). So this pull request fixes the test suite for Expat >=2.6.0 with CPython 3.13 and backport releases by using the new pyexpat API (python/cpython#115398). There is no rush in merging this, we can definitey wait for that APi to become more widely available if you like, it will likely appear in these very CPython releases:

  • 3.8.19
  • 3.9.19
  • 3.10.14
  • 3.11.9
  • 3.12.3
  • 3.13.0

I should note that Python 3.7 and earlier are dead, hence related pull request #555.
I'm looking forward to your review, maybe I missed something 🍻

Best, Sebastian

@thiell thiell self-assigned this Mar 4, 2024
@hartwork
Copy link
Contributor Author

hartwork commented Mar 7, 2024

Quick update on CPython: All related backport pull requests have now been merged. The next step is CPython security releases and then distros picking them up.

Regarding related expected releases dates, so far I only found:

@thiell thiell added the Quality label Mar 15, 2024
@hartwork
Copy link
Contributor Author

@thiell by any chance do you know if the issue I'm fixing for the tests here affects the core ClusterShell software also? Is it just the tests that break with Expat >=2.6.0 or more?

@hartwork
Copy link
Contributor Author

hartwork commented Mar 20, 2024

  • 3.8.19
  • 3.9.19
  • 3.10.14

PS: These three are out since yesterday March 19, 2024:

@hartwork
Copy link
Contributor Author

@thiell by any chance do you know if the issue I'm fixing for the tests here affects the core ClusterShell software also? Is it just the tests that break with Expat >=2.6.0 or more?

@thiell any chance for a reply this week? We'll have to make decision in Gentoo whether to stabilize libexpat >=2.6.0 or not for all users, and knowing if core ClusterShell is affected outside the tests is the last missing piece in the picture, before making a decision. Thanks in advance, have a good start to the week 🍻

@thiell
Copy link
Collaborator

thiell commented Mar 25, 2024

@hartwork When ClusterShell is configured to use the tree execution mode, the communication protocol with the remote ClusterShell gateway(s) uses XML and we parse it incrementally with an xml.sax.xmlreader.IncrementalParser. So the only instance of IncrementalParser.feed() in the core ClusterShell code is done here: https://github.com/cea-hpc/clustershell/blob/master/lib/ClusterShell/Communication.py#L231

A question I have is does the issue only occur with multithreaded codes? The test case uses two threads for example to simulate a remote gateway. In Communication.py, unlike the test in tests/TreeGatewayTest.py, a single thread is used and the producer of the XML content is remote, on a different machine.

@hartwork
Copy link
Contributor Author

hartwork commented Mar 25, 2024

@thiell thanks for your detailed reply! 👍

The way I read the linked code, self._xml_reader is only used as an event FIFO that is emptied at the speed of availability at:

# pass messages to the driver if ready
while self._xml_reader.msg_available():
msg = self._xml_reader.pop_msg()
assert msg is not None
self.recv(msg)

To me, it looks like the parser can just have its pace and the receivers need to be okay with any rate. If that's reality, it probably means that there is no problem with an occasional delay in events?

With regard to multi-threading, my understanding is that it's technically not the two threads but their blocking lockstep relationship that breaks with Expat >=2.6.0: every parser feed was expected to produce events immediately by the test code. The fix to CVE-2023-52425 in Expat can add delay to creation of events — firing of handlers in the language of Expat — depending on how big or small input chunks get, and that made one thread wait for an event that would not be produced without further input.

Maybe a good stress test would be to turn…

self._parser.feed(msg + b'\n')

…into a loop feeding single bytes to make things worst (and be reverted), and running that with Expat >=2.6.0, if someone gets a chance to.

What do you think?

PS: On a side note Expat is not thread-safe and multi-threaded access would need synchronization around it, likely even from Python code (python/cpython#62170), not just in C. (I'm in favor of considering that a bug in CPython.)

@hartwork
Copy link
Contributor Author

Released by now on 2024-04-02 — https://docs.python.org/release/3.11.9/whatsnew/changelog.html#python-3-11-9

Released by now on 2024-04-09 — https://docs.python.org/release/3.12.3/whatsnew/changelog.html#python-3-12-2

@hartwork hartwork changed the title Fix tests for libexpat >=2.6.0 Fix tests for libexpat >=2.6.0 (or with CVE-2023-52425 fixes backported) Sep 4, 2024
@hartwork
Copy link
Contributor Author

hartwork commented Sep 4, 2024

@thiell I would expect CI on master to fail by now because it has 2.4.7-1ubuntu0.3 installed for all but Python 3.6 and that comes with the CVE-2023-52425 fixes according to https://changelogs.ubuntu.com/changelogs/pool/main/e/expat/expat_2.4.7-1ubuntu0.3/changelog but for some reason it's no failing — any ideas why? And: How do you feel about this pull request today?

@thiell
Copy link
Collaborator

thiell commented Sep 9, 2024

PS: On a side note Expat is not thread-safe and multi-threaded access would need synchronization around it, likely even from Python code (python/cpython#62170), not just in C. (I'm in favor of considering that a bug in CPython.)

Thanks for the clarification. That's fine in our case as we communicate through pipes.

@thiell I would expect CI on master to fail by now because it has 2.4.7-1ubuntu0.3 installed for all but Python 3.6 and that comes with the CVE-2023-52425 fixes according to https://changelogs.ubuntu.com/changelogs/pool/main/e/expat/expat_2.4.7-1ubuntu0.3/changelog but for some reason it's no failing — any ideas why? And: How do you feel about this pull request today?

Actually, I noticed some random hung tests, that I relaunched manually. I just added support for testing with Python 3.12 too, and I saw a hung test. I think this is likely related to this issue.

I reviewed your proposed changes and I am ok to merge them as a first step. Note that I personally don't love the additional flush() method which is called after each feed() + need to be checked with hasattr(). I prefer the Get/SetReparseDeferralEnabled() API, which can restore the previous behavior.

Now apart from this test, I think we could have some problems with delayed events in the parser when used in clustershell's tree mode: XML is parsed incrementally between cluster nodes and the connection might be established for a long time between then, triggering events when they arrive. The source XML is generated by clustershell itself, so I think it would be fine in terms of security to just disable this reparse deferral feature with SetReparseDeferralEnabled(false) if available.

Sorry to be a bit slow but thanks for your help with this!

@thiell thiell added this pull request to the merge queue Sep 9, 2024
@thiell thiell added this to the 1.9.3 milestone Sep 9, 2024
Merged via the queue into cea-hpc:master with commit e49f842 Sep 9, 2024
6 checks passed
thiell added a commit to thiell/clustershell that referenced this pull request Sep 9, 2024
Use SetReparseDeferralEnabled() API to restore previous bahavior
as we control the source of the XML.
@thiell
Copy link
Collaborator

thiell commented Sep 9, 2024

@hartwork Ah, interestingly, it looks like the incremental SAX parser that we use (ExpatParser) didn't get the same API than the xmlparser (xml.parsers.expat.xmlparser). With Python 3.12.5, I get the following error when trying to call ExpatParser.SetReparseDeferralEnabled():

  File "/home/runner/work/clustershell/clustershell/lib/ClusterShell/Communication.py", line 194, in __init__
    self._parser.SetReparseDeferralEnabled(False)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'ExpatParser' object has no attribute 'SetReparseDeferralEnabled'

thiell added a commit to thiell/clustershell that referenced this pull request Sep 9, 2024
This is due to libexpat 2.6 reparse deferral as documented in cea-hpc#556.

Unfortunately ExpatParser doesn't seem to have SetReparseDeferralEnabled(),
so we use an explicit flush() call to make sure all handlers are called
immediately. We control the source XML.

NOTE: Only available in Python 3.13 and backports.
@hartwork
Copy link
Contributor Author

@hartwork Ah, interestingly, it looks like the incremental SAX parser that we use (ExpatParser) didn't get the same API than the xmlparser (xml.parsers.expat.xmlparser).

@thiell correct, the description of python/cpython#115623 has the list of all related methods added (and to which class). The involved classes are quite different and there was not much room for how to implement things in CPython.

With Python 3.12.5, I get the following error when trying to call ExpatParser.SetReparseDeferralEnabled():

  File "/home/runner/work/clustershell/clustershell/lib/ClusterShell/Communication.py", line 194, in __init__
    self._parser.SetReparseDeferralEnabled(False)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'ExpatParser' object has no attribute 'SetReparseDeferralEnabled'

Correct, there is no such method in that class.

Could you summarize status and plans for the new pull request #569? That would be great!

github-merge-queue bot pushed a commit that referenced this pull request Sep 11, 2024
This is due to libexpat 2.6 reparse deferral as documented in #556.

Unfortunately ExpatParser doesn't seem to have SetReparseDeferralEnabled(),
so we use an explicit flush() call to make sure all handlers are called
immediately. We control the source XML.

NOTE: Only available in Python 3.13 and backports.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants