Python ClientWrapperTest fails occasionally #1714
Well investigated. However, it might be related to something else. On my machine (Gentoo Linux, protobuf-3.14.0 installed via system package manager), the test passes:
Protobuf (and its Python bindings) are already "stable" on Gentoo Linux (at least for amd64): https://packages.gentoo.org/packages/dev-libs/protobuf and https://packages.gentoo.org/packages/dev-python/protobuf-python. I do have
But basically no package (including protobuf) installs bindings for Python 2 anymore. Configuration of ola-0.10.8 goes like this:
Gentoo Linux is a bit exotic for people not used to it, but I can provide a Dockerfile so you can reproduce and/or continue investigating with it if you'd like. |
I also just tested with the latest master (commit a95a6f7) and the tests all PASS there too. However, my system runs fewer tests than yours:
|
That matches my experience on Fedora with protobuf 3.14, so it appears the issue is confined to Ubuntu. Going to start digging through changelogs. |
I tweaked the failing test a bit to get better info:

diff --git a/python/ola/ClientWrapperTest.py b/python/ola/ClientWrapperTest.py
index cd1e5cf2d..a24041d15 100644
--- a/python/ola/ClientWrapperTest.py
+++ b/python/ola/ClientWrapperTest.py
@@ -186,13 +186,13 @@ class ClientWrapperTest(unittest.TestCase):
self.assertIsNotNone(results.c_called)
self.assertIsNotNone(results.d_called)
- self.assertTrue(results.a_called - self.start <
+ self.assertLess(results.a_called - self.start,
datetime.timedelta(milliseconds=5))
- self.assertTrue(results.b_called - self.start >=
+ self.assertGreaterEqual(results.b_called - self.start,
datetime.timedelta(milliseconds=5))
- self.assertTrue(results.c_called - self.start >=
+ self.assertGreaterEqual(results.c_called - self.start,
datetime.timedelta(milliseconds=10))
- self.assertTrue(results.d_called - self.start >=
+ self.assertGreaterEqual(results.d_called - self.start,
datetime.timedelta(milliseconds=15))
sockets[0].close()

With the new test that says what the difference is, it fails with:

AssertionError: datetime.timedelta(microseconds=14612) not greater than or equal to datetime.timedelta(microseconds=15000)
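For a rough feel for how much timing variance a given machine shows, here is a small standalone sketch (my own illustration, not OLA code) that asks select() to wait 15 ms the way an event loop would and prints how far the measured delay lands from the request:

import datetime
import select

# Standalone sketch (Linux): wait 15 ms with select(), the way a
# select()-based event loop would, and print how far the measured
# delay lands from the requested one.
for _ in range(5):
    start = datetime.datetime.now()
    select.select([], [], [], 0.015)
    elapsed = datetime.datetime.now() - start
    print(elapsed - datetime.timedelta(milliseconds=15))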
However, I ran the same test a couple more times, without changing anything, and it passed. Are we just dealing with system scheduler variances? |
Glad you're investigating. Though I don't know what protobuf could have to
do with either of those failures.
On Thu, Feb 18, 2021 at 1:09 PM Dan Keenan ***@***.***> wrote:
FAIL: python/ola/ClientWrapperTest.sh
=====================================
...F/usr/lib/python3.8/unittest/case.py:704: ResourceWarning: unclosed <socket.socket fd=4, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0>
outcome.errors.clear()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
..
Glancing at the tests, the last test (testSend) doesn't have the
sockets[0].close()
sockets[1].close()
at the end. That looks like my mistake.
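As an aside, a minimal sketch of the kind of cleanup that avoids that ResourceWarning (illustrative names only, not the actual testSend body):

import socket
import unittest


class SocketCleanupExample(unittest.TestCase):
  def test_sockets_are_closed(self):
    # Close both ends via addCleanup so the sockets are released even
    # if an assertion fails partway through the test.
    sockets = socket.socketpair()
    self.addCleanup(sockets[0].close)
    self.addCleanup(sockets[1].close)
    self.assertIsNotNone(sockets[0])


if __name__ == '__main__':
  unittest.main()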
======================================================================
FAIL: testEventLoop (__main__.ClientWrapperTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./python/ola/ClientWrapperTest.py", line 195, in testEventLoop
self.assertTrue(results.d_called - self.start >=
AssertionError: False is not true
Only way I see that happening would be a strange rounding error, but I'd
be really surprised it's repeatable. Curious what d_called and start are.
I considered putting in checks that were a bit less than the requested
delays (so 14ms here) but never saw it fail.
Bruce
|
(messages overlapped...)
I don't see how, as the check for when to call the task is explicit in the
code (it's not just a sleep):
return self._run_at < now
but I wonder if there's some issue with timer precision underlying the
python implementation giving different results in different places.
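Roughly, the pattern I mean looks like this (a simplified sketch with illustrative names, not the actual OLA event-loop code):

import datetime


class ScheduledTask(object):
  """Simplified sketch of a deadline-based task, not the actual OLA code."""

  def __init__(self, delay, callback):
    self._run_at = datetime.datetime.now() + delay
    self._callback = callback

  def has_expired(self, now):
    # The callback only becomes runnable once the wall clock has passed
    # the deadline; there is no bare sleep involved.
    return self._run_at < now

  def run(self):
    self._callback()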
I would just subtract 1 from each of the comparisons and leave it at that.
Bruce
|
I've got a VM rebuilding the project on a single core to see what happens (it was difficult to get it to fail locally on my 12-core i7). Will work on a PR that relaxes the timing slightly. |
Oops, I thought I'd raised this elsewhere and tagged @brucelowekamp for some input, but clearly not. I hit the same thing on the Arm builds on Travis, but confusingly, at some point later the commits to get them merged worked fine, so I guess a system change or package upgrade was involved in the meantime. I get this (from https://travis-ci.org/github/OpenLightingProject/ola/jobs/755992758, which is a build of #1687):
|
Relax timing on ClientWrapperTest
As @peternewman noted on #1716 (comment), tests are still failing occasionally (at least on ARM) even after #1716 was merged. Given what DMX timings can be in popular products, it's probably safe to raise the allowed timing delta from 500 µs to 1000 µs (1 ms). I suppose the real question is what resolution we want to guarantee callbacks will run with; I'm trying to avoid changing tests when perhaps those tests are pointing at a real problem.
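For illustration, the relaxed check could look something like the following sketch (made-up variable names standing in for the test's self.start and results.d_called; the real change would live in ClientWrapperTest.py):

import datetime

# Stand-ins for self.start and results.d_called from the test; the
# 14612 µs delta is the one observed in the failure above.
start = datetime.datetime.now()
d_called = start + datetime.timedelta(microseconds=14612)

# Hypothetical slack of 1 ms (1000 µs) instead of the current 500 µs.
slack = datetime.timedelta(microseconds=1000)
expected_delay = datetime.timedelta(milliseconds=15)

# With the extra slack, the observed early firing no longer trips the check.
assert d_called - start >= expected_delay - slack
|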
This has apparently been a known issue for a while, but I've narrowed down the circumstances it can happen in while working on OpenLightingProject/website#14. Warning - it's a bit involved.
When protobuf is installed from pip (i.e. pip3 install protobuf), ClientWrapperTest fails. When protobuf is installed from the system package manager (i.e. apt-get install python3-protobuf), the test passes. Checked with the latest commit in master (a95a6f7 currently) on Ubuntu 20.10 and Ubuntu 20.04. I haven't properly checked Fedora yet (only a cursory test while writing docs), but I suspect it will have the same issue for reasons detailed below.

Some speculation:
My first thought was an issue between Python 2 packages and Python 3 packages. These test systems only have Python 3, so that's not it.
Further research reveals that GNOME distros (or at least Ubuntu and Fedora) come with gnome-online-accounts, which itself depends on python3-protobuf. This means that GNOME users will likely already have protobuf for Python installed on their systems. If they install deps from pip (e.g. with pip3 install protobuf numpy), pip won't install another protobuf because it sees the system version already. This also explains why the issue occurs on headless systems (i.e. Travis): those systems won't have the preinstalled Python protobuf and will install it from pip if you ask.

But what's the difference between the two sources? My instinct is that something changed in newer protobuf releases (I know this is a regular problem here). The latest release on pip is 3.14.0. Ubuntu 20.10 will get you 3.12.3. Ubuntu 20.04 provides 3.6.1. Fedora 33 has 3.12.4. Fedora 32 has 3.11.2.
To confirm this theory, I installed protobuf 3.12.4 from pip and ran the tests, which all passed. protobuf 3.13.0 also passes. This leads me to conclude that something about protobuf 3.14 is causing problems. Caveat: I haven't tested any of this with Python 2.
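One quick way to check which protobuf copy the interpreter actually picks up (pip's vs. the distro package) is to print its version and import path:

import google.protobuf

# The version string distinguishes pip's 3.14.0 from e.g. Ubuntu 20.04's
# 3.6.1, and the module path shows whether it was loaded from pip's
# site-packages or the distro's dist-packages.
print(google.protobuf.__version__)
print(google.protobuf.__file__)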
Test log: