resolve random CI failures #809

prjemian · 2023-02-16T21:05:50Z

close Certain tests still fail randomly in CI #748

prjemian · 2023-02-16T21:42:58Z

Unit test workflow has a manual trigger to re-run. Trigger a re-run.

prjemian · 2023-02-16T21:49:27Z

Py 3.11:

apstools/devices/tests/test_positioner_soft_done.py::test_move_and_stop_nonzero FAILED [ 21%]

=================================== FAILURES ===================================
__________________________ test_move_and_stop_nonzero __________________________

rbv = EpicsSignal(read_pv='gp:gp:float1', name='rbv', value=0.5, timestamp=[167](https://github.com/BCDA-APS/apstools/actions/runs/4198440690/jobs/7282085823#step:15:168)6584057.792367, tolerance=0.0001, auto_monitor=True, string=False, write_pv='gp:gp:float1', limits=False, put_complete=False)
pos = PVPositionerSoftDoneWithStop(prefix='gp:gp:', name='pos', settle_time=0.0, timeout=None, read_attrs=['readback', 'setpoint'], configuration_attrs=['done', 'tolerance', 'target'], limits=None, egu='')

    def test_move_and_stop_nonzero(rbv, pos):
        timed_pause()
    
        # move to non-zero
        longer_delay = 2
        target = round(rand(2, 5), 2)
        delayed_complete(pos, rbv, delay=longer_delay)
        # t0 = time.time()  # time it
        status = pos.move(target)  # readback set by delayed_complete()
        # dt = time.time() - t0
        assert status.done
        assert status.success
        # assert status.elapsed >= longer_delay
        timed_pause()
        # assert dt >= longer_delay
>       assert pos.inposition
E       AssertionError: assert False
E        +  where False = PVPositionerSoftDoneWithStop(prefix='gp:gp:', name='pos', settle_time=0.0, timeout=None, read_attrs=['readback', 'setpoint'], configuration_attrs=['done', 'tolerance', 'target'], limits=None, egu='').inposition

/home/runner/work/apstools/apstools/apstools/devices/tests/test_positioner_soft_done.py:266: AssertionError
=============================== warnings summary ===============================
../../../micromamba-root/envs/anaconda-test-env-py-3.11/lib/python3.11/site-packages/setuptools_scm/git.py:135
  /home/runner/micromamba-root/envs/anaconda-test-env-py-3.11/lib/python3.11/site-packages/setuptools_scm/git.py:135: UserWarning: "/home/runner/work/apstools/apstools" is shallow and may cause errors
    warnings.warn(f'"{wd.path}" is shallow and may cause errors')

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED apstools/devices/tests/test_positioner_soft_done.py::test_move_and_stop_nonzero - AssertionError: assert False
 +  where False = PVPositionerSoftDoneWithStop(prefix='gp:gp:', name='pos', settle_time=0.0, timeout=None, read_attrs=['readback', 'setpoint'], configuration_attrs=['done', 'tolerance', 'target'], limits=None, egu='').inposition
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
============= 1 failed, 134 passed, 1 warning in 110.85s (0:01:50) =============
Error: Process completed with exit code 1.

prjemian · 2023-02-16T22:24:31Z

different test had a failure:

apstools/plans/tests/test_alignment.py::test_direct_implementation_with_rel_scan[signal1-mover1--1.2-1.2-11-cen] FAILED [ 41%]

=================================== FAILURES ===================================
___ test_direct_implementation_with_rel_scan[signal1-mover1--1.2-1.2-11-cen] ___

signal = SynPseudoVoigt(name='pvoigt', value=385, timestamp=1676585458.7826302)
mover = EpicsSignal(read_pv='gp:gp:float1', name='axis', value=-0.9345, timestamp=1676585458.971158, tolerance=0.0001, auto_monitor=True, string=False, write_pv='gp:gp:float1', limits=False, put_complete=False)
start = -1.2, finish = 1.2, npts = 11, feature = 'cen'

    @pytest.mark.parametrize(
        "signal, mover, start, finish, npts, feature",
        [
            # FIXME: [noisy, m1, -1.2, 1.2, 11, "max"],  # slower
            [pvoigt, axis, -1.2, 1.2, 11, "max"],  # faster
            [pvoigt, axis, -1.2, 1.2, 11, "cen"],
            [pvoigt, axis, -1.2, 1.2, 51, "com"],
        ],
    )
    def test_direct_implementation_with_rel_scan(signal, mover, start, finish, npts, feature):
        RE(bps.mv(mover, 0))
        assert get_position(mover) == 0.0
    
generator rel_scan ['e498790c'] (scan num: 2)



=============================== warnings summary ===============================
../../../micromamba-root/envs/anaconda-test-env-py-3.9/lib/python3.11/site-packages/setuptools_scm/git.py:135
  /home/runner/micromamba-root/envs/anaconda-test-env-py-3.9/lib/python3.11/site-packages/setuptools_scm/git.py:135: UserWarning: "/home/runner/work/apstools/apstools" is shallow and may cause errors
    warnings.warn(f'"{wd.path}" is shallow and may cause errors')

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED apstools/plans/tests/test_alignment.py::test_direct_implementation_with_rel_scan[signal1-mover1--1.2-1.2-11-cen] - AssertionError: lo=0.8084099249862092 position=-0.934 fwhm=0.[291](https://github.com/BCDA-APS/apstools/actions/runs/4198581472/jobs/7282422710#step:15:292)00000000000015
assert 0.8084099249862092 <= (-0.934 + 0.29100000000000015)
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
============= 1 failed, 261 passed, 1 warning in 234.80s (0:03:54) =============
Error: Process completed with exit code 1.

prjemian · 2023-02-16T22:51:11Z

This lint workflow failure is not related to the issue:

2023-02-16 22:38:28 [ERROR]   Found errors in [mypy] linter!
2023-02-16 22:38:28 [ERROR]   Error code: 1. Command output:
------
WARNING: The directory '/github/home/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Collecting types-PyYAML
  Downloading types_PyYAML-6.0.12.6-py3-none-any.whl (14 kB)
Installing collected packages: types-PyYAML
Successfully installed types-PyYAML-6.0.12.6

Notice:  A new release of pip available: 22.3.1 -> 23.0
Notice:  To update, run: python -m pip install --upgrade pip
apstools/plans/command_list.py:419: error: Incompatible types in assignment (expression has type "Callable[[Any, Any, Any], Any]", variable has type "None")  [assignment]
Installing missing stub packages:
/venvs/mypy/bin/python -m pip install types-PyYAML


Found 1 error in 1 file (checked 1 source file)
------

prjemian · 2023-02-16T23:02:27Z

The error from the lint CI is a puzzle since it does not point to th eline with the problem. Instead, it points to the last line in the file:

apstools/apstools/plans/command_list.py

Line 419 in 49feeba

COMMAND_LIST_REGISTRY.command = execute_command_list

And the suggestion to upgrade pip or pip install types-PyYAML is all under the control of the github/super-linter@v4 actions workflow. Not sure how best to resolve.

Switch to github/super-linter/slim@v4 on advice.

prjemian · 2023-02-16T23:21:41Z

The root cause appears to be a timing situation involving the EPICS PV monitor events. Some of the test code requires a new value from EPICS faster than the ophyd code has updated its in-memory cache of values as received by CA monitor events. In such cases, it was found that using SIGNAL.get(use_monitor=False) was effective to eliminate the random CI failures observed. The use_monitor=False kwarg forces an immediate CA get from the EPICS IOC. This produced the new value.

prjemian · 2023-02-16T23:43:34Z

Trigger the workflow two more times to confirm no random failures.

prjemian · 2023-02-16T23:44:43Z

Both failed.

Py3.9

apstools/devices/tests/test_positioner_soft_done.py::test_put_and_stop FAILED [ 21%]

=================================== FAILURES ===================================
______________________________ test_put_and_stop _______________________________

rbv = EpicsSignal(read_pv='gp:gp:float1', name='rbv', value=0.5, timestamp=1676590437.933418, tolerance=0.0001, auto_monitor=True, string=False, write_pv='gp:gp:float1', limits=False, put_complete=False)
prec = EpicsSignal(read_pv='gp:gp:float1.PREC', name='prec', value=4, timestamp=631[152](https://github.com/BCDA-APS/apstools/actions/runs/4199116508/jobs/7283603125#step:15:153)000.0, auto_monitor=True, string=False, write_pv='gp:gp:float1.PREC', limits=False, put_complete=False)
pos = PVPositionerSoftDoneWithStop(prefix='gp:gp:', name='pos', settle_time=0.0, timeout=None, read_attrs=['readback', 'setpoint'], configuration_attrs=['done', 'tolerance', 'target'], limits=None, egu='')

    def test_put_and_stop(rbv, prec, pos):
        assert pos.tolerance.get() == -1
        assert pos.precision == prec.get()
    
        def motion(rb_initial, target, rb_mid=None):
            rbv.put(rb_initial)  # make the readback to different
            c_sp = pos._sp_count
            pos.setpoint.put(target)
            assert pos._sp_count == c_sp + 1
            assert math.isclose(pos.readback.get(use_monitor=False), rb_initial, abs_tol=0.02)
            assert math.isclose(pos.setpoint.get(use_monitor=False), target, abs_tol=0.02)
            assert pos.done.get() != pos.done_value
            assert not pos.inposition
    
            if rb_mid is not None:
                # move the readback part-way, but move is not over yet
                rbv.put(rb_mid)
                assert math.isclose(pos.readback.get(use_monitor=False), rb_mid, abs_tol=0.02)
                assert not pos.inposition
    
                # force a stop now
                pos.stop()
                pos.cb_readback()
                assert pos.setpoint.get(use_monitor=False) == rb_mid
                assert pos.readback.get(use_monitor=False) == rb_mid
                assert pos.position == rb_mid
            else:  # interrupted move
                rbv.put(target)  # make the readback match
                rb_new = pos.readback.get(use_monitor=False)
                assert math.isclose(rb_new, target, abs_tol=0.02)
    
            assert pos.inposition
            assert pos.done_value is True
            pos.cb_readback()
            assert pos.done.get() is True
    
        motion(1, 0)  # ensure starting value at 0.0
        motion(0, 1)  # new setpoint
>       motion(1, 0, 0.5)  # interrupted move

/home/runner/work/apstools/apstools/apstools/devices/tests/test_positioner_soft_done.py:247: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

rb_initial = 1, target = 0, rb_mid = 0.5

    def motion(rb_initial, target, rb_mid=None):
        rbv.put(rb_initial)  # make the readback to different
        c_sp = pos._sp_count
        pos.setpoint.put(target)
        assert pos._sp_count == c_sp + 1
        assert math.isclose(pos.readback.get(use_monitor=False), rb_initial, abs_tol=0.02)
        assert math.isclose(pos.setpoint.get(use_monitor=False), target, abs_tol=0.02)
        assert pos.done.get() != pos.done_value
        assert not pos.inposition
    
        if rb_mid is not None:
            # move the readback part-way, but move is not over yet
            rbv.put(rb_mid)
            assert math.isclose(pos.readback.get(use_monitor=False), rb_mid, abs_tol=0.02)
            assert not pos.inposition
    
            # force a stop now
            pos.stop()
            pos.cb_readback()
>           assert pos.setpoint.get(use_monitor=False) == rb_mid
E           AssertionError: assert 1.0 == 0.5
E            +  where 1.0 = <bound method EpicsSignalBase.get of EpicsSignal(read_pv='gp:gp:float2', name='pos_setpoint', parent='pos', value=1.0, timestamp=[167](https://github.com/BCDA-APS/apstools/actions/runs/4199116508/jobs/7283603125#step:15:168)6590437.934744, tolerance=0.0001, auto_monitor=True, string=False, write_pv='gp:gp:float2', limits=False, put_complete=True)>(use_monitor=False)
E            +    where <bound method EpicsSignalBase.get of EpicsSignal(read_pv='gp:gp:float2', name='pos_setpoint', parent='pos', value=1.0, timestamp=1676590437.934744, tolerance=0.0001, auto_monitor=True, string=False, write_pv='gp:gp:float2', limits=False, put_complete=True)> = EpicsSignal(read_pv='gp:gp:float2', name='pos_setpoint', parent='pos', value=1.0, timestamp=1676590437.934744, tolerance=0.0001, auto_monitor=True, string=False, write_pv='gp:gp:float2', limits=False, put_complete=True).get
E            +      where EpicsSignal(read_pv='gp:gp:float2', name='pos_setpoint', parent='pos', value=1.0, timestamp=1676590437.934744, tolerance=0.0001, auto_monitor=True, string=False, write_pv='gp:gp:float2', limits=False, put_complete=True) = PVPositionerSoftDoneWithStop(prefix='gp:gp:', name='pos', settle_time=0.0, timeout=None, read_attrs=['readback', 'setpoint'], configuration_attrs=['done', 'tolerance', 'target'], limits=None, egu='').setpoint

/home/runner/work/apstools/apstools/apstools/devices/tests/test_positioner_soft_done.py:[232](https://github.com/BCDA-APS/apstools/actions/runs/4199116508/jobs/7283603125#step:15:233): AssertionError
=============================== warnings summary ===============================
../../../micromamba-root/envs/anaconda-test-env-py-3.9/lib/python3.11/site-packages/setuptools_scm/git.py:135
  /home/runner/micromamba-root/envs/anaconda-test-env-py-3.9/lib/python3.11/site-packages/setuptools_scm/git.py:135: UserWarning: "/home/runner/work/apstools/apstools" is shallow and may cause errors
    warnings.warn(f'"{wd.path}" is shallow and may cause errors')

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED apstools/devices/tests/test_positioner_soft_done.py::test_put_and_stop - AssertionError: assert 1.0 == 0.5
 +  where 1.0 = <bound method EpicsSignalBase.get of EpicsSignal(read_pv='gp:gp:float2', name='pos_setpoint', parent='pos', value=1.0, timestamp=1676590437.934744, tolerance=0.0001, auto_monitor=True, string=False, write_pv='gp:gp:float2', limits=False, put_complete=True)>(use_monitor=False)
 +    where <bound method EpicsSignalBase.get of EpicsSignal(read_pv='gp:gp:float2', name='pos_setpoint', parent='pos', value=1.0, timestamp=1676590437.934744, tolerance=0.0001, auto_monitor=True, string=False, write_pv='gp:gp:float2', limits=False, put_complete=True)> = EpicsSignal(read_pv='gp:gp:float2', name='pos_setpoint', parent='pos', value=1.0, timestamp=1676590437.934744, tolerance=0.0001, auto_monitor=True, string=False, write_pv='gp:gp:float2', limits=False, put_complete=True).get
 +      where EpicsSignal(read_pv='gp:gp:float2', name='pos_setpoint', parent='pos', value=1.0, timestamp=1676590437.934744, tolerance=0.0001, auto_monitor=True, string=False, write_pv='gp:gp:float2', limits=False, put_complete=True) = PVPositionerSoftDoneWithStop(prefix='gp:gp:', name='pos', settle_time=0.0, timeout=None, read_attrs=['readback', 'setpoint'], configuration_attrs=['done', 'tolerance', 'target'], limits=None, egu='').setpoint
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
============= 1 failed, 133 passed, 1 warning in 111.56s (0:01:51) =============
Error: Process completed with exit code 1.

Py3.10

apstools/devices/tests/test_positioner_soft_done.py::test_move_and_stopped_early FAILED [ 21%]

=================================== FAILURES ===================================
_________________________ test_move_and_stopped_early __________________________

rbv = EpicsSignal(read_pv='gp:gp:float1', name='rbv', value=2.27, timestamp=[167](https://github.com/BCDA-APS/apstools/actions/runs/4199113730/jobs/7283596237#step:15:168)6590367.240589, tolerance=0.0001, auto_monitor=True, string=False, write_pv='gp:gp:float1', limits=False, put_complete=False)
pos = PVPositionerSoftDoneWithStop(prefix='gp:gp:', name='pos', settle_time=0.0, timeout=None, read_attrs=['readback', 'setpoint'], configuration_attrs=['done', 'tolerance', 'target'], limits=None, egu='')

    def test_move_and_stopped_early(rbv, pos):
        def motion(target, delay, interrupt=False):
            timed_pause(0.1)  # allow previous activities to settle down
    
            t0 = time.time()
            delayed_complete(pos, rbv, delay=delay)
            status = pos.move(target, wait=False)  # readback set by delayed_complete()
    
            timed_pause(delay / 2 if interrupt else delay + 0.1)
            dt = time.time() - t0
            # fmt: off
            rb_new = pos.readback.get(use_monitor=False)
            arrived = math.isclose(rb_new, target, abs_tol=pos.actual_tolerance)
            # fmt: on
            if interrupt:
                assert not status.done
                assert not status.success
                assert not arrived, f"{dt=:.3f}"
                pos.stop()
            else:
                assert status.done
                assert status.success
                assert arrived
            confirm_in_position(pos, dt)
    
        target = round(rand(2, 5), 2)
        motion(target, rand(0.4, 0.3))
>       motion(target - 1, rand(0.4, 0.3), True)

/home/runner/work/apstools/apstools/apstools/devices/tests/test_positioner_soft_done.py:299: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

target = 1.27, delay = 0.544[177](https://github.com/BCDA-APS/apstools/actions/runs/4199113730/jobs/7283596237#step:15:178)327374102, interrupt = True

    def motion(target, delay, interrupt=False):
        timed_pause(0.1)  # allow previous activities to settle down
    
        t0 = time.time()
        delayed_complete(pos, rbv, delay=delay)
        status = pos.move(target, wait=False)  # readback set by delayed_complete()
    
        timed_pause(delay / 2 if interrupt else delay + 0.1)
        dt = time.time() - t0
        # fmt: off
        rb_new = pos.readback.get(use_monitor=False)
        arrived = math.isclose(rb_new, target, abs_tol=pos.actual_tolerance)
        # fmt: on
        if interrupt:
>           assert not status.done
E           assert not True
E            +  where True = MoveStatus(done=True, pos=pos, elapsed=0.0, success=True, settle_time=0.0).done

/home/runner/work/apstools/apstools/apstools/devices/tests/test_positioner_soft_done.py:287: AssertionError
=============================== warnings summary ===============================
../../../micromamba-root/envs/anaconda-test-env-py-3.10/lib/python3.11/site-packages/setuptools_scm/git.py:135
  /home/runner/micromamba-root/envs/anaconda-test-env-py-3.10/lib/python3.11/site-packages/setuptools_scm/git.py:135: UserWarning: "/home/runner/work/apstools/apstools" is shallow and may cause errors
    warnings.warn(f'"{wd.path}" is shallow and may cause errors')

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED apstools/devices/tests/test_positioner_soft_done.py::test_move_and_stopped_early - assert not True
 +  where True = MoveStatus(done=True, pos=pos, elapsed=0.0, success=True, settle_time=0.0).done
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
============= 1 failed, 135 passed, 1 warning in 101.31s (0:01:41) =============
Error: Process completed with exit code 1.

prjemian · 2023-02-17T08:33:30Z

Decided to disable the super_linter workflow. Usefulness was offset by reports that did not make any sense.

prjemian added 2 commits February 16, 2023 15:03

TST #806 2.0 s default timeout not enough for CI, sometimes

1e8175c

TST #748 re-emable one test for now

36beb08

prjemian added the bug label Feb 16, 2023

prjemian added this to the 1.6.14 milestone Feb 16, 2023

prjemian self-assigned this Feb 16, 2023

TST #748 re-emable remaining tests

9feedbe

prjemian added 2 commits February 16, 2023 15:51

TST #748 force CA get after move here

b17f8dd

TST #748 small change, re-run CI

4f8568a

TST #748

49feeba

prjemian added 2 commits February 16, 2023 17:03

CI #748

068ac15

TST #748 remove local test markers and re-run CI

345f4ed

TST #748 restore @pytest.mark.local

f7d4a29

prjemian mentioned this pull request Feb 17, 2023

Replace unit tests marked by the @pytest.mark.local decorator. #810

Closed

prjemian merged commit 7d6ac98 into main Feb 17, 2023

prjemian deleted the 748-random-CI-failures branch February 17, 2023 05:38

prjemian mentioned this pull request Feb 17, 2023

addDeviceDataAsStream-> write_stream #811

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

resolve random CI failures #809

resolve random CI failures #809

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

prjemian commented Feb 17, 2023

resolve random CI failures #809

resolve random CI failures #809

Conversation

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

prjemian commented Feb 16, 2023

Py3.9

Py3.10

prjemian commented Feb 17, 2023