Intermittent failures on 32bit architectures #481

avalentino · 2022-11-25T21:03:29Z

Code Sample, a minimal, complete, and verifiable piece of code

python3 -m pytest -k test_compare_to_legacy

Problem description

On 32-bit architectures I'm experiencing intermittent failures of the test pyresample.test.test_dask_ewa.TestDaskEWAResampler.test_compare_to_legacy.

Expected Output

All unit tests pass.

Actual Result, Traceback if applicable

$ python3 -m pytest -k test_compare_to_legacy
======================================================================== test session starts =========================================================================
platform linux -- Python 3.10.8, pytest-7.1.2, pluggy-1.0.0+repack
rootdir: /home/antonio/debian/git/pyresample
plugins: lazy-fixture-0.6.3, cov-4.0.0
collected 776 items / 772 deselected / 4 selected                                                                                                                    

pyresample/test/test_dask_ewa.py .F..                                                                                                                          [100%]

============================================================================== FAILURES ==============================================================================
____________________________________________ TestDaskEWAResampler.test_compare_to_legacy[input_shape1-input_dims1-False] _____________________________________________

self = <pyresample.test.test_dask_ewa.TestDaskEWAResampler object at 0xd8142a18>, input_shape = (3, 100, 50), input_dims = ('bands', 'y', 'x')
maximum_weight_mode = False

    @pytest.mark.parametrize(
        ('input_shape', 'input_dims', 'maximum_weight_mode'),
        [
            ((100, 50), ('y', 'x'), False),
            ((3, 100, 50), ('bands', 'y', 'x'), False),
            ((100, 50), ('y', 'x'), True),
            ((3, 100, 50), ('bands', 'y', 'x'), True),
        ]
    )
    def test_compare_to_legacy(self, input_shape, input_dims, maximum_weight_mode):
        """Make sure new and legacy EWA algorithms produce the same results."""
        output_shape = (200, 100)
        if len(input_shape) == 3:
            output_shape = (input_shape[0], output_shape[0], output_shape[1])
        swath_data, source_swath, target_area = get_test_data(
            input_shape=input_shape, output_shape=output_shape[-2:],
            input_dims=input_dims,
        )
        swath_data.data = swath_data.data.astype(np.float32)
    
        resampler = DaskEWAResampler(source_swath, target_area)
        new_data = resampler.resample(swath_data, rows_per_scan=10,
                                      maximum_weight_mode=maximum_weight_mode)
        new_arr = new_data.compute()
    
        legacy_resampler = LegacyDaskEWAResampler(source_swath, target_area)
        legacy_data = legacy_resampler.resample(swath_data, rows_per_scan=10,
                                                maximum_weight_mode=maximum_weight_mode)
        legacy_arr = legacy_data.compute()
    
        import sys
        rtol = 1e-7 if sys.maxsize > 2**32 else 1e-4
        #print("rtol", rtol)
>       np.testing.assert_allclose(new_arr, legacy_arr, rtol=rtol)
E       AssertionError: 
E       Not equal to tolerance rtol=0.0001, atol=0
E       
E       Mismatched elements: 1 / 60000 (0.00167%)
E       Max absolute difference: 6.839633e-05
E       Max relative difference: 0.00032184
E        x: array([[[     nan,      nan,      nan, ..., 0.282998, 0.283142,
E                0.283296],
E               [     nan,      nan,      nan, ..., 0.310199, 0.3105  ,...
E        y: array([[[     nan,      nan,      nan, ..., 0.282998, 0.283142,
E                0.283296],
E               [     nan,      nan,      nan, ..., 0.310199, 0.3105  ,...

pyresample/test/test_dask_ewa.py:348: AssertionError
========================================================================== warnings summary ==========================================================================
pyresample/spherical_geometry.py:27
  [...]/dist/pyresample/spherical_geometry.py:27: DeprecationWarning: This module will be removed in pyresample 2.0, please use the `pyresample.spherical` module functions and class instead.
    warnings.warn("This module will be removed in pyresample 2.0, please use the "

dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape0-input_dims0-False]
dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape1-input_dims1-False]
dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape2-input_dims2-True]
dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape3-input_dims3-True]
  /usr/lib/python3/dist-packages/pyproj/crs/crs.py:1286: UserWarning: You will likely lose important projection information when converting to a PROJ string from another format. See: https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems
    proj = self._crs.to_proj4(version=version)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================================================================== short test summary info =======================================================================
FAILED pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape1-input_dims1-False] - AssertionError: 
====================================================== 1 failed, 3 passed, 772 deselected, 5 warnings in 5.18s =======================================================

$ python3 -m pytest -k test_compare_to_legacy
======================================================================== test session starts =========================================================================
platform linux -- Python 3.10.8, pytest-7.1.2, pluggy-1.0.0+repack
rootdir: /home/antonio/debian/git/pyresample
plugins: lazy-fixture-0.6.3, cov-4.0.0
collected 776 items / 772 deselected / 4 selected                                                                                                                    

pyresample/test/test_dask_ewa.py ....                                                                                                                          [100%]

========================================================================== warnings summary ==========================================================================
pyresample/spherical_geometry.py:27
  [...]/dist/pyresample/spherical_geometry.py:27: DeprecationWarning: This module will be removed in pyresample 2.0, please use the `pyresample.spherical` module functions and class instead.
    warnings.warn("This module will be removed in pyresample 2.0, please use the "

dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape0-input_dims0-False]
dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape1-input_dims1-False]
dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape2-input_dims2-True]
dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape3-input_dims3-True]
  /usr/lib/python3/dist-packages/pyproj/crs/crs.py:1286: UserWarning: You will likely lose important projection information when converting to a PROJ string from another format. See: https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems
    proj = self._crs.to_proj4(version=version)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================================================== 4 passed, 772 deselected, 5 warnings in 4.91s ============================================================

Versions of Python, package at hand and relevant dependencies

Python 3.10
PyResample 1.26.0
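The patched test in the first traceback chooses its tolerance from the interpreter's word size (`sys.maxsize > 2**32`). Below is a minimal sketch of that check written as standalone helpers. The function names are hypothetical and not part of pyresample. Note that even the 32-bit value of 1e-4 was not enough in the failing run above, where the maximum relative difference was 0.00032184.

```python
import struct
import sys


def pointer_bits():
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def compare_rtol():
    """Mirror the tolerance choice in the patched test: tight on 64-bit, looser on 32-bit."""
    # 1e-7 is also numpy.testing.assert_allclose's default rtol.
    return 1e-7 if sys.maxsize > 2**32 else 1e-4


print(pointer_bits(), compare_rtol())
```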


avalentino · 2022-11-26T12:22:26Z

There also seems to be another issue linked to a 32-bit architecture (mipsel this time).
The problem seems to be linked to the _legacy_dask_ewa module.

=================================== FAILURES ===================================
_ TestDaskEWAResampler.test_xarray_basic_ewa[100-True-float64-input_shape1-input_dims1-LegacyDaskEWAResampler-pyresample.ewa._legacy_dask_ewa] _

self = <pyresample.test.test_dask_ewa.TestDaskEWAResampler object at 0x6dd1c418>
resampler_class = <class 'pyresample.ewa._legacy_dask_ewa.LegacyDaskEWAResampler'>
resampler_mod = <module 'pyresample.ewa._legacy_dask_ewa' from '/<<PKGBUILDDIR>>/.pybuild/cpython3_3.10_pyresample/build/pyresample/ewa/_legacy_dask_ewa.py'>
input_shape = (3, 100, 50), input_dims = ('bands', 'y', 'x')
input_dtype = <class 'numpy.float64'>, maximum_weight_mode = True
rows_per_scan = 100

    @pytest.mark.parametrize(
        ('resampler_class', 'resampler_mod'),
        [
            (DaskEWAResampler, dask_ewa),
            (LegacyDaskEWAResampler, legacy_dask_ewa),
        ])
    @pytest.mark.parametrize(
        ('input_shape', 'input_dims'),
        [
            ((100, 50), ('y', 'x')),
            ((3, 100, 50), ('bands', 'y', 'x')),
        ]
    )
    @pytest.mark.parametrize('input_dtype', [np.float32, np.float64, np.int8])
    @pytest.mark.parametrize('maximum_weight_mode', [False, True])
    @pytest.mark.parametrize('rows_per_scan', [10, 0, 100])
    def test_xarray_basic_ewa(self, resampler_class, resampler_mod,
                              input_shape, input_dims, input_dtype,
                              maximum_weight_mode, rows_per_scan):
        """Test EWA with basic xarray DataArrays."""
        is_legacy = resampler_class is LegacyDaskEWAResampler
        is_int = np.issubdtype(input_dtype, np.integer)
        if is_legacy and is_int:
            pytest.skip("Legacy dask resampler does not properly support "
                        "integer inputs.")
        if is_legacy and rows_per_scan == 0:
            pytest.skip("Legacy dask resampler does not support rows_per_scan "
                        "of 0.")
        output_shape = (200, 100)
        if len(input_shape) == 3:
            output_shape = (input_shape[0], output_shape[0], output_shape[1])
        swath_data, source_swath, target_area = get_test_data(
            input_shape=input_shape, output_shape=output_shape[-2:],
            input_dims=input_dims, input_dtype=input_dtype,
        )
        num_chunks = _get_num_chunks(source_swath, resampler_class, rows_per_scan)
    
        with mock.patch.object(resampler_mod, 'll2cr', wraps=resampler_mod.ll2cr) as ll2cr, \
                mock.patch.object(source_swath, 'get_lonlats', wraps=source_swath.get_lonlats) as get_lonlats:
            resampler = resampler_class(source_swath, target_area)
            new_data = resampler.resample(swath_data, rows_per_scan=rows_per_scan,
                                          weight_delta_max=40,
                                          maximum_weight_mode=maximum_weight_mode)
            _data_attrs_coords_checks(new_data, output_shape, input_dtype, target_area,
                                      'test', 'test')
            # make sure we can actually compute everything
            new_data.compute()
            lonlat_calls = get_lonlats.call_count
            ll2cr_calls = ll2cr.call_count
    
            # resample a different dataset and make sure cache is used
            swath_data2 = _create_second_test_data(swath_data)
            new_data = resampler.resample(swath_data2, rows_per_scan=rows_per_scan,
                                          weight_delta_max=40,
                                          maximum_weight_mode=maximum_weight_mode)
            _data_attrs_coords_checks(new_data, output_shape, input_dtype, target_area,
                                      'test2', 'test2')
            _coord_and_crs_checks(new_data, target_area,
                                  has_bands='bands' in input_dims)
            result = new_data.compute()
    
            # ll2cr will be called once more because of the computation
>           assert ll2cr.call_count == ll2cr_calls + num_chunks
E           AssertionError: assert 100 == (51 + 50)
E            +  where 100 = <MagicMock name='ll2cr' id='1756641120'>.call_count

pyresample/test/test_dask_ewa.py:231: AssertionError

djhoese · 2022-11-26T13:34:00Z

So, the legacy dask EWA failure is one we've noticed in our CI on (at least) 64-bit Windows platforms. It is on my TODO list to figure that out.

For the other failure... very weird. A single pixel is too different, and only by:

E       Max absolute difference: 6.839633e-05
E       Max relative difference: 0.00032184

Any idea how often it fails versus passes? My guess is it would have to be something with the order in which dask computes the individual chunks. I guess the easiest thing would be to increase the threshold for the test, but maybe I'll spend some time figuring out the legacy dask EWA failure first, since that one at least happens occasionally in our CI.
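To illustrate why chunk order can matter: float32 addition is not associative, so if partial sums from different chunks are accumulated into a float32 total, the result can depend on the order in which they are added. The sketch below is an exaggerated, hypothetical example (it is not pyresample's EWA code, and the values are chosen so the rounding is obvious); in the real test the effect would be far smaller, consistent with a single pixel differing by about 3e-4 relative.

```python
import numpy as np


def accumulate(chunks):
    """Add each chunk's float32 partial sum into a float32 running total, in the given order."""
    total = np.float32(0.0)
    for chunk in chunks:
        total = np.float32(total + np.sum(chunk, dtype=np.float32))
    return total


big = np.array([1e8], dtype=np.float32)  # exactly representable; float32 spacing here is 8
small = np.ones(3, dtype=np.float32)     # partial sum 3, less than half that spacing

print(accumulate([big, small, -big]))  # 3 is rounded away when added to 1e8
print(accumulate([big, -big, small]))  # cancellation happens first, so 3 survives
```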

avalentino · 2022-11-26T14:01:53Z

Any idea how often it fails versus passes?

Well, I would say quite often:

On the first packaging attempt I had failures on all three 32-bit architectures supported by Debian.
On the second attempt I relaxed rtol to 1e-4 (based on the numbers observed in the previous run) and still had failures on 3 architectures.
After some investigation on i386 I noticed the intermittent nature of the issue and decided to disable test_compare_to_legacy completely. With rtol=1e-4 the failure rate was roughly 6 out of 10 runs. The following run (with test_compare_to_legacy disabled) still failed, on mipsel (32-bit) only, with the error reported in my previous comment.
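A failure rate like "6 out of 10" can be estimated by rerunning the selected tests in fresh interpreters. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and it assumes pytest is installed and the script is run from the source tree root.

```python
import subprocess
import sys


def run_selected_tests(selector="test_compare_to_legacy"):
    """Run the selected tests once in a fresh interpreter; return True if they passed."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-q", "-k", selector],
        capture_output=True,
        text=True,
    )
    # pytest exits with 0 only when all selected tests pass.
    return result.returncode == 0


def failure_rate(run_once, n_runs=10):
    """Call run_once() n_runs times and return the fraction of runs that failed."""
    failures = sum(1 for _ in range(n_runs) if not run_once())
    return failures / n_runs
```

For example, `failure_rate(run_selected_tests, n_runs=10)` would return 0.6 for six failing runs out of ten. Using a separate process per run avoids carrying state such as caches or thread pools from one run to the next.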

djhoese · 2022-12-17T15:44:49Z

@avalentino I just merged #482 into main, which should fix the intermittent test_xarray_basic_ewa failures. I'm wondering if a similar fix (setting the dask scheduler to sync) would clear up the issues for the other test. My other guess is that the order of execution of the sums in the EWA algorithm makes a big enough difference to show up in the tests. If that is the case, we don't have much of a choice and may just have to loosen the comparison threshold.
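The idea behind the synchronous scheduler is that tasks run one at a time in a fixed order, so any order-dependent rounding is at least reproducible. Here is a standard-library analogy (this is not dask code, and the data is a hypothetical exaggerated example). Partial sums are computed on worker threads. Combining them in submission order gives the same float32 result every run, whereas combining them as they complete can vary with thread timing.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import numpy as np


def chunk_sum(chunk):
    """float32 partial sum of one chunk."""
    return np.sum(chunk, dtype=np.float32)


def combine_in_completion_order(chunks, max_workers=4):
    """Combine partial sums as threads finish: order (and rounding) may vary run to run."""
    total = np.float32(0.0)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(chunk_sum, c) for c in chunks]
        for fut in as_completed(futures):
            total = np.float32(total + fut.result())
    return total


def combine_in_submission_order(chunks, max_workers=4):
    """Combine partial sums in input order: reproducible regardless of thread timing."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(chunk_sum, chunks))  # map yields results in input order
    total = np.float32(0.0)
    for partial in partials:
        total = np.float32(total + partial)
    return total


big = np.array([1e8], dtype=np.float32)
chunks = [big, np.ones(3, dtype=np.float32), -big]
print(combine_in_submission_order(chunks))  # same value on every run
print(combine_in_completion_order(chunks))  # may differ between runs
```

A fixed order makes results repeatable, but it does not make two different algorithms (new vs. legacy EWA) agree bit for bit. That may explain why the sync scheduler alone might not help test_compare_to_legacy.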

avalentino · 2022-12-17T18:21:09Z

@djhoese I ran the test_xarray_basic_ewa test a few times on i386 and everything seems to work properly.
But I have to say that I never had problems with that test on i386.

I have also tried setting the dask scheduler to "sync" for the test_compare_to_legacy test, but it does not seem to help.
I'm not sure my patch is correct anyway.

diff --git a/pyresample/test/test_dask_ewa.py b/pyresample/test/test_dask_ewa.py
index cfb96a7..0ac8286 100644
--- a/pyresample/test/test_dask_ewa.py
+++ b/pyresample/test/test_dask_ewa.py
@@ -333,16 +333,16 @@ class TestDaskEWAResampler:
             input_dims=input_dims,
         )
         swath_data.data = swath_data.data.astype(np.float32)
+        with dask.config.set(scheduler='sync'):
+            resampler = DaskEWAResampler(source_swath, target_area)
+            new_data = resampler.resample(swath_data, rows_per_scan=10,
+                                          maximum_weight_mode=maximum_weight_mode)
+            new_arr = new_data.compute()
 
-        resampler = DaskEWAResampler(source_swath, target_area)
-        new_data = resampler.resample(swath_data, rows_per_scan=10,
-                                      maximum_weight_mode=maximum_weight_mode)
-        new_arr = new_data.compute()
-
-        legacy_resampler = LegacyDaskEWAResampler(source_swath, target_area)
-        legacy_data = legacy_resampler.resample(swath_data, rows_per_scan=10,
-                                                maximum_weight_mode=maximum_weight_mode)
-        legacy_arr = legacy_data.compute()
+            legacy_resampler = LegacyDaskEWAResampler(source_swath, target_area)
+            legacy_data = legacy_resampler.resample(swath_data, rows_per_scan=10,
+                                                    maximum_weight_mode=maximum_weight_mode)
+            legacy_arr = legacy_data.compute()
 
         np.testing.assert_allclose(new_arr, legacy_arr)

djhoese · 2022-12-17T18:24:56Z

Yeah, that's basically how I would have done it. A change in the result by such a small amount could come from anything: numpy, dask, or the specific processor the code is being run on. Especially since it is only one pixel, I'm not sure it is worth trying to narrow it down rather than just changing the threshold/tolerance on the comparison.
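Here is a minimal sketch of a relaxed comparison, using synthetic data shaped like the failing case: 3 x 200 x 100 float32 (60000 elements), NaNs outside the swath, and one pixel off by about 3.2e-4 relative. The arrays and the `mismatches` helper are hypothetical. The rtol=1e-3 value is an assumption chosen to cover the observed 0.00032184; it is not necessarily the value adopted upstream.

```python
import numpy as np

# Synthetic stand-in for the EWA outputs: NaNs in a border region, one pixel slightly off.
legacy_arr = np.full((3, 200, 100), 0.25, dtype=np.float32)
legacy_arr[:, :, :10] = np.nan
new_arr = legacy_arr.copy()
new_arr[1, 50, 50] *= np.float32(1 + 3.2e-4)  # ~3.2e-4 relative error, as in the report


def mismatches(actual, desired, rtol, atol=0.0):
    """Count elements outside the tolerance, treating NaN == NaN like assert_allclose does."""
    close = np.isclose(actual, desired, rtol=rtol, atol=atol, equal_nan=True)
    return int((~close).sum())


try:
    np.testing.assert_allclose(new_arr, legacy_arr, rtol=1e-4)
except AssertionError:
    print("rtol=1e-4 fails with", mismatches(new_arr, legacy_arr, rtol=1e-4), "mismatch")

np.testing.assert_allclose(new_arr, legacy_arr, rtol=1e-3)  # passes
```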

avalentino · 2022-12-17T18:47:48Z

Changing the tolerance is OK with me.

mraspaud · 2023-01-25T07:13:59Z

I think this was solved by #482; please let us know if it isn't. Closing for now.

djhoese mentioned this issue Nov 27, 2022

Fix intermittent EWA test failures #482

Merged

mraspaud closed this as completed Jan 25, 2023
