Intermittent failures on 32bit architectures #481

Closed
avalentino opened this issue Nov 25, 2022 · 8 comments

@avalentino
Contributor

Code Sample, a minimal, complete, and verifiable piece of code

python3 -m pytest -k test_compare_to_legacy

Problem description

On 32-bit architectures I'm experiencing intermittent failures of the test pyresample.test.test_dask_ewa.TestDaskEWAResampler.test_compare_to_legacy

Expected Output

All unit tests pass.

Actual Result, Traceback if applicable

$ python3 -m pytest -k test_compare_to_legacy
======================================================================== test session starts =========================================================================
platform linux -- Python 3.10.8, pytest-7.1.2, pluggy-1.0.0+repack
rootdir: /home/antonio/debian/git/pyresample
plugins: lazy-fixture-0.6.3, cov-4.0.0
collected 776 items / 772 deselected / 4 selected                                                                                                                    

pyresample/test/test_dask_ewa.py .F..                                                                                                                          [100%]

============================================================================== FAILURES ==============================================================================
____________________________________________ TestDaskEWAResampler.test_compare_to_legacy[input_shape1-input_dims1-False] _____________________________________________

self = <pyresample.test.test_dask_ewa.TestDaskEWAResampler object at 0xd8142a18>, input_shape = (3, 100, 50), input_dims = ('bands', 'y', 'x')
maximum_weight_mode = False

    @pytest.mark.parametrize(
        ('input_shape', 'input_dims', 'maximum_weight_mode'),
        [
            ((100, 50), ('y', 'x'), False),
            ((3, 100, 50), ('bands', 'y', 'x'), False),
            ((100, 50), ('y', 'x'), True),
            ((3, 100, 50), ('bands', 'y', 'x'), True),
        ]
    )
    def test_compare_to_legacy(self, input_shape, input_dims, maximum_weight_mode):
        """Make sure new and legacy EWA algorithms produce the same results."""
        output_shape = (200, 100)
        if len(input_shape) == 3:
            output_shape = (input_shape[0], output_shape[0], output_shape[1])
        swath_data, source_swath, target_area = get_test_data(
            input_shape=input_shape, output_shape=output_shape[-2:],
            input_dims=input_dims,
        )
        swath_data.data = swath_data.data.astype(np.float32)
    
        resampler = DaskEWAResampler(source_swath, target_area)
        new_data = resampler.resample(swath_data, rows_per_scan=10,
                                      maximum_weight_mode=maximum_weight_mode)
        new_arr = new_data.compute()
    
        legacy_resampler = LegacyDaskEWAResampler(source_swath, target_area)
        legacy_data = legacy_resampler.resample(swath_data, rows_per_scan=10,
                                                maximum_weight_mode=maximum_weight_mode)
        legacy_arr = legacy_data.compute()
    
        import sys
        rtol = 1e-7 if sys.maxsize > 2**32 else 1e-4
        #print("rtol", rtol)
>       np.testing.assert_allclose(new_arr, legacy_arr, rtol=rtol)
E       AssertionError: 
E       Not equal to tolerance rtol=0.0001, atol=0
E       
E       Mismatched elements: 1 / 60000 (0.00167%)
E       Max absolute difference: 6.839633e-05
E       Max relative difference: 0.00032184
E        x: array([[[     nan,      nan,      nan, ..., 0.282998, 0.283142,
E                0.283296],
E               [     nan,      nan,      nan, ..., 0.310199, 0.3105  ,...
E        y: array([[[     nan,      nan,      nan, ..., 0.282998, 0.283142,
E                0.283296],
E               [     nan,      nan,      nan, ..., 0.310199, 0.3105  ,...

pyresample/test/test_dask_ewa.py:348: AssertionError
========================================================================== warnings summary ==========================================================================
pyresample/spherical_geometry.py:27
  [...]/dist/pyresample/spherical_geometry.py:27: DeprecationWarning: This module will be removed in pyresample 2.0, please use the `pyresample.spherical` module functions and class instead.
    warnings.warn("This module will be removed in pyresample 2.0, please use the "

dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape0-input_dims0-False]
dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape1-input_dims1-False]
dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape2-input_dims2-True]
dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape3-input_dims3-True]
  /usr/lib/python3/dist-packages/pyproj/crs/crs.py:1286: UserWarning: You will likely lose important projection information when converting to a PROJ string from another format. See: https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems
    proj = self._crs.to_proj4(version=version)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================================================================== short test summary info =======================================================================
FAILED pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape1-input_dims1-False] - AssertionError: 
====================================================== 1 failed, 3 passed, 772 deselected, 5 warnings in 5.18s =======================================================

$ python3 -m pytest -k test_compare_to_legacy
======================================================================== test session starts =========================================================================
platform linux -- Python 3.10.8, pytest-7.1.2, pluggy-1.0.0+repack
rootdir: /home/antonio/debian/git/pyresample
plugins: lazy-fixture-0.6.3, cov-4.0.0
collected 776 items / 772 deselected / 4 selected                                                                                                                    

pyresample/test/test_dask_ewa.py ....                                                                                                                          [100%]

========================================================================== warnings summary ==========================================================================
pyresample/spherical_geometry.py:27
  [...]/dist/pyresample/spherical_geometry.py:27: DeprecationWarning: This module will be removed in pyresample 2.0, please use the `pyresample.spherical` module functions and class instead.
    warnings.warn("This module will be removed in pyresample 2.0, please use the "

dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape0-input_dims0-False]
dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape1-input_dims1-False]
dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape2-input_dims2-True]
dist/pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_compare_to_legacy[input_shape3-input_dims3-True]
  /usr/lib/python3/dist-packages/pyproj/crs/crs.py:1286: UserWarning: You will likely lose important projection information when converting to a PROJ string from another format. See: https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems
    proj = self._crs.to_proj4(version=version)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================================================== 4 passed, 772 deselected, 5 warnings in 4.91s ============================================================

Versions of Python, package at hand and relevant dependencies

Python 3.10
PyResample 1.26.0

@avalentino
Contributor Author

There also seems to be another issue on a 32-bit architecture (mipsel this time).
The problem seems to be linked to the _legacy_dask_ewa module.

=================================== FAILURES ===================================
_ TestDaskEWAResampler.test_xarray_basic_ewa[100-True-float64-input_shape1-input_dims1-LegacyDaskEWAResampler-pyresample.ewa._legacy_dask_ewa] _

self = <pyresample.test.test_dask_ewa.TestDaskEWAResampler object at 0x6dd1c418>
resampler_class = <class 'pyresample.ewa._legacy_dask_ewa.LegacyDaskEWAResampler'>
resampler_mod = <module 'pyresample.ewa._legacy_dask_ewa' from '/<<PKGBUILDDIR>>/.pybuild/cpython3_3.10_pyresample/build/pyresample/ewa/_legacy_dask_ewa.py'>
input_shape = (3, 100, 50), input_dims = ('bands', 'y', 'x')
input_dtype = <class 'numpy.float64'>, maximum_weight_mode = True
rows_per_scan = 100

    @pytest.mark.parametrize(
        ('resampler_class', 'resampler_mod'),
        [
            (DaskEWAResampler, dask_ewa),
            (LegacyDaskEWAResampler, legacy_dask_ewa),
        ])
    @pytest.mark.parametrize(
        ('input_shape', 'input_dims'),
        [
            ((100, 50), ('y', 'x')),
            ((3, 100, 50), ('bands', 'y', 'x')),
        ]
    )
    @pytest.mark.parametrize('input_dtype', [np.float32, np.float64, np.int8])
    @pytest.mark.parametrize('maximum_weight_mode', [False, True])
    @pytest.mark.parametrize('rows_per_scan', [10, 0, 100])
    def test_xarray_basic_ewa(self, resampler_class, resampler_mod,
                              input_shape, input_dims, input_dtype,
                              maximum_weight_mode, rows_per_scan):
        """Test EWA with basic xarray DataArrays."""
        is_legacy = resampler_class is LegacyDaskEWAResampler
        is_int = np.issubdtype(input_dtype, np.integer)
        if is_legacy and is_int:
            pytest.skip("Legacy dask resampler does not properly support "
                        "integer inputs.")
        if is_legacy and rows_per_scan == 0:
            pytest.skip("Legacy dask resampler does not support rows_per_scan "
                        "of 0.")
        output_shape = (200, 100)
        if len(input_shape) == 3:
            output_shape = (input_shape[0], output_shape[0], output_shape[1])
        swath_data, source_swath, target_area = get_test_data(
            input_shape=input_shape, output_shape=output_shape[-2:],
            input_dims=input_dims, input_dtype=input_dtype,
        )
        num_chunks = _get_num_chunks(source_swath, resampler_class, rows_per_scan)
    
        with mock.patch.object(resampler_mod, 'll2cr', wraps=resampler_mod.ll2cr) as ll2cr, \
                mock.patch.object(source_swath, 'get_lonlats', wraps=source_swath.get_lonlats) as get_lonlats:
            resampler = resampler_class(source_swath, target_area)
            new_data = resampler.resample(swath_data, rows_per_scan=rows_per_scan,
                                          weight_delta_max=40,
                                          maximum_weight_mode=maximum_weight_mode)
            _data_attrs_coords_checks(new_data, output_shape, input_dtype, target_area,
                                      'test', 'test')
            # make sure we can actually compute everything
            new_data.compute()
            lonlat_calls = get_lonlats.call_count
            ll2cr_calls = ll2cr.call_count
    
            # resample a different dataset and make sure cache is used
            swath_data2 = _create_second_test_data(swath_data)
            new_data = resampler.resample(swath_data2, rows_per_scan=rows_per_scan,
                                          weight_delta_max=40,
                                          maximum_weight_mode=maximum_weight_mode)
            _data_attrs_coords_checks(new_data, output_shape, input_dtype, target_area,
                                      'test2', 'test2')
            _coord_and_crs_checks(new_data, target_area,
                                  has_bands='bands' in input_dims)
            result = new_data.compute()
    
            # ll2cr will be called once more because of the computation
>           assert ll2cr.call_count == ll2cr_calls + num_chunks
E           AssertionError: assert 100 == (51 + 50)
E            +  where 100 = <MagicMock name='ll2cr' id='1756641120'>.call_count

pyresample/test/test_dask_ewa.py:231: AssertionError

@djhoese
Member

djhoese commented Nov 26, 2022

We've noticed the legacy dask EWA failure on 64-bit Windows platforms (at least) in our CI. It is on my TODO list to figure that out.

For the other failure...very weird. A single pixel is too different and only by:

E       Max absolute difference: 6.839633e-05
E       Max relative difference: 0.00032184

Any idea how often it fails versus passes? My guess is that it has something to do with the order in which dask computes the individual chunks. I guess the easiest thing would be to increase the threshold for the test, but maybe I'll spend some time figuring out the legacy dask EWA failure first, since that one at least happens in our CI occasionally.

@avalentino
Contributor Author

> Any idea how often it fails versus passes?

Well, I would say quite often:

  • On the first packaging attempt, the test failed on all three 32-bit architectures supported by Debian.
  • On the second attempt, I relaxed rtol to 1e-4 (based on the numbers observed in the previous run), and the test still failed on three architectures.
  • After some investigation on i386, I noticed the intermittent nature of the issue and decided to disable test_compare_to_legacy completely. The failure rate in this case was roughly 6 out of 10 with rtol=1e-4. The following run (with test_compare_to_legacy disabled) still failed, on mipsel (32-bit) only, with the error reported in the previous comment.
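
For anyone who wants to reproduce a failure rate like the roughly 6 out of 10 mentioned above, a small loop around the pytest command from the report is enough. This is only a sketch: `failure_rate` is a hypothetical helper name, not part of pyresample, and it relies solely on pytest returning a non-zero exit status when a selected test fails.

```python
import subprocess
import sys


def failure_rate(cmd, runs=10):
    """Run ``cmd`` ``runs`` times and return the fraction of failing runs."""
    failures = 0
    for _ in range(runs):
        # A non-zero exit status is counted as a failed run.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures / runs


# Possible usage, from the root of a pyresample checkout:
# cmd = [sys.executable, "-m", "pytest", "-k", "test_compare_to_legacy", "-q"]
# print(f"failure rate: {failure_rate(cmd):.0%}")
```

Ten runs give only a coarse estimate, so more runs may be needed before comparing architectures.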

@djhoese
Member

djhoese commented Dec 17, 2022

@avalentino I just merged #482 into main which should fix the intermittent test_xarray_basic_ewa failures. I'm wondering if a similar fix (setting the dask scheduler to sync) would clear up the issues for the other test. My other guess is that the order of execution of the sums in the EWA algorithm is making a big enough difference that it is showing in the tests. In this case we don't have much of a choice so we may just have to lighten up on the comparison threshold.
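
To illustrate the second guess: float32 addition is not associative, so summing the same terms in a different order can give a different result. The snippet below is a toy example with extreme values chosen to make the effect obvious; it is not the EWA code, and `sequential_sum` is a hypothetical helper written just for this illustration.

```python
import numpy as np


def sequential_sum(values):
    """Add values one at a time in float32, in the order given."""
    total = np.float32(0.0)
    for value in values:
        # Round to float32 after every addition, as a float32 accumulator would.
        total = np.float32(total + np.float32(value))
    return total


terms = [1.0e8, 1.0, -1.0e8]

# (1e8 + 1) - 1e8: the 1.0 is lost because float32 spacing near 1e8 is 8.
in_order = sequential_sum(terms)

# (1e8 - 1e8) + 1: the large terms cancel first, so the 1.0 survives.
reordered = sequential_sum([terms[0], terms[2], terms[1]])

print(in_order, reordered)
```

In a real resampling run the differences would be far smaller than in this example, but the same mechanism could plausibly explain a single pixel drifting past a tight rtol.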

@avalentino
Contributor Author

@djhoese I ran the test_xarray_basic_ewa test a few times on i386 and everything seems to work properly.
But I have to say that I never had problems with that test on i386.

I have also tried setting the dask scheduler to "sync" for the test_compare_to_legacy test, but it does not seem to help.
I'm not sure that my patch is correct anyway.

diff --git a/pyresample/test/test_dask_ewa.py b/pyresample/test/test_dask_ewa.py
index cfb96a7..0ac8286 100644
--- a/pyresample/test/test_dask_ewa.py
+++ b/pyresample/test/test_dask_ewa.py
@@ -333,16 +333,16 @@ class TestDaskEWAResampler:
             input_dims=input_dims,
         )
         swath_data.data = swath_data.data.astype(np.float32)
+        with dask.config.set(scheduler='sync'):
+            resampler = DaskEWAResampler(source_swath, target_area)
+            new_data = resampler.resample(swath_data, rows_per_scan=10,
+                                          maximum_weight_mode=maximum_weight_mode)
+            new_arr = new_data.compute()
 
-        resampler = DaskEWAResampler(source_swath, target_area)
-        new_data = resampler.resample(swath_data, rows_per_scan=10,
-                                      maximum_weight_mode=maximum_weight_mode)
-        new_arr = new_data.compute()
-
-        legacy_resampler = LegacyDaskEWAResampler(source_swath, target_area)
-        legacy_data = legacy_resampler.resample(swath_data, rows_per_scan=10,
-                                                maximum_weight_mode=maximum_weight_mode)
-        legacy_arr = legacy_data.compute()
+            legacy_resampler = LegacyDaskEWAResampler(source_swath, target_area)
+            legacy_data = legacy_resampler.resample(swath_data, rows_per_scan=10,
+                                                    maximum_weight_mode=maximum_weight_mode)
+            legacy_arr = legacy_data.compute()
 
         np.testing.assert_allclose(new_arr, legacy_arr)

@djhoese
Member

djhoese commented Dec 17, 2022

Yeah, that's basically how I would have done it. A change this small in the result could come from anything: numpy, dask, or the specific processor the code is being run on. Especially since it is one pixel, I'm not sure it is worth trying to narrow it down rather than changing the threshold/tolerance on the comparison.

@avalentino
Contributor Author

OK for me to change the tolerance.
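
As a rough illustration of what changing the tolerance means here: `np.testing.assert_allclose` accepts a pair when `|actual - desired| <= atol + rtol * |desired|`. The arrays below are made-up values, chosen only so that the differences have the same magnitude as in the report (absolute about 6.8e-05, relative about 3.2e-04). They are not the real test data, `within_tolerance` is a hypothetical helper, and the value 1e-3 is an assumption for the example, not a tolerance agreed in this thread.

```python
import numpy as np

# Illustrative values only, not actual test output.
legacy = np.array([np.nan, 0.282998, 0.212520], dtype=np.float32)
new = np.array([np.nan, 0.282998, 0.212520 + 6.84e-05], dtype=np.float32)


def within_tolerance(actual, desired, rtol):
    """Return True if assert_allclose accepts the pair at this rtol."""
    try:
        # NaNs in matching positions compare equal by default (equal_nan=True).
        np.testing.assert_allclose(actual, desired, rtol=rtol)
    except AssertionError:
        return False
    return True


for rtol in (1e-7, 1e-4, 1e-3):
    print(rtol, within_tolerance(new, legacy, rtol))
```

With these numbers the comparison is rejected at rtol=1e-7 and rtol=1e-4 and accepted at rtol=1e-3, which matches the reported failure at rtol=0.0001.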

@mraspaud
Member

I think this was solved in #482; please let us know if it isn't. Closing for now.
