Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel (mpi4py) tests with coverage often lead to INTERNALERROR #237

Open
EmilyBourne opened this issue Nov 14, 2018 · 8 comments
Open

Comments

@EmilyBourne
Copy link

When running tests in parallel for a program which uses mpi4py, often an INTERNALERROR is signalled. The error arises due to a corruption of the .coverage file. This seems to be due to a race condition (hence why the error does not always arise). More specifically I think it is due to concurrent write commands.

The command which gave this error was:
mpirun -n 2 py.test pygyro/diagnostics/ -sxm parallel --cov=pygyro

The operating system is linux, ubuntu bionic beaver 18.04.1. I am using python 3.6.6.

The error generated is as follows:

INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 293, in read_file
INTERNALERROR>     self.read_fileobj(f)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 271, in read_fileobj
INTERNALERROR>     data = self._read_raw_data(file_obj)

INTERNALERROR> Traceback (most recent call last):INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 314, in _read_raw_data

INTERNALERROR>     return json.load(file_obj)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 293, in read_file
INTERNALERROR>   File "/usr/lib/python3.6/json/__init__.py", line 299, in loadINTERNALERROR>     self.read_fileobj(f)

INTERNALERROR>     parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 271, in read_fileobj
INTERNALERROR>     data = self._read_raw_data(file_obj)
INTERNALERROR>   File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 314, in _read_raw_dataINTERNALERROR>     return _default_decoder.decode(s)

INTERNALERROR>     return json.load(file_obj)
INTERNALERROR>   File "/usr/lib/python3.6/json/decoder.py", line 342, in decode
INTERNALERROR>   File "/usr/lib/python3.6/json/__init__.py", line 299, in loadINTERNALERROR>     raise JSONDecodeError("Extra data", s, end)

INTERNALERROR>     parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
INTERNALERROR> json.decoder.JSONDecodeError: Extra data: line 1 column 9466 (char 9465)
INTERNALERROR> 
INTERNALERROR>   File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
INTERNALERROR> During handling of the above exception, another exception occurred:INTERNALERROR>     return _default_decoder.decode(s)

INTERNALERROR> 
INTERNALERROR>   File "/usr/lib/python3.6/json/decoder.py", line 342, in decode
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>     raise JSONDecodeError("Extra data", s, end)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/_pytest/main.py", line 184, in wrap_session
INTERNALERROR> json.decoder.JSONDecodeError: Extra data: line 1 column 9466 (char 9465)
INTERNALERROR>     session.exitstatus = doit(config, session) or 0INTERNALERROR> 

INTERNALERROR> During handling of the above exception, another exception occurred:
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/_pytest/main.py", line 224, in _mainINTERNALERROR> 

INTERNALERROR> Traceback (most recent call last):INTERNALERROR>     config.hook.pytest_runtestloop(session=session)

INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/_pytest/main.py", line 184, in wrap_session
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/hooks.py", line 284, in __call__
INTERNALERROR>     session.exitstatus = doit(config, session) or 0
INTERNALERROR>     return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/_pytest/main.py", line 224, in _main
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/manager.py", line 67, in _hookexec
INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/hooks.py", line 284, in __call__
INTERNALERROR>     return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/manager.py", line 61, in <lambda>
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/manager.py", line 67, in _hookexec
INTERNALERROR>     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/callers.py", line 203, in _multicall
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/manager.py", line 61, in <lambda>
INTERNALERROR>     gen.send(outcome)
INTERNALERROR>     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pytest_cov/plugin.py", line 228, in pytest_runtestloop
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/callers.py", line 203, in _multicall
INTERNALERROR>     gen.send(outcome)
INTERNALERROR>     self.cov_controller.finish()
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pytest_cov/plugin.py", line 228, in pytest_runtestloop
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pytest_cov/engine.py", line 167, in finish
INTERNALERROR>     self.cov_controller.finish()
INTERNALERROR>     self.cov.stop()
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pytest_cov/engine.py", line 167, in finish
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/control.py", line 677, in loadINTERNALERROR>     self.cov.stop()

INTERNALERROR>     self.data_files.read(self.data)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/control.py", line 677, in load
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 653, in readINTERNALERROR>     self.data_files.read(self.data)

INTERNALERROR>     data.read_file(self.filename)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 653, in read
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 297, in read_fileINTERNALERROR>     data.read_file(self.filename)

INTERNALERROR>     filename, exc.__class__.__name__, exc,
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 297, in read_file
INTERNALERROR>     filename, exc.__class__.__name__, exc,
INTERNALERROR> coverage.misc.CoverageException: Couldn't read data from '/home/emily/Documents/Cours_TUM/projet/Code/PyGyro/.coverage': JSONDecodeError: Extra data: line 1 column 9466 (char 9465)
INTERNALERROR> coverage.misc.CoverageException: Couldn't read data from '/home/emily/Documents/Cours_TUM/projet/Code/PyGyro/.coverage': JSONDecodeError: Extra data: line 1 column 9466 (char 9465)


========================= 1731 passed in 17.66 seconds =========================
========================= 1731 passed in 17.66 seconds =========================
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[29764,1],0]
  Exit code:    3
--------------------------------------------------------------------------
@blueyed
Copy link
Contributor

blueyed commented Nov 14, 2018

INTERNALERROR> coverage.misc.CoverageException: Couldn't read data from '/home/emily/Documents/Cours_TUM/projet/Code/PyGyro/.coverage': JSONDecodeError: Extra data: line 1 column 9466 (char 9465)

This looks like you're using coverage.py 5, and there you're mixing the (new, default) SQLite database with the legacy JSON format.

Try using --cov-config to pass your coveragerc/config explicitly.

You can also try export COVERAGE_DEBUG=process,config to get info about coverage processes being started and which config they are using.

@EmilyBourne
Copy link
Author

My version of coverage is 4.5.2.

I don't get this problem when running anything in serial so I don't think it should be a versionning issue.

The reason I think that it's a concurrent write problem is that I have occasionally seen things such as: :[]}},[]}} at the end of the file.

Using --cov-config doesn't make any difference. Is there anything I need to add to the coveragerc file?

Here is the output from export COVERAGE_DEBUG=process,config I can't see any problems

-- config ----------------------------------------------------
               _include: None
                  _omit: None
 attempted_config_files: .coveragerc
                 branch: False
            concurrency: None
           config_files: .coveragerc
            cover_pylib: False
              data_file: /home/emily/Documents/Cours_TUM/projet/Code/PyGyro/.coverage
                  debug: process
                         config
       disable_warnings: -none-
           exclude_list: #\s*(pragma|PRAGMA)[:\s]?\s*(no|NO)\s*(cover|COVER)
              extra_css: None
             fail_under: 0.0
               html_dir: htmlcov
             html_title: Coverage report
          ignore_errors: False
                   note: None
               parallel: False
    partial_always_list: while (True|1|False|0):
                         if (True|1|False|0):
           partial_list: #\s*(pragma|PRAGMA)[:\s]?\s*(no|NO)\s*(branch|BRANCH)
                  paths: {}
         plugin_options: {}
                plugins: -none-
              precision: 0
         report_include: None
            report_omit: *test*
                         pygyro/advection/*mod*
                         pygyro/initialisation/*mod*
                         pygyro/splines/*mod*
                         pygyro/advection/accelerated_advection_steps.py
                         pygyro/initialisation/initialiser_func.py
                         pygyro/splines/spline_eval_funcs.py
                         pygyro/tools/*
                         pygyro/utilities/*
            run_include: None
               run_omit: None
           show_missing: False
           skip_covered: False
                 source: pygyro
                  timid: False
             xml_output: current_coverage.xml
      xml_package_depth: 99
-- end -------------------------------------------------------

@blueyed
Copy link
Contributor

blueyed commented Nov 15, 2018

With export COVERAGE_DEBUG=process,config you should see multiple outputs of the config, and I assume that some of them do not get your "config_files" right.
What is your .coveragerc?

@EmilyBourne
Copy link
Author

I have the multiple outputs but they all say the same thing:
config_files: .coveragerc

My .coveragerc is not very complicated it only contains information about which files to omit and the name of the xml file if it is to be generated:

[report]
omit = *test*
       pygyro/advection/*mod*
       pygyro/initialisation/*mod*
       pygyro/splines/*mod*
       pygyro/advection/accelerated_advection_steps.py
       pygyro/initialisation/initialiser_func.py
       pygyro/splines/spline_eval_funcs.py
       pygyro/tools/*
       pygyro/utilities/*

[xml]
output = current_coverage.xml

I have just had another look at the doc and tried adding

[run]
parallel = True

to the file. This does not remove the error. I have also tried adding the "concurrency" argument too but I am not sure what to specify as the library. concurrency = mpi4py does not change the situation, concurrency = multiprocessing leads to a different problem :

WARNING: Failed to generate report: No data to report.

Coverage.py warning: Couldn't read data from '/home/emily/Documents/Cours_TUM/projet/Code/PyGyro/.coverage.emily-SATELLITE-Z30-A.31964.442210': FileNotFoundError: [Errno 2] No such file or directory: '/home/emily/Documents/Cours_TUM/projet/Code/PyGyro/.coverage.emily-SATELLITE-Z30-A.31964.442210'

@EmilyBourne
Copy link
Author

Actually I seem to be getting this error even without the concurrency argument when I set parallel = True:

Coverage.py warning: Couldn't read data from '/home/emily/Documents/Cours_TUM/projet/Code/PyGyro/.coverage.emily-SATELLITE-Z30-A.22702.434517': CoverageException: Doesn't seem to be a coverage.py data file

@blueyed
Copy link
Contributor

blueyed commented Nov 15, 2018

Coverage.py warning: Couldn't read data from '/home/emily/Documents/Cours_TUM/projet/Code/PyGyro/.coverage.emily-SATELLITE-Z30-A.22702.434517': CoverageException: Doesn't seem to be a coverage.py data file

How does this file look then?

With COVERAGE_DEBUG=process you should be seeing where this process (22702 or 434517) gets created.
There are several other options for COVERAGE_DEBUG, e.g. "dataio" that could be helpful here to debug this. See https://github.com/nedbat/coveragepy/blob/cc12f4b7c40347b7297f7f6d938150bfde8c9ed5/doc/cmd.rst#diagnostics.

skirpichev added a commit to skirpichev/diofant that referenced this issue Nov 22, 2018
See
pytest-dev/pytest-cov#237 (comment)

Also pytest options for Travis CI builds were consolidated
in PYTEST_ADDOPTS environment variable.
rouge8 added a commit to GoodRx/pytest-mark-no-py3 that referenced this issue May 17, 2019
@1313e
Copy link

1313e commented Oct 22, 2019

There is a very easy way to fix this: Don't write to the same coverage file from multiple cores.
This can be done by putting the following line in the setup.cfg file (or whatever you use as the configuration file):

[coverage:run]
parallel = true

and run your tests with:

mpiexec -n 2 coverage run --rcfile=setup.cfg -m mpi4py -m pytest
coverage combine
coverage report -m

This will create a separate .coverage file for every core that executed the tests, and combine them afterward into a single report.

Note that this also makes it easier to test a package (for example) both in serial and in parallel, and combine all reports into a single one, by doing this:

coverage run --rcfile=setup.cfg -m pytest
mpiexec -n 2 coverage run --rcfile=setup.cfg -m mpi4py -m pytest
coverage combine
coverage report -m

Finally, also note that the console output when running pytest in MPI this way, will probably be messed up a little bit, as both processes try to write to the console simultaneously.
This has no effect at all on the actual testing process itself, but may make it a bit harder to figure out what process printed what.
As far as I know, there is no way to fix this without making pytest somehow MPI-aware.

@JulienPalard
Copy link
Contributor

JulienPalard commented Jun 27, 2020

This can be related to nedbat/coveragepy#883 (comment)

FFY00 added a commit to pypa/build that referenced this issue Aug 5, 2020
This was causing some errors in the integration tests.

pytest-dev/pytest-cov#237

Signed-off-by: Filipe Laíns <lains@archlinux.org>
FFY00 added a commit to pypa/build that referenced this issue Aug 5, 2020
This was causing some errors in the integration tests.

pytest-dev/pytest-cov#237

Signed-off-by: Filipe Laíns <lains@archlinux.org>
FFY00 added a commit to pypa/build that referenced this issue Aug 6, 2020
This was causing some errors in the integration tests.

pytest-dev/pytest-cov#237

Signed-off-by: Filipe Laíns <lains@archlinux.org>
FFY00 added a commit to pypa/build that referenced this issue Aug 6, 2020
This was causing some errors in the integration tests.

pytest-dev/pytest-cov#237

Signed-off-by: Filipe Laíns <lains@archlinux.org>
FFY00 added a commit to pypa/build that referenced this issue Aug 6, 2020
This was causing some errors in the integration tests.

pytest-dev/pytest-cov#237

Signed-off-by: Filipe Laíns <lains@archlinux.org>
FFY00 added a commit to pypa/build that referenced this issue Aug 6, 2020
This was causing some errors in the integration tests.

pytest-dev/pytest-cov#237

Signed-off-by: Filipe Laíns <lains@archlinux.org>
FFY00 added a commit to pypa/build that referenced this issue Aug 6, 2020
This was causing some errors in the integration tests.

pytest-dev/pytest-cov#237

Signed-off-by: Filipe Laíns <lains@archlinux.org>
FFY00 added a commit to pypa/build that referenced this issue Aug 6, 2020
This was causing some errors in the integration tests.

pytest-dev/pytest-cov#237

Signed-off-by: Filipe Laíns <lains@archlinux.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants