Robust exception mechanism in core setters #4219

jngrad · 2021-04-09T13:18:52Z

Problem statement

Setters in the core return an error code to indicate whether the core object was successfully initialized or not. The Cython code is expected to check this error code and react to it if it is non-zero. There are however many places in the Cython code where the returned error code was simply ignored and the range checks from the core setter were copy-pasted to the validate_params() method of the corresponding Cython class. This code duplication is both unnecessary and error-prone: if the range checks diverge, the user cannot be notified by the Python interpreter that a core object wasn't initialized. See #4217 for an example.

Solution

All setters are called from the head node, which means it's safe to throw C++ exceptions. Simply add the except + modifier to prototype declarations in .pxd files, and the C++ exception are converted to Python exceptions. The range checks in core setters can be adapted to throw an exception instead of returning an error code. All range checks in validate_params() methods should be moved to their corresponding core setters; the validate_params() should only do type checking and vector length checking.

The range checks need to be placed before the first line that alters the state of the core object. This way, the system doesn't end up in an undefined state and the user can recover from the error. For example, when adding an actor with incorrect parameters, the error is recovered by deleting the actor from the list of active actors. When updating an active actor with incorrect parameters, no cleanup action is required.

Here are the four main C++ exceptions:

std::domain_error: the value is outside the domain of validity
- use when e.g. a positive value was expected but a negative value was passed (pressure, time step, etc.)
- gets converted to a Python ValueError
std::invalid_argument: the value was rejected
- use when e.g. a keyword was expected but an empty string or a keyword outside the set of allowed keywords was passed (name of the correlation operation, etc.)
- gets converted to a Python ValueError
std::out_of_range: access out of bounds
- use when e.g. accessing an array with a negative index
- gets converted to a Python IndexError
std::runtime_error: something unexpected happened
- use when the three other exceptions aren't suitable
- gets converted to a Python RuntimeError

The text was updated successfully, but these errors were encountered:

Fixes #4230 Description of changes: - create a new ELC struct every time a new ELC actor is set up (#4220) - use standard exception mechanism in the ELC setter (#4219) - ELC data members are updated upon change to the system

Description of changes: * create a new DLC struct every time a new DLC actor is set up (#4220) * use standard exception mechanism in the DLC /DAWAANR/MDDS core setters (#4219) and test it * non-P3M methods no longer crash the system with `std::abort()` (via `errexit()`) when something unexpected happens * reduce code duplication in DLC by disentangling DLC from P3M * DLC now depends on feature `DIPOLES` instead of `DP3M` (aka `DIPOLES + FFTW`) * ICC now depends on feature `ELECTROSTATICS` instead of `P3M` (aka `ELECTROSTATICS + FFTW`)

Partial fix for #2628 and #4219 Description of changes: - simplify CUDA kernels and GPU interface code - remove superfluous GPU-related macros and unused code - around 300 lines of CUDA code were deleted - remove GPU magnetostatics global variables - they are now shared pointers in the Cython interface - rewrite `SystemInterface` framework - document framework - remove unused methods - remove broken past-the-end device memory pointers (access to device memory with the wrong alignment is dangerous) - throw an error when requesting a particle list conversion from AoS to SoA for features that are not compiled in - unit test `EspressoSystemInterface` implementation - improve separation of concerns: - `magnetostatics.pyx` no longer knows about the `dipole` global nor the associated enum values - header files of GPU long-range methods no longer expose implementation details - narrow includes of `cells.hpp`, `thermostat.hpp`, `rotation.hpp` to the strict minimum - the majority of indirect includes were unnecessary, exposed global variables and increased compile time - where necessary, functions were moved to different files and global variables `this_node` and `cell_structure` were forwarded by function argument - bugfixes: - fix device memory leak in GPU Barnes-Hut (can fill up device memory quickly when creating, using and then deleting a Barnes-Hut actor in a loop) - GPU dipolar direct sum and Barnes-Hut used to emit a runtime error message, but `handle_errors()` was missing in the Cython interface

Description of changes: - improve exception handling in MPI code (partial fix for #4219) - several runtime errors used to be queued for far too long and would only surface during integration - runtime errors from non-bonded interactions instantiated with a cutoff too large now immediately raise an exception, from which the user needs to recover by either reducing the cutoff or disabling the interaction - runtime errors from virtual sites relative applied to particles too far away now immediately raise an exception, from which the user needs to recover by increasing the minimum global cutoff - all automatically-generated script interface methods used to call `handle_errors("")` with an empty string; now it's called with the method name to help identify which feature threw the exception - virtual sites fatal errors are now runtime errors - quaternion fatal errors are now runtime errors - invalid particle ids fatal errors are now value errors - fix bugs - virtual sites can no longer relate to themselves - particles can no longer be created with a negative id - LB node property `boundary` no longer returns a random integer when `LB_BOUNDARIES` is not compiled in - improve code coverage - common exceptions are now properly covered by tests (e.g. error messages from the cell system and integrator) - rotation code is now unit tested - h5md exceptions are now tested - fix regressions in the testsuite - checkpoint tests were only running on 1 MPI core - the LB VTK had a wall boundary misplaced - virtual sites relative test cases can now run in any order - fix regression in the OpenGL visualizer - activating the `LB_draw_boundaries` option without any other `LB_draw_*` option no longer throws an exception - rescaling the box size in `lj-demo.py` now properly rescales the particle coordinates (partial fix for #4347)

Fixes #4454 Description of changes: - apply the SOLID principles - split electrostatics from magnetostatics - factor out code duplication - extract P3M/DP3M tuning code - hide ELC details in private class methods (the energy/force correction logic used to be exposed in `coulomb.cpp`) - hide Coulomb implementation details by no longer indirectly including `coulomb.hpp` in 30 unrelated files in the core - write clear pre-conditions as event hooks - these events are triggered upon box size change, periodicity change, node grid change and cell structure change - partial fix for #4428 - setting a `node_grid` incompatible with a long-range actor is no longer possible (the old `node_grid` is restored) - these preconditions always run once before the integration loop starts - write test cases to check for these events - bugfix: charge neutrality checks now run once before the integration loop starts, instead of once at actor activation - encapsulate the state of long-range methods in classes managed by the script interface - bugfix: the cached `DipolarP3M` energy correction is now invalidated when the box length changes - remove global variables for CPU methods (partial fix for #2628) and introduce one global variable for each type of long-range actor: dipolar solver, Coulomb solver, Coulomb extension - convert Cython files for long-range actors to Python files using the `ScriptInterface` framework - long-range actors and their python object now have the same lifetime, regardless of whether the actor is active or not (partial fix for #4220) - the list of active long-range actors can no longer end up in an invalid state (partial fix for #4219): - updating an active actor with an invalid parameter now automatically restores the original parameter - inserting an invalid long-range actor in the list of actors now automatically removes the actor from the list of active actors, even when the exception happens during tuning - API changes: - `ELC` and `DLC` are no longer extensions, instead they take another actor as argument - `P3MGPU` is now able to execute the CPU energy and pressure kernels using parameters tuned for the GPU force kernel - closes #239 - P3M-based reaction scripts run with 2x speed-up when reaction steps take place every 1000 integration steps - `P3MGPU` is now supported by `ELC` - long-range actor parameters are now immutable, since changing one parameter (e.g. the prefactor) would usually invalidate the tuned parameters - the charge neutrality check sensitivity can now be modified via `actor.charge_neutrality_tolerance`; this is mostly relevant to actors coupled to `ICC`, whose induced charges can have value spanning several orders of magnitude - under-the-hood changes: - CPU-based long-range actors no longer rely on the `MpiCallbacks` framework - the subtracted Coulomb bonded interaction and pair criteria no longer indirectly include `coulomb.hpp`

jngrad · 2022-06-23T13:42:05Z

This proposal has been successfully implemented in the 4.2 and 4.3 lines of ESPResSo. Closing.

jngrad added Core ScriptInterface CodeQuality labels Apr 9, 2021

jngrad mentioned this issue Apr 16, 2021

Refactor ELC core setter #4231

Merged

jngrad mentioned this issue Apr 30, 2021

Refactor DLC/DAWAANR/MDDS core setters #4238

Merged

jngrad mentioned this issue Aug 12, 2021

Make ekin immutable and disable ekin checkpointing #4327

Merged

jngrad mentioned this issue Sep 23, 2021

LB setters and getters ignore runtime errors #4354

Closed

jngrad mentioned this issue Dec 10, 2021

CUDA code maintenance #4404

Merged

jngrad mentioned this issue Mar 18, 2022

Improve exception handling and fix various regressions #4479

Merged

jngrad mentioned this issue May 27, 2022

Rewrite long-range actors #4506

Merged

jngrad closed this as completed Jun 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Robust exception mechanism in core setters #4219

Robust exception mechanism in core setters #4219

jngrad commented Apr 9, 2021

jngrad commented Jun 23, 2022

Robust exception mechanism in core setters #4219

Robust exception mechanism in core setters #4219

Comments

jngrad commented Apr 9, 2021

jngrad commented Jun 23, 2022