Proceedings 2019 ESPResSo meetings

Proceedings of the 2019 ESPResSo meetings

2019-12-13

Integrating waLBerla LB

remove espresso LBCPU in the PR integrating waLBerla LBCPU
waLBerla doesn't currently have LBGPU implementation that can be integrated in espresso out-of-the-box
EKCPU can be implemented using stencils
EKGPU in waLBerla is unclear
Lees-Edwards depends on LBCPU only
find out which LB systems are sensitive to single-precision on GPU
quantify speed-up of espresso LBGPU vs waLBerla LBCPU

January coding day

factor out globals
refactor the analysis functions
CIP pool only available on Fridays

Planned projects

Particle collision: have a technical meeting on particle creation (Flo, Rudolf, Ingo, Christoph, Philip)
MMM2D: check for feature in Scafacos
ELC: schedule meeting
GitHub Actions: could replace GitLab-CI when more features become available, defer for a few months

2019-11-26

Minimal required dependency versions for 4.2

Thread: #3093
CMake (#3090)
- We currently officially require 3.4, but that version doesn't actually work in all environments
- FetchContent module is of importance and is available in 3.11
- Newer versions (currently 3.13) can be installed with pip install --user cmake
- Move to 3.10 for now (Ubuntu 18.04)
Boost
- We currently require 1.55
- Move to 1.65 (Ubuntu 18.04)
- 1.65 would let us remove most custom handling for compiler/boost version combos
- Some 32-bit relevant bug fixes came with 1.67
- Boost.qvm which we might use came with 1.62
- Boost has to be built manually due to boost-mpi
- Compile boost manually on Ubuntu 16.04 (for CUDA 9.0 Docker image)
Cuda
- Our cluster currently runs 9.1. That's the only reason to still support that
- 9.1 requires an Ubuntu 16.04 Docker image
Python version
- Python 3.6 on CentOS 7, Python 3.7 on Debian 10
- Python 3.5 on Ubuntu 16.04 (for CUDA 9.0 Docker image)
Cython
- We require 0.23 but have had to work around quite a few issues already
- More current Cython can easily be installed from pip
- Move to 0.26 (Ubuntu 18.04)

ELC and MMM2D

ELC
- tests are currently being written (#3331)
- ELC bugfix for the potential difference will go in 4.1.2
- ELC does not work with non-neutral systems
- Prepare meeting with Christian, Florian, Alex
MMM2D
- MMM2D cannot be used for validation of ELC due to code overlap
- MMM2D interferes with further refactoring of the cell system
- factor MMM2D out, except for the N-squared cellsystem

Release cycle and release checklist

Aims:
- Avoid issues discovered after the release during packaging
  - Testing Fedora builds incl odd architectures on their infrastructure: Copr (#3312)
  - Run these tests manually only for releases
- Shorten preparation time for bugfix releases
  - Limit grammar and spell checks to minor releases, and to release notes for bugfix releases
  - If we go for the Fedora build service Copr, remove QEMU emulated builds on odd architectures

Submodules

keep submodules mechanism for future external contributed features
alternatives: subtrees, manual download

2019-11-05

Collision detection and bond breaking

sticking to the surface of shapes, probability of breaking based on energy, force or distance, but not history
bond creation between particles of specific types, in the future could use virtual sites as reactive sites
currently bond creation and breaking is done via python every few time steps
could use exception mechanism to pause the integration loop and handle bond status in python (using a runtime error queue)
active site becomes inactive after collision, change type upon collision
keep three-particle collision (creates angle bonds)
make sure bond creation does not happen at the same time as bond breaking to avoid reforming the same bond
particle types cannot change during the integration loop
find a volunteer to implement it, Rudolf can help but not full time
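
The exception-based idea noted above (pause the integration loop, handle the bond status in Python, then resume) could look roughly like the following toy sketch. All names (CollisionEvent, integrate, run) are hypothetical and not part of the ESPResSo API, and the "detected" dictionary merely stands in for the runtime error queue that the core would fill.

```python
class CollisionEvent(Exception):
    """Hypothetical exception used to pause the integration loop."""

    def __init__(self, step, pairs):
        super().__init__(f"{len(pairs)} collision(s) at step {step}")
        self.step = step
        self.pairs = pairs


def integrate(start, stop, detected):
    """Toy integration loop: raise as soon as a step has queued collisions."""
    for step in range(start, stop):
        # a real core would propagate the particles here
        if step in detected:
            raise CollisionEvent(step, detected[step])


def run(n_steps, detected):
    """Drive the loop from Python and create bonds whenever it pauses."""
    bonds = set()
    step = 0
    while step < n_steps:
        try:
            integrate(step, n_steps, detected)
            step = n_steps
        except CollisionEvent as event:
            # the loop is paused: bonds (or particle types) can be changed safely
            for pair in event.pairs:
                bonds.add(tuple(sorted(pair)))
            step = event.step + 1  # resume after the step that raised
    return bonds


bonds = run(10, {3: [(2, 1)], 7: [(4, 5), (1, 2)]})
```

Because bonds are only modified between two calls to integrate, nothing changes inside the loop itself, which fits the constraint that particle types cannot change during integration.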

bors tooling

bors often pushes twice to the staging branch, auto-canceling the right pipeline half the time
risk of timeout if not manually checked
randomly pushes to the python branch, triggering a useless CI pipeline
report issue

Ekin and waLBerla

lbmpy: will be open source once the paper is out, in the meantime can ship ES with generated code
still thermalization issues, should be fixed soon
disagreement between ES and waLBerla for MPI node assignment when 4+ threads are used

2019-10-16

4.1.1 bugfix release

release checklist, milestone
thermostats and integrators checkpointing silently broke in 4.1.0 (#3245)
constant pH tutorial: create a PR without unit conversion from #3184 for 4.1.1 and work on a unit conversion with pint for 4.2.0: Jonas
removal of the old RE tutorial (#3211)
fix NpT interface (#3253)
fix broken build system (#3228)

State of large PRs

Matheval (#1644): a few use cases for which it can be better than tabulated interaction, could be used in virtual sites, will increase the maintenance effort of espresso if included, should be included as a library via a git subtree or submodule: Rudolf + maybe JN
Stokesian dynamics (#3241): Michael + a HiWi (maybe Alex, Jan or a new one)
Brownian dynamics (#1842): requires a quick refactor of Velocity Verlet
waLBerla, Lees-Edwards (#2976): issue with thermalization

Code quality and linting

new autopep8 version introduced in 4.1
automatic sorting of import statements
pylint (#3194)
- prevent the introduction of dangerous code (wildcard imports, function overloading, mutable optional value arguments)
- don't include rules for trivial style changes in CI
- developers should comment on the PR which rules should be included in CI
shellcheck (#3242)
- look into replacing bash scripts with Python scripts
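
As an illustration of the "mutable optional value arguments" hazard named in the pylint item above (reported by pylint as dangerous-default-value), here is a minimal sketch. The function names are made up for the example.

```python
def append_bad(value, bucket=[]):
    # the default list is created once, when the function is defined,
    # and is shared by every call that omits the argument
    bucket.append(value)
    return bucket


def append_good(value, bucket=None):
    # a fresh list is created on every call that omits the argument
    if bucket is None:
        bucket = []
    bucket.append(value)
    return bucket


first = append_bad(1)
second = append_bad(2)  # same list object as "first", now holding both values
```

The second call silently sees the state left behind by the first one, which is why this pattern is worth rejecting in CI rather than leaving it to review.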

2019-08-06

Recent major changes to the codebase

"newstyle" classes in Python3 and simplified inheritance syntax (#3026):

class A(object):
    def __init__(self):
        pass

class B(object):
    def __init__(self):
        super(B, self).__init__()

becomes

class A:
    def __init__(self):
        pass

class B:
    def __init__(self):
        super().__init__()

tutorials now in continuous delivery (#3024)
- writing tests: wiki:Testing/tutorials, ex: 04-LB-part-4
- deploy tutorial: wiki:Documentation/tutorials, ex: 04-LB-part-4
- tests (in Python) and deployment (in CMake) share the same syntax
Vector3d-based operations on vectors, force/energy kernels refactor (target: 4.1, #3032, #3039)
- espresso developers shouldn't manually write vector cross products, dot products, Hadamard products or scalar products anymore
- parts of the code already converted to the Vector3d syntax: shapes, constraints, force functions, energy functions, electrostatics, magnetostatics
- if any merge conflict occurs in PRs, @jngrad can help
planned refactoring of bonded IA structures and force/energy kernel signatures (target: 4.1 if possible)
- see project Interaction Kernel Refactoring

HIP review

list of HIP-related issues created after the last meeting: #2973
- list of PRs where HIP was involved: #3005, #2984, #2982, #2937, #2933, #2878
- helped with floating-point precision and incorrect loop unrolling in CUDA code
- keep HIP support as experimental feature

New coding day

last one cleared a lot of tickets
Rudolf will organize the next one

Handle long-standing PRs

MMM2D and ELC (#2725)
- consider closing the PR and only keep the cherry-picked improvements in mmm2d.cpp (python...reinaual:singleCharge2D, maybe conflicts with #3022)
- ELC still requires a lot of maintenance (#2685, #3001, #3003)
- prepare meeting with Alex, Kai, Rudolf, JN
- MMM2D is slow and only used for reference, ELC is used for production. MMM2D currently requires direct access to the cells (because it uses those as layers), it is the only method that has an anisotropic cutoff and the only method that needs the layered cell system. The direct cell access blocks separating the interaction calculation from the cell system implementation; this is not maintainable in the long term. Since MMM2D can also be run as a pure pair interaction using the nsquare cell system (at the cost of worse performance), it can still be used as a reference method without that direct access to the layered cell system.
- The MMM2D dielectric support also has very poor code quality (it was apparently added after the original implementation).
Lees-Edwards/LB CPU incompatibility for 4.1 #2976
- Issue with particle coupling scheme. For LE the ghost shifts have been removed. But LB couples also to ghost particles, which don't get forces from the coupling but contribute to the force density field of the LB. In doing this, it makes implicit assumptions about the ghost particles, which are an implementation detail of the cell system, including assumptions about the ghost shifts; these implicit assumptions are broken by removing the ghost shifts, which breaks the coupling. The same issue will probably be present in walberla.
- Possible solutions:
  - fweik: only couple to the local particles of each node, and then reduce the halo regions of the LB force density across nodes. With this the LB coupling only needs the bounding box for the local particles as input. It can then choose a halo which extends its local grid volume so that it covers this bounding box, and otherwise work independently of the cell system. Code for this already exists in Espresso, because this is exactly how the charge density for P3M is collected. This code could be reused. I think this needs to be properly addressed before the ghost shifts can be removed. More generally speaking, the current implementation tries to be clever and avoid one communication by introducing additional coupling between otherwise independent components of the code, which hampers the extensibility of the code.
  - Rudolf: remove LB CPU after 4.1, then add walberla and LE, then fix the particle coupling
- walberla has thermalization mostly fixed
Brownian Dynamics #1842
- delay this PR
- fweik: don't add anything to the integration/propagation before it has been refactored to a state where one can reason about it.
ENGINE on CPU LB
- test relies on hardcoded data, which is different for LB GPU/CPU
- forces need extra tests
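
The halo-reduction proposal for the LB particle coupling discussed above can be illustrated with a one-dimensional toy model. This is only a sketch under stated assumptions, not ESPResSo code: the grid is periodic, positions lie in [0, n_cells), n_cells is divisible by n_domains, forces are spread with linear two-point weights, and all function names are hypothetical. Each domain couples only to the particles it owns, writes into a local grid with one halo cell per side, and the halo cells are afterwards added to the neighbor cells they mirror.

```python
import math


def spread_global(particles, n_cells):
    """Reference: spread all forces onto a single periodic grid."""
    grid = [0.0] * n_cells
    for x, f in particles:
        left = math.floor(x)
        w = x - left
        grid[left % n_cells] += f * (1.0 - w)
        grid[(left + 1) % n_cells] += f * w
    return grid


def spread_with_halo_reduction(particles, n_cells, n_domains):
    """Spread forces per domain, then reduce the halo cells across domains."""
    m = n_cells // n_domains  # cells owned by each domain
    # local grids: index 0 and m + 1 are halo cells, 1..m are owned cells
    local = [[0.0] * (m + 2) for _ in range(n_domains)]
    for x, f in particles:
        left = math.floor(x)
        d = left // m  # domain owning the particle
        i = left - d * m + 1  # local index of the left grid cell
        w = x - left
        local[d][i] += f * (1.0 - w)
        local[d][i + 1] += f * w  # may land in the right halo cell
    # halo reduction: add every halo cell to the owned cell it mirrors
    for d in range(n_domains):
        right_nb = (d + 1) % n_domains
        left_nb = (d - 1) % n_domains
        local[right_nb][1] += local[d][m + 1]
        local[left_nb][m] += local[d][0]
    # assemble the owned cells of all domains into one grid
    return [value for grid in local for value in grid[1:m + 1]]


particles = [(0.25, 1.0), (3.5, 2.0), (7.75, -1.0)]  # (position, force)
reference = spread_global(particles, 8)
reduced = spread_with_halo_reduction(particles, 8, 2)
```

In this model the coupling never looks at ghost particles: the only cross-domain step is the halo reduction, so the two results agree without any knowledge of how the cell system shifts its ghosts.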

Reminder for summer school responsibilities

check tutorial correctness 3 weeks before the school starts

2019-06-25

Feature reports

State of Lees Edwards integration

add feature to 4.1 release
no LB support for now, maybe consider adding it after waLBerla integration

State of waLBerla integration

add feature to 4.1 release
issue with ghost communication of the velocity field for more than 1 node
issue with EK boundaries

State of NPT

volume changes cause interactions with other features
write tests for more complex systems
check anisotropic boxes with fixed dimensions
check terminology in the docs
discuss in #2939

Date of feature-freeze of 4.1

#2894, update Milestone 4.1 and Project 4.1
release candidate or beta release 1 month before summer school (Oct 7-11 2019)

Removing AMD GPU support

fix in #2937
memory management on the GPU is the main issue
HIP status:
- three compilers: hcc, HIP and a new Clang feature
- HIP support made ES more CUDA-compliant during HIP integration
- HIP support of CUDA code lags behind CUDA releases
- HIP adds a new layer of complexity
track logfiles of failed HIP jobs in a dedicated issue
postpone removal for 2 ES meetings

Simplifying the CI infrastructure

kaniko
- docker-in-docker is not secure, anyone opening a PR can run malicious code as root
- kaniko is simpler than docker-in-docker, secure, runs everywhere, but caching is broken
staging branch for docker CI? already done by the deploy stage
intel compiler
- requires a license server
- espressomd/espresso: only activate it on release branches
QEMU
- created for the Fedora packages
- adds a layer of complexity, triggers too many random failures
- for the slow emulated containers:
  - espressomd/espresso: only activate them on release branches
  - espressomd/docker: build becomes manual
  - @RudolfWeeber: I understood the outcome of the discussion as follows: both Intel and emulated containers are only built manually, and the CI jobs in Espressomd are triggered manually before the release is made.

2019-05-07

Steps to end Python2 support

convert remaining Python2 containers to Python3 -> JN for Linux, Michael for MacOS
update install files
setting minimum version in CMake
check requirements
communication to users
updates:
- autopep from Ubuntu 18
- jupyter available in Ubuntu 18?

Documentation of the CI system

GitLab, Runners, connection between GitHub and GitLab

Brief info

Compiling Espresso on bee

CUDA update on bee
will take a few days to install espresso again

Progress on CI reliability

some technical issues now resolved (timing of OS upgrade and nightly build, configuration issues in CI with GPU-tagged jobs, etc.)
timeouts still happen
BW cloud works out-of-the-box

Documentation checks in CI

assertion for Doxygen version in Doxygen warnings parser

CUDA warnings

enabled in 9.5, treat them as errors

Espresso summer school

2019: reach larger audience
2020: consider extending the program with all-atom simulation, WaLBerla? CECAM deadline: July 16th 2019 -> JN + Rudolf

2019-04-29

Rudolf: failure rate too high to rely on
Michael: failure rate of 5% of jobs is usual in many open-source projects, equals one failed job in each of our builds
Rudolf: too many Mac runners, too few Linux runners. Michael: slots are used interchangeably, Gitlab statistics do not reflect that. Florian: sometimes only 6 Linux jobs running. Michael: let me know next time it happens, may be config issue.
Frank: 8 builders with 4 cores, 1 with 6 cores => 19 slots available
Florian: slowest job is sanitizer
Rudolf: build takes about half of the total job time
categories of issues:
- timeouts: Rudolf seems to have fixed this by preventing GPU oversubscription
- Gitlab/Docker bugs - we can't fix, but don't happen that often (#2742)
- nightly builds vs. automatic updates - fixed already by moving the nightly builds to midnight while the automatic updates are at 05:00
- Gitlab updates and registry maintenance: Michael can move that to a better time at night so it doesn't collide with the nightly build
- s390x emulation: library linkage issue, can hopefully be fixed by Michael (#2766)
- Jean-Noel: some tests get stuck and time out. Michael: attach a debugger next time and check where it is stuck.
- dead build machines, currently restarted manually by Frank, maybe he can set up monitoring
hardware:
- can't add more tower PCs because we don't have enough space
- some money is available, but also needed for new storage system
  - buy AMD desktops instead of servers, they are relatively cheap per core
- cloud?
  - expensive too
  - Amazon Spot instances $0.003 per core hour, perfect for burst usage
  - GPUs disproportionately expensive
  - is it worth it if we still need to run some on-premises runners?
- buy two more AMD Vega 56, one for debugging on a desktop and one for a runner
drop Python 2 on the master branch, three fewer CI jobs
- append "-python3" to all container names so they don't conflict with the containers used by the 4.0 branch
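
The figures quoted at the start of this meeting can be checked with a short calculation. The number of jobs per build (20) is an assumption inferred from "5% of jobs" being equated with "one failed job in each build", and treating job failures as independent is a further assumption. Likewise, two cores per slot is inferred from 38 cores yielding 19 slots; the minutes do not state it.

```python
def expected_failures(n_jobs, p_fail):
    """Mean number of failed jobs in one build."""
    return n_jobs * p_fail


def p_build_fails(n_jobs, p_fail):
    """Probability that at least one job fails, assuming independent jobs."""
    return 1.0 - (1.0 - p_fail) ** n_jobs


n_jobs = 20    # assumed number of jobs per build
p_fail = 0.05  # per-job failure rate quoted in the minutes

mean_failed = expected_failures(n_jobs, p_fail)
p_red_build = p_build_fails(n_jobs, p_fail)

cores = 8 * 4 + 1 * 6  # 8 builders with 4 cores, 1 with 6 cores
slots = cores // 2     # assumed 2 cores per slot
```

Under these assumptions a build has one failed job on average, but roughly two out of three builds contain at least one failure, which is consistent with the complaint that the failure rate is too high to rely on.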

2019-04-16

Branch protection and bors tooling

bors bot:
- merge queue: combine multiple PRs
- checks the merge on the staging branch before merging on python
- bors always gets the PR up-to-date before merging
- maintainer: trigger the merge by posting the comment bors r+
branch protection: can't merge until CI passes
PR must be approved by 1 person
consider applying formatting automatically during the merge
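
The merge-queue behaviour listed above (combine several approved PRs, test the combination on staging, merge only if CI passes) can be modelled with a small sketch. This is a simplified toy, not bors code: the function names are hypothetical, and the strategy of splitting a failing batch in half to isolate the offending PR is an assumption of this sketch.

```python
def merge_queue(queue, ci_passes):
    """Return (merged, rejected) for a list of approved PRs.

    ci_passes(staging) is a callable that tells whether CI succeeds for
    the given list of PRs combined on the staging branch.
    """
    merged, rejected = [], []
    batches = [list(queue)]
    while batches:
        batch = batches.pop(0)
        if not batch:
            continue
        staging = merged + batch  # target branch state plus the batch
        if ci_passes(staging):
            merged.extend(batch)  # fast-forward the target branch
        elif len(batch) == 1:
            rejected.extend(batch)  # a single PR that fails on its own
        else:
            mid = len(batch) // 2
            # retry both halves, keeping the original order
            batches[:0] = [batch[:mid], batch[mid:]]
    return merged, rejected


def ci(staging):
    # toy CI: PR 3 breaks the build
    return 3 not in staging


merged, rejected = merge_queue([1, 2, 3, 4], ci)
```

Every PR is tested against the state the target branch will actually have after the merge, which is what keeps the target branch from failing CI right after a merge.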

LB/Ekin/Walberla strategy/time line

GPU LB boundaries: fixing the code is time consuming
keep GPU LB in 4.1 release
WaLBerla: thermalization is still missing
Lees-Edwards progress

ELC dielectric contrast state and bad test coverage

Konrad is looking into it
- #2685, #2723

2019-03-26

LB code

LB boundary force
- results very sensitive to input parameters #2624
- disable that feature in 4.0.2
- check LB tutorial
LB stress #2054
3pt coupling disabled for now

Features documentation

document the implementation of commonly used features directly in Doxygen:
- observables
- shapes
- interactions
- parallel algorithms accessing particles
document in Sphinx where to find these classes in the core

CI robustness, monitoring

multiple sources of failures:
- frequent timeouts
- random errors (failed code coverage files upload, etc.)
- can be hardware-dependent
- down runners
- hard to reproduce some issues
form a group to improve CI reliability (Frank, Jean-Noël, Rudolf, Kai)

Introduction of a staging branch

master branch is often failing CI after a merge
master has more thorough tests
plans:
- notify when master fails
- use a staging branch to merge PRs, then merge into master if thorough CI passes
- Jean-Noël should look into it
  - configure GitHub accordingly
  - branch protection on master
  - look into GitLab Enterprise to mirror the GitHub repository

Open and recently merged pull requests

4.0.2rc #2585
MPI PR waiting for review #2593
GPU LB checkpoint now works #2511
philox RNG on thermalized bonds, dpd, etc.
struct Particle refactor #2296: wait for ghost and communication refactors (#2400, #2394, #2478)
Coulomb and dipole refactoring waiting for feedback #2512
LB stress

Espresso 4.1 roadmap

waLBerla progress
thermalization
don't release before summer school

Tutorials

consider reproducing literature results
- electrostatics tutorial: currently, only salt crystal
- find relevant literature results to implement: Rudolf + Christian
review and merge tutorials CI #2452

Next coding day

next week (see Doodle)

2019-03-05

Coding day wrap-up

back-communication of ghost forces: still need to do thermalized bonds and DPD
angle forces: re-derive formula
polymer placement code: in progress
soft-sphere: done
LB checkpoint: #2555
waLBerla integration: in progress, todo list
containers migration to Python3: not done yet

Espresso plans

Espresso 4.0.2 bugfix release:
- time-step change causes velocity change
- wrong sign in tabulated potentials
- milestone
mailing list: communicate on platforms removed from testing
consider binary releases (Flatpak)
consider moving the developer's guide from Sphinx to the GitHub wiki (Development)

Doxygen coverage

coverxygen:

https://github.com/psycofdj/coverxygen
extended version available at /work/jgrad/coverxygen
- the plain text summary now reports:
  - tallies for function parameters and template parameters
  - coverage diff between two commits
- the website now reports:
  - a new column for undocumented functions (as a subset of the undocumented lines column)
  - lists of undocumented functions in dedicated pages
  - a detailed explanation of what is missing (when hovering the mouse over a red line, a tooltip appears):
    - missing @file block
    - missing Doxygen block for variables and enum/union/struct/class members
    - for functions, provides a list of missing @param and @tparam blocks
  - an undocumented enum value no longer marks the entire enum as undocumented

What's new with C++14

https://github.com/AnthonyCalandra/modern-cpp-features/blob/master/CPP14.md
- Generic lambda expressions
- Lambda capture initializers
- Return type deduction

2019-02-12

CI

notify when the master branch doesn't pass CI
consider using a staging branch in the future

deRSE19 conference

link (poster, short talk: deadline Feb 28th)
poster (JN):
- focus more on project structure & community than on applications
- history: from Tcl to Python, from LB CPU/GPU to Walberla
attendees: JN, Rudolf

ES meetings

communicate next ES meeting date on the users mailing lists (Rudolf)

2019-01-22

i386 issues and 4.0.1

32-bit specific issue with floating point arithmetic during particle resorting, fixed in #2454, porting in #2456

Dropping support for older distros in CI

C++14 is required for new parts of the core
use up-to-date gcc version
remove:
- Ubuntu 14 (LTS ends in April 2019, extended security maintenance after that)
- CentOS 7 (unless an up-to-date compiler is available)
- Intel 15 (maybe)
check which Boost version is installed
make tutorial for running ES in a Docker container (but needs root), or use FlatPak

When/how to end Python 2 support

Python2 won't get new updates after Jan 2020
Python2 won't be part of standard distributions by default (Ubuntu 20)
Need to communicate that change to the user base
Invert CI:
- use Python3 on all distributions
- use only one container with Python2 (maxset)
Start using Jupyter instead of IPython
Start using Python3 syntax in 4.1 development

Removal of features without a Python interface

Features not used and not documented, need to be removed:
- GHMC (interacts negatively with the core)
- NEMD
- MEMD
Some may be implemented in Scafacos

Progress on current projects

LB

CPU improvements ready for merge, but 30% slow-down on 1 node, need to measure if this slow-down is significant vs. communication slow-down in multi-node simulations
GPU improvements are in progress

Local particles

list of particles, indexed by id:
previously implemented as an array, needed to be re-updated every time a particle moved from cell to cell
now implemented as an unordered map, but slower
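
A minimal sketch of an id-based lookup, using a Python dict as a stand-in for the unordered map. The class and its layout are hypothetical and do not mirror the core's data structures; the sketch only shows which entries have to be updated when a particle moves between cells, and that sparse ids need no array sized by the largest id. It does not reproduce the performance trade-off mentioned above.

```python
class ParticleIndex:
    """Toy lookup from particle id to its (cell, slot) storage location."""

    def __init__(self, n_cells):
        self.cells = [[] for _ in range(n_cells)]  # each cell stores particle ids
        self.location = {}  # particle id -> (cell, slot)

    def add(self, pid, cell):
        self.cells[cell].append(pid)
        self.location[pid] = (cell, len(self.cells[cell]) - 1)

    def move(self, pid, new_cell):
        cell, slot = self.location[pid]
        last = self.cells[cell].pop()
        if last != pid:
            # fill the hole with the last particle of the old cell
            self.cells[cell][slot] = last
            self.location[last] = (cell, slot)
        self.add(pid, new_cell)


index = ParticleIndex(n_cells=2)
for pid in (10, 500, 7):  # ids do not have to be contiguous
    index.add(pid, cell=0)
index.move(10, new_cell=1)
```

Only the moved particle and the one that fills its old slot need a new entry; all other lookups stay valid.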

Ghost particles

#2394
simplify reallocation and communication
improve CPU/GPU work balance

Tutorial testing

#2452
add tests for numerical results of tutorials
create a new CI container testing just the tutorials

Simplify CMake

#2403

New Espresso HiWi

removing code duplicates

Planning next coding day

do maintenance on Espresso
poll to decide on the date
