
Proceedings 2019 ESPResSo meetings

Jean-Noël Grad edited this page Jul 20, 2021 · 1 revision

Proceedings of the 2019 ESPResSo meetings

2019-12-13

Integrating waLBerla LB

  • remove espresso LBCPU in the PR integrating waLBerla LBCPU
  • waLBerla doesn't currently have an LBGPU implementation that can be integrated in espresso out-of-the-box
  • EKCPU can be implemented using stencils
  • EKGPU in waLBerla is unclear
  • Lees-Edwards depends on LBCPU only
  • find out which LB systems are sensitive to single-precision on GPU
  • quantify speed-up of espresso LBGPU vs waLBerla LBCPU
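
The single-precision question above comes down to how small a change survives rounding. A minimal sketch using only the standard library; `to_float32` is a helper introduced here for illustration, and the magnitude of the fluctuation is a made-up value, not a number from an actual LB run.

```python
import struct


def to_float32(value):
    """Round a double-precision Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


base = 1.0            # a quantity of order one
fluctuation = 1e-8    # a small change on top of it (hypothetical magnitude)

print(base + fluctuation == base)               # double precision keeps the change
print(to_float32(base + fluctuation) == base)   # single precision rounds it away
```

Systems whose physics lives in such small differences would be the first candidates to check against a double-precision reference.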

January coding day

  • factor out globals
  • refactor the analysis functions
  • CIP pool only available on Fridays

Planned projects

  • Particle collision: have a technical meeting on particle creation (Flo, Rudolf, Ingo, Christoph, Philip)
  • MMM2D: check for feature in Scafacos
  • ELC: schedule meeting
  • GitHub Actions: could replace GitLab-CI when more features become available, defer for a few months

2019-11-26

Minimal required dependency versions for 4.2

  • Thread: #3093
  • CMake (#3090)
    • We currently officially require 3.4, but it doesn't actually work in all environments
    • FetchContent module is of importance and is available in 3.11
    • Newer versions (currently 3.13) can be installed with pip install --user cmake
    • Move to 3.10 for now (Ubuntu 18.04)
  • Boost
    • We currently require 1.55
    • Move to 1.65 (Ubuntu 18.04)
    • 1.65 would let us remove most custom handling for compiler/boost version combos
    • Some 32-bit relevant bug fixes came with 1.67
    • Boost.qvm which we might use came with 1.62
    • Boost has to be built manually due to boost-mpi
    • Compile boost manually on Ubuntu 16.04 (for CUDA 9.0 Docker image)
  • CUDA
    • Our cluster currently runs 9.1, which is the only reason to still support it
    • 9.1 requires an Ubuntu 16.04 Docker image
  • Python version
    • Python 3.6 on CentOS 7, Python 3.7 on Debian 10
    • Python 3.5 on Ubuntu 16.04 (for CUDA 9.0 Docker image)
  • Cython
    • We require 0.23 but have had to work around quite a few issues already
    • More current Cython can easily be installed from pip
    • Move to 0.26 (Ubuntu 18.04)
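
The minimum versions agreed above could be checked by comparing version tuples. This is a minimal sketch: the `MINIMUM` table copies the "move to" values from the notes, while the function names and the installed versions in the usage line are hypothetical.

```python
# Minimum versions taken from the meeting notes above (targets for 4.2).
MINIMUM = {"cmake": "3.10", "boost": "1.65", "cython": "0.26"}


def parse_version(text):
    """Turn '3.10.2' into (3, 10, 2) so that comparison is numeric, not lexical."""
    return tuple(int(part) for part in text.split("."))


def outdated(installed):
    """Return the sorted names of dependencies older than the required minimum."""
    return sorted(
        name
        for name, version in installed.items()
        if name in MINIMUM and parse_version(version) < parse_version(MINIMUM[name])
    )


# Hypothetical environment that still has the old minima for CMake and Boost.
print(outdated({"cmake": "3.4", "boost": "1.55", "cython": "0.26"}))
```

Parsing into integer tuples matters here: compared as plain strings, "3.4" would rank above "3.10".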

ELC and MMM2D

  • ELC
    • tests are currently being written (#3331)
    • ELC bugfix for the potential difference will go in 4.1.2
    • ELC does not work with non-neutral systems
    • Prepare meeting with Christian, Florian, Alex
  • MMM2D
    • MMM2D cannot be used for validation of ELC due to code overlap
    • MMM2D interferes with further refactoring of the cell system
    • factor MMM2D out, except for the N-squared cellsystem

Release cycle and release checklist

  • Aims:
    • Avoid issues discovered after the release during packaging
      • Testing Fedora builds incl odd architectures on their infrastructure: Copr (#3312)
      • Run these tests manually only for releases
    • Shorten preparation time for bugfix releases
      • Limit grammar and spell checks to minor releases, and to release notes for bugfix releases
      • If we go for the Fedora build service Copr, remove QEMU emulated builds on odd architectures

Submodules

  • keep submodules mechanism for future external contributed features
  • alternatives: subtrees, manual download

2019-11-05

Collision detection and bond breaking

  • sticking to surface of shapes, probability of breaking based on energy, force or distance, but not history
  • bond creation between particles of specific types, in the future could use virtual sites as reactive sites
  • currently bond creation and breaking is done via python every few time steps
  • could use exception mechanism to pause the integration loop and handle bond status in python (using a runtime error queue)
  • active site becomes inactive after collision, change type upon collision
  • keep three-particle collision (creates angle bonds)
  • make sure bond creation does not happen at the same time as bond breaking to avoid reforming the same bond
  • particle types cannot change during the integration loop
  • find a volunteer to implement it, Rudolf can help but not full time
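
The idea of pausing the integration loop through an exception and handling bonds at the script level could look roughly like the sketch below. Everything in it is hypothetical: `CollisionEvent`, `integrate` and `run` are illustration names, and the `collisions` dictionary merely stands in for the runtime error queue mentioned above.

```python
class CollisionEvent(Exception):
    """Hypothetical exception carrying the queued particle pairs."""

    def __init__(self, step, pairs):
        super().__init__(f"{len(pairs)} collision(s) at step {step}")
        self.step = step
        self.pairs = pairs


def integrate(start, stop, collisions):
    """Toy integration loop that pauses by raising when a step has collisions.

    `collisions` maps a step index to the particle pairs colliding at that step.
    """
    for step in range(start, stop):
        # ... propagate particles here ...
        pairs = collisions.pop(step, None)
        if pairs:
            raise CollisionEvent(step, pairs)
    return stop


def run(n_steps, collisions):
    """Drive the integrator from the script level and create bonds on each pause."""
    bonds = set()
    step = 0
    while step < n_steps:
        try:
            step = integrate(step, n_steps, collisions)
        except CollisionEvent as event:
            for pair in event.pairs:
                bonds.add(tuple(sorted(pair)))  # bond handling happens in Python
            step = event.step + 1  # resume after the step that paused
    return bonds


print(sorted(run(10, {3: [(1, 2)], 7: [(2, 5), (4, 0)]})))
```

Because bonds are only modified while the loop is paused, creation and breaking can be ordered explicitly in the handler, which is one way to avoid reforming a bond that was just broken.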

bors tooling

  • bors often pushes twice to the staging branch, auto-canceling the right pipeline half the time
  • risk of timeout if not manually checked
  • randomly pushes to the python branch, triggering a useless CI pipeline
  • report issue

Ekin and waLBerla

  • lbmpy: will be open source once the paper is out, in the meantime can ship ES with generated code
  • still thermalization issues, should be fixed soon
  • disagreement between ES and waLBerla for MPI node assignment when 4+ threads are used

2019-10-16

4.1.1 bugfix release

  • release checklist, milestone
  • thermostats and integrators checkpointing silently broke in 4.1.0 (#3245)
  • constant pH tutorial: create a PR without unit conversion from #3184 for 4.1.1 and work on a unit conversion with pint for 4.2.0: Jonas
  • removal of the old RE tutorial (#3211)
  • fix NpT interface (#3253)
  • fix broken build system (#3228)

State of large PRs

  • Matheval (#1644): a few use cases for which it can be better than tabulated interaction, could be used in virtual sites, will increase the maintenance effort of espresso if included, should be included as a library via a git subtree or submodule: Rudolf + maybe JN
  • Stokesian dynamics (#3241): Michael + a HiWi (maybe Alex, Jan or a new one)
  • Brownian dynamics (#1842): requires a quick refactor of Velocity Verlet
  • waLBerla, Lees-Edwards (#2976): issue with thermalization

Code quality and linting

  • new autopep8 version introduced in 4.1
  • automatic sorting of import statements
  • pylint (#3194)
    • prevent the introduction of dangerous code (wildcard imports, function overloading, mutable optional value arguments)
    • don't include rules for trivial style changes in CI
    • developers should comment on the PR which rules should be included in CI
  • shellcheck (#3242)
    • look for replacing bash scripts by Python scripts
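
Of the dangerous patterns listed under pylint above, the mutable default argument is the easiest to overlook (pylint reports it as `dangerous-default-value`). A short illustration with made-up function names:

```python
def append_bad(item, bucket=[]):
    """Pitfall: the default list is created once and shared between calls."""
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    """Fix: use None as the sentinel and create a fresh list per call."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


print(append_bad(1), append_bad(2))    # both names refer to the same shared list
print(append_good(1), append_good(2))  # two independent lists
```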

2019-08-06

Recent major changes to the codebase

  • "newstyle" classes in Python3 and simplified inheritance syntax (#3026):
class A(object):
    def __init__(self):
        pass

class B(object):
    def __init__(self):
        super(B, self).__init__()

becomes

class A:
    def __init__(self):
        pass

class B:
    def __init__(self):
        super().__init__()
  • tutorials now in continuous delivery (#3024)
  • Vector3d-based operations on vectors, force/energy kernels refactor (target: 4.1, #3032, #3039)
    • espresso developers shouldn't manually write vector cross products, dot products, Hadamard products or scalar products anymore
    • parts of the code already converted to the Vector3d syntax: shapes, constraints, force functions, energy functions, electrostatics, magnetostatics
    • if any merge conflict occurs in PRs, @jngrad can help
  • planned refactoring of bonded IA structures and force/energy kernel signatures (target: 4.1 if possible)
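
To illustrate why named vector operations are preferable to hand-written component arithmetic, here is a small Python stand-in. It is not the ESPResSo API: the core type discussed above is the C++ Vector3d, and the class `Vec3` with its method names is invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec3:
    """Hypothetical 3D vector with the products named in the notes."""

    x: float
    y: float
    z: float

    def dot(self, other):
        """Dot product: sum of the component-wise products."""
        return self.x * other.x + self.y * other.y + self.z * other.z

    def hadamard(self, other):
        """Hadamard product: component-wise multiplication."""
        return Vec3(self.x * other.x, self.y * other.y, self.z * other.z)

    def cross(self, other):
        """Cross product, written once here instead of in every kernel."""
        return Vec3(
            self.y * other.z - self.z * other.y,
            self.z * other.x - self.x * other.z,
            self.x * other.y - self.y * other.x,
        )

    def scaled(self, factor):
        """Product with a scalar."""
        return Vec3(self.x * factor, self.y * factor, self.z * factor)


ex, ey = Vec3(1, 0, 0), Vec3(0, 1, 0)
print(ex.cross(ey), ex.dot(ey))
```

A sign error in a cross product then has exactly one place to hide, rather than one per force or energy kernel.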

HIP review

  • list of HIP-related issues created after the last meeting: #2973
    • list of PRs where HIP was involved: #3005, #2984, #2982, #2937, #2933, #2878
    • helped with floating-point precision and incorrect loop unrolling in CUDA code
    • keep HIP support as experimental feature

New coding day

  • last one cleared a lot of tickets
  • Rudolf will organize the next one

Handle long-standing PRs

  • MMM2D and ELC (#2725)

    • consider closing the PR and only keep the cherry-picked improvements in mmm2d.cpp (python...reinaual:singleCharge2D, maybe conflicts with #3022)
    • ELC still requires a lot of maintenance (#2685, #3001, #3003)
    • prepare meeting with Alex, Kai, Rudolf, JN
    • MMM2D is slow and only used for reference, ELC is used for production. MMM2D currently requires direct access to the cells (because it uses those as layers); it is the only method that has an anisotropic cutoff and the only method that needs the layered cell system. The direct cell access blocks separating the interaction calculation from the cell system implementation; this is not maintainable in the long term. Since MMM2D can also be run as a pure pair interaction using the nsquare cell system (at the cost of worse performance), it can still be used as a reference method without that direct access to the layered cell system.
    • The MMM2D dielectric support also has very poor code quality (it was apparently added after the original implementation).
  • Lees-Edwards/LB CPU incompatibility for 4.1 #2976

    • Issue with particle coupling scheme. For LE the ghost shifts have been removed. But LB also couples to ghost particles, which don't get forces from the coupling but contribute to the force density field of the LB. In doing this, it makes implicit assumptions about the ghost particles, which are an implementation detail of the cell system, including assumptions about the ghost shifts; these implicit assumptions are broken by removing the ghost shifts, which breaks the coupling. The same issue will probably be present in waLBerla.
    • Possible solutions:
      • fweik: only couple to the local particles of each node, and then reduce the halo regions of the LB force density across nodes. With this, the LB coupling only needs the bounding box of the local particles as input. It can then choose a halo which extends its local grid volume so that it covers this bounding box, and otherwise work independently of the cell system. Code for this already exists in Espresso, because this is exactly how the charge density for P3M is collected. This code could be reused. I think this needs to be properly addressed before the ghost shifts can be removed. More generally speaking, the current implementation tries to be clever and avoid one communication by introducing additional coupling between otherwise independent components of the code, which hampers the extensibility of the code.
      • Rudolf: remove LB CPU after 4.1, then add waLBerla and LE, then fix the particle coupling
    • waLBerla has thermalization mostly fixed
  • Brownian Dynamics #1842

    • delay this PR
    • fweik: don't add anything to the integration/propagation before it has been refactored to a state where one can reason about it.
  • ENGINE on CPU LB

    • test relies on hardcoded data, which is different for LB GPU/CPU
    • forces need extra tests
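
The halo-based coupling proposed by fweik in the Lees-Edwards item above takes only the bounding box of the local particles as input. A minimal sketch of that step, under simplifying assumptions: one halo width for all sides, a cubic grid with spacing `agrid`, and no extra layers for the interpolation stencil. The function names are hypothetical and do not refer to existing ESPResSo code.

```python
import math


def bounding_box(positions):
    """Axis-aligned bounding box (lower corner, upper corner) of 3D positions."""
    lower = tuple(min(p[d] for p in positions) for d in range(3))
    upper = tuple(max(p[d] for p in positions) for d in range(3))
    return lower, upper


def halo_layers(positions, domain_lower, domain_upper, agrid):
    """Number of halo layers so that the extended local grid covers all particles.

    Uses the largest overhang of the particle bounding box beyond the local
    domain, rounded up to whole grid cells.
    """
    lower, upper = bounding_box(positions)
    overhang = 0.0
    for d in range(3):
        overhang = max(overhang, domain_lower[d] - lower[d], upper[d] - domain_upper[d])
    return math.ceil(overhang / agrid)


# Hypothetical local domain [0, 4)^3 with one particle 0.6 outside the lower x face.
particles = [(0.5, 1.0, 1.0), (4.2, 2.0, 3.0), (-0.6, 3.0, 3.9)]
print(halo_layers(particles, (0.0, 0.0, 0.0), (4.0, 4.0, 4.0), agrid=1.0))
```

The force density accumulated in those halo cells would then be reduced across nodes, in the same way the notes describe for the P3M charge density.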

Reminder for summer school responsibilities

  • check tutorial correctness 3 weeks before the school starts

2019-06-25

Feature reports

State of Lees Edwards integration

  • add feature to 4.1 release
  • no LB support for now, maybe consider adding it after waLBerla integration

State of waLBerla integration

  • add feature to 4.1 release
  • issue with ghost communication of the velocity field for more than 1 node
  • issue with EK boundaries

State of NPT

  • volume changes cause interactions with other features
  • write tests for more complex systems
  • check anisotropic boxes with fixed dimensions
  • check terminology in the docs
  • discuss in #2939

Date of feature-freeze of 4.1

Removing AMD GPU support

  • fix in #2937
  • memory management on the GPU is the main issue
  • HIP status:
    • three compilers: hcc, HIP and a new Clang feature
    • HIP support made ES more CUDA-compliant during HIP integration
    • HIP support of CUDA code lags behind CUDA releases
    • HIP adds a new layer of complexity
  • track logfiles of failed HIP jobs in a dedicated issue
  • postpone removal for 2 ES meetings

Simplifying the CI infrastructure

  • kaniko
    • docker-in-docker is not secure, anyone opening a PR can run malicious code as root
    • kaniko is simpler than docker-in-docker, secure, runs everywhere, but caching is broken
  • staging branch for docker CI? already done by the deploy stage
  • intel compiler
    • requires a license server
    • espressomd/espresso: only activate it on release branches
  • QEMU
    • created for the Fedora packages
    • adds a layer of complexity, triggers too many random failures
    • for the slow emulated containers:
      • espressomd/espresso: only activate them on release branches
      • espressomd/docker: build becomes manual
      • @RudolfWeeber: I understood the outcome of the discussion as follows: both Intel and emulated containers are only built manually, and the CI jobs in espressomd are triggered manually before the release is made.

2019-05-07

Steps to end Python2 support

  • convert remaining Python2 containers to Python3 -> JN for Linux, Michael for MacOS
  • update install files
  • setting minimum version in CMake
  • check requirements
  • communication to users
  • updates:
    • autopep from Ubuntu 18
    • jupyter available in Ubuntu 18?

Documentation of the CI system

  • GitLab, Runners, connection between GitHub and GitLab

Brief info

Compiling Espresso on bee

  • CUDA update on bee
  • will take a few days to install espresso again

Progress on CI reliability

  • some technical issues now resolved (timing of OS upgrade and nightly build, configuration issues in CI with GPU-tagged jobs, etc.)
  • timeouts still happen
  • BW cloud works out-of-the-box

Documentation checks in CI

  • assertion for Doxygen version in Doxygen warnings parser

CUDA warnings

  • enabled in 9.5, treat them as errors

Espresso summer school

  • 2019: reach larger audience
  • 2020: consider extending the program with all-atom simulation, waLBerla? CECAM deadline: July 16th 2019 -> JN + Rudolf

2019-04-29

  • Rudolf: failure rate too high to rely on
  • Michael: failure rate of 5% of jobs is usual in many open-source projects, equals one failed job in each of our builds
  • Rudolf: too many Mac runners, too few Linux runners. Michael: slots are used interchangeably, GitLab statistics do not reflect that. Florian: sometimes only 6 Linux jobs running. Michael: let me know next time it happens, may be config issue.
  • Frank: 8 builders with 4 cores, 1 with 6 cores => 19 slots available
  • Florian: slowest job is sanitizer
  • Rudolf: build takes about half of the total job time
  • categories of issues:
    • timeouts: Rudolf seems to have fixed this by preventing GPU oversubscription
    • GitLab/Docker bugs: we can't fix them, but they don't happen that often (#2742)
    • nightly builds vs. automatic updates - fixed already by moving the nightly builds to midnight while the automatic updates are at 05:00
    • GitLab updates and registry maintenance: Michael can move that to a better time at night so it doesn't collide with the nightly build
    • s390x emulation: library linkage issue, can hopefully be fixed by Michael (#2766)
    • Jean-Noël: some tests get stuck and time out. Michael: attach a debugger next time and check where it is stuck.
    • dead build machines, currently restarted manually by Frank, maybe he can set up monitoring
  • hardware:
    • can't add more tower PCs because we don't have enough space
    • some money is available, but also needed for new storage system
      • buy AMD desktops instead of servers, they are relatively cheap per core
    • cloud?
      • expensive too
      • Amazon Spot instances $0.003 per core hour, perfect for burst usage
      • GPUs disproportionately expensive
      • is it worth it if we still need to run some on-premises runners?
    • buy two more AMD Vega 56, one for debugging on a desktop and one for a runner
  • drop Python 2 on the master branch, three fewer CI jobs
    • append "-python3" to all container names so they don't conflict with the containers used by the 4.0 branch
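
The numbers quoted in this discussion can be checked with a few lines of arithmetic. Two values below are assumptions inferred from the notes rather than stated in them: roughly 20 jobs per build (so that a 5% failure rate gives one failed job per build) and 2 cores per slot (so that the listed cores give 19 slots).

```python
def expected_failures(n_jobs, p_fail):
    """Average number of failed jobs per build for independent failures."""
    return n_jobs * p_fail


def prob_any_failure(n_jobs, p_fail):
    """Probability that at least one job of a build fails."""
    return 1.0 - (1.0 - p_fail) ** n_jobs


cores = 8 * 4 + 1 * 6      # 8 builders with 4 cores, 1 with 6 cores
slots = 19                 # value quoted in the notes
print(cores, cores / slots)

n_jobs = 20                # assumed number of jobs per build
print(expected_failures(n_jobs, 0.05))
print(round(prob_any_failure(n_jobs, 0.05), 2))
```

Under these assumptions, a build with no failed job is the exception rather than the rule, which is consistent with the complaint that the CI cannot be relied upon.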

2019-04-16

Branch protection and bors tooling

  • bors bot:
    • merge queue: combine multiple PRs
    • checks the merge on the staging branch before merging on python
    • bors always gets the PR up-to-date before merging
    • maintainer: trigger the merge by posting the comment bors r+
  • branch protection: can't merge until CI passes
  • PR must be approved by 1 person
  • consider applying formatting automatically during the merge
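
The merge-queue behaviour described above (combine approved PRs, test the combination on staging, then update the main branch) can be sketched as follows. This is a toy model and not bors code: `process_queue` and the `ci_passes` callback are hypothetical, and splitting a failing batch into halves is an assumption of this sketch.

```python
def process_queue(approved, ci_passes):
    """Merge approved PRs in batches; return (merged, rejected) lists of PR ids.

    `ci_passes(base, batch)` is a hypothetical callback that reports whether
    the staging branch (the already merged PRs `base` plus `batch`) passes CI.
    """
    merged, rejected = [], []
    pending = [list(approved)] if approved else []
    while pending:
        batch = pending.pop(0)
        if ci_passes(merged, batch):
            merged.extend(batch)          # main branch is updated to staging
        elif len(batch) == 1:
            rejected.extend(batch)        # a single failing PR leaves the queue
        else:
            middle = len(batch) // 2      # retry the two halves separately
            pending[:0] = [batch[:middle], batch[middle:]]
    return merged, rejected


# Made-up PR numbers; PR 42 is the one that breaks CI.
print(process_queue([40, 41, 42, 43], lambda base, batch: 42 not in batch))
```

Since every batch is tested on top of what is already merged, the main branch only ever moves to a state that has passed CI.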

LB/Ekin/waLBerla strategy/timeline

  • GPU LB boundaries: fixing the code is time consuming
  • keep GPU LB in 4.1 release
  • waLBerla: thermalization is still missing
  • Lees-Edwards progress

ELC dielectric contrast state and bad test coverage

2019-03-26

LB code

  • LB boundary force
    • results very sensitive to input parameters #2624
    • disable that feature in 4.0.2
    • check LB tutorial
  • LB stress #2054
  • 3pt coupling disabled for now

Features documentation

  • document the implementation of commonly used features directly in Doxygen:
    • observables
    • shapes
    • interactions
    • parallel algorithms accessing particles
  • document in Sphinx where to find these classes in the core

CI robustness, monitoring

  • multiple sources of failures:
    • frequent timeouts
    • random errors (failed code coverage files upload, etc.)
    • can be hardware-dependent
    • down runners
    • hard to reproduce some issues
  • form a group to improve CI reliability (Frank, Jean-Noël, Rudolf, Kai)

Introduction of a staging branch

  • master branch is often failing CI after a merge
  • master has more thorough tests
  • plans:
    • notify when master fails
    • use a staging branch to merge PRs, then merge into master if thorough CI passes
    • Jean-Noël should look into it
      • configure GitHub accordingly
      • branch protection on master
      • look into GitLab Enterprise to mirror the GitHub repository

Open and recently merged pull requests

  • 4.0.2rc #2585
  • MPI PR waiting for review #2593
  • GPU LB checkpoint now works #2511
  • philox RNG on thermalized bonds, dpd, etc.
  • struct Particle refactor #2296: wait for ghost and communication refactors (#2400, #2394, #2478)
  • Coulomb and dipole refactoring waiting for feedback #2512
  • LB stress

Espresso 4.1 roadmap

  • waLBerla progress
  • thermalization
  • don't release before summer school

Tutorials

  • consider reproducing literature results
    • electrostatics tutorial: currently, only salt crystal
    • find relevant literature results to implement: Rudolf + Christian
  • review and merge tutorials CI #2452

Next coding day

  • next week (see Doodle)

2019-03-05

Coding day wrap-up

  • back-communication of ghost forces: still need to do thermalized bonds and DPD
  • angle forces: re-derive formula
  • polymer placement code: in progress
  • soft-sphere: done
  • LB checkpoint: #2555
  • waLBerla integration: in progress, todo list
  • containers migration to Python3: not done yet

Espresso plans

  • Espresso 4.0.2 bugfix release:
    • time-step change causes velocity change
    • wrong sign in tabulated potentials
    • milestone
  • mailing list: communicate on platforms removed from testing
  • consider binary releases (Flatpak)
  • consider moving the developer's guide from Sphinx to the GitHub wiki (Development)

Doxygen coverage

coverxygen:

  • https://github.com/psycofdj/coverxygen
  • extended version available at /work/jgrad/coverxygen
    • the plain text summary now reports:
      • tallies for function parameters and template parameters
      • coverage diff between two commits
    • the website now reports:
      • a new column for undocumented functions (as a subset of the undocumented lines column)
      • lists of undocumented functions in dedicated pages
      • a detailed explanation of what is missing (when hovering the mouse over a red line, a tooltip appears):
        • missing @file block
        • missing Doxygen block for variables and enum/union/struct/class members
        • for functions, provides a list of missing @param and @tparam blocks
      • undocumented enum values no longer cause the entire enum to be reported as undocumented

What's new with C++14

2019-02-12

CI

  • notify when the master branch doesn't pass CI
  • consider using a staging branch in the future

deRSE19 conference

  • link (poster, short talk: deadline Feb 28th)
  • poster (JN):
    • focus more on project structure & community than on applications
    • history: from Tcl to Python, from LB CPU/GPU to waLBerla
  • attendees: JN, Rudolf

ES meetings

  • communicate next ES meeting date on the users mailing lists (Rudolf)

2019-01-22

i386 issues and 4.0.1

32-bit specific issue with floating point arithmetic during particle resorting, fixed in #2454, porting in #2456

Dropping support for older distros in CI

  • C++14 is required for new parts of the core
  • use up-to-date gcc version
  • remove:
    • Ubuntu 14 (LTS ends in April 2019, extended security maintenance after that)
    • CentOS 7 (unless an up-to-date compiler is available)
    • Intel 15 (maybe)
  • check which Boost version is installed
  • make a tutorial for running ES in a Docker container (but needs root), or use Flatpak

When/how to end Python 2 support

  • Python2 won't get new updates after Jan 2020
  • Python2 won't be part of standard distributions by default (Ubuntu 20)
  • Need to communicate that change to the user base
  • Invert CI:
    • use Python3 on all distributions
    • use only one container with Python2 (maxset)
  • Start using Jupyter instead of IPython
  • Start using Python3 syntax in 4.1 development

Removal of features without a Python interface

  • Features not used and not documented, need to be removed:
    • GHMC (interacts negatively with the core)
    • NEMD
    • MEMD
  • Some may be implemented in Scafacos

Progress on current projects

LB

  • CPU improvements ready for merge, but 30% slow-down on 1 node, need to measure if this slow-down is significant vs. communication slow-down in multi-node simulations
  • GPU improvements are in progress

Local particles

  • list of particles, indexed by id:
  • previously implemented as an array, needed to be re-updated every time a particle moved from cell to cell
  • now implemented as an unordered map, but slower

Ghost particles

  • #2394
  • simplify reallocation and communication
  • improve CPU/GPU work balance

Tutorial testing

  • #2452
  • add tests for numerical results of tutorials
  • create a new CI container testing just the tutorials

Simplify CMake

New Espresso HiWi

  • removing code duplicates

Planning next coding day

  • do maintenance on Espresso
  • poll to decide on the date