actions: remote testing functionality #3641

Open
oliver-sanders opened this issue Jun 2, 2020 · 7 comments
Labels
infrastructure GH Actions, Codecov etc.
Milestone

Comments

@oliver-sanders
Member

oliver-sanders commented Jun 2, 2020

Use containers in our GitHub Actions testing workflow to act as remote hosts for testing purposes.

The functional test battery currently supports two types of remote host:

  • remote host
  • remote host with shared filesystem

And there are currently two (soon to be three) task communication methods:

  • tcp
  • poll
  • ssh+tcp (upcoming)

That means that to fully test Cylc we need six platforms. These can probably be implemented as two containers (📦) where 📦 2 is a subtle variant of 📦 1.

                                      tcp     poll    ssh+tcp
remote host                           📦 1    📦 1    📦 1
remote host with shared filesystem    📦 2    📦 2    📦 2

We currently support 9 batch systems (🧮):

  • at
  • background
  • loadleveler
  • lsf
  • moab
  • pbs
  • pbs_multi_cluster
  • sge
  • slurm

So that leaves us with 2 * 3 * 9 = 54 containers!
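
For reference, the count above is just the Cartesian product of the three axes. A minimal sketch (the variable names are illustrative, not taken from the Cylc test battery):

```python
from itertools import product

# Axes as listed in this issue; the constant names are illustrative only.
HOST_TYPES = ["remote host", "remote host with shared filesystem"]
COMMS_METHODS = ["tcp", "poll", "ssh+tcp"]
BATCH_SYSTEMS = [
    "at", "background", "loadleveler", "lsf", "moab",
    "pbs", "pbs_multi_cluster", "sge", "slurm",
]

# Every (host type, comms method, batch system) combination.
full_matrix = list(product(HOST_TYPES, COMMS_METHODS, BATCH_SYSTEMS))
print(len(full_matrix))  # 54
```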

OK, so this is getting a bit nuts. For testing we can probably cover all of the batch systems (🧮) in the remote host + tcp image.

                                      tcp          poll    ssh+tcp
remote host                           📦 1 + 🧮    📦 1    📦 1
remote host with shared filesystem    📦 2         📦 2    📦 2

Questions

  • How many containers do we need?
  • Can we avoid duplication between containers?
@oliver-sanders oliver-sanders added this to the cylc-8.0.0 milestone Jun 2, 2020
@oliver-sanders oliver-sanders added the question Flag this as a question for the next Cylc project meeting. label Jun 2, 2020
@hjoliver
Member

hjoliver commented Jun 2, 2020

How many containers do we need?

Seems to me two is enough as testing batch system support is orthogonal to testing remote host support (isn't it?) with separate or shared FS - i.e. no need for the cross combinations.

Can we avoid duplication between containers?

Presumably you mean duplication of test runs, not duplication of container content? If the tests are quick enough we could run all batch system tests in both containers, and to hell with the duplication. Or we could randomly select half the batch system tests for one container, half for the other (nice side effect, we would eventually pick up unforeseen cross-combination issues, at the cost of perfect reproducibility).
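
A minimal sketch of the random split, using a hypothetical helper `split_batch_systems`. Seeding the generator (for example with a CI run number) is an assumption added here, not something proposed above; it would let a failing run be replayed, recovering some of the lost reproducibility:

```python
import random

BATCH_SYSTEMS = [
    "at", "background", "loadleveler", "lsf", "moab",
    "pbs", "pbs_multi_cluster", "sge", "slurm",
]


def split_batch_systems(batch_systems, seed):
    """Randomly divide the batch systems between two containers.

    The same seed always yields the same split, so a failing run can be
    repeated by reusing its seed.
    """
    rng = random.Random(seed)
    shuffled = list(batch_systems)  # copy, so the input is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# The seed value here is arbitrary.
container_1, container_2 = split_batch_systems(BATCH_SYSTEMS, seed=3641)
print(container_1)
print(container_2)
```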

@oliver-sanders
Member Author

oliver-sanders commented Jun 3, 2020

Presumably you mean duplication of test runs

Sorry, I actually meant duplication in the Dockerfiles. The remote tests are few and fairly fast so that shouldn't be a major issue.

@oliver-sanders
Member Author

Ping @kinow who knows the most.

@kinow
Member

kinow commented Jun 9, 2020

Ping @kinow who knows the most.

Far from it.

How many containers do we need?

I think it would be easier to have separate containers for each batch system, instead of a single container, or containers with more than 1 batch system installed.

Can we avoid duplication between containers?

There are a few ways of doing that.

Some projects use Shell to process a template file and generate the Dockerfile, e.g. php
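
A minimal sketch of the template approach, written in Python rather than shell. The base image name `some-base-image:latest` and the install commands are placeholders, not real images or packages:

```python
from string import Template

# One shared template; only the substituted values differ per batch system.
DOCKERFILE_TEMPLATE = Template("""\
FROM $base_image
# Install the batch system under test (placeholder command).
RUN $install_command
""")

# Hypothetical install commands, for illustration only.
INSTALL_COMMANDS = {
    "pbs": "echo 'install pbs here'",
    "slurm": "echo 'install slurm here'",
}


def render_dockerfiles(base_image, install_commands):
    """Return a mapping of batch system name to rendered Dockerfile text."""
    return {
        name: DOCKERFILE_TEMPLATE.substitute(
            base_image=base_image, install_command=command
        )
        for name, command in install_commands.items()
    }


dockerfiles = render_dockerfiles("some-base-image:latest", INSTALL_COMMANDS)
print(dockerfiles["pbs"])
```

Each rendered string could then be written to its own Dockerfile, so the shared content lives in one place.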

A few releases ago Docker added multi-stage builds, which let a single Dockerfile use several images together. So there could be some way to package cylc in one stage A (using something other than pip, maybe conda, pyinstaller, or rpm), and combine it with a stage B that simply uses the image of a batch system container: https://docs.docker.com/develop/develop-images/multistage-build/

Something like:

FROM cylcbase:latest AS cylc
WORKDIR /opt
# package cylc and dependencies creating some file like cylc.zip with a conda env, etc

FROM some-image-with-pbs:latest
COPY --from=cylc /opt/cylc.zip .
# at this point we should have Cylc, and the PBS, so we would just need to figure out a way to run the Cylc test-battery for PBS here

Not sure what approach would be the best, but we can try either and see if that works well on GH Actions I think?

@oliver-sanders oliver-sanders removed the question Flag this as a question for the next Cylc project meeting. label Aug 4, 2020
@oliver-sanders
Member Author

Removing the question label, time for investigation...

@oliver-sanders oliver-sanders self-assigned this Aug 25, 2020
@oliver-sanders
Member Author

oliver-sanders commented Aug 25, 2020

Putting myself on the ticket as I've made some progress on this.

  • It's fairly straightforward to set up a Conda docker image then install Cylc into it via Conda.
  • Other images can use this as their base (avoiding duplication).
  • SSH from the host into a container is pretty simple.
  • SSH from the container into the host is likely to be more interesting.

The platform matrix for tests takes the form:

(is_local_platform, batch_system, has_shared_fs, comms_method)

Caveats:

  • Installing Conda (or Mamba) onto Alpine involves provisioning your own static libraries 🤮 so we will probs have to use a more heavy-weight image.
  • Conda installations use a lot of space making the docker images somewhat unwieldy.

@oliver-sanders
Member Author

oliver-sanders commented Sep 8, 2020

The four containers in the matrix (shared_fs, indep_fs) * (tcp, poll) can be reduced to a single image with four sets of run options.

The batch systems would require one container each but could be based on one of the four "base" images from the matrix above. In practice it may be easier to combine at and background into the "base" images.

So that leaves us with one image per batch system and one container per logically sensible combination of (is_local_platform, batch_system, has_shared_fs, comms_method).
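
A minimal sketch of enumerating those combinations. The `Platform` type and the single filtering rule (a local platform always shares the filesystem) are assumptions for illustration; what actually counts as "logically sensible" is still to be decided:

```python
from itertools import product
from typing import NamedTuple


class Platform(NamedTuple):
    """One entry in the test platform matrix."""
    is_local_platform: bool
    batch_system: str
    has_shared_fs: bool
    comms_method: str


def is_sensible(platform):
    """Assumed rule: a local platform cannot have an independent filesystem."""
    return platform.has_shared_fs or not platform.is_local_platform


def sensible_platforms(batch_systems, comms_methods):
    """List every combination that passes the assumed rule."""
    candidates = (
        Platform(local, batch, shared, comms)
        for local, batch, shared, comms in product(
            (True, False), batch_systems, (True, False), comms_methods
        )
    )
    return [p for p in candidates if is_sensible(p)]


# 16 raw combinations, minus the 4 local ones without a shared filesystem.
platforms = sensible_platforms(["background", "at"], ["tcp", "poll"])
print(len(platforms))  # 12
```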

So far indep_fs works fine but there is an issue with shared_fs containers where the host cannot issue commands to workflows running inside the container (which some of the tests require). This would presumably require meddling with /etc/hosts on the host system, which is not viable.

@hjoliver hjoliver modified the milestones: cylc-8.0.0, cylc-8.x Aug 4, 2021
@oliver-sanders oliver-sanders removed their assignment Nov 26, 2021
@MetRonnie MetRonnie added the infrastructure GH Actions, Codecov etc. label Jan 11, 2022
4 participants