cylc clean proposal #118

Merged
merged 3 commits into from
May 25, 2021

oliver-sanders
Member

An outline of where we could go with the cylc clean command, very much open to ideas.

The current (Cylc7) approach to meddling with workflow files is a confusing trio of:

  • rose suite-clean
  • rose_prune
  • rose_arch

These all have overlapping functionality. Ideally, each should:

  • Locate workflow installations on different platforms (as required/configured).
  • Have cycle awareness.
  • Be able to perform cycle arithmetic.
  • Be able to flush out files from within workflows.
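
As a rough illustration of what cycle awareness plus cycle arithmetic could mean for housekeeping, the sketch below picks out cycle directories older than an offset from a reference cycle. It assumes datetime cycle points named like `20210101T0000Z`; the function name `cycles_older_than` and the naming format are hypothetical and not part of Cylc or Rose.

```python
from datetime import datetime, timedelta

# Assumed naming of cycle directories (an illustration, not a Cylc constant).
CYCLE_FORMAT = "%Y%m%dT%H%MZ"


def cycles_older_than(cycle_names, reference, offset):
    """Return the cycle names that fall before ``reference - offset``.

    Names that do not parse as cycle points are ignored.
    """
    cutoff = datetime.strptime(reference, CYCLE_FORMAT) - offset
    selected = []
    for name in cycle_names:
        try:
            point = datetime.strptime(name, CYCLE_FORMAT)
        except ValueError:
            continue  # not a cycle directory, leave it alone
        if point < cutoff:
            selected.append(name)
    return sorted(selected)
```

A caller would pass the directory names found under something like `share/cycle/` and a `timedelta(days=1)` offset, then hand the result to whatever does the actual removal or archiving.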

We won't get cylc clean up to rose_arch standards right away, nor should we, but we should be able to get some of the simpler cases out of the way by early versions of Cylc8.

I would be keen to develop a lightweight scaffold, with plugins providing file-type support and archive functionality doing the heavy lifting.

(use case `cylc play --re-run`):

```
# remove the log dir AND suite DB
```
Contributor

I think a common use case is to want to revert your run directory back to how it was after you ran cylc install. I think the command to do that would be cylc clean myflow --rm log:work:share, which seems OK to me. Maybe add this as an example?
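
To make the suggestion concrete, here is a minimal sketch of how a colon-separated `--rm` value might be applied to a run directory. It assumes each entry names a sub-directory of the run directory; the helper `remove_subdirs` is hypothetical and is not how cylc clean is implemented.

```python
import shutil
from pathlib import Path


def remove_subdirs(run_dir, spec):
    """Remove the colon-separated sub-directories in ``spec`` from ``run_dir``.

    Returns the names that were actually removed.
    """
    run_dir = Path(run_dir).resolve()
    removed = []
    for name in filter(None, spec.split(":")):
        target = (run_dir / name).resolve()
        # Refuse anything that escapes the run directory (e.g. "../x").
        if run_dir not in target.parents:
            raise ValueError(f"{name!r} is outside the run directory")
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed
```

With this reading, `remove_subdirs(run_dir, "log:work:share")` leaves the installed source files in place and skips any of the three directories that do not exist yet.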


# remove a workflow installation on the scheduler host and
# on any remote task hosts
cylc clean myflow
Contributor

Just to note that, as discussed in cylc/cylc-flow#3887, we'll need a --force option to handle the case where the remote clean cannot be completed.

@dpmatthews
Contributor

I think I'm happy with the detail of everything targeted for 8.0 (--local, --remote, --rm, --mv).
I haven't worried too much about the detail of the later stuff but the ideas look good.

Member

@hjoliver hjoliver left a comment

Looks great to me.

Member

@hjoliver hjoliver left a comment

Just to play devil's advocate: maybe this is somewhat overkill for housekeeping of workflow files - is there really much need for MIME type-specific plugins, for instance, when 99% (?) of use would be just to delete or compress and/or archive files by location, file extension, and cycle point. (Is the file type content really relevant here?)

@oliver-sanders
Member Author

MIME is overkill. I'm really just suggesting that if we add "tar" support, we do so via an interface which allows support for other types (or use one if it exists). MIME is just a cheap way of getting file-type association.
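
For illustration only, this is roughly what "cheap file-type association" could look like using Python's standard `mimetypes` module. The `HANDLERS` registry and the handler names are hypothetical; a real plugin interface would presumably register callables rather than strings.

```python
import mimetypes

# Hypothetical registry: MIME type -> name of the plugin that handles it.
HANDLERS = {
    "application/x-tar": "tar",
}


def pick_handler(filename, default="plain"):
    """Choose a handler name for ``filename`` from its guessed MIME type.

    Compression suffixes such as ".gz" are reported by ``guess_type`` as a
    separate encoding, so "x.tar" and "x.tar.gz" map to the same handler.
    """
    mime_type, _encoding = mimetypes.guess_type(filename)
    return HANDLERS.get(mime_type, default)
```

Adding support for another type would then be a matter of adding one registry entry, which is the extensibility being argued for rather than anything MIME-specific.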

@oliver-sanders
Member Author

Just considering one final related use-case which is the transfer of files between platforms.

  • Internally (e.g. remote_init)
  • By users (e.g. to move data where it is needed)

Obviously not a priority ATM but worth ensuring the interface is compatible with this in future.

The utilities outlined here provide most of the framework for this, in particular the ability to find what platform a task has run on and to pick a host on the same filesystem where the data can be extracted. This is very hard for users to do, especially in a portable way.

Something like this:

# copy "somefile" from platforma to platformb
$ cylc clean myflow \
  --scp share/cycle/1234/somefile platforma \
  --platform platformb

# copy "somefile" from the platform where 1234/foo ran (most recently) to the scheduler host
$ cylc clean myflow//1234/foo \
    --scp share/cycle/<cycle>/somefile \
    --platform=<scheduler>

# internal job log retrieval would look like this
$ cylc clean myflow//1234/mytask/* \
    --rsync log/job/<cycle>/<task>/<job>/ <scheduler>

# remote-init would look like this
$ cylc clean myflow \
    --scp=./ myplatform \
    --exclude='*' \
    --include=...

Supporting this would need a couple of subtle changes to the above. It is also perhaps better to split this into cylc clean and cylc sync (as two interfaces to the same functionality) for clarity.
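
The `<cycle>`, `<task>` and `<job>` placeholders in the examples imply some kind of template expansion against the matched task IDs. A minimal sketch of that step, assuming the angle-bracket syntax shown above and a hypothetical helper `expand_path`:

```python
import re

# Matches placeholders written as <name>, as in the examples above.
PLACEHOLDER = re.compile(r"<(\w+)>")


def expand_path(template, **values):
    """Replace ``<name>`` placeholders in ``template`` using ``values``."""

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder <{key}>")
        return str(values[key])

    return PLACEHOLDER.sub(substitute, template)
```

For the job log case this would be called once per matched job, e.g. `expand_path("log/job/<cycle>/<task>/<job>/", cycle="1234", task="mytask", job="01")`.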

@hjoliver
Member

We should get this merged, so that the proposal document can be found easily.

@hjoliver hjoliver force-pushed the master branch 18 times, most recently from 4ec3c68 to de71c88 Compare April 1, 2021 03:52
@hjoliver hjoliver force-pushed the master branch 28 times, most recently from 19a2a96 to 73b1679 Compare April 6, 2021 01:55
Contributor

@dpmatthews dpmatthews left a comment

Let's get this merged.
Detail of initial changes looks fine.
Future stuff can be debated further later.

@hjoliver hjoliver merged commit 286d11b into cylc:master May 25, 2021