File housekeeping utility. #1159

Open
hjoliver opened this issue Sep 16, 2014 · 13 comments

@hjoliver
Member

Cylc really needs a built-in file housekeeping utility for archiving (by copy or move) and deleting date-time labeled files and directories older than some offset from the current cycle point.

The old cylc housekeeping command was removed at cylc-6 because it wasn't ISO 8601 compatible, and it had a serious deficiency that I had never got around to addressing: it was unable to match individual files below a date-time labeled directory. Aside from that it was quite nice in some respects: it was controlled by simple config files, and it performed its configured operations in parallel.

For cylc-6+ a general housekeeping utility can no longer assume a simple fixed format cycle time (see #1158). It would have to be aware of the suite's cycle point format (actually it's worse than this - a suite using cycle point format CCYY-MM-DDTHH could still choose to use filenames containing CCYYMMDDHH for compatibility with external systems, for example).
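
To make the format mismatch concrete, here is a minimal sketch (not cylc code) of relabelling a cycle point from the suite's format into a different filename format, and of computing the label at a housekeeping offset. The `STRPTIME` table and both function names are hypothetical, and only hour-resolution formats are covered.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from cycle point format names to strptime patterns.
# Only two hour-resolution formats are covered in this sketch.
STRPTIME = {
    "CCYY-MM-DDTHH": "%Y-%m-%dT%H",
    "CCYYMMDDHH": "%Y%m%d%H",
}


def relabel(point, src_fmt, dst_fmt):
    """Convert a cycle point string from one format to another."""
    parsed = datetime.strptime(point, STRPTIME[src_fmt])
    return parsed.strftime(STRPTIME[dst_fmt])


def cutoff_label(point, src_fmt, dst_fmt, offset):
    """Return the label, in dst_fmt, of the point `offset` before `point`."""
    parsed = datetime.strptime(point, STRPTIME[src_fmt])
    return (parsed - offset).strftime(STRPTIME[dst_fmt])


if __name__ == "__main__":
    print(relabel("2014-09-16T06", "CCYY-MM-DDTHH", "CCYYMMDDHH"))
    print(cutoff_label("2014-09-16T06", "CCYY-MM-DDTHH", "CCYYMMDDHH",
                       timedelta(hours=12)))
```

A real utility would need the suite's cycle point format and the filename format as two separate settings, which is the complication described above.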

At NIWA we currently use a (very non-general) in-house shell script for housekeeping. @matthewrmshin - how is this handled at the Met Office?

@hjoliver hjoliver added this to the later milestone Sep 16, 2014
@hjoliver
Member Author

UPDATE: a-ha, rose_arch and rose_prune! I had thought Rose could do it but at last look I was expecting a command rather than a built-in app, so I missed it. No doubt this is cylc-6 compatible. I presume this comes under the category of functionality that should be moved into cylc? (not that I'm trying to steal all your stuff!).

It looks like these built-in apps do not handle files "older than" (as opposed to "at") the cycle point offset, but that doesn't really matter. In my old utility, I was trying to automatically handle the case of changing to a smaller offset mid-run. That's more difficult than matching a single specific cycle point, obviously (it requires a regex that matches any cycle point, which now depends on the format in use, or else it has to match all possible formats).
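
As a rough illustration of the "older than" case, the sketch below tries a short list of regexes, one per supported format, and keeps names whose embedded cycle point is before a cutoff. The `PATTERNS` list and function names are assumptions for illustration; they are not taken from the old cylc utility or from Rose.

```python
import re
from datetime import datetime

# Hypothetical (regex, strptime pattern) pairs, most specific first.
PATTERNS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}"), "%Y-%m-%dT%H"),
    (re.compile(r"\d{8}T\d{2}"), "%Y%m%dT%H"),
    (re.compile(r"\d{10}"), "%Y%m%d%H"),
]


def find_cycle_point(name):
    """Return the first cycle point found in name, or None."""
    for regex, fmt in PATTERNS:
        match = regex.search(name)
        if match is None:
            continue
        try:
            return datetime.strptime(match.group(0), fmt)
        except ValueError:
            # Digits matched the shape but are not a valid date-time.
            continue
    return None


def older_than(names, cutoff):
    """Return the names whose cycle point is strictly before cutoff."""
    selected = []
    for name in names:
        point = find_cycle_point(name)
        if point is not None and point < cutoff:
            selected.append(name)
    return selected
```

Names with no recognisable cycle point are left alone, which seems the safer default for anything that deletes files.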

@matthewrmshin
Contributor

> No doubt this is cylc-6 compatible.

Yes, rose_prune in the latest Rose release is tested with cylc 6.

> I presume this comes under the category of functionality that should be moved into cylc?

I think most of rose_prune can move to cylc.

It is less clear to me whether we should move rose_arch into cylc or not.

@matthewrmshin
Contributor

In the latest version of rose_prune, we have removed any Rose specific functionality, so it is safe to say that we can migrate all its functionality across. (This is as long as we are able to provide a compatibility layer that is transparent to users. I'll follow up on this soon.)

@matthewrmshin
Contributor

matthewrmshin commented Jul 10, 2015

A quick brain dump...

It should be relatively straightforward to move the job log housekeeping
functionality from rose_prune to cylc. rose_prune does the following:

  • Re-rsync and then delete job logs on remote job hosts.
  • Tar-gzip job logs.
  • Remove job logs.
  • Record location of job log files in a searchable database.
    • Original location.
    • Tar-gzip file.
    • Gone.

In cylc, we can also do:

  • Housekeep by cycle, similar to rose_prune.
    • Automatic housekeeping: use the latest cut-off cycle point to determine
      which cycles are safe to housekeep. The user specifies the number of
      cycle points or a duration before the cut-off point.
    • Triggered housekeep:
      • Option to specify current safe cut off cycle point.
      • Can be invoked by rose_prune.
  • Housekeep by number of files/file size on disk + access time.
    • User to specify:
      • maximum number of job log files
      • total file size occupied by job log files on disk
      • minimum duration between last access time and current time for any job
        logs for a task before they can be housekept.
    • Housekeep files when:
      • we have too many of them.
      • they are using up too much space.
      • they are not accessed for a long time.
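
One possible reading of the count/size/access-time rules above, as a sketch only: remove the least recently accessed logs first, stop once both limits are satisfied, and never touch a log accessed within the minimum idle period. The `LogFile` record, the function name, and all parameter names are hypothetical.

```python
from typing import NamedTuple


class LogFile(NamedTuple):
    path: str
    size: int     # bytes
    atime: float  # last access time, seconds since the epoch


def select_for_housekeeping(files, now, max_files, max_bytes, min_idle):
    """Return paths to remove, least recently accessed first."""
    count = len(files)
    total = sum(f.size for f in files)
    selected = []
    for f in sorted(files, key=lambda item: item.atime):
        if count <= max_files and total <= max_bytes:
            break  # both limits are satisfied
        if now - f.atime < min_idle:
            break  # every remaining file was accessed even more recently
        selected.append(f.path)
        count -= 1
        total -= f.size
    return selected
```

Keeping the selection separate from the deletion makes it easy to offer a dry-run mode that only reports what would be removed.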

@arjclark
Contributor

We should also have the capability to be able to housekeep the contents of the databases as they can become overly large over time.

@hjoliver
Member Author

@arjclark - yes, DB housekeeping would be good. Maybe just deleting entries beyond some configurable cutoff would do.

@matthewrmshin
Contributor

DB tables we can housekeep:

  • broadcast_events, probably by time - no point keeping very old entries.
  • task_jobs, probably by time_* or cycle.
  • task_job_logs, probably by file mtime.
  • task_events, probably by time.
    • Note: I no longer think this table is useful at all.

DB tables we cannot housekeep:

  • broadcast_states, current broadcast states only, so should be small.

DB tables we may be able to housekeep:

  • task_states, probably by time_updated, but is it desirable to housekeep this table?

See also #1827.
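
For the "by time" cases, a minimal sketch of cutoff-based pruning with the standard `sqlite3` module might look like the following. The column names and the allow-list are assumptions, not the real cylc schema, and the comparison relies on timestamps being stored as ISO 8601 strings in one consistent format and time zone so that string order matches time order.

```python
import sqlite3

# Hypothetical allow-list of (table, time column) pairs. Identifiers cannot
# be bound as SQL parameters, so they are checked here before formatting.
PRUNABLE = {
    ("broadcast_events", "time"),
    ("task_events", "time"),
}


def prune_by_time(conn, table, time_column, cutoff):
    """Delete rows older than cutoff; return the number of rows removed."""
    if (table, time_column) not in PRUNABLE:
        raise ValueError(f"refusing to prune {table}.{time_column}")
    with conn:  # commit on success, roll back on error
        cursor = conn.execute(
            f"DELETE FROM {table} WHERE {time_column} < ?", (cutoff,)
        )
    return cursor.rowcount
```

Deleting rows does not by itself shrink an SQLite file; running `VACUUM` afterwards is what reclaims the space on disk.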

@hjoliver
Member Author

hjoliver commented Jun 19, 2016

[meeting] we agreed:

  • DB housekeeping needed sooner rather than later
  • cycle-offset file housekeeping needed by cylc, probably a command line utility called via tasks is sufficient
  • there may be some need for non cycle-based file housekeeping too, but it is harder to see how to do this safely.

(need to be careful of any clash between DB and file housekeeping offsets)

@hjoliver
Member Author

NIWA operations reports that (at older cylc versions) db locking issues were strongly correlated with the size of the suite db (presumably because read times became significantly longer, perhaps on a slow filesystem). They used to wipe a db and restart the suite from scratch occasionally, which would fix the problem. This isn't an issue now with our robust lock recovery mechanism, but if db ops do (or can) slow significantly with db size, then automatic housekeeping would be a good thing.

@arjclark
Contributor

@benfitzpatrick - the above comment looks related to your rose bush timings investigations

@benfitzpatrick
Contributor

We think we can find at least a factor of 2 speed-up for jobs and cycle views in Rose Bush, which I assume is always the dominant reader of the public database. Bigger databases are slower...

@matthewrmshin
Contributor

Had a quick discussion with @dpmatthews. A lot of disk usage comes from large job log files and from the large number of job logs per task submit. It may be worthwhile to housekeep them more aggressively. E.g.:

  • Remove remote job logs as soon as they are retrieved from the remote cluster.
  • Tar-gzip job logs soon after a job is completed and/or on removal (garbage collection) of the task proxy from the pool.
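
The tar-gzip step could be sketched with the standard library as below. This is an illustration rather than how rose_prune does it; the function name and the choice to write the archive next to the original directory are assumptions.

```python
import shutil
import tarfile
from pathlib import Path


def archive_job_logs(log_dir):
    """Tar-gzip a job log directory, then remove the original directory."""
    log_dir = Path(log_dir)
    archive = log_dir.parent / (log_dir.name + ".tar.gz")
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(log_dir), arcname=log_dir.name)
    # Re-open the archive and list it before deleting anything, so that a
    # truncated or unreadable archive never replaces the original logs.
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.getnames()
    shutil.rmtree(str(log_dir))
    return archive
```

Hooking a call like this to job completion, or to removal of the task proxy from the pool, is the scheduling question raised above; the sketch only covers the file handling.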

@oliver-sanders
Member

Note: if it is hard to keep rose prune functional after the platforms change, this may have to get fast-tracked to Cylc 8.

@oliver-sanders oliver-sanders modified the milestones: cylc-9, 8.x Oct 22, 2020