Commit

Merge pull request #1 from prisms-center/4.X
4.X
bpuchala authored Oct 17, 2017
2 parents 31dea83 + df26998 commit b6b1aae
Showing 28 changed files with 913 additions and 532 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -33,8 +33,8 @@ Possible values for "taskstatus" are:


Jobs are marked 'auto' either by submitting through the python class ``prisms_jobs.Job``
with the attribute ``auto=True``, or by submitting a PBS script which contains
the line ``#auto=True`` using the included ``psub`` script.
with the attribute ``auto=True``, or by submitting a script which contains
the line ``#auto=True`` using the included ``psub`` command line program.
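
As a rough illustration of the ``#auto=True`` marker, the following Python sketch checks whether a script contains that line. The helper name ``is_marked_auto`` is hypothetical, and matching the whole line after stripping whitespace is an assumption; the diff does not show how ``psub`` actually parses the script.

```python
def is_marked_auto(script_text):
    """Return True if any line of the script is exactly '#auto=True'.

    Hypothetical helper for illustration only; psub may parse differently.
    """
    return any(line.strip() == "#auto=True" for line in script_text.splitlines())


example_script = "\n".join([
    "#!/bin/bash",
    "#auto=True",
    "echo 'running'",
])

print(is_marked_auto(example_script))
print(is_marked_auto("#!/bin/bash\necho 'running'"))
```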

Jobs can be monitored using the command line program ``pstat``. All 'auto' jobs
which have stopped can be resubmitted using ``pstat --continue``. In this case,
@@ -70,7 +70,7 @@ Jobs not marked 'auto' are shown with the status "Check" in ``pstat`` until the
marks them as "Complete".


## Installation from PyPI (todo)
## Installation from PyPI

Using ``pip``:

@@ -85,7 +85,7 @@ If installing to a user directory, you may need to set your PATH to find the ins
export PATH=$PATH:`python -m site --user-base`/bin


## Install using conda (todo)
## Install using conda

conda config --add channels prisms-center
conda install prisms-jobs
@@ -99,9 +99,9 @@ If installing to a user directory, you may need to set your PATH to find the ins
git clone https://github.com/prisms-center/prisms_jobs.git
cd prisms_jobs

2. Checkout the branch/tag containing the version you wish to install. Latest is ``v3.0.1``:
2. Checkout the branch/tag containing the version you wish to install. Latest is ``v4.0.0``:

git checkout v3.0.1
git checkout v4.0.0

3. From the root directory of the repository:

2 changes: 0 additions & 2 deletions build_conda.sh
@@ -1,6 +1,5 @@
# begin
anaconda login
conda config --set anaconda_upload yes

# build, get location of result, upload
conda build conda-recipes/prisms_jobs > conda-recipes/tmp.out
@@ -19,5 +18,4 @@ conda convert --platform linux-64 $LOCATION -o conda-recipes
anaconda upload --user prisms-center conda-recipes/linux-64/prisms-jobs*

# finish
conda config --set anaconda_upload no
anaconda logout
8 changes: 6 additions & 2 deletions conda-recipes/prisms_jobs/meta.yaml
@@ -1,18 +1,22 @@
package:
name: prisms-jobs
version: "3.0.1"
version: "4.0.0"

source:
git_rev: v3.0.1
git_rev: v4.0.0
git_url: https://github.com/prisms-center/prisms_jobs.git

requirements:
build:
- python
- setuptools
- future
- six
- argparse # [py26]
run:
- python
- future
- six
- argparse # [py26]

about:
21 changes: 20 additions & 1 deletion doc/source/api/index.rst
@@ -16,7 +16,6 @@ prisms_jobs
prisms_jobs.EligibilityError
prisms_jobs.complete_job
prisms_jobs.error_job
prisms_jobs.set_software

prisms_jobs.interface
---------------------
@@ -27,6 +26,26 @@ prisms_jobs.interface
prisms_jobs.interface.torque
prisms_jobs.interface.slurm
prisms_jobs.interface.default

prisms_jobs.config
---------------------

.. autosummary::
:toctree:

prisms_jobs.config.configure
prisms_jobs.config.dbpath
prisms_jobs.config.settings
prisms_jobs.config.read_config
prisms_jobs.config.write_config
prisms_jobs.config.default_settings
prisms_jobs.config.config_dir
prisms_jobs.config.config_path
prisms_jobs.config.update_selection_method
prisms_jobs.config.set_update_selection_method
prisms_jobs.config.software
prisms_jobs.config.set_software
prisms_jobs.config.detect_software

prisms_jobs.misc
----------------
4 changes: 2 additions & 2 deletions doc/source/conf.py
@@ -84,9 +84,9 @@
# built documents.
#
# The short X.Y version.
version = u'3.0'
version = u'4.0'
# The full version, including alpha/beta/rc tags.
release = u'3.0b0'
release = u'4.0.0'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
98 changes: 80 additions & 18 deletions doc/source/config.rst
@@ -3,28 +3,90 @@
Configuration
=============

Environment variables (typically not necessary):
Some configuration is possible:

- ``PRISMS_JOBS_DB``:
- ``PRISMS_JOBS_DIR``: (optional, default=``$HOME/.prisms_jobs``)

The SQLite jobs database is stored by default at ``$HOME/.prisms_jobs/jobs.db``.
If ``PRISMS_JOBS_DB`` is set, then the jobs database is stored at
``$PRISMS_JOBS_DB/jobs.db``.
The jobs database is stored at ``$PRISMS_JOBS_DIR/jobs.db``.

- ``PRISMS_JOBS_SOFTWARE``:
- ``PRISMS_JOBS_DIR/config.json``:

By default, ``prisms-jobs`` will attempt to automatically
detect ``'torque'`` (by checking for the 'qstat' executable) or ``'slurm'`` (by
checking for the 'sbatch' executable). The ``'default'`` module provides stubs to
enable testing/use on systems with no job management software. If ``PRISMS_JOBS_SOFTWARE``
is set to any other value, it is treated as the name of a Python module containing
a custom interface which ``prisms_jobs`` will attempt to import and use.
Automatically generated JSON configuration file storing settings:

- ``"dbpath"``: (str)

The location of the SQLite jobs database.

- ``"software"``: (str)

The job submission software interface to use. ``"torque"`` or ``"slurm"``
is automatically detected if present.

+-------------------+------------------------------------------------+
|"torque" |TORQUE |
+-------------------+------------------------------------------------+
|"slurm" |Slurm |
+-------------------+------------------------------------------------+
|"default" (or null)|Empty stub, does nothing |
+-------------------+------------------------------------------------+
|other |The name of an existing findable python module |
| |implementing an interface |
+-------------------+------------------------------------------------+

- ``PRISMS_JOBS_UPDATE``:
- ``"write_submit_script"``: (bool, optional, default=false)

If ``true``, submit jobs by first writing a submit script file and then
submitting it. Otherwise, by default, the job is submitted via the command
line.

- ``"update_method"``: (str, optional, default="default")

Controls which jobs are updated when ``JobDB.update()`` is called.

+-------------------+------------------------------------------------+
|"default" (or null)| Select jobs with jobstatus != 'C' |
+-------------------+------------------------------------------------+
|"check_hostname" | Select jobs with jobstatus != 'C' and matching |
| | hostname. This is useful on compute clusters |
| | where multiple machines with different queues |
| | share the same ``PRISMS_JOBS_DIR``. |
+-------------------+------------------------------------------------+

- ``"taskmaster_job_kwargs"``: (JSON object, optional)

Holds options for the `taskmaster`_ job. Defaults are:

+-----------+------------------------------------------------+
|'name' | "taskmaster" |
+-----------+------------------------------------------------+
|'account' | "prismsprojectdebug_fluxoe" |
+-----------+------------------------------------------------+
|'nodes' | "1" |
+-----------+------------------------------------------------+
|'ppn' | "1" |
+-----------+------------------------------------------------+
|'walltime' | "1:00:00" |
+-----------+------------------------------------------------+
|'pmem' | "3800mb" |
+-----------+------------------------------------------------+
|'qos' | "flux" |
+-----------+------------------------------------------------+
|'queue' | "fluxoe" |
+-----------+------------------------------------------------+
|'message' | null |
+-----------+------------------------------------------------+
|'email' | null |
+-----------+------------------------------------------------+
|'priority' | "-1000" |
+-----------+------------------------------------------------+
|'command' | "rm taskmaster.o*; rm taskmaster.e*\\n" |
+-----------+------------------------------------------------+
|'auto' | false |
+-----------+------------------------------------------------+

Additionally, the ``'exetime'`` is set based on the ``--delay``
commandline argument and the commandline invocation used to launch
``taskmaster`` is appended to ``'command'``.
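
As a rough illustration of the settings listed above, the following Python 3 sketch builds a settings dictionary with the documented keys and prints it as JSON. The helper names ``guess_software`` and ``example_settings`` are hypothetical, and the assumption that ``"dbpath"`` points at ``jobs.db`` inside the jobs directory is inferred from the text. It does not call ``prisms_jobs.config``, whose function signatures are not shown in this diff, and it does not write any file.

```python
import json
import os
import shutil


def guess_software():
    """Hypothetical helper: guess the scheduler from executables on PATH.

    Mirrors the documented checks ('qstat' for TORQUE, 'sbatch' for Slurm),
    but is not the library's own detection code.
    """
    if shutil.which("qstat"):
        return "torque"
    if shutil.which("sbatch"):
        return "slurm"
    return "default"


def example_settings(jobs_dir):
    """Build a settings dict using the keys documented above."""
    return {
        # Assumption: the database file is jobs.db inside the jobs directory.
        "dbpath": os.path.join(jobs_dir, "jobs.db"),
        "software": guess_software(),
        "write_submit_script": False,
        "update_method": "default",
    }


jobs_dir = os.environ.get(
    "PRISMS_JOBS_DIR", os.path.join(os.path.expanduser("~"), ".prisms_jobs")
)
settings = example_settings(jobs_dir)
print(json.dumps(settings, indent=2, sort_keys=True))
```

Since the configuration file is described as automatically generated, a sketch like this is mainly useful for checking by eye what a hand-edited ``config.json`` should look like.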

If unset or set to ``'default'``, the ``pstat`` script
will attempt to update the status of all jobs that are not yet complete (``'C'``).
For systems with multiple-clusters-same-home, this may be set to ``'check_hostname'``
and ``pstat`` will only attempt to update the status of jobs that are not yet
complete (``'C'``) and have matching hostname, as determined by ``socket.gethostname()``.
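
The difference between the ``'default'`` and ``'check_hostname'`` selection rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the library's implementation: the function ``select_jobs_to_update`` and the record keys ``"jobid"`` and ``"hostname"`` are hypothetical, while ``"jobstatus"`` and the ``'C'`` value come from the text.

```python
import socket


def select_jobs_to_update(jobs, update_method="default", hostname=None):
    """Pick the job records whose status should be refreshed.

    Hypothetical sketch: 'jobs' is a list of dicts, not a real JobDB.
    """
    if hostname is None:
        hostname = socket.gethostname()
    # Both methods skip jobs that are already complete ('C').
    incomplete = [job for job in jobs if job["jobstatus"] != "C"]
    if update_method == "check_hostname":
        # Additionally require that the job was submitted from this host.
        return [job for job in incomplete if job["hostname"] == hostname]
    return incomplete


jobs = [
    {"jobid": "1", "jobstatus": "R", "hostname": "cluster-a"},
    {"jobid": "2", "jobstatus": "C", "hostname": "cluster-a"},
    {"jobid": "3", "jobstatus": "Q", "hostname": "cluster-b"},
]

print([j["jobid"] for j in select_jobs_to_update(jobs)])
print([j["jobid"] for j in select_jobs_to_update(jobs, "check_hostname", hostname="cluster-a")])
```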
.. _taskmaster: scripts/taskmaster.html

5 changes: 1 addition & 4 deletions doc/source/index.rst
@@ -1,7 +1,4 @@
.. prisms_jobs documentation master file, created by
sphinx-quickstart on Mon Sep 11 12:22:02 2017.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
.. prisms_jobs
Welcome to prisms-jobs's documentation!
=======================================
4 changes: 2 additions & 2 deletions doc/source/install.rst
@@ -45,11 +45,11 @@ Install from source
git clone https://github.com/prisms-center/prisms_jobs.git
cd prisms_jobs

2. Checkout the branch/tag containing the version you wish to install. Latest is ``v3.0.1``:
2. Checkout the branch/tag containing the version you wish to install. Latest is ``v4.0.0``:

::

git checkout v3.0.1
git checkout v4.0.0

3. From the root directory of the repository:

8 changes: 5 additions & 3 deletions doc/source/overview.rst
@@ -32,8 +32,8 @@ Possible values for "taskstatus" are:


Jobs are marked 'auto' either by submitting through the python class ``prisms_jobs.Job``
with the attribute ``auto=True``, or by submitting a PBS script which contains
the line ``#auto=True`` using the included ``psub`` script.
with the attribute ``auto=True``, or by submitting a script which contains
the line ``#auto=True`` using the included ``psub`` command line program.

Jobs can be monitored using the command line program ``pstat``. All 'auto' jobs
which have stopped can be resubmitted using ``pstat --continue``. In this case,
@@ -61,7 +61,9 @@ Example screen shot:

Additionally, when periodic jobs cannot be scheduled in other ways, the
``taskmaster`` script can fully automate this process. ``taskmaster`` executes
``pstat --continue`` and then resubmits itself to execute again periodically.
``pstat --continue`` and then resubmits itself to execute again periodically. As
not all compute resources allow this behavior, remember to check the policy prior
to using ``taskmaster`` on a new compute resource.

A script marked 'auto' should check itself for completion and, when completion is reached, execute
``pstat --complete $JOBID --force`` in bash, or ``prisms_jobs.complete_job()``
19 changes: 8 additions & 11 deletions doc/source/scripts/pstat.rst
@@ -1,12 +1,10 @@
.. scripts/pstat.rst
pstat
=====


Summary
-------
``pstat``
=========

Summary:
--------

``pstat`` gives command line access to the jobs database. It can be used to:

@@ -28,12 +26,11 @@ Summary
- Delete jobs from the database (and abort if currently running)


Help documentation:
-------------------
``--help`` documentation:
-------------------------

.. argparse::
:filename: scripts/pstat
:filename: prisms_jobs/scripts/pstat.py
:func: make_parser
:prog: pstat



13 changes: 8 additions & 5 deletions doc/source/scripts/psub.rst
@@ -1,15 +1,18 @@
.. scripts/psub.rst
psub
====
``psub``
========

Summary:
--------

``psub`` submits a job script and adds the job to the job database.

Help documentation:
-------------------
``--help`` documentation:
-------------------------

.. argparse::
:filename: scripts/psub
:filename: prisms_jobs/scripts/psub.py
:func: parser
:prog: psub

32 changes: 17 additions & 15 deletions doc/source/scripts/taskmaster.rst
@@ -1,26 +1,28 @@
.. scripts/taskmaster.rst
taskmaster
==========
``taskmaster``
==============

``taskmaster`` submits a job on the PRISMS flux debug queue that will repeatedly
resubmit any ``Auto`` jobs in the job database that have completed but whose
taskstatus is still ``'Incomplete'`` (perhaps because the job has hit the walltime
before completing or failed to converge) and then resubmit itself with a delay
before execution.
Summary:
--------

To use on machines other than flux change the line containing
``taskmaster`` submits a job that will repeatedly resubmit any ``Auto`` jobs in
the job database that have completed but whose taskstatus is still ``'Incomplete'``
(perhaps because the job has hit the walltime before completing or failed to
converge) and then resubmit itself with a delay before execution. As not all
compute resources allow this behavior, remember to check the policy prior to using
``taskmaster`` on a new compute resource.

::
The job submission options can be customized by editing the ``prisms-jobs``
`configuration file`_.
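
The selection rule described in the summary can be sketched in Python as follows. This is a simplified illustration, not the ``taskmaster`` source: the function ``jobs_to_resubmit`` and the dictionary keys ``"jobid"`` and ``"auto"`` are hypothetical, and treating jobstatus ``'C'`` as "completed" is an assumption based on the configuration page.

```python
def jobs_to_resubmit(jobs):
    """Return 'auto' jobs that finished running but whose task is incomplete.

    Hypothetical sketch: 'jobs' is a list of dicts standing in for the
    job database records.
    """
    return [
        job
        for job in jobs
        if job["auto"]
        and job["jobstatus"] == "C"
        and job["taskstatus"] == "Incomplete"
    ]


jobs = [
    {"jobid": "10", "auto": True, "jobstatus": "C", "taskstatus": "Incomplete"},
    {"jobid": "11", "auto": True, "jobstatus": "C", "taskstatus": "Complete"},
    {"jobid": "12", "auto": False, "jobstatus": "C", "taskstatus": "Incomplete"},
    {"jobid": "13", "auto": True, "jobstatus": "R", "taskstatus": "Incomplete"},
]

print([job["jobid"] for job in jobs_to_resubmit(jobs)])
```

Only job ``"10"`` qualifies here: the others are either already finished as tasks, not marked 'auto', or still running.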

j = prisms_jobs.templates.PrismsDebugJob(...)

Help documentation:
-------------------
``--help`` documentation:
-------------------------

.. argparse::
:filename: scripts/taskmaster
:filename: prisms_jobs/scripts/taskmaster.py
:func: parser
:prog: taskmaster

.. _configuration file: config.html