Merge branch 'develop' into develop
egede authored Apr 10, 2024
2 parents ffdf26c + 2a6bc14 commit c04a5fb
Showing 25 changed files with 624 additions and 74 deletions.
2 changes: 1 addition & 1 deletion doc/UserGuide/InputAndOutputData.rst
@@ -163,7 +163,7 @@ The urls are generated by using the `id` of the file.

This will upload the local file "~/temp/mydata.txt" to the user's Google Drive inside a folder named `Ganga`. The File object also supports glob patterns, which can be supplied as `j.namePattern = '*.ROOT'`.

Upon first usage, the user will be asked to authenticate and allow access to create new files and edit these files only. While the default client ID of `Ganga` can be used, it is recommended to create you own client ID. Tjhis will prevent getting rate limited by other users. See :doc:`GoogleOauth` for how to do this.
Upon first usage, the user will be asked to authenticate and allow access to create new files and edit these files only. While the default client ID of `Ganga` can be used, it is recommended to create your own client ID. This will prevent you from being rate limited because of requests made by other users. See :doc:`GoogleOauth` for how to do this.

Only files created by Ganga can be deleted (or restored after deletion).

2 changes: 1 addition & 1 deletion doc/UserGuide/InstallAndBasicUsage.rst
@@ -212,7 +212,7 @@ To check the ``stdout/stderr`` of a job, you can use the peek method
Copy a job
----------
You can copy and old job, modify its attributes anbd then submit it as a new one
You can copy an old job, modify its attributes and then submit it as a new one
.. code-block:: python
2 changes: 1 addition & 1 deletion doc/UserGuide/JobManipulation.rst
@@ -75,7 +75,7 @@ Removing Jobs
-------------

As you submit more jobs, your Ganga repository will grow and could become quite large. If you have finished with
jobs it is good practise to remove them from the repository:
jobs it is good practice to remove them from the repository:

.. literalinclude:: ../../ganga/GangaCore/test/GPI/TutorialTests.py
:start-after: # -- JOBMANIPULATION JOBREMOVE START
62 changes: 56 additions & 6 deletions doc/UserGuide/Virtualization.rst
@@ -1,9 +1,9 @@

Virtualization
==============
It is possible to run a Ganga job inside a container. This allows you to get a completely well defined environment on the worker node where the job is executed. Each job has a virtualization attribute which defines the image to be used for the container as a required attribute. Images can be either from Docker or Singularity hub, from the images created by Gitlab or in case of Singularity from a file provided as a GangaFile.
It is possible to run a Ganga job inside a container. This allows you to get a completely well defined environment on the worker node where the job is executed. Each job has a virtualization attribute which defines the image to be used for the container as a required attribute. Images can be either from Docker, Singularity or Apptainer hub, from the images created by Gitlab or in case of Singularity or Apptainer, from a file provided as a GangaFile.

Using images can provide an attractive workflow where GitLab continuous integration is used to create Docker images. Those images can then subsequently be used for running jobs where it is assured that they are in the same environment. The image can either be used directly from the repository (using the deploy username/password if private) or can be pulled and converted to a Singularity image.
Using images can provide an attractive workflow where GitLab continuous integration is used to create Docker images. Those images can then subsequently be used for running jobs where it is assured that they are in the same environment. The image can either be used directly from the repository (using the deploy username/password if private) or can be pulled and converted to a Singularity or Apptainer image.

Try it out
----------
@@ -62,14 +62,64 @@ If the image is a private image, the username and password of the deploy token c
j.virtualization.tokenuser = 'gitlab+deploy-token-123'
j.virtualization.tokenpassword = 'gftrh84dgel-245^ghHH'
Directories can be mounted from the host to the container using key-value pairs to the mounts option. If the directory is not vailable on the host, a warning will be written to stderr of the job and no mount will be attempted.
Directories can be mounted from the host to the container using key-value pairs to the mounts option. If the directory is not available on the host, a warning will be written to stderr of the job and no mount will be attempted.

.. code-block:: python

    j.virtualization.mounts = {'/cvmfs':'/cvmfs'}

By default the container is started in singularity with the ``--nohome`` option. Extra options can be provided through the ``options`` attribute. See the Singularity documentation for what is possible.

Apptainer class
-----------------
The Apptainer class can be used for either Apptainer or Docker images. It requires that apptainer is installed on the worker node.

For Apptainer images you provide the image name and tag from Apptainer hub like

.. code-block:: python

    j=Job()
    j.application=Executable(exe=File('my/full/path/to/executable'))
    j.virtualization = Apptainer("shub://image:tag")

Notice how the executable is given as a ``File`` object. This ensures that it is copied to the working directory and thus will be accessible inside the container.

The container can also be provided as a Docker image from a repository. The default repository is Docker hub.

.. code-block:: python

    j.virtualization = Apptainer("docker://gitlab-registry.cern.ch/lhcb-core/lbdocker/centos7-build:v3")
    j.virtualization = Apptainer("docker://fedora:latest")

Another option is to provide a ``GangaFile`` object which points to an Apptainer image file. In that case the image file will be copied to the worker node. The first example is with an image located on some shared disk. This will be effective for running on a local backend or a batch system with a shared disk system.

.. code-block:: python

    imagefile = SharedFile('myimage.sif', locations=['/my/full/path/myimage.sif'])
    j.virtualization = Apptainer(image= imagefile)

A second example is with an image located in the Dirac Storage Element. This will be effective when using the Dirac backend.

.. code-block:: python

    imagefile = DiracFile('myimage.sif', lfn=['/some/lfn/path'])
    j.virtualization = Apptainer(image= imagefile)

If the image is a private image, the username and password of the deploy token can be given as in the example below. See the GitLab settings for how to set this up. The token only needs access to the images and nothing else.

.. code-block:: python

    j.virtualization.tokenuser = 'gitlab+deploy-token-123'
    j.virtualization.tokenpassword = 'gftrh84dgel-245^ghHH'

Directories can be mounted from the host to the container using key-value pairs to the mounts option. If the directory is not available on the host, a warning will be written to stderr of the job and no mount will be attempted.

.. code-block:: python

    j.virtualization.mounts = {'/cvmfs':'/cvmfs'}

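The mount behaviour described above can be pictured with a small sketch. This is not Ganga's implementation: the helper name ``bind_arguments`` is hypothetical, the dictionary keys are assumed to be host paths and the values container paths, and ``--bind`` is the bind-mount option of the Apptainer command line.

```python
import os
import sys


def bind_arguments(mounts):
    """Return bind options for the host directories that exist.

    Hypothetical helper for illustration only; not part of the Ganga API.
    """
    args = []
    for host_dir, container_dir in mounts.items():
        if not os.path.isdir(host_dir):
            # Documented behaviour: warn on stderr and do not attempt the mount.
            print(f"Warning: {host_dir} is not available on the host, not mounted",
                  file=sys.stderr)
            continue
        args += ["--bind", f"{host_dir}:{container_dir}"]
    return args
```

A missing ``/cvmfs`` on the host would therefore produce a warning and an empty option list, and the job would still start.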
By default the container is started in apptainer with the ``--nohome`` option. Extra options can be provided through the ``options`` attribute. See the Apptainer documentation for what is possible.

Docker class
------------
You can define a docker container by providing an image name and tag. Using that ganga will fetch
@@ -80,8 +130,8 @@ the image from the docker hub.
j=Job()
j.virtualization = Docker(image="image:tag")
Ganga will try to run the container using Docker if Docker is available in the worker node and if the user has the
permission to run docker containers. If not ganga will download `UDocker <https://github.com/indigo-dc/udocker>`_ which provides the ability to run docker containers in userspace. The runmode in Udocker can be changed as seen in the documentation. Using Singualarity as the run mode is not recommended; use the ``Singularity`` class above instead.
Ganga will try to run the container using Docker if Docker is available on the worker node and if the user has the
permission to run docker containers. If not, Ganga will download `UDocker <https://github.com/indigo-dc/udocker>`_ which provides the ability to run docker containers in userspace. The runmode in UDocker can be changed as seen in the documentation. Using Singularity as the run mode is not recommended; use the ``Singularity`` or ``Apptainer`` class above instead.
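The fallback described above amounts to a simple decision, sketched here under stated assumptions. The function ``choose_runtime`` and its arguments are hypothetical and do not reflect how Ganga performs the check; ``shutil.which`` is only one plausible way to test whether a ``docker`` executable is on the path.

```python
import shutil


def choose_runtime(user_may_run_docker, which=shutil.which):
    """Return 'docker' when it is installed and permitted, else 'udocker'.

    Hypothetical sketch; the permission check is passed in as a boolean.
    """
    if which("docker") is not None and user_may_run_docker:
        return "docker"
    # Otherwise fall back to the userspace implementation.
    return "udocker"
```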

Issues to keep in mind
----------------------
@@ -90,5 +140,5 @@ Awareness should be given to the load that using containers will impose on the s

* If the file system is shared (like for the ``Batch`` and ``Local`` backends), the images pulled down from a remote repository will be cached locally.
* If the file system is not shared (like for the ``LCG`` and ``Dirac`` backends), then images from remote repositories will be pulled for each job. This might put an excessive load on the network and/or the repository.
* If the image for ``Singularity`` is given as a file, it will be copied to the worker node. If provided as a ``DiracFile`` object, it can be replicated to the sites where the job will be asked to run to limit the impact of pulling the image.
* If the image for ``Singularity`` or ``Apptainer`` is given as a file, it will be copied to the worker node. If provided as a ``DiracFile`` object, it can be replicated to the sites where the job will be asked to run to limit the impact of pulling the image.

2 changes: 1 addition & 1 deletion doc/UserGuide/WhatIsGanga.rst
@@ -23,4 +23,4 @@ The idea in Ganga is to take all these problems, provide a Python API for them t
* Tell Ganga where your task should be executed (Batch, Grid, ...) and submit it;
* Let Ganga monitor the progress, resubmit failed pieces and merge the results in the end.

Ganga provides a plugin system that allows groups such as HEP collaborations to expand the API with specific applications that will make it easier to run tasks on remote systems (build shared libraries, find configuration files, interact with data bookkeeping). There is also support for running tasks inside docker and singularity containers.
Ganga provides a plugin system that allows groups such as HEP collaborations to expand the API with specific applications that will make it easier to run tasks on remote systems (build shared libraries, find configuration files, interact with data bookkeeping). There is also support for running tasks inside docker, singularity and apptainer containers.
2 changes: 1 addition & 1 deletion doc/dev/credentials.rst
@@ -46,7 +46,7 @@ at which point it will ask the store for the real credential and then extract th
A method decorator called :func:`~.require_credential` is provided for use in any class which has a ``credential_requirements`` attribute.
It will access this attribute, search in the credential store for the appropriate match and raise an error if it is not found.
This allows any methods on a class (such as a backend's ``submit`` method) to label themselves as using a credential,
allowing the system to defer asking the user to create the credential until the time it is acually needed.
allowing the system to defer asking the user to create the credential until the time it is actually needed.
It is *possible* to use the credentials system without using ``require_credential``
but it provides a way of explicitly marking "this is the point at which we will ask the user to create the credential."
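
The decorator pattern described above can be sketched in plain Python. This is an illustration only, not the Ganga implementation: ``credential_store``, ``MissingCredentialError``, ``require_credential_sketch`` and ``DemoBackend`` are hypothetical names, and the store is reduced to a dictionary.

```python
import functools


class MissingCredentialError(Exception):
    """Hypothetical error type for this sketch."""


# Hypothetical stand-in for the credential store: requirement -> credential.
credential_store = {}


def require_credential_sketch(method):
    """Check ``self.credential_requirements`` only when ``method`` is called."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        requirement = self.credential_requirements
        if requirement not in credential_store:
            # Described behaviour: raise an error if no match is found.
            raise MissingCredentialError(f"no credential matching {requirement!r}")
        return method(self, *args, **kwargs)
    return wrapper


class DemoBackend:
    credential_requirements = "demo-proxy"

    @require_credential_sketch
    def submit(self):
        return "submitted"
```

Because the lookup happens inside the wrapper, constructing ``DemoBackend`` never touches the store; only calling ``submit`` does, which is the deferral the text describes.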

2 changes: 1 addition & 1 deletion doc/dev/index.rst
@@ -32,7 +32,7 @@ These proxy classes exist for a number of reasons but primarily they are there f
While a ``GangaObject`` can have as many functions and attributes as it likes,
only those attributes in the schema and those methods which are explicitly exported will be available to users of the proxy class.

When working on internal Ganga code, you shuold never have to deal with any proxy objects at all.
When working on internal Ganga code, you should never have to deal with any proxy objects at all.
Proxies should be added to objects as they are passed to the GPI and should be removed as they are passed back.
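
The filtering role of a proxy, where only schema attributes and exported methods are visible, can be sketched as follows. The class names and the ``schema_attributes`` and ``exported_methods`` sets are hypothetical and chosen for illustration; the real proxy machinery in Ganga is more involved.

```python
class SketchObject:
    """Hypothetical implementation object with public and internal parts."""
    schema_attributes = {"name"}
    exported_methods = {"submit"}

    def __init__(self):
        self.name = "myjob"
        self.internal_cache = {}  # not in the schema, so hidden from users

    def submit(self):
        return "submitted"

    def rebuild_cache(self):  # not exported, so hidden from users
        return "rebuilt"


class SketchProxy:
    """Expose only schema attributes and exported methods of the wrapped object."""

    def __init__(self, impl):
        self._impl = impl

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for anything except _impl.
        impl = self._impl
        if name in impl.schema_attributes or name in impl.exported_methods:
            return getattr(impl, name)
        raise AttributeError(f"{name!r} is not available through the proxy")
```

Internal code would work with ``SketchObject`` directly, and the wrapping would happen only at the boundary with user code.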

Attributes on proxy objects
2 changes: 1 addition & 1 deletion doc/sysadmin/index.rst
@@ -12,7 +12,7 @@ We have since migrated away from that and there are two primary ways to get acce
pip
^^^

At its simplest it is possbile to install ganga using the standard Python ``pip`` tool with a simple
At its simplest it is possible to install ganga using the standard Python ``pip`` tool with a simple

.. code-block:: bash
2 changes: 1 addition & 1 deletion doc/work/lifetime.txt
@@ -113,7 +113,7 @@ In brief the deepcopy is called for every object in the tree. At each level the
the copies are saved in the state dictionary
3.1.4 state dictionary is saved by calling __setstate__
3.1.5 deepcopy then iterates over non-copyable properties and reset their values to default
(thus disregarding some object copies which may have been done unnecesserily)
(thus disregarding some object copies which may have been done unnecessarily)

4. Additional description of low-level methods

6 changes: 3 additions & 3 deletions doc/work/splitting.txt
@@ -178,16 +178,16 @@ while the specific/sub aspect is considered the number of times which
is implied by the splitting. So it is either equal to the number of
subjobs or equal to one if no splitting.

The interfaces have been desigend in a 'backwards-compatible'
The interfaces have been designed in a 'backwards-compatible'
way. This means that the applications and backends may be used without
changes with the new base classes.

By default splitting is enabled for all "old" applications but it will
be innefficient (the application will be configured for each subjob).
be inefficient (the application will be configured for each subjob).
The bulk submission is emulated with individual job submission in a
loop.
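
The default behaviour for "old" applications can be pictured as the loop below. It is a sketch under stated assumptions: ``emulate_bulk_submission`` and the two callables ``configure`` and ``submit_one`` are hypothetical and stand in for the application configuration and the backend's single-job submission.

```python
def emulate_bulk_submission(subjobs, configure, submit_one):
    """Submit each subjob individually, configuring the application every time."""
    job_ids = []
    for subjob in subjobs:
        # Configuration is repeated for each subjob, which is the inefficiency.
        config = configure(subjob)
        job_ids.append(submit_one(subjob, config))
    return job_ids
```

An application adapted to the new base classes could instead configure the common part once and only the subjob-specific part inside the loop.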

The prototype of base classes and their intrerface will follow soon.
The prototype of base classes and their interface will follow soon.


