From 995212a02dbed11d9649283062809d24e585e24a Mon Sep 17 00:00:00 2001 From: <> Date: Tue, 7 Nov 2023 17:32:53 +0000 Subject: [PATCH] Deployed b5c5e6a47 with MkDocs version: 1.3.0 --- search/search_index.json | 2 +- sitemap.xml.gz | Bin 863 -> 863 bytes worker-node/install-wn-tarball/index.html | 18 ++++++++++++------ 3 files changed, 13 insertions(+), 7 deletions(-) diff --git a/search/search_index.json b/search/search_index.json index d73538fe5..67cc511af 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"OSG Site Documentation \u00b6 User documentation If you are a researcher interested in accessing OSG computational capacity, please consult our user documentation instead. The OSG Consortium provides common service and support for capacity providers and scientific institutions (i.e., \"sites\") using a distributed fabric of high throughput computational services. The OSG Consortium does not own computational capacity but provides software and services to users and capacity providers alike to enable the opportunistic usage and sharing of capacity. This documentation aims to provide HTC/HPC system administrators with the necessary information to contribute computational capacity to the OSG Consortium. Contributing to the OSG \u00b6 We offer two models for sites to contribute capacity to the OSG Consortium: one where OSG staff hosts and maintains capacity provisioning services for users; and the traditional model where the site hosts and maintains these same services. In both of these cases, the following will be needed: An existing compute cluster running on a supported operating system with a supported batch system: Grid Engine , HTCondor , LSF , PBS Pro / Torque , or Slurm . Outbound network connectivity from your cluster's worker nodes Temporary scratch space on each worker node Don't meet the requirements? If your site does not meet the above conditions, please contact us to discuss your options for contributing to the OSG Consortium. OSG-hosted services \u00b6 To contribute computational capacity with OSG-hosted services, your site will also need the following: Allow SSH access to your local cluster's login host from a known IP address Shared home directories on each cluster node Next steps If you are interested in OSG-hosted services, please contact us for a consultation, even if your site does not meet the conditions as outlined above! Self-hosted services \u00b6 If you are interested in contributing capacity by hosting your own OSG services, please continue with the site planning page.","title":"Home"},{"location":"#osg-site-documentation","text":"User documentation If you are a researcher interested in accessing OSG computational capacity, please consult our user documentation instead. The OSG Consortium provides common service and support for capacity providers and scientific institutions (i.e., \"sites\") using a distributed fabric of high throughput computational services. The OSG Consortium does not own computational capacity but provides software and services to users and capacity providers alike to enable the opportunistic usage and sharing of capacity. 
This documentation aims to provide HTC/HPC system administrators with the necessary information to contribute computational capacity to the OSG Consortium.","title":"OSG Site Documentation"},{"location":"#contributing-to-the-osg","text":"We offer two models for sites to contribute capacity to the OSG Consortium: one where OSG staff hosts and maintains capacity provisioning services for users; and the traditional model where the site hosts and maintains these same services. In both of these cases, the following will be needed: An existing compute cluster running on a supported operating system with a supported batch system: Grid Engine , HTCondor , LSF , PBS Pro / Torque , or Slurm . Outbound network connectivity from your cluster's worker nodes Temporary scratch space on each worker node Don't meet the requirements? If your site does not meet the above conditions, please contact us to discuss your options for contributing to the OSG Consortium.","title":"Contributing to the OSG"},{"location":"#osg-hosted-services","text":"To contribute computational capacity with OSG-hosted services, your site will also need the following: Allow SSH access to your local cluster's login host from a known IP address Shared home directories on each cluster node Next steps If you are interested in OSG-hosted services, please contact us for a consultation, even if your site does not meet the conditions as outlined above!","title":"OSG-hosted services"},{"location":"#self-hosted-services","text":"If you are interested in contributing capacity by hosting your own OSG services, please continue with the site planning page.","title":"Self-hosted services"},{"location":"site-maintenance/","text":"Site Maintenance \u00b6 This document outlines how to maintain your OSG site, including steps to take if you suspect that OSG jobs are causing issues. Handle Misbehaving Jobs \u00b6 In rare instances, you may experience issues at your site caused by misbehaving jobs (e.g., over-utilization of memory) from an OSG community or Virtual Organization (VO). If this occurs, you should immediately stop accepting job submissions from the OSG and remove the offending jobs: Configure your batch system to stop accepting jobs from the VO: For HTCondor batch systems, set the following in /etc/condor/config.d/ on your HTCondor-CE or Access Point accepting jobs from an OSG Hosted CE: SUBMIT_REQUIREMENT_Ban_OSG = (Owner != \"\") SUBMIT_REQUIREMENT_Ban_OSG_REASON = \"OSG pilot job submission temporarily disabled\" SUBMIT_REQUIREMENT_NAMES = $(SUBMIT_REQUIREMENT_NAMES) Ban_OSG Replacing with the name of the local Unix account corresponding to the problematic VO. For Slurm batch systems, disable the relevant Slurm partition : [root@host] # scontrol update PartitionName = State = DOWN Replacing with the name of the partition where you are sending OSG jobs. Remove the VO's jobs: For HTCondor batch systems, run the following command on your HTCondor-CE or Access Point accepting jobs from an OSG Hosted CE: [root@access-point] # condor_rm Replacing with the name of the local Unix account corresponding to the problematic VO. For Slurm batch systems, run the following command: [root@host] # scancel -u Replacing with the name of the local Unix account corresponding to the problematic VO. Let us know so that we can track down the offending software or user: the same issue that you're experiencing may also be affecting other sites! 
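For illustration only (the account and partition names here are hypothetical placeholders, not OSG defaults): assuming the OSG pilots run under a local Unix account named osgvo01 and land in a Slurm partition named osg , the full sequence on a Slurm site might look like [root@host] # scontrol update PartitionName = osg State = DOWN followed by [root@host] # scancel -u osgvo01 , and once the underlying problem has been resolved the partition can be reopened with [root@host] # scontrol update PartitionName = osg State = UP . 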
Keep OSG Software Updated \u00b6 It is important to keep your software and data (e.g., CAs and VO client) up-to-date with the latest OSG release. See the release notes for your installed release series: OSG 3.6 release notes To stay abreast of software releases, we recommend subscribing to the osg-sites@opensciencegrid.org mailing list. Notify OSG of Major Changes \u00b6 To avoid potential issues with OSG job submissions, please notify us of major changes to your site, including: Major OS version changes on the worker nodes (e.g., upgraded from EL 7 to EL 8) Adding or removing container support through singularity or apptainer Policy changes regarding OSG resource requests (e.g., number of cores or GPUs, memory usage, or maximum walltime) Scheduled or unscheduled downtimes Site topology changes such as additions, modifications, or retirements of OSG services Changes to site contacts, such as administrative or security staff Help \u00b6 If you need help with your site, or need to report a security incident, follow the contact instructions .","title":"Site Maintenance"},{"location":"site-maintenance/#site-maintenance","text":"This document outlines how to maintain your OSG site, including steps to take if you suspect that OSG jobs are causing issues.","title":"Site Maintenance"},{"location":"site-maintenance/#handle-misbehaving-jobs","text":"In rare instances, you may experience issues at your site caused by misbehaving jobs (e.g., over-utilization of memory) from an OSG community or Virtual Organization (VO). If this occurs, you should immediately stop accepting job submissions from the OSG and remove the offending jobs: Configure your batch system to stop accepting jobs from the VO: For HTCondor batch systems, set the following in /etc/condor/config.d/ on your HTCondor-CE or Access Point accepting jobs from an OSG Hosted CE: SUBMIT_REQUIREMENT_Ban_OSG = (Owner != \"\") SUBMIT_REQUIREMENT_Ban_OSG_REASON = \"OSG pilot job submission temporarily disabled\" SUBMIT_REQUIREMENT_NAMES = $(SUBMIT_REQUIREMENT_NAMES) Ban_OSG Replacing with the name of the local Unix account corresponding to the problematic VO. For Slurm batch systems, disable the relevant Slurm partition : [root@host] # scontrol update PartitionName = State = DOWN Replacing with the name of the partition where you are sending OSG jobs. Remove the VO's jobs: For HTCondor batch systems, run the following command on your HTCondor-CE or Access Point accepting jobs from an OSG Hosted CE: [root@access-point] # condor_rm Replacing with the name of the local Unix account corresponding to the problematic VO. For Slurm batch systems, run the following command: [root@host] # scancel -u Replacing with the name of the local Unix account corresponding to the problematic VO. Let us know so that we can track down the offending software or user: the same issue that you're experiencing may also be affecting other sites!","title":"Handle Misbehaving Jobs"},{"location":"site-maintenance/#keep-osg-software-updated","text":"It is important to keep your software and data (e.g., CAs and VO client) up-to-date with the latest OSG release. 
See the release notes for your installed release series: OSG 3.6 release notes To stay abreast of software releases, we recommend subscribing to the osg-sites@opensciencegrid.org mailing list.","title":"Keep OSG Software Updated"},{"location":"site-maintenance/#notify-osg-of-major-changes","text":"To avoid potential issues with OSG job submissions, please notify us of major changes to your site, including: Major OS version changes on the worker nodes (e.g., upgraded from EL 7 to EL 8) Adding or removing container support through singularity or apptainer Policy changes regarding OSG resource requests (e.g., number of cores or GPUs, memory usage, or maximum walltime) Scheduled or unscheduled downtimes Site topology changes such as additions, modifications, or retirements of OSG services Changes to site contacts, such as administrative or security staff","title":"Notify OSG of Major Changes"},{"location":"site-maintenance/#help","text":"If you need help with your site, or need to report a security incident, follow the contact instructions .","title":"Help"},{"location":"site-planning/","text":"Site Planning \u00b6 The OSG vision is to integrate computing across different resource types and business models to allow campus IT to offer a maximally flexible high throughput computing (HTC) environment for their researchers. This document is for System Administrators and aims to provide an overview of the different options to consider when planning to share resources via the OSG. After reading, you should be able to understand what software or services you want to provide to support your researchers Note This document covers the most common options. OSG is a diverse infrastructure: depending on what groups you want to support, you may need to install additional services. Coordinate with your local researchers. OSG Site Services \u00b6 The OSG Software stack tries to provide a uniform computing and storage fabric across many independently-managed computing and storage resources. These individual services will be accessed by virtual organizations (VOs), which will delegate the resources to scientists, researchers, and students. Sharing is a fundamental principle for the OSG: your site is encouraged to support as many OSG-registered VOs as local conditions allow. Autonomy is another principle: you are not required to support any VOs you do not want. As the administrator, your task is to make your existing computing and storage resources available to and reliable for your supported VOs. We break this down into three tasks: Getting \"pilot jobs\" submitted to your site batch system. Establishing an OSG runtime environment for running jobs. Delivering data to payload applications to be processed. There are multiple approaches for each item, depending on the VOs you support, and time you have to invest in the OSG. Note An essential concept in the OSG is the \"pilot job\". The pilot, which arrives at your batch system, is sent by the VO to get a resource allocation. However, it does not contain any research payload. Once started, it will connect back to a resource pool and pull down individuals' research \"payload jobs\". Hence, we do not think about submitting \"jobs\" to sites but rather \"resource requests\". Pilot Jobs \u00b6 Traditionally, an OSG Compute Entrypoint (CE) provides remote access for VOs to submit pilot jobs to your local batch system . 
There are two options for accepting pilot jobs at your site: Hosted CE : OSG will run and operate the CE services at no cost; the site only needs to provide a SSH pubkey-based authentication access to the central OSG host. OSG will interface with the VO and submit pilots directly to your batch system via SSH. By far, this is the simplest option : however, it is less-scalable and the site delegates many of the scheduling decisions to the OSG. Contact help@osg-htc.org for more information on the hosted CE. OSG CE : The traditional option where the site installs and operates a HTCondor-based CE on a dedicated host. This provides the best scalability and flexibility, but may require an ongoing time investment from the site. The OSG CE install and operation is covered in this documentation page . There are additional ways that pilots can be started at a site (either by the site administrator or an end-user); see resource sharing for more details. Runtime environment \u00b6 The OSG requires a very minimal runtime environment that can be deployed via tarball , RPM , or through a global filesystem on your cluster's worker nodes. We believe that all research applications should be portable and self-contained, with no OS dependencies. This provides access to the most resources and minimizes the presence at sites. However, this ideal is often difficult to achieve in practice. For sites that want to support a uniform runtime environment, we provide a global filesystem called CVMFS that VOs can use to distribute their own software dependencies. Finally, many researchers use applications that require a specific OS environment - not just individual dependencies - that is distributed as a container. OSG supports the use of the Singularity container runtime with Docker-based image distribution. Data Services \u00b6 Whether accessed through CVMFS or command-line software like curl , the majority of software is moved via HTTP in cache-friendly patterns. All sites are highly encouraged to use an HTTP proxy to reduce the load on the WAN from the cluster. Depending on the VOs you want to support, additional data services may be necessary: Some VOs elect to stream their larger input data from offsite using OSG's Data Federation . User jobs can make use of the OSG Data Federation without any services at your site but you may wish to run one or more of the following services: Data Cache to further reduce load on your connection to the WAN. Data Origin to allow local users to stage their data into the OSG Data Federation. The largest sites will additionally run large-scale data services such as a \"storage element\". This is often required for sites that want to support more complex organizations such as ATLAS or CMS. Site Policies \u00b6 Sites are encouraged to clearly specify and communicate their local policies regarding resource access. One common mechanism to do this is post them on a web page and make this page part of your site registration . Written policies help external entities understand what your site wants to accomplish with the OSG -- and are often internally clarifying. In line of our principle of sharing , we encourage you to allow virtual organizations registered with the OSG \"opportunistic use\" of your resources. You may need to preempt those jobs when higher priority jobs come around. The end-users using the OSG generally prefer having access to your site subject to preemption over having no access at all. 
Getting Help \u00b6 If you need help with planning your site, follow the contact instructions .","title":"Site Planning"},{"location":"site-planning/#site-planning","text":"The OSG vision is to integrate computing across different resource types and business models to allow campus IT to offer a maximally flexible high throughput computing (HTC) environment for their researchers. This document is for System Administrators and aims to provide an overview of the different options to consider when planning to share resources via the OSG. After reading, you should be able to understand what software or services you want to provide to support your researchers Note This document covers the most common options. OSG is a diverse infrastructure: depending on what groups you want to support, you may need to install additional services. Coordinate with your local researchers.","title":"Site Planning"},{"location":"site-planning/#osg-site-services","text":"The OSG Software stack tries to provide a uniform computing and storage fabric across many independently-managed computing and storage resources. These individual services will be accessed by virtual organizations (VOs), which will delegate the resources to scientists, researchers, and students. Sharing is a fundamental principle for the OSG: your site is encouraged to support as many OSG-registered VOs as local conditions allow. Autonomy is another principle: you are not required to support any VOs you do not want. As the administrator, your task is to make your existing computing and storage resources available to and reliable for your supported VOs. We break this down into three tasks: Getting \"pilot jobs\" submitted to your site batch system. Establishing an OSG runtime environment for running jobs. Delivering data to payload applications to be processed. There are multiple approaches for each item, depending on the VOs you support, and time you have to invest in the OSG. Note An essential concept in the OSG is the \"pilot job\". The pilot, which arrives at your batch system, is sent by the VO to get a resource allocation. However, it does not contain any research payload. Once started, it will connect back to a resource pool and pull down individuals' research \"payload jobs\". Hence, we do not think about submitting \"jobs\" to sites but rather \"resource requests\".","title":"OSG Site Services"},{"location":"site-planning/#pilot-jobs","text":"Traditionally, an OSG Compute Entrypoint (CE) provides remote access for VOs to submit pilot jobs to your local batch system . There are two options for accepting pilot jobs at your site: Hosted CE : OSG will run and operate the CE services at no cost; the site only needs to provide a SSH pubkey-based authentication access to the central OSG host. OSG will interface with the VO and submit pilots directly to your batch system via SSH. By far, this is the simplest option : however, it is less-scalable and the site delegates many of the scheduling decisions to the OSG. Contact help@osg-htc.org for more information on the hosted CE. OSG CE : The traditional option where the site installs and operates a HTCondor-based CE on a dedicated host. This provides the best scalability and flexibility, but may require an ongoing time investment from the site. The OSG CE install and operation is covered in this documentation page . 
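For reference, the self-hosted route ultimately amounts to installing the CE packages from the OSG yum repositories on that dedicated host (for example, a plain root@host # yum install osg-ce , as mentioned elsewhere in this documentation) and then configuring the CE for your local batch system; the linked installation page remains the authoritative set of steps. 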
There are additional ways that pilots can be started at a site (either by the site administrator or an end-user); see resource sharing for more details.","title":"Pilot Jobs"},{"location":"site-planning/#runtime-environment","text":"The OSG requires a very minimal runtime environment that can be deployed via tarball , RPM , or through a global filesystem on your cluster's worker nodes. We believe that all research applications should be portable and self-contained, with no OS dependencies. This provides access to the most resources and minimizes the presence at sites. However, this ideal is often difficult to achieve in practice. For sites that want to support a uniform runtime environment, we provide a global filesystem called CVMFS that VOs can use to distribute their own software dependencies. Finally, many researchers use applications that require a specific OS environment - not just individual dependencies - that is distributed as a container. OSG supports the use of the Singularity container runtime with Docker-based image distribution.","title":"Runtime environment"},{"location":"site-planning/#data-services","text":"Whether accessed through CVMFS or command-line software like curl , the majority of software is moved via HTTP in cache-friendly patterns. All sites are highly encouraged to use an HTTP proxy to reduce the load on the WAN from the cluster. Depending on the VOs you want to support, additional data services may be necessary: Some VOs elect to stream their larger input data from offsite using OSG's Data Federation . User jobs can make use of the OSG Data Federation without any services at your site but you may wish to run one or more of the following services: Data Cache to further reduce load on your connection to the WAN. Data Origin to allow local users to stage their data into the OSG Data Federation. The largest sites will additionally run large-scale data services such as a \"storage element\". This is often required for sites that want to support more complex organizations such as ATLAS or CMS.","title":"Data Services"},{"location":"site-planning/#site-policies","text":"Sites are encouraged to clearly specify and communicate their local policies regarding resource access. One common mechanism to do this is post them on a web page and make this page part of your site registration . Written policies help external entities understand what your site wants to accomplish with the OSG -- and are often internally clarifying. In line of our principle of sharing , we encourage you to allow virtual organizations registered with the OSG \"opportunistic use\" of your resources. You may need to preempt those jobs when higher priority jobs come around. The end-users using the OSG generally prefer having access to your site subject to preemption over having no access at all.","title":"Site Policies"},{"location":"site-planning/#getting-help","text":"If you need help with planning your site, follow the contact instructions .","title":"Getting Help"},{"location":"site-verification/","text":"Site Verification \u00b6 After installing and registering services from the site planning document , you will need to perform some verification steps before your site can scale up to full production . 
Verify OSG Software \u00b6 To verify your site's installation of OSG Software, you will need to: Submit local test jobs Contact the OSG for end-to-end tests of pilot job submission Check that OSG usage is reported to the GRACC Local verification \u00b6 It is useful to submit jobs from within your site to verify CE's ability to submit jobs to your local batch system. Consult the document for submitting jobs into an HTCondor-CE for detailed instructions on how to test job submission. Verify end-to-end pilot job submission \u00b6 Once you have validated job submission from within your site, request test pilot jobs from OSG Factory Operations and provide the following information: The fully qualified domain name of the CE Registered OSG resource name Supported OS version of your worker nodes (e.g., EL7, EL8, or a combination) Support for multicore jobs Support for GPUs Maximum job walltime Maximum job memory usage Once the Factory Operations team has enough information, they will start submitting pilots to your CE. Initially, this will be a handful of pilots at a time but once the factory verifies that pilot jobs are running successfully, that number will be ramped up. Verify reporting and monitoring \u00b6 To verify that your site is correctly reporting to the OSG, visit OSG's Accounting Portal and select your registered OSG site name from the Site dropdown. If you don't see your site in the dropdown, please contact us for assistance . Scale Up to Full Production \u00b6 After verifying end-to-end pilot job submission and usage reporting, your site is ready for production! In the same OSG Factory Operations ticket that you opened above , let OSG staff know when you are ready to accept production pilots. After requesting production pilots, review the documentation for how to maintain an OSG site . Getting Help \u00b6 If you need help with your site, or need to report a security incident, follow the contact instructions .","title":"Site Verification"},{"location":"site-verification/#site-verification","text":"After installing and registering services from the site planning document , you will need to perform some verification steps before your site can scale up to full production .","title":"Site Verification"},{"location":"site-verification/#verify-osg-software","text":"To verify your site's installation of OSG Software, you will need to: Submit local test jobs Contact the OSG for end-to-end tests of pilot job submission Check that OSG usage is reported to the GRACC","title":"Verify OSG Software"},{"location":"site-verification/#local-verification","text":"It is useful to submit jobs from within your site to verify CE's ability to submit jobs to your local batch system. Consult the document for submitting jobs into an HTCondor-CE for detailed instructions on how to test job submission.","title":"Local verification"},{"location":"site-verification/#verify-end-to-end-pilot-job-submission","text":"Once you have validated job submission from within your site, request test pilot jobs from OSG Factory Operations and provide the following information: The fully qualified domain name of the CE Registered OSG resource name Supported OS version of your worker nodes (e.g., EL7, EL8, or a combination) Support for multicore jobs Support for GPUs Maximum job walltime Maximum job memory usage Once the Factory Operations team has enough information, they will start submitting pilots to your CE. 
Initially, this will be a handful of pilots at a time but once the factory verifies that pilot jobs are running successfully, that number will be ramped up.","title":"Verify end-to-end pilot job submission"},{"location":"site-verification/#verify-reporting-and-monitoring","text":"To verify that your site is correctly reporting to the OSG, visit OSG's Accounting Portal and select your registered OSG site name from the Site dropdown. If you don't see your site in the dropdown, please contact us for assistance .","title":"Verify reporting and monitoring"},{"location":"site-verification/#scale-up-to-full-production","text":"After verifying end-to-end pilot job submission and usage reporting, your site is ready for production! In the same OSG Factory Operations ticket that you opened above , let OSG staff know when you are ready to accept production pilots. After requesting production pilots, review the documentation for how to maintain an OSG site .","title":"Scale Up to Full Production"},{"location":"site-verification/#getting-help","text":"If you need help with your site, or need to report a security incident, follow the contact instructions .","title":"Getting Help"},{"location":"common/ca/","text":"Installing Certificate Authorities (CAs) \u00b6 The certificate authorities (CAs) provide the trust roots for the public key infrastructure OSG uses to maintain integrity of its sites and services. This document provides details of various options to install the Certificate Authority (CA) certificates and have up-to-date certificate revocation lists (CRLs) on your OSG hosts. We provide three options for installing CA certificates that offer varying levels of control: Install an RPM for a specific set of CA certificates ( default ) Install osg-ca-scripts , a set of scripts that provide fine-grained CA management Install an RPM that doesn't install any CAs. This is useful if you'd like to manage CAs yourself while satisfying RPM dependencies. Prior to following the instructions on this page, you must enable our yum repositories Installing CA Certificates \u00b6 Please choose one of the three options to install CA certificates. Option 1: Install an RPM for a specific set of CA certificates \u00b6 Note This option is the default if you install OSG software without pre-installing CAs. For example, yum install osg-ce will bring in osg-ca-certs by default. In the OSG repositories, you will find two different sets of predefined CA certificates: ( default ) The OSG CA certificates. This is similar to the IGTF set but may have a small number of additions or deletions The IGTF CA certificates See this page for details of the contents of the OSG CA package. If you chose... Then run the following command... OSG CA certificates yum install osg-ca-certs IGTF CA certificates yum install igtf-ca-certs To automatically keep your RPM installation of CAs up to date, we recommend the OSG CA certificates updater service. Option 2: Install osg-ca-scripts \u00b6 The osg-ca-scripts package provides scripts to install and update predefined sets of CAs with the ability to add or remove specific CAs. The OSG CA certificates. This is similar to the IGTF set but may have a small number of additions or deletions The IGTF CA certificates See this page for details of the contents of the OSG CA package. Install the osg-ca-scripts package: root@host # yum install osg-ca-scripts Choose and install the CA certificate set: If you choose... Then run the following command... 
OSG CA certificates osg-ca-manage setupCA --location root --url osg IGTF CA certificates osg-ca-manage setupCA --location root --url igtf Enable the osg-update-certs-cron service to enable periodic CA updates. As a reminder, here are common service commands (all run as root ): To... Run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable (Optional) To add a new CA: osg-ca-manage add [--dir ] --hash (Optional) To remove a CA osg-ca-manage remove --hash A complete set of options available though osg-ca-manage command, can be found in the osg-ca-manage documentation Option 3: Site-managed CAs \u00b6 If you want to handle the list of CAs completely internally to your site, you can utilize the empty-ca-certs RPM to satisfy RPM dependencies while not actually installing any CAs. To install this RPM, run the following command: root@host # yum install empty-ca-certs \u2013-enablerepo = osg-empty Warning If you choose this option, you are responsible for installing and maintaining the CA certificates. They must be installed in /etc/grid-security/certificates , or a symlink must be made from that location to the directory that contains the CA certificates. Installing other CAs \u00b6 In addition to the above CAs, you can install other CAs via RPM. These only work with the RPMs that provide CAs (that is, osg-ca-certs and the like, but not osg-ca-scripts .) They are in addition to the above RPMs, so do not only install these extra CAs. Set of CAs RPM name Installation command (as root) cilogon-openid cilogon-openid-ca-cert yum install cilogon-openid-ca-cert Verifying CA Certificates \u00b6 After installing or updating the CA certificates, they can be verified with the following command: root@host # curl --cacert \\ --capath \\ -o /dev/null \\ https://gracc.opensciencegrid.org \\ && echo \"CA certificate installation verified\" Where is the path to a valid X.509 CA certificate and is the path to the directory containing the installed CA certificates. For example, the following command can be used to verify a default OSG CA certificate installation: root@host # curl --cacert /etc/grid-security/certificates/cilogon-osg.pem \\ --capath /etc/grid-security/certificates/ \\ -o /dev/null \\ https://gracc.opensciencegrid.org \\ && echo \"CA certificate installation verified\" % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 22005 0 22005 0 0 86633 0 --:--:-- --:--:-- --:--:-- 499k CA certificate installation verified If you do not see CA certificate installation verified this means that your CA certificate installation is broken. First, ensure that your CA installation is up-to-date and if you continue to see issues please contact us . Keeping CA Certificates Up-to-date \u00b6 It is important to keep CA certificates up-to-date for services and their clients to maintain integrity of production services. 
To verify that your CA certificates are on the latest version on a given host, determine the most recently released versions and the method by which your CA certificates have been installed: Retrieve the versions of the most recently released IGTF CA certificates and OSG CA certificates Determine which of the three CA certificate installation methods you are using: # rpm -q igtf-ca-certs osg-ca-certs osg-ca-scripts empty-ca-certs Based on which package is installed from the output in the previous step, choose one of the following options: If igtf-ca-certs or osg-ca-certs is installed , compare the installed version from step 2 to the corresponding version from step 1. If the version is older than the corresponding version from step 1, continue onto option 1 to upgrade your current installation and keep your installation up-to-date. If the versions match, your CA certificates are up-to-date! If osg-ca-scripts is installed , run the following command to update your CA certificates: # osg-ca-manage refreshCA And continue to the instructions in option 2 to enable automatic updates of your CA certificates. If empty-ca-scripts is installed , then you are responsible for maintaining your own CA certificates as outlined in option 3 . If none of the packages are installed , your host likely does not need CA certificates and you are done. Managing Certificate Revocation Lists \u00b6 In addition to CA certificates, you must have updated Certificate Revocation Lists (CRLs). CRLs contain certificate blacklists that OSG software uses to ensure that your hosts are only talking to valid clients or servers. To maintain up to date CAs, you will need to run the fetch-crl services. Note Normally fetch-crl is installed when you install the rest of the software and you do not need to explicitly install it. If you do wish to install it manually, run the following command: root@host # yum install fetch-crl If you do not wish to change the frequency of fetch-crl updates (default: every 6 hours) or use syslog for fetch-crl output, skip to the service management section Optional: configuring fetch-crl \u00b6 The following sub-sections contain optional configuration instructions. Note Note that the nosymlinks option in the configuration files refers to ignoring links within the certificates directory (e.g. two different names for the same file). It is perfectly fine if the path of the CA certificates directory itself ( infodir ) is a link to a directory. Changing the frequency of fetch-crl-cron \u00b6 To modify the times that fetch-crl-cron runs, edit /etc/cron.d/fetch-crl . Logging with syslog \u00b6 fetch-crl can produce quite a bit of output when run in verbose mode. To send fetch-crl output to syslog, use the following instructions: Change the configuration file to enable syslog: logmode = syslog syslogfacility = daemon Make sure the file /var/log/daemon exists, e.g. touching the file Change /etc/logrotate.d files to rotate it Managing fetch-crl services \u00b6 fetch-crl is installed as two different system services. The fetch-crl-boot service runs fetch-crl and is intended to only be enabled or disabled. The fetch-crl-cron service runs fetch-crl every 6 hours (with a random sleep time included). Both services are disabled by default. At the very minimum, the fetch-crl-cron service needs to be enabled and started, otherwise services will begin to fail as existing CRLs expire. 
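As a minimal sketch (using the service names from the table that follows), enabling periodic CRL updates on an EL7 host typically amounts to root@host # systemctl enable fetch-crl-cron and root@host # systemctl start fetch-crl-cron , while on an EL8 host the equivalent would be root@host # systemctl enable --now fetch-crl.timer . 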
Software Service name Notes Fetch CRL fetch-crl.timer (EL8-only) Runs fetch-crl every 6 hours and on boot fetch-crl-cron (EL7-only) Runs fetch-crl every 6 hours fetch-crl-boot (EL7-only) Runs fetch-crl immediately and on boot Start the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root ): To... Run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable Getting Help \u00b6 To get assistance, please use the this page . References \u00b6 Some guides on X.509 certificates: Useful commands: http://security.ncsa.illinois.edu/research/grid-howtos/usefulopenssl.html Install GSI authentication on a server: http://security.ncsa.illinois.edu/research/wssec/gsihttps/ Certificates how-to: http://www.nordugrid.org/documents/certificate_howto.html See this page for examples of verifying certificates. Related software: osg-ca-manage osg-ca-certs-updater Configuration files \u00b6 Package File Description Location Comment All CA Packages CA File Location /etc/grid-security/certificates All CA Packages Index files /etc/grid-security/certificates/INDEX.html or /etc/grid-security/certificates/INDEX.txt Latest version also available at http://repo.opensciencegrid.org/cadist/ All CA Packages Change Log /etc/grid-security/certificates/CHANGES Latest version also available at http://repo.opensciencegrid.org/cadist/CHANGES osg-ca-certs or igtf-ca-certs contain only CA files osg-ca-scripts Configuration File for osg-update-certs /etc/osg/osg-update-certs.conf This file may be edited by hand, though it is recommended to use osg-ca-manage to set configuration parameters. fetch-crl-3.x Configuration file /etc/fetch-crl.conf The index and change log files contain a summary of all the CA distributed and their version. Logs files \u00b6 Package File Description Location osg-ca-scripts Log file of osg-update-certs /var/log/osg-update-certs.log osg-ca-scripts Stdout of osg-update-certs /var/log/osg-ca-certs-status.system.out osg-ca-scripts Stdout of osg-ca-manage /var/log/osg-ca-manage.system.out osg-ca-scripts Stdout of initial CA setup /var/log/osg-setup-ca-certificates.system.out","title":"Overview"},{"location":"common/ca/#installing-certificate-authorities-cas","text":"The certificate authorities (CAs) provide the trust roots for the public key infrastructure OSG uses to maintain integrity of its sites and services. This document provides details of various options to install the Certificate Authority (CA) certificates and have up-to-date certificate revocation lists (CRLs) on your OSG hosts. We provide three options for installing CA certificates that offer varying levels of control: Install an RPM for a specific set of CA certificates ( default ) Install osg-ca-scripts , a set of scripts that provide fine-grained CA management Install an RPM that doesn't install any CAs. This is useful if you'd like to manage CAs yourself while satisfying RPM dependencies. 
Prior to following the instructions on this page, you must enable our yum repositories","title":"Installing Certificate Authorities (CAs)"},{"location":"common/ca/#installing-ca-certificates","text":"Please choose one of the three options to install CA certificates.","title":"Installing CA Certificates"},{"location":"common/ca/#option-1-install-an-rpm-for-a-specific-set-of-ca-certificates","text":"Note This option is the default if you install OSG software without pre-installing CAs. For example, yum install osg-ce will bring in osg-ca-certs by default. In the OSG repositories, you will find two different sets of predefined CA certificates: ( default ) The OSG CA certificates. This is similar to the IGTF set but may have a small number of additions or deletions The IGTF CA certificates See this page for details of the contents of the OSG CA package. If you chose... Then run the following command... OSG CA certificates yum install osg-ca-certs IGTF CA certificates yum install igtf-ca-certs To automatically keep your RPM installation of CAs up to date, we recommend the OSG CA certificates updater service.","title":"Option 1: Install an RPM for a specific set of CA certificates"},{"location":"common/ca/#option-2-install-osg-ca-scripts","text":"The osg-ca-scripts package provides scripts to install and update predefined sets of CAs with the ability to add or remove specific CAs. The OSG CA certificates. This is similar to the IGTF set but may have a small number of additions or deletions The IGTF CA certificates See this page for details of the contents of the OSG CA package. Install the osg-ca-scripts package: root@host # yum install osg-ca-scripts Choose and install the CA certificate set: If you choose... Then run the following command... OSG CA certificates osg-ca-manage setupCA --location root --url osg IGTF CA certificates osg-ca-manage setupCA --location root --url igtf Enable the osg-update-certs-cron service to enable periodic CA updates. As a reminder, here are common service commands (all run as root ): To... Run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable (Optional) To add a new CA: osg-ca-manage add [--dir ] --hash (Optional) To remove a CA osg-ca-manage remove --hash A complete set of options available through the osg-ca-manage command can be found in the osg-ca-manage documentation","title":"Option 2: Install osg-ca-scripts"},{"location":"common/ca/#option-3-site-managed-cas","text":"If you want to handle the list of CAs completely internally to your site, you can utilize the empty-ca-certs RPM to satisfy RPM dependencies while not actually installing any CAs. To install this RPM, run the following command: root@host # yum install empty-ca-certs --enablerepo = osg-empty Warning If you choose this option, you are responsible for installing and maintaining the CA certificates. They must be installed in /etc/grid-security/certificates , or a symlink must be made from that location to the directory that contains the CA certificates.","title":"Option 3: Site-managed CAs"},{"location":"common/ca/#installing-other-cas","text":"In addition to the above CAs, you can install other CAs via RPM. These only work with the RPMs that provide CAs (that is, osg-ca-certs and the like, but not osg-ca-scripts .) They are in addition to the above RPMs, so do not only install these extra CAs. 
Set of CAs RPM name Installation command (as root) cilogon-openid cilogon-openid-ca-cert yum install cilogon-openid-ca-cert","title":"Installing other CAs"},{"location":"common/ca/#verifying-ca-certificates","text":"After installing or updating the CA certificates, they can be verified with the following command: root@host # curl --cacert \\ --capath \\ -o /dev/null \\ https://gracc.opensciencegrid.org \\ && echo \"CA certificate installation verified\" Where is the path to a valid X.509 CA certificate and is the path to the directory containing the installed CA certificates. For example, the following command can be used to verify a default OSG CA certificate installation: root@host # curl --cacert /etc/grid-security/certificates/cilogon-osg.pem \\ --capath /etc/grid-security/certificates/ \\ -o /dev/null \\ https://gracc.opensciencegrid.org \\ && echo \"CA certificate installation verified\" % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 22005 0 22005 0 0 86633 0 --:--:-- --:--:-- --:--:-- 499k CA certificate installation verified If you do not see CA certificate installation verified this means that your CA certificate installation is broken. First, ensure that your CA installation is up-to-date and if you continue to see issues please contact us .","title":"Verifying CA Certificates"},{"location":"common/ca/#keeping-ca-certificates-up-to-date","text":"It is important to keep CA certificates up-to-date for services and their clients to maintain integrity of production services. To verify that your CA certificates are on the latest version on a given host, determine the most recently released versions and the method by which your CA certificates have been installed: Retrieve the versions of the most recently released IGTF CA certificates and OSG CA certificates Determine which of the three CA certificate installation methods you are using: # rpm -q igtf-ca-certs osg-ca-certs osg-ca-scripts empty-ca-certs Based on which package is installed from the output in the previous step, choose one of the following options: If igtf-ca-certs or osg-ca-certs is installed , compare the installed version from step 2 to the corresponding version from step 1. If the version is older than the corresponding version from step 1, continue onto option 1 to upgrade your current installation and keep your installation up-to-date. If the versions match, your CA certificates are up-to-date! If osg-ca-scripts is installed , run the following command to update your CA certificates: # osg-ca-manage refreshCA And continue to the instructions in option 2 to enable automatic updates of your CA certificates. If empty-ca-scripts is installed , then you are responsible for maintaining your own CA certificates as outlined in option 3 . If none of the packages are installed , your host likely does not need CA certificates and you are done.","title":"Keeping CA Certificates Up-to-date"},{"location":"common/ca/#managing-certificate-revocation-lists","text":"In addition to CA certificates, you must have updated Certificate Revocation Lists (CRLs). CRLs contain certificate blacklists that OSG software uses to ensure that your hosts are only talking to valid clients or servers. To maintain up to date CAs, you will need to run the fetch-crl services. Note Normally fetch-crl is installed when you install the rest of the software and you do not need to explicitly install it. 
If you do wish to install it manually, run the following command: root@host # yum install fetch-crl If you do not wish to change the frequency of fetch-crl updates (default: every 6 hours) or use syslog for fetch-crl output, skip to the service management section","title":"Managing Certificate Revocation Lists"},{"location":"common/ca/#optional-configuring-fetch-crl","text":"The following sub-sections contain optional configuration instructions. Note Note that the nosymlinks option in the configuration files refers to ignoring links within the certificates directory (e.g. two different names for the same file). It is perfectly fine if the path of the CA certificates directory itself ( infodir ) is a link to a directory.","title":"Optional: configuring fetch-crl"},{"location":"common/ca/#changing-the-frequency-of-fetch-crl-cron","text":"To modify the times that fetch-crl-cron runs, edit /etc/cron.d/fetch-crl .","title":"Changing the frequency of fetch-crl-cron"},{"location":"common/ca/#logging-with-syslog","text":"fetch-crl can produce quite a bit of output when run in verbose mode. To send fetch-crl output to syslog, use the following instructions: Change the configuration file to enable syslog: logmode = syslog syslogfacility = daemon Make sure the file /var/log/daemon exists, e.g. touching the file Change /etc/logrotate.d files to rotate it","title":"Logging with syslog"},{"location":"common/ca/#managing-fetch-crl-services","text":"fetch-crl is installed as two different system services. The fetch-crl-boot service runs fetch-crl and is intended to only be enabled or disabled. The fetch-crl-cron service runs fetch-crl every 6 hours (with a random sleep time included). Both services are disabled by default. At the very minimum, the fetch-crl-cron service needs to be enabled and started, otherwise services will begin to fail as existing CRLs expire. Software Service name Notes Fetch CRL fetch-crl.timer (EL8-only) Runs fetch-crl every 6 hours and on boot fetch-crl-cron (EL7-only) Runs fetch-crl every 6 hours fetch-crl-boot (EL7-only) Runs fetch-crl immediately and on boot Start the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root ): To... Run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable ","title":"Managing fetch-crl services"},{"location":"common/ca/#getting-help","text":"To get assistance, please use the this page .","title":"Getting Help"},{"location":"common/ca/#references","text":"Some guides on X.509 certificates: Useful commands: http://security.ncsa.illinois.edu/research/grid-howtos/usefulopenssl.html Install GSI authentication on a server: http://security.ncsa.illinois.edu/research/wssec/gsihttps/ Certificates how-to: http://www.nordugrid.org/documents/certificate_howto.html See this page for examples of verifying certificates. 
Related software: osg-ca-manage osg-ca-certs-updater","title":"References"},{"location":"common/ca/#configuration-files","text":"Package File Description Location Comment All CA Packages CA File Location /etc/grid-security/certificates All CA Packages Index files /etc/grid-security/certificates/INDEX.html or /etc/grid-security/certificates/INDEX.txt Latest version also available at http://repo.opensciencegrid.org/cadist/ All CA Packages Change Log /etc/grid-security/certificates/CHANGES Latest version also available at http://repo.opensciencegrid.org/cadist/CHANGES osg-ca-certs or igtf-ca-certs contain only CA files osg-ca-scripts Configuration File for osg-update-certs /etc/osg/osg-update-certs.conf This file may be edited by hand, though it is recommended to use osg-ca-manage to set configuration parameters. fetch-crl-3.x Configuration file /etc/fetch-crl.conf The index and change log files contain a summary of all the CA distributed and their version.","title":"Configuration files"},{"location":"common/ca/#logs-files","text":"Package File Description Location osg-ca-scripts Log file of osg-update-certs /var/log/osg-update-certs.log osg-ca-scripts Stdout of osg-update-certs /var/log/osg-ca-certs-status.system.out osg-ca-scripts Stdout of osg-ca-manage /var/log/osg-ca-manage.system.out osg-ca-scripts Stdout of initial CA setup /var/log/osg-setup-ca-certificates.system.out","title":"Logs files"},{"location":"common/contact-registration/","text":"Registering Contact Information \u00b6 OSG staff keep track of contact information for OSG Consortium participants to provide access to OSG services, notify administrators and security contacts of software and security updates, and coordinate in case of security incidents or troubleshooting services. The OSG contact management service is backed by InCommon federation , meaning that contacts may register with the OSG using their institutional identities with familiar Single Sign-On forms. Privacy Notice The OSG treats any email addresses and phone numbers as confidential data but does not make any guarantees of privacy. All other data is public (such as name, GitHub username, and any association with particular services or collaborations). How do I register a mailing list? If you would like to register a mailing list as a contact for your site, please contact us directly . Submitting an Application \u00b6 To register with the OSG, submit an application using the self-signup process: Visit https://osg-htc.org/register You will be presented with a Single-Sign On page. Select your insitution and sign in with your insitutional credentials: Help, my institution does not show up in the drop-down! If your institution does not show up in the drop-down menu, then your institution is not part of the InCommon federation . In this case, we recommend using an ORCID account instead, registering a new one if necessary. After you have signed in, you will be presented with the self-signup form. Click the \"BEGIN\" button: Enter your name, email address, GitHub username (optional), and a comment describing why you are registering as a participant in the OSG Consortium. Your institution may provide defaults for your name and email address but you may override these values. Once you have updated all the fields to your liking, click the \"SUBMIT\" button: Verifying Your Email Address \u00b6 After submitting your registration application, you will receive an email from registry@cilogon.org to verify your email address. 
Follow the link in the email and click the \"Accept\" button to complete the verification: Wait for URL redirection After clicking the email verification link, be sure to let the page load completely (you will be redirected back to this page), otherwise you may have issues completing your registration. If you believe this has happened to you, please contact us for assistance. Help, my email verification link has expired! If the email verification link has expired, please contact us to request a new verification link. Waiting for Approval \u00b6 After verifying your email address, your registration application must be approved by OSG staff. Once your registration application has been approved, you will receive a confirmation email: Once you have received your confirmation email, you may start using OSG services such as registering your resources . OASIS Managers: Adding an SSH Key \u00b6 After approval by OSG staff, OASIS managers must upload a public SSH key before being able to access the OASIS login host: Visit https://osg-htc.org/register and log in if prompted Click your name in the top right to get a dropdown and click the My Profile (OSG) button On the right side of your profile, click the Authenticators link: On the authenticators page, click the Manage button: On the SSH keys page, click the Add SSH Key link: Finally, upload your public SSH key from your computer: Getting Help \u00b6 For assistance with the OSG contact registration process, please use this page .","title":"Contact Information"},{"location":"common/contact-registration/#registering-contact-information","text":"OSG staff keep track of contact information for OSG Consortium participants to provide access to OSG services, notify administrators and security contacts of software and security updates, and coordinate in case of security incidents or troubleshooting services. The OSG contact management service is backed by InCommon federation , meaning that contacts may register with the OSG using their institutional identities with familiar Single Sign-On forms. Privacy Notice The OSG treats any email addresses and phone numbers as confidential data but does not make any guarantees of privacy. All other data is public (such as name, GitHub username, and any association with particular services or collaborations). How do I register a mailing list? If you would like to register a mailing list as a contact for your site, please contact us directly .","title":"Registering Contact Information"},{"location":"common/contact-registration/#submitting-an-application","text":"To register with the OSG, submit an application using the self-signup process: Visit https://osg-htc.org/register You will be presented with a Single Sign-On page. Select your institution and sign in with your institutional credentials: Help, my institution does not show up in the drop-down! If your institution does not show up in the drop-down menu, then your institution is not part of the InCommon federation . In this case, we recommend using an ORCID account instead, registering a new one if necessary. After you have signed in, you will be presented with the self-signup form. Click the \"BEGIN\" button: Enter your name, email address, GitHub username (optional), and a comment describing why you are registering as a participant in the OSG Consortium. Your institution may provide defaults for your name and email address but you may override these values. 
Once you have updated all the fields to your liking, click the \"SUBMIT\" button:","title":"Submitting an Application"},{"location":"common/contact-registration/#verifying-your-email-address","text":"After submitting your registration application, you will receive an email from registry@cilogon.org to verify your email address. Follow the link in the email and click the \"Accept\" button to complete the verification: Wait for URL redirection After clicking the email verification link, be sure to let the page to completely load (you will be redirected back to this page), otherwise you may have issues completing your registration. If you believe this has happened to you, please contact us for assistance. Help, my email verification link has expired! If the email verification link has expired, please contact us to request a new verification link.","title":"Verifying Your Email Address"},{"location":"common/contact-registration/#waiting-for-approval","text":"After verifying your email address, your registration application must be approved by OSG staff. Once your registration application has been approved, you will receive a confirmation email: Once you have received your confirmation email, you may start using OSG services such as registering your resources .","title":"Waiting for Approval"},{"location":"common/contact-registration/#oasis-managers-adding-an-ssh-key","text":"After approval by OSG staff, OASIS managers must upload a public SSH key before being able to access the OASIS login host: Visit https://osg-htc.org/register and login if prompted Click your name in the top right to get a dropdown and click the My Profile (OSG) button On the right-side of your profile, click the Authenticators link: On the authenticators page, click the Manage button: On the SSH keys page, click the Add SSH Key link: Finally, upload your public SSH key from your computer:","title":"OASIS Managers: Adding an SSH Key"},{"location":"common/contact-registration/#getting-help","text":"For assistance with the OSG contact registration process, please use this page .","title":"Getting Help"},{"location":"common/help/","text":"How to Get Help \u00b6 This page is aimed at OSG site administrators looking for support. Help for OSG users can be found at our support desk . Security Incidents \u00b6 Security incidents can be reported by following the instructions on the Incident Discovery and Reporting page. Software or Service Support \u00b6 If you are experiencing issues with OSG software or services, please consult the following resources before opening a support inquiry: Troubleshooting sections or pages for the problematic software Recent OSG Software release notes OSG 23 OSG 3.6 Outage information for OSG services Submitting support inquiries \u00b6 If your problem still hasn't been resolved by consulting the resources above, please submit a support inquiry with the information noted below: If you came to this page from an installation guide, please provide the following information: Commands and output from any Troubleshooting sections or pages The OSG system profile ( osg-profile.txt ), generated by running the following command: root@host # osg-system-profiler Submit a support inquiry to the system based on the VOs that you are associated with: If you are primarily associated with... Submit new tickets to... LHC VOs GGUS Anyone else help@osg-htc.org Community-specific support \u00b6 Some OSG VOs have dedicated forums or mechanisms for community-specific support. 
If your VO provides user support, that should be a user's first line of support because the VO is most familiar with your applications and requirements. The list of support centers for OSG VOs can be found here . Resources for CMS sites: http://www.uscms.org/uscms_at_work/physics/computing/grid/index.shtml CMS Hyper News: https://hypernews.cern.ch/HyperNews/CMS/get/osg-tier3.html CMS Twiki: https://twiki.cern.ch/twiki/bin/viewauth/CMS/USTier3Computing","title":"Help / Security Incidents"},{"location":"common/help/#how-to-get-help","text":"This page is aimed at OSG site administrators looking for support. Help for OSG users can be found at our support desk .","title":"How to Get Help"},{"location":"common/help/#security-incidents","text":"Security incidents can be reported by following the instructions on the Incident Discovery and Reporting page.","title":"Security Incidents"},{"location":"common/help/#software-or-service-support","text":"If you are experiencing issues with OSG software or services, please consult the following resources before opening a support inquiry: Troubleshooting sections or pages for the problematic software Recent OSG Software release notes OSG 23 OSG 3.6 Outage information for OSG services","title":"Software or Service Support"},{"location":"common/help/#submitting-support-inquiries","text":"If your problem still hasn't been resolved by consulting the resources above, please submit a support inquiry with the information noted below: If you came to this page from an installation guide, please provide the following information: Commands and output from any Troubleshooting sections or pages The OSG system profile ( osg-profile.txt ), generated by running the following command: root@host # osg-system-profiler Submit a support inquiry to the system based on the VOs that you are associated with: If you are primarily associated with... Submit new tickets to... LHC VOs GGUS Anyone else help@osg-htc.org","title":"Submitting support inquiries"},{"location":"common/help/#community-specific-support","text":"Some OSG VOs have dedicated forums or mechanisms for community-specific support. If your VO provides user support, that should be a user's first line of support because the VO is most familiar with your applications and requirements. The list of support centers for OSG VOs can be found here . Resources for CMS sites: http://www.uscms.org/uscms_at_work/physics/computing/grid/index.shtml CMS Hyper News: https://hypernews.cern.ch/HyperNews/CMS/get/osg-tier3.html CMS Twiki: https://twiki.cern.ch/twiki/bin/viewauth/CMS/USTier3Computing","title":"Community-specific support"},{"location":"common/registration/","text":"Registering with the OSG Consortium \u00b6 OSG staff keeps a registry containing active projects, collaborations (a.k.a. virtual organizations or VOs), resources, and resource downtimes stored as YAML files in the topology GitHub repository . This registry is used for accounting data , contact information, and resource availability. Use this page to learn how to register information in the OSG Consortium.
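While every workflow on this page can be completed entirely in the GitHub web editor, it can help to browse the registry locally before you start. A minimal sketch, assuming only that git is installed (the three directories are the ones referenced throughout this page):

root@host # git clone https://github.com/opensciencegrid/topology.git
root@host # ls topology/topology                  # facility/site/resource group hierarchy
root@host # ls topology/virtual-organizations     # one YAML file per VO
root@host # ls topology/projects                  # one YAML file per project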
Registration Requirements \u00b6 The instructions in this document require the following: A GitHub account A working knowledge of GitHub collaboration OSG contact registration Registering Contacts \u00b6 OSG staff keep track of contact information for OSG Consortium participants to provide access to OSG services, notify administrators and security contacts of software and security updates, and coordinate in case of security incidents or troubleshooting services. To register your contact information with the OSG Consortium, follow the instructions in this document . Privacy Notice The OSG treats any email addresses and phone numbers as confidential data but does not make any guarantees of privacy. All other data is public (such as name, GitHub username, and any association with particular services or collaborations). Registering Resources \u00b6 An OSG resource is a host that provides services to OSG campuses and collaborations; some examples are Compute Entrypoints, storage endpoints, or perfSONAR hosts. See the full list of services that should be registered in the OSG topology here . OSG resources are stored under a hierarchy of facilities, sites, and resource groups, defined as follows: Facility : The institution or company name where your resource is located. Site : Smaller than a facility; typically represents a computing center or an academic department. Frequently used as the display name for accounting dashboards . Resource Group : A logical grouping of resources at a site, i.e. all resources associated with a specific computing cluster. Multi-resource downtimes are easiest to declare across a resource group. Production and testing resources must be placed into separate resource groups. Resource : A host that provides services, e.g. Compute Entrypoints, storage endpoints, or perfSONAR hosts. Throughout this document, you will be asked to substitute your own facility, site, resource group, and resource names when registering with the OSG. If you don't already know the relevant names for your resource, use the following naming conventions: Level Naming convention Facility Unabbreviated institution or company name, e.g. University of Wisconsin - Madison Site Computing center or academic department, e.g. CHTC , MWT2 ATLAS UC , San Diego Supercomputer Center The only characters allowed in Site names are letters, numbers, underscores, hyphens, and spaces; i.e., a Site name must match the regular expression ^[A-Za-z0-9_ -]+$ Resource Group Abbreviated facility, site, and cluster name. Resource groups used for testing purposes should have an -ITB or - ITB suffix, e.g. TCNJ-ELSA-ITB Resource In all capital letters, -- , for example: TCNJ-ELSA-CE or NMSU-AGGIE-GRID-SQUID If you don't know which VO to use, pick OSG . OSG resources are stored in the GitHub repository as YAML files under a directory structure that reflects the above hierarchy, i.e. topology///.yaml from the root of the topology repository . New site \u00b6 To register a site, first choose a name for it (see the naming conventions table above ). The site name will appear in OSG accounting in places such as the GRACC site dashboard . Once you have chosen a site name, open the following in your browser: https://github.com/opensciencegrid/topology/new/master?filename=topology///SITE.yaml (replacing and with the facility and the site name that you chose ).
\"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the site template as a guide. You may leave the ID field blank. When adding new entries, make sure that the formatting and indentation of your entry matches that of the template. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Adding AggieGrid cluster for New Mexico State Searching for resources \u00b6 Whether you are registering a new resource or modifying an existing resource, start by searching for the FQDN of your host to avoid any duplicate registrations: Open the topology repository in your browser. Search the repository for the FQDN of your resource wrapped in double-quotes using the GitHub search bar (e.g., \"glidein2.chtc.wisc.edu\" ): If the search doesn't return any results , skip to these instructions for registering a new resource. If the search returns a single YAML file , open the link to the YAML file and skip to these instructions for modifying existing resources. If the search returns more than one YAML file , please contact us . Note If you are adding a new service to a host which is already registered as a resource, follow the instructions for modifying existing resources. New resources \u00b6 Before registering a new resource, make sure that its FQDN is not already registered . To register a new resource, follow the instructions below: Find the facility, site, and resource group for your resource in the topology repository under this directory structure: topology///.yaml . When searching for these, keep in mind that case and spaces matter. If you do not have a facility, contact help@osg-htc.org for help. If you have a facility but not a site, first follow the instructions for registering a site above. If you have a facility and a site but not a resource group, pick a resource group name . Once you have your facility, site, and resource group, follow the instructions below, replacing instances of , , and with the corresponding names that you chose above : If your resource group already exists under your facility and site, open the following URL in your browser: https://github.com/opensciencegrid/topology/edit/master/topology///.yaml For example, to add a resource to the CHTC resource group for the CHTC site at the University of Wisconsin , open the following URL: https://github.com/opensciencegrid/topology/edit/master/topology/University of Wisconsin/CHTC/CHTC.yaml If your resource group does not exist, open the following URL in your browser: https://github.com/opensciencegrid/topology/new/master?filename=topology///.yaml For example, to create a CHTC-Slurm-HPC resource group for the Center for High Throughput Computing ( CHTC ) at the University of Wisconsin , open the following URL: https://github.com/opensciencegrid/topology/new/master?filename=topology/University of Wisconsin/CHTC/CHTC-Slurm-HPC.yaml \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the resource group template as a guide. 
You may leave any ID or GroupID fields blank. When adding new entries, make sure that the formatting and indentation of your entry matches that of the template. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Adding a new compute entrypoint to the CHTC Modifying existing resources \u00b6 To modify an existing resource, follow these instructions: Find the resource that you would like to modify by searching GitHub , and open the link to the YAML file. Click the branch selector button next to the file path and select the master branch. Make changes with the GitHub file editor using the resource group template as a guide. You may leave any ID or GroupID fields blank. Make sure that the formatting and indentation of the modified entry does not change. If you are adding a new service to a host that is already registered as a resource, add the new service to the existing resource; do not create a new resource for the same host. \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Updating administrative contact information for CHTC-glidein2 Retiring resources \u00b6 To retire an already registered resource, set Active: false . For example: ... Production: true Resources: GLOW: Active: false ... Services: CE: Description: Compute Entrypoint Details: hidden: false If the Active attribute does not already exist within the resource definition, add it. If your resource becomes available again, set Active: true . Registering Resource Downtimes \u00b6 Resource downtime is a finite period of time for which one or more of the services of a registered resource are unavailable. Warning If you expect your resource to be indefinitely unavailable, retire the resource instead of registering a downtime. Downtimes are stored in YAML files alongside the resource group YAML files as described here . For example, downtimes for resources in the CHTC-Slurm-HPC resource group of the CHTC site at the University of Wisconsin can be found and registered in the following file, relative to the root of the topology repository : topology/University of Wisconsin/CHTC/CHTC-Slurm-HPC_downtime.yaml Note Do not put downtime updates in the same pull request as other topology updates. Registering new downtime \u00b6 To register a new downtime for a resource or for multiple resources that are part of a resource group, you will use webforms to generate the contents of the downtime entry, copy it into the downtime file corresponding to your resource, and submit it as a GitHub pull request. Follow the instructions below: Open one of the downtime generation webforms in your browser: Use the resource downtime generator if you only need to declare a downtime for a single resource. Use the resource group downtime generator if you need to declare a downtime for multiple resources across a resource group. Select your facility, site, resource group, and/or resource from the corresponding lists. For the single resource downtime form: Select all the services that will be down. To select multiple, use Control-Click on Windows and Linux, or Command-Click on macOS.
Fill the other fields with information about the downtime. Click the Generate button. If the information is valid, a block of text will be displayed in the box labeled Generated YAML . Otherwise, check for error messages and fix your input. Follow the instructions shown below the generated block of text. \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Wait for OSG staff to approve and merge your new downtime. Modifying existing downtime \u00b6 In case an already registered downtime is incorrect or needs to be updated to reflect new information, you can modify existing downtime entries using the GitHub editor. Failure Changes to the ID or CreatedTime fields will be rejected. To modify an existing downtime entry for a registered resource, manually make the changes in the matching downtime YAML file. Follow the instructions below: Open the topology repository in your browser. If you do not know the facility, site, and resource group of the resource the downtime entry refers to, search the repository for the FQDN of your resource wrapped in double-quotes using the GitHub search bar (e.g., \"glidein2.chtc.wisc.edu\" ): If the search returns a single YAML file , note the name of the facility, site, and resource group and continue to the next step. If the search doesn't return any results or returns more than one YAML file , please contact us . Open the following URL in your browser using the facility, site, and resource group names to replace , , and , respectively: https://github.com/opensciencegrid/topology/edit/master/topology///_downtime.yaml \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the downtime template as a reference. Make sure that the formatting and indentation of the modified entry does not change. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Move forward end date for CHTC-glidein2 regular maintenance Wait for OSG staff to approve and merge your modified downtime. Registering Virtual Organizations \u00b6 Virtual Organizations (VOs) are sets of groups or individuals defined by some common cyber-infrastructure need. This can be a scientific experiment, a university campus or a distributed research effort. A VO represents all its members and their common needs in a distributed computing environment. A VO also includes the group\u2019s computing/storage resources and services. For more information about VOs, see this page . Info Before submitting a registration for a new VO, please contact us describing your organization's computing needs. VO information is stored as YAML files in the virtual-organizations directory of the topology repository . To modify a VO's information or register a new VO, follow the instructions below: Open the topology repository in your browser. If you see your VO in the list, open the file and continue to the next step. If you do not see your VO in the list, click Create new file button: In the new file dialog, enter .yaml , replacing with the name of your VO.
\"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the VO template as a guide. You may leave any ID fields blank. If you are modifying existing entries, make sure you do not change formatting or indentation of the modified entry. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Updating contact information for the GLOW VO Registering Projects \u00b6 Info Before submitting a registration for a new project, please contact us describing your organization's computing needs. Project information is stored as YAML files in the projects directory of the topology repository . To modify a VO's information or register a new VO, follow the instructions below: Open the topology repository in your browser. If you see your project in the list, open the file and continue to the next step. If you do not see your project in the list, click Create new file button: In the new file dialog, enter .yaml , replacing with the name of your project. \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the project template as a guide. You may leave any ID fields blank. If you are modifying existing entries, make sure you do not change formatting or indentation of the modified entry. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Updating contact information for the Mu2e project Getting Help \u00b6 To get assistance, please use the this page .","title":"Resources and Collaborations"},{"location":"common/registration/#registering-with-the-osg-consortium","text":"OSG staff keeps a registry containing active projects, collaborations (a.k.a. virtual organizations or VOs), resources, and resource downtimes stored as YAML files in the topology GitHub repository . This registry is used for accounting data , contact information, and resource availability. Use this page to learn how to register information in the OSG Consortium.","title":"Registering with the OSG Consortium"},{"location":"common/registration/#registration-requirements","text":"The instructions in this document require the following: A GitHub account A working knowledge of GitHub collaboration OSG contact registration","title":"Registration Requirements"},{"location":"common/registration/#registering-contacts","text":"OSG staff keep track of contact information for OSG Consortium participants to provide access to OSG services, notify administrators and security contacts of software and security updates, and coordinating in case of security incidents or troubleshooting services. To register your contact information with the OSG Consortium, follow the instructions in this document . Privacy Notice The OSG treats any email addresses and phone numbers as confidential data but does not make any guarantees of privacy. 
All other data is public (such as name, GitHub username, and any association with particular services or collaborations).","title":"Registering Contacts"},{"location":"common/registration/#registering-resources","text":"An OSG resource is a host that provides services to OSG campuses and collaborations; some examples are Compute Entrypoints, storage endpoints, or perfSONAR hosts. See the full list of services that should be registered in the OSG topology here . OSG resources are stored under a hierarchy of facilities, sites, and resource groups, defined as follows: Facility : The institution or company name where your resource is located. Site : Smaller than a facility; typically represents a computing center or an academic department. Frequently used as the display name for accounting dashboards . Resource Group : A logical grouping of resources at a site, i.e. all resources associated with a specific computing cluster. Multi-resource downtimes are easiest to declare across a resource group. Production and testing resources must be placed into separate resource groups. Resource : A host that provides services, e.g. Compute Entrypoints, storage endpoints, or perfSONAR hosts. Throughout this document, you will be asked to substitute your own facility, site, resource group, and resource names when registering with the OSG. If you don't already know the relevant names for your resource, using the following naming conventions: Level Naming convention Facility Unabbreviated institution or company name, e.g. University of Wisconsin - Madison Site Computing center or academic department, e.g. CHTC , MWT2 ATLAS UC , San Diego Supercomputer Center The only characters allowed in Site names are letters, numbers, underscores, hyphens, and spaces; i.e., a Site name must match the regular expression ^[A-Za-z0-9_ -]+$ Resource Group Abbreviated facility, site, and cluster name. Resource groups used for testing purposes should have an -ITB or - ITB suffix, e.g. TCNJ-ELSA-ITB Resource In all capital letters, -- , for example: TCNJ-ELSA-CE or NMSU-AGGIE-GRID-SQUID If you don't know which VO to use, pick OSG . OSG resources are stored in the GitHub repository as YAML files under a directory structure that reflects the above hierarchy, i.e. topology///.yaml from the root of the topology repository .","title":"Registering Resources"},{"location":"common/registration/#new-site","text":"To register a site, first choose a name for it (see the naming conventions table above ) The site name will appear in OSG accounting in places such as the GRACC site dashboard . Once you have chosen a site name, open the following in your browser: https://github.com/opensciencegrid/topology/new/master?filename=topology///SITE.yaml (replacing and with the facility and the site name that you chose ). \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the site template as a guide. You may leave the ID field blank. When adding new entries, make sure that the formatting and indentation of your entry matches that of the template. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. 
Provide a descriptive commit message, for example: Adding AggieGrid cluster for New Mexico State","title":"New site"},{"location":"common/registration/#searching-for-resources","text":"Whether you are registering a new resource or modifying an existing resource, start by searching for the FQDN of your host to avoid any duplicate registrations: Open the topology repository in your browser. Search the repository for the FQDN of your resource wrapped in double-quotes using the GitHub search bar (e.g., \"glidein2.chtc.wisc.edu\" ): If the search doesn't return any results , skip to these instructions for registering a new resource. If the search returns a single YAML file , open the link to the YAML file and skip to these instructions for modifying existing resources. If the search returns more than one YAML file , please contact us . Note If you are adding a new service to a host which is already registered as a resource, follow the instructions for modifying existing resources.","title":"Searching for resources"},{"location":"common/registration/#new-resources","text":"Before registering a new resource, make sure that its FQDN is not already registered . To register a new resource, follow the instructions below: Find the facility, site, and resource group for your resource in the topology repository under this directory structure: topology///.yaml . When searching for these, keep in mind that case and spaces matter. If you do not have a facility, contact help@osg-htc.org for help. If you have a facility but not a site, first follow the instructions for registering a site above. If you have a facility and a site but not a resource group, pick a resource group name . Once you have your facility, site, and resource group, follow the instructions below, replacing instances of , , and with the corresponding names that you chose above : If your resource group already exists under your facility and site, open the following URL in your browser: https://github.com/opensciencegrid/topology/edit/master/topology///.yaml For example, to add a resource to the CHTC resource group for the CHTC site at the University of Wisconsin , open the following URL: https://github.com/opensciencegrid/topology/edit/master/topology/University of Wisconsin/CHTC/CHTC.yaml If your resource group does not exist, open the following URL in your browser: https://github.com/opensciencegrid/topology/new/master?filename=topology///.yaml For example, to create a CHTC-Slurm-HPC resource group for the Center for High Throughput Computing ( CHTC ) at the University of Wisconsin , open the following URL: https://github.com/opensciencegrid/topology/new/master?filename=topology/University of Wisconsin/CHTC/CHTC-Slurm-HPC.yaml \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the resource group template as a guide. You may leave any ID or GroupID fields blank. When adding new entries, make sure that the formatting and indentation of your entry matches that of the template. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. 
Provide a descriptive commit message, for example: Adding a new compute entrypoint to the CHTC","title":"New resources"},{"location":"common/registration/#modifying-existing-resources","text":"To modify an existing resource, follow these instructions: Find the resource that you would like to modify by searching GitHub , and open the link to the YAML file. Click the branch selector button next to the file path and select the master branch. Make changes with the GitHub file editor using the resource group template as a guide. You may leave any ID or GroupID fields blank. Make sure that the formatting and indentation of the modified entry does not change. If you are adding a new service to a host that is already registered as a resource, add the new service to the existing resource; do not create a new resource for the same host. \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Updating administrative contact information for CHTC-glidein2","title":"Modifying existing resources"},{"location":"common/registration/#retiring-resources","text":"To retire an already registered resource, set Active: false . For example: ... Production: true Resources: GLOW: Active: false ... Services: CE: Description: Compute Entrypoint Details: hidden: false If the Active attribute does not already exist within the resource definition, add it. If your resource becomes available again, set Active: true .","title":"Retiring resources"},{"location":"common/registration/#registering-resource-downtimes","text":"Resource downtime is a finite period of time for which one or more of the services of a registered resource are unavailable. Warning If you expect your resource to be indefinitely unavailable, retire the resource instead of registering a downtime. Downtimes are stored in YAML files alongside the resource group YAML files as described here . For example, downtimes for resources in the CHTC-Slurm-HPC resource group of the CHTC site at the University of Wisconsin can be found and registered in the following file, relative to the root of the topology repository : topology/University of Wisconsin/CHTC/CHTC-Slurm-HPC_downtime.yaml Note Do not put downtime updates in the same pull request as other topology updates.","title":"Registering Resource Downtimes"},{"location":"common/registration/#registering-new-downtime","text":"To register a new downtime for a resource or for multiple resources that are part of a resource group, you will use webforms to generate the contents of the downtime entry, copy it into the downtime file corresponding to your resource, and submit it as a GitHub pull request. Follow the instructions below: Open one of the downtime generation webforms in your browser: Use the resource downtime generator if you only need to declare a downtime for a single resource. Use the resource group downtime generator if you need to declare a downtime for multiple resources across a resource group. Select your facility, site, resource group, and/or resource from the corresponding lists. For the single resource downtime form: Select all the services that will be down.
To select multiple, use Control-Click on Windows and Linux, or Command-Click on macOS. Fill the other fields with information about the downtime. Click the Generate button. If the information is valid, a block of text will be displayed in the box labeled Generated YAML . Otherwise, check for error messages and fix your input. Follow the instructions shown below the generated block of text. \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Wait for OSG staff to approve and merge your new downtime.","title":"Registering new downtime"},{"location":"common/registration/#modifying-existing-downtime","text":"In case an already registered downtime is incorrect or need to be updated to reflect new information, you can modify existing downtime entries using the GitHub editor. Failure Changes to the ID or CreatedTime fields will be rejected. To modify an existing downtime entry for a registered resource, manually make the changes in the matching downtime YAML file. Follow the instructions below: Open the topology repository in your browser. If you do not know the facility, site, and resource group of the resource the downtime entry refers to, search the repository for the FQDN of your resource wrapped in double-quotes using the GitHub search bar (e.g., \"glidein2.chtc.wisc.edu\" ): If the search returns a single YAML file , note the name of the facility, site, and resource group and continue to the next step. If the search doesn't return any results or returns more than one YAML file , please contact us . Open the following URL in your browser using the facility, site, and resource group names to replace , , and , respectively: https://github.com/opensciencegrid/topology/edit/master/topology///_downtime.yaml \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the downtime template as a reference. Make sure that the formatting and indentation of the modified entry does not change. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Move forward end date for CHTC-glidein2 regular maintenance Wait for OSG staff to approve and merge your modified downtime.","title":"Modifying existing downtime"},{"location":"common/registration/#registering-virtual-organizations","text":"Virtual Organizations (VOs) are sets of groups or individuals defined by some common cyber-infrastructure need. This can be a scientific experiment, a university campus or a distributed research effort. A VO represents all its members and their common needs in distributed computing environment. A VO also includes the group\u2019s computing/storage resources and services. For more information about VOs, see this page . Info Before submitting a registration for a new VO, please contact us describing your organization's computing needs. VO information is stored as YAML files in the virtual-organizations directory of the topology repository . 
To modify a VO's information or register a new VO, follow the instructions below: Open the topology repository in your browser. If you see your VO in the list, open the file and continue to the next step. If you do not see your VO in the list, click Create new file button: In the new file dialog, enter .yaml , replacing with the name of your VO. \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the VO template as a guide. You may leave any ID fields blank. If you are modifying existing entries, make sure you do not change formatting or indentation of the modified entry. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Updating contact information for the GLOW VO","title":"Registering Virtual Organizations"},{"location":"common/registration/#registering-projects","text":"Info Before submitting a registration for a new project, please contact us describing your organization's computing needs. Project information is stored as YAML files in the projects directory of the topology repository . To modify a project's information or register a new project, follow the instructions below: Open the topology repository in your browser. If you see your project in the list, open the file and continue to the next step. If you do not see your project in the list, click Create new file button: In the new file dialog, enter .yaml , replacing with the name of your project. \"You're editing a file in a project you don't have write access to.\" If you see this message in the GitHub file editor, this is normal and it is because you do not have direct write access to the OSG copy of the topology data, which is why you are creating a pull request. Make changes with the GitHub file editor using the project template as a guide. You may leave any ID fields blank. If you are modifying existing entries, make sure you do not change formatting or indentation of the modified entry. Submit your changes as a pull request; select \"opensciencegrid/topology\" as the base repo. Provide a descriptive commit message, for example: Updating contact information for the Mu2e project","title":"Registering Projects"},{"location":"common/registration/#getting-help","text":"To get assistance, please use this page .","title":"Getting Help"},{"location":"common/yum/","text":"OSG Yum Repositories \u00b6 This document introduces Yum repositories and how they are used in the OSG. If you are unfamiliar with Yum, see the documentation on using Yum and RPM . Repositories \u00b6 The OSG hosts multiple repositories at repo.opensciencegrid.org that are intended for public use: The OSG Yum repositories... Contain RPMs that... osg , osg-upcoming are considered production-ready (default). osg-testing , osg-upcoming-testing have passed developer or integration testing but not acceptance testing osg-development , osg-upcoming-development have not passed developer, integration or acceptance testing. Do not use without instruction from the OSG Software and Release Team.
They are not enabled by default and must be enabled in addition to the main osg repository. See the upcoming software section for details. OSG's RPM packages also rely on external packages provided by supported OSes and EPEL. You must have the following repositories available and enabled: OS repositories, including the following ones that aren't enabled by default: extras (SL 7, CentOS 7, CentOS Stream 8, Rocky Linux 8, AlmaLinux 8) Server-Extras (RHEL 7) powertools (CentOS Stream 8, Rocky Linux 8, AlmaLinux 8) CodeReady Builder (RHEL 8) or crb (all EL9 variants) EPEL repositories OSG repositories If any of these repositories are missing, you may end up with installation issues or missing dependencies. Danger Other repositories, such as jpackage , dag , or rpmforge , are not supported and you may encounter problems if you use them. Upcoming Software \u00b6 Certain sites have requested new versions of software that would be considered \"disruptive\" or \"experimental\": upgrading to them would likely require manual intervention after their installation. We do not want sites to unwittingly upgrade to these versions. We have placed such software in separate repositories. Their names start with osg-upcoming and have the same structure as our standard repositories, as well as the same guarantees of quality and production-readiness. There are separate sets of upcoming repositories for each release series. For example, the OSG 23 repos have corresponding 23-upcoming repos . The upcoming repositories are meant to be layered on top of our standard repositories: installing software from the upcoming repositories requires also enabling the standard repositories from the same release. Contrib Software \u00b6 In addition to our regular software repositories, we also have a contrib (short for \"contributed\") software repository. This is software that does not go through the same software testing and release processes as the official OSG Software release, but may be useful to you. Particularly, contrib software is not guaranteed to be compatible with the rest of the OSG Software stack nor is it supported by the OSG. The definitive list of software in the contrib repository can be found here: OSG 23 EL8 contrib software repository OSG 23 EL9 contrib software repository OSG 3.6 EL7 contrib software repository OSG 3.6 EL8 contrib software repository OSG 3.6 EL9 contrib software repository If you would like to distribute your software in the OSG contrib repository, please contact us with a description of your software, what users it serves, and relevant RPM packaging. Installing Yum Repositories \u00b6 Install the Yum priorities plugin (EL7) \u00b6 The Yum priorities plugin is used to tell Yum to prefer OSG packages over EPEL or OS packages. It is important to install and enable the Yum priorities plugin before installing OSG Software to ensure that you are getting the OSG-supported versions. This plugin is built into Yum on EL8 and EL9 distributions. Install the Yum priorities package: root@host # yum install yum-plugin-priorities Ensure that /etc/yum.conf has the following line in the [main] section: plugins=1 Enable additional OS repositories \u00b6 Some packages depend on packages that are in OS repositories not enabled by default. The repositories to enable, as well as the instructions to enable them, are OS-dependent. Note A repository is enabled if it has enabled=1 in its definition, or if the enabled line is missing (i.e. it is enabled unless specified otherwise.)
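If you are unsure which of the repositories above are currently active on a host, you can list them before making changes; a quick sketch (yum on EL7, dnf on EL8/EL9; output varies by system):

root@host # yum repolist enabled
root@host # dnf repolist --enabled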
SL 7 \u00b6 Install the yum-conf-extras RPM package. Ensure that the sl-extras repo in /etc/yum.repos.d/sl-extras.repo is enabled. CentOS 7 \u00b6 Ensure that the extras repo in /etc/yum.repos.d/CentOS-Base.repo is enabled. CentOS Stream 8 \u00b6 Ensure that the extras repo in /etc/yum.repos.d/CentOS-Stream-Extras.repo is enabled. Ensure that the powertools repo in /etc/yum.repos.d/CentOS-Stream-PowerTools.repo is enabled. Rocky Linux 8 \u00b6 Ensure that the extras repo in /etc/yum.repos.d/Rocky-Extras.repo is enabled. Ensure that the powertools repo in /etc/yum.repos.d/Rocky-PowerTools.repo is enabled. AlmaLinux 8 \u00b6 Ensure that the extras repo in /etc/yum.repos.d/almalinux.repo is enabled. Ensure that the powertools repo in /etc/yum.repos.d/almalinux-powertools.repo is enabled. RHEL 7 \u00b6 Ensure that the Server-Extras channel is enabled. RHEL 8 \u00b6 Ensure that the CodeReady Linux Builder channel is enabled. See Red Hat's instructions on how to enable this repo. Rocky Linux 9 \u00b6 Ensure that the crb repo in /etc/yum.repos.d/rocky.repo is enabled AlmaLinux 9 \u00b6 Ensure that the crb repo in /etc/yum.repos.d/almalinux-crb.repo is enabled CentOS Stream 9 \u00b6 Ensure that the crb repo in /etc/yum.repos.d/centos.repo is enabled Install the EPEL repositories \u00b6 OSG software depends on packages distributed via the EPEL repositories. You must install and enable these first. Install the EPEL repository, if not already present. Choose the right version to match your OS version. # # EPEL 7 (For RHEL 7, CentOS 7, and SL 7) root@host # yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm # # EPEL 8 (For RHEL 8 and CentOS Stream 8, Rocky Linux 8, AlmaLinux 8) root@host # yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm # # EPEL 9 (For RHEL 9 and CentOS Stream 9, Rocky Linux 9, AlmaLinux 9) root@host # yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm Verify that /etc/yum.repos.d/epel.repo exists; the [epel] section should contain: The line enabled=1 Either no priority setting, or a priority setting that is 99 or higher Warning If you have your own mirror or configuration of the EPEL repository, you MUST verify that the priority of the EPEL repository is either missing, or 99 or a higher number. The OSG repositories must have a better (numerically lower) priority than the EPEL repositories; otherwise, you might have dependency resolution (\"depsolving\") issues. Install the OSG Repositories \u00b6 This document assumes a fresh install. For instructions on upgrading from one OSG series to another, see the release series document . Install the OSG repository for your OS version and the OSG release series that you wish to use: OSG 23 EL8: root@host # yum install https://repo.opensciencegrid.org/osg/23-main/osg-23-main-el8-release-latest.rpm OSG 23 EL9: root@host # yum install https://repo.opensciencegrid.org/osg/23-main/osg-23-main-el9-release-latest.rpm OSG 3.6 EL7: root@host # yum install https://repo.opensciencegrid.org/osg/3.6/osg-3.6-el7-release-latest.rpm OSG 3.6 EL8: root@host # yum install https://repo.opensciencegrid.org/osg/3.6/osg-3.6-el8-release-latest.rpm OSG 3.6 EL9: root@host # yum install https://repo.opensciencegrid.org/osg/3.6/osg-3.6-el9-release-latest.rpm The only OSG repository enabled by default is the release one. If you want to enable another one (e.g. osg-testing ), then edit its file (e.g. 
/etc/yum.repos.d/osg-testing.repo ) and change the enabled option from 0 to 1: [osg-testing] name=OSG Software for Enterprise Linux 7 - Testing - $basearch #baseurl=https://repo.opensciencegrid.org/osg/3.6/el7/testing/$basearch mirrorlist=https://repo.opensciencegrid.org/mirror/osg/3.6/el7/testing/$basearch failovermethod=priority priority=98 enabled=1 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-OSG file:///etc/pki/rpm-gpg/RPM-GPG-KEY-OSG-2 Optional Configuration \u00b6 Enable automatic security updates \u00b6 For production services, we suggest only changing software versions during controlled downtime. Therefore we recommend security-only automatic updates or disabling automatic updates entirely. Note Automatic updates for EL8 and EL9 variants are provided in the dnf-automatic RPM, which is not installed by default. To enable only security related automatic updates: On EL 7 variants, edit /etc/yum/yum-cron.conf and set update_cmd = security On EL8 and EL9 variants, edit /etc/dnf/automatic.conf and set upgrade_type = security CentOS 7, CentOS Stream 8, and CentOS Stream 9 do not support security-only automatic updates; doing any of the above steps will prevent automatic updates from happening at all. To disable automatic updates entirely: On EL7 variants, run: root@host # service yum-cron stop On EL8 and EL9 variants, run: root@host # systemctl disable --now dnf-automatic.timer Configuring Spacewalk priorities \u00b6 Sites using Spacewalk to manage RPM packages will need to configure OSG Yum repository priorities using their Spacewalk ID. For example, if the OSG 3.4 repository's Spacewalk ID is centos_7_osg34_dev , modify /etc/yum/pluginconf.d/90-osg.conf to include the following: [centos_7_osg_34_dev] priority = 98 Repository Mirrors \u00b6 If you run a large site (>20 nodes), you should consider setting up a local mirror for the OSG repositories. A local Yum mirror allows you to reduce the amount of external bandwidth used when updating or installing packages. Add the following to a file in /etc/cron.d : * * * * root rsync -aH rsync://repo-rsync.opensciencegrid.org/osg/ /var/www/html/osg/ Or, to mirror only a single repository: * * * * root rsync -aH rsync://repo-rsync.opensciencegrid.org/osg//el9/development /var/www/html/osg//el7 Replace with the OSG release you would like to use (e.g. 23-main ) and with a number between 0 and 59. On your worker node, you can replace the baseurl line of /etc/yum.repos.d/osg.repo with the appropriate URL for your mirror. If you are interested in having your mirror be part of the OSG's default set of mirrors, please file a support ticket . Reference \u00b6 Basic use of Yum","title":"OSG Yum Repos"},{"location":"common/yum/#osg-yum-repositories","text":"This document introduces Yum repositories and how they are used in the OSG. If you are unfamiliar with Yum, see the documentation on using Yum and RPM .","title":"OSG Yum Repositories"},{"location":"common/yum/#repositories","text":"The OSG hosts multiple repositories at repo.opensciencegrid.org that are intended for public use: The OSG Yum repositories... Contain RPMs that... osg , osg-upcoming are considered production-ready (default). osg-testing , osg-upcoming-testing have passed developer or integration testing but not acceptance testing osg-development , osg-upcoming-development have not passed developer, integration or acceptance testing. Do not use without instruction from the OSG Software and Release Team. 
osg-contrib have been contributed from outside of the OSG Software and Release Team. See this section for details. Note The upcoming repositories contain newer software that might require manual action after an update. They are not enabled by default and must be enabled in addition to the main osg repository. See the upcoming software section for details. OSG's RPM packages also rely on external packages provided by supported OSes and EPEL. You must have the following repositories available and enabled: OS repositories, including the following ones that aren't enabled by default: extras (SL 7, CentOS 7, CentOS Stream 8, Rocky Linux 8, AlmaLinux 8) Server-Extras (RHEL 7) powertools (CentOS Stream 8, Rocky Linux 8, AlmaLinux 8) CodeReady Builder (RHEL 8) or crb (all EL9 variants) EPEL repositories OSG repositories If any of these repositories are missing, you may end up with installation issues or missing dependencies. Danger Other repositories, such as jpackage , dag , or rpmforge , are not supported and you may encounter problems if you use them.","title":"Repositories"},{"location":"common/yum/#upcoming-software","text":"Certain sites have requested new versions of software that would be considered \"disruptive\" or \"experimental\": upgrading to them would likely require manual intervention after their installation. We do not want sites to unwittingly upgrade to these versions. We have placed such software in separate repositories. Their names start with osg-upcoming and have the same structure as our standard repositories, as well as the same guarantees of quality and production-readiness. There are separate sets of upcoming repositories for each release series. For example, the OSG 23 repos have corresponding 23-upcoming repos . The upcoming repositories are meant to be layered on top of our standard repositories: installing software from the upcoming repositories requires also enabling the standard repositories from the same release.","title":"Upcoming Software"},{"location":"common/yum/#contrib-software","text":"In addition to our regular software repositories, we also have a contrib (short for \"contributed\") software repository. This is software that is does not go through the same software testing and release processes as the official OSG Software release, but may be useful to you. Particularly, contrib software is not guaranteed to be compatible with the rest of the OSG Software stack nor is it supported by the OSG. The definitive list of software in the contrib repository can be found here: OSG 23 EL8 contrib software repository OSG 23 EL9 contrib software repository OSG 3.6 EL7 contrib software repository OSG 3.6 EL8 contrib software repository OSG 3.6 EL9 contrib software repository If you would like to distribute your software in the OSG contrib repository, please contact us with a description of your software, what users it serves, and relevant RPM packaging.","title":"Contrib Software"},{"location":"common/yum/#installing-yum-repositories","text":"","title":"Installing Yum Repositories"},{"location":"common/yum/#install-the-yum-priorities-plugin-el7","text":"The Yum priorities plugin is used to tell Yum to prefer OSG packages over EPEL or OS packages. It is important to install and enable the Yum priorities plugin before installing OSG Software to ensure that you are getting the OSG-supported versions. This plugin is built into Yum on EL8 and EL9 distributions. 
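As an illustration of how the plugin arbitrates between repositories: the OSG repository definitions ship with priority=98 , while EPEL is expected to carry no priority at all or 99 and higher, so the numerically lower OSG value wins whenever both offer the same package. After completing the installation steps below, you can spot-check this with something like the following (a sketch; file names and output will vary with your configuration):

root@host # grep -H '^priority' /etc/yum.repos.d/osg*.repo /etc/yum.repos.d/epel.repo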
Install the Yum priorities package: root@host # yum install yum-plugin-priorities Ensure that /etc/yum.conf has the following line in the [main] section: plugins=1","title":"Install the Yum priorities plugin (EL7)"},{"location":"common/yum/#enable-additional-os-repositories","text":"Some packages depend on packages that are in OS repositories not enabled by default. The repositories to enable, as well as the instructions to enable them, are OS-dependent. Note A repository is enabled if it has enabled=1 in its definition, or if the enabled line is missing (i.e. it is enabled unless specified otherwise.)","title":"Enable additional OS repositories"},{"location":"common/yum/#sl-7","text":"Install the yum-conf-extras RPM package. Ensure that the sl-extras repo in /etc/yum.repos.d/sl-extras.repo is enabled.","title":"SL 7"},{"location":"common/yum/#centos-7","text":"Ensure that the extras repo in /etc/yum.repos.d/CentOS-Base.repo is enabled.","title":"CentOS 7"},{"location":"common/yum/#centos-stream-8","text":"Ensure that the extras repo in /etc/yum.repos.d/CentOS-Stream-Extras.repo is enabled. Ensure that the powertools repo in /etc/yum.repos.d/CentOS-Stream-PowerTools.repo is enabled.","title":"CentOS Stream 8"},{"location":"common/yum/#rocky-linux-8","text":"Ensure that the extras repo in /etc/yum.repos.d/Rocky-Extras.repo is enabled. Ensure that the powertools repo in /etc/yum.repos.d/Rocky-PowerTools.repo is enabled.","title":"Rocky Linux 8"},{"location":"common/yum/#almalinux-8","text":"Ensure that the extras repo in /etc/yum.repos.d/almalinux.repo is enabled. Ensure that the powertools repo in /etc/yum.repos.d/almalinux-powertools.repo is enabled.","title":"AlmaLinux 8"},{"location":"common/yum/#rhel-7","text":"Ensure that the Server-Extras channel is enabled.","title":"RHEL 7"},{"location":"common/yum/#rhel-8","text":"Ensure that the CodeReady Linux Builder channel is enabled. See Red Hat's instructions on how to enable this repo.","title":"RHEL 8"},{"location":"common/yum/#rocky-linux-9","text":"Ensure that the crb repo in /etc/yum.repos.d/rocky.repo is enabled","title":"Rocky Linux 9"},{"location":"common/yum/#almalinux-9","text":"Ensure that the crb repo in /etc/yum.repos.d/almalinux-crb.repo is enabled","title":"AlmaLinux 9"},{"location":"common/yum/#centos-stream-9","text":"Ensure that the crb repo in /etc/yum.repos.d/centos.repo is enabled","title":"CentOS Stream 9"},{"location":"common/yum/#install-the-epel-repositories","text":"OSG software depends on packages distributed via the EPEL repositories. You must install and enable these first. Install the EPEL repository, if not already present. Choose the right version to match your OS version. # # EPEL 7 (For RHEL 7, CentOS 7, and SL 7) root@host # yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm # # EPEL 8 (For RHEL 8 and CentOS Stream 8, Rocky Linux 8, AlmaLinux 8) root@host # yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm # # EPEL 9 (For RHEL 9 and CentOS Stream 9, Rocky Linux 9, AlmaLinux 9) root@host # yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm Verify that /etc/yum.repos.d/epel.repo exists; the [epel] section should contain: The line enabled=1 Either no priority setting, or a priority setting that is 99 or higher Warning If you have your own mirror or configuration of the EPEL repository, you MUST verify that the priority of the EPEL repository is either missing, or 99 or a higher number. 
The OSG repositories must have a better (numerically lower) priority than the EPEL repositories; otherwise, you might have dependency resolution (\"depsolving\") issues.","title":"Install the EPEL repositories"},{"location":"common/yum/#install-the-osg-repositories","text":"This document assumes a fresh install. For instructions on upgrading from one OSG series to another, see the release series document . Install the OSG repository for your OS version and the OSG release series that you wish to use: OSG 23 EL8: root@host # yum install https://repo.opensciencegrid.org/osg/23-main/osg-23-main-el8-release-latest.rpm OSG 23 EL9: root@host # yum install https://repo.opensciencegrid.org/osg/23-main/osg-23-main-el9-release-latest.rpm OSG 3.6 EL7: root@host # yum install https://repo.opensciencegrid.org/osg/3.6/osg-3.6-el7-release-latest.rpm OSG 3.6 EL8: root@host # yum install https://repo.opensciencegrid.org/osg/3.6/osg-3.6-el8-release-latest.rpm OSG 3.6 EL9: root@host # yum install https://repo.opensciencegrid.org/osg/3.6/osg-3.6-el9-release-latest.rpm The only OSG repository enabled by default is the release one. If you want to enable another one (e.g. osg-testing ), then edit its file (e.g. /etc/yum.repos.d/osg-testing.repo ) and change the enabled option from 0 to 1: [osg-testing] name=OSG Software for Enterprise Linux 7 - Testing - $basearch #baseurl=https://repo.opensciencegrid.org/osg/3.6/el7/testing/$basearch mirrorlist=https://repo.opensciencegrid.org/mirror/osg/3.6/el7/testing/$basearch failovermethod=priority priority=98 enabled=1 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-OSG file:///etc/pki/rpm-gpg/RPM-GPG-KEY-OSG-2","title":"Install the OSG Repositories"},{"location":"common/yum/#optional-configuration","text":"","title":"Optional Configuration"},{"location":"common/yum/#enable-automatic-security-updates","text":"For production services, we suggest only changing software versions during controlled downtime. Therefore we recommend security-only automatic updates or disabling automatic updates entirely. Note Automatic updates for EL8 and EL9 variants are provided in the dnf-automatic RPM, which is not installed by default. To enable only security related automatic updates: On EL 7 variants, edit /etc/yum/yum-cron.conf and set update_cmd = security On EL8 and EL9 variants, edit /etc/dnf/automatic.conf and set upgrade_type = security CentOS 7, CentOS Stream 8, and CentOS Stream 9 do not support security-only automatic updates; doing any of the above steps will prevent automatic updates from happening at all. To disable automatic updates entirely: On EL7 variants, run: root@host # service yum-cron stop On EL8 and EL9 variants, run: root@host # systemctl disable --now dnf-automatic.timer","title":"Enable automatic security updates"},{"location":"common/yum/#configuring-spacewalk-priorities","text":"Sites using Spacewalk to manage RPM packages will need to configure OSG Yum repository priorities using their Spacewalk ID. For example, if the OSG 3.4 repository's Spacewalk ID is centos_7_osg34_dev , modify /etc/yum/pluginconf.d/90-osg.conf to include the following: [centos_7_osg_34_dev] priority = 98","title":"Configuring Spacewalk priorities"},{"location":"common/yum/#repository-mirrors","text":"If you run a large site (>20 nodes), you should consider setting up a local mirror for the OSG repositories. A local Yum mirror allows you to reduce the amount of external bandwidth used when updating or installing packages. 
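The rsync commands below copy into /var/www/html/osg/ , which assumes the mirror host serves that directory over HTTP (for example, with Apache httpd); adjust the destination to match however your mirror is exported. Before adding the cron entry, make sure the destination directory exists:

root@host # mkdir -p /var/www/html/osg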
Add the following to a file in /etc/cron.d : * * * * root rsync -aH rsync://repo-rsync.opensciencegrid.org/osg/ /var/www/html/osg/ Or, to mirror only a single repository: * * * * root rsync -aH rsync://repo-rsync.opensciencegrid.org/osg//el9/development /var/www/html/osg//el9/development Replace with the OSG release you would like to use (e.g. 23-main ) and with a number between 0 and 59. On your worker node, you can replace the baseurl line of /etc/yum.repos.d/osg.repo with the appropriate URL for your mirror. If you are interested in having your mirror be part of the OSG's default set of mirrors, please file a support ticket .\",\"title\":\"Repository Mirrors\"},{\"location\":\"common/yum/#reference\",\"text\":\"Basic use of Yum\",\"title\":\"Reference\"},{\"location\":\"compute-element/covid-19/\",\"text\":\"Supporting COVID-19 Research on the OSG \u00b6 Info The instructions in this document are deprecated, as COVID-19 jobs are no longer prioritized. There are a few options available for sites with computing resources that want to support the important and urgent work of COVID-19 researchers using the OSG. As we're currently routing such projects through the OSG VO, your site can be configured to accept pilots that exclusively run OSG VO jobs relating to COVID-19 research (among other pilots you support), allowing you to prioritize these pilots and account for this usage separately from other OSG activity. To support COVID-19 work, the overall process includes the following: Make the site computing resources available through an HTCondor-CE if you have not already done so. You can install a locally-managed instance or ask OSG to host the CE on your behalf. If neither solution is viable, or you'd like to discuss the options, please send email to help@osg-htc.org and we'll work with you to arrive at the best solution. If you already provide resources through an OSG Hosted CE, skip to this section . Enable the OSG VO on your HTCondor-CE. Set up a job route specific to COVID-19 pilot jobs (documented below). The job route will allow you to prioritize these jobs using local policy in your site's cluster. (Optional) To attract more user jobs, install CVMFS and Apptainer on your site's worker nodes Send email to help@osg-htc.org requesting that your CE receive COVID-19 pilots. We will need to know the CE hostname and any special restrictions that might apply to these pilots. Setting up a COVID-19 Job Route \u00b6 By default, COVID-19 pilots will look identical to OSG pilots except they will have the attribute IsCOVID19 = true . They do not require mapping to a distinct Unix account but can be sent to a prioritized queue or accounting group. Job routes are controlled by the JOB_ROUTER_ENTRIES configuration variable in HTCondor-CE. Customizations may be placed in /etc/condor-ce/config.d/ where files are parsed in lexicographical order, e.g. JOB_ROUTER_ENTRIES specified in 50-covid-routes.conf will override JOB_ROUTER_ENTRIES in 02-local-slurm.conf .
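If you are ever unsure which file ends up defining the routes, you can ask HTCondor-CE where a configuration variable was last set, for example: condor_ce_config_val -verbose JOB_ROUTER_ENTRIES (the -verbose flag also prints the file and line where the value was defined).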
For Non-HTCondor batch systems \u00b6 To add a new route for COVID-19 pilots for non-HTCondor batch systems: Note the names of your currently enabled routes: condor_ce_job_router_info -config Add the following configuration to a file in /etc/condor-ce/config.d/ (files are parsed in lexicographical order): JOB_ROUTER_ENTRIES @=jre [ name = \"OSG_COVID19_Jobs\"; GridResource = \"batch slurm\"; TargetUniverse = 9; set_default_queue = \"covid19\"; Requirements = (TARGET.IsCOVID19 =?= true); ] $(JOB_ROUTER_ENTRIES) @jre Replacing slurm in the GridResource attribute with the appropriate value for your batch system (e.g., lsf , pbs , sge , or slurm ); and the value of set_default_queue with the name of the partition or queue of your local batch system dedicated to COVID-19 work. Ensure that COVID-19 jobs match to the new route. Choose one of the options below depending on your HTCondor version ( condor_version ): For versions of HTCondor >= 8.8.7 and < 8.9.0; or HTCondor >= 8.9.6: specify the routes considered by the job router and the order in which they're considered by adding the following configuration to a file in /etc/condor-ce/config.d/ : JOB_ROUTER_ROUTE_NAMES = OSG_COVID19_Jobs, $(JOB_ROUTER_ROUTE_NAMES) If your configuration does not already define JOB_ROUTER_ROUTE_NAMES , you need to add the name of all previous routes to it, leaving OSG_COVID19_Jobs at the start of the list. For example: JOB_ROUTER_ROUTE_NAMES = OSG_COVID19_Jobs, Local_Condor, $(JOB_ROUTER_ROUTE_NAMES) For older versions of HTCondor: add (TARGET.IsCOVID19 =!= true) to the Requirements of any existing routes. For example, the following job route: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Slurm\" GridResource = \"batch slurm\"; TargetUniverse = 9; set_default_queue = \"atlas; Requirements = (TARGET.Owner =!= \"osg\"); ] @jre Should be updated as follows: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Slurm\" GridResource = \"batch slurm\"; TargetUniverse = 9; set_default_queue = \"atlas; Requirements = (TARGET.Owner =!= \"osg\") && (TARGET.IsCOVID19 =!= true); ] @jre Reconfigure your HTCondor-CE: condor_ce_reconfig Continue onto this section to verify your configuration For HTCondor batch systems \u00b6 Similarly, at an HTCondor site, one can place these jobs into a separate accounting group by providing the set_AcctGroup and eval_set_AccountingGroup attributes in a new job route. To add a new route for COVID-19 pilots for non-HTCondor batch systems: Note the names of your currently enabled routes: condor_ce_job_router_info -config Add the following configuration to a file in /etc/condor-ce/config.d/ (files are parsed in lexicographical order): JOB_ROUTER_ENTRIES @=jre [ name = \"OSG_COVID19_Jobs\"; TargetUniverse = 5; set_AcctGroup = \"covid19\"; eval_set_AccountingGroup = strcat(AcctGroup, \".\", Owner); Requirements = (TARGET.IsCOVID19 =?= true); ] $(JOB_ROUTER_ENTRIES) @jre Replacing covid19 in set_AcctGroup with the name of the accounting group that you would like to use for COVID-19 jobs. Ensure that COVID-19 jobs match to the new route. 
Choose one of the options below depending on your HTCondor version ( condor_version ): For versions of HTCondor >= 8.8.7 and < 8.9.0; or HTCondor >= 8.9.6: specify the routes considered by the job router and the order in which they're considered by adding the following configuration to a file in /etc/condor-ce/config.d/ : JOB_ROUTER_ROUTE_NAMES = OSG_COVID19_Jobs, $(JOB_ROUTER_ROUTE_NAMES) For older versions of HTCondor: add (TARGET.IsCOVID19 =!= true) to the Requirements of any existing routes. For example, the following job route: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Condor\" TargetUniverse = 5; Requirements = (TARGET.Owner =!= \"osg\"); ] @jre Should be updated as follows: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Condor\" TargetUniverse = 5; Requirements = (TARGET.Owner =!= \"atlas\") && (TARGET.IsCOVID19 =!= true); ] @jre Reconfigure your HTCondor-CE: condor_ce_reconfig Continue onto this section to verify your configuration Verifying the COVID-19 Job Route \u00b6 To verify that your HTCondor-CE is configured to support COVID-19 jobs, perform the following steps: Ensure that the OSG_COVID19_Jobs route appears with all of your other previously enabled routes: condor_ce_job_router_info -config Known issue: removing old routes If your HTCondor-CE has jobs associated with a route that is removed from your configuration, this will result in a crashing Job Router. If you accidentally remove an old route, restore the route or remove all jobs associated with said route. Ensure that COVID-19 jobs will match to your new job route: For versions of HTCondor >= 8.8.7 and < 8.9.0; or HTCondor >= 8.9.6: OSG_COVID19_Jobs should be the first route in the routing table: condor_ce_config_val -verbose JOB_ROUTER_ROUTE_NAMES For older versions of HTCondor: the Requirements expresison of your OSG_COVID19_Jobs route must contain (TARGET.IsCOVID19 =?= true) and all other routes must contain (TARGET.IsCOVID19 =!= true) in their Requirements expression. After requesting COVID-19 jobs , verify that jobs are being routed appropriately, by examining pilots with condor_ce_router_q . Requesting COVID-19 Jobs \u00b6 To receive COVID-19 pilot jobs, send an email to help@osg-htc.org with the subject Requesting COVID-19 pilots and the following information: Whether you want to receive only COVID-19 jobs, or if you want to accept COVID-19 and other OSG jobs The hostname(s) of your HTCondor-CE(s) Any other restrictions that may apply to these jobs (e.g. number of available cores) Viewing COVID-19 Contributions \u00b6 You can view how many hours that COVID-19 projects have consumed at your site with this GRACC dashboard . Getting Help \u00b6 To get assistance, please use this page .","title":"Supporting COVID-19 Research on the OSG"},{"location":"compute-element/covid-19/#supporting-covid-19-research-on-the-osg","text":"Info The instructions in this document are deprecated, as COVID-19 jobs are no longer prioritized. There a few options available for sites with computing resources who want to support the important and urgent work of COVID-19 researchers using the OSG. As we're currently routing such projects through the OSG VO, your site can be configured to accept pilots that exclusively run OSG VO jobs relating to COVID-19 research (among other pilots you support), allowing you to prioritize these pilots and account for this usage separately from other OSG activity. 
To support COVID-19 work, the overall process includes the following: Make the site computing resources available through a HTCondor-CE if you have not already done so. You can install a locally-managed instance or ask OSG to host the CE on your behalf. If neither solution is viable, or you'd like to discuss the options, please send email to help@osg-htc.org and we'll work with you to arrive at the best solution. If you already provide resources through an OSG Hosted CE, skip to this section . Enable the OSG VO on your HTCondor-CE. Setup a job route specific to COVID-19 pilot jobs (documented below). The job route will allow you to prioritize these jobs using local policy in your site's cluster. (Optional) To attract more user jobs, install CVMFS and Apptainer on your site's worker nodes Send email to help@osg-htc.org requesting that your CE receive COVID-19 pilots. We will need to know the CE hostname and any special restrictions that might apply to these pilots.","title":"Supporting COVID-19 Research on the OSG"},{"location":"compute-element/covid-19/#setting-up-a-covid-19-job-route","text":"By default, COVID-19 pilots will look identical to OSG pilots except they will have the attribute IsCOVID19 = true . They do not require mapping to a distinct Unix account but can be sent to a prioritized queue or accounting group. Job routes are controlled by the JOB_ROUTER_ENTRIES configuration variable in HTCondor-CE. Customizations may be placed in /etc/condor-ce/config.d/ where files are parsed in lexicographical order, e.g. JOB_ROUTER_ENTRIES specified in 50-covid-routes.conf will override JOB_ROUTER_ENTRIES in 02-local-slurm.conf .","title":"Setting up a COVID-19 Job Route"},{"location":"compute-element/covid-19/#for-non-htcondor-batch-systems","text":"To add a new route for COVID-19 pilots for non-HTCondor batch systems: Note the names of your currently enabled routes: condor_ce_job_router_info -config Add the following configuration to a file in /etc/condor-ce/config.d/ (files are parsed in lexicographical order): JOB_ROUTER_ENTRIES @=jre [ name = \"OSG_COVID19_Jobs\"; GridResource = \"batch slurm\"; TargetUniverse = 9; set_default_queue = \"covid19\"; Requirements = (TARGET.IsCOVID19 =?= true); ] $(JOB_ROUTER_ENTRIES) @jre Replacing slurm in the GridResource attribute with the appropriate value for your batch system (e.g., lsf , pbs , sge , or slurm ); and the value of set_default_queue with the name of the partition or queue of your local batch system dedicated to COVID-19 work. Ensure that COVID-19 jobs match to the new route. Choose one of the options below depending on your HTCondor version ( condor_version ): For versions of HTCondor >= 8.8.7 and < 8.9.0; or HTCondor >= 8.9.6: specify the routes considered by the job router and the order in which they're considered by adding the following configuration to a file in /etc/condor-ce/config.d/ : JOB_ROUTER_ROUTE_NAMES = OSG_COVID19_Jobs, $(JOB_ROUTER_ROUTE_NAMES) If your configuration does not already define JOB_ROUTER_ROUTE_NAMES , you need to add the name of all previous routes to it, leaving OSG_COVID19_Jobs at the start of the list. For example: JOB_ROUTER_ROUTE_NAMES = OSG_COVID19_Jobs, Local_Condor, $(JOB_ROUTER_ROUTE_NAMES) For older versions of HTCondor: add (TARGET.IsCOVID19 =!= true) to the Requirements of any existing routes. 
For example, the following job route: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Slurm\"; GridResource = \"batch slurm\"; TargetUniverse = 9; set_default_queue = \"atlas\"; Requirements = (TARGET.Owner =!= \"osg\"); ] @jre Should be updated as follows: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Slurm\"; GridResource = \"batch slurm\"; TargetUniverse = 9; set_default_queue = \"atlas\"; Requirements = (TARGET.Owner =!= \"osg\") && (TARGET.IsCOVID19 =!= true); ] @jre Reconfigure your HTCondor-CE: condor_ce_reconfig Continue onto this section to verify your configuration\",\"title\":\"For Non-HTCondor batch systems\"},{\"location\":\"compute-element/covid-19/#for-htcondor-batch-systems\",\"text\":\"Similarly, at an HTCondor site, one can place these jobs into a separate accounting group by providing the set_AcctGroup and eval_set_AccountingGroup attributes in a new job route. To add a new route for COVID-19 pilots for HTCondor batch systems: Note the names of your currently enabled routes: condor_ce_job_router_info -config Add the following configuration to a file in /etc/condor-ce/config.d/ (files are parsed in lexicographical order): JOB_ROUTER_ENTRIES @=jre [ name = \"OSG_COVID19_Jobs\"; TargetUniverse = 5; set_AcctGroup = \"covid19\"; eval_set_AccountingGroup = strcat(AcctGroup, \".\", Owner); Requirements = (TARGET.IsCOVID19 =?= true); ] $(JOB_ROUTER_ENTRIES) @jre Replacing covid19 in set_AcctGroup with the name of the accounting group that you would like to use for COVID-19 jobs. Ensure that COVID-19 jobs match to the new route. Choose one of the options below depending on your HTCondor version ( condor_version ): For versions of HTCondor >= 8.8.7 and < 8.9.0; or HTCondor >= 8.9.6: specify the routes considered by the job router and the order in which they're considered by adding the following configuration to a file in /etc/condor-ce/config.d/ : JOB_ROUTER_ROUTE_NAMES = OSG_COVID19_Jobs, $(JOB_ROUTER_ROUTE_NAMES) For older versions of HTCondor: add (TARGET.IsCOVID19 =!= true) to the Requirements of any existing routes. For example, the following job route: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Condor\"; TargetUniverse = 5; Requirements = (TARGET.Owner =!= \"osg\"); ] @jre Should be updated as follows: JOB_ROUTER_ENTRIES @=jre [ name = \"Local_Condor\"; TargetUniverse = 5; Requirements = (TARGET.Owner =!= \"osg\") && (TARGET.IsCOVID19 =!= true); ] @jre Reconfigure your HTCondor-CE: condor_ce_reconfig Continue onto this section to verify your configuration\",\"title\":\"For HTCondor batch systems\"},{\"location\":\"compute-element/covid-19/#verifying-the-covid-19-job-route\",\"text\":\"To verify that your HTCondor-CE is configured to support COVID-19 jobs, perform the following steps: Ensure that the OSG_COVID19_Jobs route appears with all of your other previously enabled routes: condor_ce_job_router_info -config Known issue: removing old routes If your HTCondor-CE has jobs associated with a route that is removed from your configuration, this will result in a crashing Job Router. If you accidentally remove an old route, restore the route or remove all jobs associated with said route.
Ensure that COVID-19 jobs will match to your new job route: For versions of HTCondor >= 8.8.7 and < 8.9.0; or HTCondor >= 8.9.6: OSG_COVID19_Jobs should be the first route in the routing table: condor_ce_config_val -verbose JOB_ROUTER_ROUTE_NAMES For older versions of HTCondor: the Requirements expression of your OSG_COVID19_Jobs route must contain (TARGET.IsCOVID19 =?= true) and all other routes must contain (TARGET.IsCOVID19 =!= true) in their Requirements expression. After requesting COVID-19 jobs , verify that jobs are being routed appropriately by examining pilots with condor_ce_router_q .\",\"title\":\"Verifying the COVID-19 Job Route\"},{\"location\":\"compute-element/covid-19/#requesting-covid-19-jobs\",\"text\":\"To receive COVID-19 pilot jobs, send an email to help@osg-htc.org with the subject Requesting COVID-19 pilots and the following information: Whether you want to receive only COVID-19 jobs, or if you want to accept COVID-19 and other OSG jobs The hostname(s) of your HTCondor-CE(s) Any other restrictions that may apply to these jobs (e.g. number of available cores)\",\"title\":\"Requesting COVID-19 Jobs\"},{\"location\":\"compute-element/covid-19/#viewing-covid-19-contributions\",\"text\":\"You can view how many hours COVID-19 projects have consumed at your site with this GRACC dashboard .\",\"title\":\"Viewing COVID-19 Contributions\"},{\"location\":\"compute-element/covid-19/#getting-help\",\"text\":\"To get assistance, please use this page .\",\"title\":\"Getting Help\"},{\"location\":\"compute-element/hosted-ce/\",\"text\":\"Requesting an OSG Hosted CE \u00b6 An OSG Hosted Compute Entrypoint (CE) is the entry point for resource requests coming from the OSG; it handles authorization and delegation of resource requests to your existing campus HPC/HTC cluster. Many sites set up their compute entrypoint locally. As an alternative, OSG offers a no-cost Hosted CE option wherein the OSG team will host and operate the HTCondor Compute Entrypoint, and configure it for the communities that you choose to support. This document explains the requirements and the procedure for requesting an OSG Hosted CE. Running more than 10,000 resource requests The Hosted CE can support thousands of concurrent resource request submissions. If you wish to run your own local compute entrypoint or expect to support more than 10,000 concurrently running OSG resource requests, see this page for installing the HTCondor-CE. Before Starting \u00b6 Before preparing your cluster for OSG resource requests, consider the following requirements: An existing compute cluster with a supported batch system running on a supported operating system Outbound network connectivity from the worker nodes (they can be behind NAT) One or more Unix accounts on your cluster's submit server with the following capabilities: Accessible via SSH key Use of SSH remote port forwarding ( AllowTcpForwarding yes ) and SSH multiplexing ( MaxSessions 10 or greater; see the example sshd_config sketch below) Permission to submit jobs to your local cluster. Shared user home directories between the submit server and the worker nodes. Not required for HTCondor clusters: see this section for more details. Temporary scratch space on each worker node; site administrators should ensure that files in this directory are regularly cleaned out. OSG resource contributors must inform the OSG of any relevant changes to their site. Site downtimes For an improved turnaround time regarding an outage or downtime at your site, contact us and include downtime in the subject or body of the email.
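As an illustration of the SSH requirements above (a sketch only; whether you apply these settings globally or scope them to the OSG accounts with a Match User block is site policy), the relevant lines in /etc/ssh/sshd_config on the login host might look like: PubkeyAuthentication yes AllowTcpForwarding yes MaxSessions 10 After editing, reload the service, e.g. with systemctl reload sshd .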
For additional technical details, please consult the reference section below. Don't meet the requirements? If your site does not meet these conditions, please contact us to discuss your options for contributing to the OSG. Scheduling a Planning Consultation \u00b6 Before participating in the OSG, either as a computational resource contributor or consumer, we ask that you contact us to set up a consultation. During this consultation, OSG staff will introduce you and your team to the OSG and develop a plan to meet your resource contribution and/or research goals. Preparing Your Local Cluster \u00b6 After the consultation, ensure that your local cluster meets the requirements as outlined above . In particular, you should now know which accounts to create for the communities that you wish to serve at your cluster. Also consider the size and number of jobs that the OSG should send to your site (e.g., number of cores, memory, GPUs, walltime) as well as their scheduling policy (e.g. preemptible backfill partitions). Additionally, OSG staff may have directed you to follow installation instructions from one or more of the following sections: (Recommended) Providing access to CVMFS \u00b6 Maximize resource utilization; required for GPU support Installing CVMFS on your cluster makes your resources more attractive to OSG user jobs! Additionally, if you plan to contribute GPUs to the OSG, installation of CVMFS is required . Many users in the OSG make of use software modules and/or containers provided by their collaborations or by the OSG Research Facilitation team. In order to support these users without having to install specific software modules on your cluster, you may provide a distributed software repository system called CernVM File System (CVMFS). In order to provide CVMFS at your site, you will need the following: A cluster-wide Frontier Squid proxy service with at least 50GB of cache space; installation instructions for Frontier Squid are provided here . A local CVMFS cache per worker node (10 GB minimum, 20 GB recommended) After setting up the Frontier Squid proxy and worker node local caches, install CVMFS on each worker node. (HTCondor clusters only) Installing the OSG Worker Node Client \u00b6 Skip this section if you have CVMFS or shared home directories! If you have CVMFS installed or shared home directories on your worker nodes, you can skip manual installation of the OSG Worker Node Client. All OSG sites need to provide the OSG Worker Node Client on each worker node in their local cluster. This is normally handled by OSG staff for a Hosted CE but that requires shared home directories across the cluster. However, for sites with an HTCondor batch system, often there is no shared filesystem set up. If you run an HTCondor site and it is easier to install and maintain the Worker Node Client on each worker node than to install CVMFS or maintain shared file system, you have the following options: Install the Worker Node Client from RPM Install the Worker Node Client from tarball Requesting an OSG Hosted CE \u00b6 After preparing your local cluster, apply for a Hosted CE by filling out the cluster integration questionnaire. Your answers will help our operators submit resource requests to your local cluster of the appropriate size and scale. Cluster Integration Questionnaire Can I change my answers at a later date? Yes! If you want the OSG to change the size (i.e. 
CPU, RAM), type (e.g., GPU requests), or number of resource requests, contact us with the FQDN of your login host and the details of your changes. Finalizing Installation \u00b6 After applying for an OSG Hosted CE, our staff will contact you with the following information: IP ranges of OSG hosted services Public SSH key to be installed in the OSG accounts Once this is done, OSG staff will work with you and your team to begin submitting resource requests to your site, first with some tests, then with a steady ramp-up to full production. Validating contributions \u00b6 In addition to any internal validation processes that you may have, the OSG provides monitoring to view which communities and projects within said communities are accessing your site, their fields of science, and home institution. Below is an example of the monitoring views that will be available for your cluster. To view your contributions, select your site from the Facility dropdown of the Payload job summary dashboard. Note that accounting data may take up to 24 hours to display. Reference \u00b6 User accounts \u00b6 Each resource pool in the OSG Consortium that uses Hosted CEs is mapped to your site as a fixed, specific account; we request the account names are of the form osg01 through osg20 . The mappings from Unix usernames to resource pools are as follows: Username Pool Supported Research osg01 OSPool Projects (primarily single PI) supported directly by the OSG organization osg02 GLOW Projects coming from the Center for High Throughput Computing at the University of Wisconsin-Madison osg03 HCC Projects coming from the Holland Computing Center at the University of Nebraska\u2013Lincoln osg04 CMS High-energy physics experiment from the Large Hadron Collider at CERN osg05 Fermilab Experiments from the Fermi National Accelerator Laboratory osg07 IGWN Gravitational wave detection experiments osg08 IGWN Gravitational wave detection experiments osg09 ATLAS High-energy physics experiment from the Large Hadron Collider at CERN osg10 GlueX Study of quark and gluon degrees of freedom in hadrons using high-energy photons osg11 DUNE Experiment for neutrino science and proton decay studies osg12 IceCube Research based on data from the IceCube neutrino detector osg13 XENON Dark matter search experiment osg14 JLab Experiments from the Thomas Jefferson National Accelerator Facility osg15 - osg20 - Unassigned For example, the activities in your batch system corresponding to the user osg02 will always be associated with the GLOW resource pool. Security \u00b6 OSG takes multiple precautions to maintain security and prevent unauthorized usage of resources: Access to the OSG system with SSH keys are restricted to the OSG staff maintaining them Users are carefully vetted before they are allowed to submit jobs to OSG Jobs running through OSG can be traced back to the user that submitted them Job submission can quickly be disabled if needed Our security team is readily contactable in case of an emergency: https://osg-htc.org/security/#reporting-a-security-incident How to Get Help \u00b6 Is your site not receiving jobs from an OSG Hosted CE? Consult our status page for Hosted CE outages. 
If there isn't an outage, you need help with setup, or otherwise have questions, contact us .","title":"Request a Hosted CE"},{"location":"compute-element/hosted-ce/#requesting-an-osg-hosted-ce","text":"An OSG Hosted Compute Entrypoint (CE) is the entry point for resource requests coming from the OSG; it handles authorization and delegation of resource requests to your existing campus HPC/HTC cluster. Many sites set up their compute entrypoint locally. As an alternative, OSG offers a no-cost Hosted CE option wherein the OSG team will host and operate the HTCondor Compute Entrypoint, and configure it for the communities that you choose to support. This document explains the requirements and the procedure for requesting an OSG Hosted CE. Running more than 10,000 resource requests The Hosted CE can support thousands of concurrent resource request submissions. If you wish to run your own local compute entrypoint or expect to support more than 10,000 concurrently running OSG resource requests, see this page for installing the HTCondor-CE.","title":"Requesting an OSG Hosted CE"},{"location":"compute-element/hosted-ce/#before-starting","text":"Before preparing your cluster for OSG resource requests, consider the following requirements: An existing compute cluster with a supported batch system running on a supported operating system Outbound network connectivity from the worker nodes (they can be behind NAT) One or more Unix accounts on your cluster's submit server with the following capabilities: Accessible via SSH key Use of SSH remote port forwarding ( AllowTcpForwarding yes ) and SSH multiplexing ( MaxSessions 10 or greater) Permission to submit jobs to your local cluster. Shared user home directories between the submit server and the worker nodes. Not required for HTCondor clusters: see this section for more details. Temporary scratch space on each worker node; site administrators should ensure that files in this directory are regularly cleaned out. OSG resource contributors must inform the OSG of any relevant changes to their site. Site downtimes For an improved turnaround time regarding an outage or downtime at your site, contact us and include downtime in the subject or body of the email. For additional technical details, please consult the reference section below. Don't meet the requirements? If your site does not meet these conditions, please contact us to discuss your options for contributing to the OSG.","title":"Before Starting"},{"location":"compute-element/hosted-ce/#scheduling-a-planning-consultation","text":"Before participating in the OSG, either as a computational resource contributor or consumer, we ask that you contact us to set up a consultation. During this consultation, OSG staff will introduce you and your team to the OSG and develop a plan to meet your resource contribution and/or research goals.","title":"Scheduling a Planning Consultation"},{"location":"compute-element/hosted-ce/#preparing-your-local-cluster","text":"After the consultation, ensure that your local cluster meets the requirements as outlined above . In particular, you should now know which accounts to create for the communities that you wish to serve at your cluster. Also consider the size and number of jobs that the OSG should send to your site (e.g., number of cores, memory, GPUs, walltime) as well as their scheduling policy (e.g. preemptible backfill partitions). 
Additionally, OSG staff may have directed you to follow installation instructions from one or more of the following sections:","title":"Preparing Your Local Cluster"},{"location":"compute-element/hosted-ce/#recommended-providing-access-to-cvmfs","text":"Maximize resource utilization; required for GPU support Installing CVMFS on your cluster makes your resources more attractive to OSG user jobs! Additionally, if you plan to contribute GPUs to the OSG, installation of CVMFS is required . Many users in the OSG make of use software modules and/or containers provided by their collaborations or by the OSG Research Facilitation team. In order to support these users without having to install specific software modules on your cluster, you may provide a distributed software repository system called CernVM File System (CVMFS). In order to provide CVMFS at your site, you will need the following: A cluster-wide Frontier Squid proxy service with at least 50GB of cache space; installation instructions for Frontier Squid are provided here . A local CVMFS cache per worker node (10 GB minimum, 20 GB recommended) After setting up the Frontier Squid proxy and worker node local caches, install CVMFS on each worker node.","title":"(Recommended) Providing access to CVMFS"},{"location":"compute-element/hosted-ce/#htcondor-clusters-only-installing-the-osg-worker-node-client","text":"Skip this section if you have CVMFS or shared home directories! If you have CVMFS installed or shared home directories on your worker nodes, you can skip manual installation of the OSG Worker Node Client. All OSG sites need to provide the OSG Worker Node Client on each worker node in their local cluster. This is normally handled by OSG staff for a Hosted CE but that requires shared home directories across the cluster. However, for sites with an HTCondor batch system, often there is no shared filesystem set up. If you run an HTCondor site and it is easier to install and maintain the Worker Node Client on each worker node than to install CVMFS or maintain shared file system, you have the following options: Install the Worker Node Client from RPM Install the Worker Node Client from tarball","title":"(HTCondor clusters only) Installing the OSG Worker Node Client"},{"location":"compute-element/hosted-ce/#requesting-an-osg-hosted-ce_1","text":"After preparing your local cluster, apply for a Hosted CE by filling out the cluster integration questionnaire. Your answers will help our operators submit resource requests to your local cluster of the appropriate size and scale. Cluster Integration Questionnaire Can I change my answers at a later date? Yes! If you want the OSG to change the size (i.e. 
CPU, RAM), type (e.g., GPU requests), or number of resource requests, contact us with the FQDN of your login host and the details of your changes.","title":"Requesting an OSG Hosted CE"},{"location":"compute-element/hosted-ce/#finalizing-installation","text":"After applying for an OSG Hosted CE, our staff will contact you with the following information: IP ranges of OSG hosted services Public SSH key to be installed in the OSG accounts Once this is done, OSG staff will work with you and your team to begin submitting resource requests to your site, first with some tests, then with a steady ramp-up to full production.","title":"Finalizing Installation"},{"location":"compute-element/hosted-ce/#validating-contributions","text":"In addition to any internal validation processes that you may have, the OSG provides monitoring to view which communities and projects within said communities are accessing your site, their fields of science, and home institution. Below is an example of the monitoring views that will be available for your cluster. To view your contributions, select your site from the Facility dropdown of the Payload job summary dashboard. Note that accounting data may take up to 24 hours to display.","title":"Validating contributions"},{"location":"compute-element/hosted-ce/#reference","text":"","title":"Reference"},{"location":"compute-element/hosted-ce/#user-accounts","text":"Each resource pool in the OSG Consortium that uses Hosted CEs is mapped to your site as a fixed, specific account; we request the account names are of the form osg01 through osg20 . The mappings from Unix usernames to resource pools are as follows: Username Pool Supported Research osg01 OSPool Projects (primarily single PI) supported directly by the OSG organization osg02 GLOW Projects coming from the Center for High Throughput Computing at the University of Wisconsin-Madison osg03 HCC Projects coming from the Holland Computing Center at the University of Nebraska\u2013Lincoln osg04 CMS High-energy physics experiment from the Large Hadron Collider at CERN osg05 Fermilab Experiments from the Fermi National Accelerator Laboratory osg07 IGWN Gravitational wave detection experiments osg08 IGWN Gravitational wave detection experiments osg09 ATLAS High-energy physics experiment from the Large Hadron Collider at CERN osg10 GlueX Study of quark and gluon degrees of freedom in hadrons using high-energy photons osg11 DUNE Experiment for neutrino science and proton decay studies osg12 IceCube Research based on data from the IceCube neutrino detector osg13 XENON Dark matter search experiment osg14 JLab Experiments from the Thomas Jefferson National Accelerator Facility osg15 - osg20 - Unassigned For example, the activities in your batch system corresponding to the user osg02 will always be associated with the GLOW resource pool.","title":"User accounts"},{"location":"compute-element/hosted-ce/#security","text":"OSG takes multiple precautions to maintain security and prevent unauthorized usage of resources: Access to the OSG system with SSH keys are restricted to the OSG staff maintaining them Users are carefully vetted before they are allowed to submit jobs to OSG Jobs running through OSG can be traced back to the user that submitted them Job submission can quickly be disabled if needed Our security team is readily contactable in case of an emergency: https://osg-htc.org/security/#reporting-a-security-incident","title":"Security"},{"location":"compute-element/hosted-ce/#how-to-get-help","text":"Is your site not receiving 
jobs from an OSG Hosted CE? Consult our status page for Hosted CE outages. If there isn't an outage, you need help with setup, or otherwise have questions, contact us .","title":"How to Get Help"},{"location":"compute-element/htcondor-ce-overview/","text":"HTCondor-CE Overview \u00b6 This document serves as an introduction to HTCondor-CE and how it works. Before continuing with the overview, make sure that you are familiar with the following concepts: An OSG site plan What is a batch system and which one will you use ( HTCondor , PBS, LSF, SGE, or SLURM )? Security via host certificates to authenticate servers and bearer tokens to authenticate clients Pilot jobs, frontends, and factories (i.e., GlideinWMS , AutoPyFactory) What is a Compute Entrypoint? \u00b6 An OSG Compute Entrypoint (CE) is the door for remote organizations to submit requests to temporarily allocate local compute resources. At the heart of the CE is the job gateway software, which is responsible for handling incoming jobs, authenticating and authorizing them, and delegating them to your batch system for execution. Most jobs that arrive at a CE (here referred to as \"CE jobs\") are not end-user jobs, but rather pilot jobs submitted from factories. Successful pilot jobs create and make available an environment for actual end-user jobs to match and ultimately run within the pilot job container. Eventually pilot jobs remove themselves, typically after a period of inactivity. Note The Compute Entrypoint was previously known as the \"Compute Element\". What is HTCondor-CE? \u00b6 HTCondor-CE is a special configuration of the HTCondor software designed to be a job gateway solution for the OSG Fabric of Services. It is configured to use the JobRouter daemon to delegate jobs by transforming and submitting them to the site\u2019s batch system. Benefits of running the HTCondor-CE: Scalability: HTCondor-CE is capable of supporting job workloads of large sites Debugging tools: HTCondor-CE offers many tools to help troubleshoot issues with jobs Routing as configuration: HTCondor-CE\u2019s mechanism to transform and submit jobs is customized via configuration variables, which means that customizations will persist across upgrades and will not involve modification of software internals to route jobs How CE Jobs Run \u00b6 Once an incoming CE job is authorized, it is placed into HTCondor-CE\u2019s scheduler where the JobRouter creates a transformed copy (called the routed job ) and submits the copy to the batch system (called the batch system job ). After submission, HTCondor-CE monitors the batch system job and communicates its status to the original CE job, which in turn notifies the original submitter (e.g., job factory) of any updates. When the job completes, files are transferred along the same chain: from the batch system to the CE, then from the CE to the original submitter. Hosted CE over SSH \u00b6 The Hosted CE is intended for small sites or as an introduction to providing capacity to collaborations. OSG staff configure and maintain an HTCondor-CE on behalf of the site. The Hosted CE is a special configuration of HTCondor-CE that can submit jobs to a remote cluster over SSH. It provides a simple starting point for opportunistic resource owners that want to start contributing capacity with minimal effort: an organization will be able to accept CE jobs by allowing SSH access to a login node in their cluster. 
If your site intends to run over 10,000 concurrent CE jobs, you will need to host your own HTCondor-CE because the Hosted CE has not yet been optimized for such loads. If you are interested in a Hosted CE solution, please follow the instructions on this page . On HTCondor batch systems \u00b6 For a site with an HTCondor batch system , the JobRouter can use HTCondor protocols to place a transformed copy of the CE job directly into the batch system\u2019s scheduler, meaning that the routed and batch system jobs are one and the same. Thus, there are three representations of your job, each with its own ID (see diagram below): Access point: the HTCondor job ID in the original queue HTCondor-CE: the incoming CE job\u2019s ID HTCondor batch system: the routed job\u2019s ID In an HTCondor-CE/HTCondor setup, files are transferred from HTCondor-CE\u2019s spool directory to the batch system\u2019s spool directory using internal HTCondor protocols. Note The JobRouter copies the job directly into the batch system and does not make use of condor_submit . This means that if the HTCondor batch system is configured to add attributes to incoming jobs when they are submitted (i.e., SUBMIT_EXPRS ), these attributes will not be added to the routed jobs. On other batch systems \u00b6 For non-HTCondor batch systems, the JobRouter transforms the CE job into a routed job on the CE and the routed job submits a job into the batch system via a process called the BLAHP. Thus, there are four representations of your job, each with its own ID (see diagram below): Login node: the HTCondor job ID in the original queue HTCondor-CE: the incoming CE job\u2019s ID and the routed job\u2019s ID HTCondor batch system: the batch system\u2019s job ID Although the following figure specifies the PBS case, it applies to all non-HTCondor batch systems: With non-HTCondor batch systems, HTCondor-CE cannot use internal HTCondor protocols to transfer files so its spool directory must be exported to a shared file system that is mounted on the batch system\u2019s worker nodes. How the CE is Customized \u00b6 Aside from the basic configuration required in the CE installation, there are two main ways to customize your CE (if you decide any customization is required at all): Deciding which collaborations are allowed to run at your site: collaborations will submit resource allocation requests to your CE using bearer tokens, and you can configure which collaboration's tokens you are willing to accept. How to filter and transform the CE jobs to be run on your batch system: Filtering and transforming CE jobs (i.e., setting site-specific attributes or resource limits), requires configuration of your site\u2019s job routes. For examples of common job routes, consult the JobRouter recipes page. Note If you are running HTCondor as your batch system, you will have two HTCondor configurations side-by-side (one residing in /etc/condor/ and the other in /etc/condor-ce ) and will need to make sure to differentiate the two when editing any configuration. How Security Works \u00b6 Among OSG services, communication is secured between various parties using a combination of PKI infrastructure involving Certificate Authorities (CAs) and bearer tokens. Services such as a Compute Entrypoint, present host certificates to prove their identity to clients, much like your browser verifies websites that you may visit. 
And to use these services, clients present bearer tokens declaring their association with a given collaboration and what permissions the collaboration has given the client. In turn, the service may be configured to authorize the client based on their collaboration. Next steps \u00b6 Once the basic installation is done, additional activities include: Setting up job routes to customize incoming jobs Submitting jobs to a HTCondor-CE Troubleshooting the HTCondor-CE Register the CE Register with the OSG GlideinWMS factories and/or the ATLAS AutoPyFactory","title":"HTCondor-CE Overview"},{"location":"compute-element/htcondor-ce-overview/#htcondor-ce-overview","text":"This document serves as an introduction to HTCondor-CE and how it works. Before continuing with the overview, make sure that you are familiar with the following concepts: An OSG site plan What is a batch system and which one will you use ( HTCondor , PBS, LSF, SGE, or SLURM )? Security via host certificates to authenticate servers and bearer tokens to authenticate clients Pilot jobs, frontends, and factories (i.e., GlideinWMS , AutoPyFactory)","title":"HTCondor-CE Overview"},{"location":"compute-element/htcondor-ce-overview/#what-is-a-compute-entrypoint","text":"An OSG Compute Entrypoint (CE) is the door for remote organizations to submit requests to temporarily allocate local compute resources. At the heart of the CE is the job gateway software, which is responsible for handling incoming jobs, authenticating and authorizing them, and delegating them to your batch system for execution. Most jobs that arrive at a CE (here referred to as \"CE jobs\") are not end-user jobs, but rather pilot jobs submitted from factories. Successful pilot jobs create and make available an environment for actual end-user jobs to match and ultimately run within the pilot job container. Eventually pilot jobs remove themselves, typically after a period of inactivity. Note The Compute Entrypoint was previously known as the \"Compute Element\".","title":"What is a Compute Entrypoint?"},{"location":"compute-element/htcondor-ce-overview/#what-is-htcondor-ce","text":"HTCondor-CE is a special configuration of the HTCondor software designed to be a job gateway solution for the OSG Fabric of Services. It is configured to use the JobRouter daemon to delegate jobs by transforming and submitting them to the site\u2019s batch system. Benefits of running the HTCondor-CE: Scalability: HTCondor-CE is capable of supporting job workloads of large sites Debugging tools: HTCondor-CE offers many tools to help troubleshoot issues with jobs Routing as configuration: HTCondor-CE\u2019s mechanism to transform and submit jobs is customized via configuration variables, which means that customizations will persist across upgrades and will not involve modification of software internals to route jobs","title":"What is HTCondor-CE?"},{"location":"compute-element/htcondor-ce-overview/#how-ce-jobs-run","text":"Once an incoming CE job is authorized, it is placed into HTCondor-CE\u2019s scheduler where the JobRouter creates a transformed copy (called the routed job ) and submits the copy to the batch system (called the batch system job ). After submission, HTCondor-CE monitors the batch system job and communicates its status to the original CE job, which in turn notifies the original submitter (e.g., job factory) of any updates. 
When the job completes, files are transferred along the same chain: from the batch system to the CE, then from the CE to the original submitter.","title":"How CE Jobs Run"},{"location":"compute-element/htcondor-ce-overview/#hosted-ce-over-ssh","text":"The Hosted CE is intended for small sites or as an introduction to providing capacity to collaborations. OSG staff configure and maintain an HTCondor-CE on behalf of the site. The Hosted CE is a special configuration of HTCondor-CE that can submit jobs to a remote cluster over SSH. It provides a simple starting point for opportunistic resource owners that want to start contributing capacity with minimal effort: an organization will be able to accept CE jobs by allowing SSH access to a login node in their cluster. If your site intends to run over 10,000 concurrent CE jobs, you will need to host your own HTCondor-CE because the Hosted CE has not yet been optimized for such loads. If you are interested in a Hosted CE solution, please follow the instructions on this page .","title":"Hosted CE over SSH"},{"location":"compute-element/htcondor-ce-overview/#on-htcondor-batch-systems","text":"For a site with an HTCondor batch system , the JobRouter can use HTCondor protocols to place a transformed copy of the CE job directly into the batch system\u2019s scheduler, meaning that the routed and batch system jobs are one and the same. Thus, there are three representations of your job, each with its own ID (see diagram below): Access point: the HTCondor job ID in the original queue HTCondor-CE: the incoming CE job\u2019s ID HTCondor batch system: the routed job\u2019s ID In an HTCondor-CE/HTCondor setup, files are transferred from HTCondor-CE\u2019s spool directory to the batch system\u2019s spool directory using internal HTCondor protocols. Note The JobRouter copies the job directly into the batch system and does not make use of condor_submit . This means that if the HTCondor batch system is configured to add attributes to incoming jobs when they are submitted (i.e., SUBMIT_EXPRS ), these attributes will not be added to the routed jobs.","title":"On HTCondor batch systems"},{"location":"compute-element/htcondor-ce-overview/#on-other-batch-systems","text":"For non-HTCondor batch systems, the JobRouter transforms the CE job into a routed job on the CE and the routed job submits a job into the batch system via a process called the BLAHP. Thus, there are four representations of your job, each with its own ID (see diagram below): Login node: the HTCondor job ID in the original queue HTCondor-CE: the incoming CE job\u2019s ID and the routed job\u2019s ID HTCondor batch system: the batch system\u2019s job ID Although the following figure specifies the PBS case, it applies to all non-HTCondor batch systems: With non-HTCondor batch systems, HTCondor-CE cannot use internal HTCondor protocols to transfer files so its spool directory must be exported to a shared file system that is mounted on the batch system\u2019s worker nodes.","title":"On other batch systems"},{"location":"compute-element/htcondor-ce-overview/#how-the-ce-is-customized","text":"Aside from the basic configuration required in the CE installation, there are two main ways to customize your CE (if you decide any customization is required at all): Deciding which collaborations are allowed to run at your site: collaborations will submit resource allocation requests to your CE using bearer tokens, and you can configure which collaboration's tokens you are willing to accept. 
How to filter and transform the CE jobs to be run on your batch system: Filtering and transforming CE jobs (i.e., setting site-specific attributes or resource limits), requires configuration of your site\u2019s job routes. For examples of common job routes, consult the JobRouter recipes page. Note If you are running HTCondor as your batch system, you will have two HTCondor configurations side-by-side (one residing in /etc/condor/ and the other in /etc/condor-ce ) and will need to make sure to differentiate the two when editing any configuration.","title":"How the CE is Customized"},{"location":"compute-element/htcondor-ce-overview/#how-security-works","text":"Among OSG services, communication is secured between various parties using a combination of PKI infrastructure involving Certificate Authorities (CAs) and bearer tokens. Services such as a Compute Entrypoint, present host certificates to prove their identity to clients, much like your browser verifies websites that you may visit. And to use these services, clients present bearer tokens declaring their association with a given collaboration and what permissions the collaboration has given the client. In turn, the service may be configured to authorize the client based on their collaboration.","title":"How Security Works"},{"location":"compute-element/htcondor-ce-overview/#next-steps","text":"Once the basic installation is done, additional activities include: Setting up job routes to customize incoming jobs Submitting jobs to a HTCondor-CE Troubleshooting the HTCondor-CE Register the CE Register with the OSG GlideinWMS factories and/or the ATLAS AutoPyFactory","title":"Next steps"},{"location":"compute-element/install-htcondor-ce/","text":"Installing and Maintaining HTCondor-CE \u00b6 The HTCondor-CE software is a job gateway for an OSG Compute Entrypoint (CE). As such, the OSG will submit resource allocation requests (RARs) jobs to your HTCondor-CE and it will handle authorization and delegation of RARs to your local batch system. In OSG today, RARs are sent to CEs as pilot jobs from a factory, which in turn are able to accept and run end-user jobs. See the upstream documentation for a more detailed introduction. Use this page to learn how to install, configure, run, test, and troubleshoot an OSG HTCondor-CE. OSG Hosted CE Unless you plan on running more than 10k concurrently running RARs or plan on making frequent configuration changes, we suggest requesting an OSG Hosted CE . Note If you are installing an HTCondor-CE for use outside of the OSG, consult the upstream documentation instead. Before Starting \u00b6 Before starting the installation process, consider the following points, consulting the upstream references as needed ( HTCondor-CE 23 ): User IDs: If they do not exist already, the installation will create the Linux users condor (UID 4716) and gratia You will also need to create Unix accounts for each collaboration that you wish to support. See details in the 'Configuring authentication' section below . SSL certificate: The HTCondor-CE service uses a host certificate and an accompanying key. If using a Let's Encrypt cert, install these as /etc/pki/tls/certs/localhost.crt and /etc/pki/tls/private/localhost.key If using an IGTF cert, install these as /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem See details in the Host Certificates overview . 
DNS entries: Forward and reverse DNS must resolve for the HTCondor-CE host Network ports: The pilot factories must be able to contact your HTCondor-CE service on port 9619 (TCP) Access point/login node: HTCondor-CE should be installed on a host that already has the ability to submit jobs into your local cluster File Systems : Non-HTCondor batch systems require a shared file system between the HTCondor-CE host and the batch system worker nodes. As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Install the appropriate EPEL and OSG Yum repositories for your operating system. Obtain root access to the host Install CA certificates Installing HTCondor-CE \u00b6 An HTCondor-CE installation consists of the job gateway (i.e., the HTCondor-CE job router) and other support software (e.g., osg-configure , a Gratia probe for OSG accounting). To simplify installation, OSG provides convenience RPMs that install all required software. Clean yum cache: root@host # yum clean all --enablerepo = * Update software: root@host # yum update This command will update all packages (Optional) If your batch system is already installed via non-RPM means and is in the following list, install the appropriate 'empty' RPM. Otherwise, skip to the next step. If your batch system is\u2026 Then run the following command\u2026 HTCondor yum install empty-condor --enablerepo=osg-empty SLURM yum install empty-slurm --enablerepo=osg-empty (Optional) If your HTCondor batch system is already installed via non-OSG RPM means, add the line below to /etc/yum.repos.d/osg.repo . Otherwise, skip to the next step. exclude=condor Select the appropriate convenience RPM: If your batch system is... Then use the following package... HTCondor osg-ce-condor LSF osg-ce-lsf PBS osg-ce-pbs SGE osg-ce-sge SLURM osg-ce-slurm Install the CE software where is the package you selected in the above step.: root@host # yum install Configuring HTCondor-CE \u00b6 There are a few required configuration steps to connect HTCondor-CE with your batch system and authentication method. For more advanced configuration, see the section on optional configurations . Configuring the local batch system \u00b6 To configure HTCondor-CE to integrate with your local batch system, please refer to the upstream documentation . Configuring authentication \u00b6 HTCondor-CE clients will submit RARs accompanied by bearer tokens declaring their association with a given collaboration and what permissions the collaboration has given the client The osg-scitokens-mapfile , pulled in by the osg-ce package, provides default token to local user mappings. To accept RARs from a particular collaboration: Create the Unix account(s) corresponding to the last field in the default mapfile: /usr/share/condor-ce/mapfiles.d/osg-scitokens-mapfile.conf . For example, to add support for the OSPool, create the osg user account on the CE and across your cluster. (Optional) if you wish to change the user mapping, copy the relevant mapping from /usr/share/condor-ce/mapfiles.d/osg-scitokens-mapfile.conf to a .conf file in /etc/condor-ce/mapfiles.d/ and change the last field to the desired username. 
For example, if you wish to add support for the OSPool but prefer to map OSPool pilot jobs to the osgpilot account that you created on your CE and across your cluster, you could add the following to /etc/condor-ce/mapfiles.d/50-ospool.conf : # OSG SCITOKENS /^https\\:\\/\\/scitokens\\.org\\/osg\\-connect,/ osgpilot For more details of the mapfile format, consult the \"SciTokens\" section of the upstream documentation . Bannning a collaboration \u00b6 Implicit banning Note that if you have not created the mapped user per the above section , it is not strictly necessary to add a ban mapping. HTCondor-CE will only authenticate remote RAR submission for the relevant credential if the Unix user exists. To explicitly ban a remote submitter from your HTCondor-CE, add a line like the following to a file in /etc/condor-ce/mapfiles.d/*.conf : SCITOKENS /,/ @banned.htcondor.org Replacing with a regular expression and with an arbitrary user name. For example, to ban OSPool pilots from your site, you could add the following to /etc/condor-ce/config.d/99-bans.conf : SCITOKENS /^https\\:\\/\\/scitokens\\.org\\/osg\\-connect,/ osgpilot@banned.htcondor.org Automatic configuration \u00b6 The OSG CE metapackage brings along a configuration tool, osg-configure , that is designed to automatically configure the different pieces of software required for an OSG HTCondor-CE: Enable your batch system in the HTCondor-CE configuration by editing the enabled field in the /etc/osg/config.d/20-.ini : enabled = True Read through the other .ini files in the /etc/osg/config.d directory and make any necessary changes. See the osg-configure documentation for details. Validate the configuration settings root@host # osg-configure -v Fix any errors (at least) that osg-configure reports. Once the validation command succeeds without errors, apply the configuration settings: root@host # osg-configure -c Optional configuration \u00b6 In addition to the configurations above, you may need to further configure how pilot jobs are filtered and transformed before they are submitted to your local batch system or otherwise change the behavior of your CE. For detailed instructions, please refer to the upstream documentation: Configuring the Job Router Optional configuration Accounting with multiple CEs or local user jobs \u00b6 Note For non-HTCondor batch systems only If your site has multiple CEs or you have local users submitting to the same local batch system, the OSG accounting software needs to be configured so that it doesn't over report the number of jobs. Modify the value of SuppressNoDNRecords in /etc/gratia/htcondor-ce/ProbeConfig on each of your CE's so that it reads: SuppressNoDNRecords=\"1\" Starting and Validating HTCondor-CE \u00b6 For information on how to start and validate the core HTCondor-CE services, please refer to the upstream documentation Troubleshooting HTCondor-CE \u00b6 For information on how to troubleshoot your HTCondor-CE, please refer to the upstream documentation: Common issues Debugging tools Helpful logs Registering the CE \u00b6 To contribute capacity, your CE must be registered with the OSG Consortium . To register your resource: Identify the facility, site, and resource group where your HTCondor-CE is hosted. For example, the Center for High Throughput Computing at the University of Wisconsin-Madison uses the following information: Facility: University of Wisconsin Site: CHTC Resource Group: CHTC Using the above information, create or update the appropriate YAML file, using this template as a guide. 
Getting Help \u00b6 To get assistance, please use the this page .","title":"Install HTCondor-CE"},{"location":"compute-element/install-htcondor-ce/#installing-and-maintaining-htcondor-ce","text":"The HTCondor-CE software is a job gateway for an OSG Compute Entrypoint (CE). As such, the OSG will submit resource allocation requests (RARs) jobs to your HTCondor-CE and it will handle authorization and delegation of RARs to your local batch system. In OSG today, RARs are sent to CEs as pilot jobs from a factory, which in turn are able to accept and run end-user jobs. See the upstream documentation for a more detailed introduction. Use this page to learn how to install, configure, run, test, and troubleshoot an OSG HTCondor-CE. OSG Hosted CE Unless you plan on running more than 10k concurrently running RARs or plan on making frequent configuration changes, we suggest requesting an OSG Hosted CE . Note If you are installing an HTCondor-CE for use outside of the OSG, consult the upstream documentation instead.","title":"Installing and Maintaining HTCondor-CE"},{"location":"compute-element/install-htcondor-ce/#before-starting","text":"Before starting the installation process, consider the following points, consulting the upstream references as needed ( HTCondor-CE 23 ): User IDs: If they do not exist already, the installation will create the Linux users condor (UID 4716) and gratia You will also need to create Unix accounts for each collaboration that you wish to support. See details in the 'Configuring authentication' section below . SSL certificate: The HTCondor-CE service uses a host certificate and an accompanying key. If using a Let's Encrypt cert, install these as /etc/pki/tls/certs/localhost.crt and /etc/pki/tls/private/localhost.key If using an IGTF cert, install these as /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem See details in the Host Certificates overview . DNS entries: Forward and reverse DNS must resolve for the HTCondor-CE host Network ports: The pilot factories must be able to contact your HTCondor-CE service on port 9619 (TCP) Access point/login node: HTCondor-CE should be installed on a host that already has the ability to submit jobs into your local cluster File Systems : Non-HTCondor batch systems require a shared file system between the HTCondor-CE host and the batch system worker nodes. As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Install the appropriate EPEL and OSG Yum repositories for your operating system. Obtain root access to the host Install CA certificates","title":"Before Starting"},{"location":"compute-element/install-htcondor-ce/#installing-htcondor-ce","text":"An HTCondor-CE installation consists of the job gateway (i.e., the HTCondor-CE job router) and other support software (e.g., osg-configure , a Gratia probe for OSG accounting). To simplify installation, OSG provides convenience RPMs that install all required software. Clean yum cache: root@host # yum clean all --enablerepo = * Update software: root@host # yum update This command will update all packages (Optional) If your batch system is already installed via non-RPM means and is in the following list, install the appropriate 'empty' RPM. Otherwise, skip to the next step. 
If your batch system is\u2026 Then run the following command\u2026 HTCondor yum install empty-condor --enablerepo=osg-empty SLURM yum install empty-slurm --enablerepo=osg-empty (Optional) If your HTCondor batch system is already installed via non-OSG RPM means, add the line below to /etc/yum.repos.d/osg.repo . Otherwise, skip to the next step. exclude=condor Select the appropriate convenience RPM: If your batch system is... Then use the following package... HTCondor osg-ce-condor LSF osg-ce-lsf PBS osg-ce-pbs SGE osg-ce-sge SLURM osg-ce-slurm Install the CE software where is the package you selected in the above step.: root@host # yum install ","title":"Installing HTCondor-CE"},{"location":"compute-element/install-htcondor-ce/#configuring-htcondor-ce","text":"There are a few required configuration steps to connect HTCondor-CE with your batch system and authentication method. For more advanced configuration, see the section on optional configurations .","title":"Configuring HTCondor-CE"},{"location":"compute-element/install-htcondor-ce/#configuring-the-local-batch-system","text":"To configure HTCondor-CE to integrate with your local batch system, please refer to the upstream documentation .","title":"Configuring the local batch system"},{"location":"compute-element/install-htcondor-ce/#configuring-authentication","text":"HTCondor-CE clients will submit RARs accompanied by bearer tokens declaring their association with a given collaboration and what permissions the collaboration has given the client The osg-scitokens-mapfile , pulled in by the osg-ce package, provides default token to local user mappings. To accept RARs from a particular collaboration: Create the Unix account(s) corresponding to the last field in the default mapfile: /usr/share/condor-ce/mapfiles.d/osg-scitokens-mapfile.conf . For example, to add support for the OSPool, create the osg user account on the CE and across your cluster. (Optional) if you wish to change the user mapping, copy the relevant mapping from /usr/share/condor-ce/mapfiles.d/osg-scitokens-mapfile.conf to a .conf file in /etc/condor-ce/mapfiles.d/ and change the last field to the desired username. For example, if you wish to add support for the OSPool but prefer to map OSPool pilot jobs to the osgpilot account that you created on your CE and across your cluster, you could add the following to /etc/condor-ce/mapfiles.d/50-ospool.conf : # OSG SCITOKENS /^https\\:\\/\\/scitokens\\.org\\/osg\\-connect,/ osgpilot For more details of the mapfile format, consult the \"SciTokens\" section of the upstream documentation .","title":"Configuring authentication"},{"location":"compute-element/install-htcondor-ce/#bannning-a-collaboration","text":"Implicit banning Note that if you have not created the mapped user per the above section , it is not strictly necessary to add a ban mapping. HTCondor-CE will only authenticate remote RAR submission for the relevant credential if the Unix user exists. To explicitly ban a remote submitter from your HTCondor-CE, add a line like the following to a file in /etc/condor-ce/mapfiles.d/*.conf : SCITOKENS /,/ @banned.htcondor.org Replacing with a regular expression and with an arbitrary user name. 
For example, to ban OSPool pilots from your site, you could add the following to /etc/condor-ce/config.d/99-bans.conf : SCITOKENS /^https\\:\\/\\/scitokens\\.org\\/osg\\-connect,/ osgpilot@banned.htcondor.org","title":"Bannning a collaboration"},{"location":"compute-element/install-htcondor-ce/#automatic-configuration","text":"The OSG CE metapackage brings along a configuration tool, osg-configure , that is designed to automatically configure the different pieces of software required for an OSG HTCondor-CE: Enable your batch system in the HTCondor-CE configuration by editing the enabled field in the /etc/osg/config.d/20-.ini : enabled = True Read through the other .ini files in the /etc/osg/config.d directory and make any necessary changes. See the osg-configure documentation for details. Validate the configuration settings root@host # osg-configure -v Fix any errors (at least) that osg-configure reports. Once the validation command succeeds without errors, apply the configuration settings: root@host # osg-configure -c","title":"Automatic configuration"},{"location":"compute-element/install-htcondor-ce/#optional-configuration","text":"In addition to the configurations above, you may need to further configure how pilot jobs are filtered and transformed before they are submitted to your local batch system or otherwise change the behavior of your CE. For detailed instructions, please refer to the upstream documentation: Configuring the Job Router Optional configuration","title":"Optional configuration"},{"location":"compute-element/install-htcondor-ce/#accounting-with-multiple-ces-or-local-user-jobs","text":"Note For non-HTCondor batch systems only If your site has multiple CEs or you have local users submitting to the same local batch system, the OSG accounting software needs to be configured so that it doesn't over report the number of jobs. Modify the value of SuppressNoDNRecords in /etc/gratia/htcondor-ce/ProbeConfig on each of your CE's so that it reads: SuppressNoDNRecords=\"1\"","title":"Accounting with multiple CEs or local user jobs"},{"location":"compute-element/install-htcondor-ce/#starting-and-validating-htcondor-ce","text":"For information on how to start and validate the core HTCondor-CE services, please refer to the upstream documentation","title":"Starting and Validating HTCondor-CE"},{"location":"compute-element/install-htcondor-ce/#troubleshooting-htcondor-ce","text":"For information on how to troubleshoot your HTCondor-CE, please refer to the upstream documentation: Common issues Debugging tools Helpful logs","title":"Troubleshooting HTCondor-CE"},{"location":"compute-element/install-htcondor-ce/#registering-the-ce","text":"To contribute capacity, your CE must be registered with the OSG Consortium . To register your resource: Identify the facility, site, and resource group where your HTCondor-CE is hosted. 
For example, the Center for High Throughput Computing at the University of Wisconsin-Madison uses the following information: Facility: University of Wisconsin Site: CHTC Resource Group: CHTC Using the above information, create or update the appropriate YAML file, using this template as a guide.","title":"Registering the CE"},{"location":"compute-element/install-htcondor-ce/#getting-help","text":"To get assistance, please use the this page .","title":"Getting Help"},{"location":"compute-element/job-router-recipes/","text":"Up-to-date documentation can be found at https://osg-htc.org/docs/compute-element/install-htcondor-ce/","title":"Job router recipes"},{"location":"compute-element/slurm-recipes/","text":"Slurm Configuration Recipes \u00b6 This document contains examples of common Slurm configurations used by sites to contribute capacity to the OSPool. Contributing X% of Your Cluster \u00b6 To contribute a percentage of your Slurm cluster to the OSPool, set aside a number of whole nodes for a dedicated OSPool partition : Determine the percentage of your cluster that you would like to contribute and use that to calculate the number of cores to meet that percentage Select nodes and sum the number of cores to meet your desired contribution In slurm.conf , configure the NodeName for each type of chassis and assign specific nodes to PartitionName=ospool For example, if your cluster is 5120 cores and you wanted to contribute 10% of the cluster to the OSPool, your slurm.conf could contain the following: # Dell PowerEdge C6525, AMD EPYC 7513 32-Core Processor @ 2.6GHz NodeName=spark-a[002-004,006-028] CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=1 RealMemory=256000 State=UNKNOWN Features=amd,avx,avx2 # Dell PowerEdge R6525, AMD EPYC 7763 64-Core Processor NodeName=spark-a[029-071,204-206] CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=64 ThreadsPerCore=1 RealMemory=512000 State=UNKNOWN Features=amd,avx,avx2 # OSPool Partition, -- 10% of Shared is approx 512 cores; 6x64cores + 1x128 cores = 512 PartitionName=ospool State=UP Nodes=spark-a[002-004,006-008,029] DefaultTime=0-04:00:00 MaxTime=1-00:00:00 PreemptMode=OFF Priority=50 AllowGroups=slurm-admin,osg01","title":"Slurm recipes"},{"location":"compute-element/slurm-recipes/#slurm-configuration-recipes","text":"This document contains examples of common Slurm configurations used by sites to contribute capacity to the OSPool.","title":"Slurm Configuration Recipes"},{"location":"compute-element/slurm-recipes/#contributing-x-of-your-cluster","text":"To contribute a percentage of your Slurm cluster to the OSPool, set aside a number of whole nodes for a dedicated OSPool partition : Determine the percentage of your cluster that you would like to contribute and use that to calculate the number of cores to meet that percentage Select nodes and sum the number of cores to meet your desired contribution In slurm.conf , configure the NodeName for each type of chassis and assign specific nodes to PartitionName=ospool For example, if your cluster is 5120 cores and you wanted to contribute 10% of the cluster to the OSPool, your slurm.conf could contain the following: # Dell PowerEdge C6525, AMD EPYC 7513 32-Core Processor @ 2.6GHz NodeName=spark-a[002-004,006-028] CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=1 RealMemory=256000 State=UNKNOWN Features=amd,avx,avx2 # Dell PowerEdge R6525, AMD EPYC 7763 64-Core Processor NodeName=spark-a[029-071,204-206] CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=64 
ThreadsPerCore=1 RealMemory=512000 State=UNKNOWN Features=amd,avx,avx2 # OSPool Partition, -- 10% of Shared is approx 512 cores; 6x64cores + 1x128 cores = 512 PartitionName=ospool State=UP Nodes=spark-a[002-004,006-008,029] DefaultTime=0-04:00:00 MaxTime=1-00:00:00 PreemptMode=OFF Priority=50 AllowGroups=slurm-admin,osg01","title":"Contributing X% of Your Cluster"},{"location":"compute-element/submit-htcondor-ce/","text":"Up-to-date documentation can be found at https://osg-htc.org/docs/compute-element/install-htcondor-ce/","title":"Submit htcondor ce"},{"location":"compute-element/troubleshoot-htcondor-ce/","text":"Up-to-date documentation can be found at https://osg-htc.org/docs/compute-element/install-htcondor-ce/","title":"Troubleshoot htcondor ce"},{"location":"data/external-oasis-repos/","text":"Install an OASIS Repository \u00b6 OASIS (the OSG A pplication S oftware I nstallation S ervice) is an infrastructure, based on CVMFS , for distributing software throughout the OSG. Once software is installed into an OASIS repository, the goal is to make it available across about 90% of the OSG within an hour. OASIS consists of keysigning infrastructure, a content distribution network (CDN), and a shared CVMFS repository that is hosted by the OSG. Many use cases will be covered by utilizing the shared repository ; this document covers how to install, configure, and host your own CVMFS repository server . This server will distribute software via OASIS, but will be hosted and operated externally from the OSG project. OASIS-based distribution and key signing is available to OSG VOs or repositories affiliated with an OSG VO. See the policy page for more information on what repositories OSG is willing to distribute. Before Starting \u00b6 The host OS must be: RHEL7 or RHEL8 (or equivalent). Additionally, User IDs: If it does not exist already, the installation will create the cvmfs Linux user Group IDs: If they do not exist already, the installation will create the Linux groups cvmfs and fuse Network ports: This page will configure the repository to distribute using Apache HTTPD on port 8000. At the minimum, the repository needs in-bound access from the OASIS CDN. Disk space: This host will need enough free disk space to host two copies of the software: one compressed and one uncompressed. /srv/cvmfs will hold all the published data (compressed and de-deuplicated). The /var/spool/cvmfs directory will contain all the data in all current transactions (uncompressed). Root access will be needed to install. Installation of software into the repository itself will be done as an unprivileged user. Yum will need to be configured to use the OSG repositories . Overlay-FS limitations CVMFS on RHEL7 only supports Overlay-FS if the underlying filesystem is ext3 or ext4 ; make sure /var/spool/cvmfs is one of these filesystem types. If this is not possible, add CVMFS_DONT_CHECK_OVERLAYFS_VERSION=yes to your CVMFS configuration. 
Using xfs will work if it was created with ftype=1 Installation \u00b6 Installation is a straightforward install via yum : root@host # yum install cvmfs-server osg-oasis Apache and Repository Mounts \u00b6 For all installs, we recommend mounting all the local repositories on startup: root@host # echo \"cvmfs_server mount -a\" >>/etc/rc.local root@host # chmod +x /etc/rc.local The Apache HTTPD service should be configured to listen on port 8000, have the KeepAlive option enabled, and be started: root@host # echo Listen 8000 >>/etc/httpd/conf.d/cvmfs.conf root@host # echo KeepAlive on >>/etc/httpd/conf.d/cvmfs.conf root@host # chkconfig httpd on root@host # service httpd start Check Firewalls Make sure that port 8000 is available to the Internet. Check the setting of the host- and site-level firewalls. The next steps will fail if the web server is not accessible. Creating a Repository \u00b6 Prior to creation, the repository administrator will need to make two decisions: Select a repository name ; typically, this is derived from the VO or project's name and ends in opensciencegrid.org . For example, the NoVA VO runs the repository nova.opensciencegrid.org . For this section, we will use . Select a repository owner : Software publication will need to run by a non- root Unix user account; for this document, we will use as the account name of the repository owner. The initial repository creation must be run as root : root@host # echo -e \"\\*\\\\t\\\\t-\\\\tnofile\\\\t\\\\t16384\" >>/etc/security/limits.conf root@host # ulimit -n 16384 root@host # cvmfs_server mkfs -o root@host # cat >/srv/cvmfs//.htaccess <>/etc/cvmfs/repositories.d//server.conf <>/etc/cvmfs/repositories.d//server.conf </.cvmfswhitelist | cat -v That should print several lines including some gibberish at the end. Hosting a Repository on OASIS \u00b6 In order to host a repository on OASIS, perform the following steps: Verify your VO's registration is up-to-date . All repositories need to be associated with a VO; the VO needs to assign an OASIS manager in Topology who would be responsible for the contents of any of the VO's repositories and will be contacted in case of issues. To designate an OASIS manager, have the VO manager update the Topology registration . Send a message to OSG support using the following template: Please add a new CVMFS repository to OASIS for VO using the URL http://:8000/cvmfs/ The VO responsible manager will be . Replace the items with the appropriate values. If the repository name matches *.opensciencegrid.org or *.osgstorage.org , wait for the go-ahead from the OSG representative before continuing with the remaining instructions; for all other repositories (such as *.egi.eu ), you are done. When you are told in the ticket to proceed to the next step, first if the repository might be in a transaction abort it: root@host # su -c \"cvmfs_server abort \" Then execute the following commands: root@host # wget -O /srv/cvmfs//.cvmfswhitelist \\ http://oasis.opensciencegrid.org/cvmfs//.cvmfswhitelist root@host # cp /etc/cvmfs/keys/opensciencegrid.org/opensciencegrid.org.pub \\ /etc/cvmfs/keys/.pub Replace as appropriate. If the cp command prompts about overwriting an existing file, type 'y'. Verify that publishing operation succeeds: root@host # su -c \"cvmfs_server transaction \" root@host # su -c \"cvmfs_server publish \" Within an hour, the repository updates should appear at the OSG Operations and FNAL Stratum-1 servers. 
On success, make sure the whitelist update happens daily by creating /etc/cron.d/fetch-cvmfs-whitelist with the following contents: 5 4 * * * cd /srv/cvmfs/ && wget -qO .cvmfswhitelist.new http://oasis.opensciencegrid.org/cvmfs//.cvmfswhitelist && mv .cvmfswhitelist.new .cvmfswhitelist Note This cronjob eliminates the need for the repository service administrator to periodically use cvmfs_server resign to update .cvmfswhitelist as described in the upstream CVMFS documentation. Update the open support ticket to indicate that the previous steps have been completed Once the repository is fully replicated on the OSG, the VO may proceed in publishing into CVMFS using the account on the repository server. Tip We strongly recommend the repository maintainer read through the upstream documentation on maintaining repositories and content limitations . Finally, if the new repository will be used outside of the U.S., the VO should open a GGUS ticket following EGI's PROC20 to get the repository replicated onto worldwide Stratum 1s. Replacing an Existing OASIS Repository Server \u00b6 If a need arises to replace a server for an existing *.opensciencegrid.org or *.osgstorage.org repository, there are two ways to do it: one without changing the DNS name and one with changing it. The latter can take longer because it requires OSG Operations intervention. Revision numbers must increase CVMFS does not allow repository revision numbers to decrease, so the instructions below make sure the revision numbers only go up. Without changing the server DNS name \u00b6 If you are recreating the repository on the same machine, use the following command to remove the repository configuration while preserving the data and keys: root@host # cvmfs_server rmfs -p Otherwise if it is a new machine, copy the keys from /etc/cvmfs/keys/ .* and the data from /srv/cvmfs/ from the old server to the new, making sure that no publish operations happen on the old server while you copy the data. Then in either case use cvmfs_server import instead of cvmfs_server mkfs in the above instructions for Creating the Repository , in order to reuse old data and keys. Note that you wil need to reapply any custom configuration changes under /etc/cvmfs/repositories.d/ ` that was on the old server. If you run an old and a new machine in parallel for a while, make sure that when you put the new machine into production (by moving the DNS name) that the new machine has had at least as many publishes as the old machine, so the revision number does not decrease. With changing the server DNS name \u00b6 Note If you create a repository from scratch, as opposed to copying the data and keys from an old server, it is in fact better to change the DNS name of the server because that causes the OSG Operations server to reinitialize the .cvmfswhitelist. If you create a replacement repository on a new machine from scratch, follow the normal instructions on this page above, but with the following differences in the Hosting a Repository on OASIS section: In step 2, instead of asking in the support ticket to create a new repository, give the new URL and ask them to change the repository registration to that URL. When you do the publish in step 5, add a -n NNNN option where NNNN is a revision number greater than the number on the existing repository. That number can be found by this command on a client machine: user@host $ attr -qg revision /cvmfs/ Skip step 6; there is no need to tell OSG Operations when you are finished. 
After enough time has elapsed for the publish to propagate to clients, typically around 15 minutes, verify that the new chosen revision has reached a client. Removing a Repository from OASIS \u00b6 In order to remove a repository that is being hosted on OASIS, perform the following steps: If the repository has been replicated outside of the U.S., open a GGUS ticket assigned to support unit \"Software and Data Distribution (CVMFS)\" asking that the replication be removed from EGI Stratum-1s. Remind them in the ticket that there are worldwide Stratum-1s that automatically replicate all OSG repositories that RAL replicates, so those Stratum-1s cannot remove their replicas before RAL does but their administrators will need to be notified to remove their replicas within 8 hours after RAL does to avoid alarms. Wait until this ticket is resolved before proceeding. Open a support ticket asking to shut down the repository, giving the repository name (e.g., ), and the corresponding VO.","title":"Install an OASIS Repo"},{"location":"data/external-oasis-repos/#install-an-oasis-repository","text":"OASIS (the OSG A pplication S oftware I nstallation S ervice) is an infrastructure, based on CVMFS , for distributing software throughout the OSG. Once software is installed into an OASIS repository, the goal is to make it available across about 90% of the OSG within an hour. OASIS consists of keysigning infrastructure, a content distribution network (CDN), and a shared CVMFS repository that is hosted by the OSG. Many use cases will be covered by utilizing the shared repository ; this document covers how to install, configure, and host your own CVMFS repository server . This server will distribute software via OASIS, but will be hosted and operated externally from the OSG project. OASIS-based distribution and key signing is available to OSG VOs or repositories affiliated with an OSG VO. See the policy page for more information on what repositories OSG is willing to distribute.","title":"Install an OASIS Repository"},{"location":"data/external-oasis-repos/#before-starting","text":"The host OS must be: RHEL7 or RHEL8 (or equivalent). Additionally, User IDs: If it does not exist already, the installation will create the cvmfs Linux user Group IDs: If they do not exist already, the installation will create the Linux groups cvmfs and fuse Network ports: This page will configure the repository to distribute using Apache HTTPD on port 8000. At the minimum, the repository needs in-bound access from the OASIS CDN. Disk space: This host will need enough free disk space to host two copies of the software: one compressed and one uncompressed. /srv/cvmfs will hold all the published data (compressed and de-deuplicated). The /var/spool/cvmfs directory will contain all the data in all current transactions (uncompressed). Root access will be needed to install. Installation of software into the repository itself will be done as an unprivileged user. Yum will need to be configured to use the OSG repositories . Overlay-FS limitations CVMFS on RHEL7 only supports Overlay-FS if the underlying filesystem is ext3 or ext4 ; make sure /var/spool/cvmfs is one of these filesystem types. If this is not possible, add CVMFS_DONT_CHECK_OVERLAYFS_VERSION=yes to your CVMFS configuration. 
Using xfs will work if it was created with ftype=1","title":"Before Starting"},{"location":"data/external-oasis-repos/#installation","text":"Installation is a straightforward install via yum : root@host # yum install cvmfs-server osg-oasis","title":"Installation"},{"location":"data/external-oasis-repos/#apache-and-repository-mounts","text":"For all installs, we recommend mounting all the local repositories on startup: root@host # echo \"cvmfs_server mount -a\" >>/etc/rc.local root@host # chmod +x /etc/rc.local The Apache HTTPD service should be configured to listen on port 8000, have the KeepAlive option enabled, and be started: root@host # echo Listen 8000 >>/etc/httpd/conf.d/cvmfs.conf root@host # echo KeepAlive on >>/etc/httpd/conf.d/cvmfs.conf root@host # chkconfig httpd on root@host # service httpd start Check Firewalls Make sure that port 8000 is available to the Internet. Check the setting of the host- and site-level firewalls. The next steps will fail if the web server is not accessible.","title":"Apache and Repository Mounts"},{"location":"data/external-oasis-repos/#creating-a-repository","text":"Prior to creation, the repository administrator will need to make two decisions: Select a repository name ; typically, this is derived from the VO or project's name and ends in opensciencegrid.org . For example, the NoVA VO runs the repository nova.opensciencegrid.org . For this section, we will use . Select a repository owner : Software publication will need to run by a non- root Unix user account; for this document, we will use as the account name of the repository owner. The initial repository creation must be run as root : root@host # echo -e \"\\*\\\\t\\\\t-\\\\tnofile\\\\t\\\\t16384\" >>/etc/security/limits.conf root@host # ulimit -n 16384 root@host # cvmfs_server mkfs -o root@host # cat >/srv/cvmfs//.htaccess <>/etc/cvmfs/repositories.d//server.conf <>/etc/cvmfs/repositories.d//server.conf </.cvmfswhitelist | cat -v That should print several lines including some gibberish at the end.","title":"Creating a Repository"},{"location":"data/external-oasis-repos/#hosting-a-repository-on-oasis","text":"In order to host a repository on OASIS, perform the following steps: Verify your VO's registration is up-to-date . All repositories need to be associated with a VO; the VO needs to assign an OASIS manager in Topology who would be responsible for the contents of any of the VO's repositories and will be contacted in case of issues. To designate an OASIS manager, have the VO manager update the Topology registration . Send a message to OSG support using the following template: Please add a new CVMFS repository to OASIS for VO using the URL http://:8000/cvmfs/ The VO responsible manager will be . Replace the items with the appropriate values. If the repository name matches *.opensciencegrid.org or *.osgstorage.org , wait for the go-ahead from the OSG representative before continuing with the remaining instructions; for all other repositories (such as *.egi.eu ), you are done. When you are told in the ticket to proceed to the next step, first if the repository might be in a transaction abort it: root@host # su -c \"cvmfs_server abort \" Then execute the following commands: root@host # wget -O /srv/cvmfs//.cvmfswhitelist \\ http://oasis.opensciencegrid.org/cvmfs//.cvmfswhitelist root@host # cp /etc/cvmfs/keys/opensciencegrid.org/opensciencegrid.org.pub \\ /etc/cvmfs/keys/.pub Replace as appropriate. If the cp command prompts about overwriting an existing file, type 'y'. 
Verify that publishing operation succeeds: root@host # su -c \"cvmfs_server transaction \" root@host # su -c \"cvmfs_server publish \" Within an hour, the repository updates should appear at the OSG Operations and FNAL Stratum-1 servers. On success, make sure the whitelist update happens daily by creating /etc/cron.d/fetch-cvmfs-whitelist with the following contents: 5 4 * * * cd /srv/cvmfs/ && wget -qO .cvmfswhitelist.new http://oasis.opensciencegrid.org/cvmfs//.cvmfswhitelist && mv .cvmfswhitelist.new .cvmfswhitelist Note This cronjob eliminates the need for the repository service administrator to periodically use cvmfs_server resign to update .cvmfswhitelist as described in the upstream CVMFS documentation. Update the open support ticket to indicate that the previous steps have been completed Once the repository is fully replicated on the OSG, the VO may proceed in publishing into CVMFS using the account on the repository server. Tip We strongly recommend the repository maintainer read through the upstream documentation on maintaining repositories and content limitations . Finally, if the new repository will be used outside of the U.S., the VO should open a GGUS ticket following EGI's PROC20 to get the repository replicated onto worldwide Stratum 1s.","title":"Hosting a Repository on OASIS"},{"location":"data/external-oasis-repos/#replacing-an-existing-oasis-repository-server","text":"If a need arises to replace a server for an existing *.opensciencegrid.org or *.osgstorage.org repository, there are two ways to do it: one without changing the DNS name and one with changing it. The latter can take longer because it requires OSG Operations intervention. Revision numbers must increase CVMFS does not allow repository revision numbers to decrease, so the instructions below make sure the revision numbers only go up.","title":"Replacing an Existing OASIS Repository Server"},{"location":"data/external-oasis-repos/#without-changing-the-server-dns-name","text":"If you are recreating the repository on the same machine, use the following command to remove the repository configuration while preserving the data and keys: root@host # cvmfs_server rmfs -p Otherwise if it is a new machine, copy the keys from /etc/cvmfs/keys/ .* and the data from /srv/cvmfs/ from the old server to the new, making sure that no publish operations happen on the old server while you copy the data. Then in either case use cvmfs_server import instead of cvmfs_server mkfs in the above instructions for Creating the Repository , in order to reuse old data and keys. Note that you wil need to reapply any custom configuration changes under /etc/cvmfs/repositories.d/ ` that was on the old server. If you run an old and a new machine in parallel for a while, make sure that when you put the new machine into production (by moving the DNS name) that the new machine has had at least as many publishes as the old machine, so the revision number does not decrease.","title":"Without changing the server DNS name"},{"location":"data/external-oasis-repos/#with-changing-the-server-dns-name","text":"Note If you create a repository from scratch, as opposed to copying the data and keys from an old server, it is in fact better to change the DNS name of the server because that causes the OSG Operations server to reinitialize the .cvmfswhitelist. 
If you create a replacement repository on a new machine from scratch, follow the normal instructions on this page above, but with the following differences in the Hosting a Repository on OASIS section: In step 2, instead of asking in the support ticket to create a new repository, give the new URL and ask them to change the repository registration to that URL. When you do the publish in step 5, add a -n NNNN option where NNNN is a revision number greater than the number on the existing repository. That number can be found by this command on a client machine: user@host $ attr -qg revision /cvmfs/ Skip step 6; there is no need to tell OSG Operations when you are finished. After enough time has elapsed for the publish to propagate to clients, typically around 15 minutes, verify that the new chosen revision has reached a client.","title":"With changing the server DNS name"},{"location":"data/external-oasis-repos/#removing-a-repository-from-oasis","text":"In order to remove a repository that is being hosted on OASIS, perform the following steps: If the repository has been replicated outside of the U.S., open a GGUS ticket assigned to support unit \"Software and Data Distribution (CVMFS)\" asking that the replication be removed from EGI Stratum-1s. Remind them in the ticket that there are worldwide Stratum-1s that automatically replicate all OSG repositories that RAL replicates, so those Stratum-1s cannot remove their replicas before RAL does but their administrators will need to be notified to remove their replicas within 8 hours after RAL does to avoid alarms. Wait until this ticket is resolved before proceeding. Open a support ticket asking to shut down the repository, giving the repository name (e.g., ), and the corresponding VO.","title":"Removing a Repository from OASIS"},{"location":"data/frontier-squid/","text":"Install the Frontier Squid HTTP Caching Proxy \u00b6 Frontier Squid is a distribution of the well-known squid HTTP caching proxy software that is optimized for use with applications on the Worldwide LHC Computing Grid (WLCG). It has many advantages over regular squid for common distributed computing applications, especially Frontier and CVMFS. The OSG distribution of frontier-squid is a straight rebuild of the upstream frontier-squid package for the convenience of OSG users. This document is intended for System Administrators who are installing frontier-squid , the OSG distribution of the Frontier Squid software. Frontier Squid Is Recommended \u00b6 OSG recommends that all sites run a caching proxy for HTTP and HTTPS to help reduce bandwidth and improve throughput. To that end, Compute Element (CE) installations include Frontier Squid automatically. We encourage all sites to configure and use this service, as described below. For large sites that expect heavy load on the proxy, it is best to run the proxy on its own host. If you are unsure if your site qualifies, we recommend initially running the proxy on your CE host and monitoring its bandwidth. If the network usage regularly peaks at over one third of the bandwidth capacity, move the proxy to a new host. Before Starting \u00b6 Before starting the installation process, consider the following points (consulting the Reference section below as needed): User IDs: If it does not exist already, the installation will create the squid Linux user Network ports: Clients within your cluster (e.g., OSG user jobs) will communicate with Frontier Squid on port 3128 (TCP). 
Additionally, central infrastructure will monitor Frontier Squid through port 3401 (UDP); see this section for more details. Host choice: If you will be supporting the Frontier application at your site, review the upstream documentation to determine how to size your equipment. As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Obtain root access to the host Prepare the required Yum repositories Installing Frontier Squid \u00b6 To install Frontier Squid, make sure that your host is up to date before installing the required packages: Clean yum cache: root@host # yum clean all --enablerepo = * Update software: root@host # yum update This command will update all packages Install Frontier Squid: root@host # yum install frontier-squid Configuring Frontier Squid \u00b6 Configuring the Frontier Squid Service \u00b6 To configure the Frontier Squid service itself: Follow the Configuration section of the upstream Frontier Squid documentation . Enable, start, and test the service (as described below). Register the squid (also as described below ). Note An important difference between the standard Squid software and the Frontier Squid variant is that Frontier Squid changes are in /etc/squid/customize.sh instead of /etc/squid/squid.conf . Configuring the OSG CE \u00b6 To configure the OSG Compute Entrypoint (CE) to know about your Frontier Squid service: On your CE host (which may be different than your Frontier Squid host), edit /etc/osg/config.d/01-squid.ini Make sure that enabled is set to True Set location to the hostname and port of your Frontier Squid service (e.g., my.squid.host.edu:3128 ) Leave the other settings at DEFAULT unless you have specific reasons to change them Run osg-configure -c to propagate the changes on your CE. Note You may want to finish other CE configuration tasks before running osg-configure . Just be sure to run it once before starting CE services. Using Frontier-Squid \u00b6 Start the frontier-squid service and enable it to start at boot time. As a reminder, here are common service commands (all run as root ): To... Run the command... Start the service systemctl start frontier-squid Stop the service systemctl stop frontier-squid Enable the service to start on boot systemctl enable frontier-squid Disable the service from starting on boot systemctl disable frontier-squid Validating Frontier Squid \u00b6 As any user on another computer, do the following (where is the fully qualified domain name of your squid server): user@host $ export http_proxy = http:// ` ` :3128 user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: MISS from `` user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: HIT from `` If the grep doesn't print anything, try removing it from the pipeline to see if errors are obvious. If the second try says MISS again, something is probably wrong with the squid cache writes. Look at the squid access.log file to try to see what's wrong. If your squid will be supporting the Frontier application, it is also good to do the test in the upstream documentation Testing the installation section . Registering Frontier Squid \u00b6 To register your Frontier Squid host, follow the general registration instructions here with the following Frontier Squid-specific details. Alternatively, contact us for assistance with the registration process. 
Add a Squid: section to the Services: list, with any relevant fields for that service. This is a partial example: ... FQDN: Services: Squid: Description: Generic squid service ... Replacing with your Frontier Squid server's DNS entry or in the case of multiple Frontier Squid servers for a single resource, the round-robin DNS entry. See the BNL_ATLAS_Frontier_Squid for a complete example. Normally registered squids will be monitored by WLCG. This is strongly recommended even for non-WLCG sites so operations experts can help with diagnosing problems. However, if a site declines monitoring, that can be indicated by setting Monitored: false in a Details: section below Description: . Registration is still important for the sake of excluding squids from worker node failover monitors. The default if Details: Monitored: is not set is true . If you set Monitored to true, also enable monitoring as described in the upstream documentation on enabling monitoring . A few hours after a squid is registered and marked Active (and not marked Monitored: false ), verify that it is monitored by WLCG . Reference \u00b6 Users \u00b6 The frontier-squid installation will create one user account unless it already exists. User Comment squid Reduced privilege user that the squid process runs under. Set the default gid of the \"squid\" user to be a group that is also called \"squid\". The package can instead use another user name of your choice if you create a configuration file before installation. Details are in the upstream documentation Preparation section . Networking \u00b6 Open the following ports on your Frontier Squid hosts: Port Number Protocol WAN LAN Comment 3128 tcp \u2713 Also limited in squid ACLs. Should be limited to access from your worker nodes 3401 udp \u2713 Also limited in squid ACLs. Should be limited to public monitoring server addresses The addresses of the WLCG monitoring servers for use in firewalls are listed in the upstream documentation Enabling monitoring section . Frontier Squid Log Files \u00b6 Log file contents are explained in the upstream documentation Log file contents section .","title":"Install Frontier Squid RPM"},{"location":"data/frontier-squid/#install-the-frontier-squid-http-caching-proxy","text":"Frontier Squid is a distribution of the well-known squid HTTP caching proxy software that is optimized for use with applications on the Worldwide LHC Computing Grid (WLCG). It has many advantages over regular squid for common distributed computing applications, especially Frontier and CVMFS. The OSG distribution of frontier-squid is a straight rebuild of the upstream frontier-squid package for the convenience of OSG users. This document is intended for System Administrators who are installing frontier-squid , the OSG distribution of the Frontier Squid software.","title":"Install the Frontier Squid HTTP Caching Proxy"},{"location":"data/frontier-squid/#frontier-squid-is-recommended","text":"OSG recommends that all sites run a caching proxy for HTTP and HTTPS to help reduce bandwidth and improve throughput. To that end, Compute Element (CE) installations include Frontier Squid automatically. We encourage all sites to configure and use this service, as described below. For large sites that expect heavy load on the proxy, it is best to run the proxy on its own host. If you are unsure if your site qualifies, we recommend initially running the proxy on your CE host and monitoring its bandwidth. 
If the network usage regularly peaks at over one third of the bandwidth capacity, move the proxy to a new host.","title":"Frontier Squid Is Recommended"},{"location":"data/frontier-squid/#before-starting","text":"Before starting the installation process, consider the following points (consulting the Reference section below as needed): User IDs: If it does not exist already, the installation will create the squid Linux user Network ports: Clients within your cluster (e.g., OSG user jobs) will communicate with Frontier Squid on port 3128 (TCP). Additionally, central infrastructure will monitor Frontier Squid through port 3401 (UDP); see this section for more details. Host choice: If you will be supporting the Frontier application at your site, review the upstream documentation to determine how to size your equipment. As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Obtain root access to the host Prepare the required Yum repositories","title":"Before Starting"},{"location":"data/frontier-squid/#installing-frontier-squid","text":"To install Frontier Squid, make sure that your host is up to date before installing the required packages: Clean yum cache: root@host # yum clean all --enablerepo = * Update software: root@host # yum update This command will update all packages Install Frontier Squid: root@host # yum install frontier-squid","title":"Installing Frontier Squid"},{"location":"data/frontier-squid/#configuring-frontier-squid","text":"","title":"Configuring Frontier Squid"},{"location":"data/frontier-squid/#configuring-the-frontier-squid-service","text":"To configure the Frontier Squid service itself: Follow the Configuration section of the upstream Frontier Squid documentation . Enable, start, and test the service (as described below). Register the squid (also as described below ). Note An important difference between the standard Squid software and the Frontier Squid variant is that Frontier Squid changes are in /etc/squid/customize.sh instead of /etc/squid/squid.conf .","title":"Configuring the Frontier Squid Service"},{"location":"data/frontier-squid/#configuring-the-osg-ce","text":"To configure the OSG Compute Entrypoint (CE) to know about your Frontier Squid service: On your CE host (which may be different than your Frontier Squid host), edit /etc/osg/config.d/01-squid.ini Make sure that enabled is set to True Set location to the hostname and port of your Frontier Squid service (e.g., my.squid.host.edu:3128 ) Leave the other settings at DEFAULT unless you have specific reasons to change them Run osg-configure -c to propagate the changes on your CE. Note You may want to finish other CE configuration tasks before running osg-configure . Just be sure to run it once before starting CE services.","title":"Configuring the OSG CE"},{"location":"data/frontier-squid/#using-frontier-squid","text":"Start the frontier-squid service and enable it to start at boot time. As a reminder, here are common service commands (all run as root ): To... Run the command... 
Start the service systemctl start frontier-squid Stop the service systemctl stop frontier-squid Enable the service to start on boot systemctl enable frontier-squid Disable the service from starting on boot systemctl disable frontier-squid","title":"Using Frontier-Squid"},{"location":"data/frontier-squid/#validating-frontier-squid","text":"As any user on another computer, do the following (where is the fully qualified domain name of your squid server): user@host $ export http_proxy = http:// ` ` :3128 user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: MISS from `` user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: HIT from `` If the grep doesn't print anything, try removing it from the pipeline to see if errors are obvious. If the second try says MISS again, something is probably wrong with the squid cache writes. Look at the squid access.log file to try to see what's wrong. If your squid will be supporting the Frontier application, it is also good to do the test in the upstream documentation Testing the installation section .","title":"Validating Frontier Squid"},{"location":"data/frontier-squid/#registering-frontier-squid","text":"To register your Frontier Squid host, follow the general registration instructions here with the following Frontier Squid-specific details. Alternatively, contact us for assistance with the registration process. Add a Squid: section to the Services: list, with any relevant fields for that service. This is a partial example: ... FQDN: Services: Squid: Description: Generic squid service ... Replacing with your Frontier Squid server's DNS entry or in the case of multiple Frontier Squid servers for a single resource, the round-robin DNS entry. See the BNL_ATLAS_Frontier_Squid for a complete example. Normally registered squids will be monitored by WLCG. This is strongly recommended even for non-WLCG sites so operations experts can help with diagnosing problems. However, if a site declines monitoring, that can be indicated by setting Monitored: false in a Details: section below Description: . Registration is still important for the sake of excluding squids from worker node failover monitors. The default if Details: Monitored: is not set is true . If you set Monitored to true, also enable monitoring as described in the upstream documentation on enabling monitoring . A few hours after a squid is registered and marked Active (and not marked Monitored: false ), verify that it is monitored by WLCG .","title":"Registering Frontier Squid"},{"location":"data/frontier-squid/#reference","text":"","title":"Reference"},{"location":"data/frontier-squid/#users","text":"The frontier-squid installation will create one user account unless it already exists. User Comment squid Reduced privilege user that the squid process runs under. Set the default gid of the \"squid\" user to be a group that is also called \"squid\". The package can instead use another user name of your choice if you create a configuration file before installation. Details are in the upstream documentation Preparation section .","title":"Users"},{"location":"data/frontier-squid/#networking","text":"Open the following ports on your Frontier Squid hosts: Port Number Protocol WAN LAN Comment 3128 tcp \u2713 Also limited in squid ACLs. Should be limited to access from your worker nodes 3401 udp \u2713 Also limited in squid ACLs. 
Should be limited to public monitoring server addresses The addresses of the WLCG monitoring servers for use in firewalls are listed in the upstream documentation Enabling monitoring section .","title":"Networking"},{"location":"data/frontier-squid/#frontier-squid-log-files","text":"Log file contents are explained in the upstream documentation Log file contents section .","title":"Frontier Squid Log Files"},{"location":"data/run-frontier-squid-container/","text":"Running Frontier Squid in a Container \u00b6 Frontier Squid is a distribution of the well-known squid HTTP caching proxy software that is optimized for use with applications on the Worldwide LHC Computing Grid (WLCG). It has many advantages over regular squid for common distributed computing applications, especially Frontier and CVMFS. The OSG distribution of frontier-squid is a straight rebuild of the upstream frontier-squid package for the convenience of OSG users. Tip OSG recommends that all sites run a caching proxy for HTTP to help reduce bandwidth and improve throughput. This document outlines how to run Frontier Squid in a Docker container. Before Starting \u00b6 Before starting the installation process, consider the following points (consulting the Frontier Squid Reference section as needed): Docker: For the purpose of this guide, the host must have a running docker service and you must have the ability to start containers (i.e., belong to the docker Unix group). Network ports: Frontier squid communicates on ports 3128 (TCP) and 3401 (UDP). We encourage sites to allow monitoring on port 3401 via UDP from CERN IP address ranges, 128.142.0.0/16, 188.184.128.0/17, 188.185.48.0/20 and 188.185.128.0/17. See the CERN monitoring documentation for additional details. If outgoing connections are filtered, note that CVMFS always uses ports 8000, 80, or 8080. Host choice: If you will be supporting the Frontier application at your site, review the upstream documentation to determine how to size your equipment. Configuring Squid \u00b6 Environment variables (optional) \u00b6 In addition to the required configuration above (ports and file systems), you may also configure the behavior of your cache with the following environment variables: Variable name Description Defaults SQUID_IPRANGE Limits the incoming connections to the provided whitelist. By default only standard private network addresses are whitelisted. SQUID_CACHE_DISK Sets the cache_dir option which determines the disk size squid uses. Must be an integer value, and its unit is MBs. Note: The cache disk area is located at /var/cache/squid. Defaults to 10000. SQUID_CACHE_MEM Sets the cache_mem option which regulates the size squid reserves for caching small objects in memory. Includes a space and unit, e.g. \"MB\". Defaults to \"128 MB\". Cache Disk Size For production deployments, OSG recommends allocating at least 50 to 100 GB (50000 to 100000 MB) to SQUID_CACHE_DISK. Mount points \u00b6 In order to preserve the cache between redeployments, you should map the following areas to persistent storage outside the container: Mountpoint Description Example docker mount /var/cache/squid This directory contains the cache for squid. See also SQUID_CACHE_DISK above. -v /tmp/squid:/var/cache/squid /var/log/squid This directory contains the squid logs. -v /tmp/log:/var/log/squid For more details, see the Frontier Squid documentation . 
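To tie the environment variables and mount points together, here is a minimal sketch of an environment file that sets the cache sizes and the incoming whitelist (the values shown are only examples; adjust them for your site's networks and disk capacity):
SQUID_IPRANGE=10.0.0.0/8 192.168.0.0/16
SQUID_CACHE_DISK=60000
SQUID_CACHE_MEM=256 MB
The file can be passed to the container with the --env-file option shown below, alongside the persistent mounts for /var/cache/squid and /var/log/squid described above.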
Configuration customization (optional) \u00b6 More complicated configuration customization can be done by mounting .sh and .awk files into /etc/squid/customize.d. For details on the names and content of those files see the comments in the customization script and see the upstream documentation on configuration customization. Running a Frontier Squid Container \u00b6 To run a Frontier Squid container with the defaults: user@host $ docker run --rm --name frontier-squid \\ -v :/var/cache/squid \\ -v :/var/log/squid \\ -p :3128 opensciencegrid/frontier-squid:3.6-release You may pass configuration variables in KEY=VALUE format with either docker -e options or in a file specified with --env-file= . Running a Frontier Squid container with systemd \u00b6 An example systemd service file for Frontier Squid. This will require creating the environment file in the directory /opt/xcache/.env . Note This example systemd file assumes is 3128 and is /tmp/squid and is /tmp/log . Create the systemd service file /etc/systemd/system/docker.frontier-squid.service as follows: [Unit] Description=Stash Cache Container After=docker.service Requires=docker.service [Service] TimeoutStartSec=0 Restart=always ExecStartPre=-/usr/bin/docker stop %n ExecStartPre=-/usr/bin/docker rm %n ExecStartPre=/usr/bin/docker pull opensciencegrid/frontier-squid:3.6-release ExecStart=/usr/bin/docker run --rm --name %n --publish 3128:3128 -v /tmp/squid:/var/cache/squid -v /tmp/log:/var/log/squid --env-file /opt/xcache/.env opensciencegrid/frontier-squid:3.6-release [Install] WantedBy=multi-user.target Enable and start the service with: root@host $ systemctl enable docker.frontier-squid root@host $ systemctl start docker.frontier-squid Validating the Frontier Squid Cache \u00b6 The cache server functions as a normal HTTP server and can interact with typical HTTP clients, such as curl or wget . Here, is the port chosen in the docker run command, 3128 by default. user@host $ export http_proxy = http://localhost: user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: MISS from 797a56e426cf user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: HIT from 797a56e426cf Registering Frontier Squid \u00b6 See the Registering Frontier Squid instructions to register your Frontier Squid host. Getting Help \u00b6 To get assistance, please use the this page .","title":"Running Frontier Squid in a Container"},{"location":"data/run-frontier-squid-container/#running-frontier-squid-in-a-container","text":"Frontier Squid is a distribution of the well-known squid HTTP caching proxy software that is optimized for use with applications on the Worldwide LHC Computing Grid (WLCG). It has many advantages over regular squid for common distributed computing applications, especially Frontier and CVMFS. The OSG distribution of frontier-squid is a straight rebuild of the upstream frontier-squid package for the convenience of OSG users. Tip OSG recommends that all sites run a caching proxy for HTTP to help reduce bandwidth and improve throughput. 
This document outlines how to run Frontier Squid in a Docker container.","title":"Running Frontier Squid in a Container"},{"location":"data/run-frontier-squid-container/#before-starting","text":"Before starting the installation process, consider the following points (consulting the Frontier Squid Reference section as needed): Docker: For the purpose of this guide, the host must have a running docker service and you must have the ability to start containers (i.e., belong to the docker Unix group). Network ports: Frontier squid communicates on ports 3128 (TCP) and 3401 (UDP). We encourage sites to allow monitoring on port 3401 via UDP from CERN IP address ranges, 128.142.0.0/16, 188.184.128.0/17, 188.185.48.0/20 and 188.185.128.0/17. See the CERN monitoring documentation for additional details. If outgoing connections are filtered, note that CVMFS always uses ports 8000, 80, or 8080. Host choice: If you will be supporting the Frontier application at your site, review the upstream documentation to determine how to size your equipment.","title":"Before Starting"},{"location":"data/run-frontier-squid-container/#configuring-squid","text":"","title":"Configuring Squid"},{"location":"data/run-frontier-squid-container/#environment-variables-optional","text":"In addition to the required configuration above (ports and file systems), you may also configure the behavior of your cache with the following environment variables: Variable name Description Defaults SQUID_IPRANGE Limits the incoming connections to the provided whitelist. By default only standard private network addresses are whitelisted. SQUID_CACHE_DISK Sets the cache_dir option which determines the disk size squid uses. Must be an integer value, and its unit is MBs. Note: The cache disk area is located at /var/cache/squid. Defaults to 10000. SQUID_CACHE_MEM Sets the cache_mem option which regulates the size squid reserves for caching small objects in memory. Includes a space and unit, e.g. \"MB\". Defaults to \"128 MB\". Cache Disk Size For production deployments, OSG recommends allocating at least 50 to 100 GB (50000 to 100000 MB) to SQUID_CACHE_DISK.","title":"Environment variables (optional)"},{"location":"data/run-frontier-squid-container/#mount-points","text":"In order to preserve the cache between redeployments, you should map the following areas to persistent storage outside the container: Mountpoint Description Example docker mount /var/cache/squid This directory contains the cache for squid. See also SQUID_CACHE_DISK above. -v /tmp/squid:/var/cache/squid /var/log/squid This directory contains the squid logs. -v /tmp/log:/var/log/squid For more details, see the Frontier Squid documentation .","title":"Mount points"},{"location":"data/run-frontier-squid-container/#configuration-customization-optional","text":"More complicated configuration customization can be done by mounting .sh and .awk files into /etc/squid/customize.d. 
For details on the names and content of those files see the comments in the customization script and see the upstream documentation on configuration customization.","title":"Configuration customization (optional)"},{"location":"data/run-frontier-squid-container/#running-a-frontier-squid-container","text":"To run a Frontier Squid container with the defaults: user@host $ docker run --rm --name frontier-squid \\ -v :/var/cache/squid \\ -v :/var/log/squid \\ -p :3128 opensciencegrid/frontier-squid:3.6-release You may pass configuration variables in KEY=VALUE format with either docker -e options or in a file specified with --env-file= .","title":"Running a Frontier Squid Container"},{"location":"data/run-frontier-squid-container/#running-a-frontier-squid-container-with-systemd","text":"Below is an example systemd service file for Frontier Squid. This will require creating the environment file in the directory /opt/xcache/.env . Note This example systemd file assumes is 3128 and is /tmp/squid and is /tmp/log . Create the systemd service file /etc/systemd/system/docker.frontier-squid.service as follows: [Unit] Description=Stash Cache Container After=docker.service Requires=docker.service [Service] TimeoutStartSec=0 Restart=always ExecStartPre=-/usr/bin/docker stop %n ExecStartPre=-/usr/bin/docker rm %n ExecStartPre=/usr/bin/docker pull opensciencegrid/frontier-squid:3.6-release ExecStart=/usr/bin/docker run --rm --name %n --publish 3128:3128 -v /tmp/squid:/var/cache/squid -v /tmp/log:/var/log/squid --env-file /opt/xcache/.env opensciencegrid/frontier-squid:3.6-release [Install] WantedBy=multi-user.target Enable and start the service with: root@host $ systemctl enable docker.frontier-squid root@host $ systemctl start docker.frontier-squid","title":"Running a Frontier Squid container with systemd"},{"location":"data/run-frontier-squid-container/#validating-the-frontier-squid-cache","text":"The cache server functions as a normal HTTP server and can interact with typical HTTP clients, such as curl or wget . Here, is the port chosen in the docker run command, 3128 by default. user@host $ export http_proxy = http://localhost: user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: MISS from 797a56e426cf user@host $ wget -qdO/dev/null http://frontier.cern.ch 2 > & 1 | grep X-Cache X-Cache: HIT from 797a56e426cf","title":"Validating the Frontier Squid Cache"},{"location":"data/run-frontier-squid-container/#registering-frontier-squid","text":"See the Registering Frontier Squid instructions to register your Frontier Squid host.","title":"Registering Frontier Squid"},{"location":"data/run-frontier-squid-container/#getting-help","text":"To get assistance, please use this page .","title":"Getting Help"},{"location":"data/update-oasis/","text":"Updating Software in OASIS \u00b6 OASIS is the OSG Application Software Installation Service that can be used to publish and update software on OSG Worker Nodes under /cvmfs/oasis.opensciencegrid.org . It is implemented using CernVM FileSystem (CVMFS) technology and is the recommended method to make software available to researchers in the OSG Consortium. This document is a step-by-step explanation of how a member of a Virtual Organization (VO) can become an OASIS manager for their VO and gain access to the shared OASIS service for software management. The shared OASIS service is especially appropriate for VOs that have a relatively small number of members and a relatively small amount of software to distribute. 
Larger VOs should consider hosting their own separate repositories . Note For information on how to configure an OASIS client see the CVMFS installation documentation . Requirements \u00b6 To begin the process to distribute software on OASIS using the service, you must: Register as an OSG contact and upload your SSH Key . Submit a request to help@osg-htc.org to become an OASIS manager with the following: The names of the VO(s) whose software you would like to manage with the shared OASIS login host The names of any other VO members that should be OASIS managers The name of a member of the VO(s) that can verify your affiliation, and Cc that person on your emailed request How to use OASIS \u00b6 Log in with SSH \u00b6 The shared OASIS login server is accessible via SSH for all OASIS managers with registered SSH keys: user@host $ ssh -i ouser.@oasis-login.opensciencegrid.org Change for the name of the Virtual Organization you are trying to access and with the path to the private part of the SSH key whose public part you registered with the OSG . Instead of putting -i or ouser.@ on the command line, you can put it in your ~/.ssh/config : Host oasis-login.opensciencegrid.org User ouser. IdentityFile Install and update software \u00b6 Once you log in, you can add/modify/remove content on a staging area at /stage/oasis/$VO where $VO is the name of the VO represented by the manager. Files here are visible to both oasis-login and the Stratum 0 server (oasis.opensciencegrid.org). There is a symbolic link at /cvmfs/oasis.opensciencegrid.org/$VO that points to the same staging area. Request an oasis publish with this command: user@host $ osg-oasis-update This command queues a process to sync the content of OASIS with the content of /stage/oasis/$VO osg-oasis-update returns immediately, but only one update can run at a time (across all VOs); your request may be queued behind a different VO. If you encounter severe delays before the update is finished being published (more than 4 hours), please file a support ticket . Limitations on repository content \u00b6 Although CVMFS provides a POSIX filesystem, it does not work well with all types of content. Content in OASIS is expected to adhere to the CVMFS repository content limitations so please review those guidelines carefully. Testing \u00b6 After osg-oasis-update completes and the changes have been propagated to the CVMFS stratum 1 servers (typically between 0 and 60 minutes, but possibly longer if the servers are busy with updates of other repositories) then the changes can be visible under /cvmfs/oasis.opensciencegrid.org on a computer that has the CVMFS client installed . A client normally only checks for updates if at least an hour has passed since it last checked, but people who have superuser access on the client machine can force it to check again with root@host # cvmfs_talk -i oasis.opensciencegrid.org remount This can be done while the filesystem is mounted (despite the name, it does not do an OS-level umount/mount of the filesystem). If the filesystem is not mounted, it will automatically check for new updates the next time it is mounted. 
In order to find out if an update has reached the CVMFS stratum 1 server, you can find out the latest osg-oasis-update time seen by the stratum 1 most favored by your CVMFS client with the following long command on your client machine: user@host $ date -d \"1970-1-1 GMT + $( wget -qO- $( attr -qg host /cvmfs/oasis.opensciencegrid.org ) /.cvmfspublished | \\ cat -v | sed -n '/^T/{s/^T//p;q;}' ) sec\" References \u00b6 CVMFS Documentation","title":"Update OASIS Shared Repo"},{"location":"data/update-oasis/#updating-software-in-oasis","text":"OASIS is the OSG Application Software Installation Service that can be used to publish and update software on OSG Worker Nodes under /cvmfs/oasis.opensciencegrid.org . It is implemented using CernVM FileSystem (CVMFS) technology and is the recommended method to make software available to researchers in the OSG Consortium. This document is a step-by-step explanation of how a member of a Virtual Organization (VO) can become an OASIS manager for their VO and gain access to the shared OASIS service for software management. The shared OASIS service is especially appropriate for VOs that have a relatively small number of members and a relatively small amount of software to distribute. Larger VOs should consider hosting their own separate repositories . Note For information on how to configure an OASIS client see the CVMFS installation documentation .","title":"Updating Software in OASIS"},{"location":"data/update-oasis/#requirements","text":"To begin the process to distribute software on OASIS using the service, you must: Register as an OSG contact and upload your SSH Key . Submit a request to help@osg-htc.org to become an OASIS manager with the following: The names of the VO(s) whose software you would like to manage with the shared OASIS login host The names of any other VO members that should be OASIS managers The name of a member of the VO(s) that can verify your affiliation, and Cc that person on your emailed request","title":"Requirements"},{"location":"data/update-oasis/#how-to-use-oasis","text":"","title":"How to use OASIS"},{"location":"data/update-oasis/#log-in-with-ssh","text":"The shared OASIS login server is accessible via SSH for all OASIS managers with registered SSH keys: user@host $ ssh -i ouser.@oasis-login.opensciencegrid.org Change for the name of the Virtual Organization you are trying to access and with the path to the private part of the SSH key whose public part you registered with the OSG . Instead of putting -i or ouser.@ on the command line, you can put it in your ~/.ssh/config : Host oasis-login.opensciencegrid.org User ouser. IdentityFile ","title":"Log in with SSH"},{"location":"data/update-oasis/#install-and-update-software","text":"Once you log in, you can add/modify/remove content on a staging area at /stage/oasis/$VO where $VO is the name of the VO represented by the manager. Files here are visible to both oasis-login and the Stratum 0 server (oasis.opensciencegrid.org). There is a symbolic link at /cvmfs/oasis.opensciencegrid.org/$VO that points to the same staging area. Request an oasis publish with this command: user@host $ osg-oasis-update This command queues a process to sync the content of OASIS with the content of /stage/oasis/$VO osg-oasis-update returns immediately, but only one update can run at a time (across all VOs); your request may be queued behind a different VO. 
If you encounter severe delays before the update is finished being published (more than 4 hours), please file a support ticket .","title":"Install and update software"},{"location":"data/update-oasis/#limitations-on-repository-content","text":"Although CVMFS provides a POSIX filesystem, it does not work well with all types of content. Content in OASIS is expected to adhere to the CVMFS repository content limitations so please review those guidelines carefully.","title":"Limitations on repository content"},{"location":"data/update-oasis/#testing","text":"After osg-oasis-update completes and the changes have been propagated to the CVMFS stratum 1 servers (typically between 0 and 60 minutes, but possibly longer if the servers are busy with updates of other repositories) then the changes can be visible under /cvmfs/oasis.opensciencegrid.org on a computer that has the CVMFS client installed . A client normally only checks for updates if at least an hour has passed since it last checked, but people who have superuser access on the client machine can force it to check again with root@host # cvmfs_talk -i oasis.opensciencegrid.org remount This can be done while the filesystem is mounted (despite the name, it does not do an OS-level umount/mount of the filesystem). If the filesystem is not mounted, it will automatically check for new updates the next time it is mounted. In order to find out if an update has reached the CVMFS stratum 1 server, you can find out the latest osg-oasis-update time seen by the stratum 1 most favored by your CVMFS client with the following long command on your client machine: user@host $ date -d \"1970-1-1 GMT + $( wget -qO- $( attr -qg host /cvmfs/oasis.opensciencegrid.org ) /.cvmfspublished | \\ cat -v | sed -n '/^T/{s/^T//p;q;}' ) sec\"","title":"Testing"},{"location":"data/update-oasis/#references","text":"CVMFS Documentation","title":"References"},{"location":"data/stashcache/install-cache/","text":"Installing the OSDF Cache \u00b6 This document describes how to install an Open Science Data Federation (OSDF) cache service. This service allows a site or regional network to cache data frequently used on the OSG, reducing data transfer over the wide-area network and decreasing access latency. Minimum version for this documentation This document describes features introduced in XCache 3.3.0, released on 2022-12-08. When installing, ensure that your version of the stash-cache RPM is at least 3.3.0. Note The OSDF cache was previously named \"Stash Cache\" and some documentation and software may use the old name. Before Starting \u00b6 Before starting the installation process, consider the following requirements: Operating system: Ensure the host has a supported operating system User IDs: If they do not exist already, the installation will create the Linux user IDs condor and xrootd Host certificate: Required for authentication. See our host certificate documentation for instructions on how to request and install host certificates. Network ports: Your host may run a public cache instance (for serving public data only), an authenticated cache instance (for serving protected data), or both. 
A public cache instance requires the following ports open: Inbound TCP port 1094 for file access via the XRootD protocol Inbound TCP port 8000 for file access via HTTP(S) Outbound UDP port 9930 for reporting to xrd-report.osgstorage.org and xrd-mon.osgstorage.org for monitoring An authenticated cache instance requires the following ports open: Inbound TCP port 8443 for authenticated file access via HTTPS Outbound UDP port 9930 for reporting to xrd-report.osgstorage.org and xrd-mon.osgstorage.org for monitoring Hardware requirements: We recommend that a cache has at least 10Gbps connectivity, 1TB of disk space for the cache directory, and 12GB of RAM. As with all OSG software installations, there are some one-time steps to prepare in advance: Obtain root access to the host Prepare the required Yum repositories Install CA certificates Registering the Cache \u00b6 To be part of the OSDF, your cache must be registered with the OSG. You will need basic information like the resource name, hostname, host certificate DN, and the administrative and security contacts. Initial registration \u00b6 To register your cache host, follow the general registration instructions here . The service type is XRootD cache server . Info This step must be completed before installation. In your registration, you must specify which VOs your cache will serve by adding an AllowedVOs list, with each line specifying a VO whose data you are willing to cache. There are special values you may use in AllowedVOs : ANY_PUBLIC indicates that the cache is willing to serve public data from any VO. ANY indicates that the cache is willing to serve data from any VO, both public and protected. ANY implies ANY_PUBLIC . There are extra requirements for serving protected data: In addition to the cache allowing a VO in the AllowedVOs list, that VO must also allow the cache in its AllowedCaches list. See the page on getting your VO's data into OSDF . There must be an authenticated XRootD instance on the cache server. There must be a DN attribute in the resource registration with the subject DN of the host certificate This is an example registration for a cache server that serves all public data: MY_OSDF_CACHE : FQDN : my-cache.example.net Services : XRootD cache server : Description : OSDF cache server AllowedVOs : - ANY_PUBLIC This is an example registration for a cache server that only serves protected data for the Open Science Pool: MY_AUTH_OSDF_CACHE : FQDN : my-auth-cache.example.net Services : XRootD cache server : Description : OSDF cache server AllowedVOs : - OSG DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=my-auth-cache.example.net This is an example registration for a cache server that serves all public data and protected data from the OSG VO: MY_COMBO_OSDF_CACHE : FQDN : my-combo-cache.example.net Services : XRootD cache server : Description : OSDF cache server AllowedVOs : - OSG - ANY_PUBLIC DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=my-combo-cache.example.net Non-standard ports \u00b6 By default, an unauthenticated cache instance serves public data on port 8000, and an authenticated cache instance serves protected data on port 8443. 
If you change the ports for your cache instances, you must specify the new endpoints under the service, as follows: MY_COMBO_OSDF_CACHE2 : FQDN : my-combo-cache2.example.net Services : XRootD cache server : Description : OSDF cache server Details : endpoint_override : my-combo-cache2.example.net:8080 auth_endpoint_override : my-combo-cache2.example.net:8444 Finalizing registration \u00b6 Once initial registration is complete, you may start the installation process. In the meantime, open a help ticket with your cache name. Mention in your ticket that you would like to \"Finalize the cache registration.\" Installing the Cache \u00b6 The OSDF software consists of an XRootD server with special configuration and supporting services. To simplify installation, OSG provides convenience RPMs that install all required packages with a single command: root@host # yum install stash-cache Configuring the Cache \u00b6 First, you must create a \"cache directory\", which will be used to store downloaded files. By default this is /mnt/stash . We recommend using a separate file system for the cache directory, with at least 1 TB of storage available. Note The cache directory must be writable by the xrootd:xrootd user and group. The stash-cache package provides default configuration files in /etc/xrootd/xrootd-stash-cache.cfg and /etc/xrootd/config.d/ . Administrators may provide additional configuration by placing files in /etc/xrootd/config.d/1*.cfg (for files that need to be processed BEFORE the OSG configuration) or /etc/xrootd/config.d/9*.cfg (for files that need to be processed AFTER the OSG configuration). You must configure every variable in /etc/xrootd/config.d/10-common-site-local.cfg . The mandatory variables to configure are: set rootdir = /mnt/stash : the mounted filesystem path to export. This document refers to this as /mnt/stash . set resourcename = YOUR_RESOURCE_NAME : the resource name registered with the OSG. Ensure the xrootd service has a certificate \u00b6 The service will need a certificate for reporting and to authenticate to origins. The easiest solution for this is to use your host certificate and key as follows: Copy the host certificate to /etc/grid-security/xrd/xrd{cert,key}.pem Set the owner of the directory and contents /etc/grid-security/xrd/ to xrootd:xrootd : root@host # chown -R xrootd:xrootd /etc/grid-security/xrd/ Note You must repeat the above steps whenever you renew your host certificate. If you automate certificate renewal, you should automate copying as well. In addition, you will need to restart the XRootD services ( xrootd@stash-cache and/or xrootd@stash-cache-auth ) so they load the updated certificates. For example, if you are using Certbot for Let's Encrypt, you should write a \"deploy hook\" as documented on the Certbot site . Configuring Optional Features \u00b6 Adjust disk utilization \u00b6 To adjust the disk utilization of your cache, create or edit a file named /etc/xrootd/config.d/90-local.cfg and set the values of pfc.diskusage . pfc.diskusage 0.90 0.95 The two values correspond to the low and high usage water marks, respectively. When usage goes above the high water mark, the XRootD service will delete cached files until usage goes below the low water mark. Enable remote debugging \u00b6 XRootD provides remote debugging via a read-only file system named digFS. This feature is disabled by default, but you may enable it if you need help troubleshooting your server. 
Warning Remote debugging should only be enabled for as long as it is needed to troubleshoot your server. To enable remote debugging, edit /etc/xrootd/digauth.cfg and specify the authorizations for reading digFS. An example of authorizations: all allow gsi g=/glow h=*.cs.wisc.edu This gives access to the config file, log files, core files, and process information to anyone from *.cs.wisc.edu in the /glow VOMS group. See the XRootD manual for the full syntax. Remote debugging should only be enabled for as long as you need assistance. As soon as your issue has been resolved, revert any changes you have made to /etc/xrootd/digauth.cfg . Enable HTTPS on the unauthenticated cache \u00b6 By default, the unauthenticated cache instance uses plain HTTP, not HTTPS. To use HTTPS: Add a certificate according to the instructions above Uncomment set EnableVoms = 1 in /etc/xrootd/config.d/10-osg-xrdvoms.cfg Upgrading from OSG 3.5 If upgrading from OSG 3.5, you may have a file with the following contents in /etc/xrootd/config.d : # Support HTTPS access to unauthenticated cache if named stash-cache http.cadir /etc/grid-security/certificates http.cert /etc/grid-security/xrd/xrdcert.pem http.key /etc/grid-security/xrd/xrdkey.pem http.secxtractor /usr/lib64/libXrdLcmaps.so fi You must delete this config block or XRootD will fail to start. Manually Setting the FQDN (optional) \u00b6 The FQDN of the cache server that you registered in Topology may be different than its internal hostname (as reported by hostname -f ). For example, this may be the case if your cache is behind a load balancer such as LVS. In this case, you must manually tell the cache services which FQDN to use for topology lookups. Create the file /etc/systemd/system/stash-authfile@.service.d/override.conf (note the @ in the directory name) with the following contents: [Service] Environment = CACHE_FQDN= Run systemctl daemon-reload after modifying the file. Managing OSDF services \u00b6 These services must be managed by systemctl and may start additional services as dependencies. As a reminder, here are common service commands (all run as root ): To... Run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable Public cache services \u00b6 Software Service name Notes XRootD xrootd@stash-cache.service The XRootD daemon, which performs the data transfers XCache xcache-reporter.timer Reports usage information to collector.opensciencegrid.org Fetch CRL EL8: fetch-crl.timer EL7: fetch-crl-boot and fetch-crl-cron Required to authenticate monitoring services. 
See CA documentation for more info stash-authfile@stash-cache.service Generate authentication configuration files for XRootD (public cache instance) stash-authfile@stash-cache.timer Periodically run the above service (public cache instance) Authenticated cache services \u00b6 Software Service name Notes XRootD xrootd-renew-proxy.service Renew a proxy for authenticated downloads to the cache xrootd@stash-cache-auth.service The xrootd daemon which performs authenticated data transfers xrootd-renew-proxy.timer Trigger daily proxy renewal stash-authfile@stash-cache-auth.service Generate the authentication configuration files for XRootD (authenticated cache instance) stash-authfile@stash-cache-auth.timer Periodically run the above service (authenticated cache instance) Validating the Cache \u00b6 The cache server functions as a normal HTTP server and can interact with typical HTTP clients, such as curl . user@host $ curl -O http://cache_host:8000/osgconnect/public/rynge/test.data curl may not correctly report a failure, so verify that the contents of the file are: hello world! Test cache server reporting to the central collector \u00b6 To verify the cache is reporting to the central collector, run the following command from the cache server: user@host $ condor_status -any -pool collector.opensciencegrid.org:9619 \\ -l -const \"Name==\\\"xrootd@`hostname`\\\"\" The output of the above command should detail what the collector knows about the status of your cache. Here is an example snippet of the output: AuthenticatedIdentity = \"sc-cache.chtc.wisc.edu@daemon.opensciencegrid.org\" AuthenticationMethod = \"GSI\" free_cache_bytes = 868104454144 free_cache_fraction = 0.8022261674321525 LastHeardFrom = 1552002482 most_recent_access_time = 1551997049 MyType = \"Machine\" Name = \"xrootd@sc-cache.chtc.wisc.edu\" ping_elapsed_time = 0.00763392448425293 ping_response_code = 0 ping_response_message = \"[SUCCESS] \" ping_response_status = \"ok\" STASHCACHE_DaemonVersion = \"1.0.0\" ... Updating to OSG 3.6 \u00b6 The OSG 3.5 series reached end-of-life on May 1, 2022. Admins are strongly encouraged to move their caches to OSG 3.6. See general update instructions . Unauthenticated caches ( xrootd@stash-cache service) do not need any configuration changes, unless HTTPS access has been enabled. See the \"enable HTTPS on the unauthenticated cache\" section for the necessary configuration changes. Authenticated caches ( xrootd@stash-cache-auth service) may need the configuration changes described in the updating to OSG 3.6 section of the XRootD authorization configuration document. Getting Help \u00b6 To get assistance, please use this page .","title":"Install from RPM"},{"location":"data/stashcache/install-cache/#installing-the-osdf-cache","text":"This document describes how to install an Open Science Data Federation (OSDF) cache service. This service allows a site or regional network to cache data frequently used on the OSG, reducing data transfer over the wide-area network and decreasing access latency. Minimum version for this documentation This document describes features introduced in XCache 3.3.0, released on 2022-12-08. When installing, ensure that your version of the stash-cache RPM is at least 3.3.0. 
Note The OSDF cache was previously named \"Stash Cache\" and some documentation and software may use the old name.","title":"Installing the OSDF Cache"},{"location":"data/stashcache/install-cache/#before-starting","text":"Before starting the installation process, consider the following requirements: Operating system: Ensure the host has a supported operating system User IDs: If they do not exist already, the installation will create the Linux user IDs condor and xrootd Host certificate: Required for authentication. See our host certificate documentation for instructions on how to request and install host certificates. Network ports: Your host may run a public cache instance (for serving public data only), an authenticated cache instance (for serving protected data), or both. A public cache instance requires the following ports open: Inbound TCP port 1094 for file access via the XRootD protocol Inbound TCP port 8000 for file access via HTTP(S) Outbound UDP port 9930 for reporting to xrd-report.osgstorage.org and xrd-mon.osgstorage.org for monitoring An authenticated cache instance requires the following ports open: Inbound TCP port 8443 for authenticated file access via HTTPS Outbound UDP port 9930 for reporting to xrd-report.osgstorage.org and xrd-mon.osgstorage.org for monitoring Hardware requirements: We recommend that a cache has at least 10Gbps connectivity, 1TB of disk space for the cache directory, and 12GB of RAM. As with all OSG software installations, there are some one-time steps to prepare in advance: Obtain root access to the host Prepare the required Yum repositories Install CA certificates","title":"Before Starting"},{"location":"data/stashcache/install-cache/#registering-the-cache","text":"To be part of the OSDF, your cache must be registered with the OSG. You will need basic information like the resource name, hostname, host certificate DN, and the administrative and security contacts.","title":"Registering the Cache"},{"location":"data/stashcache/install-cache/#initial-registration","text":"To register your cache host, follow the general registration instructions here . The service type is XRootD cache server . Info This step must be completed before installation. In your registration, you must specify which VOs your cache will serve by adding an AllowedVOs list, with each line specifying a VO whose data you are willing to cache. There are special values you may use in AllowedVOs : ANY_PUBLIC indicates that the cache is willing to serve public data from any VO. ANY indicates that the cache is willing to serve data from any VO, both public and protected. ANY implies ANY_PUBLIC . There are extra requirements for serving protected data: In addition to the cache allowing a VO in the AllowedVOs list, that VO must also allow the cache in its AllowedCaches list. See the page on getting your VO's data into OSDF . There must be an authenticated XRootD instance on the cache server. 
There must be a DN attribute in the resource registration with the subject DN of the host certificate This is an example registration for a cache server that serves all public data: MY_OSDF_CACHE : FQDN : my-cache.example.net Services : XRootD cache server : Description : OSDF cache server AllowedVOs : - ANY_PUBLIC This is an example registration for a cache server that only serves protected data for the Open Science Pool: MY_AUTH_OSDF_CACHE : FQDN : my-auth-cache.example.net Services : XRootD cache server : Description : OSDF cache server AllowedVOs : - OSG DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=my-auth-cache.example.net This is an example registration for a cache server that serves all public data and protected data from the OSG VO: MY_COMBO_OSDF_CACHE : FQDN : my-combo-cache.example.net Services : XRootD cache server : Description : OSDF cache server AllowedVOs : - OSG - ANY_PUBLIC DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=my-combo-cache.example.net","title":"Initial registration"},{"location":"data/stashcache/install-cache/#non-standard-ports","text":"By default, an unauthenticated cache instance serves public data on port 8000, and an authenticated cache instance serves protected data on port 8443. If you change the ports for your cache instances, you must specify the new endpoints under the service, as follows: MY_COMBO_OSDF_CACHE2 : FQDN : my-combo-cache2.example.net Services : XRootD cache server : Description : OSDF cache server Details : endpoint_override : my-combo-cache2.example.net:8080 auth_endpoint_override : my-combo-cache2.example.net:8444","title":"Non-standard ports"},{"location":"data/stashcache/install-cache/#finalizing-registration","text":"Once initial registration is complete, you may start the installation process. In the meantime, open a help ticket with your cache name. Mention in your ticket that you would like to \"Finalize the cache registration.\"","title":"Finalizing registration"},{"location":"data/stashcache/install-cache/#installing-the-cache","text":"The OSDF software consists of an XRootD server with special configuration and supporting services. To simplify installation, OSG provides convenience RPMs that install all required packages with a single command: root@host # yum install stash-cache","title":"Installing the Cache"},{"location":"data/stashcache/install-cache/#configuring-the-cache","text":"First, you must create a \"cache directory\", which will be used to store downloaded files. By default this is /mnt/stash . We recommend using a separate file system for the cache directory, with at least 1 TB of storage available. Note The cache directory must be writable by the xrootd:xrootd user and group. The stash-cache package provides default configuration files in /etc/xrootd/xrootd-stash-cache.cfg and /etc/xrootd/config.d/ . Administrators may provide additional configuration by placing files in /etc/xrootd/config.d/1*.cfg (for files that need to be processed BEFORE the OSG configuration) or /etc/xrootd/config.d/9*.cfg (for files that need to be processed AFTER the OSG configuration). You must configure every variable in /etc/xrootd/config.d/10-common-site-local.cfg . The mandatory variables to configure are: set rootdir = /mnt/stash : the mounted filesystem path to export. This document refers to this as /mnt/stash . 
set resourcename = YOUR_RESOURCE_NAME : the resource name registered with the OSG.","title":"Configuring the Cache"},{"location":"data/stashcache/install-cache/#ensure-the-xrootd-service-has-a-certificate","text":"The service will need a certificate for reporting and to authenticate to origins. The easiest solution for this is to use your host certificate and key as follows: Copy the host certificate to /etc/grid-security/xrd/xrd{cert,key}.pem Set the owner of the directory and contents /etc/grid-security/xrd/ to xrootd:xrootd : root@host # chown -R xrootd:xrootd /etc/grid-security/xrd/ Note You must repeat the above steps whenever you renew your host certificate. If you automate certificate renewal, you should automate copying as well. In addition, you will need to restart the XRootD services ( xrootd@stash-cache and/or xrootd@stash-cache-auth ) so they load the updated certificates. For example, if you are using Certbot for Let's Encrypt, you should write a \"deploy hook\" as documented on the Certbot site .","title":"Ensure the xrootd service has a certificate"},{"location":"data/stashcache/install-cache/#configuring-optional-features","text":"","title":"Configuring Optional Features"},{"location":"data/stashcache/install-cache/#adjust-disk-utilization","text":"To adjust the disk utilization of your cache, create or edit a file named /etc/xrootd/config.d/90-local.cfg and set the values of pfc.diskusage . pfc.diskusage 0.90 0.95 The two values correspond to the low and high usage water marks, respectively. When usage goes above the high water mark, the XRootD service will delete cached files until usage goes below the low water mark.","title":"Adjust disk utilization"},{"location":"data/stashcache/install-cache/#enable-remote-debugging","text":"XRootD provides remote debugging via a read-only file system named digFS. This feature is disabled by default, but you may enable it if you need help troubleshooting your server. Warning Remote debugging should only be enabled for as long as it is needed to troubleshoot your server. To enable remote debugging, edit /etc/xrootd/digauth.cfg and specify the authorizations for reading digFS. An example of authorizations: all allow gsi g=/glow h=*.cs.wisc.edu This gives access to the config file, log files, core files, and process information to anyone from *.cs.wisc.edu in the /glow VOMS group. See the XRootD manual for the full syntax. Remote debugging should only be enabled for as long as you need assistance. As soon as your issue has been resolved, revert any changes you have made to /etc/xrootd/digauth.cfg .","title":"Enable remote debugging"},{"location":"data/stashcache/install-cache/#enable-https-on-the-unauthenticated-cache","text":"By default, the unauthenticated cache instance uses plain HTTP, not HTTPS. 
To use HTTPS: Add a certificate according to the instructions above Uncomment set EnableVoms = 1 in /etc/xrootd/config.d/10-osg-xrdvoms.cfg Upgrading from OSG 3.5 If upgrading from OSG 3.5, you may have a file with the following contents in /etc/xrootd/config.d : # Support HTTPS access to unauthenticated cache if named stash-cache http.cadir /etc/grid-security/certificates http.cert /etc/grid-security/xrd/xrdcert.pem http.key /etc/grid-security/xrd/xrdkey.pem http.secxtractor /usr/lib64/libXrdLcmaps.so fi You must delete this config block or XRootD will fail to start.","title":"Enable HTTPS on the unauthenticated cache"},{"location":"data/stashcache/install-cache/#manually-setting-the-fqdn-optional","text":"The FQDN of the cache server that you registered in Topology may be different than its internal hostname (as reported by hostname -f ). For example, this may be the case if your cache is behind a load balancer such as LVS. In this case, you must manually tell the cache services which FQDN to use for topology lookups. Create the file /etc/systemd/system/stash-authfile@.service.d/override.conf (note the @ in the directory name) with the following contents: [Service] Environment = CACHE_FQDN= Run systemctl daemon-reload after modifying the file.","title":"Manually Setting the FQDN (optional)"},{"location":"data/stashcache/install-cache/#managing-osdf-services","text":"These services must be managed by systemctl and may start additional services as dependencies. As a reminder, here are common service commands (all run as root ): To... Run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable ","title":"Managing OSDF services"},{"location":"data/stashcache/install-cache/#public-cache-services","text":"Software Service name Notes XRootD xrootd@stash-cache.service The XRootD daemon, which performs the data transfers XCache xcache-reporter.timer Reports usage information to collector.opensciencegrid.org Fetch CRL EL8: fetch-crl.timer EL7: fetch-crl-boot and fetch-crl-cron Required to authenticate monitoring services. See CA documentation for more info stash-authfile@stash-cache.service Generate authentication configuration files for XRootD (public cache instance) stash-authfile@stash-cache.timer Periodically run the above service (public cache instance)","title":"Public cache services"},{"location":"data/stashcache/install-cache/#authenticated-cache-services","text":"Software Service name Notes XRootD xrootd-renew-proxy.service Renew a proxy for authenticated downloads to the cache xrootd@stash-cache-auth.service The xrootd daemon which performs authenticated data transfers xrootd-renew-proxy.timer Trigger daily proxy renewal stash-authfile@stash-cache-auth.service Generate the authentication configuration files for XRootD (authenticated cache instance) stash-authfile@stash-cache-auth.timer Periodically run the above service (authenticated cache instance)","title":"Authenticated cache services"},{"location":"data/stashcache/install-cache/#validating-the-cache","text":"The cache server functions as a normal HTTP server and can interact with typical HTTP clients, such as curl . 
user@host $ curl -O http://cache_host:8000/osgconnect/public/rynge/test.data curl may not correctly report a failure, so verify that the contents of the file are: hello world!","title":"Validating the Cache"},{"location":"data/stashcache/install-cache/#test-cache-server-reporting-to-the-central-collector","text":"To verify the cache is reporting to the central collector, run the following command from the cache server: user@host $ condor_status -any -pool collector.opensciencegrid.org:9619 \\ -l -const \"Name==\\\"xrootd@`hostname`\\\"\" The output of the above command should detail what the collector knows about the status of your cache. Here is an example snippet of the output: AuthenticatedIdentity = \"sc-cache.chtc.wisc.edu@daemon.opensciencegrid.org\" AuthenticationMethod = \"GSI\" free_cache_bytes = 868104454144 free_cache_fraction = 0.8022261674321525 LastHeardFrom = 1552002482 most_recent_access_time = 1551997049 MyType = \"Machine\" Name = \"xrootd@sc-cache.chtc.wisc.edu\" ping_elapsed_time = 0.00763392448425293 ping_response_code = 0 ping_response_message = \"[SUCCESS] \" ping_response_status = \"ok\" STASHCACHE_DaemonVersion = \"1.0.0\" ...","title":"Test cache server reporting to the central collector"},{"location":"data/stashcache/install-cache/#updating-to-osg-36","text":"The OSG 3.5 series reached end-of-life on May 1, 2022. Admins are strongly encouraged to move their caches to OSG 3.6. See general update instructions . Unauthenticated caches ( xrootd@stash-cache service) do not need any configuration changes, unless HTTPS access has been enabled. See the \"enable HTTPS on the unauthenticated cache\" section for the necessary configuration changes. Authenticated caches ( xrootd@stash-cache-auth service) may need the configuration changes described in the updating to OSG 3.6 section of the XRootD authorization configuration document.","title":"Updating to OSG 3.6"},{"location":"data/stashcache/install-cache/#getting-help","text":"To get assistance, please use this page .","title":"Getting Help"},{"location":"data/stashcache/install-origin/","text":"Installing the OSDF Origin \u00b6 This document describes how to install an Open Science Data Federation (OSDF) origin service. This service allows an organization to export its data to the data federation. Minimum version for this documentation This document describes features introduced in XCache 3.3.0, released on 2022-12-08. When installing, ensure that your version of the stash-origin RPM is at least 3.3.0. Note The OSDF Origin was previously named \"Stash Origin\" and some documentation and software may use the old name. Note The origin must be registered with the OSG prior to joining the data federation. You may start the registration process prior to finishing the installation by using this link along with information like: Resource name and hostname VO associated with this origin server (which will be used to determine the origin's namespace prefix) Administrative and security contact(s) Who (or what) will be allowed to access the VO's data Which caches will be allowed to cache the VO data Before Starting \u00b6 Before starting the installation process, consider the following requirements: Operating system: A RHEL 7 or RHEL 8 or compatible operating system. User IDs: If they do not exist already, the installation will create the Linux user IDs condor and xrootd ; only the xrootd user is utilized for the running daemons. Host certificate: Required for authentication. 
See our host certificate documentation for instructions on how to request and install host certificates. Network ports: The origin service requires the following ports open: Inbound TCP port 1094 for unauthenticated file access via the XRoot or HTTP protocols (if serving public data) Inbound TCP port 1095 for authenticated file access via the XRoot or HTTPS protocols (if serving authenticated data) Outbound TCP port 1213 to redirector.osgstorage.org for connecting to the data federation Outbound UDP port 9930 for reporting to xrd-report.osgstorage.org and xrd-mon.osgstorage.org for monitoring. Hardware requirements: We recommend that an origin has at least 1Gbps connectivity and 8GB of RAM. We suggest that several gigabytes of local disk space be available for log files, although some logging verbosity can be reduced. As with all OSG software installations, there are some one-time steps to prepare in advance: Obtain root access to the host Prepare the required Yum repositories Install CA certificates Installing the Origin \u00b6 The origin service consists of one or more XRootD daemons and their dependencies for the authentication infrastructure. To simplify installation, OSG provides convenience RPMs that install all required software with a single command: root@host # yum install stash-origin For this installation guide, we assume that the data to be exported to the federation is mounted at /mnt/stash and owned by the xrootd:xrootd user. Configuring the Origin Server \u00b6 The stash-origin package provides default configuration files in /etc/xrootd/xrootd-stash-origin.cfg and /etc/xrootd/config.d . Administrators may provide additional configuration by placing files in /etc/xrootd/config.d of the form /etc/xrootd/config.d/1*.cfg (for directives that need to be processed BEFORE the OSG configuration) or /etc/xrootd/config.d/9*.cfg (for directives that are processed AFTER the OSG configuration). You must configure every variable in /etc/xrootd/config.d/10-common-site-local.cfg and /etc/xrootd/config.d/10-origin-site-local.cfg . The mandatory variables to configure are: File Config line Description 10-common-site-local.cfg set rootdir = /mnt/stash The mounted filesystem path to export; this document calls it /mnt/stash 10-common-site-local.cfg set resourcename = YOUR_RESOURCE_NAME The resource name registered with OSG 10-origin-site-local.cfg set PublicOriginExport = /VO/PUBLIC The directory relative to rootdir that is the top of the exported namespace for public (unauthenticated) origin services 10-origin-site-local.cfg set AuthOriginExport = /VO/PUBLIC The directory relative to rootdir that is the top of the exported namespace for authenticated origin services For example, if the HCC VO would like to set up an origin server exporting from the mount point /mnt/stash , and HCC has a public registered namespace at /hcc/PUBLIC , then the following would be set in 10-common-site-local.cfg : set rootdir = /mnt/stash set resourcename = HCC_OSDF_ORIGIN And the following would be set in 10-origin-site-local.cfg : set PublicOriginExport = /hcc/PUBLIC With this configuration, the data under /mnt/stash/hcc/PUBLIC/bio/datasets would be available under the path /hcc/PUBLIC/bio/datasets in the OSDF namespace and the data under /mnt/stash/hcc/PUBLIC/hep/generators would be available under the path /hcc/PUBLIC/hep/generators in the OSDF namespace. 
If the HCC has a protected registered namespace at /hcc/PROTECTED then set the following in 10-origin-site-local.cfg : set AuthOriginExport = /hcc/PROTECTED If you are serving public data from the origin, you must set PublicOriginExport and use the xrootd@stash-origin service. If you are serving protected data from the origin, you must set AuthOriginExport and use the xrootd@stash-origin-auth service (if not using xrootd-multiuser ) or xrootd-privileged@stash-origin-auth service (if using xrootd-multiuser ). Warning The OSDF namespace is a global namespace. Directories you export must not collide with directories provided by other origin servers; this is why the explicit registration is required. Manually Setting the FQDN (optional) \u00b6 The FQDN of the origin server that you registered in Topology may be different than its internal hostname (as reported by hostname -f ). For example, this may be the case if your origin is behind a load balancer such as LVS. In this case, you must manually tell the origin services which FQDN to use for topology lookups. Create the file /etc/systemd/system/stash-authfile@.service.d/override.conf with the following contents: [Service] Environment = ORIGIN_FQDN= Run systemctl daemon-reload after modifying the file. Managing the Origin Services \u00b6 Serving data for an origin is done by the xrootd daemon. There can be multiple instances of xrootd , running on different ports. The instance that serves unauthenticated data will run on port 1094. The instance that serves authenticated data will run on port 1095. If your origin serves both authenticated and unauthenticated data, you will run both instances. Use of multiuser plugin Some of the service names are different if you have configured the XRootD Multiuser plugin : - xrootd-privileged is used instead of xrootd - cmsd-privileged is used instead of cmsd The privileged and non-privileged services are mutually exclusive. The origin services consist of the following SystemD units that you must directly manage: Service name Notes xrootd@stash-origin.service Performs data transfers (unauthenticated instance) xrootd@stash-origin-auth.service Performs data transfers (authenticated instance without multiuser ) xrootd-privileged@stash-origin-auth.service Performs data transfers (authenticated instance with multiuser ) These services must be managed with systemctl and may start additional services as dependencies. As a reminder, here are common service commands (all run as root ): To... Run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable In addition, the origin service automatically uses the following SystemD units: Service name Notes cmsd@stash-origin.service Integrates the origin into the data federation (unauthenticated instance) cmsd@stash-origin-auth.service Integrates the origin into the data federation (authenticated instance without multiuser ) cmsd-privileged@stash-origin-auth.service Integrates the origin into the data federation (authenticated instance with multiuser ) stash-authfile@stash-origin.timer Updates the authorization files periodically (unauthenticated instance) stash-authfile@stash-origin-auth.timer Updates the authorization files periodically (authenticated instance) Verifying the Origin Server \u00b6 Once your server has been registered with the OSG and started, perform the following steps to verify that it is functional. 
Testing availability \u00b6 To verify that your origin is correctly advertising its availability, run the following command from the origin server: [user@server ~]$ xrdmapc -r --list s redirector.osgstorage.org:1094 0**** redirector.osgstorage.org:1094 Srv ceph-gridftp1.grid.uchicago.edu:1094 Srv stashcache.fnal.gov:1094 Srv stash.osgconnect.net:1094 Srv origin.ligo.caltech.edu:1094 Srv csiu.grid.iu.edu:1094 The output should list the hostname of your origin server. Testing directory export \u00b6 To verify that the directories you are exporting are visible from the redirector, run the following command from the origin server: [user@server ~]$ xrdmapc -r --verify --list s redirector.osgstorage.org:1094 0*rv* redirector.osgstorage.org:1094 >+ Srv ceph-gridftp1.grid.uchicago.edu:1094 ? Srv stashcache.fnal.gov:1094 [not authorized] >+ Srv stash.osgconnect.net:1094 - Srv origin.ligo.caltech.edu:1094 ? Srv csiu.grid.iu.edu:1094 [connect error] Change for the directory the service is supposed to export. Your server should be marked with a >+ to indicate that it contains the given path and the path was accessible. Testing file access (unauthenticated origin) \u00b6 To verify that you can download a file from the origin server, use the stashcp tool, which is available in the stashcp RPM. Place a in , where can be any file in a publicly accessible path. Run the following command: [user@host]$ stashcp /tmp/testfile If successful, there should be a file at /tmp/testfile with the contents of the test file on your origin server. If unsuccessful, you can pass the -d flag to stashcp for debug info. You can also test directly downloading from the origin via xrdcp , which is available in the xrootd-client RPM. Run the following command: [user@host]$ xrdcp xroot://:1094/ /tmp/testfile Testing file access (authenticated origin) \u00b6 In order to download files from the origin, caches must be able to access the origin via SSL certificates. To test SSL authentication, use the curl command. Place a in , where can be any file in a protected location. As root on your origin, run the following command: [root@host]# curl --cert /etc/grid-security/hostcert.pem \\ --key /etc/grid-security/hostkey.pem \\ https://:1095/ \\ -o /tmp/testfile If successful, there should be a file at /tmp/testfile with the contents of the test file on your origin server. Note This test requires including the DN of your origin in your origin's OSG Topology registration . To verify that a user can download a file from the origin server, use the stashcp tool, which is available in the stashcp RPM. Obtain a credential (a SciToken or WLCG Token, depending on your origin's configuration). Place a in , where can be any file in a path you expect to be accessible using the credential you just obtained. Run the following command: [user@host]$ stashcp /tmp/testfile If successful, there should be a file at /tmp/testfile with the contents of the test file on your origin server. If unsuccessful, you can pass the -d flag to stashcp for debug info. Registering the Origin \u00b6 To be part of the Open Science Data Federation, your origin must be registered with the OSG . The service type is XRootD origin server . The resource must also specify which VOs it will serve data from. To do this, add an AllowedVOs list, with each line specifying a VO whose data the resource is willing to host. 
For example: MY_OSDF_ORIGIN : Services : XRootD origin server : Description : OSDF origin server AllowedVOs : - GLOW - OSG DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=my-osdf-origin.example.net You can use the special value ANY to indicate that the origin will serve data from any VO that puts data on it. In addition to the origin allowing a VO via the AllowedVOs list, that VO must also allow the origin in one of its AllowedOrigins lists in DataFederation/StashCache/Namespaces . See the page on getting your VO's data into OSDF . Specifying the DN of your origin is not required but it is useful for testing. Updating to OSG 3.6 \u00b6 The OSG 3.5 series reached end-of-life on May 1, 2022. Admins are strongly encouraged to move their origins to OSG 3.6. See general update instructions . Unauthenticated origins ( xrootd@stash-origin service) do not need any configuration changes. Authenticated origins ( xrootd@stash-origin-auth service) may need the configuration changes described in the updating to OSG 3.6 section of the XRootD authorization configuration document. Getting Help \u00b6 To get assistance, please use this page .","title":"Install from RPM"},{"location":"data/stashcache/install-origin/#installing-the-osdf-origin","text":"This document describes how to install an Open Science Data Federation (OSDF) origin service. This service allows an organization to export its data to the data federation. Minimum version for this documentation This document describes features introduced in XCache 3.3.0, released on 2022-12-08. When installing, ensure that your version of the stash-origin RPM is at least 3.3.0. Note The OSDF Origin was previously named \"Stash Origin\" and some documentation and software may use the old name. Note The origin must be registered with the OSG prior to joining the data federation. You may start the registration process prior to finishing the installation by using this link along with information like: Resource name and hostname VO associated with this origin server (which will be used to determine the origin's namespace prefix) Administrative and security contact(s) Who (or what) will be allowed to access the VO's data Which caches will be allowed to cache the VO data","title":"Installing the OSDF Origin"},{"location":"data/stashcache/install-origin/#before-starting","text":"Before starting the installation process, consider the following requirements: Operating system: A RHEL 7 or RHEL 8 or compatible operating system. User IDs: If they do not exist already, the installation will create the Linux user IDs condor and xrootd ; only the xrootd user is utilized for the running daemons. Host certificate: Required for authentication. See our host certificate documentation for instructions on how to request and install host certificates. Network ports: The origin service requires the following ports open: Inbound TCP port 1094 for unauthenticated file access via the XRoot or HTTP protocols (if serving public data) Inbound TCP port 1095 for authenticated file access via the XRoot or HTTPS protocols (if serving authenticated data) Outbound TCP port 1213 to redirector.osgstorage.org for connecting to the data federation Outbound UDP port 9930 for reporting to xrd-report.osgstorage.org and xrd-mon.osgstorage.org for monitoring. Hardware requirements: We recommend that an origin has at least 1Gbps connectivity and 8GB of RAM. 
We suggest that several gigabytes of local disk space be available for log files, although some logging verbosity can be reduced. As with all OSG software installations, there are some one-time steps to prepare in advance: Obtain root access to the host Prepare the required Yum repositories Install CA certificates","title":"Before Starting"},{"location":"data/stashcache/install-origin/#installing-the-origin","text":"The origin service consists of one or more XRootD daemons and their dependencies for the authentication infrastructure. To simplify installation, OSG provides convenience RPMs that install all required software with a single command: root@host # yum install stash-origin For this installation guide, we assume that the data to be exported to the federation is mounted at /mnt/stash and owned by the xrootd:xrootd user.","title":"Installing the Origin"},{"location":"data/stashcache/install-origin/#configuring-the-origin-server","text":"The stash-origin package provides default configuration files in /etc/xrootd/xrootd-stash-origin.cfg and /etc/xrootd/config.d . Administrators may provide additional configuration by placing files in /etc/xrootd/config.d of the form /etc/xrootd/config.d/1*.cfg (for directives that need to be processed BEFORE the OSG configuration) or /etc/xrootd/config.d/9*.cfg (for directives that are processed AFTER the OSG configuration). You must configure every variable in /etc/xrootd/config.d/10-common-site-local.cfg and /etc/xrootd/config.d/10-origin-site-local.cfg . The mandatory variables to configure are: File Config line Description 10-common-site-local.cfg set rootdir = /mnt/stash The mounted filesystem path to export; this document calls it /mnt/stash 10-common-site-local.cfg set resourcename = YOUR_RESOURCE_NAME The resource name registered with OSG 10-origin-site-local.cfg set PublicOriginExport = /VO/PUBLIC The directory relative to rootdir that is the top of the exported namespace for public (unauthenticated) origin services 10-origin-site-local.cfg set AuthOriginExport = /VO/PUBLIC The directory relative to rootdir that is the top of the exported namespace for authenticated origin services For example, if the HCC VO would like to set up an origin server exporting from the mount point /mnt/stash , and HCC has a public registered namespace at /hcc/PUBLIC , then the following would be set in 10-common-site-local.cfg : set rootdir = /mnt/stash set resourcename = HCC_OSDF_ORIGIN And the following would be set in 10-origin-site-local.cfg : set PublicOriginExport = /hcc/PUBLIC With this configuration, the data under /mnt/stash/hcc/PUBLIC/bio/datasets would be available under the path /hcc/PUBLIC/bio/datasets in the OSDF namespace and the data under /mnt/stash/hcc/PUBLIC/hep/generators would be available under the path /hcc/PUBLIC/hep/generators in the OSDF namespace. If the HCC has a protected registered namespace at /hcc/PROTECTED then set the following in 10-origin-site-local.cfg : set AuthOriginExport = /hcc/PROTECTED If you are serving public data from the origin, you must set PublicOriginExport and use the xrootd@stash-origin service. If you are serving protected data from the origin, you must set AuthOriginExport and use the xrootd@stash-origin-auth service (if not using xrootd-multiuser ) or xrootd-privileged@stash-origin-auth service (if using xrootd-multiuser ). Warning The OSDF namespace is a global namespace. 
Directories you export must not collide with directories provided by other origin servers; this is why the explicit registration is required.","title":"Configuring the Origin Server"},{"location":"data/stashcache/install-origin/#manually-setting-the-fqdn-optional","text":"The FQDN of the origin server that you registered in Topology may be different than its internal hostname (as reported by hostname -f ). For example, this may be the case if your origin is behind a load balancer such as LVS. In this case, you must manually tell the origin services which FQDN to use for topology lookups. Create the file /etc/systemd/system/stash-authfile@.service.d/override.conf with the following contents: [Service] Environment = ORIGIN_FQDN= Run systemctl daemon-reload after modifying the file.","title":"Manually Setting the FQDN (optional)"},{"location":"data/stashcache/install-origin/#managing-the-origin-services","text":"Serving data for an origin is done by the xrootd daemon. There can be multiple instances of xrootd , running on different ports. The instance that serves unauthenticated data will run on port 1094. The instance that serves authenticated data will run on port 1095. If your origin serves both authenticated and unauthenticated data, you will run both instances. Use of multiuser plugin Some of the service names are different if you have configured the XRootD Multiuser plugin : - xrootd-privileged is used instead of xrootd - cmsd-privileged is used instead of cmsd The privileged and non-privileged services are mutually exclusive. The origin services consist of the following SystemD units that you must directly manage: Service name Notes xrootd@stash-origin.service Performs data transfers (unauthenticated instance) xrootd@stash-origin-auth.service Performs data transfers (authenticated instance without multiuser ) xrootd-privileged@stash-origin-auth.service Performs data transfers (authenticated instance with multiuser ) These services must be managed with systemctl and may start additional services as dependencies. As a reminder, here are common service commands (all run as root ): To... Run the command... 
Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable In addition, the origin service automatically uses the following SystemD units: Service name Notes cmsd@stash-origin.service Integrates the origin into the data federation (unauthenticated instance) cmsd@stash-origin-auth.service Integrates the origin into the data federation (authenticated instance without multiuser ) cmsd-privileged@stash-origin-auth.service Integrates the origin into the data federation (authenticated instance with multiuser ) stash-authfile@stash-origin.timer Updates the authorization files periodically (unauthenticated instance) stash-authfile@stash-origin-auth.timer Updates the authorization files periodically (authenticated instance)","title":"Managing the Origin Services"},{"location":"data/stashcache/install-origin/#verifying-the-origin-server","text":"Once your server has been registered with the OSG and started, perform the following steps to verify that it is functional.","title":"Verifying the Origin Server"},{"location":"data/stashcache/install-origin/#testing-availability","text":"To verify that your origin is correctly advertising its availability, run the following command from the origin server: [user@server ~]$ xrdmapc -r --list s redirector.osgstorage.org:1094 0**** redirector.osgstorage.org:1094 Srv ceph-gridftp1.grid.uchicago.edu:1094 Srv stashcache.fnal.gov:1094 Srv stash.osgconnect.net:1094 Srv origin.ligo.caltech.edu:1094 Srv csiu.grid.iu.edu:1094 The output should list the hostname of your origin server.","title":"Testing availability"},{"location":"data/stashcache/install-origin/#testing-directory-export","text":"To verify that the directories you are exporting are visible from the redirector, run the following command from the origin server: [user@server ~]$ xrdmapc -r --verify --list s redirector.osgstorage.org:1094 0*rv* redirector.osgstorage.org:1094 >+ Srv ceph-gridftp1.grid.uchicago.edu:1094 ? Srv stashcache.fnal.gov:1094 [not authorized] >+ Srv stash.osgconnect.net:1094 - Srv origin.ligo.caltech.edu:1094 ? Srv csiu.grid.iu.edu:1094 [connect error] Change for the directory the service is suppose to export. Your server should be marked with a >+ to indicate that it contains the given path and the path was accessible.","title":"Testing directory export"},{"location":"data/stashcache/install-origin/#testing-file-access-unauthenticated-origin","text":"To verify that you can download a file from the origin server, use the stashcp tool, which is available in the stashcp RPM. Place a in , where can be any file in a publicly accessible path. Run the following command: [user@host]$ stashcp /tmp/testfile If successful, there should be a file at /tmp/testfile with the contents of the test file on your origin server. If unsuccessful, you can pass the -d flag to stashcp for debug info. You can also test directly downloading from the origin via xrdcp , which is available in the xrootd-client RPM. Run the following command: [user@host]$ xrdcp xroot://:1094/ /tmp/testfile","title":"Testing file access (unauthenticated origin)"},{"location":"data/stashcache/install-origin/#testing-file-access-authenticated-origin","text":"In order to download files from the origin, caches must be able to access the origin via SSL certificates. To test SSL authentication, use the curl command. Place a in , where can be any file in a protected location. 
As root on your origin, run the following command: [root@host]# curl --cert /etc/grid-security/hostcert.pem \\ --key /etc/grid-security/hostkey.pem \\ https://:1095/ \\ -o /tmp/testfile If successful, there should be a file at /tmp/testfile with the contents of the test file on your origin server. Note This test requires including the DN of your origin in your origin's OSG Topology registration . To verify that a user can download a file from the origin server, use the stashcp tool, which is available in the stashcp RPM. Obtain a credential (a SciToken or WLCG Token, depending on your origin's configuration). Place a in , where can be any file in a path you expect to be accessible using the credential you just obtained. Run the following command: [user@host]$ stashcp /tmp/testfile If successful, there should be a file at /tmp/testfile with the contents of the test file on your origin server. If unsuccessful, you can pass the -d flag to stashcp for debug info.","title":"Testing file access (authenticated origin)"},{"location":"data/stashcache/install-origin/#registering-the-origin","text":"To be part of the Open Science Data Federation, your origin must be registered with the OSG . The service type is XRootD origin server . The resource must also specify which VOs it will serve data from. To do this, add an AllowedVOs list, with each line specifying a VO whose data the resource is willing to host. For example: MY_OSDF_ORIGIN : Services : XRootD origin server : Description : OSDF origin server AllowedVOs : - GLOW - OSG DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=my-osdf-origin.example.net You can use the special value ANY to indicate that the origin will serve data from any VO that puts data on it. In addition to the origin allowing a VOs via the AllowedVOs list, that VO must also allow the origin in one of its AllowedOrigins lists in DataFederation/StashCache/Namespaces . See the page on getting your VO's data into OSDF . Specifying the DN of your origin is not required but it is useful for testing.","title":"Registering the Origin"},{"location":"data/stashcache/install-origin/#updating-to-osg-36","text":"The OSG 3.5 series reached end-of-life on May 1, 2022. Admins are strongly encouraged to move their origins to OSG 3.6. See general update instructions . Unauthenticated origins ( xrootd@stash-origin service) do not need any configuration changes. Authenticated origins ( xrootd@stash-origin-auth service) may need the configuration changes described in the updating to OSG 3.6 section of the XRootD authorization configuration document.","title":"Updating to OSG 3.6"},{"location":"data/stashcache/install-origin/#getting-help","text":"To get assistance, please use the this page .","title":"Getting Help"},{"location":"data/stashcache/overview/","text":"Open Science Data Federation Overview \u00b6 The OSG operates the Open Science Data Federation (OSDF), which provides organizations with a method to distribute their data in a scalable manner to thousands of jobs without needing to pre-stage data at each site. The map below shows the location of the current caches in the federation: Joining and Using the OSDF \u00b6 We support three types of deployments: We operate the service for you. All you need is provide us with a Kubernetes host to deploy our container into. This is our preferred way for you to join. It is conceptually described on our home website for an origin. A cache would be deployed exactly the same way. 
If this is how you want to join OSDF, please send email to support@osg-htc.org and we will guide you through the process. You can deploy our container yourself as described in our documentation . You can deploy from RPM as described in our documentation We strongly suggest that you allow us to operate these services for you (option 1) . The software that implements the service changes frequently enough, and is complicated enough, that keeping up with changes may require significant effort. If your installation is deemed too out-of-date, your service may be excluded from the OSDF. For more information on the OSDF , please see our overview page .","title":"Overview"},{"location":"data/stashcache/overview/#open-science-data-federation-overview","text":"The OSG operates the Open Science Data Federation (OSDF), which provides organizations with a method to distribute their data in a scalable manner to thousands of jobs without needing to pre-stage data at each site. The map below shows the location of the current caches in the federation:","title":"Open Science Data Federation Overview"},{"location":"data/stashcache/overview/#joining-and-using-the-osdf","text":"We support three types of deployments: We operate the service for you. All you need is provide us with a Kubernetes host to deploy our container into. This is our preferred way for you to join. It is conceptually described on our home website for an origin. A cache would be deployed exactly the same way. If this is how you want to join OSDF, please send email to support@osg-htc.org and we will guide you through the process. You can deploy our container yourself as described in our documentation . You can deploy from RPM as described in our documentation We strongly suggest that you allow us to operate these services for you (option 1) . The software that implements the service changes frequently enough, and is complicated enough, that keeping up with changes may require significant effort. If your installation is deemed too out-of-date, your service may be excluded from the OSDF. For more information on the OSDF , please see our overview page .","title":"Joining and Using the OSDF"},{"location":"data/stashcache/run-stash-origin-container/","text":"Running OSDF Origin in a Container \u00b6 The OSG operates the Open Science Data Federation (OSDF), which provides organizations with a method to distribute their data in a scalable manner to thousands of jobs without needing to pre-stage data across sites or operate their own scalable infrastructure. Origins store copies of users' data. Each community (or experiment) needs to run one origin to export its data via the federation. This document outlines how to run such an origin in a Docker container. Note The OSDF Origin was previously named \"Stash Origin\" and some documentation and software may use the old name. Before Starting \u00b6 Before starting the installation process, consider the following requirements: Docker: For the purpose of this guide, the host must have a running docker service and you must have the ability to start containers (i.e., belong to the docker Unix group). Network ports: The origin listens for incoming HTTP(S) and XRootD connections on ports 1094 and/or 1095. 1094 is used for serving public (unauthenticated) data, and 1095 is used for serving authenticated data. File Systems: The origin needs a host partition to store user data. Hardware requirements: We recommend that an origin has at least 1Gbps connectivity and 8GB of RAM. 
Host certificate: Required for authentication. See our host certificate documentation for instructions on how to request host certificates. Registration: Before deploying an origin, you must register the service in the OSG Topology Note This document describes features introduced in XCache 3.2.2, released on 2022-09-29. You must use a version of the opensciencegrid/stash-origin image built after that date. Configuring the Origin \u00b6 In addition to the required configuration above (ports and file systems), you may also configure the behavior of your origin with the following variables using an environment variable file: Where the environment file on the docker host, /opt/origin/.env , has (at least) the following contents, replacing with the resource name of your origin as registered in Topology and with the public DNS name that should be used to contact your origin: XC_RESOURCENAME=YOUR_SITE_NAME ORIGIN_FQDN= In addition, define the following variables to specify which subpaths should be served as public (unauthenticated) data on port 1094, and which subpaths should be served as authenticated data on port 1095: XC_PUBLIC_ORIGIN_EXPORT=//PUBLIC XC_AUTH_ORIGIN_EXPORT=//PROTECTED These paths are relative to the host partition being served -- see the Populating Origin Data section below. If you only define XC_AUTH_ORIGIN_EXPORT , you will only serve data on port 1095. If you only define XC_PUBLIC_ORIGIN_EXPORT , you will only serve data on port 1094. If you do not define either, you will serve the entire host partition as public data on port 1094. Note For backward compatibility, XC_ORIGINEXPORT is accepted as an alias for XC_PUBLIC_ORIGIN_EXPORT . Providing a host certificate \u00b6 The service will need a certificate for contacting central OSDF services and for authenticating connections. Follow our host certificate documentation to obtain a host certificate and key. Then, volume-mount the host certificate to /etc/grid-security/hostcert.pem , and the key to /etc/grid-security/hostkey.pem . Note You must restart the container whenever you renew your certificate in order for the services to pick up the new certificate. If you automate certificate renewal, you should automate restarts as well. For example, if you are using Certbot for Let's Encrypt, you should write a \"deploy hook\" as documented on the Certbot site . Populating Origin Data \u00b6 The OSDF namespace is shared by multiple VOs so you must choose a namespace for your own VO's data. When running an origin container, your chosen namespace must be reflected in your host partition. For example, if your host partition is /srv/origin and the name of your VO is ASTRO , you should store the Astro VO's public data in /srv/origin/astro/PUBLIC , and protected data in /srv/origin/astro/PROTECTED . When starting the container, mount /srv/origin/ into /xcache/namespace in the container, and set the environment variables XC_PUBLIC_ORIGIN_EXPORT=/astro/PUBLIC and XC_AUTH_ORIGIN_EXPORT=/astro/PROTECTED . You may omit XC_AUTH_ORIGIN_EXPORT if you are only serving public data, or omit XC_PUBLIC_ORIGIN_EXPORT if you are only serving protected data. If you omit both, the entire /srv/origin partition will be served as public data. Running the Origin \u00b6 It is recommended to use a container orchestration service such as docker-compose or kubernetes whose details are beyond the scope of this document. 
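For orientation only, a minimal docker-compose sketch for the origin is shown below. It assumes the /opt/origin/.env file described above, a host data directory of /srv/origin , and the /etc/ssl/host.crt and /etc/ssl/host.key certificate paths used in the systemd example later in this document; all of these names are examples and should be adjusted to your site.

version: "3"
services:
  stash-origin:
    image: opensciencegrid/stash-origin:3.6-release
    ports:
      - "1094:1094"   # public (unauthenticated) data
      - "1095:1095"   # authenticated data
    volumes:
      - /srv/origin:/xcache/namespace
      - /etc/ssl/host.crt:/etc/grid-security/hostcert.pem
      - /etc/ssl/host.key:/etc/grid-security/hostkey.pem
    env_file:
      - /opt/origin/.env
    restart: always

As with the docker run and systemd examples that follow, omit the port mapping you do not need if the origin serves only public or only protected data.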
The following sections provide examples for starting origin containers from the command-line as well as a more production-appropriate method using systemd. user@host $ docker run --rm --publish 1094 :1094 --publish 1095 :1095 \\ --volume :/xcache/namespace \\ --volume :/etc/grid-security/hostcert.pem \\ --volume :/etc/grid-security/hostkey.pem \\ --env-file = /opt/origin/.env \\ opensciencegrid/stash-origin:3.6-release Replacing with the host directory containing data that your origin should serve. See this section for details. Warning Unless configured otherwise via the env file /opt/origin/.env , a container deployed this way will serve the entire contents of . See the Configuring the Origin section for information on how to serve one subpath as public and another as protected. Note You may omit --publish 1094:1094 if you are only serving authenticated data, or omit --publish 1095:1095 if you are only serving public data. Running on origin container with systemd \u00b6 An example systemd service file for the OSDF. This will require creating the environment file in the directory /opt/origin/.env . Note This example systemd file assumes is /srv/origin , and the cert and key to use are in /etc/ssl/host.crt and /etc/ssl/host.key , respectively. Create the systemd service file /etc/systemd/system/docker.stash-origin.service as follows: [Unit] Description=Origin Container After=docker.service Requires=docker.service [Service] TimeoutStartSec=0 Restart=always ExecStartPre=-/usr/bin/docker stop %n ExecStartPre=-/usr/bin/docker rm %n ExecStartPre=/usr/bin/docker pull opensciencegrid/stash-origin:3.6-release ExecStart=/usr/bin/docker run --rm --name %n \\ --publish 1094:1094 \\ --publish 1095:1095 \\ --volume /srv/origin:/xcache/namespace \\ --volume /etc/ssl/host.crt:/etc/grid-security/hostcert.pem \\ --volume /etc/ssl/host.key:/etc/grid-security/hostkey.pem \\ --env-file /opt/origin/.env \\ opensciencegrid/stash-origin:3.6-release [Install] WantedBy=multi-user.target Enable and start the service with: root@host $ systemctl enable docker.stash-origin root@host $ systemctl start docker.stash-origin Warning Unless configured otherwise via the env file /opt/origin/.env , a container deployed this way will serve the entire contents of /srv/origin . See the Configuring the Origin section for information on how to serve one subpath as public and another as protected. Note You may omit --publish 1094:1094 if you are only serving authenticated data, or omit --publish 1095:1095 if you are only serving public data. Warning You must register the origin before starting it up. Validating the Origin \u00b6 To validate the origin please follow the validating origin instructions . Getting Help \u00b6 To get assistance, please use the this page .","title":"Install from container"},{"location":"data/stashcache/run-stash-origin-container/#running-osdf-origin-in-a-container","text":"The OSG operates the Open Science Data Federation (OSDF), which provides organizations with a method to distribute their data in a scalable manner to thousands of jobs without needing to pre-stage data across sites or operate their own scalable infrastructure. Origins store copies of users' data. Each community (or experiment) needs to run one origin to export its data via the federation. This document outlines how to run such an origin in a Docker container. 
Note The OSDF Origin was previously named \"Stash Origin\" and some documentation and software may use the old name.","title":"Running OSDF Origin in a Container"},{"location":"data/stashcache/run-stash-origin-container/#before-starting","text":"Before starting the installation process, consider the following requirements: Docker: For the purpose of this guide, the host must have a running docker service and you must have the ability to start containers (i.e., belong to the docker Unix group). Network ports: The origin listens for incoming HTTP(S) and XRootD connections on ports 1094 and/or 1095. 1094 is used for serving public (unauthenticated) data, and 1095 is used for serving authenticated data. File Systems: The origin needs a host partition to store user data. Hardware requirements: We recommend that an origin has at least 1Gbps connectivity and 8GB of RAM. Host certificate: Required for authentication. See our host certificate documentation for instructions on how to request host certificates. Registration: Before deploying an origin, you must register the service in the OSG Topology Note This document describes features introduced in XCache 3.2.2, released on 2022-09-29. You must use a version of the opensciencegrid/stash-origin image built after that date.","title":"Before Starting"},{"location":"data/stashcache/run-stash-origin-container/#configuring-the-origin","text":"In addition to the required configuration above (ports and file systems), you may also configure the behavior of your origin with the following variables using an environment variable file: Where the environment file on the docker host, /opt/origin/.env , has (at least) the following contents, replacing with the resource name of your origin as registered in Topology and with the public DNS name that should be used to contact your origin: XC_RESOURCENAME=YOUR_SITE_NAME ORIGIN_FQDN= In addition, define the following variables to specify which subpaths should be served as public (unauthenticated) data on port 1094, and which subpaths should be served as authenticated data on port 1095: XC_PUBLIC_ORIGIN_EXPORT=//PUBLIC XC_AUTH_ORIGIN_EXPORT=//PROTECTED These paths are relative to the host partition being served -- see the Populating Origin Data section below. If you only define XC_AUTH_ORIGIN_EXPORT , you will only serve data on port 1095. If you only define XC_PUBLIC_ORIGIN_EXPORT , you will only serve data on port 1094. If you do not define either, you will serve the entire host partition as public data on port 1094. Note For backward compatibility, XC_ORIGINEXPORT is accepted as an alias for XC_PUBLIC_ORIGIN_EXPORT .","title":"Configuring the Origin"},{"location":"data/stashcache/run-stash-origin-container/#providing-a-host-certificate","text":"The service will need a certificate for contacting central OSDF services and for authenticating connections. Follow our host certificate documentation to obtain a host certificate and key. Then, volume-mount the host certificate to /etc/grid-security/hostcert.pem , and the key to /etc/grid-security/hostkey.pem . Note You must restart the container whenever you renew your certificate in order for the services to pick up the new certificate. If you automate certificate renewal, you should automate restarts as well. 
For example, if you are using Certbot for Let's Encrypt, you should write a \"deploy hook\" as documented on the Certbot site .","title":"Providing a host certificate"},{"location":"data/stashcache/run-stash-origin-container/#populating-origin-data","text":"The OSDF namespace is shared by multiple VOs so you must choose a namespace for your own VO's data. When running an origin container, your chosen namespace must be reflected in your host partition. For example, if your host partition is /srv/origin and the name of your VO is ASTRO , you should store the Astro VO's public data in /srv/origin/astro/PUBLIC , and protected data in /srv/origin/astro/PROTECTED . When starting the container, mount /srv/origin/ into /xcache/namespace in the container, and set the environment variables XC_PUBLIC_ORIGIN_EXPORT=/astro/PUBLIC and XC_AUTH_ORIGIN_EXPORT=/astro/PROTECTED . You may omit XC_AUTH_ORIGIN_EXPORT if you are only serving public data, or omit XC_PUBLIC_ORIGIN_EXPORT if you are only serving protected data. If you omit both, the entire /srv/origin partition will be served as public data.","title":"Populating Origin Data"},{"location":"data/stashcache/run-stash-origin-container/#running-the-origin","text":"It is recommended to use a container orchestration service such as docker-compose or kubernetes whose details are beyond the scope of this document. The following sections provide examples for starting origin containers from the command-line as well as a more production-appropriate method using systemd. user@host $ docker run --rm --publish 1094 :1094 --publish 1095 :1095 \\ --volume :/xcache/namespace \\ --volume :/etc/grid-security/hostcert.pem \\ --volume :/etc/grid-security/hostkey.pem \\ --env-file = /opt/origin/.env \\ opensciencegrid/stash-origin:3.6-release Replacing with the host directory containing data that your origin should serve. See this section for details. Warning Unless configured otherwise via the env file /opt/origin/.env , a container deployed this way will serve the entire contents of . See the Configuring the Origin section for information on how to serve one subpath as public and another as protected. Note You may omit --publish 1094:1094 if you are only serving authenticated data, or omit --publish 1095:1095 if you are only serving public data.","title":"Running the Origin"},{"location":"data/stashcache/run-stash-origin-container/#running-on-origin-container-with-systemd","text":"An example systemd service file for the OSDF. This will require creating the environment file in the directory /opt/origin/.env . Note This example systemd file assumes is /srv/origin , and the cert and key to use are in /etc/ssl/host.crt and /etc/ssl/host.key , respectively. 
Create the systemd service file /etc/systemd/system/docker.stash-origin.service as follows: [Unit] Description=Origin Container After=docker.service Requires=docker.service [Service] TimeoutStartSec=0 Restart=always ExecStartPre=-/usr/bin/docker stop %n ExecStartPre=-/usr/bin/docker rm %n ExecStartPre=/usr/bin/docker pull opensciencegrid/stash-origin:3.6-release ExecStart=/usr/bin/docker run --rm --name %n \\ --publish 1094:1094 \\ --publish 1095:1095 \\ --volume /srv/origin:/xcache/namespace \\ --volume /etc/ssl/host.crt:/etc/grid-security/hostcert.pem \\ --volume /etc/ssl/host.key:/etc/grid-security/hostkey.pem \\ --env-file /opt/origin/.env \\ opensciencegrid/stash-origin:3.6-release [Install] WantedBy=multi-user.target Enable and start the service with: root@host $ systemctl enable docker.stash-origin root@host $ systemctl start docker.stash-origin Warning Unless configured otherwise via the env file /opt/origin/.env , a container deployed this way will serve the entire contents of /srv/origin . See the Configuring the Origin section for information on how to serve one subpath as public and another as protected. Note You may omit --publish 1094:1094 if you are only serving authenticated data, or omit --publish 1095:1095 if you are only serving public data. Warning You must register the origin before starting it up.","title":"Running on origin container with systemd"},{"location":"data/stashcache/run-stash-origin-container/#validating-the-origin","text":"To validate the origin please follow the validating origin instructions .","title":"Validating the Origin"},{"location":"data/stashcache/run-stash-origin-container/#getting-help","text":"To get assistance, please use the this page .","title":"Getting Help"},{"location":"data/stashcache/run-stashcache-container/","text":"Running OSDF Cache in a Container \u00b6 The OSG operates the Open Science Data Federation (OSDF), which provides organizations with a method to distribute their data in a scalable manner to thousands of jobs without needing to pre-stage data across sites or operate their own scalable infrastructure. OSDF Caches transfer data to clients such as jobs or users. A set of caches are operated across the OSG for the benefit of nearby sites; in addition, each site may run its own cache in order to reduce the amount of data transferred over the WAN. This document outlines how to run a cache in a Docker container. Note The OSDF cache was previously named \"Stash Cache\" and some documentation and software may use the old name. Before Starting \u00b6 Before starting the installation process, consider the following requirements: Docker: For the purpose of this guide, the host must have a running docker service and you must have the ability to start containers (i.e., belong to the docker Unix group). Network ports: The cache service requires the following open ports: Inbound TCP port 1094 for unauthenticated file access via the XRootD protocol (optional) Inbound TCP port 8000 for unauthenticated file access via HTTP(S) and/or Inbound TCP port 8443 for authenticated file access via HTTPS Outbound UDP port 9930 for reporting to xrd-report.osgstorage.org and xrd-mon.osgstorage.org for monitoring File Systems: The cache needs host partitions to store user data. For improved performance and storage, we recommend multiple partitions for handling namespaces (HDD, SSD, or NVMe), data (HDDs), and metadata (SSDs or NVMe). Host certificate: Required for authentication. 
See our host certificate documentation for instructions on how to request host certificates. Hardware requirements: We recommend that a cache has at least 10Gbps connectivity, 1 TB of disk space for the cache directory, and 12GB of RAM. Registering the Cache \u00b6 To be part of the OSDF, your cache must be registered with the OSG. You will need basic information like the resource name, hostname, host certificate DN, and the administrative and security contacts. Initial registration \u00b6 To register your cache host, follow the general registration instructions here . The service type is XRootD cache server . Info This step must be completed before installation. In your registration, you must specify which VOs your cache will serve by adding an AllowedVOs list, with each line specifying a VO whose data you are willing to cache. There are special values you may use in AllowedVOs : ANY_PUBLIC indicates that the cache is willing to serve public data from any VO. ANY indicates that the cache is willing to serve data from any VO, both public and protected. ANY implies ANY_PUBLIC . There are extra requirements for serving protected data: In addition to the cache allowing a VO in the AllowedVOs list, that VO must also allow the cache in its AllowedCaches list. See the page on getting your VO's data into OSDF . There must be an authenticated XRootD instance on the cache server. There must be a DN attribute in the resource registration with the subject DN of the host certificate This is an example registration for a cache server that serves all public data: MY_OSDF_CACHE : FQDN : my-cache.example.net Services : XRootD cache server : Description : OSDF cache server AllowedVOs : - ANY_PUBLIC This is an example registration for a cache server that only serves protected data for the Open Science Pool: MY_AUTH_OSDF_CACHE : FQDN : my-auth-cache.example.net Services : XRootD cache server : Description : OSDF cache server AllowedVOs : - OSG DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=my-auth-cache.example.net This is an example registration for a cache server that serves all public data and protected data from the OSG VO: MY_COMBO_OSDF_CACHE : FQDN : my-combo-cache.example.net Services : XRootD cache server : Description : OSDF cache server AllowedVOs : - OSG - ANY_PUBLIC DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=my-combo-cache.example.net Configuring the OSDF Cache \u00b6 In addition to the required configuration above (ports and file systems), you may also configure the behavior of your cache with the following variables using an environment variable file: Where the environment file on the docker host, /opt/xcache/.env , has (at least) the following contents, replacing with the name of your resource as registered in Topology and with the public DNS name that should be used to contact your cache: XC_RESOURCENAME= CACHE_FQDN= Providing a host certificate \u00b6 The service will need a certificate for contacting central OSDF services and for authenticating to origins. Follow our host certificate documentation to obtain a host certificate and key. Then, volume-mount the host certificate to /etc/grid-security/hostcert.pem , and the key to /etc/grid-security/hostkey.pem . Note You must restart the container whenever you renew your certificate in order for the services to pick up the new certificate. If you automate certificate renewal, you should automate restarts as well. 
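As a minimal illustration of what that automation might look like, the sketch below copies a renewed certificate into the paths assumed by the systemd example later in this document and then restarts the container; the certificate source paths, hostname, and unit name are hypothetical and should be adapted to however your site issues certificates.

#!/bin/bash
# Hypothetical renewal hook: install the renewed certificate and key where the
# container expects to find them, then restart the service so XRootD reloads them.
cp /etc/letsencrypt/live/my-cache.example.net/fullchain.pem /etc/ssl/host.crt
cp /etc/letsencrypt/live/my-cache.example.net/privkey.pem /etc/ssl/host.key
systemctl restart docker.stash-cache.service

A script along these lines can be hooked into whatever mechanism automates your renewals.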
For example, if you are using Certbot for Let's Encrypt, you should write a \"deploy hook\" as documented on the Certbot site . Optional configuration \u00b6 Further behavior of the cache can be configured by setting the following in the environment variable file: XC_SPACE_HIGH_WM , XC_SPACE_LOW_WM : High-water and low-water marks for disk usage, as numbers between 0.00 (0%) and 1.00 (100%); when usage goes above the high-water mark, the cache will delete files until it hits the low-water mark. XC_RAMSIZE : Amount of memory to use for storing blocks before writing them to disk. (Use a higher value for slower disks). XC_BLOCKSIZE : Size of the blocks in the cache. XC_PREFETCH : Number of blocks to prefetch from a file at once. This controls how aggressively the cache requests portions of a file. Running a Cache \u00b6 Cache containers may be run with either multiple mounted host partitions (recommended) or a single host partition. It is recommended to use a container orchestration service such as docker-compose or kubernetes whose details are beyond the scope of this document. The following sections provide examples for starting cache containers from the command-line as well as a more production-appropriate method using systemd. Multiple host partitions (recommended) \u00b6 For improved performance and storage, especially if your cache is serving over 10 TB of data, we recommend multiple partitions for handling namespaces (HDD, SSD, or NVMe), data (HDDs), and metadata (SSDs or NVMe). Note Under this configuration the is not used to store the files. Instead, the partition stores symlinks to the files in the metadata and data partitions. user@host $ docker run --rm \\ --publish :8000 \\ --publish :8443 \\ --volume :/etc/grid-security/hostcert.pem \\ --volume :/etc/grid-security/hostkey.pem \\ --volume :/xcache/namespace \\ --volume :/xcache/meta1 ... --volume :/xcache/metaN --volume :/xcache/data1 ... --volume :/xcache/dataN --env-file=/opt/xcache/.env \\ opensciencegrid/stash-cache:3.6-release Warning For over 10 TB of assigned space we strongly encourage using this setup and mounting solid state disks or NVMe. Single host partition \u00b6 For a simpler installation, you may use a single host partition mounted to /xcache/ : user@host $ docker run --rm \\ --publish :8000 \\ --publish :8443 \\ --volume :/xcache \\ --volume :/etc/grid-security/hostcert.pem \\ --volume :/etc/grid-security/hostkey.pem \\ --env-file = /opt/xcache/.env \\ opensciencegrid/stash-cache:3.6-release Running a cache on container with systemd \u00b6 An example systemd service file for the OSDF cache. This will require creating the environment file in the directory /opt/xcache/.env . Note This example systemd file assumes is 8000 , is 8443 , is /srv/cache , and the cert and key to use are in /etc/ssl/host.crt and /etc/ssl/host.key , respectively.
Create the systemd service file /etc/systemd/system/docker.stash-cache.service as follows: [Unit] Description=Cache Container After=docker.service Requires=docker.service [Service] TimeoutStartSec=0 Restart=always ExecStartPre=-/usr/bin/docker stop %n ExecStartPre=-/usr/bin/docker rm %n ExecStartPre=/usr/bin/docker pull opensciencegrid/stash-cache:3.6-release ExecStart=/usr/bin/docker run --rm --name %n \\ --publish 8000:8000 \\ --publish 8443:8443 \\ --volume /srv/cache:/xcache \\ --volume /etc/ssl/host.crt:/etc/grid-security/hostcert.pem \\ --volume /etc/ssl/host.key:/etc/grid-security/hostkey.pem \\ --env-file /opt/xcache/.env \\ opensciencegrid/stash-cache:3.6-release [Install] WantedBy=multi-user.target Enable and start the service with: root@host $ systemctl enable docker.stash-cache root@host $ systemctl start docker.stash-cache Warning You must register the cache before starting it up. Network optimization \u00b6 For caches that are connected to NICs over 40 Gbps we recommend that you disable the virtualized network and \"bind\" the container to the host network: user@host $ docker run --rm \\ --network = \"host\" \\ --volume :/cache \\ --volume :/etc/grid-security/hostcert.pem \\ --volume :/etc/grid-security/hostkey.pem \\ --env-file = /opt/xcache/.env \\ opensciencegrid/stash-cache:3.6-release Memory optimization \u00b6 The cache uses the host's memory for two purposes: Caching files recently read from disk (via the kernel page cache). Buffering files recently received from the network before writing them to disk (to compensate for slow disks). An easy way to increase the performance of the cache is to assign it more memory. If you set a limit on the container's memory usage via the docker option --memory or Kubernetes resource limits, make sure it is at least twice the value of XC_RAMSIZE . Validating the Cache \u00b6 The cache server functions as a normal HTTP server and can interact with typical HTTP clients, such as curl . Here, is the port chosen in the docker run command, 8000 by default. user@host $ curl -O http://cache_host:/osgconnect/public/rynge/test.data curl may not correctly report a failure, so verify that the contents of the file are: hello world! Getting Help \u00b6 To get assistance, please use the this page .","title":"Install from container"},{"location":"data/stashcache/run-stashcache-container/#running-osdf-cache-in-a-container","text":"The OSG operates the Open Science Data Federation (OSDF), which provides organizations with a method to distribute their data in a scalable manner to thousands of jobs without needing to pre-stage data across sites or operate their own scalable infrastructure. OSDF Caches transfer data to clients such as jobs or users. A set of caches are operated across the OSG for the benefit of nearby sites; in addition, each site may run its own cache in order to reduce the amount of data transferred over the WAN. This document outlines how to run a cache in a Docker container. Note The OSDF cache was previously named \"Stash Cache\" and some documentation and software may use the old name.","title":"Running OSDF Cache in a Container"},{"location":"data/stashcache/run-stashcache-container/#before-starting","text":"Before starting the installation process, consider the following requirements: Docker: For the purpose of this guide, the host must have a running docker service and you must have the ability to start containers (i.e., belong to the docker Unix group). 
Network ports: The cache service requires the following open ports: Inbound TCP port 1094 for unauthenticated file access via the XRootD protocol (optional) Inbound TCP port 8000 for unauthenticated file access via HTTP(S) and/or Inbound TCP port 8443 for authenticated file access via HTTPS Outbound UDP port 9930 for reporting to xrd-report.osgstorage.org and xrd-mon.osgstorage.org for monitoring File Systems: The cache needs host partitions to store user data. For improved performance and storage, we recommend multiple partitions for handling namespaces (HDD, SSD, or NVMe), data (HDDs), and metadata (SSDs or NVMe). Host certificate: Required for authentication. See our host certificate documentation for instructions on how to request host certificates. Hardware requirements: We recommend that a cache has at least 10Gbps connectivity, 1 TB of disk space for the cache directory, and 12GB of RAM.","title":"Before Starting"},{"location":"data/stashcache/run-stashcache-container/#registering-the-cache","text":"To be part of the OSDF, your cache must be registered with the OSG. You will need basic information like the resource name, hostname, host certificate DN, and the administrative and security contacts.","title":"Registering the Cache"},{"location":"data/stashcache/run-stashcache-container/#initial-registration","text":"To register your cache host, follow the general registration instructions here . The service type is XRootD cache server . Info This step must be completed before installation. In your registration, you must specify which VOs your cache will serve by adding an AllowedVOs list, with each line specifying a VO whose data you are willing to cache. There are special values you may use in AllowedVOs : ANY_PUBLIC indicates that the cache is willing to serve public data from any VO. ANY indicates that the cache is willing to serve data from any VO, both public and protected. ANY implies ANY_PUBLIC . There are extra requirements for serving protected data: In addition to the cache allowing a VO in the AllowedVOs list, that VO must also allow the cache in its AllowedCaches list. See the page on getting your VO's data into OSDF . There must be an authenticated XRootD instance on the cache server. 
There must be a DN attribute in the resource registration with the subject DN of the host certificate This is an example registration for a cache server that serves all public data: MY_OSDF_CACHE : FQDN : my-cache.example.net Services : XRootD cache server : Description : OSDF cache server AllowedVOs : - ANY_PUBLIC This is an example registration for a cache server that only serves protected data for the Open Science Pool: MY_AUTH_OSDF_CACHE : FQDN : my-auth-cache.example.net Services : XRootD cache server : Description : OSDF cache server AllowedVOs : - OSG DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=my-auth-cache.example.net This is an example registration for a cache server that serves all public data and protected data from the OSG VO: MY_COMBO_OSDF_CACHE : FQDN : my-combo-cache.example.net Services : XRootD cache server : Description : OSDF cache server AllowedVOs : - OSG - ANY_PUBLIC DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=my-combo-cache.example.net","title":"Initial registration"},{"location":"data/stashcache/run-stashcache-container/#configuring-the-osdf-cache","text":"In addition to the required configuration above (ports and file systems), you may also configure the behavior of your cache with the following variables using an environment variable file: Where the environment file on the docker host, /opt/xcache/.env , has (at least) the following contents, replacing with the name of your resource as registered in Topology and with the public DNS name that should be used to contact your cache: XC_RESOURCENAME= CACHE_FQDN=","title":"Configuring the OSDF Cache"},{"location":"data/stashcache/run-stashcache-container/#providing-a-host-certificate","text":"The service will need a certificate for contacting central OSDF services and for authenticating to origins. Follow our host certificate documentation to obtain a host certificate and key. Then, volume-mount the host certificate to /etc/grid-security/hostcert.pem , and the key to /etc/grid-security/hostkey.pem . Note You must restart the container whenever you renew your certificate in order for the services to pick up the new certificate. If you automate certificate renewal, you should automate restarts as well. For example, if you are using Certbot for Let's Encrypt, you should write a \"deploy hook\" as documented on the Certbot site .","title":"Providing a host certificate"},{"location":"data/stashcache/run-stashcache-container/#optional-configuration","text":"Further behavior of the cache can be configured by setting the following in the environment variable file: XC_SPACE_HIGH_WM , XC_SPACE_LOW_WM : High-water and low-water marks for disk usage, as numbers between 0.00 (0%) and 1.00 (100%); when usage goes above the high-water mark, the cache will delete files until it hits the low-water mark. XC_RAMSIZE : Amount of memory to use for storing blocks before writting them to disk. (Use higher for slower disks). XC_BLOCKSIZE : Size of the blocks in the cache. XC_PREFETCH : Number of blocks to prefetch from a file at once. This controls how aggressive the cache is to request portions of a file.","title":"Optional configuration"},{"location":"data/stashcache/run-stashcache-container/#running-a-cache","text":"Cache containers may be run with either multiple mounted host partitions (recommended) or a single host partition. It is recommended to use a container orchestration service such as docker-compose or kubernetes whose details are beyond the scope of this document. 
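If you do use docker-compose, a minimal sketch for the single host partition layout might look like the following; it assumes the /opt/xcache/.env file described above and the /srv/cache , /etc/ssl/host.crt , and /etc/ssl/host.key paths used in the systemd example, all of which are illustrative and should be adjusted to your site.

version: "3"
services:
  stash-cache:
    image: opensciencegrid/stash-cache:3.6-release
    ports:
      - "8000:8000"   # unauthenticated HTTP(S)
      - "8443:8443"   # authenticated HTTPS
    volumes:
      - /srv/cache:/xcache
      - /etc/ssl/host.crt:/etc/grid-security/hostcert.pem
      - /etc/ssl/host.key:/etc/grid-security/hostkey.pem
    env_file:
      - /opt/xcache/.env
    restart: always
    # If you also set a memory limit here, keep it at least twice XC_RAMSIZE
    # (see the memory optimization guidance in this document).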
The following sections provide examples for starting cache containers from the command-line as well as a more production-appropriate method using systemd.","title":"Running a Cache"},{"location":"data/stashcache/run-stashcache-container/#multiple-host-partitions-recommended","text":"For improved performance and storage, especially if your cache is serving over 10 TB of data, we recommend multiple partitions for handling namespaces (HDD, SSD, or NVMe), data (HDDs), and metadata (SSDs or NVMe). Note Under this configuration the is not used to store the files. Instead, the partition stores symlinks to the files in the metadata and data partitions. user@host $ docker run --rm \\ --publish :8000 \\ --publish :8443 \\ --volume :/etc/grid-security/hostcert.pem \\ --volume :/etc/grid-security/hostkey.pem \\ --volume :/xcache/namespace \\ --volume :/xcache/meta1 ... --volume :/xcache/metaN --volume :/xcache/data1 ... --volume :/xcache/dataN --env-file=/opt/xcache/.env \\ opensciencegrid/stash-cache:3.6-release Warning For over 10 TB of assigned space we highly encourage to use this setup and mount in solid state disks or NVMe.","title":"Multiple host partitions (recommended)"},{"location":"data/stashcache/run-stashcache-container/#single-host-partition","text":"For a simpler installation, you may use a single host partition mounted to /xcache/ : user@host $ docker run --rm \\ --publish :8000 \\ --publish :8443 \\ --volume :/xcache \\ --volume :/etc/grid-security/hostcert.pem \\ --volume :/etc/grid-security/hostkey.pem \\ --env-file = /opt/xcache/.env \\ opensciencegrid/stash-cache:3.6-release","title":"Single host partition"},{"location":"data/stashcache/run-stashcache-container/#running-a-cache-on-container-with-systemd","text":"An example systemd service file for the OSDF cache. This will require creating the environment file in the directory /opt/xcache/.env . Note This example systemd file assumes is 8000 , is 8443 , is /srv/cache , and the cert and key to use are in /etc/ssl/host.crt and /etc/ssl/host.key , respectively. 
Create the systemd service file /etc/systemd/system/docker.stash-cache.service as follows: [Unit] Description=Cache Container After=docker.service Requires=docker.service [Service] TimeoutStartSec=0 Restart=always ExecStartPre=-/usr/bin/docker stop %n ExecStartPre=-/usr/bin/docker rm %n ExecStartPre=/usr/bin/docker pull opensciencegrid/stash-cache:3.6-release ExecStart=/usr/bin/docker run --rm --name %n \\ --publish 8000:8000 \\ --publish 8443:8443 \\ --volume /srv/cache:/xcache \\ --volume /etc/ssl/host.crt:/etc/grid-security/hostcert.pem \\ --volume /etc/ssl/host.key:/etc/grid-security/hostkey.pem \\ --env-file /opt/xcache/.env \\ opensciencegrid/stash-cache:3.6-release [Install] WantedBy=multi-user.target Enable and start the service with: root@host $ systemctl enable docker.stash-cache root@host $ systemctl start docker.stash-cache Warning You must register the cache before starting it up.","title":"Running a cache on container with systemd"},{"location":"data/stashcache/run-stashcache-container/#network-optimization","text":"For caches that are connected to NICs over 40 Gbps we recommend that you disable the virtualized network and \"bind\" the container to the host network: user@host $ docker run --rm \\ --network = \"host\" \\ --volume :/cache \\ --volume :/etc/grid-security/hostcert.pem \\ --volume :/etc/grid-security/hostkey.pem \\ --env-file = /opt/xcache/.env \\ opensciencegrid/stash-cache:3.6-release","title":"Network optimization"},{"location":"data/stashcache/run-stashcache-container/#memory-optimization","text":"The cache uses the host's memory for two purposes: Caching files recently read from disk (via the kernel page cache). Buffering files recently received from the network before writing them to disk (to compensate for slow disks). An easy way to increase the performance of the cache is to assign it more memory. If you set a limit on the container's memory usage via the docker option --memory or Kubernetes resource limits, make sure it is at least twice the value of XC_RAMSIZE .","title":"Memory optimization"},{"location":"data/stashcache/run-stashcache-container/#validating-the-cache","text":"The cache server functions as a normal HTTP server and can interact with typical HTTP clients, such as curl . Here, is the port chosen in the docker run command, 8000 by default. user@host $ curl -O http://cache_host:/osgconnect/public/rynge/test.data curl may not correctly report a failure, so verify that the contents of the file are: hello world!","title":"Validating the Cache"},{"location":"data/stashcache/run-stashcache-container/#getting-help","text":"To get assistance, please use the this page .","title":"Getting Help"},{"location":"data/stashcache/vo-data/","text":"Getting VO Data into the OSDF \u00b6 This document describes the steps required to manage a VO's role in the Open Science Data Federation (OSDF) including selecting a namespace, registration, and selecting which resources are allowed to host or cache your data. For general information about the OSDF, see the overview document . Site admins should work together with VO managers in order to perform these steps. Definitions \u00b6 Namespace: a directory tree in the federation that is used to find VO data. Public data: data that can be read by anyone. Protected data: data that requires authorization to read. Requirements \u00b6 In order for a Virtual Organization to join the federation, the VO must already be registered in OSG Topology. See the registration document . 
Choosing Namespaces \u00b6 The VO must pick one or more \"namespaces\" for their data. A namespace is a directory tree in the federation where VO data is found. Note Namespaces are global across the federation, so you must work with the OSG Operations team to ensure that your VO's namespaces do not collide with those of another VO. Send an email to help@osg-htc.org with the following subject: \"Requesting OSDF namespaces for VO \" and put the desired namespaces in the body of the email. A namespace should be easy for your users to remember but not so generic that it collides with other VOs. We recommend using the lowercase version of your VO as the top-level directory. In addition, public data, if any, should be stored in a subdirectory named PUBLIC , and protected data, if any, should be stored in a subdirectory named PROTECTED . Putting this together, if your VO is named Astro , you should have: /astro/PUBLIC for public data /astro/PROTECTED for protected data Separating the public and protected data in separate directory trees is preferred for technical reasons. Registering Data Federation Information \u00b6 The VO must allow one or more origins to host their data. An origin will typically be hosted on a site owned by the VO. For information about setting up an origin, see the installation document . In order to declare your VO's role in the federation, you must add OSDF information to your VO's YAML file in the OSG Topology repository. For example, the full registration for the Astro VO may look something like the following: DataFederations : StashCache : Namespaces : - Path : /astro/PUBLIC Authorizations : - PUBLIC AllowedCaches : - ANY AllowedOrigins : - ASTRO_OSDF_ORIGIN - Path : /astro/PROTECTED Authorizations : - FQAN : /Astro - DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=People/CN=Matyas Selmeci - SciTokens : Issuer : https://astro.org Base Path : /astro/PROTECTED AllowedCaches : - ASTRO_EAST_CACHE - ASTRO_WEST_CACHE AllowedOrigins : - ASTRO_AUTH_OSDF_ORIGIN The sections are described below. Namespaces section \u00b6 In the namespaces section, you will declare one or more namespaces. A namespace is a directory tree in the data federation that is owned by a VO/collaboration. Each namespace requires: a Path that is the path to the directory tree, e.g. /astro/PUBLIC an Authorizations list which describes how users are authorized to access data within the namespace an AllowedCaches list of the OSDF caches that are allowed to cache the data within the namespace an AllowedOrigins list of the OSDF origins that are allowed to serve the data within the namespace In addition, a namespace may have the following optional attributes: a Writeback endpoint that is an HTTPS URL like https://stash-xrd.osgconnect.net:1094 that can be used for jobs to write data to the origin a DirList endpoint that is an HTTPS URL like https://origin-auth2001.chtc.wisc.edu:1095 that can be used for getting a directory listing of that namespace Authorizations list \u00b6 The Authorizations list of each namespace describes how a user can get authorized in order to access the data within the namespace. 
The list will contain one or more of these: FQAN: allows someone using a proxy with the specified VOMS FQAN DN: allows someone using a proxy with that specific DN PUBLIC allows anyone; this is used for public data SciTokens allows someone using a SciToken with the given parameters, which are described below A complete declaration looks like: Namespaces : - Path : /astro/PUBLIC Authorizations : - PUBLIC AllowedCaches : ... AllowedOrigins : ... - Path : /astro/PROTECTED Authorizations : - FQAN : /Astro - DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=People/CN=Matyas Selmeci - SciTokens : Issuer : https://astro.org Base Path : /astro/PROTECTED Map Subject : True AllowedCaches : ... AllowedOrigins : ... This declares two namespaces: /astro/PUBLIC for public data, and /astro/PROTECTED which can only be read by someone with the /Astro FQAN, by Matyas Selmeci, or by someone with a SciToken issued by https://astro.org . SciTokens \u00b6 A SciTokens authorization has multiple parameters: Issuer (required) is the token issuer of the SciToken that the authorization accepts. Base Path (required) is a path that will be prepended to the scopes of the token in order to construct the full path to the file(s) that the bearer of the token is allowed to access. For example, if Base Path is set to /astro/PROTECTED then a token with the scope read:/matyas will have the permission to read from the directory tree under /astro/PROTECTED/matyas . The correct value for Base Path depends on how the issuer is set up, but we recommend that you set Base Path to the namespace path, and configure the issuer to create scopes relative to the namespace path. Map Subject (optional, False if not specified) should be set to True if the origin uses the XRootD-Multiuser plugin. It will cause the origin to use the token subject ( sub field) to map to a Unix user in order to access files. Restricted Path (optional) is a further restriction on paths the token is allowed to access. Only tokens whose scopes start with the Restricted Path will be accepted. Use this only if your issuer does not create relative scopes. AllowedCaches list \u00b6 The VO must allow one or more OSDF caches to cache their data. The more places a VO's data can be cached in, the bigger the data transfer benefit for the VO. The majority of caches across OSG will automatically cache all \"public\" VO data. Caching \"protected\" VO data will often be done on a site owned by the VO. For information about setting up a cache, see the installation document . AllowedCaches is a list of which caches are allowed to host copies of your data. There are two cases: If you only have public data, your AllowedCaches list can look like: AllowedCaches : - ANY This allows any cache to host a copy of your data. If you have some protected data, then AllowedCaches is a list of resources that are allowed to cache your data. A resource is an entry in a /topology///.yaml file, for example CHTC_OSDF_CACHE . The following requirements must be met for the resource: It must have an \"XRootD cache server\" service It must have an AllowedVOs list that includes either your VO, \"ANY\", or \"ANY_PUBLIC\" It must have a DN attribute with the DN of its host cert AllowedOrigins list \u00b6 AllowedOrigins is a list of which origins are allowed to host your data. This is a list of resources . A resource is an entry in a /topology///.yaml file, for example CHTC_OSDF_ORIGIN . 
The following requirements must be met for the resource: It must have an \"XRootD origin server\" service It must have an AllowedVOs list that includes either your VO or \"ANY\"","title":"Publishing VO data"},{"location":"data/stashcache/vo-data/#getting-vo-data-into-the-osdf","text":"This document describes the steps required to manage a VO's role in the Open Science Data Federation (OSDF) including selecting a namespace, registration, and selecting which resources are allowed to host or cache your data. For general information about the OSDF, see the overview document . Site admins should work together with VO managers in order to perform these steps.","title":"Getting VO Data into the OSDF"},{"location":"data/stashcache/vo-data/#definitions","text":"Namespace: a directory tree in the federation that is used to find VO data. Public data: data that can be read by anyone. Protected data: data that requires authorization to read.","title":"Definitions"},{"location":"data/stashcache/vo-data/#requirements","text":"In order for a Virtual Organization to join the federation, the VO must already be registered in OSG Topology. See the registration document .","title":"Requirements"},{"location":"data/stashcache/vo-data/#choosing-namespaces","text":"The VO must pick one or more \"namespaces\" for their data. A namespace is a directory tree in the federation where VO data is found. Note Namespaces are global across the federation, so you must work with the OSG Operations team to ensure that your VO's namespaces do not collide with those of another VO. Send an email to help@osg-htc.org with the following subject: \"Requesting OSDF namespaces for VO \" and put the desired namespaces in the body of the email. A namespace should be easy for your users to remember but not so generic that it collides with other VOs. We recommend using the lowercase version of your VO as the top-level directory. In addition, public data, if any, should be stored in a subdirectory named PUBLIC , and protected data, if any, should be stored in a subdirectory named PROTECTED . Putting this together, if your VO is named Astro , you should have: /astro/PUBLIC for public data /astro/PROTECTED for protected data Separating the public and protected data in separate directory trees is preferred for technical reasons.","title":"Choosing Namespaces"},{"location":"data/stashcache/vo-data/#registering-data-federation-information","text":"The VO must allow one or more origins to host their data. An origin will typically be hosted on a site owned by the VO. For information about setting up an origin, see the installation document . In order to declare your VO's role in the federation, you must add OSDF information to your VO's YAML file in the OSG Topology repository. For example, the full registration for the Astro VO may look something like the following: DataFederations : StashCache : Namespaces : - Path : /astro/PUBLIC Authorizations : - PUBLIC AllowedCaches : - ANY AllowedOrigins : - ASTRO_OSDF_ORIGIN - Path : /astro/PROTECTED Authorizations : - FQAN : /Astro - DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=People/CN=Matyas Selmeci - SciTokens : Issuer : https://astro.org Base Path : /astro/PROTECTED AllowedCaches : - ASTRO_EAST_CACHE - ASTRO_WEST_CACHE AllowedOrigins : - ASTRO_AUTH_OSDF_ORIGIN The sections are described below.","title":"Registering Data Federation Information"},{"location":"data/stashcache/vo-data/#namespaces-section","text":"In the namespaces section, you will declare one or more namespaces. 
A namespace is a directory tree in the data federation that is owned by a VO/collaboration. Each namespace requires: a Path that is the path to the directory tree, e.g. /astro/PUBLIC an Authorizations list which describes how users are authorized to access data within the namespace an AllowedCaches list of the OSDF caches that are allowed to cache the data within the namespace an AllowedOrigins list of the OSDF origins that are allowed to serve the data within the namespace In addition, a namespace may have the following optional attributes: a Writeback endpoint that is an HTTPS URL like https://stash-xrd.osgconnect.net:1094 that can be used for jobs to write data to the origin a DirList endpoint that is an HTTPS URL like https://origin-auth2001.chtc.wisc.edu:1095 that can be used for getting a directory listing of that namespace","title":"Namespaces section"},{"location":"data/stashcache/vo-data/#authorizations-list","text":"The Authorizations list of each namespace describes how a user can get authorized in order to access the data within the namespace. The list will contain one or more of these: FQAN: allows someone using a proxy with the specified VOMS FQAN DN: allows someone using a proxy with that specific DN PUBLIC allows anyone; this is used for public data SciTokens allows someone using a SciToken with the given parameters, which are described below A complete declaration looks like: Namespaces : - Path : /astro/PUBLIC Authorizations : - PUBLIC AllowedCaches : ... AllowedOrigins : ... - Path : /astro/PROTECTED Authorizations : - FQAN : /Astro - DN : /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=People/CN=Matyas Selmeci - SciTokens : Issuer : https://astro.org Base Path : /astro/PROTECTED Map Subject : True AllowedCaches : ... AllowedOrigins : ... This declares two namespaces: /astro/PUBLIC for public data, and /astro/PROTECTED which can only be read by someone with the /Astro FQAN, by Matyas Selmeci, or by someone with a SciToken issued by https://astro.org .","title":"Authorizations list"},{"location":"data/stashcache/vo-data/#scitokens","text":"A SciTokens authorization has multiple parameters: Issuer (required) is the token issuer of the SciToken that the authorization accepts. Base Path (required) is a path that will be prepended to the scopes of the token in order to construct the full path to the file(s) that the bearer of the token is allowed to access. For example, if Base Path is set to /astro/PROTECTED then a token with the scope read:/matyas will have the permission to read from the directory tree under /astro/PROTECTED/matyas . The correct value for Base Path depends on how the issuer is set up, but we recommend that you set Base Path to the namespace path, and configure the issuer to create scopes relative to the namespace path. Map Subject (optional, False if not specified) should be set to True if the origin uses the XRootD-Multiuser plugin. It will cause the origin to use the token subject ( sub field) to map to a Unix user in order to access files. Restricted Path (optional) is a further restriction on paths the token is allowed to access. Only tokens whose scopes start with the Restricted Path will be accepted. Use this only if your issuer does not create relative scopes.","title":"SciTokens"},{"location":"data/stashcache/vo-data/#allowedcaches-list","text":"The VO must allow one or more OSDF caches to cache their data. The more places a VO's data can be cached in, the bigger the data transfer benefit for the VO. 
The majority of caches across OSG will automatically cache all \"public\" VO data. Caching \"protected\" VO data will often be done on a site owned by the VO. For information about setting up a cache, see the installation document . AllowedCaches is a list of which caches are allowed to host copies of your data. There are two cases: If you only have public data, your AllowedCaches list can look like: AllowedCaches : - ANY This allows any cache to host a copy of your data. If you have some protected data, then AllowedCaches is a list of resources that are allowed to cache your data. A resource is an entry in a /topology///.yaml file, for example CHTC_OSDF_CACHE . The following requirements must be met for the resource: It must have an \"XRootD cache server\" service It must have an AllowedVOs list that includes either your VO, \"ANY\", or \"ANY_PUBLIC\" It must have a DN attribute with the DN of its host cert","title":"AllowedCaches list"},{"location":"data/stashcache/vo-data/#allowedorigins-list","text":"AllowedOrigins is a list of which origins are allowed to host your data. This is a list of resources . A resource is an entry in a /topology///.yaml file, for example CHTC_OSDF_ORIGIN . The following requirements must be met for the resource: It must have an \"XRootD origin server\" service It must have an AllowedVOs list that includes either your VO or \"ANY\"","title":"AllowedOrigins list"},{"location":"data/xrootd/install-client/","text":"Using XRootD \u00b6 XRootD is a high performance data system widely used by several science VOs on OSG to store and to distribute data to jobs. It can be used to create a data store from distributed data nodes or to serve data to systems using a distributed caching architecture. Either mode of operation requires you to install the XRootD client software. This page provides instructions for accessing data on XRootD data systems using a variety of methods. As a user you have three different ways to interact with XRootD: Using the XRootD clients Using a XRootDFS FUSE mount to access a local XRootD data store Using LD_PRELOAD to use XRootD libraries with Unix tools We'll show how to install the XRootD client software and use all three mechanisms to access data. Note Only the client tools method should be used to access XRootD systems across a WAN link. Before Starting \u00b6 As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Obtain root access to the host Prepare the required Yum repositories Install CA certificates If you are using the FUSE mount, you should also consider the following requirement: User IDs: If it does not exist already, you will need to create a xrootd user Using the XRootD client software \u00b6 Installing the XRootD Client \u00b6 If you are planning on interacting with XRootD using the XRootD client, then you'll need to install the XRootD client RPM. Installing the XRootD Client RPM \u00b6 The following steps will install the rpm on your system. Clean yum cache: root@client $ yum clean all --enablerepo = \\* Update software: root@client $ yum update This command will update all packages Install XRootD Client rpm: root@client $ yum install xrootd-client Using the XRootD Client \u00b6 Once the xrootd-client rpm is installed, you should be able to use the xrdcp command to copy files to and from XRootD systems and the local file system. 
For example: user@client $ echo \"This is a test\" >/tmp/test user@client $ xrdcp /tmp/test xroot://redirector.domain.org:1094//storage/path/test user@client $ xrdcp xroot://redirector.domain.org:1094//storage/path/test /tmp/test1 user@client $ diff /tmp/test1 /tmp/test For other operations, you'll need to use the xrdfs command. This command allows you to do file operations such as creating directories, removing directories, deleting files, and moving files on an XRootD system, provided you have the appropriate authorization. The xrdfs command can be used interactively by running xrdfs xroot://redirector.domain.org:1094/ . Alternatively, you can use it in batch mode by adding the file operation after the xroot URI. For example: user@client $ echo \"This is a test\" >/tmp/test user@client $ xrdfs xroot://redirector.domain.org:1094/ mkdir /storage/path/test user@client $ xrdcp /tmp/test xroot://redirector.domain.org:1094//storage/path/test/test1 user@client $ xrdfs xroot://redirector.domain.org:1094/ ls /storage/path/test/test1 user@client $ xrdfs xroot://redirector.domain.org:1094/ rm /storage/path/test/test1 user@client $ xrdfs xroot://redirector.domain.org:1094/ rmdir /storage/path/test Note To access remote XRootD resources, you may need to use a VOMS proxy in order to authenticate successfully. The XRootD client tools will automatically locate your proxy if you generate it using voms-proxy-init , otherwise you can set the X509_USER_PROXY environment variable to the location of the proxy XRootD should use. Validation \u00b6 Assuming that there is a file in your XRootD data store at /storage/path/test_file , you can do the following to validate your installation: user@client $ xrdcp xroot://redirector.yourdomain.org:1094//storage/path/test_file /tmp/test1 Using XRootDFS FUSE mount \u00b6 This section will explain how to install, set up, and interact with XRootD using a FUSE mount. This method of accessing XRootD only works when accessing a local XRootD system. Installing the XRootD FUSE RPM \u00b6 If you are planning on using a FUSE mount, you'll need to install the xrootd-fuse rpm by running the following commands: Clean yum cache: root@client $ yum clean all --enablerepo = \\* Update software: root@client $ yum update Install XRootD FUSE rpm: root@client $ yum install xrootd-fuse Configuring the FUSE Mount \u00b6 Once the appropriate rpms are installed, the FUSE setup will need further configuration. See this for instructions on updating your fstab file (a sample entry is sketched below). Using the XRootDFS FUSE Mount \u00b6 The directory mounted using XRootDFS can be used as any other directory mounted on your file system. All the normal Unix commands should work out of the box. Try using cp , rm , mv , mkdir , rmdir .
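A sample fstab entry for an XRootDFS mount might look like the following. This is an illustrative sketch only: the redirector host, exported path, and mount point are placeholders, and the exact mount options depend on your XRootDFS version, so treat the linked instructions as authoritative.
xrootdfs /mnt/xrootd fuse rdr=xroot://redirector.yourdomain.org:1094//storage/path,uid=xrootd 0 0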
Assuming your mount is /mnt/xrootd : user@client $ echo \"This is a new test\" >/tmp/test user@client $ mkdir -p /mnt/xrootd/subdir/sub2 user@client $ cp /tmp/test /mnt/xrootd/subdir/sub2/test user@client $ cp /mnt/xrootd/subdir/sub2/test /mnt/xrootd/subdir/sub2/test1 user@client $ cp /mnt/xrootd/subdir/sub2/test1 /tmp/test1 user@client $ diff /tmp/test1 /tmp/test user@client $ rm -r /mnt/xrootd/subdir Validation \u00b6 Assuming your mount is /mnt/xrootd and that there is a file called test_file in your XRootD data store: user@client $ cp /mnt/xrootd/test_file /tmp/test1 Using LD_PRELOAD to access XRootD \u00b6 Installing XRootD Libraries For LD_PRELOAD \u00b6 In order to use LD_PRELOAD to access XRootD, you'll need to install the XRootD client libraries. The following steps will install them on your system: Clean yum cache: root@client $ yum clean all --enablerepo = \\* Update software: root@client $ yum update This command will update all packages Install XRootD Client rpm: root@client $ yum install xrootd-client Using LD_PRELOAD method \u00b6 In order to use the LD_PRELOAD method to access an XRootD data store, you'll need to change your environment to use the XRootD libraries in conjunction with the standard Unix binaries. This is done by setting the LD_PRELOAD environment variable. Once this is done, the standard Unix commands like mkdir , rm , cp , etc. will work with xroot URIs. For example: user@client $ export LD_PRELOAD = /usr/lib64/libXrdPosixPreload.so user@client $ echo \"This is a new test\" >/tmp/test user@client $ mkdir xroot://redirector.yourdomain.org:1094//storage/path/subdir user@client $ cp /tmp/test xroot://redirector.yourdomain.org:1094//storage/path/subdir/test user@client $ cp xroot://redirector.yourdomain.org:1094//storage/path/subdir/test /tmp/test1 user@client $ diff /tmp/test1 /tmp/test user@client $ rm xroot://redirector.yourdomain.org:1094//storage/path/subdir/test user@client $ rmdir xroot://redirector.yourdomain.org:1094//storage/path/subdir Validation \u00b6 Assuming that there is a file called test_file in your XRootD data store, the following steps will validate your installation: user@client $ export LD_PRELOAD = /usr/lib64/libXrdPosixPreload.so user@client $ cp xroot://redirector.yourdomain.org:1094//storage/path/test_file /tmp/test1 How to get Help? \u00b6 If you cannot resolve the problem, please consult this page for assistance.","title":"Using XRootD"},{"location":"data/xrootd/install-client/#using-xrootd","text":"XRootD is a high performance data system widely used by several science VOs on OSG to store and to distribute data to jobs. It can be used to create a data store from distributed data nodes or to serve data to systems using a distributed caching architecture. Either mode of operation requires you to install the XRootD client software. This page provides instructions for accessing data on XRootD data systems using a variety of methods. As a user you have three different ways to interact with XRootD: Using the XRootD clients Using an XRootDFS FUSE mount to access a local XRootD data store Using LD_PRELOAD to use XRootD libraries with Unix tools We'll show how to install the XRootD client software and use all three mechanisms to access data.
Note Only the client tools method should be used to access XRootD systems across a WAN link.","title":"Using XRootD"},{"location":"data/xrootd/install-client/#before-starting","text":"As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Obtain root access to the host Prepare the required Yum repositories Install CA certificates If you are using the FUSE mount, you should also consider the following requirement: User IDs: If it does not exist already, you will need to create a xrootd user","title":"Before Starting"},{"location":"data/xrootd/install-client/#using-the-xrootd-client-software","text":"","title":"Using the XRootD client software"},{"location":"data/xrootd/install-client/#installing-the-xrootd-client","text":"If you are planning on interacting with XRootD using the XRootD client, then you'll need to install the XRootD client RPM.","title":"Installing the XRootD Client"},{"location":"data/xrootd/install-client/#installing-the-xrootd-client-rpm","text":"The following steps will install the rpm on your system. Clean yum cache: root@client $ yum clean all --enablerepo = \\* Update software: root@client $ yum update This command will update all packages Install XRootD Client rpm: root@client $ yum install xrootd-client","title":"Installing the XRootD Client RPM"},{"location":"data/xrootd/install-client/#using-the-xrootd-client","text":"Once the xrootd-client rpm is installed, you should be able to use the xrdcp command to copy files to and from XRootD systems and the local file system. For example: user@client $ echo \"This is a test\" >/tmp/test user@client $ xrdcp /tmp/test xroot://redirector.domain.org:1094//storage/path/test user@client $ xrdcp xroot://redirector.domain.org:1094//storage/path/test /tmp/test1 user@client $ diff /tmp/test1 /tmp/test For other operations, you'll need to use the xrdfs command. This command allows you to do file operations such as creating directories, removing directories, deleting files, and moving files on a XRootD system, provided you have the appropriate authorization. The xrdfs command can be used interactively by running xrdfs xroot://redirector.domain.org:1094/ . Alternatively, you can use it in batch mode by adding the xrdfs command after the xroot URI. For example: user@client $ echo \"This is a test\" >/tmp/test user@client $ xrdfs xroot://redirector.domain.org:1094/ mkdir /storage/path/test user@client $ xrdcp xroot://redirector.domain.org:1094//storage/path/test/test1 /tmp/test1 user@client $ xrdfs xroot://redirector.domain.org:1094/ ls /storage/path/test/test1 user@client $ xrdfs xroot://redirector.domain.org:1094/ rm /storage/path/test/test1 user@client $ xrdfs xroot://redirector.domain.org:1094/ rmdir /storage/path/test Note To access remote XRootD resources, you will may need to use a VOMS proxy in order to authenticate successfully. The XRootD client tools will automatically locate your proxy if you generate it using voms-proxy-init , otherwise you can set the X509_USER_PROXY environment variable to the location of the proxy XRootD should use.","title":"Using the XRootD Client"},{"location":"data/xrootd/install-client/#validation","text":"Assuming that there is a file called test_file in your XRootD data store, you can do the following to validate your installation. Here we assume that there is a file on your XRootD system at /storage/path/test_file . 
user@client $ xrdcp xroot://redirector.yourdomain.org:1094//storage/path/test_file /tmp/test1","title":"Validation"},{"location":"data/xrootd/install-client/#using-xrootdfs-fuse-mount","text":"This section will explain how to install, setup, and interact with XRootD using a FUSE mount. This method of accessing XRootD only works when accessing a local XRootD system.","title":"Using XRootDFS FUSE mount"},{"location":"data/xrootd/install-client/#installing-the-xrootd-fuse-rpm","text":"If you are planning on using a FUSE mount, you'll need to install the xrootd-fuse rpm by running the following commands: Clean yum cache: root@client $ yum clean all --enablerepo = \\* Update software: root@client $ yum update Install XRootD FUSE rpm: root@client $ yum install xrootd-fuse","title":"Installing the XRootD FUSE RPM"},{"location":"data/xrootd/install-client/#configuring-the-fuse-mount","text":"Once the appropriate rpms are installed, the FUSE setup will need further configuration. See this for instructions on updating your fstab file.","title":"Configuring the FUSE Mount"},{"location":"data/xrootd/install-client/#using-the-xrootdfs-fuse-mount","text":"The directory mounted using XRootDFS can be used as any other directory mounted on your file system. All the normal Unix commands should work out of the box. Try using cp , rm , mv , mkdir , rmdir . Assuming your mount is /mnt/xrootd : user@client $ echo \"This is a new test\" >/tmp/test user@client $ mkdir -p /mnt/xrootd/subdir/sub2 user@client $ cp /tmp/test /mnt/xrootd/subdir/sub2/test user@client $ cp /mnt/xrootd/subdir/sub2/test /mnt/xrootd/subdir/sub2/test1 user@client $ cp /mnt/xuserd/subdir/sub2/test1 /tmp/test1 user@client $ diff /tmp/test1 /tmp/test user@client $ rm -r /mnt/xrootd/subdir","title":"Using the XRootDFS FUSE Mount"},{"location":"data/xrootd/install-client/#validation_1","text":"Assuming your mount is /mnt/xrootd and that there is a file called test_file in your XRootD data store: user@client $ cp /mnt/xrootd/test_file /tmp/test1","title":"Validation"},{"location":"data/xrootd/install-client/#using-ld_preload-to-access-xrootd","text":"","title":"Using LD_PRELOAD to access XRootD"},{"location":"data/xrootd/install-client/#installing-xrootd-libraries-for-ld_preload","text":"In order to use LD_PRELOAD to access XRootD, you'll need to install the XRootD client libraries. The following steps will install them on your system: Clean yum cache: root@client $ yum clean all --enablerepo = \\* Update software: root@client $ yum update This command will update all packages Install XRootD Client rpm: root@client $ yum install xrootd-client","title":"Installing XRootD Libraries For LD_PRELOAD"},{"location":"data/xrootd/install-client/#using-ld_preload-method","text":"In order to use the LD_PRELOAD method to access a XRootD data store, you'll need to change your environment to use the XRootD libraries in conjunction with the standard Unix binaries. This is done by setting the LD_PRELOAD environment variable. Once this is done, the standard unix commands like mkdir , rm , cp , etc. will work with xroot URIs. 
For example: user@client $ export LD_PRELOAD = /usr/lib64/libXrdPosixPreload.so user@client $ echo \"This is a new test\" >/tmp/test user@client $ mkdir xroot://redirector.yourdomain.org:1094//storage/path/subdir user@client $ cp /tmp/test xroot://redirector.yourdomain.org:1094//storage/path/subdir/test user@client $ cp xuser://redirector.yourdomain.org:1094//storage/path/subdir/test /tmp/test1 user@client $ diff /tmp/test1 /tmp/test user@client $ rm xroot://redirector.yourdomain.org:1094//storage/path/subdir/test user@client $ rmdir xroot://redirector.yourdomain.org:1094//storage/path/subdir","title":"Using LD_PRELOAD method"},{"location":"data/xrootd/install-client/#validation_2","text":"Assuming that there is a file called test_file in your XRootD data store, the following steps will validate your installation: user@client $ export LD_PRELOAD = /usr/lib64/libXrdPosixPreload.so user@client $ cp xroot://redirector.yourdomain.org:1094//storage/path/test_file /tmp/test1","title":"Validation"},{"location":"data/xrootd/install-client/#how-to-get-help","text":"If you cannot resolve the problem, please consult this page for assistance..","title":"How to get Help?"},{"location":"data/xrootd/install-cms-xcache/","text":"Installing the CMS XCache \u00b6 This document describes how to install a CMS XCache. This service allows a site or regional network to cache data frequently used by the CMS experiment , reducing data transfer over the wide-area network and decreasing access latency. The are two types of installations described in this document: single or multinode cache. The difference might be based on the total disk that your cache needs. Before Starting \u00b6 Before starting the installation process, consider the following requirements: Operating system: A RHEL 7 or compatible operating systems. User IDs: If they do not exist already, the installation will create the Linux user IDs xrootd Host certificate: Required for client authentication and authentication with CMS VOMS Server See our documentation for instructions on how to request and install host certificates. Network ports: The cache service requires the following ports open: Inbound TCP port 1094 for file access via the XRootD protocol Outbound UDP port 9930 for reporting to xrd-report.osgstorage.org and xrd-mon.osgstorage.org for monitoring Hardware requirements: We recommend that a cache has at least 10Gbps connectivity, 100TB of disk space for the whole cache (can be divided among several caches), and 8GB of RAM. As with all OSG software installations, there are some one-time steps to prepare in advance: Obtain root access to the host Prepare the required Yum repositories Install CA certificates Installing the Cache \u00b6 The CMS XCache ROM software consists of an XRootD server with special configuration and supporting services. To simplify installation, OSG provides convenience RPMs that install all required packages with a single command: root@host # yum install cms-xcache Configuring the Cache \u00b6 First, you must create a \"cache directory\", which will be used to store downloaded files. By default this is /mnt/stash . We recommend using a separate file system for the cache directory, with at least 1 TB of storage available. Note The cache directory must be writable by the xrootd:xrootd user and group. The cms-xcache package provides default configuration files in /etc/xrootd/xrootd-cms-xcache.cfg and /etc/xrootd/config.d/ . 
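If the cache directory does not already exist, creating it might look like the following sketch (assuming the default /mnt/stash location and that the xrootd user and group already exist; adjust the path if you use a dedicated file system):
root@host # mkdir -p /mnt/stash
root@host # chown xrootd:xrootd /mnt/stash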
Administrators may provide additional configuration by placing files in /etc/xrootd/config.d/1*.cfg (for files that need to be processed BEFORE the OSG configuration) or /etc/xrootd/config.d/9*.cfg (for files that need to be processed AFTER the OSG configuration). You must configure every variable in /etc/xrootd/config.d/10-common-site-local.cfg . The mandatory variables to configure are: set rootdir = /mnt/stash : the mounted filesystem path to export. This document refers to this as /mnt/stash . set resourcename = YOUR_RESOURCE_NAME : the resource name registered with the OSG for example (\"T2_US_UCSD\") Note XRootD can manage a set of independent disk for the cache. So you can modify file 90-cms-xcache-disks.cfg and add the disks there then rootdir just becomes a place to hold symlinks. Ensure the xrootd service has a certificate \u00b6 The service will need a certificate for reporting and to authenticate to CMS AAA. The easiest solution for this is to use your host certificate and key as follows: Copy the host certificate to /etc/grid-security/xrd/xrd{cert,key}.pem Set the owner of the directory and contents /etc/grid-security/xrd/ to xrootd:xrootd : root@host # chown -R xrootd:xrootd /etc/grid-security/xrd/ Note You must repeat the above steps whenever you renew your host certificate. If you automate certificate renewal, you should automate copying as well. For example, if you are using Certbot for Let's Encrypt, you should write a \"deploy hook\" as documented here . Note You must also register this certificate with the CMS VOMS (https://voms2.cern.ch:8443/voms/cms/) Configuring Optional Features \u00b6 Adjust disk utilization \u00b6 To adjust the disk utilization of your cache, create or edit a file named /etc/xrootd/config.d/90-local.cfg and set the values of pfc.diskusage . pfc.diskusage 0.90 0.95 The two values correspond to the low and high usage water marks, respectively. When usage goes above the high water mark, the XRootD service will delete cached files until usage goes below the low water mark. Modify the storage access settings at a site \u00b6 In order for CMSSW jobs to use the cache at your site you need to modify the storage.xml and create the following rules # Portions of /store in xcache Note If you are installing a multinode cache then instead of yourlocalcache:1094 url should be changed for yourcacheredirector:2040 Enable remote debugging \u00b6 XRootD provides remote debugging via a read-only file system named digFS. This feature is disabled by default, but you may enable it if you need help troubleshooting your server. To enable remote debugging, edit /etc/xrootd/digauth.cfg and specify the authorizations for reading digFS. An example of authorizations: all allow gsi g=/glow h=*.cs.wisc.edu This gives access to the config file, log files, core files, and process information to anyone from *.cs.wisc.edu in the /glow VOMS group. See the XRootD manual for the full syntax. Remote debugging should only be enabled for as long as you need assistance. As soon as your issue has been resolved, revert any changes you have made to /etc/xrootd/digauth.cfg . Installing a Multinode Cache (optional) \u00b6 Some sites would like to have a single logical cache composed of several nodes as shown below: This can be achieved by following the next steps Install an XCache redirector \u00b6 This can be a simple lightweight virtual machine and will be the single point of contact from jobs to the caches. 
Install the redirector package: root@host # yum install xcache-redirector Create a file named /etc/xrootd/config.d/04-local-redir.cfg with contents: all.manager yourlocalredir:2041 You must configure every variable in /etc/xrootd/config.d/10-common-site-local.cfg . The mandatory variables to configure are: set rootdir = /mnt/stash : the mounted filesystem path to export. This document refers to this as /mnt/stash . set resourcename = YOUR_RESOURCE_NAME : the resource name registered with the OSG for example (\"T2_US_UCSD\") Start and enable the cmsd and xrootd processes: Software Service name Notes XRootD cmsd@xcache-redir.service The cmsd daemon that interacts with the different xrootd servers XRootD xrootd@xcache-redir.service The xrootd daemon which performs authenticated data transfers Configuring each of your cache nodes \u00b6 On each node where you installed a cache, create a config file /etc/xrootd/config.d/94-xrootd-manager.cfg with the following contents: all.manager yourlocalredir:2041 Start and enable the cmsd service: Software Service name Notes XRootD cmsd@cms-xcache.service The cmsd daemon that interacts with the different xrootd servers Managing CMS XCache and associated services \u00b6 These services must be managed by systemctl and may start additional services as dependencies. As a reminder, here are common service commands (all run as root ) for EL7: To... On EL7, run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable CMS XCache services \u00b6 Software Service name Notes XRootD xrootd@cms-xcache.service The XRootD daemon, which performs the data transfers XRootD (Optional) cmsd@cms-xcache.service The cmsd daemon that interacts with the different xrootd servers Fetch CRL EL8: fetch-crl.timer EL7: fetch-crl-boot and fetch-crl-cron Required to authenticate monitoring services. See CA documentation for more info xrootd-renew-proxy.service Renew a proxy for downloads to the cache xrootd-renew-proxy.timer Trigger daily proxy renewal XCache redirector services (Optional) \u00b6 On the node where the cache redirector is installed, these are the services: Software Service name Notes XRootD (Optional) xrootd@xcache-redir.service The xrootd daemon which performs authenticated data transfers XRootD (Optional) cmsd@xcache-redir.service The cmsd daemon that interacts with the different xrootd servers Validating the Cache \u00b6 The cache server functions as a normal CMS XRootD server, so first verify it with a personal CMS X.509 proxy: === VO cms extension information === VO : cms subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=efajardo/CN=722781/CN=Edgar Fajardo Hernandez issuer : /DC=ch/DC=cern/OU=computers/CN=lcg-voms2.cern.ch attribute : /cms/Role=NULL/Capability=NULL attribute : /cms/uscms/Role=NULL/Capability=NULL timeleft : 71:59:46 uri : lcg-voms2.cern.ch:15002 Then test using xrdcp directly against your cache: user@host $ xrdcp -vf -d 1 root://cache_host:1094//store/data/Run2017B/SingleElectron/MINIAOD/31Mar2018-v1/60000/9E0F8458-EA37-E811-93F1-008CFAC919F0.root /dev/null Getting Help \u00b6 To get assistance, please use this page .","title":"Install CMS XCache"},{"location":"data/xrootd/install-cms-xcache/#installing-the-cms-xcache","text":"This document describes how to install a CMS XCache.
This service allows a site or regional network to cache data frequently used by the CMS experiment , reducing data transfer over the wide-area network and decreasing access latency. The are two types of installations described in this document: single or multinode cache. The difference might be based on the total disk that your cache needs.","title":"Installing the CMS XCache"},{"location":"data/xrootd/install-cms-xcache/#before-starting","text":"Before starting the installation process, consider the following requirements: Operating system: A RHEL 7 or compatible operating systems. User IDs: If they do not exist already, the installation will create the Linux user IDs xrootd Host certificate: Required for client authentication and authentication with CMS VOMS Server See our documentation for instructions on how to request and install host certificates. Network ports: The cache service requires the following ports open: Inbound TCP port 1094 for file access via the XRootD protocol Outbound UDP port 9930 for reporting to xrd-report.osgstorage.org and xrd-mon.osgstorage.org for monitoring Hardware requirements: We recommend that a cache has at least 10Gbps connectivity, 100TB of disk space for the whole cache (can be divided among several caches), and 8GB of RAM. As with all OSG software installations, there are some one-time steps to prepare in advance: Obtain root access to the host Prepare the required Yum repositories Install CA certificates","title":"Before Starting"},{"location":"data/xrootd/install-cms-xcache/#installing-the-cache","text":"The CMS XCache ROM software consists of an XRootD server with special configuration and supporting services. To simplify installation, OSG provides convenience RPMs that install all required packages with a single command: root@host # yum install cms-xcache","title":"Installing the Cache"},{"location":"data/xrootd/install-cms-xcache/#configuring-the-cache","text":"First, you must create a \"cache directory\", which will be used to store downloaded files. By default this is /mnt/stash . We recommend using a separate file system for the cache directory, with at least 1 TB of storage available. Note The cache directory must be writable by the xrootd:xrootd user and group. The cms-xcache package provides default configuration files in /etc/xrootd/xrootd-cms-xcache.cfg and /etc/xrootd/config.d/ . Administrators may provide additional configuration by placing files in /etc/xrootd/config.d/1*.cfg (for files that need to be processed BEFORE the OSG configuration) or /etc/xrootd/config.d/9*.cfg (for files that need to be processed AFTER the OSG configuration). You must configure every variable in /etc/xrootd/config.d/10-common-site-local.cfg . The mandatory variables to configure are: set rootdir = /mnt/stash : the mounted filesystem path to export. This document refers to this as /mnt/stash . set resourcename = YOUR_RESOURCE_NAME : the resource name registered with the OSG for example (\"T2_US_UCSD\") Note XRootD can manage a set of independent disk for the cache. So you can modify file 90-cms-xcache-disks.cfg and add the disks there then rootdir just becomes a place to hold symlinks.","title":"Configuring the Cache"},{"location":"data/xrootd/install-cms-xcache/#ensure-the-xrootd-service-has-a-certificate","text":"The service will need a certificate for reporting and to authenticate to CMS AAA. 
The easiest solution for this is to use your host certificate and key as follows: Copy the host certificate to /etc/grid-security/xrd/xrd{cert,key}.pem Set the owner of the directory and contents /etc/grid-security/xrd/ to xrootd:xrootd : root@host # chown -R xrootd:xrootd /etc/grid-security/xrd/ Note You must repeat the above steps whenever you renew your host certificate. If you automate certificate renewal, you should automate copying as well. For example, if you are using Certbot for Let's Encrypt, you should write a \"deploy hook\" as documented here . Note You must also register this certificate with the CMS VOMS (https://voms2.cern.ch:8443/voms/cms/)","title":"Ensure the xrootd service has a certificate"},{"location":"data/xrootd/install-cms-xcache/#configuring-optional-features","text":"","title":"Configuring Optional Features"},{"location":"data/xrootd/install-cms-xcache/#adjust-disk-utilization","text":"To adjust the disk utilization of your cache, create or edit a file named /etc/xrootd/config.d/90-local.cfg and set the values of pfc.diskusage . pfc.diskusage 0.90 0.95 The two values correspond to the low and high usage water marks, respectively. When usage goes above the high water mark, the XRootD service will delete cached files until usage goes below the low water mark.","title":"Adjust disk utilization"},{"location":"data/xrootd/install-cms-xcache/#modify-the-storage-access-settings-at-a-site","text":"In order for CMSSW jobs to use the cache at your site you need to modify the storage.xml and create the following rules # Portions of /store in xcache Note If you are installing a multinode cache then instead of yourlocalcache:1094 url should be changed for yourcacheredirector:2040","title":"Modify the storage access settings at a site"},{"location":"data/xrootd/install-cms-xcache/#enable-remote-debugging","text":"XRootD provides remote debugging via a read-only file system named digFS. This feature is disabled by default, but you may enable it if you need help troubleshooting your server. To enable remote debugging, edit /etc/xrootd/digauth.cfg and specify the authorizations for reading digFS. An example of authorizations: all allow gsi g=/glow h=*.cs.wisc.edu This gives access to the config file, log files, core files, and process information to anyone from *.cs.wisc.edu in the /glow VOMS group. See the XRootD manual for the full syntax. Remote debugging should only be enabled for as long as you need assistance. As soon as your issue has been resolved, revert any changes you have made to /etc/xrootd/digauth.cfg .","title":"Enable remote debugging"},{"location":"data/xrootd/install-cms-xcache/#installing-a-multinode-cache-optional","text":"Some sites would like to have a single logical cache composed of several nodes as shown below: This can be achieved by following the next steps","title":"Installing a Multinode Cache (optional)"},{"location":"data/xrootd/install-cms-xcache/#install-an-xcache-redirector","text":"This can be a simple lightweight virtual machine and will be the single point of contact from jobs to the caches. Install the redirector package root@host # yum install xcache-redirector Create file named /etc/xrootd/config.d/04-local-redir.cfg with contents: all.manager yourlocalredir:2041 You must configure every variable in /etc/xrootd/config.d/10-common-site-local.cfg . The mandatory variables to configure are: set rootdir = /mnt/stash : the mounted filesystem path to export. This document refers to this as /mnt/stash . 
set resourcename = YOUR_RESOURCE_NAME : the resource name registered with the OSG for example (\"T2_US_UCSD\") Start and enable the cmsd and xrootd proccess: Software Service name Notes XRootD cmsd@xcache-redir.service The cmsd daemon that interact with the different xrootd servers XRootD xrootd@xcache-redir.service The xrootd daemon which performs authenticated data transfers","title":"Install an XCache redirector"},{"location":"data/xrootd/install-cms-xcache/#configuring-each-of-your-cache-nodes","text":"Create a config file in the nodes where you installed your caches /etc/xrootd/config.d/94-xrootd-manager.cfg with the following contents: all.manager yourlocalredir:2041 Start and enable the cmsd service: Software Service name Notes XRootD cmsd@cms-xcache.service The xrootd daemon which performs authenticated data transfers","title":"Configuring each of your cache nodes"},{"location":"data/xrootd/install-cms-xcache/#managing-cms-xcache-and-associated-services","text":"These services must be managed by systemctl and may start additional services as dependencies. As a reminder, here are common service commands (all run as root ) for EL7: To... On EL7, run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable ","title":"Managing CMS XCache and associated services"},{"location":"data/xrootd/install-cms-xcache/#cms-xcache-services","text":"Software Service name Notes XRootD xrootd@cms-xcache.service The XRootD daemon, which performs the data transfers XRootD (Optional) cmsd@cms-xcache.service The cmsd daemon that interact with the different xrootd servers Fetch CRL EL8: fetch-crl.timer EL7: fetch-crl-boot and fetch-crl-cron Required to authenticate monitoring services. 
See CA documentation for more info xrootd-renew-proxy.service Renew a proxy for downloads to the cache xrootd-renew-proxy.timer Trigger daily proxy renewal","title":"CMS XCache services"},{"location":"data/xrootd/install-cms-xcache/#xcache-redirector-services-optional","text":"In the node where the cache redirector is installed these are the list of services: Software Service name Notes XRootD (Optional) xrootd@xcache-redir.service The xrootd daemon which performs authenticated data transfers XRootD (Optional) cmsd@xcache-redir.service The xrootd daemon which performs authenticated data transfers","title":"XCache redirector services (Optional)"},{"location":"data/xrootd/install-cms-xcache/#validating-the-cache","text":"The cache server functions as a normal CMS XRootD server so first verify it with a personal CMS X.509 proxy: === VO cms extension information === VO : cms subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=efajardo/CN=722781/CN=Edgar Fajardo Hernandez issuer : /DC=ch/DC=cern/OU=computers/CN=lcg-voms2.cern.ch attribute : /cms/Role=NULL/Capability=NULL attribute : /cms/uscms/Role=NULL/Capability=NULL timeleft : 71:59:46 uri : lcg-voms2.cern.ch:15002 Then test using xrdcp directly in your cache: user@host $ xrdcp -vf -d 1 root://cache_host:1094//store/data/Run2017B/SingleElectron/MINIAOD/31Mar2018-v1/60000/9E0F8458-EA37-E811-93F1-008CFAC919F0.root /dev/null","title":"Validating the Cache"},{"location":"data/xrootd/install-cms-xcache/#getting-help","text":"To get assistance, please use the this page .","title":"Getting Help"},{"location":"data/xrootd/install-shoveler/","text":"Installing the XRootD Monitoring Shoveler \u00b6 The XRootD Monitoring Shoveler is designed to accept the XRootD monitoring packets and \"shovel\" them to the OSG message bus. Shoveling is the act of moving messages from one medium to another. In this case, the shoveler is moving messages from a UDP stream to a message bus. graph LR subgraph Site subgraph Node 1 node1[XRootD] -- UDP --> shoveler1{Shoveler}; end subgraph Node 2 node2[XRootD] -- UDP --> shoveler1{Shoveler}; end end; subgraph OSG Operations shoveler1 -- TCP/TLS --> C[Message Bus]; C -- Raw --> D[XRootD Collector]; D -- Summary --> C; C -- Summary --> E[(Storage)]; style shoveler1 font-weight:bolder,stroke-width:4px,stroke:#E74C3C,font-size:4em,color:#E74C3C end; Installing the Shoveler \u00b6 The shoveler can be installed via RPM, container, or staticly compiled binary. Requirements for running the Shoveler \u00b6 An open port (configurable) that can receive UDP packets from the XRootD servers on the shoveler server. It does not need to be an open port to the internet, only open to the XRootD servers. Outgoing TCP connectivity on the shoveler host. A directory on the shoveler host to store the on-disk queue. Resource Requirements \u00b6 RAM : Production shovelers use less than 50MB of memory. Disk : If the shoveler is disconnected from the message bus, it will store the messages on disk until reconnected. Through testing, a disconnected shoveler with 12 busy XRootD servers will generate <30 MB of data a day on disk. CPU : A production shoveler will use 1-2% of a CPU, depending on how many XRootD servers are reporting to the shoveler. A shoveler with 12 busy XRootD servers reporting to it uses 1-2% of a CPU. Network : A production shoveler will receive UDP messages from XRootD servers and send them to a message bus. The incoming and outgoing network utilization will be the same. 
In testing, a shoveler will use <30MB of data a day on the network. Configuring the Shoveler \u00b6 Configuration can be specified with environment variables or a configuration file. The configuration file is in yaml . An example configuration file is distributed with the shoveler. In the RPM, the configuration file is located in /etc/xrootd-monitoring-shoveler/config.yaml . Below, we will break the configuration file into fragments, but together they make a whole configuration file. Environment variables can be derived from the yaml. Every environment variable starts with SHOVELER_ , then continues with the structure of the configuration file. For example, the amqp url can be configured with the environment variable SHOVELER_AMQP_URL . The verify option can be configured with SHOVELER_VERIFY . Configuration Fragments \u00b6 AMQP Configuration \u00b6 AMQP configuration. For the OSG, the url should be amqps://clever-turkey.rmq.cloudamqp.com/xrd-mon . The exchange shown is correct for the OSG. token_location is the path to the authentication token. # AMQP configuration amqp : url : amqps://username:password@example.com/vhost exchange : shoveled-xrd topic : token_location : /etc/xrootd-monitoring-shoveler/token Listening to UDP messages \u00b6 Where to listen for UDP messages from XRootD servers. listen : port : 9993 ip : 0.0.0.0 Verify packet header \u00b6 Whether to verify that the header of the packet matches XRootD's monitoring packet format. verify : true Prometheus monitoring data \u00b6 Listening location of Prometheus metrics to view the performance and status of the shoveler in Prometheus format. # Export prometheus metrics metrics : enable : true port : 8000 Queue Configuration \u00b6 Directory to store the overflow of the queue onto disk. The queue keeps 100 messages in memory. If the shoveler is disconnected from the message bus, it will store messages beyond the 100 kept in memory onto disk in this directory. Once the connection has been re-established, the queue will be emptied. The queue on disk is persistent between restarts. queue_directory : /tmp/shoveler-queue IP Mapping Configuration \u00b6 Mapping configuration (optional). If map.all is set, all messages will be mapped to the configured IP address. For example, with the configuration below, if a packet comes in with the private IP address of 192.168.0.4, the packet origin will be changed to 172.0.0.4. The port is always preserved. # map: # all: 172.0.0.4 If you want multiple mappings, you can specify multiple map entries. # map: # 192.168.0.5: 172.0.0.5 # 192.168.0.6: 129.93.10.7 Configuring Security \u00b6 A token is used to authenticate and authorize the shoveler with the message bus. The token is generated by the shoveler's lightweight issuer. The sequence for getting a token for the shoveler is shown below. sequenceDiagram User->>oidc-agent: Authenticate oidc-agent->>Issuer: Register agent Issuer->>oidc-agent: User Code oidc-agent->>User: User Code and URL User->>Issuer: Authenticate at URL oidc-agent->>Issuer: Get Token Get your unique CILogon User Identifier from CILogon . It is under User Attributes, and follows the pattern http://cilogon.org/serverA/users/12345. Open a ticket at help@osg-htc.org with your CILogon User Identifier to authorize your login with the renewer. Install the OSG Token Renewal Service When installing, the issuer is https://lw-issuer.osgdev.chtc.io/scitokens-server/ When asked about scopes, accept the default. Follow through the authentication flow.
In the configuration for the issuer, /etc/osg/token-renewer/config.ini , the token location must match the location of the token in the Shoveler configuration.","title":"Install XRootD Shoveler"},{"location":"data/xrootd/install-shoveler/#installing-the-xrootd-monitoring-shoveler","text":"The XRootD Monitoring Shoveler is designed to accept the XRootD monitoring packets and \"shovel\" them to the OSG message bus. Shoveling is the act of moving messages from one medium to another. In this case, the shoveler is moving messages from a UDP stream to a message bus. graph LR subgraph Site subgraph Node 1 node1[XRootD] -- UDP --> shoveler1{Shoveler}; end subgraph Node 2 node2[XRootD] -- UDP --> shoveler1{Shoveler}; end end; subgraph OSG Operations shoveler1 -- TCP/TLS --> C[Message Bus]; C -- Raw --> D[XRootD Collector]; D -- Summary --> C; C -- Summary --> E[(Storage)]; style shoveler1 font-weight:bolder,stroke-width:4px,stroke:#E74C3C,font-size:4em,color:#E74C3C end;","title":"Installing the XRootD Monitoring Shoveler"},{"location":"data/xrootd/install-shoveler/#installing-the-shoveler","text":"The shoveler can be installed via RPM, container, or staticly compiled binary.","title":"Installing the Shoveler"},{"location":"data/xrootd/install-shoveler/#requirements-for-running-the-shoveler","text":"An open port (configurable) that can receive UDP packets from the XRootD servers on the shoveler server. It does not need to be an open port to the internet, only open to the XRootD servers. Outgoing TCP connectivity on the shoveler host. A directory on the shoveler host to store the on-disk queue.","title":"Requirements for running the Shoveler"},{"location":"data/xrootd/install-shoveler/#resource-requirements","text":"RAM : Production shovelers use less than 50MB of memory. Disk : If the shoveler is disconnected from the message bus, it will store the messages on disk until reconnected. Through testing, a disconnected shoveler with 12 busy XRootD servers will generate <30 MB of data a day on disk. CPU : A production shoveler will use 1-2% of a CPU, depending on how many XRootD servers are reporting to the shoveler. A shoveler with 12 busy XRootD servers reporting to it uses 1-2% of a CPU. Network : A production shoveler will receive UDP messages from XRootD servers and send them to a message bus. The incoming and outgoing network utilization will be the same. In testing, a shoveler will use <30MB of data a day on the network.","title":"Resource Requirements"},{"location":"data/xrootd/install-shoveler/#configuring-the-shoveler","text":"Configuration can be specified with environment variables or a configuration file. The configuration file is in yaml . An example configuration file is distributed with the shoveler. In the RPM, the configuration file is located in /etc/xrootd-monitoring-shoveler/config.yaml . Below, we will break the configuration file into fragments but together they make a whole configuration file. Environment variables can be derived from the yaml. Every environment variable starts with SHOVELER_ , then continues with the structure of the configuration file. For example, the amqp url can be configured with the environment variable SHOVELER_AMQP_URL . The verify option can be configured with SHOVELER_VERIFY .","title":"Configuring the Shoveler"},{"location":"data/xrootd/install-shoveler/#configuration-fragments","text":"","title":"Configuration Fragments"},{"location":"data/xrootd/install-shoveler/#amqp-configuration","text":"AMQP configuration. 
For the OSG, the url should be amqps://clever-turkey.rmq.cloudamqp.com/xrd-mon . The exchange should is correct for the OSG. token_location is the path to the authentication token. # AMQP configuration amqp : url : amqps://username:password@example.com/vhost exchange : shoveled-xrd topic : token_location : /etc/xrootd-monitoring-shoveler/token","title":"AMQP Configuration"},{"location":"data/xrootd/install-shoveler/#listening-to-udp-messages","text":"Where to listen for UDP messages from XRootD servers. listen : port : 9993 ip : 0.0.0.0","title":"Listening to UDP messages"},{"location":"data/xrootd/install-shoveler/#verify-packet-header","text":"Whether to verify the header of the packet matches XRootD's monitoring packet format. verify : true","title":"Verify packet header"},{"location":"data/xrootd/install-shoveler/#prometheus-monitoring-data","text":"Listening location of Prometheus metrics to view the performance and status of the shoveler in Prometheus format. # Export prometheus metrics metrics : enable : true port : 8000","title":"Prometheus monitoring data"},{"location":"data/xrootd/install-shoveler/#queue-configuration","text":"Directory to store overflow of queue onto disk. The queue keeps 100 messages in memory. If the shoveler is disconnected from the message bus, it will store messages over the 100 in memory onto disk into this directory. Once the connection has been re-established the queue will be emptied. The queue on disk is persistent between restarts. queue_directory : /tmp/shoveler-queue","title":"Queue Configuration"},{"location":"data/xrootd/install-shoveler/#ip-mapping-configuration","text":"Mapping configuration (optional). If map.all is set, all messages will be mapped to the configured IP address. For example, with the above configuration, if a packet comes in with the private IP address of 192.168.0.4, the packet origin will be changed to 172.0.0.4. The port is always preserved. # map: # all: 172.0.0.4 If you want multiple mappings, you can specify multiple map entries. # map: # 192.168.0.5: 172.0.0.5 # 192.168.0.6: 129.93.10.7","title":"IP Mapping Configuration"},{"location":"data/xrootd/install-shoveler/#configuring-security","text":"A token is used to authenticate and authorize the shoveler with the message bus. The token is generated by the shoveler's lightweight issuer. Sequence of getting a token for the shoveler is shown below. sequenceDiagram User->>oidc-agent: Authenticate oidc-agent->>Issuer: Register agent Issuer->>oidc-agent: User Code oidc-agent->>User: User Code and URL User->>Issuer: Authenticate at URL oidc-agent->>Issuer: Get Token Get your unique CILogon User Identifier from CILogon . It is under User Attributes, and follows the pattern http://cilogon.org/serverA/users/12345. Open a ticket at help@osg-htc.org with your CILogon User Identifier to authorize your login with the renewer. Install the OSG Token Renewal Service When installing, the issuer is https://lw-issuer.osgdev.chtc.io/scitokens-server/ When asked about scopes, accept the default. Follow through authentication the flow. In the configuration for the issuer, /etc/osg/token-renewer/config.ini , the token location must match the location of the token in the Shoveler configuration.","title":"Configuring Security"},{"location":"data/xrootd/install-standalone/","text":"Install XRootD Standalone \u00b6 XRootD is a hierarchical storage system that can be used in many ways to access data, typically distributed among actual storage resources. 
In its standalone configuration, XRootD acts as a simple layer exporting data from a storage system to the outside world. This document focuses on installing a default configuration of XRootD standalone that provides the following features: Supports any POSIX-based storage system Macaroons, X.509 proxy, and VOMS proxy authentication Third-Party Copy over HTTP (HTTP-TPC) Before Starting \u00b6 Before starting the installation process, consider the following points: User IDs: If it does not exist already, the installation will create the Linux user ID xrootd Service certificate: The XRootD service uses a host certificate and key pair at /etc/grid-security/xrd/xrdcert.pem and /etc/grid-security/xrd/xrdkey.pem that must be owned by the xrootd user Networking: The XRootD service uses port 1094 by default As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Obtain root access to the host Prepare the required Yum repositories Install CA certificates Installing XRootD \u00b6 Requirements for XRootD-Multiuser with VOMS FQANs Using XRootD-Multiuser with a VOMS FQAN requires mapping the FQAN to a username, which requires a voms-mapfile . Support is available in xrootd-voms 5.4.2-1.1 , in the OSG 3.6 repos, though it is expected in XRootD 5.5.0. If you want to use multiuser, ensure you are getting xrootd-voms from the OSG repos. To install an XRootD Standalone server, run the following command: root@xrootd-standalone # yum install osg-xrootd-standalone Configuring XRootD \u00b6 To configure XRootD as a standalone server, you will modify /etc/xrootd/xrootd-standalone.cfg and the config files under /etc/xrootd/config.d/ as follows: Configure a rootdir in /etc/xrootd/config.d/10-common-site-local.cfg , to point to the top of the directory hierarchy which you wish to serve via XRootD. set rootdir = Carefully consider your rootdir Do not set rootdir to / . This might result in serving private information. If you want to limit the sub-directories to serve under your configured rootdir , comment out the all.export / directive in /etc/xrootd/config.d/90-osg-standalone-paths.cfg , and add an all.export directive for each directory under rootdir that you wish to serve via XRootD. This is useful if you have a mixture of files under your rootdir , for example from multiple users, but only want to expose a subset of them to the world. For example, to serve the contents of /data/store and /data/public (with rootdir configured to /data ): all.export /store/ all.export /public/ If you want to serve everything under your configured rootdir , you don't have to change anything. Danger The directories specified this way are writable by default. Access controls should be managed via authorization configuration . In /etc/xrootd/config.d/10-common-site-local.cfg , add a line to set the resourcename variable. Unless your supported VOs' policies state otherwise, this should match the resource name of your XRootD service. For example, the XRootD service registered at the University of Florida site should set the following configuration: set resourcename = UFlorida-XRD Configuring authentication and authorization \u00b6 XRootD offers several authentication options using security plugins to validate incoming credentials, such as bearer tokens, X.509 proxies, and VOMS proxies. 
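As an illustrative sketch only (the DN and local username below are placeholders, and your authorization policy will differ), mapping an X.509 proxy user and granting read-only access might combine a grid-mapfile entry with an authorization database line:
/etc/grid-security/grid-mapfile: "/DC=org/DC=opensciencegrid/O=Open Science Grid/OU=People/CN=Example User" exampleuser
/etc/xrootd/auth_file: u exampleuser /data lr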
Please follow the XRootD authorization documentation for instructions on how to configure authentication and authorization, including validating credentials and mapping them to users if desired. Optional configuration \u00b6 The following configuration steps are optional and will likely not be required for setting up a small site. If you do not need any of the following special configurations, skip to the section on using XRootD . Enabling multi-user support \u00b6 Requirements for XRootD-Multiuser with VOMS FQANs Using XRootD-Multiuser with a VOMS FQAN requires mapping the FQAN to a username, which requires a voms-mapfile . Support is available in xrootd-voms 5.4.2-1.1 , in the OSG 3.6 repos, though it is expected in XRootD 5.5.0. If you want to use multiuser, ensure you are getting xrootd-voms from the OSG repos. The xrootd-multiuser plugin allows XRootD to write files on the storage system as the authenticated user instead of the xrootd user. If your XRootD service only allows read-only access, you should skip installation of this plugin. To set up XRootD in multi-user mode, install the xrootd-multiuser package: root@xrootd-standalone # yum install xrootd-multiuser Throttling IO requests \u00b6 XRootD allows throttling of requests to the underlying filesystem. To enable this, In an /etc/xrootd/config.d/*.cfg file, e.g. /etc/xrootd/config.d/99-local.cfg , set the following configuration: xrootd.fslib throttle default throttle.throttle concurrency data Replacing with the IO concurrency limit, measured in seconds (e.g., 100 connections taking 1ms each, would be 0.1), and with the data rate limit in bytes per second. Note that you may also just specify either the concurrency limit: xrootd.fslib throttle default throttle.throttle concurrency Or the data rate limit: xrootd.fslib throttle default throttle.throttle data If XRootD is already running, restart the relevant XRootD service for your configuration to take effect. For more details of the throttling implementation, see the upstream documentation . Enabling CMS TFC support (CMS sites only) \u00b6 For CMS sites, there is a package available to integrate rule-based name lookup using a storage.xml file. If you are not setting up a service for CMS, skip this section. To install an xrootd-cmstfc on OSG 3.6, run the following command: root@xrootd-standalone # yum install --enablerepo = osg-contrib xrootd-cmstfc You will need to add your storage.xml to /etc/xrootd/storage.xml and then add the following line to your XRootD configuration: # Integrate with CMS TFC, placed in /etc/xrootd/storage.xml oss.namelib /usr/lib64/libXrdCmsTfc.so file:/etc/xrootd/storage.xml?protocol=hadoop Add the orange text only if you are running hadoop (see below). See the CMS TWiki for more information: https://twiki.cern.ch/twiki/bin/view/Main/XrootdTfcChanges https://twiki.cern.ch/twiki/bin/view/Main/HdfsXrootdInstall Using XRootD \u00b6 In addition to the XRootD service itself, there are a number of supporting services in your installation. The specific services are: Software Service Name Notes Fetch CRL EL8: fetch-crl.timer EL7: fetch-crl-boot and fetch-crl-cron See CA documentation for more info XRootD xrootd@standalone Primary xrootd service if not running in multi-user mode XRootD Multi-user xrootd-privileged@standalone Primary xrootd service to start instead of xrootd@standalone if running in multi-user mode Start the services in the order listed and stop them in reverse order. 
As a reminder, here are common service commands (all run as root ): To \u2026 Run the command\u2026 Start a service systemctl start SERVICE-NAME Stop a service systemctl stop SERVICE-NAME Enable a service to start during boot systemctl enable SERVICE-NAME Disable a service from starting during boot systemctl disable SERVICE-NAME Validating XRootD \u00b6 To validate an XRootD installation, perform the following verification steps: Note If you have configured authentication/authorization for XRootD, be sure you have given yourself the necessary permissions to run these tests. For example, if you are using an X.509 proxy, make sure your DN is mapped to a user in /etc/grid-security/grid-mapfile , make sure you have a valid proxy on your local machine, and ensure that the Authfile on the XRootD server gives write access to the mapped user from /etc/grid-security/grid-mapfile . Verify authorization of bearer tokens and/or proxies Verify HTTP-TPC using the same GFAL2 client tools: Requires gfal2 >= 2.20.0 gfal2-2.20.0 contains a fix for a bug affecting XRootD HTTP-TPC support. Copy a file from your XRootD standalone host to another host and path where you have write access: root@xrootd-standalone # gfal-copy davs://localhost:1094/ \\ / Replacing with the path to a file that you can read on your host relative to rootdir ; with the protocol, FQDN, and port of the remote storage host; and to a location on the remote storage host where you have write access. Copy a file from a remote host where you have read access to your XRootD standalone installation: root@xrootd-standalone # gfal-copy / \\ davs://localhost:1094/ Replacing with the protocol, FQDN, and port of the remote storage host; with the path to a file that you can read on the remote storage host; and to a location on the XRootD standalone host relative to rootdir where you have write access. Registering an XRootD Standalone Server \u00b6 To register your XRootD server, follow the general registration instructions here with the following XRootD-specific details: Add an XRootD component: section to the Services: list, with any relevant fields for that service. This is a partial example: ... FQDN: Services: XRootD component: Description: Standalone XRootD server ... Replacing with your XRootD server's DNS entry. If you are setting up a new resource, set Active: false . Only set Active: true for a resource when it is accepting requests and ready for production. Getting Help \u00b6 To get assistance. please use the Help Procedure page. Reference \u00b6 XRootD documentation Export directive in the XRootD configuration and relevant options Service Configuration \u00b6 The configuration that your XRootD service uses is determined by the service name given to systemctl . To use the standalone config, you would start XRootD with the following command: root@host # systemctl start xrootd@standalone File locations \u00b6 Service/Process Configuration File Description xrootd /etc/xrootd/xrootd-standalone.cfg Main XRootD configuration /etc/xrootd/config.d/ Drop-in configuration dir /etc/xrootd/auth_file Authorized users file Service/Process Log File Description xrootd /var/log/xrootd/server/xrootd.log XRootD server daemon log cmsd /var/log/xrootd/server/cmsd.log Cluster management log","title":"Install XRootD Standalone"},{"location":"data/xrootd/install-standalone/#install-xrootd-standalone","text":"XRootD is a hierarchical storage system that can be used in many ways to access data, typically distributed among actual storage resources. 
In its standalone configuration, XRootD acts as a simple layer exporting data from a storage system to the outside world. This document focuses on installing a default configuration of XRootD standalone that provides the following features: Supports any POSIX-based storage system Macaroons, X.509 proxy, and VOMS proxy authentication Third-Party Copy over HTTP (HTTP-TPC)","title":"Install XRootD Standalone"},{"location":"data/xrootd/install-standalone/#before-starting","text":"Before starting the installation process, consider the following points: User IDs: If it does not exist already, the installation will create the Linux user ID xrootd Service certificate: The XRootD service uses a host certificate and key pair at /etc/grid-security/xrd/xrdcert.pem and /etc/grid-security/xrd/xrdkey.pem that must be owned by the xrootd user Networking: The XRootD service uses port 1094 by default As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Obtain root access to the host Prepare the required Yum repositories Install CA certificates","title":"Before Starting"},{"location":"data/xrootd/install-standalone/#installing-xrootd","text":"Requirements for XRootD-Multiuser with VOMS FQANs Using XRootD-Multiuser with a VOMS FQAN requires mapping the FQAN to a username, which requires a voms-mapfile . Support is available in xrootd-voms 5.4.2-1.1 , in the OSG 3.6 repos, though it is expected in XRootD 5.5.0. If you want to use multiuser, ensure you are getting xrootd-voms from the OSG repos. To install an XRootD Standalone server, run the following command: root@xrootd-standalone # yum install osg-xrootd-standalone","title":"Installing XRootD"},{"location":"data/xrootd/install-standalone/#configuring-xrootd","text":"To configure XRootD as a standalone server, you will modify /etc/xrootd/xrootd-standalone.cfg and the config files under /etc/xrootd/config.d/ as follows: Configure a rootdir in /etc/xrootd/config.d/10-common-site-local.cfg , to point to the top of the directory hierarchy which you wish to serve via XRootD. set rootdir = Carefully consider your rootdir Do not set rootdir to / . This might result in serving private information. If you want to limit the sub-directories to serve under your configured rootdir , comment out the all.export / directive in /etc/xrootd/config.d/90-osg-standalone-paths.cfg , and add an all.export directive for each directory under rootdir that you wish to serve via XRootD. This is useful if you have a mixture of files under your rootdir , for example from multiple users, but only want to expose a subset of them to the world. For example, to serve the contents of /data/store and /data/public (with rootdir configured to /data ): all.export /store/ all.export /public/ If you want to serve everything under your configured rootdir , you don't have to change anything. Danger The directories specified this way are writable by default. Access controls should be managed via authorization configuration . In /etc/xrootd/config.d/10-common-site-local.cfg , add a line to set the resourcename variable. Unless your supported VOs' policies state otherwise, this should match the resource name of your XRootD service. 
For example, the XRootD service registered at the University of Florida site should set the following configuration: set resourcename = UFlorida-XRD","title":"Configuring XRootD"},{"location":"data/xrootd/install-standalone/#configuring-authentication-and-authorization","text":"XRootD offers several authentication options using security plugins to validate incoming credentials, such as bearer tokens, X.509 proxies, and VOMS proxies. Please follow the XRootD authorization documentation for instructions on how to configure authentication and authorization, including validating credentials and mapping them to users if desired.","title":"Configuring authentication and authorization"},{"location":"data/xrootd/install-standalone/#optional-configuration","text":"The following configuration steps are optional and will likely not be required for setting up a small site. If you do not need any of the following special configurations, skip to the section on using XRootD .","title":"Optional configuration"},{"location":"data/xrootd/install-standalone/#enabling-multi-user-support","text":"Requirements for XRootD-Multiuser with VOMS FQANs Using XRootD-Multiuser with a VOMS FQAN requires mapping the FQAN to a username, which requires a voms-mapfile . Support is available in xrootd-voms 5.4.2-1.1 , in the OSG 3.6 repos, though it is expected in XRootD 5.5.0. If you want to use multiuser, ensure you are getting xrootd-voms from the OSG repos. The xrootd-multiuser plugin allows XRootD to write files on the storage system as the authenticated user instead of the xrootd user. If your XRootD service only allows read-only access, you should skip installation of this plugin. To set up XRootD in multi-user mode, install the xrootd-multiuser package: root@xrootd-standalone # yum install xrootd-multiuser","title":"Enabling multi-user support"},{"location":"data/xrootd/install-standalone/#throttling-io-requests","text":"XRootD allows throttling of requests to the underlying filesystem. To enable this, In an /etc/xrootd/config.d/*.cfg file, e.g. /etc/xrootd/config.d/99-local.cfg , set the following configuration: xrootd.fslib throttle default throttle.throttle concurrency data Replacing with the IO concurrency limit, measured in seconds (e.g., 100 connections taking 1ms each, would be 0.1), and with the data rate limit in bytes per second. Note that you may also just specify either the concurrency limit: xrootd.fslib throttle default throttle.throttle concurrency Or the data rate limit: xrootd.fslib throttle default throttle.throttle data If XRootD is already running, restart the relevant XRootD service for your configuration to take effect. For more details of the throttling implementation, see the upstream documentation .","title":"Throttling IO requests"},{"location":"data/xrootd/install-standalone/#enabling-cms-tfc-support-cms-sites-only","text":"For CMS sites, there is a package available to integrate rule-based name lookup using a storage.xml file. If you are not setting up a service for CMS, skip this section. To install an xrootd-cmstfc on OSG 3.6, run the following command: root@xrootd-standalone # yum install --enablerepo = osg-contrib xrootd-cmstfc You will need to add your storage.xml to /etc/xrootd/storage.xml and then add the following line to your XRootD configuration: # Integrate with CMS TFC, placed in /etc/xrootd/storage.xml oss.namelib /usr/lib64/libXrdCmsTfc.so file:/etc/xrootd/storage.xml?protocol=hadoop Add the orange text only if you are running hadoop (see below). 
See the CMS TWiki for more information: https://twiki.cern.ch/twiki/bin/view/Main/XrootdTfcChanges https://twiki.cern.ch/twiki/bin/view/Main/HdfsXrootdInstall","title":"Enabling CMS TFC support (CMS sites only)"},{"location":"data/xrootd/install-standalone/#using-xrootd","text":"In addition to the XRootD service itself, there are a number of supporting services in your installation. The specific services are: Software Service Name Notes Fetch CRL EL8: fetch-crl.timer EL7: fetch-crl-boot and fetch-crl-cron See CA documentation for more info XRootD xrootd@standalone Primary xrootd service if not running in multi-user mode XRootD Multi-user xrootd-privileged@standalone Primary xrootd service to start instead of xrootd@standalone if running in multi-user mode Start the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root ): To \u2026 Run the command\u2026 Start a service systemctl start SERVICE-NAME Stop a service systemctl stop SERVICE-NAME Enable a service to start during boot systemctl enable SERVICE-NAME Disable a service from starting during boot systemctl disable SERVICE-NAME","title":"Using XRootD"},{"location":"data/xrootd/install-standalone/#validating-xrootd","text":"To validate an XRootD installation, perform the following verification steps: Note If you have configured authentication/authorization for XRootD, be sure you have given yourself the necessary permissions to run these tests. For example, if you are using an X.509 proxy, make sure your DN is mapped to a user in /etc/grid-security/grid-mapfile , make sure you have a valid proxy on your local machine, and ensure that the Authfile on the XRootD server gives write access to the mapped user from /etc/grid-security/grid-mapfile . Verify authorization of bearer tokens and/or proxies Verify HTTP-TPC using the same GFAL2 client tools: Requires gfal2 >= 2.20.0 gfal2-2.20.0 contains a fix for a bug affecting XRootD HTTP-TPC support. Copy a file from your XRootD standalone host to another host and path where you have write access: root@xrootd-standalone # gfal-copy davs://localhost:1094/ \\ / Replacing with the path to a file that you can read on your host relative to rootdir ; with the protocol, FQDN, and port of the remote storage host; and to a location on the remote storage host where you have write access. Copy a file from a remote host where you have read access to your XRootD standalone installation: root@xrootd-standalone # gfal-copy / \\ davs://localhost:1094/ Replacing with the protocol, FQDN, and port of the remote storage host; with the path to a file that you can read on the remote storage host; and to a location on the XRootD standalone host relative to rootdir where you have write access.","title":"Validating XRootD"},{"location":"data/xrootd/install-standalone/#registering-an-xrootd-standalone-server","text":"To register your XRootD server, follow the general registration instructions here with the following XRootD-specific details: Add an XRootD component: section to the Services: list, with any relevant fields for that service. This is a partial example: ... FQDN: Services: XRootD component: Description: Standalone XRootD server ... Replacing with your XRootD server's DNS entry. If you are setting up a new resource, set Active: false . 
Only set Active: true for a resource when it is accepting requests and ready for production.","title":"Registering an XRootD Standalone Server"},{"location":"data/xrootd/install-standalone/#getting-help","text":"To get assistance. please use the Help Procedure page.","title":"Getting Help"},{"location":"data/xrootd/install-standalone/#reference","text":"XRootD documentation Export directive in the XRootD configuration and relevant options","title":"Reference"},{"location":"data/xrootd/install-standalone/#service-configuration","text":"The configuration that your XRootD service uses is determined by the service name given to systemctl . To use the standalone config, you would start XRootD with the following command: root@host # systemctl start xrootd@standalone","title":"Service Configuration"},{"location":"data/xrootd/install-standalone/#file-locations","text":"Service/Process Configuration File Description xrootd /etc/xrootd/xrootd-standalone.cfg Main XRootD configuration /etc/xrootd/config.d/ Drop-in configuration dir /etc/xrootd/auth_file Authorized users file Service/Process Log File Description xrootd /var/log/xrootd/server/xrootd.log XRootD server daemon log cmsd /var/log/xrootd/server/cmsd.log Cluster management log","title":"File locations"},{"location":"data/xrootd/install-storage-element/","text":"Installing an XRootD Storage Element \u00b6 XRootD is a hierarchical storage system that can be used in a variety of ways to access data, typically distributed among actual storage resources. One way to use XRootD is to have it refer to many data resources at a single site, and another way to use it is to refer to many storage systems, most likely distributed among sites. An XRootD system includes a redirector , which accepts requests for data and finds a storage repository \u2014 locally or otherwise \u2014 that can provide the data to the requestor. Use this page to learn how to install, configure, and use an XRootD redirector as part of a Storage Element (SE) or as part of a global namespace. Before Starting \u00b6 Before starting the installation process, consider the following points: User IDs: If it does not exist already, the installation will create the Linux user ID xrootd Service certificate: The XRootD service uses a host certificate at /etc/grid-security/host*.pem Networking: The XRootD service uses port 1094 by default As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Obtain root access to the host Prepare the required Yum repositories Install CA certificates Installing an XRootD Server \u00b6 An installation of the XRootD server consists of the server itself and its dependencies. Install these with Yum: root@host # yum install osg-xrootd Configuring an XRootD Server \u00b6 An advanced XRootD setup has multiple components; it is important to validate that each additional component that you set up is working before moving on to the next component. We have included validation instructions after each component below. Creating an XRootD cluster \u00b6 If your storage is spread out over multiple hosts, you will need to set up an XRootD cluster . The cluster uses one \"redirector\" node as a frontend for user accesses, and multiple data nodes that have the data that users request. Two daemons will run on each node: xrootd The eXtended Root Daemon controls file access and storage. cmsd The Cluster Management Services Daemon controls communication between nodes. 
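In systemd terms this means every host in the cluster runs both an xrootd@clustered and a cmsd@clustered instance (the unit names are listed in the services table later in this document); a minimal sketch, starting the redirector host before any data node:

```
# First on the redirector host, then on each data node:
systemctl enable --now cmsd@clustered
systemctl enable --now xrootd@clustered
```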
Note that for large virtual organizations, a site-level redirector may actually also communicate upwards to a regional or global redirector that handles access to a multi-level hierarchy. This section will only cover handling one level of XRootD hierarchy. In the instructions below, will refer to the redirector host and will refer to the data node host. These should be replaced with the fully-qualified domain name of the host in question. Modify /etc/xrootd/xrootd-clustered.cfg \u00b6 You will need to modify the xrootd-clustered.cfg on the redirector node and each data node. The following example should serve as a base configuration for clustering. Further customizations are detailed below. all.export /mnt/xrootd stage set xrdr = all.manager $(xrdr):3121 if $(xrdr) # Lines in this block are only executed on the redirector node all.role manager else # Lines in this block are executed on all nodes but the redirector node all.role server cms.space min 2g 5g fi You will need to customize the following lines: Configuration Line Changes Needed all.export /mnt/xrootd stage Change /mnt/xrootd to the directory to allow XRootD access to set xrdr= Change to the hostname of the redirector cms.space min 2g 5g Reserve this amount of free space on the node. For this example, if space falls below 2GB, xrootd will not store further files on this node until space climbs above 5GB. You can use k , m , g , or t to indicate kilobyte, megabytes, gigabytes, or terabytes, respectively. Further information can be found at https://xrootd.slac.stanford.edu/docs.html Verifying the clustered config \u00b6 Start both xrootd and cmsd on all nodes according to the instructions in the Using XRootD section . Verify that you can copy a file such as /bin/sh to /mnt/xrootd on the server data via the redirector: root@host # xrdcp /bin/sh root://:1094///mnt/xrootd/second_test [xrootd] Total 0.76 MB [====================] 100.00 % [inf MB/s] Check that the /mnt/xrootd/second_test is located on data server . (Optional) Adding High Availability (HA) redirectors \u00b6 It is possible to have an XRootD clustered setup with more than one redirector to ensure high availability service. To do this: In the /etc/xrootd/xrootd-clustered.cfg on each data node follow the instructions in this section with: set xrdr1 = set xrdr2 = all.manager $(xrdr1):3121 all.manager $(xrdr2):3121 Create DNS ALIAS records for pointing to and Advertise the FQDN to users interacting with the XRootD cluster should be . (Optional) Adding Simple Server Inventory to your cluster \u00b6 The Simple Server Inventory (SSI) provide means to have an inventory for each data server. SSI requires: A second instance of the xrootd daemon on the redirector A \"composite name space daemon\" ( XrdCnsd ) on each data server; this daemon handles the inventory As an example, we will set up a two-node XRootD cluster with SSI. Host A is a redirector node that is running the following daemons: xrootd redirector cmsd xrootd - second instance that required for SSI Host B is a data server that is running the following daemons: xrootd data server cmsd XrdCnsd - started automatically by xrootd We will need to create a directory on the redirector node for Inventory files. root@host # mkdir -p /data/inventory root@host # chown xrootd:xrootd /data/inventory On the data server (host B) let's use a storage cache that will be at a different location from /mnt/xrootd . root@host # mkdir -p /local/xrootd root@host # chown xrootd:xrootd /local/xrootd We will be running two instances of XRootD on . 
Modify /etc/xrootd/xrootd-clustered.cfg to give the two instances different behavior, as such: all.export /data/xrootdfs set xrdr= all.manager $(xrdr):3121 if $(xrdr) && named cns all.export /data/inventory xrd.port 1095 else if $(xrdr) all.role manager xrd.port 1094 else all.role server oss.localroot /local/xrootd ofs.notify closew create mkdir mv rm rmdir trunc | /usr/bin/XrdCnsd -d -D 2 -i 90 -b $(xrdr):1095:/data/inventory #add cms.space if you have less the 11GB # cms.space options https://xrootd.slac.stanford.edu/doc/dev410/cms_config.htm cms.space min 2g 5g fi The value of oss.localroot will be prepended to any file access. E.g. accessing root://:1094//data/xrootdfs/test1 will actually go to /local/xrootd/data/xrootdfs/test1 . Starting a second instance of XRootD \u00b6 Create a symlink pointing to /etc/xrootd/xrootd-clustered.cfg at /etc/xrootd/xrootd-cns.cfg : root@host # ln -s /etc/xrootd/xrootd-clustered.cfg /etc/xrootd/xrootd-cns.cfg Start an instance of the xrootd service named cns using the syntax in the managing services section : root@host # systemctl start xrootd@cns Testing an XRootD cluster with SSI \u00b6 Copy file to redirector node specifying storage path (/data/xrootdfs instead of /mnt/xrootd): root@host # xrdcp /bin/sh root://:1094//data/xrootdfs/test1 [xrootd] Total 0.00 MB [================] 100.00 % [inf MB/s] To verify that SSI is working execute cns_ssi command on the redirector node: root@host # cns_ssi list /data/inventory fermicloud054.fnal.gov incomplete inventory as of Mon Apr 11 17:28:11 2011 root@host # cns_ssi updt /data/inventory cns_ssi: fermicloud054.fnal.gov inventory with 1 directory and 1 file updated with 0 errors. root@host # cns_ssi list /data/inventory fermicloud054.fnal.gov complete inventory as of Tue Apr 12 07:38:29 2011 /data/xrootdfs/test1 Note : In this example, fermicloud53.fnal.gov is a redirector node and fermicloud054.fnal.gov is a data node. (Optional) Enabling Xrootd over HTTP \u00b6 XRootD can be accessed using the HTTP protocol. To do that: Add the following line to /etc/xrootd/config.d/10-common-site-local.cfg : set EnableHttp = 1 Testing the configuration From the terminal, generate a proxy and attempt to use davix-get to copy from your XRootD host (the XRootD service needs running; see the services section ). For example, if your server has a file named /store/user/test.root : davix-get https://:1094/store/user/test.root -E /mnt/xrootd/x509up_u`id -u` --capath /etc/grid-security/certificates Note For clients to successfully read from the regional redirector, HTTPS must be enabled for the data servers and the site-level redirector. Warning If you have u * in your Authfile, recall this provides an authorization to ALL users, including unauthenticated. This includes random web spiders! (Optional) Enable HTTP based Writes \u00b6 No changes to the HTTP module is needed to enable HTTP-based writes. The HTTP protocol uses the same authorization setup as the XRootD protocol. For example, you may need to provide a (all) style authorizations to allow users authorization to write. See the Authentication File section for more details. (Optional) Enabling a FUSE mount \u00b6 XRootD storage can be mounted as a standard POSIX filesystem via FUSE, providing users with a more familiar interface.. Modify /etc/fstab by adding the following entries: .... xrootdfs /mnt/xrootd fuse rdr=xroot://:1094/,uid=xrootd 0 0 Replace /mnt/xrootd with the path that you would like to access with. Create /mnt/xrootd directory. 
Make sure the xrootd user exists on the system. Once you are finished, you can mount it: mount /mnt/xrootd You should now be able to run UNIX commands such as ls /mnt/xrootd to see the contents of the XRootD server. (Optional) Authorization \u00b6 For information on how to configure XRootD authorization, please refer to the Configuring XRootD Authorization guide . (Optional) Adding CMS TFC support to XRootD (CMS sites only) \u00b6 For CMS users, there is a package available to integrate rule-based name lookup using a storage.xml file. See this documentation . (Optional) Adding Multi user support for an XRootd server \u00b6 For documentation on how to enable multi-user support using XRootD, see this documentation . (Optional) Adding File Residency Manager (FRM) to an XRootd cluster \u00b6 If you have a multi-tiered storage system (e.g. some data is stored on SSDs and some on disks or tapes), then install the File Residency Manager (FRM), so you can move data between tiers more easily. If you do not have a multi-tiered storage system, then you do not need FRM and you can skip this section. The FRM deals with two major mechanisms: local disk remote servers The description of fully functional multiple XRootD clusters is beyond the scope of this document. In order to have this fully functional system you will need a global redirector and at least one remote XRootD cluster from where files could be moved to the local cluster. Below are the modifications you should make in order to enable FRM on your local cluster: Make sure that FRM is enabled in /etc/sysconfig/xrootd on your data server: XROOTD_USER=xrootd XROOTD_GROUP=xrootd XROOTD_DEFAULT_OPTIONS=\"-l /var/log/xrootd/xrootd.log -c /etc/xrootd/xrootd-clustered.cfg\" CMSD_DEFAULT_OPTIONS=\"-l /var/log/xrootd/cmsd.log -c /etc/xrootd/xrootd-clustered.cfg\" FRMD_DEFAULT_OPTIONS=\"-l /var/log/xrootd/frmd.log -c /etc/xrootd/xrootd-clustered.cfg\" XROOTD_INSTANCES=\"default\" CMSD_INSTANCES=\"default\" FRMD_INSTANCES=\"default\" Modify /etc/xrootd/xrootd-clustered.cfg on both nodes to specify options for frm_xfrd (File Transfer Daemon) and frm_purged (File Purging Daemon). For more information, you can visit the FRM Documentation Start frm daemons on data server: root@host # service frm_xfrd start root@host # service frm_purged start Using XRootD \u00b6 Managing XRootD services \u00b6 Start services on the redirector node before starting any services on the data nodes. If you installed only XRootD itself, you will only need to start the xrootd service. However, if you installed cluster management services, you will need to start cmsd as well. XRootD determines which configuration to use based on the service name specified by systemctl . For example, to have xrootd use the clustered config, you would start up xrootd with this line: root@host # systemctl start xrootd@clustered To use the standalone config instead, you would use: root@host # systemctl start xrootd@standalone The services are: Service EL 7 & 8 service name XRootD (standalone config) xrootd@standalone XRootD (clustered config) xrootd@clustered XRootD (multiuser) xrootd-privileged@clustered CMSD (clustered config) cmsd@clustered As a reminder, here are common service commands (all run as root ): To ... On EL 7 & 8, run the command...
Start a service systemctl start SERVICE-NAME Stop a service systemctl stop SERVICE-NAME Enable a service to start during boot systemctl enable SERVICE-NAME Disable a service from starting during boot systemctl disable SERVICE-NAME Getting Help \u00b6 To get assistance. please use the Help Procedure page. Reference \u00b6 File locations \u00b6 Service/Process Configuration File Description xrootd /etc/xrootd/xrootd-clustered.cfg Main clustered mode XRootD configuration /etc/xrootd/auth_file Authorized users file Service/Process Log File Description xrootd /var/log/xrootd/xrootd.log XRootD server daemon log cmsd /var/log/xrootd/cmsd.log Cluster management log cns /var/log/xrootd/cns/xrootd.log Server inventory (composite name space) log frm_xfrd , frm_purged /var/log/xrootd/frmd.log File Residency Manager log Links \u00b6 XRootD documentation","title":"Install XRootD SE"},{"location":"data/xrootd/install-storage-element/#installing-an-xrootd-storage-element","text":"XRootD is a hierarchical storage system that can be used in a variety of ways to access data, typically distributed among actual storage resources. One way to use XRootD is to have it refer to many data resources at a single site, and another way to use it is to refer to many storage systems, most likely distributed among sites. An XRootD system includes a redirector , which accepts requests for data and finds a storage repository \u2014 locally or otherwise \u2014 that can provide the data to the requestor. Use this page to learn how to install, configure, and use an XRootD redirector as part of a Storage Element (SE) or as part of a global namespace.","title":"Installing an XRootD Storage Element"},{"location":"data/xrootd/install-storage-element/#before-starting","text":"Before starting the installation process, consider the following points: User IDs: If it does not exist already, the installation will create the Linux user ID xrootd Service certificate: The XRootD service uses a host certificate at /etc/grid-security/host*.pem Networking: The XRootD service uses port 1094 by default As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Obtain root access to the host Prepare the required Yum repositories Install CA certificates","title":"Before Starting"},{"location":"data/xrootd/install-storage-element/#installing-an-xrootd-server","text":"An installation of the XRootD server consists of the server itself and its dependencies. Install these with Yum: root@host # yum install osg-xrootd","title":"Installing an XRootD Server"},{"location":"data/xrootd/install-storage-element/#configuring-an-xrootd-server","text":"An advanced XRootD setup has multiple components; it is important to validate that each additional component that you set up is working before moving on to the next component. We have included validation instructions after each component below.","title":"Configuring an XRootD Server"},{"location":"data/xrootd/install-storage-element/#creating-an-xrootd-cluster","text":"If your storage is spread out over multiple hosts, you will need to set up an XRootD cluster . The cluster uses one \"redirector\" node as a frontend for user accesses, and multiple data nodes that have the data that users request. Two daemons will run on each node: xrootd The eXtended Root Daemon controls file access and storage. cmsd The Cluster Management Services Daemon controls communication between nodes. 
Note that for large virtual organizations, a site-level redirector may actually also communicate upwards to a regional or global redirector that handles access to a multi-level hierarchy. This section will only cover handling one level of XRootD hierarchy. In the instructions below, will refer to the redirector host and will refer to the data node host. These should be replaced with the fully-qualified domain name of the host in question.","title":"Creating an XRootD cluster"},{"location":"data/xrootd/install-storage-element/#modify-etcxrootdxrootd-clusteredcfg","text":"You will need to modify the xrootd-clustered.cfg on the redirector node and each data node. The following example should serve as a base configuration for clustering. Further customizations are detailed below. all.export /mnt/xrootd stage set xrdr = all.manager $(xrdr):3121 if $(xrdr) # Lines in this block are only executed on the redirector node all.role manager else # Lines in this block are executed on all nodes but the redirector node all.role server cms.space min 2g 5g fi You will need to customize the following lines: Configuration Line Changes Needed all.export /mnt/xrootd stage Change /mnt/xrootd to the directory to allow XRootD access to set xrdr= Change to the hostname of the redirector cms.space min 2g 5g Reserve this amount of free space on the node. For this example, if space falls below 2GB, xrootd will not store further files on this node until space climbs above 5GB. You can use k , m , g , or t to indicate kilobyte, megabytes, gigabytes, or terabytes, respectively. Further information can be found at https://xrootd.slac.stanford.edu/docs.html","title":"Modify /etc/xrootd/xrootd-clustered.cfg"},{"location":"data/xrootd/install-storage-element/#verifying-the-clustered-config","text":"Start both xrootd and cmsd on all nodes according to the instructions in the Using XRootD section . Verify that you can copy a file such as /bin/sh to /mnt/xrootd on the server data via the redirector: root@host # xrdcp /bin/sh root://:1094///mnt/xrootd/second_test [xrootd] Total 0.76 MB [====================] 100.00 % [inf MB/s] Check that the /mnt/xrootd/second_test is located on data server .","title":"Verifying the clustered config"},{"location":"data/xrootd/install-storage-element/#optional-adding-high-availability-ha-redirectors","text":"It is possible to have an XRootD clustered setup with more than one redirector to ensure high availability service. To do this: In the /etc/xrootd/xrootd-clustered.cfg on each data node follow the instructions in this section with: set xrdr1 = set xrdr2 = all.manager $(xrdr1):3121 all.manager $(xrdr2):3121 Create DNS ALIAS records for pointing to and Advertise the FQDN to users interacting with the XRootD cluster should be .","title":"(Optional) Adding High Availability (HA) redirectors"},{"location":"data/xrootd/install-storage-element/#optional-adding-simple-server-inventory-to-your-cluster","text":"The Simple Server Inventory (SSI) provide means to have an inventory for each data server. SSI requires: A second instance of the xrootd daemon on the redirector A \"composite name space daemon\" ( XrdCnsd ) on each data server; this daemon handles the inventory As an example, we will set up a two-node XRootD cluster with SSI. 
Host A is a redirector node that is running the following daemons: xrootd redirector cmsd xrootd - second instance that required for SSI Host B is a data server that is running the following daemons: xrootd data server cmsd XrdCnsd - started automatically by xrootd We will need to create a directory on the redirector node for Inventory files. root@host # mkdir -p /data/inventory root@host # chown xrootd:xrootd /data/inventory On the data server (host B) let's use a storage cache that will be at a different location from /mnt/xrootd . root@host # mkdir -p /local/xrootd root@host # chown xrootd:xrootd /local/xrootd We will be running two instances of XRootD on . Modify /etc/xrootd/xrootd-clustered.cfg to give the two instances different behavior, as such: all.export /data/xrootdfs set xrdr= all.manager $(xrdr):3121 if $(xrdr) && named cns all.export /data/inventory xrd.port 1095 else if $(xrdr) all.role manager xrd.port 1094 else all.role server oss.localroot /local/xrootd ofs.notify closew create mkdir mv rm rmdir trunc | /usr/bin/XrdCnsd -d -D 2 -i 90 -b $(xrdr):1095:/data/inventory #add cms.space if you have less the 11GB # cms.space options https://xrootd.slac.stanford.edu/doc/dev410/cms_config.htm cms.space min 2g 5g fi The value of oss.localroot will be prepended to any file access. E.g. accessing root://:1094//data/xrootdfs/test1 will actually go to /local/xrootd/data/xrootdfs/test1 .","title":"(Optional) Adding Simple Server Inventory to your cluster"},{"location":"data/xrootd/install-storage-element/#starting-a-second-instance-of-xrootd","text":"Create a symlink pointing to /etc/xrootd/xrootd-clustered.cfg at /etc/xrootd/xrootd-cns.cfg : root@host # ln -s /etc/xrootd/xrootd-clustered.cfg /etc/xrootd/xrootd-cns.cfg Start an instance of the xrootd service named cns using the syntax in the managing services section : root@host # systemctl start xrootd@cns","title":"Starting a second instance of XRootD"},{"location":"data/xrootd/install-storage-element/#testing-an-xrootd-cluster-with-ssi","text":"Copy file to redirector node specifying storage path (/data/xrootdfs instead of /mnt/xrootd): root@host # xrdcp /bin/sh root://:1094//data/xrootdfs/test1 [xrootd] Total 0.00 MB [================] 100.00 % [inf MB/s] To verify that SSI is working execute cns_ssi command on the redirector node: root@host # cns_ssi list /data/inventory fermicloud054.fnal.gov incomplete inventory as of Mon Apr 11 17:28:11 2011 root@host # cns_ssi updt /data/inventory cns_ssi: fermicloud054.fnal.gov inventory with 1 directory and 1 file updated with 0 errors. root@host # cns_ssi list /data/inventory fermicloud054.fnal.gov complete inventory as of Tue Apr 12 07:38:29 2011 /data/xrootdfs/test1 Note : In this example, fermicloud53.fnal.gov is a redirector node and fermicloud054.fnal.gov is a data node.","title":"Testing an XRootD cluster with SSI"},{"location":"data/xrootd/install-storage-element/#optional-enabling-xrootd-over-http","text":"XRootD can be accessed using the HTTP protocol. To do that: Add the following line to /etc/xrootd/config.d/10-common-site-local.cfg : set EnableHttp = 1 Testing the configuration From the terminal, generate a proxy and attempt to use davix-get to copy from your XRootD host (the XRootD service needs running; see the services section ). 
For example, if your server has a file named /store/user/test.root : davix-get https://:1094/store/user/test.root -E /mnt/xrootd/x509up_u`id -u` --capath /etc/grid-security/certificates Note For clients to successfully read from the regional redirector, HTTPS must be enabled for the data servers and the site-level redirector. Warning If you have u * in your Authfile, recall this provides an authorization to ALL users, including unauthenticated. This includes random web spiders!","title":"(Optional) Enabling Xrootd over HTTP"},{"location":"data/xrootd/install-storage-element/#optional-enable-http-based-writes","text":"No changes to the HTTP module is needed to enable HTTP-based writes. The HTTP protocol uses the same authorization setup as the XRootD protocol. For example, you may need to provide a (all) style authorizations to allow users authorization to write. See the Authentication File section for more details.","title":"(Optional) Enable HTTP based Writes"},{"location":"data/xrootd/install-storage-element/#optional-enabling-a-fuse-mount","text":"XRootD storage can be mounted as a standard POSIX filesystem via FUSE, providing users with a more familiar interface.. Modify /etc/fstab by adding the following entries: .... xrootdfs /mnt/xrootd fuse rdr=xroot://:1094/,uid=xrootd 0 0 Replace /mnt/xrootd with the path that you would like to access with. Create /mnt/xrootd directory. Make sure the xrootd user exists on the system. Once you are finished, you can mount it: mount /mnt/xrootd You should now be able to run UNIX commands such as ls /mnt/xrootd to see the contents of the XRootD server.","title":"(Optional) Enabling a FUSE mount"},{"location":"data/xrootd/install-storage-element/#optional-authorization","text":"For information on how to configure XRootD authorization, please refer to the Configuring XRootD Authorization guide .","title":"(Optional) Authorization"},{"location":"data/xrootd/install-storage-element/#optional-adding-cms-tfc-support-to-xrootd-cms-sites-only","text":"For CMS users, there is a package available to integrate rule-based name lookup using a storage.xml file. See this documentation .","title":"(Optional) Adding CMS TFC support to XRootD (CMS sites only)"},{"location":"data/xrootd/install-storage-element/#optional-adding-multi-user-support-for-an-xrootd-server","text":"For documentation how to enable multi-user support using XRootD see this documentation .","title":"(Optional) Adding Multi user support for an XRootd server"},{"location":"data/xrootd/install-storage-element/#optional-adding-file-residency-manager-frm-to-an-xrootd-cluster","text":"If you have a multi-tiered storage system (e.g. some data is stored on SSDs and some on disks or tapes), then install the File Residency Manager (FRM), so you can move data between tiers more easily. If you do not have a multi-tiered storage system, then you do not need FRM and you can skip this section. The FRM deals with two major mechanisms: local disk remote servers The description of fully functional multiple XRootD clusters is beyond the scope of this document. In order to have this fully functional system you will need a global redirector and at least one remote XRootD cluster from where files could be moved to the local cluster. 
Below are the modifications you should make in order to enable FRM on your local cluster: Make sure that FRM is enabled in /etc/sysconfig/xrootd on your data sever: ROOTD_USER=xrootd XROOTD_GROUP=xrootd XROOTD_DEFAULT_OPTIONS=\"-l /var/log/xrootd/xrootd.log -c /etc/xrootd/xrootd-clustered.cfg\" CMSD_DEFAULT_OPTIONS=\"-l /var/log/xrootd/cmsd.log -c /etc/xrootd/xrootd-clustered.cfg\" FRMD_DEFAULT_OPTIONS=\"-l /var/log/xrootd/frmd.log -c /etc/xrootd/xrootd-clustered.cfg\" XROOTD_INSTANCES=\"default\" CMSD_INSTANCES=\"default\" FRMD_INSTANCES=\"default\" Modify /etc/xrootd/xrootd-clustered.cfg on both nodes to specify options for frm_xfrd (File Transfer Daemon) and frm_purged (File Purging Daemon). For more information, you can visit the FRM Documentation Start frm daemons on data server: root@host # service frm_xfrd start root@host # service frm_purged start","title":"(Optional) Adding File Residency Manager (FRM) to an XRootd cluster"},{"location":"data/xrootd/install-storage-element/#using-xrootd","text":"","title":"Using XRootD"},{"location":"data/xrootd/install-storage-element/#managing-xrootd-services","text":"Start services on the redirector node before starting any services on the data nodes. If you installed only XRootD itself, you will only need to start the xrootd service. However, if you installed cluster management services, you will need to start cmsd as well. XRootD determines which configuration to use based on the service name specified by systemctl . For example, to have xrootd use the clustered config, you would start up xrootd with this line: root@host # systemctl start xrootd@clustered To use the standalone config instead, you would use: root@host # systemctl start xrootd@standalone The services are: Service EL 7 & 8 service name XRootD (standalone config) xrootd@standalone XRootD (clustered config) xrootd@clustered XRootD (multiuser) xrootd-privileged@clustered CMSD (clustered config) cmsd@clustered As a reminder, here are common service commands (all run as root ): To ... On EL 7 & 8, run the command... Start a service systemctl start SERVICE-NAME Stop a service systemctl stop SERVICE-NAME Enable a service to start during boot systemctl enable SERVICE-NAME Disable a service from starting during boot systemctl disable SERVICE-NAME","title":"Managing XRootD services"},{"location":"data/xrootd/install-storage-element/#getting-help","text":"To get assistance. please use the Help Procedure page.","title":"Getting Help"},{"location":"data/xrootd/install-storage-element/#reference","text":"","title":"Reference"},{"location":"data/xrootd/install-storage-element/#file-locations","text":"Service/Process Configuration File Description xrootd /etc/xrootd/xrootd-clustered.cfg Main clustered mode XRootD configuration /etc/xrootd/auth_file Authorized users file Service/Process Log File Description xrootd /var/log/xrootd/xrootd.log XRootD server daemon log cmsd /var/log/xrootd/cmsd.log Cluster management log cns /var/log/xrootd/cns/xrootd.log Server inventory (composite name space) log frm_xfrd , frm_purged /var/log/xrootd/frmd.log File Residency Manager log","title":"File locations"},{"location":"data/xrootd/install-storage-element/#links","text":"XRootD documentation","title":"Links"},{"location":"data/xrootd/overview/","text":"XRootD Overview \u00b6 XRootD is a highly-configurable data server used by sites in the OSG to support VO-specific storage needs. 
The software can be used to create an export of an existing file system through multiple protocols, participate in a data federation, or act as a caching service. XRootD data servers can stream data directly to client applications or support experiment-wide data management by performing bulk data transfer via \"third-party-copy\" between distinct sites. The OSG supports multiple different configurations of XRootD: XCache \u00b6 Previously known as the \"XRootD proxy cache\", XCache provides a caching service for data federations that serve one or more VOs. If your site contributes large amounts of computing resources to the OSG, a site XCache could be part of a solution to help reduce incoming WAN usage. In the OSG, there are three data federations based on XCache: ATLAS XCache, CMS XCache, and StashCache for all other VOs. If you are affiliated with a site or VO interested in contributing to a data federation, contact us at help@osg-htc.org . XRootD Standalone \u00b6 An XRootD standalone server exports data from an existing network storage solution, such as HDFS or Lustre, using both the XRootD and WebDAV protocols. Generally, only sites affiliated with large VOs would need to install an XRootD standalone server so consult your VO if you are interested in contributing storage. XRootD Storage Element \u00b6 For an XRootD storage element (SE) , the XRootD software acts as the network storage technology, exporting data from multiple, distributed hosts using both the XRootD and WebDAV protocols. Generally, only sites affiliated with large VOs would need to install an XRootD SE so consult your VO if you are interested in contributing storage.","title":"XRootD Overview"},{"location":"data/xrootd/overview/#xrootd-overview","text":"XRootD is a highly-configurable data server used by sites in the OSG to support VO-specific storage needs. The software can be used to create an export of an existing file system through multiple protocols, participate in a data federation, or act as a caching service. XRootD data servers can stream data directly to client applications or support experiment-wide data management by performing bulk data transfer via \"third-party-copy\" between distinct sites. The OSG supports multiple different configurations of XRootD:","title":"XRootD Overview"},{"location":"data/xrootd/overview/#xcache","text":"Previously known as the \"XRootD proxy cache\", XCache provides a caching service for data federations that serve one or more VOs. If your site contributes large amounts of computing resources to the OSG, a site XCache could be part of a solution to help reduce incoming WAN usage. In the OSG, there are three data federations based on XCache: ATLAS XCache, CMS XCache, and StashCache for all other VOs. If you are affiliated with a site or VO interested in contributing to a data federation, contact us at help@osg-htc.org .","title":"XCache"},{"location":"data/xrootd/overview/#xrootd-standalone","text":"An XRootD standalone server exports data from an existing network storage solution, such as HDFS or Lustre, using both the XRootD and WebDAV protocols. 
Generally, only sites affiliated with large VOs would need to install an XRootD standalone server so consult your VO if you are interested in contributing storage.","title":"XRootD Standalone"},{"location":"data/xrootd/overview/#xrootd-storage-element","text":"For an XRootD storage element (SE) , the XRootD software acts as the network storage technology, exporting data from multiple, distributed hosts using both the XRootD and WebDAV protocols. Generally, only sites affiliated with large VOs would need to install an XRootD SE so consult your VO if you are interested in contributing storage.","title":"XRootD Storage Element"},{"location":"data/xrootd/xrootd-authorization/","text":"Configuring XRootD Authorization \u00b6 XRootD offers several authentication options using security plugins to validate incoming credentials, such as bearer tokens, X.509 proxies, and VOMS proxies. In the case of X.509 and VOMS proxies, after the incoming credential has been mapped to a username or groupname, the authorization database is used to provide fine-grained file access. Note On data nodes, files will be owned by Unix user xrootd (or other daemon user), not as the user authenticated to, under most circumstances. XRootD will verify the permissions and authorization based on the user that the security plugin authenticates you to, but, internally, the data node files will be owned by the xrootd user. If this behaviour is not desired, enable XRootD multi-user support . Authorizing Bearer Tokens \u00b6 The OSG 3.6 configurations of XRootD support authorization of bearer tokens such as macaroons, SciTokens, or WLCG tokens. Encoded in the bearer tokens themselves are information about the files that they should have read/write access to and in the case of SciTokens and WLCG tokens, you may configure XRootD to further restrict access. Configuring SciTokens/WLCG Tokens \u00b6 SciTokens and WLCG Tokens are asymmetrically signed bearer tokens: they are signed by a token issuer (e.g., CILogon, IAM) and can be verified with the token issuer's public key. To configure XRootD to accept tokens from a given token issuer use the following instructions: Add a section for each token issuer to /etc/xrootd/scitokens.conf : [Issuer ] issuer = base_path = Replacing with a descriptive name, with the token issuer URL, and base_path to a path relative to rootdir that the client should be restricted to accessing. (Optional) if you want to map the incoming token for a given issuer to a Unix username: Install xrootd-multiuser Add the following to the relevant issuer section in /etc/xrootd/scitokens.conf : map_subject = True (Optional) if you want to only accept tokens with the appropriate aud field, add the following to /etc/xrootd/scitokens.conf : [Global] audience = An example configuration that supports tokens issued by the OSG Connect and CMS: [Global] audience = https://testserver.example.com/, MySite [Issuer OSG-Connect] issuer = https://scitokens.org/osg-connect base_path = /stash map_subject = True [Issuer CMS] issuer = https://scitokens.org/cms base_path = /user/cms Configuring macaroons \u00b6 Macaroons are symetrically signed bearer tokens so your XRootD host must have access to the same secret key that is used to sign incoming macaroons. When used in an XRootD cluster, all data nodes and the redirector need access to the same secret. 
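One way to generate such a secret is sketched below (an assumption of convenience, not an OSG requirement; any sufficiently long random byte string works). The resulting file must then be distributed to every data node and the redirector:

```
openssl rand -base64 64 > /etc/xrootd/macaroon-secret
```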
To enable macaroon support: Place the shared secret in /etc/xrootd/macaroon-secret Ensure that it has the appropriate file ownership and permissions: root@host # chown xrootd:xrootd /etc/xrootd/macaroon-secret root@host # chmod 0600 /etc/xrootd/macaroon-secret Authorizing X.509 proxies \u00b6 Authenticating proxies \u00b6 Authorizations for proxy-based security are declared in an XRootD authorization database file . XRootD authentication plugins are used to provide the mappings that are used in the database. Starting with OSG 3.6 , DN mappings are performed with XRootD's built-in GSI support, and FQAN mappings are with the XRootD-VOMS ( XrdVoms ) plugin. To enable proxy authentication, edit /etc/xrootd/config.d/10-osg-xrdvoms.cfg and add or uncomment the line set EnableVoms = 1 Note Proxy authentication is already enabled in XRootD Standalone , so this step is not necessary there. Requirements for XRootD-Multiuser with VOMS FQANs Using XRootD-Multiuser with a VOMS FQAN requires mapping the FQAN to a username, which requires a voms-mapfile . Support is available in xrootd-voms 5.4.2-1.1 , in the OSG 3.6 repos, or XRootD 5.5.0 or newer. If you want to use multiuser, ensure you are getting xrootd-voms from the OSG repos. Key length requirements Servers on EL 8 or newer will reject proxies that are not at least 2048 bits long. Ensure your clients' proxies have at least 2048 bits long with voms-proxy-info ; if necessary, have them add the argument -bits 2048 to their voms-proxy-init calls. Mapping subject DNs \u00b6 DN mappings take precedence over VOMS attributes If you have mapped the subject Distinguished Name (DN) of an incoming proxy with VOMS attributes, XRootD will map it to a username. In OSG 3.6, X.509 proxies are mapped using the built-in XRootD GSI plug-in. To map an incoming proxy's subject DN to an XRootD username , add lines of the following format to /etc/grid-security/grid-mapfile : \"\" Replacing with the X.509 proxy's DN to map and with the username to reference in the authorization database . For example, the following mapping: \"/DC=org/DC=cilogon/C=US/O=University of Wisconsin-Madison/CN=Brian Lin A2266246\" blin Will result in the username blin , i.e. authorize access to clients presenting the above proxy with u blin ... in the authorization database. Mapping VOMS attributes \u00b6 Requirements for XRootD-Multiuser with VOMS FQANs Using XRootD-Multiuser with a VOMS FQAN requires mapping the FQAN to a username, which requires a voms-mapfile . Support is available in xrootd-voms 5.4.2-1.1 , in the OSG 3.6 repos, or XRootD 5.5.0 or newer. If you want to use multiuser, ensure you are getting xrootd-voms from the OSG repos. In OSG 3.6, if the XRootD-VOMS plugin is enabled, an incoming VOMS proxy will authenticate the first VOMS FQAN and map it to an organization name ( o ), groupname ( g ), and role name ( r ) in the authorization database . For example, a proxy from the OSPool whose first VOMS FQAN is /osg/Role=NULL/Capability=NULL will be authenticated to the /osg groupname; note that the / is included in the groupname. Instead of only using the first VOMS FQAN, you can configure XRootD to consider all VOMS FQANs in the proxy for authentication by setting the following in /etc/xrootd/config.d/10-osg-xrdvoms.cfg : set vomsfqans = useall Mapping VOMS attributes to users \u00b6 In order for the XRootD-Multiuser plugin to work, a proxy must be mapped to a user ( u ) that is a valid Unix user. 
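A quick way to confirm that a mapped account actually exists on the data nodes is a getent lookup; osg01 here is just the example account used below:

```
getent passwd osg01   # prints the passwd entry if the account exists
```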
Use a VOMS Mapfile, conventionally in /etc/grid-security/voms-mapfile that contains lines in the following form: \"\" replacing with a glob matching FQANs, and with the user that you want to map matching FQANs to. For example, \"/osg/*\" osg01 will map FQANs starting with /osg/ to the user osg01 . To enable using VOMS mapfiles in the first place, add the following line to your XRootD configuration: voms.mapfile /etc/grid-security/voms-mapfile replacing /etc/grid-security/voms-mapfile with the actual location of your mapfile, if it is different. Note A VOMS Mapfile only affects mapping the user ( u ) attribute understood in the authorization-database . The FQAN will always be used for the groupname ( g ), organization name ( o ), and role name ( r ), even if the mapfile is missing or does not contain a matching mapping. See the VOMS Mapping documentation for details. VOMS Mapfiles previously used with LCMAPS should continue to work unmodified, but the plugin can only look at a single mapfile, so if you are using the mappings provided in /usr/share/osg/voms-mapfile-default (by the vo-client-lcmaps-voms package), you will have to copy them to /etc/grid-security/voms-mapfile . Authorization database \u00b6 XRootD allows configuring fine-grained file access permissions based on authenticated identities and paths. This is configured in the authorization file /etc/xrootd/Authfile , which should be writable only by the xrootd user, optionally readable by others. Here is an example /etc/xrootd/Authfile : # This means that all the users have read access to the datasets, _except_ under /private u * /private -rl rl # Or the following, without a restricted /private dir # u * rl # This means that all the users have full access to their private home dirs u = /home/@=/ a # This means that the privileged 'xrootd' user can do everything # There must be at least one such user in order to create the # private dirs for users willing to store their data in the facility u xrootd a # This means that OSPool clients presenting a VOMS proxy can do anything under the 'osg' directory g /osg /osg a Replacing with the path to the directory that will contain data served by XRootD, e.g. /data/xrootdfs . This path is relative to the rootdir . Configure most to least specific paths Specific paths need to be specified before generic paths. For example, this line will allow all users to read the contents /data/xrootdfs/private : u * /data/xrootdfs rl /data/xrootdfs/private -rl Instead, specify the following to ensure that a given user will not be able to read the contents of /data/xrootdfs/private unless specified with another authorization rule: u * /data/xrootdfs/private -rl /data/xrootdfs rl Formatting \u00b6 More generally, each authorization rule of the authorization database has the following form: idtype id path privs Field Description idtype Type of id. Use u for username, g for groupname, o for organization name, r for role name, etc. id ID name, e.g. username or groupname. Use * for all users or = for user-specific capabilities, like home directories path The path prefix to be used for matching purposes. @= expands to the current user name before a path prefix match is attempted privs Letter list of privileges: a - all ; l - lookup ; d - delete ; n - rename ; i - insert ; r - read ; k - lock (not used) ; w - write ; - - prefix to remove specified privileges For more details or examples on how to use templated user options, see XRootD authorization database . 
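Putting the ordering rule and the templated options together, a short hypothetical fragment for a rootdir of /data/xrootdfs might read:

```
# Most specific path first: no read access to the private area by default
u * /data/xrootdfs/private -rl /data/xrootdfs rl
# Each mapped user gets full control of their own home directory
u = /data/xrootdfs/home/@=/ a
# Members of the /osg VOMS group get full access to the shared OSG area
g /osg /data/xrootdfs/osg a
```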
Verifying file ownership and permissions \u00b6 Ensure the authorization database file is owned by xrootd (if you created the file as root), and that it is not writable by others. root@host # chown xrootd:xrootd /etc/xrootd/Authfile root@host # chmod 0640 /etc/xrootd/Authfile # or 0644 Multiuser and the authorization database \u00b6 The XRootD-Multiuser plugin can be used to perform file system operations as a different user than the XRootD daemon (whose user is xrootd ). If it is enabled, then after authorization is done using the authorization database, XRootD will take the user ( u ) attribute of the incoming request, and perform file operations as the Unix user with the same name as that attribute. Note If there is no Unix user with a matching name, you will see an error like XRootD mapped request to username that does not exist: ; the operation will then fail with \"EACCES\" (access denied). Applying Authorization Changes \u00b6 After making changes to your authorization database , you must restart the relevant services . Verifying XRootD Authorization \u00b6 Bearer tokens \u00b6 To test read access using macaroon, SciTokens, and WLCG token authorization with an OSG 3.6 installation, run the following command: user@host $ curl -v \\ -H 'Authorization: Bearer ' \\ https://host.example.com//path/to/directory/hello_world Replacing with the contents of your encoded token, host.example.com with the target XRootD host, and /path/to/directory/hello_world with the path of the file to read. To test write access using macaroon, SciTokens, and WLCG token authorization, run the following command: user@host $ curl -v \\ -X PUT \\ --upload-file \\ -H 'Authorization: Bearer ' \\ https://host.example.com//path/to/directory/hello_world Replacing with the contents of your encoded token, with the file to write to the XRootD host, host.example.com with the target XRootD host, and /path/to/directory/hello_world with the path of the file to write. X.509 and VOMS proxies \u00b6 To verify X.509 and VOMS proxy authorization, run the following commands from a machine with your user certificate/key pair, xrootd-client , and voms-clients-cpp installed: Destroy any pre-existing proxies and attempt a copy to a directory (which we will refer to as ) on the to verify failure: user@client $ voms-proxy-destroy user@client $ xrdcp /bin/bash root:/// 180213 13:56:49 396570 cryptossl_X509CreateProxy: EEC certificate has expired [0B/0B][100%][==================================================][0B/s] Run: [FATAL] Auth failed On the XRootD host, add your DN to /etc/grid-security/grid-mapfile Add a line to the authorization database to ensure the mapped user can write to Restart the relevant XRootD services. See this section for details Generate your proxy and verify that you can successfully transfer files: user@client $ voms-proxy-init user@client $ xrdcp /bin/sh root:/// [938.1kB/938.1kB][100%][==================================================][938.1kB/s] If your transfer does not succeed, re-run xrdcp with --debug 2 for more information. Updating to OSG 3.6 \u00b6 There are some manual steps that need to be taken for authentication to work in OSG 3.6. Ensure OSG XRootD packages are fully up-to-date \u00b6 Some authentication configuration is provided by OSG packaging. Old versions of the packages may result in broken configuration.
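To quickly compare what is installed against the minimum versions listed below, you can query the RPM database; adjust the package list to match the plugins you actually use:
root@host # rpm -q xrootd xrootd-voms xrootd-scitokens xrootd-multiuser osg-xrootd
Any package reported as "not installed" or older than the versions below should be installed or updated from the OSG repositories.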
It is best if your packages match the versions in the appropriate release subdirectories of https://repo.opensciencegrid.org/osg/3.6/ , but at the very least these should be true: xrootd >= 5.4 xrootd-multiuser >= 2 (if using multiuser) xrootd-scitokens >= 5.4 (if using SciTokens/WLCG Tokens) xrootd-voms >= 5.4.2-1.1 (if using VOMS auth) osg-xrootd >= 3.6 osg-xrootd-standalone >= 3.6 (if installed) xcache >= 3 (if using xcache-derived software such as stash-cache, stash-origin, atlas-xcache, or cms-xcache) SciToken auth \u00b6 Updating from XRootD 4 (OSG 3.5 without 3.5-upcoming) \u00b6 The config syntax for adding auth plugins has changed between XRootD 4 and XRootD 5. Replace ofs.authlib libXrdAccSciTokens.so ... with ofs.authlib ++ libXrdAccSciTokens.so ... Updating from XRootD 5 (OSG 3.5 with 3.5-upcoming) \u00b6 No config changes are necessary. Proxy auth: transitioning from XrdLcmaps to XrdVoms \u00b6 In OSG 3.5 and previous, proxy authentication was handled by the XrdLcmaps plugin, provided in the xrootd-lcmaps RPM. This is no longer the case in OSG 3.6; instead it is handled by the XrdVoms plugin, provided in the xrootd-voms RPM. To continue using proxy authentication, update your configuration and your authorization database (Authfile) as described below. Updating XRootD configuration \u00b6 Remove any old config in /etc/xrootd and /etc/xrootd/config.d that mentions LCMAPS or libXrdLcmaps.so , otherwise XRootD may fail to start. If you do not have both an unauthenticated stash-cache and an authenticated stash-cache on the same server, uncomment set EnableVoms = 1 in /etc/xrootd/config.d/10-osg-xrdvoms.cfg . If you have both an authenticated stash-cache and an unauthenticated stash-cache on the same server, add the following block to /etc/xrootd/config.d/10-osg-xrdvoms.cfg : if named stash-cache-auth set EnableVoms = 1 fi If you are using XRootD Multiuser, create a VOMS Mapfile at /etc/grid-security/voms-mapfile , with the syntax described above , then add voms.mapfile /etc/grid-security/voms-mapfile to your XRootD config if it's not already present. Note In order to make yum update easier, xrootd-lcmaps has been replaced with an empty package, which can be removed after upgrading. Updating your authorization database \u00b6 Unlike the XrdLcmaps plugin, which mapped VOMS FQANs to users u , the XrdVoms plugin maps FQANs to groups g , roles r , and organizations o , as described in the mapping VOMS attributes section . You can still use a VOMS mapfile but if you want to use the mappings provided at /usr/share/osg/voms-mapfile-default by the vo-client-lcmaps-voms package, you must copy them to /etc/grid-security/voms-mapfile . Replace mappings based on users with mappings based on the other attributes. For example, instead of u uscmslocal /uscms rl use g /cms/uscms /uscms rl If you need to make a mapping based on group and role, create and use a \"compound ID\" as described in the XRootD security documentation . # create the ID named \"cmsprod\" = cmsprod g /cms r Production # use it x cmsprod /cmsprod rl","title":"Configure Authorization"},{"location":"data/xrootd/xrootd-authorization/#configuring-xrootd-authorization","text":"XRootD offers several authentication options using security plugins to validate incoming credentials, such as bearer tokens, X.509 proxies, and VOMS proxies. In the case of X.509 and VOMS proxies, after the incoming credential has been mapped to a username or groupname, the authorization database is used to provide fine-grained file access.
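For orientation only: the authorization database is attached to the server through XRootD's ofs/acc layer. The OSG-packaged configuration typically provides this wiring already, but in a hand-written configuration it would look roughly like the following (the path shown is the default Authfile location used throughout this document):
ofs.authorize
acc.authdb /etc/xrootd/Authfile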
Note On data nodes, files will be owned by Unix user xrootd (or other daemon user), not as the user authenticated to, under most circumstances. XRootD will verify the permissions and authorization based on the user that the security plugin authenticates you to, but, internally, the data node files will be owned by the xrootd user. If this behaviour is not desired, enable XRootD multi-user support .","title":"Configuring XRootD Authorization"},{"location":"data/xrootd/xrootd-authorization/#authorizing-bearer-tokens","text":"The OSG 3.6 configurations of XRootD support authorization of bearer tokens such as macaroons, SciTokens, or WLCG tokens. Encoded in the bearer tokens themselves is information about the files that they should have read/write access to and in the case of SciTokens and WLCG tokens, you may configure XRootD to further restrict access.","title":"Authorizing Bearer Tokens"},{"location":"data/xrootd/xrootd-authorization/#configuring-scitokenswlcg-tokens","text":"SciTokens and WLCG Tokens are asymmetrically signed bearer tokens: they are signed by a token issuer (e.g., CILogon, IAM) and can be verified with the token issuer's public key. To configure XRootD to accept tokens from a given token issuer, use the following instructions: Add a section for each token issuer to /etc/xrootd/scitokens.conf : [Issuer ] issuer = base_path = Replacing with a descriptive name, with the token issuer URL, and base_path to a path relative to rootdir that the client should be restricted to accessing. (Optional) if you want to map the incoming token for a given issuer to a Unix username: Install xrootd-multiuser Add the following to the relevant issuer section in /etc/xrootd/scitokens.conf : map_subject = True (Optional) if you want to only accept tokens with the appropriate aud field, add the following to /etc/xrootd/scitokens.conf : [Global] audience = An example configuration that supports tokens issued by the OSG Connect and CMS: [Global] audience = https://testserver.example.com/, MySite [Issuer OSG-Connect] issuer = https://scitokens.org/osg-connect base_path = /stash map_subject = True [Issuer CMS] issuer = https://scitokens.org/cms base_path = /user/cms","title":"Configuring SciTokens/WLCG Tokens"},{"location":"data/xrootd/xrootd-authorization/#configuring-macaroons","text":"Macaroons are symmetrically signed bearer tokens so your XRootD host must have access to the same secret key that is used to sign incoming macaroons. When used in an XRootD cluster, all data nodes and the redirector need access to the same secret. To enable macaroon support: Place the shared secret in /etc/xrootd/macaroon-secret Ensure that it has the appropriate file ownership and permissions: root@host # chown xrootd:xrootd /etc/xrootd/macaroon-secret root@host # chmod 0600 /etc/xrootd/macaroon-secret","title":"Configuring macaroons"},{"location":"data/xrootd/xrootd-authorization/#authorizing-x509-proxies","text":"","title":"Authorizing X.509 proxies"},{"location":"data/xrootd/xrootd-authorization/#authenticating-proxies","text":"Authorizations for proxy-based security are declared in an XRootD authorization database file . XRootD authentication plugins are used to provide the mappings that are used in the database. Starting with OSG 3.6 , DN mappings are performed with XRootD's built-in GSI support, and FQAN mappings are performed with the XRootD-VOMS ( XrdVoms ) plugin.
To enable proxy authentication, edit /etc/xrootd/config.d/10-osg-xrdvoms.cfg and add or uncomment the line set EnableVoms = 1 Note Proxy authentication is already enabled in XRootD Standalone , so this step is not necessary there. Requirements for XRootD-Multiuser with VOMS FQANs Using XRootD-Multiuser with a VOMS FQAN requires mapping the FQAN to a username, which requires a voms-mapfile . Support is available in xrootd-voms 5.4.2-1.1 , in the OSG 3.6 repos, or XRootD 5.5.0 or newer. If you want to use multiuser, ensure you are getting xrootd-voms from the OSG repos. Key length requirements Servers on EL 8 or newer will reject proxies that are not at least 2048 bits long. Ensure your clients' proxies are at least 2048 bits long with voms-proxy-info ; if necessary, have them add the argument -bits 2048 to their voms-proxy-init calls.","title":"Authenticating proxies"},{"location":"data/xrootd/xrootd-authorization/#mapping-subject-dns","text":"DN mappings take precedence over VOMS attributes If you have mapped the subject Distinguished Name (DN) of an incoming proxy that also carries VOMS attributes, XRootD will map it to a username. In OSG 3.6, X.509 proxies are mapped using the built-in XRootD GSI plug-in. To map an incoming proxy's subject DN to an XRootD username , add lines of the following format to /etc/grid-security/grid-mapfile : \"\" Replacing with the X.509 proxy's DN to map and with the username to reference in the authorization database . For example, the following mapping: \"/DC=org/DC=cilogon/C=US/O=University of Wisconsin-Madison/CN=Brian Lin A2266246\" blin Will result in the username blin , i.e. authorize access to clients presenting the above proxy with u blin ... in the authorization database.","title":"Mapping subject DNs"},{"location":"data/xrootd/xrootd-authorization/#mapping-voms-attributes","text":"Requirements for XRootD-Multiuser with VOMS FQANs Using XRootD-Multiuser with a VOMS FQAN requires mapping the FQAN to a username, which requires a voms-mapfile . Support is available in xrootd-voms 5.4.2-1.1 , in the OSG 3.6 repos, or XRootD 5.5.0 or newer. If you want to use multiuser, ensure you are getting xrootd-voms from the OSG repos. In OSG 3.6, if the XRootD-VOMS plugin is enabled, an incoming VOMS proxy will authenticate the first VOMS FQAN and map it to an organization name ( o ), groupname ( g ), and role name ( r ) in the authorization database . For example, a proxy from the OSPool whose first VOMS FQAN is /osg/Role=NULL/Capability=NULL will be authenticated to the /osg groupname; note that the / is included in the groupname. Instead of only using the first VOMS FQAN, you can configure XRootD to consider all VOMS FQANs in the proxy for authentication by setting the following in /etc/xrootd/config.d/10-osg-xrdvoms.cfg : set vomsfqans = useall","title":"Mapping VOMS attributes"},{"location":"data/xrootd/xrootd-authorization/#mapping-voms-attributes-to-users","text":"In order for the XRootD-Multiuser plugin to work, a proxy must be mapped to a user ( u ) that is a valid Unix user. Use a VOMS Mapfile, conventionally in /etc/grid-security/voms-mapfile that contains lines in the following form: \"\" replacing with a glob matching FQANs, and with the user that you want to map matching FQANs to. For example, \"/osg/*\" osg01 will map FQANs starting with /osg/ to the user osg01 .
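A voms-mapfile may contain several such lines; the sketch below uses hypothetical FQAN patterns and account names purely for illustration, not recommended mappings:
"/osg/*" osg01
"/cms/*" cms01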
To enable using VOMS mapfiles in the first place, add the following line to your XRootD configuration: voms.mapfile /etc/grid-security/voms-mapfile replacing /etc/grid-security/voms-mapfile with the actual location of your mapfile, if it is different. Note A VOMS Mapfile only affects mapping the user ( u ) attribute understood in the authorization-database . The FQAN will always be used for the groupname ( g ), organization name ( o ), and role name ( r ), even if the mapfile is missing or does not contain a matching mapping. See the VOMS Mapping documentation for details. VOMS Mapfiles previously used with LCMAPS should continue to work unmodified, but the plugin can only look at a single mapfile, so if you are using the mappings provided in /usr/share/osg/voms-mapfile-default (by the vo-client-lcmaps-voms package), you will have to copy them to /etc/grid-security/voms-mapfile .","title":"Mapping VOMS attributes to users"},{"location":"data/xrootd/xrootd-authorization/#authorization-database","text":"XRootD allows configuring fine-grained file access permissions based on authenticated identities and paths. This is configured in the authorization file /etc/xrootd/Authfile , which should be writable only by the xrootd user, optionally readable by others. Here is an example /etc/xrootd/Authfile : # This means that all the users have read access to the datasets, _except_ under /private u * /private -rl rl # Or the following, without a restricted /private dir # u * rl # This means that all the users have full access to their private home dirs u = /home/@=/ a # This means that the privileged 'xrootd' user can do everything # There must be at least one such user in order to create the # private dirs for users willing to store their data in the facility u xrootd a # This means that OSPool clients presenting a VOMS proxy can do anything under the 'osg' directory g /osg /osg a Replacing with the path to the directory that will contain data served by XRootD, e.g. /data/xrootdfs . This path is relative to the rootdir . Configure most to least specific paths Specific paths need to be specified before generic paths. For example, this line will allow all users to read the contents /data/xrootdfs/private : u * /data/xrootdfs rl /data/xrootdfs/private -rl Instead, specify the following to ensure that a given user will not be able to read the contents of /data/xrootdfs/private unless specified with another authorization rule: u * /data/xrootdfs/private -rl /data/xrootdfs rl","title":"Authorization database"},{"location":"data/xrootd/xrootd-authorization/#formatting","text":"More generally, each authorization rule of the authorization database has the following form: idtype id path privs Field Description idtype Type of id. Use u for username, g for groupname, o for organization name, r for role name, etc. id ID name, e.g. username or groupname. Use * for all users or = for user-specific capabilities, like home directories path The path prefix to be used for matching purposes. 
@= expands to the current user name before a path prefix match is attempted privs Letter list of privileges: a - all ; l - lookup ; d - delete ; n - rename ; i - insert ; r - read ; k - lock (not used) ; w - write ; - - prefix to remove specified privileges For more details or examples on how to use templated user options, see XRootD authorization database .","title":"Formatting"},{"location":"data/xrootd/xrootd-authorization/#verifying-file-ownership-and-permissions","text":"Ensure the authorization database file is owned by xrootd (if you created the file as root), and that it is not writable by others. root@host # chown xrootd:xrootd /etc/xrootd/Authfile root@host # chmod 0640 /etc/xrootd/Authfile # or 0644","title":"Verifying file ownership and permissions"},{"location":"data/xrootd/xrootd-authorization/#multiuser-and-the-authorization-database","text":"The XRootD-Multiuser plugin can be used to perform file system operations as a different user than the XRootD daemon (whose user is xrootd ). If it is enabled, then after authorization is done using the authorization database, XRootD will take the user ( u ) attribute of the incoming request, and perform file operations as the Unix user with the same name as that attribute. Note If there is no Unix user with a matching name, you will see an error like XRootD mapped request to username that does not exist: ; the operation will then fail with \"EACCES\" (access denied).","title":"Multiuser and the authorization database"},{"location":"data/xrootd/xrootd-authorization/#applying-authorization-changes","text":"After making changes to your authorization database , you must restart the relevant services .","title":"Applying Authorization Changes"},{"location":"data/xrootd/xrootd-authorization/#verifying-xrootd-authorization","text":"","title":"Verifying XRootD Authorization"},{"location":"data/xrootd/xrootd-authorization/#bearer-tokens","text":"To test read access using macaroon, SciTokens, and WLCG token authorization with an OSG 3.6 installation, run the following command: user@host $ curl -v \\ -H 'Authorization: Bearer ' \\ https://host.example.com//path/to/directory/hello_world Replacing with the contents of your encoded token, host.example.com with the target XRootD host, and /path/to/directory/hello_world with the path of the file to read.
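If your token lives in a file rather than in your shell history (for example, a token obtained with a tool such as htgettoken), a convenient variant of the same read test substitutes the file contents inline; the token path below is only an example and should be adjusted to wherever your token is stored:
user@host $ curl -v -H "Authorization: Bearer $(cat /tmp/mytoken)" https://host.example.com//path/to/directory/hello_world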
To test write access, using macaroon, SciTokens, and WLCG token authorization, run the following command: user@host $ curl -v \\ -X PUT \\ --upload-file \\ -H 'Authorization: Bearer ' \\ https://host.example.com//path/to/directory/hello_world Replacing with the contents of your encoded token, with the file to write to the XRootD host, host.example.com with the target XRootD host, and /path/to/directory/hello_world with the path of the file to write.","title":"Bearer tokens"},{"location":"data/xrootd/xrootd-authorization/#x509-and-voms-proxies","text":"To verify X.509 and VOMS proxy authorization, run the following commands from a machine with your user certificate/key pair, xrootd-client , and voms-clients-cpp installed: Destroy any pre-existing proxies and attempt a copy to a directory (which we will refer to as ) on the to verify failure: user@client $ voms-proxy-destroy user@client $ xrdcp /bin/bash root:/// 180213 13:56:49 396570 cryptossl_X509CreateProxy: EEC certificate has expired [0B/0B][100%][==================================================][0B/s] Run: [FATAL] Auth failed On the XRootD host, add your DN to /etc/grid-security/grid-mapfile Add a line to the authorization database to ensure the mapped user can write to Restart the relevant XRootD services. See this section for details Generate your proxy and verify that you can successfully transfer files: user@client $ voms-proxy-init user@client $ xrdcp /bin/sh root:/// [938.1kB/938.1kB][100%][==================================================][938.1kB/s] If your transfer does not succeed, re-run xrdcp with --debug 2 for more information.","title":"X.509 and VOMS proxies"},{"location":"data/xrootd/xrootd-authorization/#updating-to-osg-36","text":"There are some manual steps that need to be taken for authentication to work in OSG 3.6.","title":"Updating to OSG 3.6"},{"location":"data/xrootd/xrootd-authorization/#ensure-osg-xrootd-packages-are-fully-up-to-date","text":"Some authentication configuration is provided by OSG packaging. Old versions of the packages may result in broken configuration. It is best if your packages match the versions in the appropriate release subdirectories of https://repo.opensciencegrid.org/osg/3.6/ , but at the very least these should be true: xrootd >= 5.4 xrootd-multiuser >= 2 (if using multiuser) xrootd-scitokens >= 5.4 (if using SciTokens/WLCG Tokens) xrootd-voms >= 5.4.2-1.1 (if using VOMS auth) osg-xrootd >= 3.6 osg-xrootd-standalone >= 3.6 (if installed) xcache >= 3 (if using xcache-derived software such as stash-cache, stash-origin, atlas-xcache, or cms-xcache)","title":"Ensure OSG XRootD packages are fully up-to-date"},{"location":"data/xrootd/xrootd-authorization/#scitoken-auth","text":"","title":"SciToken auth"},{"location":"data/xrootd/xrootd-authorization/#updating-from-xrootd-4-osg-35-without-35-upcoming","text":"The config syntax for adding auth plugins has changed between XRootD 4 and XRootD 5. Replace ofs.authlib libXrdAccSciTokens.so ... with ofs.authlib ++ libXrdAccSciTokens.so ...","title":"Updating from XRootD 4 (OSG 3.5 without 3.5-upcoming)"},{"location":"data/xrootd/xrootd-authorization/#updating-from-xrootd-5-osg-35-with-35-upcoming","text":"No config changes are necessary.","title":"Updating from XRootD 5 (OSG 3.5 with 3.5-upcoming)"},{"location":"data/xrootd/xrootd-authorization/#proxy-auth-transitioning-from-xrdlcmaps-to-xrdvoms","text":"In OSG 3.5 and previous, proxy authentication was handled by the XrdLcmaps plugin, provided in the xrootd-lcmaps RPM. 
This is no longer the case in OSG 3.6; instead it is handled by the XrdVoms plugin, provided in the xrootd-voms RPM. To continue using proxy authentication, update your configuration and your authorization database (Authfile) as described below.","title":"Proxy auth: transitioning from XrdLcmaps to XrdVoms"},{"location":"data/xrootd/xrootd-authorization/#updating-xrootd-configuration","text":"Remove any old config in /etc/xrootd and /etc/xrootd/config.d that mentions LCMAPS or libXrdLcmaps.so , otherwise XRootD may fail to start. If you do not have both an unauthenticated stash-cache and an authenticated stash-cache on the same server, uncomment set EnableVoms = 1 in /etc/xrootd/config.d/10-osg-xrdvoms.cfg . If you have both an authenticated stash-cache and an unauthenticated stash-cache on the same server, add the following block to /etc/xrootd/config.d/10-osg-xrdvoms.cfg : if named stash-cache-auth set EnableVoms = 1 fi If you are using XRootD Multiuser, create a VOMS Mapfile at /etc/grid-security/voms-mapfile , with the syntax described above , then add voms.mapfile /etc/grid-security/voms-mapfile to your XRootD config if it's not already present. Note In order to make yum update easier, xrootd-lcmaps has been replaced with an empty package, which can be removed after upgrading.","title":"Updating XRootD configuration"},{"location":"data/xrootd/xrootd-authorization/#updating-your-authorization-database","text":"Unlike the XrdLcmaps plugin, which mapped VOMS FQANs to users u , the XrdVoms plugin maps FQANs to groups g , roles r , and organizations o , as described in the mapping VOMS attributes section . You can still use a VOMS mapfile but if you want to use the mappings provided at /usr/share/osg/voms-mapfile-default by the vo-client-lcmaps-voms package, you must copy them to /etc/grid-security/voms-mapfile . Replace mappings based on users with mappings based on the other attributes. For example, instead of u uscmslocal /uscms rl use g /cms/uscms /uscms rl If you need to make a mapping based on group and role, create and use a \"compound ID\" as described in the XRootD security documentation . # create the ID named \"cmsprod\" = cmsprod g /cms r Production # use it x cmsprod /cmsprod rl","title":"Updating your authorization database"},{"location":"other/configuration-with-osg-configure/","text":"Configuration with OSG-Configure \u00b6 OSG-Configure and the INI files in /etc/osg/config.d allow a high-level configuration of OSG services. This document outlines the settings and options found in the INI files for system administrators who are installing and configuring OSG software. This page gives an overview of the options for each of the sections of the configuration files that osg-configure uses. Invocation and script usage \u00b6 The osg-configure script is used to process the INI files and apply changes to the system. osg-configure must be run as root. The typical workflow of OSG-Configure is to first edit the INI files, then verify them, then apply the changes. To verify the config files, run: [root@server] osg-configure -v OSG-Configure will list any errors in your configuration, usually including the section and option where the problem is. Potential problems are: Required option not filled in Invalid value Syntax error Inconsistencies between options To apply changes, run: [root@server] osg-configure -c If your INI files do not change, then re-running osg-configure -c will result in the same configuration as when you ran it the last time.
This allows you to experiment with your settings without having to worry about messing up your system. OSG-Configure is split up into modules. Normally, all modules are run when calling osg-configure . However, it is possible to run specific modules separately. To see a list of modules, including whether they can be run separately, run: [root@server] osg-configure -l If the module can be run separately, specify it with the -m option, where is one of the items of the output of the previous command. [root@server] osg-configure -c -m Options may be specified in multiple INI files, which may make it hard to determine which value OSG-Configure uses. You may query the final value of an option via one of these methods: [root@server] osg-configure -q -o