1 parent 2f93137, commit 844bb92
Showing 22 changed files with 1,468 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,152 @@ | ||
--- | ||
title: 1. Yen-Slurm Cluster | ||
layout: page | ||
nav_order: 1 | ||
updateDate: 2024-08-29 | ||
--- | ||
|
||
# {{ page.title }} | ||
|
||
Today we will be working with the scheduled Yens. | ||
|
||
`yen-slurm` is a computing cluster that lets researchers run computations requiring a large amount of resources without leaving the environment and filesystem of the interactive Yens.
|
||
<div class="row"> | ||
<div class="col-lg-12"> | ||
<H1> </H1> | ||
</div> | ||
</div> | ||
<div class="row"> | ||
<div class="col-lg-12"> | ||
<div class="fontAwesomeStyle"><i class="fas fa-tachometer-alt"></i> Current cluster configuration</div> | ||
<iframe class="airtable-embed" src="https://airtable.com/embed/shr0XAunXoKz62Zgl?backgroundColor=purple" frameborder="0" onmousewheel="" width="100%" height="533" style="background: transparent; border: 1px solid #ccc;"></iframe> | ||
</div> | ||
<div class="col col-md-2"></div> | ||
</div> | ||
|
||
# Yen Computing Infrastructure | ||
![](../assets/images/yen-computing-infrastructure.png) | ||
|
||
The `yen-slurm` cluster has 11 nodes with over 1,500 CPU cores, 10 TB of memory, and 12 NVIDIA GPUs.
|
||
## What is a Scheduler? | ||
|
||
The `yen-slurm` cluster is accessed through the [Slurm Workload Manager](https://slurm.schedmd.com/). Researchers submit jobs to the cluster, requesting a certain amount of resources (CPU, memory, GPUs, and time). Slurm then manages the queue of jobs based on what resources are available. In general, jobs that request fewer resources start sooner than jobs that request more.
|
||
## Why Use a Scheduler? | ||
|
||
A job scheduler has many advantages over the directly shared environment of the interactive Yens:
|
||
* Run jobs with a guaranteed amount of resources (CPU, Memory, GPUs, Time) | ||
* Set up multiple jobs to run automatically
* Run jobs that exceed the community guidelines on the interactive nodes | ||
* Gold standard for using high-performance computing resources around the world | ||
|
||
## Preparing to Use a Scheduler | ||
|
||
First, you should make sure your process can run on the interactive Yen command line. | ||
|
||
Once your process runs on the interactive Yen command line, you will need to create a Slurm script. This script has two major components:
|
||
* Metadata about your job and the resources you are requesting
* The commands necessary to run your process | ||
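
As a rough sketch of those two components, a minimal submission script might look like the following. The job name, resource values, and output file name are illustrative assumptions, not recommended settings. The `#SBATCH` lines carry the job metadata, and the lines after them are the commands to run.

```bash
#!/bin/bash
# Job metadata and resource requests (all values below are illustrative assumptions)
#SBATCH --job-name=hello-yens
#SBATCH --partition=normal
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=00:10:00
# %j in the output file name expands to the job ID
#SBATCH --output=hello-%j.out

# Commands that run your process
message="Hello from ${HOSTNAME:-an unknown host}"
echo "$message"
```

Saved under a hypothetical name such as `hello.slurm`, the script would be submitted with `sbatch hello.slurm`.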
|
||
|
||
## Looking at Cluster Queue | ||
|
||
You can look at the current job queue by running `squeue`: | ||
|
||
```bash | ||
USER@yen4:~$ squeue | ||
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) | ||
1043 normal a_job user1 PD 0:00 1 (Resources) | ||
1042 normal job_2 user2 R 1:29:53 1 yen11 | ||
1041 normal bash user3 R 3:17:08 1 yen11 | ||
``` | ||
|
||
Jobs with state (ST) `R` are running, and jobs with state `PD` are pending. Slurm starts your job based on its place in this queue and on the resources available.
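
To get a quick count of running and pending jobs, the `ST` column can be tallied with `awk`. The sketch below works on a copy of the sample output above so that it is self-contained. On the cluster you would presumably pipe `squeue` into the same `awk` command instead.

```bash
# Sample queue listing copied from the output above (stand-in for live `squeue` output)
queue_sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1043 normal a_job user1 PD 0:00 1 (Resources)
1042 normal job_2 user2 R 1:29:53 1 yen11
1041 normal bash user3 R 3:17:08 1 yen11'

# Skip the header row (NR > 1) and count rows by the fifth column, ST
running=$(printf '%s\n' "$queue_sample" | awk 'NR > 1 && $5 == "R" {n++} END {print n + 0}')
pending=$(printf '%s\n' "$queue_sample" | awk 'NR > 1 && $5 == "PD" {n++} END {print n + 0}')

echo "running=${running} pending=${pending}"
```

To list only your own jobs, `squeue -u $USER` restricts the output to a single user.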
|
||
## Best Practices | ||
|
||
### Use all of the resources you request | ||
|
||
The Slurm scheduler keeps track of the resources you request and the resources you use. Frequent under-utilization of CPU and memory will affect your future job priority. You should be confident that your job will use all of the resources you request. We recommend running your job on the interactive Yens first and monitoring its resource usage there, so that your request is an educated estimate.
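
One rough way to judge whether a request was well sized is CPU efficiency: the CPU time a job actually consumed divided by the CPU time it reserved (elapsed time multiplied by the number of CPUs). The numbers below are made up for illustration. On a cluster with Slurm accounting enabled, `sacct` can typically report the corresponding values for a finished job.

```bash
# Hypothetical figures for a finished job (not real measurements)
elapsed_seconds=3600      # wall-clock run time: 1 hour
allocated_cpus=8          # CPUs requested
cpu_seconds_used=14400    # total CPU time consumed across all cores

# Efficiency = CPU time used / CPU time reserved, as a whole-number percentage
reserved_cpu_seconds=$(( elapsed_seconds * allocated_cpus ))
cpu_efficiency=$(( cpu_seconds_used * 100 / reserved_cpu_seconds ))

echo "CPU efficiency: ${cpu_efficiency}%"
```

An efficiency of 50% in this made-up case would suggest that 4 CPUs fit the job better than 8.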
|
||
### Restructure Your Job into Small Tasks
|
||
Small jobs start faster than big jobs, and they are likely to finish faster too. If your job repeats the same process many times (e.g., OCR'ing many PDFs), it will benefit you to set it up as many small jobs.
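
Slurm job arrays are one common way to express many small, similar jobs. The sketch below is an assumption about how such a job could be laid out, not a prescribed Yens workflow: the input file names are hypothetical, and the fallback value lets the script also run outside Slurm for testing.

```bash
#!/bin/bash
# Illustrative resource requests; size them to what a single task really needs
#SBATCH --job-name=ocr-array
#SBATCH --array=1-3
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=00:30:00

# Hypothetical input files, one per array task
inputs=(doc_a.pdf doc_b.pdf doc_c.pdf)

# Slurm sets SLURM_ARRAY_TASK_ID for each array task; default to 1 outside Slurm
task_id=${SLURM_ARRAY_TASK_ID:-1}
input_file=${inputs[$(( task_id - 1 ))]}

# Replace this line with the real per-file command
echo "Task ${task_id} would process ${input_file}"
```

Each array task requests resources for one file only, so individual tasks can start as soon as a small slot frees up.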
|
||
## Tips and Tricks | ||
|
||
### Current Partitions and their limits | ||
|
||
Run the `sinfo` command to see the available partitions:
|
||
```bash | ||
$ sinfo | ||
``` | ||
|
||
You should see the following output: | ||
|
||
```bash | ||
USER@yen4:~$ sinfo | ||
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST | ||
normal* up 2-00:00:00 8 idle yen[11-18] | ||
dev up 2:00:00 8 idle yen[11-18] | ||
long up 7-00:00:00 8 idle yen[11-18] | ||
gpu up 1-00:00:00 3 idle yen-gpu[1-3] | ||
``` | ||
|
||
The first column, PARTITION, lists all available partitions. Partitions are logical subdivisions
of the `yen-slurm` cluster. The `*` denotes the default partition.
|
||
The four partitions have the following limits: | ||
|
||
| Partition | CPU Limit Per User | Memory Limit | Max Memory Per CPU (default) | Time Limit (default) | | ||
| -------------- | :----------------: | :--------------------: | :----------------------------:| :-------------------:| | ||
| normal | 256 | 3 TB | 24 GB (4 GB) | 2 days (2 hours) | | ||
| dev | 2 | 48 GB | 24 GB (4 GB) | 2 hours (1 hour) | | ||
| long | 50 | 1.2 TB | 24 GB (4 GB) | 7 days (2 hours) | | ||
| gpu | 64 | 256 GB | 24 GB (4 GB) | 1 day (2 hours) | | ||
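
Reading the per-CPU memory column as scaling with the number of CPUs requested (an interpretation of the table, not something it states outright), the default and maximum memory for a job can be estimated with simple arithmetic. The CPU count below is an arbitrary example.

```bash
# Arbitrary example request
cpus=8

# Per-CPU memory values taken from the table above
default_mem_per_cpu_gb=4
max_mem_per_cpu_gb=24

default_mem_gb=$(( cpus * default_mem_per_cpu_gb ))
max_mem_gb=$(( cpus * max_mem_per_cpu_gb ))

echo "${cpus} CPUs: ${default_mem_gb} GB by default, up to ${max_mem_gb} GB"
```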
|
||
|
||
You can submit to the `dev` partition by specifying: | ||
|
||
```bash | ||
#SBATCH --partition=dev | ||
``` | ||
|
||
Or with a shorthand: | ||
|
||
```bash | ||
#SBATCH -p dev | ||
``` | ||
|
||
If you don’t specify the partition in the submission script, the job is queued in the `normal` partition. To request a particular partition, for example `long`, specify `#SBATCH -p long` in the Slurm submission script. You can specify more than one partition if the job can run on any of them (e.g., `#SBATCH -p normal,dev`).
|
||
### How do I check how busy the machines are? | ||
|
||
You can pass format options to the `sinfo` command as follows: | ||
|
||
```bash | ||
USER@yen4:~$ sinfo --format="%m | %C" | ||
MEMORY | CPUS(A/I/O/T) | ||
257366+ | 268/1300/0/1568 | ||
``` | ||
|
||
where `MEMORY` is the minimum amount of memory on a `yen-slurm` cluster node in megabytes (256 GB; the trailing `+` indicates that some nodes have more) and
`CPUS(A/I/O/T)` prints the number of CPUs that are allocated / idle / other / total.
For example, `268/1300/0/1568` means that 268 CPUs are allocated and 1,300 are idle (free) out of 1,568 CPUs in total.
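
The four fields are easy to split in a script. The following sketch parses the sample value above and computes the share of CPUs in use. The variable names are illustrative, and on the cluster the string could presumably come from `sinfo --noheader --format="%C"` rather than being hard-coded.

```bash
# Sample value copied from the output above
cpu_summary="268/1300/0/1568"

# Split the allocated/idle/other/total fields on "/"
IFS=/ read -r allocated idle other total <<< "$cpu_summary"

# Whole-number percentage of CPUs currently allocated
percent_allocated=$(( allocated * 100 / total ))

echo "${allocated} of ${total} CPUs allocated (${percent_allocated}%), ${idle} idle"
```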
|
||
You can also run `checkyens` and look at the last line for a summary of all pending and running jobs on `yen-slurm`.
|
||
```bash | ||
USER@yen4:~$ checkyens | ||
Enter checkyens to get the current server resource loads. Updated every minute. | ||
yen1 : 2 Users | CPU [#### 20%] | Memory [#### 20%] | updated 2024-06-20-07:58:00 | ||
yen2 : 2 Users | CPU [ 0%] | Memory [## 11%] | updated 2024-06-20-07:58:01 | ||
yen3 : 2 Users | CPU [ 0%] | Memory [ 3%] | updated 2024-06-20-07:57:04 | ||
yen4 : 3 Users | CPU [#### 20%] | Memory [### 15%] | updated 2024-06-20-07:58:00 | ||
yen5 : 1 Users | CPU [ 1%] | Memory [ 3%] | updated 2024-06-20-07:58:02 | ||
yen-slurm : 11 jobs, 5 pending | 3 CPUs allocated (1%) | 100G Memory Allocated (2%) | updated 2024-06-20-07:58:02 | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,167 @@ | ||
--- | ||
title: 2. Python Virtual Environments | ||
layout: page | ||
nav_order: 2 | ||
updateDate: 2024-08-29 | ||
--- | ||
|
||
|
||
|
||
|
||
# {{ page.title }} | ||
|
||
Virtual environments are a foundational aspect of professional development, allowing developers to isolate and manage packages and dependencies specific to individual projects or tasks. This isolation is crucial in maintaining a clean and organized development workspace, as it prevents conflicts between packages used in different projects. Furthermore, virtual environments ensure that projects are reproducible and can be shared with others without compatibility issues, as all the necessary dependencies are clearly defined and contained within the environment. | ||
|
||
## Different Tools for Python Environment Management | ||
|
||
* [`venv`](https://docs.python.org/3/library/venv.html): built into Python 3.3 and later. (Recommended) | ||
* [`Anaconda`](https://www.anaconda.com/products/distribution): third-party tool popular in data science. | ||
* [`renv`](https://rstudio.github.io/renv/articles/renv.html): renv package helps you create reproducible environments for your R projects. | ||
|
||
|
||
|
||
## Best Practices for Environment Management | ||
|
||
1. **Creating a New Environment for Each Project**: This ensures that each project has its own set of dependencies.
|
||
2. **Documenting Dependencies**: Clearly list all dependencies in a requirements file or using a tool that automatically manages this aspect. | ||
|
||
3. **Regularly Updating Dependencies**: Keep the dependencies up-to-date to ensure the security and efficiency of your projects. | ||
|
||
|
||
## Recommendations on the Yens | ||
|
||
{: .important} | ||
We highly recommend using `venv`, Python’s built-in tool for creating virtual environments, especially in shared systems like the Yens. This recommendation is rooted in several key advantages that `venv` offers over other tools like `conda`: | ||
|
||
* **Built-in and Simple**: `venv` is included in Python's standard library, eliminating the need for third-party installations and making it straightforward to use, especially beneficial in shared systems where ease of setup and simplicity are crucial. | ||
|
||
* **Fast and Resource-Efficient**: `venv` offers quicker environment creation and is more lightweight compared to tools like `conda`, making it ideal for shared systems where speed and efficient use of resources are important. | ||
|
||
* **Ease of Reproducibility**: `venv` allows for easy replication of environments by using a `requirements.txt` file, ensuring that the code remains reproducible and consistent regardless of the platform. | ||
|
||
* **Terminal Agnostic**: `venv` allows you to work across various terminals (including JupyterHub, the Linux terminal, and Slurm) from a single unified location.
|
||
## Creating a New Virtual Environment with `venv` | ||
|
||
Let's navigate to a project directory: | ||
|
||
```bash | ||
$ cd <path/to/project> | ||
``` | ||
where `<path/to/project>` is the shared project location on ZFS. | ||
|
||
Create a new virtual environment: | ||
|
||
```bash | ||
$ /usr/bin/python3 -m venv venv # Note venv is a customizable name | ||
``` | ||
where we make a directory `venv` inside the project directory. | ||
|
||
## Activating a New Virtual Environment | ||
|
||
Next, we activate the virtual environment: | ||
```bash | ||
$ source venv/bin/activate | ||
``` | ||
|
||
You should see `(venv):` prepended to the prompt: | ||
```bash | ||
(venv): | ||
``` | ||
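
The prompt prefix is cosmetic, so a script cannot rely on it. A more robust check, sketched below, uses the `VIRTUAL_ENV` environment variable that the `activate` script sets. The variable name `venv_status` is only for illustration.

```bash
# VIRTUAL_ENV holds the environment's path while it is active and is unset otherwise
if [ -n "${VIRTUAL_ENV:-}" ]; then
    venv_status="active: ${VIRTUAL_ENV}"
else
    venv_status="no virtual environment is active"
fi

echo "$venv_status"
```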
|
||
Check which Python interpreter is active:
|
||
```bash | ||
$ which python | ||
/path/to/env/venv/bin/python | ||
``` | ||
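
Because the environment lives in the project directory, the same activation step can be reused in a Slurm batch script. The following is a minimal sketch under several assumptions: the job is submitted from the project directory, the environment is named `venv` as above, and the resource values are placeholders.

```bash
#!/bin/bash
# Placeholder resource requests
#SBATCH --job-name=venv-job
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=00:10:00

# Assumes the job is submitted from the project directory that contains venv/
venv_dir="venv"

if [ -f "${venv_dir}/bin/activate" ]; then
    source "${venv_dir}/bin/activate"
    # Record which interpreter the job will use; your own Python command goes here
    job_python=$(command -v python)
else
    job_python="not found"
    echo "No virtual environment at ${venv_dir}; create it first" >&2
fi

echo "Python for this job: ${job_python}"
```

Checking for the `activate` file first makes a missing or moved environment show up as a clear message in the job output.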
|
||
## Installing Python Packages within the New Virtual Environment | ||
Install any Python package with `pip`:
|
||
```bash | ||
(venv) $ pip install <package> | ||
``` | ||
|
||
|
||
## Making the Virtual Environment into a JupyterHub Kernel | ||
Install the `ipykernel` package before installing the new environment as a kernel on JupyterHub:
|
||
```bash | ||
(venv) $ pip install ipykernel | ||
``` | ||
|
||
To add the **active** virtual environment as a kernel, run: | ||
```bash | ||
(venv) $ python -m ipykernel install --user --name=<kernel-name> | ||
``` | ||
where `<kernel-name>` is the name of the kernel on JupyterHub. | ||
|
||
Example:
```bash | ||
(venv) $ python -m ipykernel install --user --name=venv | ||
``` | ||
|
||
![](../assets/images/jupyter_venv.png) | ||
|
||
## Sharing the Environment | ||
|
||
Environments can get quite large and take up a lot of space, depending on the project. An easy way to share one is to share its `requirements.txt` file, which lists all of the installed libraries and their versions.
|
||
```bash | ||
(venv) $ pip freeze > requirements.txt
``` | ||
The contents of this file depend on which packages you have installed. Sharing it helps other users run the code you developed in that environment.
|
||
![](../assets/images/requirements.png) | ||
|
||
To replicate an environment from a `requirements.txt` file, perform the following steps:
|
||
```bash | ||
$ /usr/bin/python3 -m venv new_venv # Create the new environment | ||
$ source new_venv/bin/activate # Activate the new environment | ||
(new_venv)$ pip install -r requirements.txt # Install the packages | ||
``` | ||
|
||
{: .warning} | ||
Once a virtual environment is created, it SHOULD NOT be moved. Moving it will break the environment, and you may need to recreate it.
|
||
|
||
### Deactivating the Virtual Environment | ||
You can deactivate the virtual environment with: | ||
```bash
$ deactivate | ||
``` | ||
|
||
### Removing the Virtual Environment | ||
If you would like to delete a previously created virtual environment, simply delete its directory, since a `venv` environment is essentially a directory containing files and folders.
|
||
```bash
$ rm -rf venv | ||
``` | ||
|
||
If you created a Jupyter kernel, you will also need to remove it by running the following command from your home directory:
|
||
```bash | ||
$ jupyter kernelspec uninstall venv | ||
``` | ||
|
||
# Exercise | ||
|
||
1. Navigate to `examples/python_examples` | ||
2. Create a new virtual environment named **venv** | ||
3. Activate the environment | ||
4. Install the packages in `requirements.txt` | ||
|
||
<details> | ||
<summary>Click for answer</summary> | ||
<div class="language-bash highlighter-rouge"> | ||
<pre class="highlight"><code> | ||
<span class="nv">$ </span><span class="nb">cd examples/python_examples</span> | ||
<span class="nv">$ </span><span class="nb">/usr/bin/python3 -m venv venv</span> | ||
<span class="nv">$ </span><span class="nb">source venv/bin/activate</span> | ||
<span class="nv">(venv) $ </span><span class="nb">pip install -r requirements.txt</span> | ||
</code></pre> | ||
</div> | ||
</details> |