diff --git a/docs/source/examples/spot-jobs.rst b/docs/source/examples/spot-jobs.rst
index b498c82ea6c..43cfb9eaefa 100644
--- a/docs/source/examples/spot-jobs.rst
+++ b/docs/source/examples/spot-jobs.rst
@@ -106,6 +106,8 @@ The :code:`MOUNT` mode in :ref:`SkyPilot Storage ` ensures the chec
 Note that the application code should save program checkpoints periodically and reload those states when the job is restarted.
 This is typically achieved by reloading the latest checkpoint at the beginning of your program.
 
+.. _spot-jobs-end-to-end:
+
 An end-to-end example
 ---------------------
 
diff --git a/docs/source/reference/yaml-spec.rst b/docs/source/reference/yaml-spec.rst
index a5749d66601..ea4a60ae21a 100644
--- a/docs/source/reference/yaml-spec.rst
+++ b/docs/source/reference/yaml-spec.rst
@@ -112,7 +112,7 @@ Available fields:
       # image_id: skypilot:k80-ubuntu-2004
       # image_id: skypilot:gpu-ubuntu-1804
       # image_id: skypilot:k80-ubuntu-1804
-      # It is also possible to specify a per-region image id (failover will only go through the regions sepcified as keys;
+      # It is also possible to specify a per-region image id (failover will only go through the regions specified as keys;
       # useful when you have the custom images in multiple regions):
       # image_id:
       #   us-east-1: ami-0729d913a335efca7
@@ -132,6 +132,16 @@ Available fields:
       # To use a more limited but easier to manage tool:
       # https://github.com/IBM/vpc-img-inst
 
+    # Environment variables (optional). These values can be accessed in the
+    # `file_mounts`, `setup`, and `run` sections below.
+    #
+    # Values set here can be overridden by a CLI flag:
+    # `sky launch/exec --env ENV=val` (if ENV is present).
+    envs:
+      MY_BUCKET: skypilot-temp-gcs-test
+      MY_LOCAL_PATH: tmp-workdir
+      MODEL_SIZE: 13b
+
     file_mounts:
       # Uses rsync to sync local files/directories to all nodes of the cluster.
       #
@@ -156,6 +166,12 @@ Available fields:
       # Copies a cloud object store URI to the cluster. Can be private buckets.
       /datasets-s3: s3://my-awesome-dataset
 
+      # Demoing env var usage.
+      /checkpoint/${MODEL_SIZE}: ~/${MY_LOCAL_PATH}
+      /mydir:
+        name: ${MY_BUCKET}  # Name of the bucket.
+        mode: MOUNT
+
     # Setup script (optional) to execute on every `sky launch`.
     # This is executed before the 'run' commands.
     #
@@ -170,3 +186,6 @@ Available fields:
     run: |
       echo "Beginning task."
       python train.py
+
+      # Demoing env var usage.
+      echo Env var MODEL_SIZE has value: ${MODEL_SIZE}
diff --git a/docs/source/running-jobs/environment-variables.rst b/docs/source/running-jobs/environment-variables.rst
new file mode 100644
index 00000000000..0d4d97e504b
--- /dev/null
+++ b/docs/source/running-jobs/environment-variables.rst
@@ -0,0 +1,107 @@
+
+.. _env-vars:
+
+Using Environment Variables
+================================================
+
+User-specified environment variables
+------------------------------------------------------------------
+
+You can specify environment variables to be made available to a task in two ways:
+
+- The ``envs`` field (dict) in a :ref:`task YAML `
+- The ``--env`` flag in the ``sky launch/exec`` :ref:`CLI ` (takes precedence over the above)
+
+The ``file_mounts``, ``setup``, and ``run`` sections of a task YAML can access the variables via the ``${MYVAR}`` syntax.
+
+Using in ``file_mounts``
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: yaml
+
+  # Sets default values for some variables; can be overridden by --env.
+  envs:
+    MY_BUCKET: skypilot-temp-gcs-test
+    MY_LOCAL_PATH: tmp-workdir
+    MODEL_SIZE: 13b
+
+  file_mounts:
+    /mydir:
+      name: ${MY_BUCKET}  # Name of the bucket.
+      mode: MOUNT
+
+    /another-dir2:
+      name: ${MY_BUCKET}-2
+      source: ["~/${MY_LOCAL_PATH}"]
+
+    /checkpoint/${MODEL_SIZE}: ~/${MY_LOCAL_PATH}
+
+The values of these variables are filled in by SkyPilot at task YAML parse time.
+
+Read more at `examples/using_file_mounts_with_env_vars.yaml `_.
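To make the parse-time behavior concrete, here is a minimal Python sketch of how ``${VAR}`` placeholders in ``file_mounts`` could be filled in from ``envs``, with ``--env`` values taking precedence. It is only an illustration and is not SkyPilot's implementation: the helpers ``resolve_envs`` and ``substitute`` are hypothetical names, and the standard library's ``string.Template`` stands in for whatever substitution SkyPilot actually performs.

```python
from string import Template


def resolve_envs(yaml_envs, cli_envs):
    """Merge defaults from the YAML `envs` field with `--env` overrides.

    Hypothetical helper: values passed on the CLI win over the YAML ones.
    """
    merged = dict(yaml_envs)
    merged.update(cli_envs)
    return merged


def substitute(value, envs):
    """Fill ${VAR} placeholders in a single string (hypothetical helper)."""
    return Template(value).substitute(envs)


# Defaults, as in the `envs` section of the task YAML above.
yaml_envs = {
    "MY_BUCKET": "skypilot-temp-gcs-test",
    "MY_LOCAL_PATH": "tmp-workdir",
    "MODEL_SIZE": "13b",
}
# Equivalent of passing `--env MODEL_SIZE=70b` on the command line.
cli_envs = {"MODEL_SIZE": "70b"}

envs = resolve_envs(yaml_envs, cli_envs)

# Simple string-to-string mounts only; bucket-style dict entries are omitted.
file_mounts = {
    "/checkpoint/${MODEL_SIZE}": "~/${MY_LOCAL_PATH}",
}
resolved = {
    substitute(dst, envs): substitute(src, envs)
    for dst, src in file_mounts.items()
}
bucket_name = substitute("${MY_BUCKET}-2", envs)

print(resolved)
print(bucket_name)
```

Both the mount destination and its source are substituted, so a single ``--env`` override changes every place the variable appears.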
+
+Using in ``setup`` and ``run``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All user-specified environment variables are exported to a task's ``setup`` and ``run`` commands (i.e., accessible when they are being run).
+
+For example, this is useful for passing secrets to the task (see below).
+
+Passing secrets
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+We recommend passing secrets to any node(s) executing your task by first making
+them available in your current shell, then using ``--env`` to pass them to SkyPilot:
+
+.. code-block:: console
+
+  $ sky launch -c mycluster --env WANDB_API_KEY task.yaml
+  $ sky exec mycluster --env WANDB_API_KEY task.yaml
+
+.. tip::
+
+  In other words, you do not need to pass the value directly, such as ``--env
+  WANDB_API_KEY=1234``.
+
+
+
+
+
+SkyPilot environment variables
+------------------------------------------------------------------
+
+SkyPilot exports these environment variables for a task's execution (while ``run`` commands are running):
+
+.. list-table::
+   :widths: 20 70 10
+   :header-rows: 1
+
+   * - Name
+     - Definition
+     - Example
+   * - ``SKYPILOT_NODE_RANK``
+     - Rank (an integer ID from 0 to :code:`num_nodes-1`) of the node executing the task. Read more :ref:`here `.
+     - 0
+   * - ``SKYPILOT_NODE_IPS``
+     - A string of IP addresses of the nodes reserved to execute the task, where each line contains one IP address. Read more :ref:`here `.
+     - 1.2.3.4
+   * - ``SKYPILOT_NUM_GPUS_PER_NODE``
+     - Number of GPUs reserved on each node to execute the task; the same as the
+       count in ``accelerators: :`` (rounded up if a fraction). Read
+       more :ref:`here `.
+     - 0
+   * - ``SKYPILOT_TASK_ID``
+     - A unique ID assigned to each task.
+       Useful for logging purposes: e.g., use a unique output path on the cluster; pass to Weights & Biases; etc.
+
+       If a task is run as a :ref:`managed spot job `, then all
+       recoveries of that job will have the same ID value. Read more :ref:`here `.
+     - sky-2023-07-06-21-18-31-563597_myclus_id-1
+
+The values of these variables are filled in by SkyPilot at task execution time.
+
+You can access these variables in the following ways:
+
+* In the task YAML's ``run`` commands (a Bash script), access them using the ``${MYVAR}`` syntax;
+* In the program(s) launched in ``run``, access them using the
+  language's standard method (e.g., ``os.environ`` for Python).
diff --git a/docs/source/running-jobs/index.rst b/docs/source/running-jobs/index.rst
index 1f0ffa9e752..0f7541b35d9 100644
--- a/docs/source/running-jobs/index.rst
+++ b/docs/source/running-jobs/index.rst
@@ -5,3 +5,4 @@ More User Guides
 
    distributed-jobs
    grid-search
+   environment-variables