diff --git a/docs/en_US/TrainingService/Overview.rst b/docs/en_US/TrainingService/Overview.rst index 058276227d..5f0727e8dd 100644 --- a/docs/en_US/TrainingService/Overview.rst +++ b/docs/en_US/TrainingService/Overview.rst @@ -68,3 +68,13 @@ Step 2. **Submit the first trial.** To initiate a trial, usually (in non-reuse m .. Warning:: The working directory of trial command has exactly the same content as ``codeDir``, but can have different paths (even on different machines) Local mode is the only training service that shares one ``codeDir`` across all trials. Other training services copies a ``codeDir`` from the shared copy prepared in step 1 and each trial has an independent working directory. We strongly advise users not to rely on the shared behavior in local mode, as it will make your experiments difficult to scale to other training services. Step 3. **Collect metrics.** NNI then monitors the status of trial, updates the status (e.g., from ``WAITING`` to ``RUNNING``\ , ``RUNNING`` to ``SUCCEEDED``\ ) recorded, and also collects the metrics. Currently, most training services are implemented in an "active" way, i.e., training service will call the RESTful API on NNI manager to update the metrics. Note that this usually requires the machine that runs NNI manager to be at least accessible to the worker node. + + +Training Service Under Reuse Mode +--------------------------------- + +When reuse mode is enabled, a cluster, such as a remote machine or a computer instance on AML, will launch a long-running environment, so that NNI will submit trials to these environments iteratively, which saves the time to create new jobs. For instance, using OpenPAI training platform under reuse mode can avoid the overhead of pulling docker images, creating containers, and downloading data repeatedly. + +In the reuse mode, user needs to make sure each trial can run independently in the same job (e.g., avoid loading checkpoints from previous trials). + +.. 
note:: Currently, only `Local <./LocalMode.rst>`__, `Remote <./RemoteMachineMode.rst>`__, `OpenPAI <./PaiMode.rst>`__ and `AML <./AMLMode.rst>`__ training services support reuse mode. For Remote and OpenPAI training platforms, you can enable reuse mode manually as described `here <../reference/experiment_config.rst>`__. AML is implemented under reuse mode, so reuse mode is the default and does not need to be enabled manually. diff --git a/docs/en_US/Tutorial/ExperimentConfig.rst b/docs/en_US/Tutorial/ExperimentConfig.rst index 3bb2449076..0f32e3606c 100644 --- a/docs/en_US/Tutorial/ExperimentConfig.rst +++ b/docs/en_US/Tutorial/ExperimentConfig.rst @@ -1,5 +1,7 @@ -Experiment Config Reference -=========================== +Experiment Config Reference (legacy) +==================================== + +This is the previous version (V1) of experiment configuration specification. It is still supported for now, but we recommend that users use `the new version of experiment configuration (V2) <../reference/experiment_config.rst>`_. A config file is needed when creating an experiment. The path of the config file is provided to ``nnictl``. The config file is in YAML format. diff --git a/docs/en_US/Tutorial/HowToUseSharedStorage.rst b/docs/en_US/Tutorial/HowToUseSharedStorage.rst index 18ab42f4f0..7a19746e95 100644 --- a/docs/en_US/Tutorial/HowToUseSharedStorage.rst +++ b/docs/en_US/Tutorial/HowToUseSharedStorage.rst @@ -7,7 +7,7 @@ All the information generated by the experiment will be stored under ``/nni`` fo All the output produced by the trial will be located under ``/nni/{EXPERIMENT_ID}/trials/{TRIAL_ID}/nnioutput`` folder in your shared storage. This saves you from finding for experiment-related information in various places. Remember that your trial working directory is ``/nni/{EXPERIMENT_ID}/trials/{TRIAL_ID}``, so if you upload your data in this shared storage, you can open it like a local file in your trial code without downloading it. 
-And we will develop more practical features in the future based on shared storage. +And we will develop more practical features in the future based on shared storage. The config reference can be found `here <../reference/experiment_config.html#sharedstorageconfig>`_. .. note:: Shared storage is currently in the experimental stage. We suggest use AzureBlob under Ubuntu/CentOS/RHEL, and NFS under Ubuntu/CentOS/RHEL/Fedora/Debian for remote. diff --git a/docs/en_US/Tutorial/QuickStart.rst b/docs/en_US/Tutorial/QuickStart.rst index 98d4b7c4d2..07e342f297 100644 --- a/docs/en_US/Tutorial/QuickStart.rst +++ b/docs/en_US/Tutorial/QuickStart.rst @@ -149,6 +149,9 @@ Three steps to start an experiment codeDir: . gpuNum: 0 + +.. _nniignore: + .. Note:: If you are planning to use remote machines or clusters as your :doc:`training service <../TrainingService/Overview>`, to avoid too much pressure on network, we limit the number of files to 2000 and total size to 300MB. If your codeDir contains too many files, you can choose which files and subfolders should be excluded by adding a ``.nniignore`` file that works like a ``.gitignore`` file. For more details on how to write this file, see the `git documentation `__. 
*Example:* :githublink:`config.yml ` and :githublink:`.nniignore ` diff --git a/docs/en_US/reference.rst b/docs/en_US/reference.rst index 640bf1d387..70d410ed2c 100644 --- a/docs/en_US/reference.rst +++ b/docs/en_US/reference.rst @@ -5,12 +5,12 @@ References :maxdepth: 2 nnictl Commands - Experiment Configuration - Experiment Configuration V2 + Experiment Configuration + Experiment Configuration (legacy) Search Space NNI Annotation SDK API References Supported Framework Library - Launch from python + Launch from Python Shared Storage Tensorboard diff --git a/docs/en_US/reference/experiment_config.rst b/docs/en_US/reference/experiment_config.rst index f3779bd1aa..44b0d57c33 100644 --- a/docs/en_US/reference/experiment_config.rst +++ b/docs/en_US/reference/experiment_config.rst @@ -2,25 +2,28 @@ Experiment Config Reference =========================== -Notes -===== +A config file is needed when creating an experiment. This document describes the rules to write a config file and provides some examples. -1. This document list field names is ``camelCase``. - They need to be converted to ``snake_case`` for Python library ``nni.experiment``. +.. Note:: -2. In this document type of fields are formatted as `Python type hint `__. - Therefore JSON objects are called `dict` and arrays are called `list`. + 1. This document lists field names with ``camelCase``. If users use these fields in the pythonic way with NNI Python APIs (e.g., ``nni.experiment``), the field names should be converted to ``snake_case``. -.. _path: + 2. In this document, the type of fields are formatted as `Python type hint `_. Therefore JSON objects are called `dict` and arrays are called `list`. -3. Some fields take a path to file or directory. - Unless otherwise noted, both absolute path and relative path are supported, and ``~`` will be expanded to home directory. + .. _path: - - When written in YAML file, relative paths are relative to the directory containing that file. 
- When assigned in Python code, relative paths are relative to current working directory. - All relative paths are converted to absolute when loading YAML file into Python class, and when saving Python class to YAML file. + 3. Some fields take a path to a file or directory. Unless otherwise noted, both absolute path and relative path are supported, and ``~`` will be expanded to the home directory. -4. Setting a field to ``None`` or ``null`` is equivalent to not setting the field. + - When written in the YAML file, relative paths are relative to the directory containing that file. + - When assigned in Python code, relative paths are relative to the current working directory. + - All relative paths are converted to absolute when loading YAML file into Python class, and when saving Python class to YAML file. + + 4. Setting a field to ``None`` or ``null`` is equivalent to not setting the field. + +.. contents:: Contents + :local: + :depth: 3 + Examples ======== @@ -105,7 +108,7 @@ ExperimentConfig experimentName -------------- -Mnemonic name of the experiment. This will be shown in web UI and nnictl. +Mnemonic name of the experiment, which will be shown in WebUI and nnictl. type: ``Optional[str]`` @@ -113,11 +116,11 @@ type: ``Optional[str]`` searchSpaceFile --------------- -Path_ to a JSON file containing the search space. +Path_ to the JSON file containing the search space. type: ``Optional[str]`` -Search space format is determined by tuner. Common format for built-in tuners is documeted `here <../Tutorial/SearchSpaceSpec.rst>`__. +Search space format is determined by tuner. The common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__. Mutually exclusive to `searchSpace`_. @@ -145,6 +148,8 @@ type: ``str`` The command will be executed in bash on Linux and macOS, and in PowerShell on Windows. +Note that you should use ``python3`` on Linux and macOS, and ``python`` on Windows. 
+ trialCodeDirectory ------------------ @@ -155,8 +160,8 @@ type: ``str`` default: ``"."`` -All files in this directory will be sent to training machine, unless there is a ``.nniignore`` file. -(See nniignore section of `quick start guide <../Tutorial/QuickStart.rst>`__ for details.) +All files in this directory will be sent to the training machine, unless in the ``.nniignore`` file. +(See :ref:`nniignore ` for details.) trialConcurrency @@ -176,11 +181,11 @@ Number of GPUs used by each trial. type: ``Optional[int]`` -This field might have slightly different meaning for various training services, +This field might have slightly different meanings for various training services, especially when set to ``0`` or ``None``. -See training service's document for details. +See `training service's document <../training_services.rst>`__ for details. -In local mode, setting the field to zero will prevent trials from accessing GPU (by empty ``CUDA_VISIBLE_DEVICES``). +In local mode, setting the field to ``0`` will prevent trials from accessing GPU (by empty ``CUDA_VISIBLE_DEVICES``). And when set to ``None``, trials will be created and scheduled as if they did not use GPU, but they can still use all GPU resources if they want. @@ -196,7 +201,7 @@ format: ``number + s|m|h|d`` examples: ``"10m"``, ``"0.5h"`` -When time runs out, the experiment will stop creating trials but continue to serve web UI. +When time runs out, the experiment will stop creating trials but continue to serve WebUI. maxTrialNumber @@ -206,19 +211,19 @@ Limit the number of trials to create if specified. type: ``Optional[int]`` -When the budget runs out, the experiment will stop creating trials but continue to serve web UI. +When the budget runs out, the experiment will stop creating trials but continue to serve WebUI. nniManagerIp ------------ -IP of current machine, used by training machines to access NNI manager. Not used in local mode. 
+IP of the current machine, used by training machines to access NNI manager. Not used in local mode. type: ``Optional[str]`` If not specified, IPv4 address of ``eth0`` will be used. -Must be set on Windows and systems using predictable network interface name, except for local mode. +Except for the local mode, it is highly recommended to set this field manually. useAnnotation @@ -242,19 +247,19 @@ type: ``bool`` default: ``False`` -When enabled, logging will be more verbose and some internal validation will be loosen. +When enabled, logging will be more verbose and some internal validation will be loosened. logLevel -------- -Set log level of whole system. +Set log level of the whole system. type: ``Optional[str]`` values: ``"trace"``, ``"debug"``, ``"info"``, ``"warning"``, ``"error"``, ``"fatal"`` -Defaults to "info" or "debug", depending on `debug`_ option. +Defaults to "info" or "debug", depending on `debug`_ option. When debug mode is enabled, Loglevel is set to "debug", otherwise, Loglevel is set to "info". Most modules of NNI will be affected by this value, including NNI manager, tuner, training service, etc. @@ -266,13 +271,13 @@ For Python modules, "trace" acts as logging level 0 and "fatal" acts as ``loggin experimentWorkingDirectory -------------------------- -Specify the `directory `_ to place log, checkpoint, metadata, and other run-time stuff. +Specify the :ref:`directory ` to place log, checkpoint, metadata, and other run-time stuff. type: ``Optional[str]`` By default uses ``~/nni-experiments``. -NNI will create a subdirectory named by experiment ID, so it is safe to use same directory for multiple experiments. +NNI will create a subdirectory named by experiment ID, so it is safe to use the same directory for multiple experiments. tunerGpuIndices @@ -280,53 +285,67 @@ tunerGpuIndices Limit the GPUs visible to tuner, assessor, and advisor. 
-type: ``Optional[list[int] | str]`` +type: ``Optional[list[int] | str | int]`` This will be the ``CUDA_VISIBLE_DEVICES`` environment variable of tuner process. -Because tuner, assessor, and advisor run in same process, this option will affect them all. +Because tuner, assessor, and advisor run in the same process, this option will affect them all. tuner ----- -Specify the tuner. +Specify the tuner. type: Optional `AlgorithmConfig`_ +The built-in tuners can be found `here <../builtin_tuner.rst>`__ and you can follow `this tutorial <../Tuner/CustomizeTuner.rst>`__ to customize a new tuner. + assessor -------- -Specify the assessor. +Specify the assessor. type: Optional `AlgorithmConfig`_ +The built-in assessors can be found `here <../builtin_assessor.rst>`__ and you can follow `this tutorial <../Assessor/CustomizeAssessor.rst>`__ to customize a new assessor. + advisor ------- -Specify the advisor. +Specify the advisor. type: Optional `AlgorithmConfig`_ +NNI provides two built-in advisors: `BOHB <../Tuner/BohbAdvisor.rst>`__ and `Hyperband <../Tuner/HyperbandAdvisor.rst>`__, and you can follow `this tutorial <../Tuner/CustomizeAdvisor.rst>`__ to customize a new advisor. + trainingService --------------- -Specify `training service <../TrainingService/Overview.rst>`__. +Specify the `training service <../TrainingService/Overview.rst>`__. type: `TrainingServiceConfig`_ +sharedStorage +------------- + +Configure the shared storage, detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__. + +type: Optional `SharedStorageConfig`_ + + AlgorithmConfig ^^^^^^^^^^^^^^^ ``AlgorithmConfig`` describes a tuner / assessor / advisor algorithm. -For custom algorithms, there are two ways to describe them: +For customized algorithms, there are two ways to describe them: - 1. `Register the algorithm <../Tuner/InstallCustomizedTuner.rst>`__ to use it like built-in. (preferred) + 1. 
`Register the algorithm <../Tutorial/InstallCustomizedAlgos.rst>`__ to use it like built-in. (preferred) 2. Specify code directory and class name directly. @@ -334,17 +353,17 @@ For custom algorithms, there are two ways to describe them: name ---- -Name of built-in or registered algorithm. +Name of the built-in or registered algorithm. -type: ``str`` for built-in and registered algorithm, ``None`` for other custom algorithm +type: ``str`` for the built-in and registered algorithm, ``None`` for other customized algorithms. className --------- -Qualified class name of not registered custom algorithm. +Qualified class name of not registered customized algorithm. -type: ``None`` for built-in and registered algorithm, ``str`` for other custom algorithm +type: ``None`` for the built-in and registered algorithm, ``str`` for other customized algorithms. example: ``"my_tuner.MyTuner"`` @@ -352,9 +371,9 @@ example: ``"my_tuner.MyTuner"`` codeDirectory ------------- -`Path`_ to directory containing the custom algorithm class. +`Path`_ to the directory containing the customized algorithm class. -type: ``None`` for built-in and registered algorithm, ``str`` for other custom algorithm +type: ``None`` for the built-in and registered algorithm, ``str`` for other customized algorithms. classArgs @@ -370,41 +389,51 @@ See algorithm's document for supported value. TrainingServiceConfig ^^^^^^^^^^^^^^^^^^^^^ -One of following: +One of the following: - `LocalConfig`_ - `RemoteConfig`_ -- `OpenpaiConfig `_ +- :ref:`OpenpaiConfig ` - `AmlConfig`_ +- `HybridConfig`_ -For other training services, we suggest to use `v1 config schema <../Tutorial/ExperimentConfig.rst>`_ for now. +For `Kubeflow <../TrainingService/KubeflowMode.rst>`_, `FrameworkController <../TrainingService/FrameworkControllerMode.rst>`_, and `AdaptDL <../TrainingService/AdaptDLMode.rst>`_ training platforms, it is suggested to use `v1 config schema <../Tutorial/ExperimentConfig.rst>`_ for now. 
LocalConfig -^^^^^^^^^^^ +----------- -Detailed `here <../TrainingService/LocalMode.rst>`__. +Detailed usage can be found `here <../TrainingService/LocalMode.rst>`__. platform --------- +"""""""" Constant string ``"local"``. useActiveGpu ------------- +"""""""""""" Specify whether NNI should submit trials to GPUs occupied by other tasks. type: ``Optional[bool]`` -Must be set when `trialGpuNumber` greater than zero. +Must be set when `trialGpuNumber`_ greater than zero. -If your are using desktop system with GUI, set this to ``True``. +Following processes can make GPU "active": + + - non-NNI CUDA programs + - graphical desktop + - trials submitted by other NNI instances, if you have more than one NNI experiments running at same time + - other users' CUDA programs, if you are using a shared server + +If you are using a graphical OS like Windows 10 or Ubuntu desktop, set this field to ``True``, otherwise, the GUI will prevent NNI from launching any trial. + +When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously. maxTrialNumberPerGpu ---------------------- +"""""""""""""""""""" Specify how many trials can share one GPU. @@ -414,11 +443,11 @@ default: ``1`` gpuIndices ----------- +"""""""""" Limit the GPUs visible to trial processes. -type: ``Optional[list[int] | str]`` +type: ``Optional[list[int] | str | int]`` If `trialGpuNumber`_ is less than the length of this value, only a subset will be visible to each trial. @@ -426,18 +455,18 @@ This will be used as ``CUDA_VISIBLE_DEVICES`` environment variable. RemoteConfig -^^^^^^^^^^^^ +------------ -Detailed `here <../TrainingService/RemoteMachineMode.rst>`__. +Detailed usage can be found `here <../TrainingService/RemoteMachineMode.rst>`__. platform --------- +"""""""" Constant string ``"remote"``. machineList ------------ +""""""""""" List of training machines. 
@@ -445,18 +474,18 @@ type: list of `RemoteMachineConfig`_ reuseMode ---------- +""""""""" -Enable reuse `mode <../Tutorial/ExperimentConfig.rst#reuse>`__. +Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__. type: ``bool`` RemoteMachineConfig -^^^^^^^^^^^^^^^^^^^ +""""""""""""""""""" host ----- +**** IP or hostname (domain name) of the machine. @@ -464,7 +493,7 @@ type: ``str`` port ----- +**** SSH service port. @@ -474,7 +503,7 @@ default: ``22`` user ----- +**** Login user name. @@ -482,7 +511,7 @@ type: ``str`` password --------- +******** Login password. @@ -492,7 +521,7 @@ If not specified, `sshKeyFile`_ will be used instead. sshKeyFile ----------- +********** `Path`_ to sshKeyFile (identity file). @@ -502,7 +531,7 @@ Only used when `password`_ is not specified. sshPassphrase -------------- +************* Passphrase of SSH identity file. @@ -510,7 +539,7 @@ type: ``Optional[str]`` useActiveGpu ------------- +************ Specify whether NNI should submit trials to GPUs occupied by other tasks. @@ -518,9 +547,22 @@ type: ``bool`` default: ``False`` +Must be set when `trialGpuNumber`_ greater than zero. + +Following processes can make GPU "active": + + - non-NNI CUDA programs + - graphical desktop + - trials submitted by other NNI instances, if you have more than one NNI experiments running at same time + - other users' CUDA programs, if you are using a shared server + +If your remote machine is a graphical OS like Ubuntu desktop, set this field to ``True``, otherwise, the GUI will prevent NNI from launching any trial. + +When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously. + maxTrialNumberPerGpu --------------------- +******************** Specify how many trials can share one GPU. @@ -530,11 +572,11 @@ default: ``1`` gpuIndices ----------- +********** Limit the GPUs visible to trial processes. 
-type: ``Optional[list[int] | str]`` +type: ``Optional[list[int] | str | int]`` If `trialGpuNumber`_ is less than the length of this value, only a subset will be visible to each trial. @@ -542,47 +584,51 @@ This will be used as ``CUDA_VISIBLE_DEVICES`` environment variable. pythonPath -------------------- +********** -Specify a python environment, this path will insert at the front of PATH. Here are some examples: - - (linux) pythonPath: /opt/python3.7/bin - - (windows) pythonPath: C:/Python37 - -Notice: If you are working on anaconda，there are some difference. You have to add "../script" and "../Library/bin" to this and separated by ";" on windows, example as below: - - (linux anaconda) pythonPath: /home/yourname/anaconda3/envs/myenv/bin/ - - (windows anaconda) pythonPath: C:/Users/yourname/.conda/envs/myenv;C:/Users/yourname/.conda/envs/myenv/Scripts;C:/Users/yourname/.conda/envs/myenv/Library/bin +Specify a Python environment. type: ``Optional[str]`` +This path will be inserted at the front of PATH. Here are some examples: + + - (linux) pythonPath: ``/opt/python3.7/bin`` + - (windows) pythonPath: ``C:/Python37`` + +If you are working on Anaconda, there are some differences. On Windows, you also have to add ``../script`` and ``../Library/bin`` separated by ``;``. Examples are as below: + + - (linux anaconda) pythonPath: ``/home/yourname/anaconda3/envs/myenv/bin/`` + - (windows anaconda) pythonPath: ``C:/Users/yourname/.conda/envs/myenv;C:/Users/yourname/.conda/envs/myenv/Scripts;C:/Users/yourname/.conda/envs/myenv/Library/bin`` + This is useful if preparing steps vary for different machines. .. _openpai-class: OpenpaiConfig -^^^^^^^^^^^^^ +------------- -Detailed `here <../TrainingService/PaiMode.rst>`__. +Detailed usage can be found `here <../TrainingService/PaiMode.rst>`__. platform --------- +"""""""" Constant string ``"openpai"``. host ----- +"""" Hostname of OpenPAI service. type: ``str`` -This may includes ``https://`` or ``http://`` prefix. 
+This may include ``https://`` or ``http://`` prefix. HTTPS will be used by default. username --------- +"""""""" OpenPAI user name. @@ -590,7 +636,7 @@ type: ``str`` token ------ +""""" OpenPAI user token. @@ -599,8 +645,36 @@ type: ``str`` This can be found in your OpenPAI user settings page. +trialCpuNumber +"""""""""""""" + +Specify the CPU number of each trial to be used in OpenPAI container. + +type: ``int`` + + +trialMemorySize +""""""""""""""" + +Specify the memory size of each trial to be used in OpenPAI container. + +type: ``str`` + +format: ``number + tb|gb|mb|kb`` + +examples: ``"8gb"``, ``"8192mb"`` + + +storageConfigName +""""""""""""""""" + +Specify the storage name used in OpenPAI. + +type: ``str`` + + dockerImage ------------ +""""""""""" Name and tag of docker image to run the trials. @@ -609,16 +683,16 @@ type: ``str`` default: ``"msranni/nni:latest"`` -nniManagerStorageMountPoint ---------------------------- +localStorageMountPoint +"""""""""""""""""""""" -`Mount point `_ of storage service (typically NFS) on current machine. +:ref:`Mount point ` of storage service (typically NFS) on the local machine. type: ``str`` containerStorageMountPoint --------------------------- +"""""""""""""""""""""""""" Mount point of storage service (typically NFS) in docker container. @@ -628,9 +702,9 @@ This must be an absolute path. reuseMode ---------- +""""""""" -Enable reuse `mode <../Tutorial/ExperimentConfig.rst#reuse>`__. +Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__. type: ``bool`` @@ -638,7 +712,7 @@ default: ``False`` openpaiConfig -------------- +""""""""""""" Embedded OpenPAI config file. @@ -646,29 +720,29 @@ type: ``Optional[JSON]`` openpaiConfigFile ------------------ +""""""""""""""""" `Path`_ to OpenPAI config file. type: ``Optional[str]`` -An example can be found `here `__ +An example can be found `here `__. AmlConfig -^^^^^^^^^ +--------- -Detailed `here <../TrainingService/AMLMode.rst>`__. 
+Detailed usage can be found `here <../TrainingService/AMLMode.rst>`__. platform --------- +"""""""" Constant string ``"aml"``. dockerImage ------------ +""""""""""" Name and tag of docker image to run the trials. @@ -678,7 +752,7 @@ default: ``"msranni/nni:latest"`` subscriptionId --------------- +"""""""""""""" Azure subscription ID. @@ -686,7 +760,7 @@ type: ``str`` resourceGroup -------------- +""""""""""""" Azure resource group name. @@ -694,7 +768,7 @@ type: ``str`` workspaceName -------------- +""""""""""""" Azure workspace name. @@ -702,8 +776,157 @@ type: ``str`` computeTarget -------------- +""""""""""""" AML compute cluster name. type: ``str`` + + +HybridConfig +------------ + +Hybrid mode currently supports only `LocalConfig`_, `RemoteConfig`_, :ref:`OpenpaiConfig ` and `AmlConfig`_ . Detailed usage can be found `here <../TrainingService/HybridMode.rst>`__. + +type: list of `TrainingServiceConfig`_ + + +SharedStorageConfig +^^^^^^^^^^^^^^^^^^^ + +Detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__. + + +nfsConfig +--------- + +storageType +""""""""""" + +Constant string ``"NFS"``. + + +localMountPoint +""""""""""""""" + +The path where the storage has been or will be mounted on the local machine. + +type: ``str`` + +If the path does not exist, it will be created automatically. An absolute path is recommended, e.g. ``/tmp/nni-shared-storage``. + + +remoteMountPoint +"""""""""""""""" + +The path where the storage will be mounted on the remote machine. + +type: ``str`` + +If the path does not exist, it will be created automatically. A relative path is recommended, e.g. ``./nni-shared-storage``. + + +localMounted +"""""""""""" + +Specify the object and status to mount the shared storage. + +type: ``str`` + +values: ``"usermount"``, ``"nnimount"``, ``"nomount"`` + +``usermount`` means the user has already mounted this storage on localMountPoint. ``nnimount`` means NNI will try to mount this storage on localMountPoint. 
``nomount`` means the storage will not be mounted on the local machine; partial storages will be supported in the future. + + +nfsServer +""""""""" + +NFS server host. + +type: ``str`` + + +exportedDirectory +""""""""""""""""" + +Exported directory of NFS server, detailed `here `_. + +type: ``str`` + + +azureBlobConfig +--------------- + +storageType +""""""""""" + +Constant string ``"AzureBlob"``. + + +localMountPoint +""""""""""""""" + +The path where the storage has been or will be mounted on the local machine. + +type: ``str`` + +If the path does not exist, it will be created automatically. An absolute path is recommended, e.g. ``/tmp/nni-shared-storage``. + + +remoteMountPoint +"""""""""""""""" + +The path where the storage will be mounted on the remote machine. + +type: ``str`` + +If the path does not exist, it will be created automatically. A relative path is recommended, e.g. ``./nni-shared-storage``. + +Note that the directory must be empty when using AzureBlob. + + +localMounted +"""""""""""" + +Specify the object and status to mount the shared storage. + +type: ``str`` + +values: ``"usermount"``, ``"nnimount"``, ``"nomount"`` + +``usermount`` means the user has already mounted this storage on localMountPoint. ``nnimount`` means NNI will try to mount this storage on localMountPoint. ``nomount`` means the storage will not be mounted on the local machine; partial storages will be supported in the future. + + +storageAccountName +"""""""""""""""""" + +Azure storage account name. + +type: ``str`` + + +storageAccountKey +""""""""""""""""" + +Azure storage account key. + +type: ``Optional[str]`` + +If ``storageAccountKey`` is not set, first run ``az login`` with the Azure CLI and set `resourceGroupName`_. + + +resourceGroupName +""""""""""""""""" + +Resource group that AzureBlob container belongs to. + +type: ``Optional[str]`` + +Required if ``storageAccountKey`` is not set. + +containerName +""""""""""""" + +AzureBlob container name. 
+ +type: ``str`` diff --git a/examples/trials/mnist-pytorch/.nniignore b/examples/trials/mnist-pytorch/.nniignore new file mode 100644 index 0000000000..b22de6bbe6 --- /dev/null +++ b/examples/trials/mnist-pytorch/.nniignore @@ -0,0 +1,4 @@ +# Exclude the following directory when uploading codeDir +data + +# They can also be files \ No newline at end of file
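The note added to ``docs/en_US/reference/experiment_config.rst`` says that field names are documented in ``camelCase`` and should be converted to ``snake_case`` when used through NNI Python APIs such as ``nni.experiment``. A minimal sketch of that naming rule follows. The helper ``camel_to_snake`` is hypothetical and not part of NNI; it assumes a plain word-boundary conversion, and the field values are placeholders.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase field name to snake_case (hypothetical helper, not an NNI API)."""
    # Insert "_" before every uppercase letter except a leading one, then lowercase everything.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


# Field names as they appear in the YAML reference; the values are placeholders.
yaml_style = {
    "trialGpuNumber": 1,
    "maxTrialNumber": 10,
    "nniManagerIp": "10.0.0.1",
}

# The same fields, renamed as the note describes for use from Python code.
python_style = {camel_to_snake(key): value for key, value in yaml_style.items()}
print(python_style)
```

The sketch only illustrates the renaming; how the renamed fields are assigned on a config object is covered by the reference itself.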