From e48aaf5349a3dc03cd019a271a2b53bcf573cafd Mon Sep 17 00:00:00 2001 From: "Kang, Harim" Date: Fri, 23 Aug 2024 17:30:40 +0900 Subject: [PATCH] Update docs --- .../algorithms/action/action_detection.rst | 47 --- .../explanation/algorithms/action/index.rst | 1 - .../source/guide/get_started/cli_commands.rst | 4 +- .../source/guide/get_started/installation.rst | 2 +- .../source/guide/tutorials/advanced/index.rst | 2 + .../advanced/low_rank_adaptation.rst | 39 +++ .../tutorials/advanced/torch_compile.rst | 41 +++ .../base/how_to_train/action_detection.rst | 275 ------------------ .../tutorials/base/how_to_train/index.rst | 8 - 9 files changed, 85 insertions(+), 334 deletions(-) delete mode 100644 docs/source/guide/explanation/algorithms/action/action_detection.rst create mode 100644 docs/source/guide/tutorials/advanced/low_rank_adaptation.rst create mode 100644 docs/source/guide/tutorials/advanced/torch_compile.rst delete mode 100644 docs/source/guide/tutorials/base/how_to_train/action_detection.rst diff --git a/docs/source/guide/explanation/algorithms/action/action_detection.rst b/docs/source/guide/explanation/algorithms/action/action_detection.rst deleted file mode 100644 index 1eae4fae0d3..00000000000 --- a/docs/source/guide/explanation/algorithms/action/action_detection.rst +++ /dev/null @@ -1,47 +0,0 @@ -Action Detection -================ - -Sptio-Temporal action detection is the problem of localizing the actor(spatial detection) and action(temporal detection). We solve this problem by combining 3D action classification backbone and 2D object detection model. We can combine these two models in several ways. Currently, we support the simplest way. The other ways will be supported in near future. - -X3D + Fast-RCNN architecture comes from `X3D paper `_. This model requires pre-computed actor proposals. Actor pre-proposals can be obtained from `COCO `_ pre-trained 2D object detector (e.g. `Faster-RCNN `_, `ATSS `_). If the custom dataset requires finetuning of 2d object detector, please refer :doc:`otx.algorithms.detection <../object_detection/object_detection>`. Region-of-interest (RoI) features are extracted at the last feature map of X3D by extending a 2D proposal at a keyframe into a 3D RoI by replicating it along the temporal axis. The RoI features fed into the roi head of Fast-RCNN. - -For better transfer learning we use the following algorithm components: - -- ``Augmentations``: We use only random crop and random flip for the training pipeline - -- ``Optimizer``: We use `SGD `_ optimizer with the weight decay set to **1e-4** and momentum set to **0.9**. - -- ``Loss functions``: For the multi-label case binary cross entropy loss is used. In the other case, `Cross Entropy Loss `_ is used for the categories classification. - -************** -Dataset Format -************** - -We support the popular action classification formats, `AVA dataset `_ format. 
- - -****** -Models -****** - -We support the following ready-to-use model recipes for transfer learning: - -+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+---------------------+-------------------------+ -| Recipe ID | Name | Complexity (GFLOPs) | Model size (MB) | -+=========================================================================================================================================================================================+===============+=====================+=========================+ -| `Custom_Action_Detection_X3D_FAST_RCNN `_ | x3d_fast_rcnn | 13.04 | 8.32 | -+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+---------------------+-------------------------+ - -To see which models are available for the task, the following command can be executed: - -.. code-block:: shell - - (otx) ...$ otx find --task ACTION_DETECTION - -In the table below the **mAP** on some academic datasets are presented. Each model is trained using `Kinetics-400 `_ pre-trained weight with single Nvidia GeForce RTX3090. - -+----------------+-------+-----------+ -| Model name | JHMDB | UCF101-24 | -+================+=======+===========+ -| x3d_fast_rcnn | 92.14 | 80.7 | -+----------------+-------+-----------+ diff --git a/docs/source/guide/explanation/algorithms/action/index.rst b/docs/source/guide/explanation/algorithms/action/index.rst index c2965183809..1d5de506b5a 100644 --- a/docs/source/guide/explanation/algorithms/action/index.rst +++ b/docs/source/guide/explanation/algorithms/action/index.rst @@ -6,4 +6,3 @@ Action Recognition action_classification - action_detection diff --git a/docs/source/guide/get_started/cli_commands.rst b/docs/source/guide/get_started/cli_commands.rst index a7b03590e60..7231c6d0dd7 100644 --- a/docs/source/guide/get_started/cli_commands.rst +++ b/docs/source/guide/get_started/cli_commands.rst @@ -339,11 +339,11 @@ The results will be saved in ``./otx-workspace/`` folder by default. The output (otx) ...$ otx train --model --task --data_root - For example, if you want to use the ``otx.algo.detection.atss.ATSS`` model class, you can train it as shown below. + For example, if you want to use the ``otx.algo.classification.torchvision_model.TVModelForMulticlassCls`` model class, you can train it as shown below. .. code-block:: shell - (otx) ...$ otx train --model otx.algo.detection.atss.ATSS --model.variant mobilenetv2 --task DETECTION ... + (otx) ...$ otx train --model otx.algo.classification.torchvision_model.TVModelForMulticlassCls --model.backbone mobilenet_v3_small ... .. note:: You also can visualize the training using ``Tensorboard`` as these logs are located in ``/tensorboard``. diff --git a/docs/source/guide/get_started/installation.rst b/docs/source/guide/get_started/installation.rst index 3ab889bbb57..94de82e29a3 100644 --- a/docs/source/guide/get_started/installation.rst +++ b/docs/source/guide/get_started/installation.rst @@ -68,7 +68,7 @@ according to your system environment. .. note:: - Currently, only torch==2.1.1 was fully validated. (older versions are not supported due to security issues). + Currently, only torch==2.2 was fully validated. (older versions are not supported due to security issues). 3. 
Once the package is installed in the virtual environment, you can use full
diff --git a/docs/source/guide/tutorials/advanced/index.rst b/docs/source/guide/tutorials/advanced/index.rst
index 8524b3a8200..a4b69cdacc8 100644
--- a/docs/source/guide/tutorials/advanced/index.rst
+++ b/docs/source/guide/tutorials/advanced/index.rst
@@ -8,5 +8,7 @@ Advanced Tutorials
    semi_supervised_learning
    huggingface_model
    multi_gpu
+   low_rank_adaptation
+   torch_compile
 
 .. Once we have enough material, we might need to categorize these into `data`, `model learning` sections.
\ No newline at end of file
diff --git a/docs/source/guide/tutorials/advanced/low_rank_adaptation.rst b/docs/source/guide/tutorials/advanced/low_rank_adaptation.rst
new file mode 100644
index 00000000000..06b31d2b7b4
--- /dev/null
+++ b/docs/source/guide/tutorials/advanced/low_rank_adaptation.rst
@@ -0,0 +1,39 @@
+LoRA: Low Rank Adaptation for Classification Tasks
+===================================================
+
+.. note::
+
+    LoRA is only supported for VisionTransformer models.
+    See the models in otx.algo.classification.vit.
+
+Overview
+--------
+
+OpenVINO™ Training Extensions now supports Low Rank Adaptation (LoRA) for classification tasks using Transformer models.
+LoRA is a parameter-efficient approach to adapting pre-trained models by introducing low-rank matrices that capture important adaptations without the need to retrain the entire model.
+
+Benefits of LoRA
+----------------
+
+- **Efficiency**: LoRA allows for efficient adaptation of large pre-trained models with minimal additional parameters.
+- **Performance**: By focusing on key parameters, LoRA can achieve competitive performance with less computational overhead.
+- **Flexibility**: LoRA can be applied to various parts of the transformer model, providing flexibility in model tuning.
+
+How to Use LoRA in OpenVINO™ Training Extensions
+------------------------------------------------
+
+.. tab-set::
+
+    .. tab-item:: API
+
+        .. code-block:: python
+
+            from otx.algo.classification.vit import VisionTransformerForMulticlassCls
+
+            model = VisionTransformerForMulticlassCls(..., lora=True)
+
+    .. tab-item:: CLI
+
+        .. code-block:: bash
+
+            (otx) ...$ otx train ... --model.lora True
diff --git a/docs/source/guide/tutorials/advanced/torch_compile.rst b/docs/source/guide/tutorials/advanced/torch_compile.rst
new file mode 100644
index 00000000000..97a4f325923
--- /dev/null
+++ b/docs/source/guide/tutorials/advanced/torch_compile.rst
@@ -0,0 +1,41 @@
+[BETA] Enable torch.compile
+============================
+
+.. warning::
+    `torch.compile` is not currently supported for all models.
+    So far, it has been verified to work for classification task models and some segmentation models.
+    We will continue to optimize this feature and do not guarantee performance gains for now.
+
+Overview
+--------
+
+OpenVINO™ Training Extensions now integrates the `torch.compile` feature from PyTorch, allowing users to optimize their models for better performance.
+This feature compiles the model's operations into optimized lower-level code, which can significantly improve execution speed and reduce memory usage.
+
+Benefits of torch.compile
+-------------------------
+
+- **Performance Optimization**: Compiled models run faster by executing optimized low-level operations.
+- **Reduced Memory Footprint**: Optimized models can use less memory, which is beneficial for deploying models on resource-constrained devices.
+
+For more information on the benefits of `torch.compile`, refer to the official `PyTorch documentation `_.
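+
+The snippet below is a rough, OTX-independent sketch of what `torch.compile` does: it wraps a standard PyTorch module and returns an optimized callable with the same interface. The network and tensor shapes here are made up purely for illustration.
+
+.. code-block:: python
+
+    import torch
+    from torch import nn
+
+    # Any regular PyTorch module can be compiled; this tiny network is only an example.
+    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
+
+    # torch.compile returns an optimized callable with the same interface as the original module.
+    compiled_model = torch.compile(model)
+
+    x = torch.randn(8, 16)
+    y = compiled_model(x)  # the first call triggers compilation; later calls reuse the compiled graph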
+ +How to Use torch.compile in OpenVINO™ Training Extensions +---------------------------------------------------------- + +**Prepare OTXModel**: Ensure that model is compatible with `torch.compile`. When building the model, give the `torch_compile` option `True`. + +.. tab-set:: + + .. tab-item:: API + + .. code-block:: python + + from otx.algo.classification.vit import VisionTransformerForMulticlassCls + + model = VisionTransformerForMulticlassCls(..., torch_compile=True) + + .. tab-item:: CLI + + .. code-block:: bash + + (otx) ...$ otx train ... --model.torch_compile True diff --git a/docs/source/guide/tutorials/base/how_to_train/action_detection.rst b/docs/source/guide/tutorials/base/how_to_train/action_detection.rst deleted file mode 100644 index 7707e803e62..00000000000 --- a/docs/source/guide/tutorials/base/how_to_train/action_detection.rst +++ /dev/null @@ -1,275 +0,0 @@ -Action Detection model -================================ - -This live example shows how to easily train and validate for spatio-temporal action detection model on the subset of `JHMDB `_. -To learn more about Action Detection task, refer to :doc:`../../../explanation/algorithms/action/action_detection`. - -.. note:: - - To learn deeper how to manage training process of the model including additional parameters and its modification, refer to :doc:`./detection`. - -The process has been tested on the following configuration. - -- Ubuntu 20.04 -- NVIDIA GeForce RTX 3090 -- Intel(R) Core(TM) i9-10980XE -- CUDA Toolkit 11.1 - -************************* -Setup virtual environment -************************* - -1. You can follow the installation process from a :doc:`quick start guide <../../../get_started/installation>` -to create a universal virtual environment for OpenVINO™ Training Extensions. - -2. Activate your virtual -environment: - -.. code-block:: - - .otx/bin/activate - # or by this line, if you created an environment, using tox - . venv/otx/bin/activate - - -*************************** -Dataset preparation -*************************** - -For action detection task, you need to prepare dataset whose format is `AVA `_ dataset. -For easy beginning, we provide `sample dataset `_ - -If you download data from link and extract to ``training_extensions/data`` folder(you should make data folder at first), you can see the structure below: - -.. code-block:: - - training_extensions - └── data - └── JHMDB_10% - ├── annotations - │ └── ava_action_list_v2.2.pbtxt - │ └── ava_test.csv - │ └── ava_train.csv - │ └── ava_val.csv - │ └── test.pkl - │ └── train.pkl - │ └── val.pkl - │ - └── frames - │── train_video001 - │ └── train_video001_0001.jpg - └── test_video001 - └── test_video001_0001.jpg - - - -********* -Training -********* - -1. First of all, you need to choose which action detection model you want to train. -The list of supported recipes for action detection is available with the command line below: - -.. note:: - - The characteristics and detailed comparison of the models could be found in :doc:`Explanation section <../../../explanation/algorithms/action/action_detection>`. - -.. 
code-block:: - - (otx) ...$ otx find --task ACTION_DETECTION - - +-----------------------+--------------------------------------+---------------------------------------------------------------------------------+ - | TASK | Model Name | Recipe PATH | - +-----------------------+--------------------------------------+---------------------------------------------------------------------------------+ - | ACTION_DETECTION | x3d_fast_rcnn | ../otx/recipe/action/action_detection/x3d_fast_rcnn.yaml | - +-----------------------+--------------------------------------+---------------------------------------------------------------------------------+ - -To have a specific example in this tutorial, all commands will be run on the X3D_FAST_RCNN model. It's a light model, that achieves competitive accuracy while keeping the inference fast. - -2. ``otx train`` trains a model (a particular model template) -on a dataset and results: - -Here are the main outputs can expect with CLI: -- ``{work_dir}/{timestamp}/checkpoints/epoch_*.ckpt`` - a model checkpoint file. -- ``{work_dir}/{timestamp}/configs.yaml`` - The configuration file used in the training can be reused to reproduce the training. -- ``{work_dir}/.latest`` - The results of each of the most recently executed subcommands are soft-linked. This allows you to skip checkpoints and config file entry as a workspace. - -.. tab-set:: - - .. tab-item:: CLI (auto-config) - - .. code-block:: shell - - (otx) ...$ otx train --data_root data/JHMDB_10% - - .. tab-item:: CLI (with config) - - .. code-block:: shell - - (otx) ...$ otx train --config src/otx/recipe/action/action_detection/x3d_fast_rcnn.yaml --data_root data/JHMDB_10% - - .. tab-item:: API (from_config) - - .. code-block:: python - - from otx.engine import Engine - - data_root = "data/JHMDB_10%" - recipe = "src/otx/recipe/action/action_detection/x3d_fast_rcnn.yaml" - - engine = Engine.from_config( - config_path=recipe, - data_root=data_root, - work_dir="otx-workspace", - ) - - engine.train(...) - - .. tab-item:: API - - .. code-block:: python - - from otx.engine import Engine - - data_root = "data/JHMDB_10%" - - engine = Engine( - model="x3d", - data_root=data_root, - work_dir="otx-workspace", - ) - - engine.train(...) - - -3. ``(Optional)`` Additionally, we can tune training parameters such as batch size, learning rate, patience epochs or warm-up iterations. -Learn more about specific parameters using ``otx train --help -v`` or ``otx train --help -vv``. - -For example, to decrease the batch size to 4, fix the number of epochs to 100, extend the command line above with the following line. - -.. tab-set:: - - .. tab-item:: CLI - - .. code-block:: shell - - (otx) ...$ otx train ... --data.train_subset.batch_size 4 \ - --max_epochs 100 - - .. tab-item:: API - - .. code-block:: python - - from otx.core.config.data import SubsetConfig - from otx.core.data.module import OTXDataModule - from otx.engine import Engine - - datamodule = OTXDataModule(..., train_subset=SubsetConfig(..., batch_size=4)) - - engine = Engine(..., datamodule=datamodule) - - engine.train(max_epochs=100) - - -4. The training result ``checkpoints/*.ckpt`` file is located in ``{work_dir}`` folder, -while training logs can be found in the ``{work_dir}/{timestamp}`` dir. - -.. note:: - We also can visualize the training using ``Tensorboard`` as these logs are located in ``{work_dir}/{timestamp}/tensorboard``. - -.. 
code-block:: - - otx-workspace - ├── 20240403_134256/ - ├── csv/ - ├── checkpoints/ - | └── epoch_*.pth - ├── tensorboard/ - └── configs.yaml - └── .latest - └── train/ - ... - -The training time highly relies on the hardware characteristics, for example on 1 NVIDIA GeForce RTX 3090 the training took about 3 minutes. - -After that, we have the PyTorch object detection model trained with OpenVINO™ Training Extensions, which we can use for evaluation, export, optimization and deployment. - -*********** -Evaluation -*********** - -1. ``otx test`` runs evaluation of a -trained model on a particular dataset. - -Test function receives test annotation information and model snapshot, trained in previous step. - -The default metric is mAP_50 measure. - -2. That's how we can evaluate the snapshot in ``otx-workspace`` -folder on JHMDB_10% dataset and save results to ``otx-workspace``: - -.. tab-set:: - - .. tab-item:: CLI (with work_dir) - - .. code-block:: shell - - (otx) ...$ otx test --work_dir otx-workspace - ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ - ┃ Test metric ┃ DataLoader 0 ┃ - ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ - │ test/data_time │ 0.006367621477693319 │ - │ test/iter_time │ 0.02698644995689392 │ - │ test/map │ 0.10247182101011276 │ - │ test/map_50 │ 0.3779516816139221 │ - │ test/map_75 │ 0.03639398142695427 │ - │ test/map_large │ 0.11831618845462799 │ - │ test/map_medium │ 0.02958027645945549 │ - │ test/map_per_class │ -1.0 │ - │ test/map_small │ 0.0 │ - │ test/mar_1 │ 0.12753313779830933 │ - │ test/mar_10 │ 0.1305265873670578 │ - │ test/mar_100 │ 0.1305265873670578 │ - │ test/mar_100_per_class │ -1.0 │ - │ test/mar_large │ 0.14978596568107605 │ - │ test/mar_medium │ 0.06217033043503761 │ - │ test/mar_small │ 0.0 │ - └───────────────────────────┴───────────────────────────┘ - - .. tab-item:: CLI (with config) - - .. code-block:: shell - - (otx) ...$ otx test --config src/otx/recipe/action/action_detection/x3d_fast_rcnn.yaml \ - --data_root data/JHMDB_10% \ - --checkpoint otx-workspace/20240312_051135/checkpoints/epoch_033.ckpt - ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ - ┃ Test metric ┃ DataLoader 0 ┃ - ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ - │ test/data_time │ 0.006367621477693319 │ - │ test/iter_time │ 0.02698644995689392 │ - │ test/map │ 0.10247182101011276 │ - │ test/map_50 │ 0.3779516816139221 │ - │ test/map_75 │ 0.03639398142695427 │ - │ test/map_large │ 0.11831618845462799 │ - │ test/map_medium │ 0.02958027645945549 │ - │ test/map_per_class │ -1.0 │ - │ test/map_small │ 0.0 │ - │ test/mar_1 │ 0.12753313779830933 │ - │ test/mar_10 │ 0.1305265873670578 │ - │ test/mar_100 │ 0.1305265873670578 │ - │ test/mar_100_per_class │ -1.0 │ - │ test/mar_large │ 0.14978596568107605 │ - │ test/mar_medium │ 0.06217033043503761 │ - │ test/mar_small │ 0.0 │ - └───────────────────────────┴───────────────────────────┘ - - .. tab-item:: API - - .. code-block:: python - - engine.test() - - -3. The output of ``{work_dir}/{timestamp}/csv/version_0/metrics.csv`` consists of -a dict with target metric name and its value. 
\ No newline at end of file diff --git a/docs/source/guide/tutorials/base/how_to_train/index.rst b/docs/source/guide/tutorials/base/how_to_train/index.rst index 7d224cb46cf..87c673cfd3f 100644 --- a/docs/source/guide/tutorials/base/how_to_train/index.rst +++ b/docs/source/guide/tutorials/base/how_to_train/index.rst @@ -47,13 +47,6 @@ Training to deployment tutorials Learn how to train an action classification model - .. grid-item-card:: Action Detection - :link: action_detection - :link-type: doc - :text-align: center - - Learn how to train an action detection model - .. grid-item-card:: Visual Prompting :link: visual_prompting :link-type: doc @@ -71,5 +64,4 @@ Training to deployment tutorials semantic_segmentation anomaly_detection action_classification - action_detection visual_prompting