Add support for Intel dGPU devices (#3058)
* POC: xpu support (#2553)

* Upgrade torch==2.0.1 torchvision==0.15.2 mmcv-full==1.7.1

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Rollback to mmcv-full==1.7.0 due to mmaction conflict

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix unit test

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update cls to run on xpu

* Enable XPU for detection

* Disable FP16 for detection models

* Add xpu data parallel to core utils

* Delete copies of XPUDP

* Update sseg XPU support

* Update XPUDP

* Added fix for OD and IS (not debugged)

* Upgrade torch==2.0.1 torchvision==0.15.2 mmcv-full==1.7.1

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Rollback to mmcv-full==1.7.0 due to mmaction conflict

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix unit test

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update cls to run on xpu

* Enable XPU for detection

* Disable FP16 for detection models

* Add xpu data parallel to core utils

* Delete copies of XPUDP

* Update sseg XPU support

* Update XPUDP

* Fix import of assigner

* Add mmdet ops patching to inference

* Add xpu seed

* Fix linters

* Cleanup in cls train runner

* Cleanup in segm train function

* Fix linters

* Fix linters

* Cleanup

* Update NMS patch

* Use GPU id to move model on target XPU

* Disable fp16 for swinT

* Add XPU device info to OTX env report

* Fix linters

* Update new inits

* Update autocast

* Update fallback to CPU

* Update target device in XPU data parallel

* Revert "Update autocast"

This reverts commit 0ff53ebccd2007fc38e387919e306d476a695635.

* Revert "Patch to enable FP16 on XPU"

This reverts commit fd91f6744f238546c92ae46a7d193369aebc17b4.

* FP16 support for XPU

* Enable FP16

* Patch to enable FP16 on XPU

* Fix linters

---------

Signed-off-by: Songki Choi <songki.choi@intel.com>
Co-authored-by: Songki Choi <songki.choi@intel.com>
Co-authored-by: kprokofi <kirill.prokofiev@intel.com>
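
Several of the items above ("Update fallback to CPU", "Use GPU id to move model on target XPU") revolve around picking a target device by priority. A minimal sketch of that fallback order, with a plain dict standing in for the `torch.xpu.is_available()` / `torch.cuda.is_available()` probes (names and shim are illustrative, not OTX's actual API):

```python
def select_device(available: dict) -> str:
    """Pick a training device by priority, falling back to CPU.

    `available` maps backend name to availability; it stands in for the
    real torch availability probes (hypothetical shim).
    """
    for backend in ("xpu", "cuda"):
        if available.get(backend, False):
            return backend
    return "cpu"
```

The same ordering lets XPU-capable hosts prefer the dGPU while CUDA and CPU machines keep working unchanged.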

* Fix tv nms check (#2582)

* POC: HPU support (#2574)

* Add hpu.is_available

* Add hpu in `configure_device`

* Add hpu optimizers

* Update `train_model`

* Add `HPUDataParallel`

* Fix

* Enable HPUOptimizerHooks

* Enable autocast

* Update

* (WIP) Enable dataloader

* (tmp) Disable habana loader

* Fix to convert bfloat16 tensor to float32 for numpy

* Update `htcore.mark_step` location

* Move setting optimizer hooks into configurer

* Fix to enable bf16

* (WIP) Enable gpu migration

* enable ocr_lite_hrnet_18_mod2 training

* register hpu optimizer once

* detach seg logit for calculating acc only when hpu is used

* change hpu_optimizer path

* cache whether hpu_available

* align with other args

* move hpu_migration into otx.algo.common.util init file

* Refactoring

* move the code changing adam to adamw into mmseg.apis.train.py

* move the model.to call into mmseg.apis.train.py

* use use_autocast properly

* remove HPUDistOptimizerHook

* precommit

* Fix unit test

* Use `self.src_device_obj`

* Add function for repeated pattern

* precommit

---------

Co-authored-by: Shin, Eunwoo <eunwoo.shin@intel.com>
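
The "Add hpu.is_available" and "cache whether hpu_available" items boil down to probing for the Habana package once and memoizing the result, so repeated device checks stay cheap. A hedged sketch (the probed package name is an assumption):

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def is_hpu_available() -> bool:
    # Probe once; later calls hit the cache instead of re-importing.
    try:
        import habana_frameworks.torch  # noqa: F401 -- assumed package name
    except ImportError:
        return False
    return True
```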

* Merge develop & fix YOLOX inference in BF16 on XPU (#2602)

Fix YOLOX with bf16 dtype on XPU

* Object Detection with Gaudi2 (#2608)

* added support for OD on habana

* added hpu_opt

* added OD support.

* optimize YOLOX a bit. Now inference is fast; training still freezes

* SSD, ATSS e2e training

* stabilize mask rcnn a bit

* don't put gt on hpu for OD

* minor fix

* Enable e2e training for Instance Segmentation.

* clean the code stage 1

* clean code 2

* fix pre-commit

* minor

* change cast of bf16

* reply to review comments

* align with pre-commit

---------

Co-authored-by: eunwoosh <eunwoo.shin@intel.com>

* Update XPU detection (#2623)

* Add `ModuleCacher` for HPU graphs (#2624)

* Add `ModuleCacher` on classification and segmentation

* Fix

* Update Intel devices branch to the latest develop state (#2666)

* Update base.txt

updated dependency version of datumaro

* Update __init__.py

update version string

* Update requirements.txt

* Temporarily skip visual prompting openvino integration test (#2323)

* Fix import dm.DatasetSubset (#2324)

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>

* Fix semantic segmentation soft prediction dtype (#2322)

* Fix semantic segmentation soft prediction dtype

* relax ref sal vals check

---------

Co-authored-by: Songki Choi <songki.choi@intel.com>

* Constrain yapf version to less than 0.40.0 (#2328)

constrain yapf version

* Fix detection e2e tests (#2327)

Fix for detection

* Mergeback: Label addition/deletion 1.2.4 --> 1.4.0 (#2326)

* Make black happy

* Fix conflicts

* Merge-back: add test datasets and edit the test code

* Make black happy

* Fix mis-merge

* Make black happy

* Fix typo

* Fix typo

---------

Co-authored-by: Songki Choi <songki.choi@intel.com>

* Bump datumaro up to 1.4.0rc2 (#2332)

bump datumaro up to 1.4.0rc2

* Tiling Doc for releases 1.4.0 (#2333)

* Add tiling documentation

* Bump otx version to 1.4.0rc2 (#2341)

* OTX deploy for visual prompting task  (#2311)

* Enable `otx deploy`

* (WIP) integration test

* Docstring

* Update args for create_model

* Manually set image embedding layout

* Enable using Model API for preprocessing
- `fit_to_window` doesn't work as expected, so `VisualPromptingOpenvinoAdapter` was newly implemented to use the new resize function

* Remove skipped test

* Updated

* Update unit tests on model wrappers

* Update

* Update configuration

* Fix not to patch pretrained path

* pylint & update model api version in docstring

---------

Co-authored-by: Wonju Lee <wonju.lee@intel.com>

* Bump albumentations version in anomaly requirements (#2350)

increment albumentations version

* Update action detection (#2346)

* Remove skip mark for PTQ test of action detection

* Update action detection documentation

* Fix e2e (#2348)

* Change classification dataset from dummy to toy

* Revert test changes

* Change label name for multilabel dataset

* Revert e2e test changes

* Change ov test cases' threshold

* Add parent's label

* Update ModelAPI in 1.4 release (#2347)

* Upgrade model API

* Update otx in exportable code

* Fix unit tests

* Fix black

* Fix detection inference

* Fix det tiling

* Fix mypy

* Fix demo

* Fix visualizer in demo

* Fix black

* Add OTX optimize for visual prompting task (#2318)

* Initial commit

* Update block

* (WIP) otx optimize

* Fix

* WIP

* Update configs & exported outputs

* Remove unused modules for torch

* Add unit tests

* pre-commit

* Update CHANGELOG

* Update detection docs (#2335)

* Update detection docs

* Revert template id changes

* Fix wrong template id

* Update docs/source/guide/explanation/algorithms/object_detection/object_detection.rst

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

* Update docs/source/guide/explanation/algorithms/object_detection/object_detection.rst

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

---------

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

* Add visual prompting documentation (#2354)

* (WIP) write docs

* Add visual prompting documentation

* Update CHANGELOG

---------

Co-authored-by: sungchul.kim <sungchul@ikvensx010>

* Remove custom modelapi patch in visual prompting (#2359)

* Remove custom modelapi patch

* Update test

* Fix graph metric order and label issues (#2356)

* Fix graph metric going backward issue
* Add license notice
* Fix pre-commit issue
* Add rename items & logic for metric
---------
Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update multi-label document and conversion script (#2358)

Update docs, label convert script

* Update third party programs (#2365)

* Make anomaly task compatible with older albumentations versions (#2363)

* fix transforms export in metadata

* wrap transform dict

* add todo for updating to_dict call

* Fixing detection saliency map for one class case (#2368)

* fix softmax

* fix validity tests

* Add e2e test for visual prompting (#2360)

* (WIP) otx optimize

* pre-commit

* (WIP) set e2e

* Remove nncf config

* Add visual prompting requirement

* Add visual prompting in tox

* Add visual prompting in setup.py

* Fix typo

* Delete unused configuration.yaml

* Edit test_name

* Add limit to activation range

* Update from `vp` to `visprompt`

* Fix not returning the first label

* pre-commit

* (WIP) otx optimize

* pre-commit

* (WIP) set e2e

* Remove nncf config

* Add visual prompting requirement

* Add visual prompting in tox

* Add visual prompting in setup.py

* Fix typo

* pre-commit

* Add actions

* Update tests/e2e/cli/visual_prompting/test_visual_prompting.py

Co-authored-by: Jaeguk Hyun <jaeguk.hyun@intel.com>

* Skip PTQ e2e test

* Change task name

* Remove skipped tc

---------

Co-authored-by: Jaeguk Hyun <jaeguk.hyun@intel.com>

* Fix e2e (#2366)

* Change e2e reference name

* Update openvino eval threshold for multiclass classification

* Change comment message

* Fix tiling e2e tests

---------

Co-authored-by: GalyaZalesskaya <galina.zalesskaya@intel.com>

* Add Dino head unit tests (#2344)

Recover DINO head unit tests

* Update for release 1.4.0rc2 (#2370)

* update for release 1.4.0rc2

* Add skip mark for unstable unit tests

---------

Co-authored-by: jaegukhyun <jaeguk.hyun@intel.com>

* Fix NNCF training on CPU (#2373)

* Align label order between Geti and OTX (#2369)

* align label order

* align with pre-commit

* update CHANGELOG.md

* deal with edge case

* update type hint

* Remove CenterCrop from Classification test pipeline and editing missing docs link (#2375)

* Fix missing link for docs and removing centercrop for classification data pipeline

* Revert the test threshold

* Fix H-label classification (#2377)

* Fix h-label issue

* Update unit tests

* Make black happy

* Fix unittests

* Make black happy

* Fix update headers information func

* Update the logic: consider the loss per batch

* Update for release 1.4 (#2380)

* updated for 1.4.0rc3

* update changelog & release note

* bump datumaro version up

---------

Co-authored-by: Songki Choi <songki.choi@intel.com>

* Switch to PTQ for sseg (#2374)

* Switch to PTQ for sseg

* Update log messages

* Fix invalid import structures in otx.api (#2383)

Update tiler.py

* Update for 1.4.0rc4 (#2385)

update for release 1.4.0rc4

* [release 1.4.0] XAI: Return saliency maps for Mask RCNN IR async infer (#2395)

* Return saliency maps for openvino async infer

* add workaround to fix yapf importing error

---------

Co-authored-by: eunwoosh <eunwoo.shin@intel.com>

* Update for release 1.4.0 (#2399)

update version string

Co-authored-by: Sungman Cho <sungman.cho@intel.com>

* Fix broken links in documentation (#2405)

* fix docs links to datumaro's docs
* fix docs links to otx's docs
* bump version to 1.4.1

* Update exportable code README (#2411)

* Updated for release 1.4.1 (#2412)

updated for release 1.4.1

* Add workaround for the incorrect meta info M-RCNN (used for XAI) (#2437)

Add workaround for the incorrect meta info

* Add model category attributes to model template (#2439)

Add model category attributes to model template

* Add model category & status fields in model template

* Add is_default_for_task attr to model template

* Update model templates with category attrs

* Add integration tests for model templates consistency

* Fix license & doc string

* Fix typo

* Refactor test cases

* Refactor common tests by generator

---------
Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update for 1.4.2rc1 (#2441)

update for release 1.4.2rc1

* Fix label list order for h-label classification (#2440)

* Fix label list for h-label cls
* Fix unit tests

* Modified fq numbers for lite HRNET (#2445)

modified fq numbers for lite HRNET

* Update PTQ ignored scope for hrnet 18  mod2 (#2449)

Update ptq ignored scope for hrnet 18  mod2

* Fix OpenVINO inference for legacy models (#2450)

* bug fix for legacy openvino models

* Add tests

* Specific exceptions

---------

* Update for 1.4.2rc2 (#2455)

update for release 1.4.2rc2

* Prevent zero-sized saliency map in tiling if tile size is too big (#2452)

* Prevent zero-sized saliency map in tiling if tile size is too big

* Prevent zero-sized saliency in tiling (PyTorch)

* Add unit tests for Tiler merge features methods

---------

Co-authored-by: Galina <galina.zalesskaya@intel.com>
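
The zero-sized saliency map appears when the tile size exceeds the image size and the derived tile count rounds down to zero; clamping the count to at least one tile per axis avoids it. A sketch of that arithmetic (the function name and overlap handling are illustrative, not the actual Tiler code):

```python
def tiles_per_axis(image_size: int, tile_size: int, overlap: float = 0.0) -> int:
    """Number of tiles along one image axis, never less than one.

    Without the final clamp, tile_size > image_size would yield zero tiles
    and hence a zero-sized merged saliency map.
    """
    stride = max(int(tile_size * (1 - overlap)), 1)
    span = max(image_size - tile_size, 0)
    return max((span + stride - 1) // stride + 1, 1)
```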

* Update pot fq reference number (#2456)

update pot fq reference number to 15

* Bump datumaro version to 1.5.0rc0 (#2470)

bump datumaro version to 1.5.0rc0

* Set tox version constraint (#2472)

set tox version constraint - https://github.com/tox-dev/tox/issues/3110

* Bug fix for albumentations (#2467)

* bug fix for legacy openvino models

* Address albumentation issue

---------

Co-authored-by: Ashwin Vaidya <ashwinitinvaidya@gmail.com>

* update for release 1.4.2rc3

* Add a dummy hierarchical config required by MAPI (#2483)

* bump version to 1.4.2rc4

* Bump datumaro version (#2502)

* bump datumaro version

* remove deprecated/removed attribute usage of datumaro

* Upgrade nncf version for 1.4 release (#2459)

* Upgrade nncf version

* Fix nncf interface warning

* Set the exact nncf version

* Update FQ refs after NNCF upgrade

* Use NNCF from pypi

* Update version for release 1.4.2rc5 (#2507)

update version for release 1.4.2rc5

* Update for 1.4.2 (#2514)

update for release 1.4.2

* create branch release/1.5.0

* Delete mem cache handler after training is done (#2535)

release mem cache handler after training is done

* Fix bug that auto batch size doesn't consider distributed training (#2533)

* consider distributed training while searching batch size

* update unit test

* revert gpu memory upper bound

* fix typo

* change allocated to reserved

* add unit test for distributed training

* align with pre-commit
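
The fix makes the automatic batch-size search account for the number of training processes. A sketch of the idea under stated assumptions: `fits` is a hypothetical callback reporting whether a trial batch stays under the memory budget, and each of `world_size` processes handles an equal share of the global batch:

```python
def search_batch_size(max_bs: int, fits, world_size: int = 1) -> int:
    """Largest global batch size whose per-process share fits in memory.

    Binary search over [1, max_bs]; with distributed training each process
    only loads bs // world_size samples, so the probe uses that share.
    """
    lo, hi, best = 1, max_bs, 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(max(mid // world_size, 1)):
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best
```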

* Apply fix progress hook to release 1.5.0 (#2539)

* Fix hook's ordering issue. AdaptiveRepeatHook changes the runner.max_iters before the ProgressHook

* Change the expression

* Fix typo

* Fix multi-label, h-label issue

* Fix auto_bs issue

* Apply suggestions from code review

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

* Reflecting reviews

* Refactor the name of get_data_cfg

* Revert adaptive hook sampler init

* Refactor the function name: get_data_cfg -> get_subset_data_cfg

* Fix unit test errors

* Remove adding AdaptiveRepeatDataHook for autobs

* Remove unused import

* Fix detection and segmentation case in Geti scenario

---------

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

* Re introduce adaptive scheduling for training (#2541)

* Re-introduce adaptive patience for training

* Revert unit tests
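
Adaptive scheduling here means deriving the early-stopping patience from dataset scale instead of a fixed constant. A hedged sketch; the thresholds and the linear scaling are illustrative, not OTX's actual values:

```python
def adaptive_patience(iters_per_epoch: int, base: int = 5, floor: int = 2) -> int:
    # Large datasets (many iterations per epoch) see enough updates per epoch
    # to justify a short patience; small datasets keep the full base patience.
    if iters_per_epoch >= 100:
        return floor
    scale = 1 - iters_per_epoch / 100
    return max(floor, round(base * scale))
```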

* Update for release 1.4.3rc1 (#2542)

* Mirror Anomaly ModelAPI changes (#2531)

* Migrate anomaly exportable code to modelAPI (#2432)

* Fix license in PR template

* Migrate to modelAPI

* Remove color conversion in streamer

* Remove reverse_input_channels

* Add float

* Remove test as metadata is no longer used

* Remove metadata from load method

* remove anomalib openvino inferencer

* fix signature

* Support legacy OpenVINO model

* Transform image

* add configs

* Re-introduce adaptive training (#2543)

* Re-introduce adaptive patience for training

* Revert unit tests

* Fix auto input size mismatch in eval & export (#2530)

* Fix auto input size mismatch in eval & export

* Re-enable E2E tests for Issue#2518

* Add input size check in export testing

* Format float numbers in log

* Fix NNCF export shape mismatch

* Fix saliency map issue

* Disable auto input size if tiling enabled

---------

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update ref. fq number for anomaly e2e2 (#2547)

* Skip e2e det tests by issue2548 (#2550)

* Add skip to chained TC for issue #2548 (#2552)

* Update for release 1.4.3 (#2551)

* Update MAPI for 1.5 release (#2555)

Upgrade MAPI to v0.1.6 (#2529)

* Upgrade MAPI

* Update exp code demo commit

* Fix MAPI imports

* Update ModelAPI configuration (#2564)

* Update MAPI rt info for detection

* Update export info for cls, det and seg

* Update unit tests

* Disable QAT for SegNexts (#2565)

* Disable NNCF QAT for SegNext

* Del obsolete pot configs

* Move NNCF skip marks to test commands to avoid duplication

* Add Anomaly modelAPI changes to releases/1.4.0 (#2563)

* bug fix for legacy openvino models

* Apply otx anomaly 1.5 changes

* Fix tests

* Fix compression config

* fix modelAPI imports

* update integration tests

* Edit config types

* Update keys in deployed model

---------

Co-authored-by: Ashwin Vaidya <ashwinitinvaidya@gmail.com>
Co-authored-by: Kim, Sungchul <sungchul.kim@intel.com>

* Fix the CustomNonLinearClsHead when the batch_size is set to 1 (#2571)

Fix bn1d issue

Co-authored-by: sungmanc <sungmanc@intel.com>

* Update ModelAPI configuration (#2564 from 1.4) (#2568)

Update ModelAPI configuration (#2564)

* Update MAPI rt info for detection

* Update export info for cls, det and seg

* Update unit tests

* Update for 1.4.4rc1 (#2572)

* Hotfix DatasetEntity.get_combined_subset function loop (#2577)

Fix get_combined_subset function

* Revert default input size to `Default` due to YOLOX perf regression (#2580)

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix for the degradation issue of the classification task (#2585)

* Revert to sync with 1.4.0

* Remove repeat data

* Convert to the RGB value

* Fix color conversion logic

* Fix precommit

* Bump datumaro version to 1.5.1rc3 (#2587)

* Add label ids to anomaly OpenVINO model xml (#2590)

* Add label ids to model xml

---------
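
Writing label ids into the exported model amounts to adding an entry under the IR XML's `rt_info` section. A sketch with the standard library's `xml.etree`; the exact schema OTX writes is an assumption here, not verified:

```python
import xml.etree.ElementTree as ET


def add_label_ids(xml_text: str, label_ids: list) -> str:
    """Insert a label_ids entry into an OpenVINO IR's rt_info section.

    The rt_info layout is illustrative of the idea, not the exact
    schema OTX emits.
    """
    root = ET.fromstring(xml_text)
    rt_info = root.find("rt_info")
    if rt_info is None:
        rt_info = ET.SubElement(root, "rt_info")
    ET.SubElement(rt_info, "label_ids", value=" ".join(label_ids))
    return ET.tostring(root, encoding="unicode")


out = add_label_ids("<net><rt_info/></net>", ["0", "1"])
```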

* Fix DeiT-Tiny model regression during class incremental training (#2594)

* enable IBloss for DeiT-Tiny

* update changelog

* add docstring

* Add label ids to model xml in release 1.5 (#2591)

Add label ids to model xml

* Fix DeiT-Tiny regression test for release/1.4.0 (#2595)

* Fix DeiT regression test

* update changelog

* temp

* Fix mmcls bug not wrapping model in DataParallel on CPUs (#2601)

Wrap multi-label and h-label classification models with MMDataParallel in case of CPU training.
---------
Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix h-label loss normalization issue w/ exclusive label group of single label (#2604)

* Fix h-label loss normalization issue w/ exclusive label group with single label

* Fix non-linear version

---------
Signed-off-by: Songki Choi <songki.choi@intel.com>

* Boost up Image numpy accessing speed through PIL (#2586)

* boost up numpy accessing speed through PIL

* update CHANGELOG

* resolve precommit error

* resolve precommit error

* add fallback logic with PIL open

* use convert instead of draft

* Add missing import pathlib for cls e2e testing (#2610)

* Fix division by zero in class incremental learning for classification (#2606)

* Add empty label to reproduce zero-division error

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix minor typo

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix empty label 4 -> 3

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Prevent division by zero

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update license

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update CHANGELOG.md

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix inefficient sampling

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Revert indexing

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix minor typo

Signed-off-by: Songki Choi <songki.choi@intel.com>

---------

Signed-off-by: Songki Choi <songki.choi@intel.com>
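
This zero-division fix and the earlier h-label loss-normalization fix share one pattern: average only over valid samples and guard the empty-batch case. A plain-Python sketch of the guard (lists stand in for tensors; names are illustrative):

```python
def normalized_loss(losses: list, valid_mask: list) -> float:
    """Average loss over valid (labeled) samples only.

    Guarding the denominator makes an all-ignored batch yield 0.0
    instead of raising ZeroDivisionError.
    """
    num_valid = sum(valid_mask)
    if num_valid == 0:
        return 0.0
    return sum(l for l, v in zip(losses, valid_mask) if v) / num_valid
```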

* Unify logger usage (#2612)

* unify logger

* align with pre-commit

* unify anomaly logger to otx

* change logger file path

* align with pre-commit

* change logger file path in missing file

* configure logger after ConfigManager is initialized

* configure logger when ConfigManager instance is initialized

* update unit test code

* move config_logger to each cli file

* align with pre-commit

* change part still using mmcv logger
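
The unification replaces scattered mmcv/anomalib loggers with one named logger configured in a single place. A minimal stdlib sketch of the "configure once" idea (the logger name and format string are illustrative):

```python
import logging


def config_logger(level: int = logging.INFO, log_file: str = None) -> logging.Logger:
    """Configure one shared logger; safe to call from every entry point.

    Every module that asks for logging.getLogger("otx") then sees the
    same handler, so repeated calls don't stack duplicate handlers.
    """
    logger = logging.getLogger("otx")
    logger.setLevel(level)
    if not logger.handlers:  # configure only on the first call
        handler = logging.FileHandler(log_file) if log_file else logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(name)s | %(levelname)s | %(message)s")
        )
        logger.addHandler(handler)
    return logger
```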

* Fix XAI algorithm for Detection (#2609)

* Improve saliency maps algorithm for Detection

* Remove extra changes

* Update unit tests

* Changes for 1 class

* Fix pre-commit

* Update CHANGELOG

* Tighten dependency constraint, only adopting latest patches (#2607)

* tighten dependency constraint, only adopting latest patches

* adjust scikit-image version w.r.t python version

* adjust tensorboard version w.r.t python version

* remove version specifier for scikit-image

* Add metadata to optimized model (#2618)

* bug fix for legacy openvino models

* Add metadata to optimized model

* Revert formatting changes

---------

Co-authored-by: Ashwin Vaidya <ashwinitinvaidya@gmail.com>

* modify omegaconf version constraint

* [release 1.5.0] Fix XAI algorithm for Detection (#2617)

Update detection XAI algorithm

* Update dependency constraint (#2622)

* Update tpp (#2621)

* Fix h-label bug of missing parent labels in output (#2626)

* Fix h-label bug of missing parent labels in output

* Fix h-label test data label schema

* Update CHANGELOG.md

---------
Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update publish workflow (#2625)

update publish workflow to push whl to internal pypi

* bump datumaro version to ~=1.5.0

* fixed mistake while merging back 1.4.4

* modify readme

* remove openvino model wrapper class

* remove openvino model wrapper tests

* [release 1.5.0] DeiT: enable tests + add ViTFeatureVectorHook (#2630)

Add ViT feature vector hook

* Fix docs broken link to datatumaro_h-label

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix wrong label settings for non-anomaly task ModelAPIs

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update publish workflow for tag checking (#2632)

* Update e2e tests for XAI Detection (#2634)

Fix e2e XAI ref value

* Disable QAT for newly added models (#2636)

* Update release note and readme (#2637)

* update release note and readme

* remove package upload step on internal publish wf

* update release note, changelog, and readme

* update version string to 1.6.0dev

* fix datumaro version to 1.6.0rc0

* Mergeback 1.5.0 to develop (#2642)

* Update publish workflow for tag checking (#2632)

* Update e2e tests for XAI Detection (#2634)

* Disable QAT for newly added models (#2636)

* Update release note and readme (#2637)

* remove package upload step on internal publish wf

* update release note, changelog, and readme

* update version string to 1.6.0dev

---------

Co-authored-by: Galina Zalesskaya <galina.zalesskaya@intel.com>
Co-authored-by: Jaeguk Hyun <jaeguk.hyun@intel.com>

* Revert "Mergeback 1.5.0 to develop" (#2645)

Revert "Mergeback 1.5.0 to develop (#2642)"

This reverts commit 2f67686103df873d020681f6d504f9595ce4a963.

* Add a tool to help conduct experiments (#2651)

* implement run and experiment

* implement experiment result aggregator

* refactor experiment.py

* refactor run.py

* get export model speed

* add var column

* refactor experiment.py

* refine a way to update argument in cmd

* refine resource tracker

* support anomaly on research framework

* refine code aggregating exp result

* bugfix

* make other tasks available

* eval task saves avg_time_per_image as result

* Add new argument to track CPU&GPU utilization and memory usage (#2500)

* add argument to track resource usage

* fix bug

* fix a bug in a multi gpu case

* use total cpu usage

* add unit test

* add mark to unit test

* cover edge case

* add pynvml in requirement

* align with pre-commit

* add license comment

* update changelog

* refine argument help

* align with pre-commit

* add version to requirement and raise an error if unsupported values are given

* apply new resource tracker format

* refactor run.py

* support optimize in research framework

* cover edge case

* Handle a case where fail cases exist

* make argparse raise an error rather than exit if problems exist

* revert tensorboard aggregator

* bugfix

* save failed cases as yaml file

* deal with integer in variables

* add epoch to metric

* use latest log.json file

* align with otx logging method

* move experiment.py from cli to tools

* refactor experiment.py

* merge otx run feature into experiment.py

* move set_arguments_to_cmd definition into experiment.py

* refactor experiment.py

* bugfix

* minor bugfix

* use otx.cli instead of each otx entry

* add feature to parse single workspace

* add comments

* fix bugs

* align with pre-commit

* revert parser argument

* align with pre-commit
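
The experiment runner repeatedly rewrites OTX CLI commands, which is the job of `set_arguments_to_cmd`. A hypothetical reimplementation of such a helper, with the signature guessed from the commit titles rather than taken from the actual code:

```python
def set_arguments_to_cmd(command: list, key: str, value: str, before: str = None) -> list:
    """Insert or update `key value` in a CLI command given as a token list.

    If the flag already exists its value is replaced; otherwise the pair is
    appended, or inserted ahead of `before` (e.g. before the 'params' token).
    """
    if key in command:
        command[command.index(key) + 1] = value
        return command
    idx = command.index(before) if before in command else len(command)
    command[idx:idx] = [key, value]
    return command
```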

* Revert inference batch size to 1 for instance segmentation (#2648)

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Remove unnecessary log while building a model (#2658)

* revert logger in otx/algorithms/detection/adapters/mmdet/utils/builder.py

* revert logger in otx/algorithms/classification/adapters/mmcls/utils/builder.py

* make change more readable

* Fix a minor bug of experiment.py (#2662)

fix bug

* Don't check avg_time_per_image during test (#2665)

* ignore avg_time_per_image during test

* do not call stdev when length of array is less than 2

* ignore avg_time_per_image during regression test

* Update device selection logic in classification

---------

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
Signed-off-by: Songki Choi <songki.choi@intel.com>
Co-authored-by: Yunchu Lee <yunchu.lee@intel.com>
Co-authored-by: Kim, Sungchul <sungchul.kim@intel.com>
Co-authored-by: Vinnam Kim <vinnam.kim@intel.com>
Co-authored-by: Evgeny Tsykunov <evgeny.tsykunov@intel.com>
Co-authored-by: Songki Choi <songki.choi@intel.com>
Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>
Co-authored-by: Jaeguk Hyun <jaeguk.hyun@intel.com>
Co-authored-by: Sungman Cho <sungman.cho@intel.com>
Co-authored-by: Eugene Liu <eugene.liu@intel.com>
Co-authored-by: Wonju Lee <wonju.lee@intel.com>
Co-authored-by: Dick Ameln <dick.ameln@intel.com>
Co-authored-by: sungchul.kim <sungchul@ikvensx010>
Co-authored-by: GalyaZalesskaya <galina.zalesskaya@intel.com>
Co-authored-by: Harim Kang <harim.kang@intel.com>
Co-authored-by: Ashwin Vaidya <ashwin.vaidya@intel.com>
Co-authored-by: Ashwin Vaidya <ashwinitinvaidya@gmail.com>
Co-authored-by: sungmanc <sungmanc@intel.com>

* Fix mixed & lower precision training (#2668)

* remove dtype argument in torch.xpu.optimize

* Add `custom_auto_fp16` to use xpu autocast

* Update `forward`s to use `custom_auto_fp16`

* precommit

* Disable FP16 training

* Add `custom_force_fp32`

* Remove forced casting of tensors to bf16

* Add `XPUOptimizerHook` and `XPUGradScaler`

* precommit

* Enable lower precision training

* Remove dtype check for lower precision

* Add `bf16_training` in recipe

* fix

* Remove unused module

* Change `XPUOptimizerHook` to `BFp16XPUOptimizerHook`

* Fix for common devices which don't use bf16

* precommit

* Enable using `auto_fp16` as-is

* Add try-except avoiding mmcv import error

* Fix error type

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

---------

Co-authored-by: Shin, Eunwoo <eunwoo.shin@intel.com>
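
The final item guards the mmcv import so environments without mmcv still get a usable decorator. A sketch of that try/except pattern; the no-op fallback behavior is an assumption:

```python
def get_auto_fp16():
    """Return mmcv's auto_fp16 decorator factory if importable, else a no-op.

    The fallback keeps forward() methods decorated with @auto_fp16() working
    on setups where mmcv is absent -- it returns the function unchanged.
    """
    try:
        from mmcv.runner import auto_fp16
        return auto_fp16
    except ImportError:
        def noop_auto_fp16(*_args, **_kwargs):
            def decorator(fn):
                return fn
            return decorator
        return noop_auto_fp16
```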

* Add XPU support to anomaly task (#2677)

* Update base.txt

updated dependency version of datumaro

* Update __init__.py

update version string

* Update requirements.txt

* Temporarily skip visual prompting openvino integration test (#2323)

* Fix import dm.DatasetSubset (#2324)

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>

* Fix semantic segmentation soft prediction dtype (#2322)

* Fix semantic segmentation soft prediction dtype

* relax ref sal vals check

---------

Co-authored-by: Songki Choi <songki.choi@intel.com>

* Contrain yapf verison lesser than 0.40.0 (#2328)

contrain_yapf_version

* Fix detection e2e tests (#2327)

Fix for detection

* Mergeback: Label addtion/deletion 1.2.4 --> 1.4.0 (#2326)

* Make black happy

* Fix conflicts

* Merge-back: add test datasets and edit the test code

* Make black happy

* Fix mis-merge

* Make balck happy

* Fix typo

* Fix typoi

---------

Co-authored-by: Songki Choi <songki.choi@intel.com>

* Bump datumaro up to 1.4.0rc2 (#2332)

bump datumaro up to 1.4.0rc2

* Tiling Doc for releases 1.4.0 (#2333)

* Add tiling documentation

* Bump otx version to 1.4.0rc2 (#2341)

* OTX deploy for visual prompting task  (#2311)

* Enable `otx deploy`

* (WIP) integration test

* Docstring

* Update args for create_model

* Manually set image embedding layout

* Enable to use model api for preprocessing
- `fit_to_window` doesn't work expectedly, so newly implemented `VisualPromptingOpenvinoAdapter` to use new resize function

* Remove skipped test

* Updated

* Update unit tests on model wrappers

* Update

* Update configuration

* Fix not to patch pretrained path

* pylint & update model api version in docstring

---------

Co-authored-by: Wonju Lee <wonju.lee@intel.com>

* Bump albumentations version in anomaly requirements (#2350)

increment albumentations version

* Update action detection (#2346)

* Remove skip mark for PTQ test of action detection

* Update action detection documentation

* Fix e2e (#2348)

* Change classification dataset from dummy to toy

* Revert test changes

* Change label name for multilabel dataset

* Revert e2e test changes

* Change ov test cases' threshold

* Add parent's label

* Update ModelAPI in 1.4 release (#2347)

* Upgrade model API

* Update otx in exportable code

* Fix unit tests

* Fix black

* Fix detection inference

* Fix det tiling

* Fix mypy

* Fix demo

* Fix visualizer in demo

* Fix black

* Add OTX optimize for visual prompting task (#2318)

* Initial commit

* Update block

* (WIP) otx optimize

* Fix

* WIP

* Update configs & exported outputs

* Remove unused modules for torch

* Add unit tests

* pre-commit

* Update CHANGELOG

* Update detection docs (#2335)

* Update detection docs

* Revert template id changes

* Fix wrong template id

* Update docs/source/guide/explanation/algorithms/object_detection/object_detection.rst

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

* Update docs/source/guide/explanation/algorithms/object_detection/object_detection.rst

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

---------

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

* Add visual prompting documentation (#2354)

* (WIP) write docs

* Add visual prompting documentation

* Update CHANGELOG

---------

Co-authored-by: sungchul.kim <sungchul@ikvensx010>

* Remove custom modelapi patch in visual prompting (#2359)

* Remove custom modelapi patch

* Update test

* Fix graph metric order and label issues (#2356)

* Fix graph metric going backward issue
* Add license notice
* Fix pre-commit issue
* Add rename items & logic for metric
---------
Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update multi-label document and conversion script (#2358)

Update docs, label convert script

* Update third party programs (#2365)

* Make anomaly task compatible with older albumentations versions (#2363)

* fix transforms export in metadata

* wrap transform dict

* add todo for updating to_dict call

* Fixing detection saliency map for one class case (#2368)

* fix softmax

* fix validity tests

* Add e2e test for visual prompting (#2360)

* (WIP) otx optimize

* pre-commit

* (WIP) set e2e

* Remove nncf config

* Add visual prompting requirement

* Add visual prompting in tox

* Add visual prompting in setup.py

* Fix typo

* Delete unused configuration.yaml

* Edit test_name

* Add to limit activation range

* Update from `vp` to `visprompt`

* Fix about no returning the first label

* pre-commit

* (WIP) otx optimize

* pre-commit

* (WIP) set e2e

* Remove nncf config

* Add visual prompting requirement

* Add visual prompting in tox

* Add visual prompting in setup.py

* Fix typo

* pre-commit

* Add actions

* Update tests/e2e/cli/visual_prompting/test_visual_prompting.py

Co-authored-by: Jaeguk Hyun <jaeguk.hyun@intel.com>

* Skip PTQ e2e test

* Change task name

* Remove skipped tc

---------

Co-authored-by: Jaeguk Hyun <jaeguk.hyun@intel.com>

* Fix e2e (#2366)

* Change e2e reference name

* Update openvino eval threshold for multiclass classification

* Change comment message

* Fix tiling e2e tests

---------

Co-authored-by: GalyaZalesskaya <galina.zalesskaya@intel.com>

* Add Dino head unit tests (#2344)

Recover DINO head unit tests

* Update for release 1.4.0rc2 (#2370)

* update for release 1.4.0rc2

* Add skip mark for unstable unit tests

---------

Co-authored-by: jaegukhyun <jaeguk.hyun@intel.com>

* Fix NNCF training on CPU (#2373)

* Align label order between Geti and OTX (#2369)

* align label order

* align with pre-commit

* update CHANGELOG.md

* deal with edge case

* update type hint

* Remove CenterCrop from Classification test pipeline and editing missing docs link (#2375)

* Fix missing link for docs and removing centercrop for classification data pipeline

* Revert the test threshold

* Fix H-label classification (#2377)

* Fix h-label issue

* Update unit tests

* Make black happy

* Fix unittests

* Make black happy

* Fix update headers information func

* Update the logic: consider the loss per batch

* Update for release 1.4 (#2380)

* updated for 1.4.0rc3

* update changelog & release note

* bump datumaro version up

---------

Co-authored-by: Songki Choi <songki.choi@intel.com>

* Switch to PTQ for sseg (#2374)

* Switch to PTQ for sseg

* Update log messages

* Fix invalid import structures in otx.api (#2383)

Update tiler.py

* Update for 1.4.0rc4 (#2385)

update for release 1.4.0rc4

* [release 1.4.0] XAI: Return saliency maps for Mask RCNN IR async infer (#2395)

* Return saliency maps for openvino async infer

* add workaround to fix yapf importing error

---------

Co-authored-by: eunwoosh <eunwoo.shin@intel.com>

* Update for release 1.4.0 (#2399)

update version string

Co-authored-by: Sungman Cho <sungman.cho@intel.com>

* Fix broken links in documentation (#2405)

* fix docs links to datumaro's docs
* fix docs links to otx's docs
* bump version to 1.4.1

* Update exportable code README (#2411)

* Updated for release 1.4.1 (#2412)

updated for release 1.4.1

* Add workaround for the incorrect meta info M-RCNN (used for XAI) (#2437)

Add workaround for the incorrect meta info

* Add model category attributes to model template (#2439)

Add model category attributes to model template

* Add model category & status fields in model template

* Add is_default_for_task attr to model template

* Update model templates with category attrs

* Add integration tests for model templates consistency

* Fix license & doc string

* Fix typo

* Refactor test cases

* Refactor common tests by generator

---------
Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update for 1.4.2rc1 (#2441)

update for release 1.4.2rc1

* Fix label list order for h-label classification (#2440)

* Fix label list for h-label cls
* Fix unit tests

* Modified fq numbers for lite HRNET (#2445)

modified fq numbers for lite HRNET

* Update PTQ ignored scope for hrnet 18 mod2 (#2449)

Update ptq ignored scope for hrnet 18 mod2

* Fix OpenVINO inference for legacy models (#2450)

* bug fix for legacy openvino models

* Add tests

* Specific exceptions

---------

* Update for 1.4.2rc2 (#2455)

update for release 1.4.2rc2

* Prevent zero-sized saliency map in tiling if tile size is too big (#2452)

* Prevent zero-sized saliency map in tiling if tile size is too big

* Prevent zero-sized saliency in tiling (PyTorch)

* Add unit tests for Tiler merge features methods

---------

Co-authored-by: Galina <galina.zalesskaya@intel.com>

* Update pot fq reference number (#2456)

update pot fq reference number to 15

* Bump datumaro version to 1.5.0rc0 (#2470)

bump datumaro version to 1.5.0rc0

* Set tox version constraint (#2472)

set tox version constraint - https://github.com/tox-dev/tox/issues/3110

* Bug fix for albumentations (#2467)

* bug fix for legacy openvino models

* Address albumentation issue

---------

Co-authored-by: Ashwin Vaidya <ashwinitinvaidya@gmail.com>

* update for release 1.4.2rc3

* Add a dummy hierarchical config required by MAPI (#2483)

* bump version to 1.4.2rc4

* Bump datumaro version (#2502)

* bump datumaro version

* remove deprecated/removed attribute usage of datumaro

* Upgrade nncf version for 1.4 release (#2459)

* Upgrade nncf version

* Fix nncf interface warning

* Set the exact nncf version

* Update FQ refs after NNCF upgrade

* Use NNCF from pypi

* Update version for release 1.4.2rc5 (#2507)

update version for release 1.4.2rc5

* Update for 1.4.2 (#2514)

update for release 1.4.2

* create branch release/1.5.0

* Delete mem cache handler after training is done (#2535)

release mem cache handler after training is done

* Fix bug that auto batch size doesn't consider distributed training (#2533)

* consider distributed training while searching batch size

* update unit test

* revert gpu memory upper bound

* fix typo

* change allocated to reserved

* add unit test for distributed training

* align with pre-commit

* Apply fix progress hook to release 1.5.0 (#2539)

* Fix hook's ordering issue. AdaptiveRepeatHook changes the runner.max_iters before the ProgressHook

* Change the expression

* Fix typo

* Fix multi-label, h-label issue

* Fix auto_bs issue

* Apply suggestions from code review

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

* Reflecting reviews

* Refactor the name of get_data_cfg

* Revert adaptive hook sampler init

* Refactor the function name: get_data_cfg -> get_subset_data_cfg

* Fix unit test errors

* Remove adding AdaptiveRepeatDataHook for autobs

* Remove unused import

* Fix detection and segmentation case in Geti scenario

---------

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

* Re introduce adaptive scheduling for training (#2541)

* Re-introduce adaptive patience for training

* Revert unit tests

* Update for release 1.4.3rc1 (#2542)

* Mirror Anomaly ModelAPI changes (#2531)

* Migrate anomaly exportable code to modelAPI (#2432)

* Fix license in PR template

* Migrate to modelAPI

* Remove color conversion in streamer

* Remove reverse_input_channels

* Add float

* Remove test as metadata is no longer used

* Remove metadata from load method

* remove anomalib openvino inferencer

* fix signature

* Support legacy OpenVINO model

* Transform image

* add configs

* Re-introduce adaptive training (#2543)

* Re-introduce adaptive patience for training

* Revert unit tests

* Fix auto input size mismatch in eval & export (#2530)

* Fix auto input size mismatch in eval & export

* Re-enable E2E tests for Issue#2518

* Add input size check in export testing

* Format float numbers in log

* Fix NNCF export shape mismatch

* Fix saliency map issue

* Disable auto input size if tiling enabled

---------

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update ref. fq number for anomaly e2e2 (#2547)

* Skip e2e det tests by issue2548 (#2550)

* Add skip to chained TC for issue #2548 (#2552)

* Update for release 1.4.3 (#2551)

* Update MAPI for 1.5 release (#2555)

Upgrade MAPI to v 0.1.6 (#2529)

* Upgrade MAPI

* Update exp code demo commit

* Fix MAPI imports

* Update ModelAPI configuration (#2564)

* Update MAPI rt info for detection

* Update export info for cls, det and seg

* Update unit tests

* Disable QAT for SegNexts (#2565)

* Disable NNCF QAT for SegNext

* Del obsolete pot configs

* Move NNCF skip marks to test commands to avoid duplication

* Add Anomaly modelAPI changes to releases/1.4.0 (#2563)

* bug fix for legacy openvino models

* Apply otx anomaly 1.5 changes

* Fix tests

* Fix compression config

* fix modelAPI imports

* update integration tests

* Edit config types

* Update keys in deployed model

---------

Co-authored-by: Ashwin Vaidya <ashwinitinvaidya@gmail.com>
Co-authored-by: Kim, Sungchul <sungchul.kim@intel.com>

* Fix the CustomNonLinearClsHead when the batch_size is set to 1 (#2571)

Fix bn1d issue

Co-authored-by: sungmanc <sungmanc@intel.com>

* Update ModelAPI configuration (#2564 from 1.4) (#2568)

Update ModelAPI configuration (#2564)

* Update MAPI rt info for detection

* Update export info for cls, det and seg

* Update unit tests

* Update for 1.4.4rc1 (#2572)

* Hotfix DatasetEntity.get_combined_subset function loop (#2577)

Fix get_combined_subset function

* Revert default input size to `Default` due to YOLOX perf regression (#2580)

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix for the degradation issue of the classification task (#2585)

* Revert to sync with 1.4.0

* Remove repeat data

* Convert to the RGB value

* Fix color conversion logic

* Fix precommit

* Bump datumaro version to 1.5.1rc3 (#2587)

* Add label ids to anomaly OpenVINO model xml (#2590)

* Add label ids to model xml

---------

* Fix DeiT-Tiny model regression during class incremental training (#2594)

* enable IBloss for DeiT-Tiny

* update changelog

* add docstring

* Add label ids to model xml in release 1.5 (#2591)

Add label ids to model xml

* Fix DeiT-Tiny regression test for release/1.4.0 (#2595)

* Fix DeiT regression test

* update changelog

* temp

* Fix mmcls bug not wrapping model in DataParallel on CPUs (#2601)

Wrap multi-label and h-label classification models with MMDataParallel in case of CPU training.
---------
Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix h-label loss normalization issue w/ exclusive label group of single label (#2604)

* Fix h-label loss normalization issue w/ exclusive label group with single label

* Fix non-linear version

---------
Signed-off-by: Songki Choi <songki.choi@intel.com>

* Boost up Image numpy accessing speed through PIL (#2586)

* boost up numpy accessing speed through PIL

* update CHANGELOG

* resolve precommit error

* resolve precommit error

* add fallback logic with PIL open

* use convert instead of draft

* Add missing import pathlib for cls e2e testing (#2610)

* Fix division by zero in class incremental learning for classification (#2606)

* Add empty label to reproduce zero-division error

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix minor typo

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix empty label 4 -> 3

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Prevent division by zero

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update license

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update CHANGELOG.md

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix inefficient sampling

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Revert indexing

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix minor typo

Signed-off-by: Songki Choi <songki.choi@intel.com>

---------

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Unify logger usage (#2612)

* unify logger

* align with pre-commit

* unify anomaly logger to otx

* change logger file path

* align with pre-commit

* change logger file path in missing file

* configure logger after ConfigManager is initialized

* configure logger when ConfigManager instance is initialized

* update unit test code

* move config_logger to each cli file

* align with pre-commit

* change part still using mmcv logger

* Fix XAI algorithm for Detection (#2609)

* Improve saliency maps algorithm for Detection

* Remove extra changes

* Update unit tests

* Changes for 1 class

* Fix pre-commit

* Update CHANGELOG

* Tighten dependency constraint only adapting latest patches (#2607)

* tighten dependency constraint, only adapting latest patches

* adjust scikit-image version w.r.t python version

* adjust tensorboard version w.r.t python version

* remove version specifier for scikit-image

* Add metadata to optimized model (#2618)

* bug fix for legacy openvino models

* Add metadata to optimized model

* Revert formatting changes

---------

Co-authored-by: Ashwin Vaidya <ashwinitinvaidya@gmail.com>

* modify omegaconf version constraint

* [release 1.5.0] Fix XAI algorithm for Detection (#2617)

Update detection XAI algorithm

* Update dependency constraint (#2622)

* Update tpp (#2621)

* Fix h-label bug of missing parent labels in output (#2626)

* Fix h-label bug of missing parent labels in output

* Fix h-label test data label schema

* Update CHANGELOG.md

---------
Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update publish workflow (#2625)

update publish workflow to push whl to internal pypi

* bump datumaro version to ~=1.5.0

* fixed mistake while merging back 1.4.4

* modify readme

* remove openvino model wrapper class

* remove openvino model wrapper tests

* [release 1.5.0] DeiT: enable tests + add ViTFeatureVectorHook (#2630)

Add ViT feature vector hook

* Fix docs broken link to datumaro_h-label

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Fix wrong label settings for non-anomaly task ModelAPIs

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update publish workflow for tag checking (#2632)

* Update e2e tests for XAI Detection (#2634)

Fix e2e XAI ref value

* Disable QAT for newly added models (#2636)

* Update release note and readme (#2637)

* update release note and readme

* remove package upload step on internal publish wf

* update release note, changelog, and readme

* update version string to 1.6.0dev

* fix datumaro version to 1.6.0rc0

* Mergeback 1.5.0 to develop (#2642)

* Update publish workflow for tag checking (#2632)

* Update e2e tests for XAI Detection (#2634)

* Disable QAT for newly added models (#2636)

* Update release note and readme (#2637)

* remove package upload step on internal publish wf

* update release note, changelog, and readme

* update version string to 1.6.0dev

---------

Co-authored-by: Galina Zalesskaya <galina.zalesskaya@intel.com>
Co-authored-by: Jaeguk Hyun <jaeguk.hyun@intel.com>

* Revert "Mergeback 1.5.0 to develop" (#2645)

Revert "Mergeback 1.5.0 to develop (#2642)"

This reverts commit 2f67686103df873d020681f6d504f9595ce4a963.

* Add a tool to help conduct experiments (#2651)

* implement run and experiment

* implement experiment result aggregator

* refactor experiment.py

* refactor run.py

* get export model speed

* add var column

* refactor experiment.py

* refine a way to update argument in cmd

* refine resource tracker

* support anomaly on research framework

* refine code aggregating exp result

* bugfix

* make other task available

* eval task save avg_time_per_images as result

* Add new argument to track CPU&GPU utilization and memory usage (#2500)

* add argument to track resource usage

* fix bug

* fix a bug in a multi gpu case

* use total cpu usage

* add unit test

* add mark to unit test

* cover edge case

* add pynvml in requirement

* align with pre-commit

* add license comment

* update changelog

* refine argument help

* align with pre-commit

* add version to requirement and raise an error if unsupported values are given

* apply new resource tracker format

* refactor run.py

* support optimize in research framework

* cover edge case

* Handle a case where fail cases exist

* make argparse raise an error rather than exit if a problem exists

* revert tensorboard aggregator

* bugfix

* save failed cases as yaml file

* deal with integer in variables

* add epoch to metric

* use latest log.json file

* align with otx logging method

* move experiment.py from cli to tools

* refactor experiment.py

* merge otx run feature into experiment.py

* move set_arguments_to_cmd definition into experiment.py

* refactor experiment.py

* bugfix

* minor bugfix

* use otx.cli instead of each otx entry

* add feature to parse single workspace

* add comments

* fix bugs

* align with pre-commit

* revert parser argument

* align with pre-commit

* Revert inference batch size to 1 for instance segmentation (#2648)

Signed-off-by: Songki Choi <songki.choi@intel.com>

* Remove unnecessary log while building a model (#2658)

* revert logger in otx/algorithms/detection/adapters/mmdet/utils/builder.py

* revert logger in otx/algorithms/classification/adapters/mmcls/utils/builder.py

* make change more readable

* Fix a minor bug of experiment.py (#2662)

fix bug

* Not check avg_time_per_image during test (#2665)

* ignore avg_time_per_image during test

* do not call stdev when length of array is less than 2

* ignore avg_time_per_image during regression test

* Update device selection logic in classification

* Add xpu accelerator

* Tmp patch for anomaly trainer

* Use XPU callback for anomaly training

* Update xpu accelerator

* Fix for anomaly xpu callback

* Fix validation batch logic

* Cleanup, add docstrings

* Refine xpu callback

---------

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
Signed-off-by: Songki Choi <songki.choi@intel.com>
Co-authored-by: Yunchu Lee <yunchu.lee@intel.com>
Co-authored-by: Kim, Sungchul <sungchul.kim@intel.com>
Co-authored-by: Vinnam Kim <vinnam.kim@intel.com>
Co-authored-by: Evgeny Tsykunov <evgeny.tsykunov@intel.com>
Co-authored-by: Songki Choi <songki.choi@intel.com>
Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>
Co-authored-by: Jaeguk Hyun <jaeguk.hyun@intel.com>
Co-authored-by: Sungman Cho <sungman.cho@intel.com>
Co-authored-by: Eugene Liu <eugene.liu@intel.com>
Co-authored-by: Wonju Lee <wonju.lee@intel.com>
Co-authored-by: Dick Ameln <dick.ameln@intel.com>
Co-authored-by: sungchul.kim <sungchul@ikvensx010>
Co-authored-by: GalyaZalesskaya <galina.zalesskaya@intel.com>
Co-authored-by: Harim Kang <harim.kang@intel.com>
Co-authored-by: Ashwin Vaidya <ashwin.vaidya@intel.com>
Co-authored-by: Ashwin Vaidya <ashwinitinvaidya@gmail.com>
Co-authored-by: sungmanc <sungmanc@intel.com>

* Disable mixed precision training on XPU (#2683)

* disable mixed precision training on XPU

* fix optimize error

* refactor XPUDataParallel

* align with pre-commit

* align with pre-commit

* fix bug

* Update anomaly XPU integration (#2697)

* Update anomaly XPU integration

* Update strategy and accelerator

* Cleanup in strategy

* Fix mypy

* remove XPU callback

* Add XPU mixed precision plugin for lightning (#2714)

* Update anomaly XPU integration

* Update strategy and accelerator

* Cleanup in strategy

* Fix mypy

* remove XPU callback

* Add XPU mixed precision lightning training

* Fix linters

* Handle default plugins value

* Update code to support other features than 'train' on XPU (#2704)

* move where patching code for xpu in detection

* implement xpu resource manager in HPO

* consider xpu during adaptive_num_workers

* consider xpu while deciding asynchronous_sha

* align with pre-commit

* change variable name to proper one

* change patching code into configure_device

* update unit test code

* align with pre-commit

* Merge develop to develop-idev (#2727)

* Update base.txt

updated dependency version of datumaro

* Update __init__.py

update version string

* Update requirements.txt

* Temporarily skip visual prompting openvino integration test (#2323)

* Fix import dm.DatasetSubset (#2324)

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>

* Fix semantic segmentation soft prediction dtype (#2322)

* Fix semantic segmentation soft prediction dtype

* relax ref sal vals check

---------

Co-authored-by: Songki Choi <songki.choi@intel.com>

* Constrain yapf version to less than 0.40.0 (#2328)

contrain_yapf_version

* Fix detection e2e tests (#2327)

Fix for detection

* Mergeback: Label addition/deletion 1.2.4 --> 1.4.0 (#2326)

* Make black happy

* Fix conflicts

* Merge-back: add test datasets and edit the test code

* Make black happy

* Fix mis-merge

* Make black happy

* Fix typo

* Fix typo

---------

Co-authored-by: Songki Choi <songki.choi@intel.com>

* Bump datumaro up to 1.4.0rc2 (#2332)

bump datumaro up to 1.4.0rc2

* Tiling Doc for releases 1.4.0 (#2333)

* Add tiling documentation

* Bump otx version to 1.4.0rc2 (#2341)

* OTX deploy for visual prompting task  (#2311)

* Enable `otx deploy`

* (WIP) integration test

* Docstring

* Update args for create_model

* Manually set image embedding layout

* Enable to use model api for preprocessing
- `fit_to_window` doesn't work as expected, so `VisualPromptingOpenvinoAdapter` was newly implemented to use the new resize function

* Remove skipped test

* Updated

* Update unit tests on model wrappers

* Update

* Update configuration

* Fix not to patch pretrained path

* pylint & update model api version in docstring

---------

Co-authored-by: Wonju Lee <wonju.lee@intel.com>

* Bump albumentations version in anomaly requirements (#2350)

increment albumentations version

* Update action detection (#2346)

* Remove skip mark for PTQ test of action detection

* Update action detection documentation

* Fix e2e (#2348)

* Change classification dataset from dummy to toy

* Revert test changes

* Change label name for multilabel dataset

* Revert e2e test changes

* Change ov test cases' threshold

* Add parent's label

* Update ModelAPI in 1.4 release (#2347)

* Upgrade model API

* Update otx in exportable code

* Fix unit tests

* Fix black

* Fix detection inference

* Fix det tiling

* Fix mypy

* Fix demo

* Fix visualizer in demo

* Fix black

* Add OTX optimize for visual prompting task (#2318)

* Initial commit

* Update block

* (WIP) otx optimize

* Fix

* WIP

* Update configs & exported outputs

* Remove unused modules for torch

* Add unit tests

* pre-commit

* Update CHANGELOG

* Update detection docs (#2335)

* Update detection docs

* Revert template id changes

* Fix wrong template id

* Update docs/source/guide/explanation/algorithms/object_detection/object_detection.rst

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

* Update docs/source/guide/explanation/algorithms/object_detection/object_detection.rst

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

---------

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

* Add visual prompting documentation (#2354)

* (WIP) write docs

* Add visual prompting documentation

* Update CHANGELOG

---------

Co-authored-by: sungchul.kim <sungchul@ikvensx010>

* Remove custom modelapi patch in visual prompting (#2359)

* Remove custom modelapi patch

* Update test

* Fix graph metric order and label issues (#2356)

* Fix graph metric going backward issue
* Add license notice
* Fix pre-commit issue
* Add rename items & logic for metric
---------
Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update multi-label document and conversion script (#2358)

Update docs, label convert script

* Update third party programs (#2365)

* Make anomaly task compatible with older albumentations versions (#2363)

* fix transforms export in metadata

* wrap transform dict

* add todo for updating to_dict call

* Fixing detection saliency map for one class case (#2368)

* fix softmax

* fix validity tests

* Add e2e test for visual prompting (#2360)

* (WIP) otx optimize

* pre-commit

* (WIP) set e2e

* Remove nncf config

* Add visual prompting requirement

* Add visual prompting in tox

* Add visual prompting in setup.py

* Fix typo

* Delete unused configuration.yaml

* Edit test_name

* Add to limit activation range

* Update from `vp` to `visprompt`

* Fix about no returning the first label

* pre-commit

* (WIP) otx optimize

* pre-commit

* (WIP) set e2e

* Remove nncf config

* Add visual prompting requirement

* Add visual prompting in tox

* Add visual prompting in setup.py

* Fix typo

* pre-commit

* Add actions

* Update tests/e2e/cli/visual_prompting/test_visual_prompting.py

Co-authored-by: Jaeguk Hyun <jaeguk.hyun@intel.com>

* Skip PTQ e2e test

* Change task name

* Remove skipped tc

---------

Co-authored-by: Jaeguk Hyun <jaeguk.hyun@intel.com>

* Fix e2e (#2366)

* Change e2e reference name

* Update openvino eval threshold for multiclass classification

* Change comment message

* Fix tiling e2e tests

---------

Co-authored-by: GalyaZalesskaya <galina.zalesskaya@intel.com>

* Add Dino head unit tests (#2344)

Recover DINO head unit tests

* Update for release 1.4.0rc2 (#2370)

* update for release 1.4.0rc2

* Add skip mark for unstable unit tests

---------

Co-authored-by: jaegukhyun <jaeguk.hyun@intel.com>

* Fix NNCF training on CPU (#2373)

* Align label order between Geti and OTX (#2369)

* align label order

* align with pre-commit

* update CHANGELOG.md

* deal with edge case

* update type hint

* Remove CenterCrop from Classification test pipeline and editing missing docs link (#2375)

* Fix missing link for docs and removing centercrop for classification data pipeline

* Revert the test threshold

* Fix H-label classification (#2377)

* Fix h-label issue

* Update unit tests

* Make black happy

* Fix unittests

* Make black happy

* Fix update headers information func

* Update the logic: consider the loss per batch

* Update for release 1.4 (#2380)

* updated for 1.4.0rc3

* update changelog & release note

* bump datumaro version up

---------

Co-authored-by: Songki Choi <songki.choi@intel.com>

* Switch to PTQ for sseg (#2374)

* Switch to PTQ for sseg

* Update log messages

* Fix invalid import structures in otx.api (#2383)

Update tiler.py

* Update for 1.4.0rc4 (#2385)

update for release 1.4.0rc4

* [release 1.4.0] XAI: Return saliency maps for Mask RCNN IR async infer (#2395)

* Return saliency maps for openvino async infer

* add workaround to fix yapf importing error

---------

Co-authored-by: eunwoosh <eunwoo.shin@intel.com>

* Update for release 1.4.0 (#2399)

update version string

Co-authored-by: Sungman Cho <sungman.cho@intel.com>

* Fix broken links in documentation (#2405)

* fix docs links to datumaro's docs
* fix docs links to otx's docs
* bump version to 1.4.1

* Update exportable code README (#2411)

* Updated for release 1.4.1 (#2412)

updated for release 1.4.1

* Add workaround for the incorrect meta info M-RCNN (used for XAI) (#2437)

Add workaround for the incorrect meta info

* Add model category attributes to model template (#2439)

Add model category attributes to model template

* Add model category & status fields in model template

* Add is_default_for_task attr to model template

* Update model templates with category attrs

* Add integration tests for model templates consistency

* Fix license & doc string

* Fix typo

* Refactor test cases

* Refactor common tests by generator

---------
Signed-off-by: Songki Choi <songki.choi@intel.com>

* Update for 1.4.2rc1 (#2441)

update for release 1.4.2rc1

* Fix label list order for h-label classification (#2440)

* Fix label list for h-label cls
* Fix unit tests

* Modified fq numbers for lite HRNET (#2445)

modified fq numbers for lite HRNET

* Update PTQ ignored scope for hrnet 18 mod2 (#2449)

Update ptq ignored scope for hrnet 18 mod2

* Fix OpenVINO inference for legacy models (#2450)

* bug fix for legacy openvino models

* Add tests

* Specific exceptions

---------

* Update for 1.4.2rc2 (#2455)

update for release 1.4.2rc2

* Prevent zero-sized saliency map in tiling if tile size is too big (#2452)

* Prevent zero-sized saliency map in tiling if tile size is too big

* Prevent zero-sized saliency in tiling (PyTorch)

* Add unit tests for Tiler merge features methods

---------

Co-authored-by: Galina <galina.zalesskaya@intel.com>

* Update pot fq reference number (#2456)

update pot fq reference number to 15

* Bump datumaro version to 1.5.0rc0 (#2470)

bump datumaro version to 1.5.0rc0

* Set tox version constraint (#2472)

set tox version constraint - https://github.com/tox-dev/tox/issues/3110

* Bug fix for albumentations (#2467)

* bug fix for legacy openvino models

* Address albumentation issue

---------

Co-authored-by: Ashwin Vaidya <ashwinitinvaidya@gmail.com>

* update for release 1.4.2rc3

* Add a dummy hierarchical config required by MAPI (#2483)

* bump version to 1.4.2rc4

* Bump datumaro version (#2502)

* bump datumaro version

* remove deprecated/removed attribute usage of datumaro

* Upgrade nncf version for 1.4 release (#2459)

* Upgrade nncf version

* Fix nncf interface warning

* Set the exact nncf version

* Update FQ refs after NNCF upgrade

* Use NNCF from pypi

* Update version for release 1.4.2rc5 (#2507)

update version for release 1.4.2rc5

* Update for 1.4.2 (#2514)

update for release 1.4.2

* create branch release/1.5.0

* Delete mem cache handler after training is done (#2535)

release mem cache handler after training is done

* Fix bug that auto batch size doesn't consider distributed training (#2533)

* consider distributed training while searching batch size

* update unit test

* revert gpu memory upper bound

* fix typo

* change allocated to reserved

* add unit test for distributed training

* align with pre-commit

* Apply fix progress hook to release 1.5.0 (#2539)

* Fix hook's ordering issue. AdaptiveRepeatHook changes the runner.max_iters before the ProgressHook

* Change the expression

* Fix typo

* Fix multi-label, h-label issue

* Fix auto_bs issue

* Apply suggestions from code review

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

* Reflecting reviews

* Refactor the name of get_data_cfg

* Revert adaptive hook sampler init

* Refactor the function name: get_data_cfg -> get_subset_data_cfg

* Fix unit test errors

* Remove adding AdaptiveRepeatDataHook for autobs

* Remove unused import

* Fix detection and segmentation case in Geti scenario

---------

Co-authored-by: Eunwoo Shin <eunwoo.shin@intel.com>

* Re introduce adaptive scheduling for training (#2541)

* Re-introduce adaptive patience for training

* Revert unit tests

* Update for release 1.4.3rc1 (#2542)

* Mirror Anomaly ModelAPI changes (#2531)

* Migrate anomaly exportable code to modelAPI (#2432)

* Fix license in PR template

* Migrate to modelAPI

* Remove color conversion in streamer

* Remove reverse_input_channels

* Add float

* Remove test as metadata is no longer used

* Remove metadata from load method

* remove anomalib openvino inferencer

* fix signature

* Support legacy OpenVINO model

* Transform image

* add configs

* Re-introduce adaptive training (#2543)

* Re-introduce adaptive patience for training

* Revert unit t…
22 people authored Mar 11, 2024
1 parent 11865f6 commit ccbfac1
Showing 120 changed files with 3,442 additions and 185 deletions.
4 changes: 4 additions & 0 deletions src/otx/algorithms/anomaly/adapters/__init__.py
@@ -13,3 +13,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions
# and limitations under the License.


from .anomalib.accelerators.xpu import XPUAccelerator # noqa: F401
from .anomalib.strategies import SingleXPUStrategy # noqa: F401
@@ -0,0 +1,8 @@
"""Lightning accelerator for XPU device."""
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#

from .xpu import XPUAccelerator

__all__ = ["XPUAccelerator"]
60 changes: 60 additions & 0 deletions src/otx/algorithms/anomaly/adapters/anomalib/accelerators/xpu.py
@@ -0,0 +1,60 @@
"""Lightning accelerator for XPU device."""
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#

from typing import Any, Dict, Union

import torch
from pytorch_lightning.accelerators import AcceleratorRegistry
from pytorch_lightning.accelerators.accelerator import Accelerator

from otx.algorithms.common.utils.utils import is_xpu_available


class XPUAccelerator(Accelerator):
    """Support for an XPU, optimized for large-scale machine learning."""

accelerator_name = "xpu"

def setup_device(self, device: torch.device) -> None:
"""Sets up the specified device."""
if device.type != "xpu":
raise RuntimeError(f"Device should be xpu, got {device} instead")

torch.xpu.set_device(device)

@staticmethod
def parse_devices(devices: Any) -> Any:
"""Parses devices for multi-XPU training."""
if isinstance(devices, list):
return devices
return [devices]

@staticmethod
def get_parallel_devices(devices: Any) -> Any:
"""Generates a list of parallel devices."""
return [torch.device("xpu", idx) for idx in devices]

@staticmethod
def auto_device_count() -> int:
"""Returns number of XPU devices available."""
return torch.xpu.device_count()

@staticmethod
def is_available() -> bool:
"""Checks if an XPU is available."""
return is_xpu_available()

def get_device_stats(self, device: Union[str, torch.device]) -> Dict[str, Any]:
"""Returns XPU devices stats."""
return {}

def teardown(self) -> None:
"""Cleans up XPU-related resources."""
pass


AcceleratorRegistry.register(
XPUAccelerator.accelerator_name, XPUAccelerator, description="Accelerator supports XPU devices"
)
@@ -0,0 +1,7 @@
"""Plugin for mixed-precision training on XPU."""
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from .xpu_precision import MixedPrecisionXPUPlugin

__all__ = ["MixedPrecisionXPUPlugin"]
109 changes: 109 additions & 0 deletions src/otx/algorithms/anomaly/adapters/anomalib/plugins/xpu_precision.py
@@ -0,0 +1,109 @@
"""Plugin for mixed-precision training on XPU."""
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0


from contextlib import contextmanager
from typing import Any, Callable, Dict, Generator, Optional, Union

import pytorch_lightning as pl
import torch
from lightning_fabric.utilities.types import Optimizable
from pytorch_lightning.plugins.precision.precision_plugin import PrecisionPlugin
from pytorch_lightning.utilities import GradClipAlgorithmType
from pytorch_lightning.utilities.exceptions import MisconfigurationException
from torch import Tensor
from torch.optim import LBFGS, Optimizer


class MixedPrecisionXPUPlugin(PrecisionPlugin):
"""Plugin for Automatic Mixed Precision (AMP) training with ``torch.xpu.autocast``.

Args:
scaler: An optional :class:`torch.cuda.amp.GradScaler` to use.
"""

def __init__(self, scaler: Optional[Any] = None) -> None:
self.scaler = scaler

def pre_backward(self, tensor: Tensor, module: "pl.LightningModule") -> Tensor:
"""Apply grad scaler before backward."""
if self.scaler is not None:
tensor = self.scaler.scale(tensor)
return super().pre_backward(tensor, module)

def optimizer_step( # type: ignore[override]
self,
optimizer: Optimizable,
model: "pl.LightningModule",
optimizer_idx: int,
closure: Callable[[], Any],
**kwargs: Any,
) -> Any:
"""Make an optimizer step using scaler if it was passed."""
if self.scaler is None:
# skip scaler logic, as bfloat16 does not require scaler
return super().optimizer_step(
optimizer, model=model, optimizer_idx=optimizer_idx, closure=closure, **kwargs
)
if isinstance(optimizer, LBFGS):
raise MisconfigurationException(
f"Native AMP and the LBFGS optimizer are not compatible (optimizer {optimizer_idx})."
)
closure_result = closure()

if not _optimizer_handles_unscaling(optimizer):
# Unscaling needs to be performed here in case we are going to apply gradient clipping.
# Optimizers that perform unscaling in their `.step()` method are not supported (e.g., fused Adam).
# Note: `unscale` happens after the closure is executed, but before the `on_before_optimizer_step` hook.
self.scaler.unscale_(optimizer)

self._after_closure(model, optimizer, optimizer_idx)
skipped_backward = closure_result is None
# in manual optimization, the closure does not return a value
if not model.automatic_optimization or not skipped_backward:
# note: the scaler will skip the `optimizer.step` if nonfinite gradients are found
step_output = self.scaler.step(optimizer, **kwargs)
self.scaler.update()
return step_output
return closure_result

def clip_gradients(
self,
optimizer: Optimizer,
clip_val: Union[int, float] = 0.0,
gradient_clip_algorithm: GradClipAlgorithmType = GradClipAlgorithmType.NORM,
) -> None:
"""Handle grad clipping with scaler."""
if clip_val > 0 and _optimizer_handles_unscaling(optimizer):
raise RuntimeError(
f"The current optimizer, {type(optimizer).__qualname__}, does not allow for gradient clipping"
" because it performs unscaling of gradients internally. HINT: Are you using a 'fused' optimizer?"
)
super().clip_gradients(optimizer=optimizer, clip_val=clip_val, gradient_clip_algorithm=gradient_clip_algorithm)

@contextmanager
def forward_context(self) -> Generator[None, None, None]:
"""Enable autocast context."""
with torch.xpu.autocast(True):
yield

def state_dict(self) -> Dict[str, Any]:
"""Returns state dict of the plugin."""
if self.scaler is not None:
return self.scaler.state_dict()
return {}

def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
"""Loads state dict to the plugin."""
if self.scaler is not None:
self.scaler.load_state_dict(state_dict)


def _optimizer_handles_unscaling(optimizer: Any) -> bool:
"""Determines if a PyTorch optimizer handles unscaling gradients in the step method rather than through the scaler.

Since the current implementation of this function checks a PyTorch internal variable on the optimizer, the return
value will only be reliable for built-in PyTorch optimizers.
"""
return getattr(optimizer, "_step_supports_amp_scaling", False)
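The attribute probe above is the whole mechanism: fused optimizers mark themselves with `_step_supports_amp_scaling` and unscale inside `.step()`, which is why `clip_gradients` refuses to clip for them. A minimal sketch with dummy optimizer classes (stand-ins, not real `torch.optim` types):

```python
# Sketch of the check performed by _optimizer_handles_unscaling: the plugin
# only inspects a marker attribute on the optimizer instance.
def optimizer_handles_unscaling(optimizer):
    return getattr(optimizer, "_step_supports_amp_scaling", False)


class PlainOptimizer:
    """Stand-in for a regular optimizer that relies on the grad scaler."""


class FusedOptimizer:
    """Stand-in for a fused optimizer that unscales inside .step()."""
    _step_supports_amp_scaling = True


print(optimizer_handles_unscaling(PlainOptimizer()))  # False
print(optimizer_handles_unscaling(FusedOptimizer()))  # True
```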
@@ -0,0 +1,8 @@
"""Lightning strategy for single XPU device."""
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#

from .xpu_single import SingleXPUStrategy

__all__ = ["SingleXPUStrategy"]
@@ -0,0 +1,60 @@
"""Lightning strategy for single XPU device."""

# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#

from typing import Optional

import pytorch_lightning as pl
import torch
from lightning_fabric.plugins import CheckpointIO
from lightning_fabric.utilities.types import _DEVICE
from pytorch_lightning.plugins.precision import PrecisionPlugin
from pytorch_lightning.strategies import StrategyRegistry
from pytorch_lightning.strategies.single_device import SingleDeviceStrategy
from pytorch_lightning.utilities.exceptions import MisconfigurationException

from otx.algorithms.common.utils.utils import is_xpu_available


class SingleXPUStrategy(SingleDeviceStrategy):
"""Strategy for training on single XPU device."""

strategy_name = "xpu_single"

def __init__(
self,
device: _DEVICE = "xpu:0",
accelerator: Optional["pl.accelerators.Accelerator"] = None,
checkpoint_io: Optional[CheckpointIO] = None,
precision_plugin: Optional[PrecisionPlugin] = None,
):
if not is_xpu_available():
raise MisconfigurationException("`SingleXPUStrategy` requires XPU devices to run")

super().__init__(
accelerator=accelerator,
device=device,
checkpoint_io=checkpoint_io,
precision_plugin=precision_plugin,
)

@property
def is_distributed(self) -> bool:
"""Returns true if the strategy supports distributed training."""
return False

def setup_optimizers(self, trainer: "pl.Trainer") -> None:
"""Sets up optimizers."""
super().setup_optimizers(trainer)
if len(self.optimizers) != 1: # type: ignore
raise RuntimeError("XPU strategy doesn't support multiple optimizers")
model, optimizer = torch.xpu.optimize(trainer.model, optimizer=self.optimizers[0]) # type: ignore
self.optimizers = [optimizer]
trainer.model = model


StrategyRegistry.register(
SingleXPUStrategy.strategy_name, SingleXPUStrategy, description="Strategy that enables training on single XPU"
)
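The `setup_optimizers` override above enforces a single optimizer because it hands exactly one to `torch.xpu.optimize`. Its control flow can be sketched without torch; `fake_xpu_optimize` below stands in for `torch.xpu.optimize`, which returns an optimized (model, optimizer) pair.

```python
# Sketch of the single-optimizer guard in SingleXPUStrategy.setup_optimizers.
# fake_xpu_optimize is an illustrative stand-in for torch.xpu.optimize.
def fake_xpu_optimize(model, optimizer):
    return model, optimizer  # the real call returns IPEX-optimized objects


def setup_optimizers(model, optimizers):
    if len(optimizers) != 1:
        raise RuntimeError("XPU strategy doesn't support multiple optimizers")
    model, optimizer = fake_xpu_optimize(model, optimizers[0])
    return model, [optimizer]


model, opts = setup_optimizers("model", ["sgd"])
print(opts)  # ['sgd']

try:
    setup_optimizers("model", ["sgd", "adam"])
except RuntimeError as err:
    print(err)
```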
17 changes: 16 additions & 1 deletion src/otx/algorithms/anomaly/tasks/train.py
@@ -30,6 +30,8 @@

from otx.algorithms.anomaly.adapters.anomalib.callbacks import IterationTimer, ProgressCallback
from otx.algorithms.anomaly.adapters.anomalib.data import OTXAnomalyDataModule
from otx.algorithms.anomaly.adapters.anomalib.plugins.xpu_precision import MixedPrecisionXPUPlugin
from otx.algorithms.common.utils.utils import is_xpu_available
from otx.api.entities.datasets import DatasetEntity
from otx.api.entities.model import ModelEntity
from otx.api.entities.train_parameters import TrainParameters
@@ -90,7 +92,20 @@ def train(
IterationTimer(on_step=False),
]

self.trainer = Trainer(**config.trainer, logger=CSVLogger(self.project_path, name=""), callbacks=callbacks)
plugins = []
if config.trainer.plugins is not None:
plugins.extend(config.trainer.plugins)
config.trainer.pop("plugins")

if is_xpu_available():
config.trainer.strategy = "xpu_single"
config.trainer.accelerator = "xpu"
if config.trainer.precision == 16:
plugins.append(MixedPrecisionXPUPlugin())

self.trainer = Trainer(
**config.trainer, logger=CSVLogger(self.project_path, name=""), callbacks=callbacks, plugins=plugins
)
self.trainer.fit(model=self.model, datamodule=datamodule)

self.save_model(output_model)
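The branch added to `train` above derives the trainer settings from the config and XPU availability. Condensed into a small pure-Python helper for illustration: `xpu_available` stands in for the real `is_xpu_available()` call, the config is a plain dict rather than an OmegaConf object, and the plugin is represented by its class name as a string.

```python
# Sketch of the trainer-configuration branch added in train() above.
# xpu_available replaces is_xpu_available(); the string stands in for
# an actual MixedPrecisionXPUPlugin instance.
def build_trainer_kwargs(config, xpu_available):
    # Lift any preconfigured plugins out of the config, as the diff does.
    plugins = list(config.pop("plugins", None) or [])
    if xpu_available:
        config["strategy"] = "xpu_single"
        config["accelerator"] = "xpu"
        if config.get("precision") == 16:
            plugins.append("MixedPrecisionXPUPlugin")
    return config, plugins


cfg, plugins = build_trainer_kwargs({"precision": 16, "plugins": None}, True)
print(cfg["strategy"], plugins)  # xpu_single ['MixedPrecisionXPUPlugin']
```

Note that `plugins` is popped out of the config before the `Trainer(**config.trainer, ...)` call precisely so it can be passed separately with the XPU precision plugin appended.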
@@ -0,0 +1,8 @@
"""Adapters of classification - mmcls."""

# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from .train import train_model

__all__ = ["train_model"]
