
Auto adapting batch size #2119

Merged: 25 commits into openvinotoolkit:develop on May 12, 2023

Conversation

eunwoosh
Contributor

@eunwoosh eunwoosh commented May 9, 2023

Summary

  • Implement a feature that automatically adapts the batch size to the current GPU memory size.
  • It supports the action, classification, detection, and segmentation tasks.

Detail

This feature finds a batch size that uses most of the available GPU memory. It generally increases the batch size, so it can reduce training time compared to using a small batch size for the same number of epochs. To make it work well, a few techniques are used. First, it looks for a batch size that is just big enough rather than the maximum runnable one: finding the maximum takes considerably longer, while the advantage over a big-enough value is small. Second, it estimates an optimal batch size from an estimated equation, which saves much time compared to other search methods. Finally, when the batch size is increased k times, the learning rate is increased sqrt(k) times; this is based on experiments and is also supported theoretically.
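As an illustration of the points above, here is a minimal sketch (not the PR's actual implementation; the function names and the doubling search are assumptions for illustration) of square-root learning-rate scaling and a "big enough" batch size search:

```python
import math


def scale_learning_rate(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Square-root scaling: if the batch size grows k times,
    grow the learning rate sqrt(k) times."""
    k = new_batch / base_batch
    return base_lr * math.sqrt(k)


def find_big_enough_batch_size(trial, start: int = 2, upper: int = 512):
    """Doubling search for a 'big enough' batch size.

    `trial(bs)` should return True if a short training run with batch
    size `bs` fits in GPU memory, False otherwise. Doubling stops at
    the first failure, so the search stays cheap compared to pinpointing
    the exact maximum runnable batch size.
    """
    bs, last_ok = start, None
    while bs <= upper:
        if not trial(bs):
            break
        last_ok = bs
        bs *= 2
    return last_ok


# Example: batch size grows from 8 to 32 (k = 4), so the LR grows sqrt(4) = 2 times.
print(scale_learning_rate(0.01, 8, 32))  # 0.02
```

In a real implementation, `trial` would run a couple of forward/backward passes and report whether a CUDA out-of-memory error occurred.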

How to test

You can use the feature by adding --learning_parameters.auto_adapt_batch_size Full.
You can also test the feature by running pytest -k auto_adapt_batch_size tests/integration/cli/

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have added e2e tests for validation.
  • I have added the description of my changes into CHANGELOG in my target branch (e.g., CHANGELOG in develop).
  • I have updated the documentation in my target branch accordingly (e.g., documentation in develop).
  • I have linked related issues.

License

  • I submit my code changes under the same Apache License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

@eunwoosh eunwoosh added this to the 1.3.0 milestone May 9, 2023
@github-actions github-actions bot added ALGO Any changes in OTX Algo Tasks implementation TEST Any changes in tests DOC Improvements or additions to documentation labels May 9, 2023
@eunwoosh eunwoosh marked this pull request as ready for review May 9, 2023 01:32
@eunwoosh eunwoosh requested a review from a team as a code owner May 9, 2023 01:32
@github-actions github-actions bot added the CLI Any changes in OTE CLI label May 9, 2023
Contributor

@harimkang harimkang left a comment


I left one comment.

otx/algorithms/detection/adapters/mmdet/task.py (outdated, resolved)
Contributor

@sungmanc sungmanc left a comment


Why do we need to separate auto_decrease_batch_size and auto_adapt_batch_size?
How about unifying the parameters? auto_adapt_batch_size could include the concept of auto_decrease_batch_size.

@eunwoosh
Contributor Author

eunwoosh commented May 9, 2023

Why do we need to separate auto_decrease_batch_size and auto_adapt_batch_size? How about unifying the parameters? auto_adapt_batch_size could include the concept of auto_decrease_batch_size.

IMHO, because they have different purposes. The purpose of auto_decrease_batch_size is to decrease the batch size only when the current batch size doesn't fit in GPU memory. In contrast, the purpose of auto_adapt_batch_size is to make training faster by increasing the batch size. I'm also concerned that if we unify them, sem-seg and ins-seg will use auto_adapt_batch_size by default. Currently, auto_decrease_batch_size is turned on by default in those tasks because they easily cause GPU OOM.

@harimkang
Contributor

harimkang commented May 9, 2023

How about making auto_adapt_batch_size a single key that takes a str, e.g. auto_adapt_batch_size in ("decrease", "auto") (just an example)?
I don't see the need for two separate parameters. Wouldn't it be nicer to have multiple behaviors in one?

@eunwoosh
Contributor Author

eunwoosh commented May 9, 2023

How about making auto_adapt_batch_size a single key that takes a str, e.g. auto_adapt_batch_size in ("decrease", "auto") (just an example)? I don't see the need for two separate parameters. Wouldn't it be nicer to have multiple behaviors in one?

I think it's a good idea. I'll apply it after asking others' opinions.
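For illustration, the unified parameter could look like a small enum. This is a hypothetical sketch: the member names and string values here are assumptions, not necessarily what the PR merged as its BatchSizeAdaptType.

```python
from enum import Enum


class BatchSizeAdaptType(Enum):
    """Hypothetical unified auto_adapt_batch_size parameter."""

    NONE = "none"  # keep the configured batch size as-is
    SAFE = "safe"  # only decrease when the batch size doesn't fit in GPU memory
    FULL = "full"  # increase the batch size to use most of the GPU memory


def should_search_upward(adapt_type: BatchSizeAdaptType) -> bool:
    # Only FULL mode tries to enlarge the batch size to speed up training.
    return adapt_type is BatchSizeAdaptType.FULL
```

With this shape, tasks that easily hit GPU OOM (sem-seg, ins-seg) could default to SAFE while others default to NONE, preserving the two distinct behaviors inside one parameter.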

@eunwoosh eunwoosh force-pushed the es/increase_gpu_mem branch from 9bbb098 to 1dea19b Compare May 11, 2023 01:12
@eunwoosh
Contributor Author

I combined the separate arguments into a single one and applied the other comments. Could you review my PR? @harimkang @supersoob @sungmanc

Contributor

@sungmanc sungmanc left a comment


LGTM, with minor comments

sungmanc
sungmanc previously approved these changes May 11, 2023
Contributor

@sungmanc sungmanc left a comment


Thanks very much for applying my minor suggestion :), LGTM.

harimkang
harimkang previously approved these changes May 11, 2023
supersoob
supersoob previously approved these changes May 11, 2023
@eunwoosh eunwoosh dismissed stale reviews from supersoob, harimkang, and sungmanc via cbf2dc0 May 11, 2023 14:56
@eunwoosh eunwoosh merged commit 7db5bbb into openvinotoolkit:develop May 12, 2023
vinnamkim pushed a commit that referenced this pull request May 15, 2023
* refactore bs_search_algo

* implement draft

* refine big bs search algorithm

* refine algorithm

* enable auto adapt bs to detection task

* enable auto adapt bs to other tasks

* update test_automatic_bs.py

* refine bs estimation algo & implement unit test

* align with pre-commit

* implement intg test

* update changelog

* fix typo

* fix typo

* disable auto_adapt_batch_size while HPO

* update algorithm

* change interface

* fix typo

* exclude validation in cls task

* use root scale to both cases

* update test code

* add comment to BatchSizeAdaptType enum

* refine sentence

* update hpo.py

* update unused test case

* fix typo