
Auto adapting batch size #2119

Merged: 25 commits into openvinotoolkit:develop on May 12, 2023

Conversation

eunwoosh
Contributor

@eunwoosh eunwoosh commented May 9, 2023

Summary

  • Implement a feature that automatically adapts the batch size to the current GPU memory size.
  • It supports the action, classification, detection, and segmentation tasks.

Detail

This feature finds a batch size that uses most of the available GPU memory. It generally increases the batch size, so it can reduce training time compared to using a small batch size for the same number of epochs. To make it work well, a few techniques are used. First, it looks for a batch size that is just big enough rather than the maximum runnable one: finding the maximum takes considerably longer, while the advantage over a big-enough value is small. Second, it estimates an optimal batch size from an estimated equation, which saves much time compared to other search methods. Finally, when the batch size is increased k times, the learning rate is increased sqrt(k) times; this is based on experiments and is also supported theoretically.
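As an illustration of the points above, here is a minimal sketch (not the PR's actual implementation; the function names and the doubling search are assumptions for illustration) of square-root learning-rate scaling and a "big enough" batch size search:

```python
import math


def scale_learning_rate(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Square-root scaling: if the batch size grows k times,
    grow the learning rate sqrt(k) times."""
    k = new_batch / base_batch
    return base_lr * math.sqrt(k)


def find_big_enough_batch_size(trial, start: int = 2, upper: int = 512):
    """Doubling search for a 'big enough' batch size.

    `trial(bs)` should return True if a short training run with batch
    size `bs` fits in GPU memory, False otherwise. Doubling stops at
    the first failure, so the search stays cheap compared to pinpointing
    the exact maximum runnable batch size.
    """
    bs, last_ok = start, None
    while bs <= upper:
        if not trial(bs):
            break
        last_ok = bs
        bs *= 2
    return last_ok


# Example: batch size grows from 8 to 32 (k = 4), so the LR grows sqrt(4) = 2 times.
print(scale_learning_rate(0.01, 8, 32))  # 0.02
```

In a real implementation, `trial` would run a couple of forward/backward passes and report whether a CUDA out-of-memory error occurred.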

How to test

You can use the feature by adding --learning_parameters.auto_adapt_batch_size Full.
You can also test the feature by running pytest -k auto_adapt_batch_size tests/integration/cli/

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have added e2e tests for validation.
  • I have added the description of my changes into CHANGELOG in my target branch (e.g., CHANGELOG in develop).
  • I have updated the documentation in my target branch accordingly (e.g., documentation in develop).
  • I have linked related issues.

License

  • I submit my code changes under the same Apache License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

@eunwoosh eunwoosh added this to the 1.3.0 milestone May 9, 2023
@github-actions github-actions bot added ALGO Any changes in OTX Algo Tasks implementation TEST Any changes in tests DOC Improvements or additions to documentation labels May 9, 2023
@eunwoosh eunwoosh marked this pull request as ready for review May 9, 2023 01:32
@eunwoosh eunwoosh requested a review from a team as a code owner May 9, 2023 01:32
@github-actions github-actions bot added the CLI Any changes in OTE CLI label May 9, 2023
Contributor

@harimkang harimkang left a comment


I left one comment.

otx/algorithms/detection/adapters/mmdet/task.py (outdated, resolved)
Contributor

@sungmanc sungmanc left a comment


Why do we need to separate auto_decrease_batch_size and auto_adapt_batch_size?
How about unifying the parameters? auto_adapt_batch_size could include the concept of auto_decrease_batch_size.

@eunwoosh
Contributor Author

eunwoosh commented May 9, 2023

Why do we need to separate auto_decrease_batch_size and auto_adapt_batch_size? How about unifying the parameters? auto_adapt_batch_size could include the concept of auto_decrease_batch_size.

IMHO, because they have different purposes. The purpose of auto_decrease_batch_size is to decrease the batch size only when the current batch size doesn't fit in GPU memory. In contrast, the purpose of auto_adapt_batch_size is to make training faster by increasing the batch size. I'm also concerned that if we unify them, sem-seg and ins-seg will use auto_adapt_batch_size by default. Currently, auto_decrease_batch_size is turned on by default in those tasks because they easily cause GPU OOM.

@harimkang
Contributor

harimkang commented May 9, 2023

How about making auto_adapt_batch_size a single key that takes a str, e.g. auto_adapt_batch_size in ("decrease", "auto") (just an example)?
I don't see the need for two separate parameters. Wouldn't it be nicer to have multiple behaviors in one?

@eunwoosh
Contributor Author

eunwoosh commented May 9, 2023

How about making auto_adapt_batch_size a single key that takes a str, e.g. auto_adapt_batch_size in ("decrease", "auto") (just an example)? I don't see the need for two separate parameters. Wouldn't it be nicer to have multiple behaviors in one?

I think it's a good idea. I'll apply it after asking others' opinions.
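For illustration, the unified parameter could look like a small enum. This is a hypothetical sketch: the member names and string values here are assumptions, not necessarily what the PR merged as its BatchSizeAdaptType.

```python
from enum import Enum


class BatchSizeAdaptType(Enum):
    """Hypothetical unified auto_adapt_batch_size parameter."""

    NONE = "none"  # keep the configured batch size as-is
    SAFE = "safe"  # only decrease when the batch size doesn't fit in GPU memory
    FULL = "full"  # increase the batch size to use most of the GPU memory


def should_search_upward(adapt_type: BatchSizeAdaptType) -> bool:
    # Only FULL mode tries to enlarge the batch size to speed up training.
    return adapt_type is BatchSizeAdaptType.FULL
```

With this shape, tasks that easily hit GPU OOM (sem-seg, ins-seg) could default to SAFE while others default to NONE, preserving the two distinct behaviors inside one parameter.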

@eunwoosh eunwoosh force-pushed the es/increase_gpu_mem branch from 9bbb098 to 1dea19b Compare May 11, 2023 01:12
@eunwoosh
Contributor Author

I combined the separate arguments into a single one and applied the other comments. Could you review my PR? @harimkang @supersoob @sungmanc

Contributor

@sungmanc sungmanc left a comment


LGTM, with minor comments

sungmanc
sungmanc previously approved these changes May 11, 2023
Contributor

@sungmanc sungmanc left a comment


Thanks very much for applying my minor suggestion :), LGTM.

harimkang
harimkang previously approved these changes May 11, 2023
supersoob
supersoob previously approved these changes May 11, 2023
@eunwoosh eunwoosh dismissed stale reviews from supersoob, harimkang, and sungmanc via cbf2dc0 May 11, 2023 14:56
@eunwoosh eunwoosh merged commit 7db5bbb into openvinotoolkit:develop May 12, 2023
vinnamkim pushed a commit that referenced this pull request May 15, 2023
* refactore bs_search_algo

* implement draft

* refine big bs search algorithm

* refine algorithm

* enable auto adapt bs to detection task

* enable auto adapt bs to other tasks

* update test_automatic_bs.py

* refine bs estimation algo & implement unit test

* align with pre-commit

* implement intg test

* update changelog

* fix typo

* fix typo

* disable auto_adapt_batch_size while HPO

* update algorithm

* change interface

* fix typo

* exclude validation in cls task

* use root scale to both cases

* update test code

* add comment to BatchSizeAdaptType enum

* refine sentence

* update hpo.py

* update unused test case

* fix typo