[Refactor] Make sure the pipeline argument shape is in WH order (#9324)
* Keep input wh shape format in pipeline

* update

* fix lint

* add doc

* update

* update

* update

* update

* fix lint

* update title

* fix comment
hhaAndroid authored Dec 20, 2022
1 parent 92e2eb3 commit 7185b5a
Showing 7 changed files with 151 additions and 78 deletions.
3 changes: 2 additions & 1 deletion configs/yolox/yolox_s_8xb8-300e_coco.py
@@ -1,6 +1,6 @@
_base_ = ['../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py']

-img_scale = (640, 640)  # height, width
+img_scale = (640, 640)  # width, height

# model settings
model = dict(
@@ -83,6 +83,7 @@
dict(
type='RandomAffine',
scaling_ratio_range=(0.1, 2),
# img_scale is (width, height)
border=(-img_scale[0] // 2, -img_scale[1] // 2)),
dict(
type='MixUp',
3 changes: 2 additions & 1 deletion configs/yolox/yolox_tiny_8xb8-300e_coco.py
@@ -13,7 +13,7 @@
neck=dict(in_channels=[96, 192, 384], out_channels=96),
bbox_head=dict(in_channels=96, feat_channels=96))

-img_scale = (640, 640)  # height, width
+img_scale = (640, 640)  # width, height

# file_client_args = dict(
# backend='petrel',
@@ -28,6 +28,7 @@
dict(
type='RandomAffine',
scaling_ratio_range=(0.5, 1.5),
# img_scale is (width, height)
border=(-img_scale[0] // 2, -img_scale[1] // 2)),
dict(type='YOLOXHSVRandomAug'),
dict(type='RandomFlip', prob=0.5),
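
Why the index order in `border` matters can be sketched with a non-square scale. This is only an illustration, not the library's implementation: `affine_output_hw` is a hypothetical helper, and it assumes that the canvas entering `RandomAffine` is twice `img_scale` per side (as after a mosaic step) and that the border is added once on each side of every dimension.

```python
from typing import Tuple


def affine_output_hw(canvas_hw: Tuple[int, int],
                     border_wh: Tuple[int, int]) -> Tuple[int, int]:
    """Hypothetical helper: canvas size after adding the border on both sides.

    Assumes the border is given as (width, height) and applied twice per
    dimension, once on each side.
    """
    canvas_h, canvas_w = canvas_hw
    border_w, border_h = border_wh
    return canvas_h + 2 * border_h, canvas_w + 2 * border_w


img_scale = (960, 640)  # (width, height), deliberately non-square
border = (-img_scale[0] // 2, -img_scale[1] // 2)  # (-480, -320)

# Assumed mosaic canvas: twice img_scale per side, stored as (height, width).
mosaic_hw = (img_scale[1] * 2, img_scale[0] * 2)  # (1280, 1920)

print(affine_output_hw(mosaic_hw, border))  # (640, 960)
```

With a square `img_scale` such as `(640, 640)` both orders give the same border, which is why a swapped order can go unnoticed in these configs.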
33 changes: 33 additions & 0 deletions docs/en/advanced_guides/conventions.md
@@ -2,6 +2,39 @@

Please check the following conventions if you would like to modify MMDetection as your own project.

## About the order of image shape

In OpenMMLab 2.0, to be consistent with the input arguments of OpenCV, arguments that specify an image shape in the data transformation pipeline are always in `(width, height)` order. In contrast, for computational convenience, the shape fields that pass through the data pipeline and the model are in `(height, width)` order. Specifically, in the results produced by each data transform, the fields and the meaning of their values are as follows:

- img_shape: (height, width)
- ori_shape: (height, width)
- pad_shape: (height, width)
- batch_input_shape: (height, width)

As an example, the initialization arguments of `Mosaic` are as below:

```python
@TRANSFORMS.register_module()
class Mosaic(BaseTransform):
    def __init__(self,
                 img_scale: Tuple[int, int] = (640, 640),
                 center_ratio_range: Tuple[float, float] = (0.5, 1.5),
                 bbox_clip_border: bool = True,
                 pad_val: float = 114.0,
                 prob: float = 1.0) -> None:
        ...

        # img_scale order should be (width, height)
        self.img_scale = img_scale

    def transform(self, results: dict) -> dict:
        ...

        results['img'] = mosaic_img
        # (height, width)
        results['img_shape'] = mosaic_img.shape[:2]
```
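
The relationship between the two orders can be sketched without any library. This is a minimal illustration under stated assumptions: `make_blank_image` and `shape_hw` are hypothetical helpers, and the nested list stands in for a row-major image array whose `shape[:2]` would be `(height, width)`.

```python
from typing import Dict, List, Tuple


def make_blank_image(scale_wh: Tuple[int, int]) -> List[List[int]]:
    """Hypothetical helper: build a row-major image from a (width, height) argument."""
    width, height = scale_wh
    # Row-major storage: `height` rows, each holding `width` pixels.
    return [[0] * width for _ in range(height)]


def shape_hw(img: List[List[int]]) -> Tuple[int, int]:
    """Return (height, width), mirroring `shape[:2]` of a row-major image array."""
    return len(img), len(img[0])


img_scale = (1333, 800)  # pipeline argument: (width, height)
img = make_blank_image(img_scale)

# Result fields follow the array layout, i.e. (height, width).
results: Dict[str, object] = {'img': img, 'img_shape': shape_hw(img)}
print(results['img_shape'])  # (800, 1333)
```

The argument is unpacked as width first, while the stored shape reads rows first, so the two tuples are reversed relative to each other.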

## Loss

In MMDetection, a `dict` containing losses and metrics will be returned by `model(**data)`.
34 changes: 34 additions & 0 deletions docs/zh_cn/advanced_guides/conventions.md
@@ -2,6 +2,40 @@

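If you want to adapt MMDetection into your own project, please follow the conventions below.

## About the order of image shape

In OpenMMLab 2.0, to be consistent with the input arguments of OpenCV, the input arguments about image shape in the image processing pipeline are always in `(width, height)` order.
In contrast, for computational convenience, the fields that pass through the pipeline and the model are in `(height, width)` order. Specifically, in the results processed by each data pipeline, the fields and the meaning of their values are as follows:

- img_shape: (height, width)
- ori_shape: (height, width)
- pad_shape: (height, width)
- batch_input_shape: (height, width)

Taking `Mosaic` as an example, its initialization arguments are as follows: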

```python
@TRANSFORMS.register_module()
class Mosaic(BaseTransform):
    def __init__(self,
                 img_scale: Tuple[int, int] = (640, 640),
                 center_ratio_range: Tuple[float, float] = (0.5, 1.5),
                 bbox_clip_border: bool = True,
                 pad_val: float = 114.0,
                 prob: float = 1.0) -> None:
        ...

        # img_scale order should be (width, height)
        self.img_scale = img_scale

    def transform(self, results: dict) -> dict:
        ...

        results['img'] = mosaic_img
        # (height, width)
        results['img_shape'] = mosaic_img.shape[:2]
```

## Loss

In MMDetection, `model(**data)` returns a dict containing all the losses and evaluation metrics.
4 changes: 2 additions & 2 deletions mmdet/datasets/transforms/formatting.py
@@ -21,10 +21,10 @@ class PackDetInputs(BaseTransform):
- ``img_path``: path to the image file
-  - ``ori_shape``: original shape of the image as a tuple (h, w, c)
+  - ``ori_shape``: original shape of the image as a tuple (h, w)
- ``img_shape``: shape of the image input to the network as a tuple \
-    (h, w, c). Note that images may be zero padded on the \
+    (h, w). Note that images may be zero padded on the \
bottom/right if the batch tensor is larger than this shape.
- ``scale_factor``: a float indicating the preprocessing scale