This PR adds a webcam demo tool with the following features:

1. Read video stream from the webcam or an offline video file
2. Async model inference (detection + human pose + animal pose) and video I/O
3. Optimized visualization functions for bbox, object label, and pose
4. Apply special effects (sunglasses or bug-eye)
5. Show statistics, e.g., FPS and CPU usage
6. Optionally, save the output video

Showing 21 changed files with 1,175 additions and 48 deletions.
## Webcam Demo

We provide a webcam demo tool which integrates detection and 2D pose estimation for humans and animals. You can simply run the following command:

```shell
python demo/webcam_demo.py
```

It will launch a window to display the webcam video stream with detection and pose estimation results:

<div align="center">
    <img src="https://user-images.githubusercontent.com/15977946/124059525-ce20c580-da5d-11eb-8e4a-2d96cd31fe9f.gif" width="600px" alt><br>
</div>

### Usage Tips

- **Which model is used in the demo tool?**

  Please check the following default arguments in the script. You can also choose other models from the [MMDetection Model Zoo](https://github.com/open-mmlab/mmdetection/blob/master/docs/model_zoo.md) and [MMPose Model Zoo](https://mmpose.readthedocs.io/en/latest/modelzoo.html#) or use your own models.

  | Model | Arguments |
  | :--: | :-- |
  | Detection | `--det-config`, `--det-checkpoint` |
  | Human Pose | `--human-pose-config`, `--human-pose-checkpoint` |
  | Animal Pose | `--animal-pose-config`, `--animal-pose-checkpoint` |
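
  For example, to run the demo with a different detection model (the config and checkpoint paths below are placeholders):

  ```shell
  python demo/webcam_demo.py \
      --det-config path/to/det_config.py \
      --det-checkpoint path/to/det_checkpoint.pth
  ```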

- **Can this tool run without GPU?**

  Yes, you can set `--device=cpu` and the model inference will be performed on the CPU. Of course, this may result in a lower inference FPS than using GPU devices.
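
  A minimal example:

  ```shell
  python demo/webcam_demo.py --device=cpu
  ```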

- **Why is there a time delay between the pose visualization and the video?**

  The video I/O and model inference run asynchronously, and the latter usually takes more time for a single frame. To alleviate the time delay, you can:

  1. set `--display-delay=MILLISECONDS` to defer the video stream, according to the inference delay shown at the top-left corner; or
  2. set `--synchronous-mode` to force the video stream to be aligned with the inference results. This may reduce the video display FPS.
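
  For example (the delay value below is illustrative):

  ```shell
  # Defer the displayed video stream by 100 ms
  python demo/webcam_demo.py --display-delay=100

  # Or strictly align the video stream with the inference results
  python demo/webcam_demo.py --synchronous-mode
  ```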

- **Can this tool process video files?**

  Yes. You can set `--cam_id=VIDEO_FILE_PATH` to run the demo tool in offline mode on a video file. Note that `--synchronous-mode` should be set in this case.
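
  For example (the file path is a placeholder):

  ```shell
  python demo/webcam_demo.py --cam_id=/path/to/video.mp4 --synchronous-mode
  ```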

- **How to enable/disable the special effects?**

  The special effects can be enabled/disabled at launch time by setting arguments like `--bugeye`, `--sunglasses`, *etc*. You can also toggle the effects with keyboard shortcuts like `b` and `s` while the tool is running.
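
  For example:

  ```shell
  # Start with the sunglasses effect enabled; press `s` or `b` to toggle effects at runtime
  python demo/webcam_demo.py --sunglasses
  ```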

- **What if my computer doesn't have a camera?**

  You can use a smart phone as a webcam with apps like [Camo](https://reincubate.com/camo/) or [DroidCam](https://www.dev47apps.com/).

The commit also adds a detection model config (YOLOv3 with a Darknet-53 backbone, trained on COCO):

```python
# model settings
model = dict(
    type='YOLOV3',
    pretrained='open-mmlab://darknet53',
    backbone=dict(type='Darknet', depth=53, out_indices=(3, 4, 5)),
    neck=dict(
        type='YOLOV3Neck',
        num_scales=3,
        in_channels=[1024, 512, 256],
        out_channels=[512, 256, 128]),
    bbox_head=dict(
        type='YOLOV3Head',
        num_classes=80,
        in_channels=[512, 256, 128],
        out_channels=[1024, 512, 256],
        anchor_generator=dict(
            type='YOLOAnchorGenerator',
            # anchor sizes per output scale, ordered from the coarsest
            # (stride 32) to the finest (stride 8) feature map
            base_sizes=[[(116, 90), (156, 198), (373, 326)],
                        [(30, 61), (62, 45), (59, 119)],
                        [(10, 13), (16, 30), (33, 23)]],
            strides=[32, 16, 8]),
        bbox_coder=dict(type='YOLOBBoxCoder'),
        featmap_strides=[32, 16, 8],
        loss_cls=dict(
            type='CrossEntropyLoss',
            use_sigmoid=True,
            loss_weight=1.0,
            reduction='sum'),
        loss_conf=dict(
            type='CrossEntropyLoss',
            use_sigmoid=True,
            loss_weight=1.0,
            reduction='sum'),
        loss_xy=dict(
            type='CrossEntropyLoss',
            use_sigmoid=True,
            loss_weight=2.0,
            reduction='sum'),
        loss_wh=dict(type='MSELoss', loss_weight=2.0, reduction='sum')),
    # training and testing settings
    train_cfg=dict(
        assigner=dict(
            type='GridAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0)),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        conf_thr=0.005,
        nms=dict(type='nms', iou_threshold=0.45),
        max_per_img=100))
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/coco'
# mean 0 / std 255 scales pixel values to the [0, 1] range
img_norm_cfg = dict(mean=[0, 0, 0], std=[255., 255., 255.], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile', to_float32=True),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Expand',
        mean=img_norm_cfg['mean'],
        to_rgb=img_norm_cfg['to_rgb'],
        ratio_range=(1, 2)),
    dict(
        type='MinIoURandomCrop',
        min_ious=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9),
        min_crop_size=0.3),
    dict(type='Resize', img_scale=(320, 320), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(320, 320),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=f'{data_root}/annotations/instances_train2017.json',
        img_prefix=f'{data_root}/train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=f'{data_root}/annotations/instances_val2017.json',
        img_prefix=f'{data_root}/val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=f'{data_root}/annotations/instances_val2017.json',
        img_prefix=f'{data_root}/val2017/',
        pipeline=test_pipeline))
# optimizer
optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=2000,  # same as burn-in in darknet
    warmup_ratio=0.1,
    step=[218, 246])
# runtime settings
runner = dict(type='EpochBasedRunner', max_epochs=273)
evaluation = dict(interval=1, metric=['bbox'])

checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
custom_hooks = [dict(type='NumClassCheckHook')]

dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
```
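
This config can be supplied to the demo tool through the `--det-config` argument described above. A minimal sketch, assuming the config is saved to a file; both the config and checkpoint paths below are placeholders:

```shell
python demo/webcam_demo.py \
    --det-config path/to/yolov3_d53_320_273e_coco.py \
    --det-checkpoint path/to/yolov3_checkpoint.pth
```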