initial commit
wymanCV committed Sep 14, 2023
1 parent 0ff774b commit 41c11cb
Showing 60 changed files with 11,153 additions and 2 deletions.
179 changes: 177 additions & 2 deletions README.md
# [Novel Scenes & Classes: Towards Adaptive Open-set Object Detection (ICCV-23 ORAL)](assets/paper.pdf)

By [Wuyang Li](https://wymancv.github.io/wuyang.github.io/)

The paper link will be updated once the CVF open-access version is available.

<div align=center>
<img src="./assets/mot.png" width="400">
</div>

Domain Adaptive Object Detection (DAOD) strongly assumes a shared class space between the source and target domains.

This work breaks that assumption and formulates Adaptive Open-set Object Detection (AOOD), which allows the target domain to contain novel-class objects.

The object detector is trained with base-class labels in the source domain, and aims to detect base-class objects and identify novel-class objects as unknown in the target domain.

If you have any ideas or problems you would like to discuss, feel free to reach out via [E-mail](mailto:wuyangli2-c@my.cityu.edu.hk).

# 💡 Preparation

## Step 1: Clone and Install the Project

### Clone the repository

```bash
git clone https://github.com/CityU-AIM-Group/SOMA.git
```

### Install the project following [Deformable DETR](https://github.com/fundamentalvision/Deformable-DETR)

Note that the following reflects our experimental environment, which differs slightly from the official one.

```bash
# Linux, CUDA>=9.2, GCC>=5.4
# (ours) CUDA=10.2, GCC=8.4, NVIDIA V100
# Create the conda environment
conda create -n aood python=3.7 pip
conda activate aood
conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
# Compile the CUDA operators for deformable attention
cd ./models/ops
sh ./make.sh
# Unit test (you should see all checks printed as True)
python test.py
# NOTE: if you hit a permission-denied issue when starting training
cd ../../
chmod -R 777 ./
```

## Step 2: Download Necessary Resources

### Download pre-processed datasets (VOC format) from the following links

| | (Foggy) Cityscapes | Pascal VOC | Clipart | BDD100K |
| :------------: | :------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------: |
| Official Links | [Imgs](https://www.cityscapes-dataset.com/login/) | [Imgs+Labels](https://pjreddie.com/projects/pascal-voc-dataset-mirror/) | - | - |
| Our Links | [Labels](https://portland-my.sharepoint.com/:u:/g/personal/wuyangli2-c_my_cityu_edu_hk/EVNAjK2JkG9ChREzzqdqJkYBLoZ_VOqkMdhWasN_BETGWw?e=fP9Ae4) | - | [Imgs+Labels](https://portland-my.sharepoint.com/:u:/g/personal/wuyangli2-c_my_cityu_edu_hk/Edz2YcXHuStIqwM_NA7k8FMBGLeyAGQcSjdSR-vYaVx_vw?e=es6KDW) | [Imgs+Labels](https://portland-my.sharepoint.com/:u:/g/personal/wuyangli2-c_my_cityu_edu_hk/EeiO6O36QgZKnTcUZMInACIB0dfWEg4OFyoEZnZCkibKHA?e=6byqBX) |

### Download DINO-pretrained ResNet-50 from this [link](https://portland-my.sharepoint.com/:u:/g/personal/wuyangli2-c_my_cityu_edu_hk/EVnK9IPi91ZPuNmwpeSWGHABqhSFQK52I7xGzroXKeuyzA?e=EnlwgO)

## Step 3: Change the Paths

### Organize the datasets as follows.

```
[DATASET_PATH]
└─ Cityscapes
└─ AOOD_Annotations
└─ AOOD_Main
└─ train_source.txt
└─ train_target.txt
└─ val_source.txt
└─ val_target.txt
└─ leftImg8bit
└─ train
└─ val
└─ leftImg8bit_foggy
└─ train
└─ val
└─ bdd_daytime
└─ Annotations
└─ ImageSets
└─ JPEGImages
└─ clipart
└─ Annotations
└─ ImageSets
└─ JPEGImages
└─ VOCdevkit
└─ VOC2007
└─ VOC2012
```

### Change the data root folder in config files

Replace `DATASET.COCO_PATH` in all yaml files under [configs](configs) with your data root `$DATASET_PATH`, e.g., Line 22 of [soma_aood_city_to_foggy_r50.yaml](configs/soma_aood_city_to_foggy_r50.yaml).
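
If you prefer to do this from the command line, a one-liner like the following can update every config at once (a minimal sketch, assuming each yaml stores the data root under a literal `COCO_PATH:` key; adjust the path to your setup):

```bash
# Sketch: point every config's COCO_PATH entry at your data root.
# Assumes each yaml file contains a line starting with "COCO_PATH:".
DATASET_PATH=/path/to/your/data_root
sed -i "s|COCO_PATH:.*|COCO_PATH: ${DATASET_PATH}|" configs/*.yaml
```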

### Change the path of DINO-pretrained backbone

Replace the backbone loading path at Line 107 of [backbone.py](models/backbone.py) with the path to the downloaded DINO-pretrained ResNet-50 weights.

# 🔥 Start Training

We use two GPUs for training with 2 source images and 2 target images as input.

```bash
GPUS_PER_NODE=2
./tools/run_dist_launch.sh 2 python main.py --config_file {CONFIG_FILE} --opts DATASET.AOOD_SETTING 1
```

We provide the scripts used in our experiments in [run.sh](./run.sh). The settings given after `--opts` override the default config file, following the maskrcnn-benchmark convention.
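
For example, the City-to-Foggy run could be launched as follows (a sketch that substitutes the config file mentioned above into the command; see [run.sh](./run.sh) for the exact settings we used):

```bash
GPUS_PER_NODE=2
./tools/run_dist_launch.sh 2 python main.py \
    --config_file configs/soma_aood_city_to_foggy_r50.yaml \
    --opts DATASET.AOOD_SETTING 1
```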

# 📦 Well-trained models

Will be provided later

<!-- | Source| Target| Task | mAP $_b$ | AR $_n$ | WI | AOSE | AP@75 | checkpoint |
| :-----:| :-----:| :-----:| :-----:| :-----:| :-----:| :-----:| :-----:| :-----:
| City |Foggy | het-sem |
| City |Foggy | het-sem |
| City |Foggy | het-sem |
| City |Foggy | het-sem | -->


# 💬 Notification

- The core idea is to select informative motifs (which can be treated as mix-ups of object queries) for self-training.
- You can try the DA version of [OW-DETR](https://github.com/akshitac8/OW-DETR) in this repository by setting the flag below (see the example command after this list):
```
--opts AOOD.OW_DETR_ON True
```
- Adopting SAM to address AOOD may be a good direction.
- To visualize unknown boxes, post-processing is needed at Line 736 of [PostProcess](models/motif_detr.py).
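
As an example, enabling the OW-DETR variant on top of the standard training command might look like the following (a sketch that simply combines the command from the training section with the flag above; verify against [run.sh](./run.sh) for the exact settings):

```bash
GPUS_PER_NODE=2
./tools/run_dist_launch.sh 2 python main.py \
    --config_file configs/soma_aood_city_to_foggy_r50.yaml \
    --opts DATASET.AOOD_SETTING 1 AOOD.OW_DETR_ON True
```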

# 📝 Citation

If you find this work helpful for your project, please give it a star and a citation. We sincerely appreciate your acknowledgment.

```BibTeX
@InProceedings{li2023novel,
title={Novel Scenes & Classes: Towards Adaptive Open-set Object Detection},
author={Li, Wuyang and Guo, Xiaoqing and Yuan, Yixuan},
booktitle={ICCV},
year={2023}
}
```

Related project:

This work explores a similar issue for the classification task. [[link]](https://openaccess.thecvf.com/content/CVPR2023/html/Li_Adjustment_and_Alignment_for_Unbiased_Open_Set_Domain_Adaptation_CVPR_2023_paper.html)

```BibTeX
@InProceedings{Li_2023_CVPR,
author = {Li, Wuyang and Liu, Jie and Han, Bo and Yuan, Yixuan},
title = {Adjustment and Alignment for Unbiased Open Set Domain Adaptation},
booktitle = {CVPR},
year = {2023},
}
```

# 🤞 Acknowledgements

We greatly appreciate the tremendous effort behind the following works.

- This work is based on the DAOD framework [AQT](https://github.com/weii41392/AQT).
- Our work is highly inspired by [OW-DETR](https://github.com/akshitac8/OW-DETR) and [OpenDet](https://github.com/csuhan/opendet2).
- The implementation of the basic detector is based on [Deformable DETR](https://github.com/fundamentalvision/Deformable-DETR).

# 📒 Abstract

Domain Adaptive Object Detection (DAOD) transfers an object detector to a novel domain free of labels. However, in the real world, besides encountering novel scenes, novel domains always contain novel-class objects de facto, which are ignored in existing research. Thus, we formulate and study a more practical setting, Adaptive Open-set Object Detection (AOOD), considering both novel scenes and classes. Directly combining off-the-shelf cross-domain and open-set approaches is sub-optimal, since their low-order dependence, such as the confidence score, is insufficient for AOOD with two dimensions of novel information. To address this, we propose a novel Structured Motif Matching (SOMA) framework for AOOD, which models the high-order relation with motifs, i.e., statistically significant subgraphs, and formulates the AOOD solution as motif matching to learn with high-order patterns. In a nutshell, SOMA consists of Structure-aware Novel-class Learning (SNL) and Structure-aware Transfer Learning (STL). As for SNL, we establish an instance-oriented graph to capture the class-independent object features hidden in different base classes. Then, a high-order metric is proposed to match the most significant motif as high-order patterns, serving motif-guided novel-class learning. In STL, we set up a semantic-oriented graph to model the class-dependent relation across domains, and match unlabelled objects with high-order motifs to align the cross-domain distribution with structural awareness. Extensive experiments demonstrate that the proposed SOMA achieves state-of-the-art performance.

![image](./assets/overall.png)
Binary file added assets/mot.png
Binary file added assets/overall.png
Binary file added assets/paper.pdf
69 changes: 69 additions & 0 deletions benchmark.py
# ------------------------------------------------------------------------
# Modified by Wei-Jie Huang
# ------------------------------------------------------------------------
# Deformable DETR
# Copyright (c) 2020 SenseTime. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------

"""
Benchmark inference speed of Deformable DETR.
"""
import os
import time
import argparse

import torch

from main import get_args_parser as get_main_args_parser
from models import build_model
from datasets import build_dataset
from util.misc import nested_tensor_from_tensor_list


def get_benckmark_arg_parser():
    parser = argparse.ArgumentParser('Benchmark inference speed of Deformable DETR.')
    parser.add_argument('--num_iters', type=int, default=300, help='total iters to benchmark speed')
    parser.add_argument('--warm_iters', type=int, default=5, help='ignore first several iters that are very slow')
    parser.add_argument('--batch_size', type=int, default=1, help='batch size in inference')
    parser.add_argument('--resume', type=str, help='load the pre-trained checkpoint')
    return parser


@torch.no_grad()
def measure_average_inference_time(model, inputs, num_iters=100, warm_iters=5):
    ts = []
    for iter_ in range(num_iters):
        torch.cuda.synchronize()
        t_ = time.perf_counter()
        model(inputs)
        torch.cuda.synchronize()
        t = time.perf_counter() - t_
        if iter_ >= warm_iters:
            ts.append(t)
    print(ts)
    return sum(ts) / len(ts)


def benchmark():
    args, _ = get_benckmark_arg_parser().parse_known_args()
    main_args = get_main_args_parser().parse_args(_)
    assert args.warm_iters < args.num_iters and args.num_iters > 0 and args.warm_iters >= 0
    assert args.batch_size > 0
    assert args.resume is None or os.path.exists(args.resume)
    dataset = build_dataset('val', main_args)
    model, _, _ = build_model(main_args)
    model.cuda()
    model.eval()
    if args.resume is not None:
        ckpt = torch.load(args.resume, map_location=lambda storage, loc: storage)
        model.load_state_dict(ckpt['model'])
    inputs = nested_tensor_from_tensor_list([dataset.__getitem__(0)[0].cuda() for _ in range(args.batch_size)])
    t = measure_average_inference_time(model, inputs, args.num_iters, args.warm_iters)
    return 1.0 / t * args.batch_size


if __name__ == '__main__':
    fps = benchmark()
    print(f'Inference Speed: {fps:.1f} FPS')
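
A typical invocation of this benchmark script might look like the following (a sketch; `path/to/checkpoint.pth` is a placeholder, and flags not listed here fall back to the defaults of main.py's argument parser):

```bash
# Sketch: benchmark inference speed with a trained checkpoint at batch size 1.
python benchmark.py --batch_size 1 --num_iters 300 --warm_iters 5 --resume path/to/checkpoint.pth
```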
