add zero-shot eval; add configs & logs; add labels
whwu95 committed Nov 29, 2022
1 parent e0ec374 commit 092ad0a
Showing 72 changed files with 721,334 additions and 34 deletions.
60 changes: 26 additions & 34 deletions README.md
@@ -15,9 +15,9 @@ This is the official implementation of the paper [Revisiting Classifier: Transfe

## Updates
- [ ] Models: The trained models & logs.
- [ ] Config: All the configs (general/few-shot/zero-shot video recognition) on Kinetics-400 & 600, ActivityNet, UCF, and HMDB.
- [ ] Code: Zero-shot Evaluation: Half-classes evaluation and Full-classes evaluation.
- [ ] Code: Distributed training for InfoNCE, compatible with CE.
- [x] **[Nov 30, 2022]** Config: All the configs (general/few-shot/zero-shot video recognition) on Kinetics-400 & 600, ActivityNet, UCF, and HMDB.
- [x] **[Nov 30, 2022]** Code: Zero-shot Evaluation: Half-classes evaluation and Full-classes evaluation.
- [x] **[Nov 29, 2022]** Code: Distributed training for InfoNCE, compatible with CE.
- [x] **[Nov 28, 2022]** Code: Multi-Machine Multi-GPU Distributed Training
- [x] **[Nov 28, 2022]** Code: Single-Machine Multi-GPU Distributed Training, Distributed testing.
- [x] **[Nov 19, 2022]** 🎉Our paper has been accepted by **AAAI-2023**.
@@ -80,13 +80,13 @@ Annotation information consists of two parts: video label, and category descript
## Training
This implementation supports Multi-GPU `DistributedDataParallel` training, which is faster and simpler than `DataParallel` used in [ActionCLIP](https://github.com/sallymmx/actionclip).

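With `DistributedDataParallel`, every GPU runs its own process. For the multi-machine setup described below, the usual PyTorch convention is that each machine has a node rank and each process a global rank; the trailing `0` and `1` in the two multi-machine commands appear to be the node ranks of the first and second machine. The hypothetical helpers below only illustrate that arithmetic, assuming 8 processes per machine (2 machines × 8 = 16 GPUs, as in the example); they are not taken from `scripts/run_train_multinodes.sh`.

```python
# Hypothetical helpers illustrating rank arithmetic for multi-machine DDP.
# Assumption: 8 GPU processes per machine (2 machines x 8 = 16 GPUs).


def world_size(nnodes: int, nproc_per_node: int) -> int:
    """Total number of training processes across all machines."""
    return nnodes * nproc_per_node


def global_rank(node_rank: int, local_rank: int, nproc_per_node: int) -> int:
    """Global rank of the process with `local_rank` on machine `node_rank`."""
    if not 0 <= local_rank < nproc_per_node:
        raise ValueError("local_rank must be in [0, nproc_per_node)")
    return node_rank * nproc_per_node + local_rank


if __name__ == "__main__":
    nnodes, nproc = 2, 8
    print("world size:", world_size(nnodes, nproc))  # world size: 16
    for node in range(nnodes):
        ranks = [global_rank(node, local, nproc) for local in range(nproc)]
        print(f"node {node}: global ranks {ranks[0]}-{ranks[-1]}")
```

In the standard PyTorch setup, `--master_addr` is the address of the machine hosting global rank 0, which is why the first machine's IP is the one passed there.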
- To train our model on Kinetics-400 with 8 GPUs on a **single machine**, you can run:
- **Single Machine**: To train our model on Kinetics-400 with 8 GPUs on a single machine, you can run:
```sh
# For example, train the 8-frame ViT-B/32.
sh scripts/run_train.sh configs/k400/k400_train_rgb_vitb-32-f8.yaml
```

- We also provide a script to train larger models on **Multiple Machines** (e.g., 2 machines with 16 GPUs). You can run:
- **Multiple Machines**: We also provide a script to train larger models on multiple machines (e.g., 2 machines with 16 GPUs). You can run:
```sh
# For example, we train the 8-frame ViT-L/14 on 2 machines as follows:
# On the first machine, set --master_addr to the IP address of the first machine; --nnodes is 2.
@@ -97,7 +97,7 @@ sh scripts/run_train_multinodes.sh configs/k400/k400_train_rgb_vitl-14-f8.yaml 0
sh scripts/run_train_multinodes.sh configs/k400/k400_train_rgb_vitl-14-f8.yaml 1
```

- To train our model in the **Few-shot** scenario, just add one line to the general config file:
- **Few-shot Recognition**: To train our model in the few-shot scenario, just add one line to the general config file:
```sh
# You can refer to configs/k400/k400_few_shot.yaml
data:
@@ -125,33 +125,25 @@ We use the Kinetics-400 pre-trained model (e.g., ViT-L/14 with 8 frames) to perf

- Half-classes Evaluation: A traditional evaluation protocol that randomly samples half of the test dataset's classes, repeats this ten times, and reports the mean accuracy together with its standard deviation over the ten runs.


- Full-classes Evaluation: Perform evaluation on the entire dataset.

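The half-classes protocol above can be sketched in plain Python as follows. This is an illustrative version, not the repository's evaluation code: `half_classes_accuracy` is a hypothetical name, it assumes you already have a score for every (test video, class) pair, such as video-text similarities, and it uses the population standard deviation, which may differ from what the paper reports.

```python
import random
import statistics
from typing import Sequence


def half_classes_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    num_classes: int,
    rounds: int = 10,
    seed: int = 0,
) -> tuple[float, float]:
    """Mean and population std of top-1 accuracy (%) over random half-class subsets.

    scores[i][c] is the score of test video i for class c; labels[i] is its true class.
    Each round keeps only videos whose label was sampled and classifies them
    among the sampled classes only.
    """
    rng = random.Random(seed)
    accs = []
    for _ in range(rounds):
        subset = rng.sample(range(num_classes), num_classes // 2)
        sampled = set(subset)
        correct = total = 0
        for row, y in zip(scores, labels):
            if y not in sampled:
                continue  # video belongs to a class outside this round's subset
            pred = max(subset, key=lambda c: row[c])
            correct += int(pred == y)
            total += 1
        if total:  # skip a round that happens to contain no test videos
            accs.append(100.0 * correct / total)
    if not accs:
        raise ValueError("no round contained any test video")
    return statistics.mean(accs), statistics.pstdev(accs)


if __name__ == "__main__":
    # Dummy scores for 4 videos over 4 classes (toy values, not real results).
    toy_scores = [
        [0.9, 0.1, 0.0, 0.0],
        [0.2, 0.7, 0.1, 0.0],
        [0.0, 0.3, 0.6, 0.1],
        [0.1, 0.0, 0.2, 0.7],
    ]
    print(half_classes_accuracy(toy_scores, [0, 1, 2, 3], num_classes=4))  # (100.0, 0.0)
```

Full-classes evaluation is the degenerate case where the subset is every class and there is a single round.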
```sh
# On ActivityNet
# On ActivityNet: reporting the half-classes and full-classes results
sh scripts/run_test_zeroshot.sh configs/anet/anet_zero_shot.yaml exp/k400/ViT-L/14/f8/last_model.pt

# On UCF101
# On UCF101: reporting the half-classes and full-classes results
sh scripts/run_test_zeroshot.sh configs/ucf101/ucf_zero_shot.yaml exp/k400/ViT-L/14/f8/last_model.pt

# On HMDB51
# On HMDB51: reporting the half-classes and full-classes results
sh scripts/run_test_zeroshot.sh configs/hmdb51/hmdb_zero_shot.yaml exp/k400/ViT-L/14/f8/last_model.pt

# On Kinetics-600: report the mean accuracy and standard deviation over the three splits.
# On Kinetics-600: run each of the three splits below, then manually calculate the mean accuracy and standard deviation.
sh scripts/run_test.sh configs/k600/k600_zero_shot_split1.yaml exp/k400/ViT-L/14/f8/last_model.pt
sh scripts/run_test.sh configs/k600/k600_zero_shot_split2.yaml exp/k400/ViT-L/14/f8/last_model.pt
sh scripts/run_test.sh configs/k600/k600_zero_shot_split3.yaml exp/k400/ViT-L/14/f8/last_model.pt
```
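For Kinetics-600 the three commands above each produce one accuracy, and the mean and standard deviation are combined by hand. A small hypothetical helper such as `summarize_splits` can do that step; the inputs must come from your own three runs, and the choice of population rather than sample standard deviation is an assumption.

```python
import statistics
from typing import Sequence


def summarize_splits(accuracies: Sequence[float]) -> str:
    """Format per-split top-1 accuracies (in %) as 'mean ± std'.

    Uses the population standard deviation (statistics.pstdev); swap in
    statistics.stdev if your protocol calls for the sample standard deviation.
    """
    if not accuracies:
        raise ValueError("need at least one split accuracy")
    mean = statistics.mean(accuracies)
    std = statistics.pstdev(accuracies)
    return f"{mean:.1f} ± {std:.1f}"


if __name__ == "__main__":
    import sys

    # Usage: python summarize_splits.py ACC1 ACC2 ACC3  (values from your own runs)
    if len(sys.argv) > 1:
        print(summarize_splits([float(a) for a in sys.argv[1:]]))
```

Passing the three split accuracies on the command line prints a single `mean ± std` string.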

- Full-classes Evaluation: Perform evaluation on the entire dataset.
```sh
# On ActivityNet
sh scripts/run_test.sh configs/anet/anet_zero_shot.yaml exp/k400/ViT-L/14/f8/last_model.pt

# On UCF101
sh scripts/run_test.sh configs/ucf101/ucf_zero_shot.yaml exp/k400/ViT-L/14/f8/last_model.pt

# On HMDB51
sh scripts/run_test.sh configs/hmdb51/hmdb_zero_shot.yaml exp/k400/ViT-L/14/f8/last_model.pt
```


## Model Zoo
@@ -163,38 +155,38 @@ Here we provide some off-the-shelf pre-trained checkpoints of our models in the

| Architecture |#Frame | Top-1 Acc.(%) | checkpoint | Train log| config|
|:------------:|:-------------------:|:------------------:|:-----------------:|:--------------:|:--------------:|
| ViT-B/32 | 8x3x4 | 80.0 | - | - | - |
| ViT-B/32 | 16x3x4 | 80.5 | - | - | - |
| ViT-B/16 | 8x3x4 | 82.9 | - | - | - |
| ViT-B/16 | 16x3x4 | 83.6 | - | - | - |
| ViT-L/14* | 8x3x4 | 86.4 | - | - | - |
| ViT-L/14-336 | 8x3x4 | 87.1 | - | - | - |
| ViT-L/14-336 | 32x3x1 | 87.8 | - | - | - |
| ViT-B/32 | 8x3x4 | 80.0 | - | [log](exps/k400/ViT-B/32/f8/log.txt) | [config](configs/k400/k400_train_rgb_vitb-32-f8.yaml) |
| ViT-B/32 | 16x3x4 | 80.5 | - | [log](exps/k400/ViT-B/32/f16/log.txt) | [config](configs/k400/k400_train_rgb_vitb-32-f16.yaml) |
| ViT-B/16 | 8x3x4 | 82.9 | - | [log](exps/k400/ViT-B/16/f8/log.txt) | [config](configs/k400/k400_train_rgb_vitb-16-f8.yaml) |
| ViT-B/16 | 16x3x4 | 83.6 | - | [log](exps/k400/ViT-B/16/f16/log.txt) | [config](configs/k400/k400_train_rgb_vitb-16-f16.yaml) |
| ViT-L/14* | 8x3x4 | 86.4 | - | [log](exps/k400/ViT-L/14/f8/log.txt) | [config](configs/k400/k400_train_rgb_vitl-14-f8.yaml) |
| ViT-L/14-336 | 8x3x4 | 87.1 | - | [log](exps/k400/ViT-L/14-336px/f8/log.txt) | [config](configs/k400/k400_train_rgb_vitl-14-336-f8.yaml) |
| ViT-L/14-336 | 32x3x1 | 87.8 | - | [log](exps/k400/ViT-L/14-336px/f32/log.txt) | [config](configs/k400/k400_train_rgb_vitl-14-336-f32.yaml) |

*Note: \* indicates that this ViT-L model is used for zero-shot evaluation on UCF, HMDB, ActivityNet and Kinetics-600.*

#### ActivityNet
| Architecture |#Frame | mAP (%) | checkpoint | Train log| config|
|:------------:|:-------------------:|:------------------:|:-----------------:|:--------------:|:--------------:|
| ViT-L/14 | 16x1x1 | 96.5 | - | - | - |
| ViT-L/14-336 | 16x1x1 | 96.9 | - | - | - |
| ViT-L/14 | 16x1x1 | 96.5 | - | [log](exps/anet/ViT-L/14/f16/log.txt) | [config](configs/anet/anet_k400_finetune.yaml) |
| ViT-L/14-336 | 16x1x1 | 96.9 | - | [log](exps/anet/ViT-L/14-336px/f16/log.txt) | [config](configs/anet/anet_k400_finetune_336.yaml) |

#### UCF-101
| Architecture |#Frame | Top-1 Acc. (%) | checkpoint | Train log| config|
|:------------:|:-------------------:|:------------------:|:-----------------:|:--------------:|:--------------:|
| ViT-L/14 | 16x1x1 | 98.1 | - | - | - |
| ViT-L/14-336 | 16x1x1 | 98.2 | - | - | - |
| ViT-L/14 | 16x1x1 | 98.1 | - | [log](exps/ucf101/ViT-L/14/f16/log.txt) | [config](configs/ucf101/ucf_k400_finetune.yaml) |
| ViT-L/14-336 | 16x1x1 | 98.2 | - | [log](exps/ucf101/ViT-L/14-336px/f16/log.txt) | [config](configs/ucf101/ucf_k400_finetune_336.yaml) |

#### HMDB-51
| Architecture |#Frame | Top-1 Acc. (%) | checkpoint | Train log| config|
|:------------:|:-------------------:|:------------------:|:-----------------:|:--------------:|:--------------:|
| ViT-L/14 | 16x1x1 | 81.3 | - | - | - |
| ViT-L/14 | 16x1x1 | 81.3 | - | [log](exps/hmdb51/ViT-L/14/f16/log.txt) | [config](configs/hmdb51/hmdb_k400_finetune.yaml) |




## Bibtex
If you find this repository useful, please consider citing our paper 😄:
If you find this repository useful, please star🌟 this repo and cite📑 our paper:

```
@article{wu2022transferring,
43 changes: 43 additions & 0 deletions configs/anet/anet_few_shot.yaml
@@ -0,0 +1,43 @@
resume:
pretrain:
seed: 1024
data:
    dataset: anet
    modality: RGB
    num_segments: 8
    seg_length: 1
    batch_size: 8
    workers: 4
    num_classes: 200
    image_tmpl: 'image_{:06d}.jpg'
    train_root: '/bpfs/v2_mnt/VIS/wuwenhao/anet/anet_instance_frames_v1.3_train_vids_fps1'
    val_root: '/bpfs/v2_mnt/VIS/wuwenhao/anet/activitynet_val_resize_img_256_340_fps1'
    train_list: 'lists/anet/anet_train_instance_fps1.txt'
    val_list: 'lists/anet/anet_val_video_fps1.txt' #
    label_list: 'lists/anet1.3_labels.csv'
    input_size: 224
    random_shift: True
    shot: 2
network:
    arch: ViT-L/14 #ViT-B/32 ViT-B/16
    init: True
    drop_out: 0.0
    emb_dropout: 0.0
    type: clip_anet
    sim_header: None
    drop: 0
solver:
    type: cosine
    epochs: 30
    start_epoch: 0
    optim: adamw
    lr: 5.e-5
    lr_warmup_step: 5
    weight_decay: 0.2
    loss_type: CE
    evaluate: False
    clip_ratio: 0.1
    grad_accumulation_steps: 1
logging:
    print_freq: 10
    eval_freq: 5
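This config carries the `shot: 2` line that the README's few-shot instructions ask you to add. Conceptually, K-shot training keeps only K labelled videos per class. The hypothetical `sample_k_shot` below sketches that selection over `(video, label)` pairs; the pair format, the per-class random choice and reusing `seed: 1024` are assumptions, and the repository's data loader may select the K videos differently.

```python
import random
from collections import defaultdict
from typing import Sequence, TypeVar

T = TypeVar("T")


def sample_k_shot(
    samples: Sequence[tuple[T, int]], shot: int, seed: int = 1024
) -> list[tuple[T, int]]:
    """Keep at most `shot` randomly chosen (item, label) pairs per class label."""
    if shot < 1:
        raise ValueError("shot must be a positive integer")
    by_class: dict[int, list[tuple[T, int]]] = defaultdict(list)
    for item in samples:
        by_class[item[1]].append(item)
    rng = random.Random(seed)
    subset: list[tuple[T, int]] = []
    for label in sorted(by_class):
        candidates = by_class[label]
        subset.extend(rng.sample(candidates, min(shot, len(candidates))))
    return subset


if __name__ == "__main__":
    # Hypothetical list entries: (video id, class index), 4 videos per class.
    entries = [(f"vid_{i}", i % 3) for i in range(12)]
    print(sample_k_shot(entries, shot=2))
```

Classes with fewer than K videos simply keep all of them, which keeps the helper safe on small or imbalanced lists.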
43 changes: 43 additions & 0 deletions configs/anet/anet_k400_finetune.yaml
@@ -0,0 +1,43 @@
pretrain: exp_sota/k400/ViT-L/14/f16/last_model.pt
resume:
seed: 1024
data:
    dataset: anet
    modality: RGB
    num_segments: 16
    seg_length: 1
    batch_size: 4
    workers: 4
    num_classes: 200
    image_tmpl: 'image_{:06d}.jpg'
    train_root: '/bpfs/v2_mnt/VIS/wuwenhao/anet/anet_instance_frames_v1.3_train_vids_fps1'
    val_root: '/bpfs/v2_mnt/VIS/wuwenhao/anet/activitynet_val_resize_img_256_340_fps1'
    train_list: 'lists/anet/anet_train_instance_fps1.txt'
    val_list: 'lists/anet/anet_val_video_fps1.txt' #
    label_list: 'lists/anet1.3_labels.csv'
    input_size: 224
    random_shift: True
network:
    arch: ViT-L/14
    init: True
    drop_out: 0.0
    emb_dropout: 0.0
    type: clip_anet
    sim_header: Transf
    drop: 0
solver:
    type: cosine
    epochs: 30
    start_epoch: 0
    optim: adamw
    lr: 5.e-5
    lr_warmup_step: 5
    weight_decay: 0.2
    loss_type: CE
    evaluate: False
    clip_ratio: 0.1
    grad_accumulation_steps: 2
logging:
    print_freq: 10
    eval_freq: 5
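The `solver` block here combines `type: cosine`, `lr: 5.e-5`, `lr_warmup_step: 5` and `epochs: 30`. One common reading of these keys is a linear warmup over the first 5 epochs followed by cosine decay toward zero; the hypothetical `lr_at_epoch` below sketches that schedule at epoch granularity. The warmup shape, the per-epoch (rather than per-iteration) updates and the zero floor are all assumptions, since the repository's scheduler code is not part of this commit view.

```python
import math


def lr_at_epoch(epoch: int, base_lr: float = 5e-5, warmup: int = 5, total: int = 30) -> float:
    """Learning rate for a 0-indexed epoch: linear warmup, then cosine decay toward 0."""
    if epoch < warmup:
        # Ramp linearly from base_lr / warmup up to base_lr at the last warmup epoch.
        return base_lr * (epoch + 1) / warmup
    # Fraction of the post-warmup phase completed, in [0, 1] for epochs up to `total`.
    progress = (epoch - warmup) / max(1, total - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 4, 5, 17, 29):
        print(e, f"{lr_at_epoch(e):.2e}")
```

A per-iteration variant would use the same formula with step counts in place of epochs, which gives a smoother curve for long epochs.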

43 changes: 43 additions & 0 deletions configs/anet/anet_k400_finetune_336.yaml
@@ -0,0 +1,43 @@
pretrain: exp_sota/k400/ViT-L/14-336px/f16/last_model.pt
resume:
seed: 1024
data:
    dataset: anet
    modality: RGB
    num_segments: 16
    seg_length: 1
    batch_size: 1
    workers: 4
    num_classes: 200
    image_tmpl: 'image_{:06d}.jpg'
    train_root: '/bpfs/v2_mnt/VIS/wuwenhao/anet/anet_instance_frames_v1.3_train_vids_fps1'
    val_root: '/bpfs/v2_mnt/VIS/wuwenhao/anet/activitynet_val_resize_img_256_340_fps1'
    train_list: 'lists/anet/anet_train_instance_fps1.txt'
    val_list: 'lists/anet/anet_val_video_fps1.txt' #
    label_list: 'lists/anet1.3_labels.csv'
    input_size: 336
    random_shift: True
network:
    arch: ViT-L/14-336px
    init: True
    drop_out: 0.0
    emb_dropout: 0.0
    type: clip_anet
    sim_header: Transf
    drop: 0
solver:
    type: cosine
    epochs: 30
    start_epoch: 0
    optim: adamw
    lr: 5.e-5
    lr_warmup_step: 5
    weight_decay: 0.2
    loss_type: CE
    evaluate: False
    clip_ratio: 0.1
    grad_accumulation_steps: 8
logging:
    print_freq: 10
    eval_freq: 5
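Compared with `anet_k400_finetune.yaml`, this 336px config lowers `batch_size` from 4 to 1 and raises `grad_accumulation_steps` from 2 to 8. Assuming `batch_size` is per GPU and gradients are accumulated over `grad_accumulation_steps` mini-batches before each optimizer step, both configs feed the same number of clips into every update. A plausible reason for the change is the larger input: a ViT with 14×14 patches sees 576 patch tokens per 336×336 frame versus 256 at 224×224. The helper names below are hypothetical, and 8 GPUs is only the README's single-machine example.

```python
def clips_per_update(batch_size: int, accum_steps: int, num_gpus: int) -> int:
    """Clips contributing to one optimizer step (assumes batch_size is per GPU)."""
    return batch_size * accum_steps * num_gpus


def patch_tokens(input_size: int, patch_size: int) -> int:
    """Number of spatial patch tokens a ViT produces for one square frame."""
    if input_size % patch_size:
        raise ValueError("input_size must be divisible by patch_size")
    return (input_size // patch_size) ** 2


if __name__ == "__main__":
    # 8 GPUs is an assumption taken from the README's single-machine example.
    print(clips_per_update(4, 2, 8), patch_tokens(224, 14))  # 64 256
    print(clips_per_update(1, 8, 8), patch_tokens(336, 14))  # 64 576
```

With `num_segments: 16`, that is 16 × 576 = 9,216 patch tokens per clip at 336px versus 16 × 256 = 4,096 at 224px.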

25 changes: 25 additions & 0 deletions configs/anet/anet_zero_shot.yaml
@@ -0,0 +1,25 @@
seed: 1024
data:
    dataset: anet
    modality: RGB
    num_segments: 8
    seg_length: 1
    batch_size: 32
    workers: 4
    num_classes: 200
    image_tmpl: 'image_{:06d}.jpg'
    val_root: /bpfs/v2_mnt/VIS/wuwenhao/anet
    val_list: 'lists/anet/anet_full_for_zeroshot.txt'
    label_list: 'lists/anet1.3_labels.csv'
    index_bias: 1
    input_size: 224
network:
    arch: ViT-L/14 #ViT-L/14 #ViT-B/32 ViT-B/16
    init: True
    drop_out: 0.0
    emb_dropout: 0.0
    type: clip_anet
    sim_header: Transf
logging:
    print_freq: 10
    eval_freq: 5
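In CLIP-style zero-shot recognition, videos are scored against classes the model was not trained on, so the class weights come from text: each of the 200 ActivityNet class names in `label_list` is turned into a text embedding, and a video is assigned to the most similar one. The sketch below shows only that scoring step in plain Python, with made-up 2-D embeddings; `class_probs` is a hypothetical name, and the temperature of 0.01 (CLIP's usual logit scale of 100) is an assumption rather than a value read from this config.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def class_probs(
    video_emb: Sequence[float],
    class_embs: Sequence[Sequence[float]],
    temperature: float = 0.01,
) -> list[float]:
    """Softmax over temperature-scaled cosine similarities to each class embedding."""
    logits = [cosine(video_emb, t) / temperature for t in class_embs]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Made-up 2-D embeddings; real CLIP ViT-L/14 embeddings are much larger.
    video = [0.9, 0.1]
    classes = [[1.0, 0.0], [0.0, 1.0]]  # e.g. text embeddings of two class names
    probs = class_probs(video, classes)
    print(max(range(len(probs)), key=probs.__getitem__))  # 0
```

Because only the list of class embeddings changes between datasets, the same scoring step works for UCF, HMDB, ActivityNet or Kinetics-600 label sets.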
43 changes: 43 additions & 0 deletions configs/hmdb51/hmdb_few_shot.yaml
@@ -0,0 +1,43 @@
pretrain:
resume:
seed: 1024
data:
    dataset: hmdb51
    modality: RGB
    num_segments: 8
    seg_length: 1
    batch_size: 8
    workers: 4
    num_classes: 51
    image_tmpl: 'image_{:06d}.jpg'
    train_root: '/bpfs/v2_mnt/VIS/wuwenhao/hmdb51_rgb_img_256_340'
    val_root: '/bpfs/v2_mnt/VIS/wuwenhao/hmdb51_rgb_img_256_340'
    train_list: 'lists/hmdb51/train_rgb_split_1.txt'
    val_list: 'lists/hmdb51/val_rgb_split_1.txt'
    label_list: 'lists/hmdb51_labels.csv'
    input_size: 224
    random_shift: True
    shot: 2
network:
    arch: ViT-L/14 #ViT-B/32 ViT-B/16
    init: True
    drop_out: 0.0
    emb_dropout: 0.0
    type: clip_hmdb
    sim_header: None
    drop: 0
solver:
    type: cosine
    epochs: 30
    start_epoch: 0
    optim: adamw
    lr: 5.e-5
    lr_warmup_step: 5
    weight_decay: 0.2
    loss_type: CE
    evaluate: False
    clip_ratio: 0.1
    grad_accumulation_steps: 1
logging:
    print_freq: 10
    eval_freq: 5
42 changes: 42 additions & 0 deletions configs/hmdb51/hmdb_k400_finetune.yaml
@@ -0,0 +1,42 @@
pretrain: exp_sota/k400/ViT-L/14/f16/last_model.pt
resume:
seed: 1024
data:
    dataset: hmdb51
    modality: RGB
    num_segments: 16
    seg_length: 1
    batch_size: 4
    workers: 4
    num_classes: 51
    image_tmpl: 'image_{:06d}.jpg'
    train_root: '/bpfs/v2_mnt/VIS/wuwenhao/hmdb51_rgb_img_256_340'
    val_root: '/bpfs/v2_mnt/VIS/wuwenhao/hmdb51_rgb_img_256_340'
    train_list: 'lists/hmdb51/train_rgb_split_1.txt'
    val_list: 'lists/hmdb51/val_rgb_split_1.txt'
    label_list: 'lists/hmdb51_labels.csv'
    input_size: 224
    random_shift: True
network:
    arch: ViT-L/14 #ViT-B/32 ViT-B/16
    init: True
    drop_out: 0.0
    emb_dropout: 0.0
    type: clip_hmdb
    sim_header: Transf
    drop: 0
solver:
    type: cosine
    epochs: 30
    start_epoch: 0
    optim: adamw
    lr: 5.e-5
    lr_warmup_step: 5
    weight_decay: 0.2
    loss_type: CE
    evaluate: False
    clip_ratio: 0.1
    grad_accumulation_steps: 2
logging:
    print_freq: 10
    eval_freq: 5
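The HMDB-51 fine-tuning config samples `num_segments: 16` frames per video (`seg_length: 1`) with `random_shift: True`, and frame files follow `image_tmpl: 'image_{:06d}.jpg'`. A common way to implement such sampling is TSN-style: split the video into equal segments and take a random frame from each during training, or the middle frame at test time. The hypothetical `sample_indices` below sketches that; treating `random_shift` this way, and using 1-based file numbers (mirroring `index_bias: 1` from the zero-shot config), are assumptions about the loader, not a description of it.

```python
import random
from typing import Optional


def sample_indices(
    num_frames: int,
    num_segments: int = 16,
    random_shift: bool = True,
    index_bias: int = 1,
    rng: Optional[random.Random] = None,
) -> list[int]:
    """One frame index per equal-length segment (TSN-style sampling).

    With random_shift, a random frame is drawn inside each segment (training);
    otherwise the segment's middle frame is used (deterministic testing).
    index_bias shifts 0-based offsets to the file numbering (1-based assumed).
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    rng = rng or random.Random()
    seg_len = num_frames / num_segments
    indices = []
    for i in range(num_segments):
        start = int(seg_len * i)
        end = max(start + 1, int(seg_len * (i + 1)))  # exclusive, at least 1 frame
        if random_shift:
            offset = rng.randrange(start, end)
        else:
            offset = (start + end - 1) // 2
        indices.append(min(offset, num_frames - 1) + index_bias)
    return indices


if __name__ == "__main__":
    image_tmpl = "image_{:06d}.jpg"  # same template as in the config
    frames = sample_indices(80, random_shift=False)
    print([image_tmpl.format(idx) for idx in frames[:3]])
```

Videos shorter than 16 frames still yield 16 indices; some frames are simply repeated.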