- For all pretraining and finetuning, we adopt sparse/uniform sampling.
- `#Frame` $=$ `#input_frame` $\times$ `#crop` $\times$ `#clip`
  - `#input_frame` is the number of frames fed to the model per inference.
  - `#crop` is the number of spatial crops (e.g., 3 for left/right/center).
  - `#clip` is the number of temporal clips (e.g., 4 means repeatedly sampling four clips with different start indices).

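
To make the `#Frame` notation concrete, here is a minimal Python sketch that parses a setting such as `8x3x4` and computes the total number of frames seen per video at test time. The helper names (`parse_frame_setting`, `total_frames`, `sparse_sample_indices`) are hypothetical and not part of this repo. The index scheme is only one plausible reading of "sparse/uniform sampling with different start indices", so the actual data loader may differ.

```python
from typing import List, Tuple


def parse_frame_setting(setting: str) -> Tuple[int, int, int]:
    """Split a '#Frame' string such as '8x3x4' into (input_frames, crops, clips)."""
    input_frames, crops, clips = (int(part) for part in setting.lower().split("x"))
    return input_frames, crops, clips


def total_frames(setting: str) -> int:
    """#Frame = #input_frame x #crop x #clip."""
    input_frames, crops, clips = parse_frame_setting(setting)
    return input_frames * crops * clips


def sparse_sample_indices(num_video_frames: int, input_frames: int, clips: int) -> List[List[int]]:
    """Assumed scheme: split the video into `input_frames` equal segments and,
    for each clip, pick one frame per segment at a clip-specific start offset."""
    seg_len = num_video_frames / input_frames
    all_clips = []
    for clip_idx in range(clips):
        # Each clip shifts its start index by an equal fraction of a segment.
        offset = seg_len * clip_idx / clips
        indices = [
            min(int(seg * seg_len + offset), num_video_frames - 1)
            for seg in range(input_frames)
        ]
        all_clips.append(indices)
    return all_clips


if __name__ == "__main__":
    print(total_frames("8x3x4"))            # frames per video for 8x3x4
    for clip in sparse_sample_indices(64, 8, 4):
        print(clip)                         # one list of frame indices per clip
```

Under this assumed scheme, each clip covers the whole video with the same stride. Only the start index changes between clips, and spatial crops are applied to every sampled clip.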

Model | Setting | Checkpoint | Shell | Log |
---|---|---|---|---|
UMT-B/16 | K710 200e | ckpt | run.sh | log |
UMT-L/16 | K710 200e | ckpt | run.sh | log |

Model | Setting | #Frame | Top-1 | Checkpoint | Shell | Log |
---|---|---|---|---|---|---|
UMT-B/16 | K710 PT | 8x3x4 | 81.9 | ckpt | run.sh | log |
UMT-L/16 | K710 PT | 8x3x4 | 86.0 | ckpt | run.sh | log |

Model | Setting | #Frame | Top-1 | Checkpoint | Shell | Log |
---|---|---|---|---|---|---|
UMT-B/16 | K710 PT+FT | 8x3x4 | 87.4 | ckpt | run.sh | log |
UMT-L/16 | K710 PT+FT | 8x3x4 | 90.3 | ckpt | run.sh | log |
UMT-L/16 | K710 PT+FT | 16x3x4 | 90.6 | ckpt | run.sh | log |

Model | Setting | #Frame | Top-1 | Checkpoint | Shell | Log |
---|---|---|---|---|---|---|
UMT-B/16 | K710 PT+FT | 8x3x4 | 87.8 | ckpt | run.sh | log |
UMT-L/16 | K710 PT+FT | 8x3x4 | 90.4 | ckpt | run.sh | log |
UMT-L/16 | K710 PT+FT | 16x3x4 | 90.5 | ckpt | run.sh | log |

Model | Setting | #Frame | Top-1 | Checkpoint | Shell | Log |
---|---|---|---|---|---|---|
UMT-B/16 | K710 PT+FT | 8x3x4 | 78.5 | ckpt | run.sh | log |
UMT-L/16 | K710 PT+FT | 8x3x4 | 83.2 | ckpt | run.sh | log |
UMT-L/16 | K710 PT+FT | 16x3x4 | 83.6 | ckpt | run.sh | log |

Model | Setting | #Frame | Top-1 | Checkpoint | Shell | Log |
---|---|---|---|---|---|---|
UMT-B/16 | K710 PT+FT, K400 FT | 8x3x4 | 44.6 | ckpt | run.sh | log |
UMT-B/16 384↑ | K710 PT+FT, K400 FT | 8x3x4 | 45.5 | ckpt | run.sh | log |
UMT-L/16 | K710 PT+FT, K400 FT | 8x3x4 | 48.0 | ckpt | run.sh | log |
UMT-L/16 384↑ | K710 PT+FT, K400 FT | 8x3x4 | 48.7 | ckpt | run.sh | log |

Model | Setting | #Frame | Top-1 | Checkpoint | Shell | Log |
---|---|---|---|---|---|---|
UMT-B/16 | K710 PT | 8x3x4 | 70.8 | ckpt | run.sh | log |
UMT-L/16 | K710 PT | 8x3x4 | 74.7 | ckpt | run.sh | log |
See action_detection.