diff --git a/README.md b/README.md
index 523900c..29f980a 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,8 @@
 # OneFormer: One Transformer to Rule Universal Image Segmentation
 
 [![Framework: PyTorch](https://img.shields.io/badge/Framework-PyTorch-orange.svg)](https://pytorch.org/) [![Huggingface space](https://img.shields.io/badge/šŸ¤—-Huggingface%20Space-cyan.svg)](https://huggingface.co/spaces/shi-labs/OneFormer) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/SHI-Labs/OneFormer/blob/main/colab/oneformer_colab.ipynb) [![License](https://img.shields.io/badge/License-MIT-red.svg)](https://opensource.org/licenses/MIT) [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://youtu.be/_Zr1pOi7Chw)
-
-[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/panoptic-segmentation-on-ade20k-val)](https://paperswithcode.com/sota/panoptic-segmentation-on-ade20k-val?p=oneformer-one-transformer-to-rule-universal) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/instance-segmentation-on-ade20k-val)](https://paperswithcode.com/sota/instance-segmentation-on-ade20k-val?p=oneformer-one-transformer-to-rule-universal)
-[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/instance-segmentation-on-cityscapes-val)](https://paperswithcode.com/sota/instance-segmentation-on-cityscapes-val?p=oneformer-one-transformer-to-rule-universal) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/semantic-segmentation-on-coco-1)](https://paperswithcode.com/sota/semantic-segmentation-on-coco-1?p=oneformer-one-transformer-to-rule-universal) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/instance-segmentation-on-coco-val-panoptic)](https://paperswithcode.com/sota/instance-segmentation-on-coco-val-panoptic?p=oneformer-one-transformer-to-rule-universal) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/panoptic-segmentation-on-cityscapes-val)](https://paperswithcode.com/sota/panoptic-segmentation-on-cityscapes-val?p=oneformer-one-transformer-to-rule-universal) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/semantic-segmentation-on-ade20k-val)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k-val?p=oneformer-one-transformer-to-rule-universal) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/panoptic-segmentation-on-coco-minival)](https://paperswithcode.com/sota/panoptic-segmentation-on-coco-minival?p=oneformer-one-transformer-to-rule-universal) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/semantic-segmentation-on-cityscapes-val)](https://paperswithcode.com/sota/semantic-segmentation-on-cityscapes-val?p=oneformer-one-transformer-to-rule-universal)
+
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/instance-segmentation-on-cityscapes-val)](https://paperswithcode.com/sota/instance-segmentation-on-cityscapes-val?p=oneformer-one-transformer-to-rule-universal) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/semantic-segmentation-on-coco-1)](https://paperswithcode.com/sota/semantic-segmentation-on-coco-1?p=oneformer-one-transformer-to-rule-universal) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/instance-segmentation-on-coco-val-panoptic)](https://paperswithcode.com/sota/instance-segmentation-on-coco-val-panoptic?p=oneformer-one-transformer-to-rule-universal) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/panoptic-segmentation-on-ade20k-val)](https://paperswithcode.com/sota/panoptic-segmentation-on-ade20k-val?p=oneformer-one-transformer-to-rule-universal) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/instance-segmentation-on-ade20k-val)](https://paperswithcode.com/sota/instance-segmentation-on-ade20k-val?p=oneformer-one-transformer-to-rule-universal) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/panoptic-segmentation-on-cityscapes-val)](https://paperswithcode.com/sota/panoptic-segmentation-on-cityscapes-val?p=oneformer-one-transformer-to-rule-universal) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/semantic-segmentation-on-ade20k-val)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k-val?p=oneformer-one-transformer-to-rule-universal) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/panoptic-segmentation-on-coco-minival)](https://paperswithcode.com/sota/panoptic-segmentation-on-coco-minival?p=oneformer-one-transformer-to-rule-universal) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/oneformer-one-transformer-to-rule-universal/semantic-segmentation-on-cityscapes-val)](https://paperswithcode.com/sota/semantic-segmentation-on-cityscapes-val?p=oneformer-one-transformer-to-rule-universal)
 
 [Jitesh Jain](https://praeclarumjj3.github.io/), [Jiachen Li](https://chrisjuniorli.github.io/), [MangTik Chiu](https://www.linkedin.com/in/mtchiu/), [Ali Hassani](https://alihassanijr.com/), [Nikita Orlov](https://www.linkedin.com/in/nukich74/), [Humphrey Shi](https://www.humphreyshi.com/home)
 
@@ -38,6 +37,10 @@ This repo contains the code for our paper **OneFormer: One Transformer to Rule U
 
 ## News
 
+### December 26, 2022
+
+- Checkpoints for Swin-L OneFormer and DiNAT-L OneFormer trained on ADE20K with 1280×1280 resolution released!
+
 ### November 23, 2022
 
 - Roboflow covers OneFormer on [YouTube](https://youtu.be/_Zr1pOi7Chw)! Thanks to [@SkalskiP](https://github.com/SkalskiP) for making the video!
@@ -50,7 +53,7 @@ This repo contains the code for our paper **OneFormer: One Transformer to Rule U
 - [**Project Page**](https://praeclarumjj3.github.io/oneformer/), [**ArXiv Preprint**](https://arxiv.org/abs/2211.06220) and [**GitHub Repo**](https://github.com/SHI-Labs/OneFormer) are public!
 - OneFormer sets new SOTA on Cityscapes val with single-scale inference on Panoptic Segmentation with **68.5** PQ score and Instance Segmentation with **46.7** AP score!
-- OneFormer sets new SOTA on ADE20K val on Panoptic Segmentation with **50.2** PQ score and on Instance Segmentation with **37.6** AP!
+- OneFormer sets new SOTA on ADE20K val on Panoptic Segmentation with **51.5** PQ score and on Instance Segmentation with **37.8** AP!
 - OneFormer sets new SOTA on COCO val on Panoptic Segmentation with **58.0** PQ score!
 
 ## Installation Instructions
@@ -92,12 +95,14 @@ This repo contains the code for our paper **OneFormer: One Transformer to Rule U
 | Method | Backbone | Crop Size | PQ | AP | mIoU<br>(s.s) | mIoU<br>(ms+flip) | #params | config | Checkpoint |
 | :---:| :---: | :---: | :---: | :---:| :---: | :---: | :---: | :---: | :---: |
-| OneFormer | Swin-L | 640×640 | 48.6 | 35.9 | 57.0 | 57.7 | 219M | [config](configs/ade20k/swin/oneformer_swin_large_bs16_160k.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/250_16_swin_l_oneformer_ade20k_160k.pth) |
-| OneFormer | Swin-L | 896×896 | 50.2 | 37.6 | 57.4 | 58.3 | 219M | [config](configs/ade20k/swin/oneformer_swin_large_bs16_160k_896x896.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/896x896_250_16_swin_l_oneformer_ade20k_160k.pth) |
-| OneFormer | ConvNeXt-L | 640×640 | 48.7 | 36.2 | 56.6 | 57.4 | 220M | [config](configs/ade20k/convnext/oneformer_convnext_large_bs16_160k.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/250_16_convnext_l_oneformer_ade20k_160k.pth) |
-| OneFormer | DiNAT-L | 640×640 | 49.1 | 36.0 | 57.8 | 58.4 | 223M | [config](configs/ade20k/dinat/oneformer_dinat_large_bs16_160k.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/250_16_dinat_l_oneformer_ade20k_160k.pth) |
-| OneFormer | DiNAT-L | 896×896 | 50.0 | 36.8 | 58.1 | 58.6 | 223M | [config](configs/ade20k/dinat/oneformer_dinat_large_bs16_160k_896x896.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/896x896_250_16_dinat_l_oneformer_ade20k_160k.pth) |
-| OneFormer | ConvNeXt-XL | 640×640 | 48.9 | 36.3 | 57.4 | 58.8 | 372M | [config](configs/ade20k/convnext/oneformer_convnext_xlarge_bs16_160k.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/250_16_convnext_xl_oneformer_ade20k_160k.pth) |
+| OneFormer | Swin-L | 640×640 | 49.8 | 35.9 | 57.0 | 57.7 | 219M | [config](configs/ade20k/swin/oneformer_swin_large_bs16_160k.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/250_16_swin_l_oneformer_ade20k_160k.pth) |
+| OneFormer | Swin-L | 896×896 | 51.1 | 37.6 | 57.4 | 58.3 | 219M | [config](configs/ade20k/swin/oneformer_swin_large_bs16_160k_896x896.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/896x896_250_16_swin_l_oneformer_ade20k_160k.pth) |
+| OneFormer | Swin-L | 1280×1280 | 51.4 | 37.8 | 57.0 | 57.7 | 219M | [config](configs/ade20k/swin/oneformer_swin_large_bs16_160k_1280x1280.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/1280x1280_250_16_swin_l_oneformer_ade20k_160k.pth) |
+| OneFormer | ConvNeXt-L | 640×640 | 50.0 | 36.2 | 56.6 | 57.4 | 220M | [config](configs/ade20k/convnext/oneformer_convnext_large_bs16_160k.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/250_16_convnext_l_oneformer_ade20k_160k.pth) |
+| OneFormer | DiNAT-L | 640×640 | 50.5 | 36.0 | 58.3 | 58.4 | 223M | [config](configs/ade20k/dinat/oneformer_dinat_large_bs16_160k.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/250_16_dinat_l_oneformer_ade20k_160k.pth) |
+| OneFormer | DiNAT-L | 896×896 | 51.2 | 36.8 | 58.1 | 58.6 | 223M | [config](configs/ade20k/dinat/oneformer_dinat_large_bs16_160k_896x896.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/896x896_250_16_dinat_l_oneformer_ade20k_160k.pth) |
+| OneFormer | DiNAT-L | 1280×1280 | 51.5 | 37.1 | 58.3 | 58.7 | 223M | [config](configs/ade20k/dinat/oneformer_dinat_large_bs16_160k_1280x1280.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/1280x1280_250_16_dinat_l_oneformer_ade20k_160k.pth) |
+| OneFormer | ConvNeXt-XL | 640×640 | 50.1 | 36.3 | 57.4 | 58.8 | 372M | [config](configs/ade20k/convnext/oneformer_convnext_xlarge_bs16_160k.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/250_16_convnext_xl_oneformer_ade20k_160k.pth) |
 
 ### Cityscapes
@@ -122,7 +127,7 @@ If you found OneFormer useful in your research, please consider starring ā­ us
 ```bibtex
 @article{jain2022oneformer,
-  title={OneFormer: One Transformer to Rule Universal Image Segmentation},
+  title={{OneFormer: One Transformer to Rule Universal Image Segmentation}},
   author={Jitesh Jain and Jiachen Li and MangTik Chiu and Ali Hassani and Nikita Orlov and Humphrey Shi},
   journal={arXiv},
   year={2022}
 }
diff --git a/configs/ade20k/dinat/oneformer_dinat_large_IN21k_384_bs16_160k.yaml b/configs/ade20k/dinat/oneformer_dinat_large_bs16_160k.yaml
similarity index 100%
rename from configs/ade20k/dinat/oneformer_dinat_large_IN21k_384_bs16_160k.yaml
rename to configs/ade20k/dinat/oneformer_dinat_large_bs16_160k.yaml
diff --git a/configs/ade20k/dinat/oneformer_dinat_large_bs16_160k_1280x1280.yaml b/configs/ade20k/dinat/oneformer_dinat_large_bs16_160k_1280x1280.yaml
new file mode 100644
index 0000000..9550684
--- /dev/null
+++ b/configs/ade20k/dinat/oneformer_dinat_large_bs16_160k_1280x1280.yaml
@@ -0,0 +1,42 @@
+_BASE_: ../oneformer_R50_bs16_160k.yaml
+MODEL:
+  BACKBONE:
+    NAME: "D2DiNAT"
+  DiNAT:
+    EMBED_DIM: 192
+    MLP_RATIO: 2.0
+    DEPTHS: [3, 4, 18, 5]
+    NUM_HEADS: [6, 12, 24, 48]
+    KERNEL_SIZE: 11
+    DROP_PATH_RATE: 0.3
+    DILATIONS: [[1, 28, 1], [1, 7, 1, 14], [1, 3, 1, 5, 1, 5, 1, 7, 1, 3, 1, 5, 1, 5, 1, 7, 1, 7], [1, 3, 1, 3, 1]]
+  WEIGHTS: "dinat_large_in22k_in1k_384_11x11.pkl"
+  PIXEL_MEAN: [123.675, 116.280, 103.530]
+  PIXEL_STD: [58.395, 57.120, 57.375]
+  ONE_FORMER:
+    NUM_OBJECT_QUERIES: 250
+SOLVER:
+  AMP:
+    ENABLED: False
+INPUT:
+  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 1280) for x in range(5, 21)]"]
+  MIN_SIZE_TRAIN_SAMPLING: "choice"
+  MIN_SIZE_TEST: 1280
+  MAX_SIZE_TRAIN: 5120
+  MAX_SIZE_TEST: 5120
+  CROP:
+    ENABLED: True
+    TYPE: "absolute"
+    SIZE: (1280, 1280)
+    SINGLE_CATEGORY_MAX_AREA: 1.0
+  COLOR_AUG_SSD: True
+  SIZE_DIVISIBILITY: 1280 # used in dataset mapper
+  FORMAT: "RGB"
+TEST:
+  DETECTIONS_PER_IMAGE: 250
+  EVAL_PERIOD: 5000
+  AUG:
+    ENABLED: False
+    MIN_SIZES: [640, 960, 1280, 1600, 1920, 2240]
+    MAX_SIZE: 8960
+    FLIP: True
\ No newline at end of file
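A note on the `MIN_SIZE_TRAIN` entry above (it reappears in the Swin-L 1280Ɨ1280 config below): the `!!python/object/apply:eval` tag makes the YAML loader evaluate the quoted string as Python, so the shortest-edge training scales expand to multiples of 128 ranging from 0.5Ɨ to 2.0Ɨ of the 1280 crop size. A quick sketch of the expansion:

```python
# Expansion of the MIN_SIZE_TRAIN expression used by the 1280x1280 configs.
scales = [int(x * 0.1 * 1280) for x in range(5, 21)]
print(scales)
# [640, 768, 896, 1024, 1152, 1280, 1408, 1536, 1664, 1792,
#  1920, 2048, 2176, 2304, 2432, 2560]
```

With `MIN_SIZE_TRAIN_SAMPLING: "choice"`, one of these sizes is sampled per image before the absolute 1280Ɨ1280 crop is taken.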
diff --git a/configs/ade20k/oneformer_R50_bs16_160k.yaml b/configs/ade20k/oneformer_R50_bs16_160k.yaml
index 770ffc8..7db0e41 100644
--- a/configs/ade20k/oneformer_R50_bs16_160k.yaml
+++ b/configs/ade20k/oneformer_R50_bs16_160k.yaml
@@ -52,7 +52,7 @@ MODEL:
     INSTANCE_ON: True
     PANOPTIC_ON: True
     OVERLAP_THRESHOLD: 0.8
-    OBJECT_MASK_THRESHOLD: 0.8
+    OBJECT_MASK_THRESHOLD: 0.5
     TASK: "panoptic"
 TEST:
   DETECTIONS_PER_IMAGE: 150
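For context on the `OBJECT_MASK_THRESHOLD` change above: in Mask2Former-style inference, which OneFormer builds on, this threshold gates which queries enter the panoptic merging step, so lowering it from 0.8 to 0.5 keeps more moderate-confidence predictions. A minimal sketch of the usual filtering logic, with illustrative names rather than the repo's exact code:

```python
import torch
import torch.nn.functional as F

def keep_object_queries(mask_cls: torch.Tensor, num_classes: int,
                        object_mask_threshold: float = 0.5) -> torch.Tensor:
    # mask_cls: (num_queries, num_classes + 1) logits, where the last
    # class index stands for "no object".
    scores, labels = F.softmax(mask_cls, dim=-1).max(-1)
    # Keep queries that predict a real class with sufficient confidence.
    return labels.ne(num_classes) & (scores > object_mask_threshold)
```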
diff --git a/configs/ade20k/swin/oneformer_swin_large_bs16_160k.yaml b/configs/ade20k/swin/oneformer_swin_large_bs16_160k.yaml
index 62caa14..c065d5f 100644
--- a/configs/ade20k/swin/oneformer_swin_large_bs16_160k.yaml
+++ b/configs/ade20k/swin/oneformer_swin_large_bs16_160k.yaml
@@ -11,7 +11,7 @@ MODEL:
     DROP_PATH_RATE: 0.3
     PATCH_NORM: True
     PRETRAIN_IMG_SIZE: 384
-  WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
+  WEIGHTS: "swin_large_patch4_window12_384_22kto1k.pkl"
   PIXEL_MEAN: [123.675, 116.280, 103.530]
   PIXEL_STD: [58.395, 57.120, 57.375]
   ONE_FORMER:
diff --git a/configs/ade20k/swin/oneformer_swin_large_bs16_160k_1280x1280.yaml b/configs/ade20k/swin/oneformer_swin_large_bs16_160k_1280x1280.yaml
new file mode 100644
index 0000000..37ed5cc
--- /dev/null
+++ b/configs/ade20k/swin/oneformer_swin_large_bs16_160k_1280x1280.yaml
@@ -0,0 +1,40 @@
+_BASE_: ../oneformer_R50_bs16_160k.yaml
+MODEL:
+  BACKBONE:
+    NAME: "D2SwinTransformer"
+  SWIN:
+    EMBED_DIM: 192
+    DEPTHS: [2, 2, 18, 2]
+    NUM_HEADS: [6, 12, 24, 48]
+    WINDOW_SIZE: 12
+    APE: False
+    DROP_PATH_RATE: 0.3
+    PATCH_NORM: True
+    PRETRAIN_IMG_SIZE: 384
+  WEIGHTS: "swin_large_patch4_window12_384_22kto1k.pkl"
+  PIXEL_MEAN: [123.675, 116.280, 103.530]
+  PIXEL_STD: [58.395, 57.120, 57.375]
+  ONE_FORMER:
+    NUM_OBJECT_QUERIES: 250
+INPUT:
+  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 1280) for x in range(5, 21)]"]
+  MIN_SIZE_TRAIN_SAMPLING: "choice"
+  MIN_SIZE_TEST: 1280
+  MAX_SIZE_TRAIN: 5120
+  MAX_SIZE_TEST: 5120
+  CROP:
+    ENABLED: True
+    TYPE: "absolute"
+    SIZE: (1280, 1280)
+    SINGLE_CATEGORY_MAX_AREA: 1.0
+  COLOR_AUG_SSD: True
+  SIZE_DIVISIBILITY: 1280 # used in dataset mapper
+  FORMAT: "RGB"
+TEST:
+  DETECTIONS_PER_IMAGE: 250
+  EVAL_PERIOD: 5000
+  AUG:
+    ENABLED: False
+    MIN_SIZES: [640, 960, 1280, 1600, 1920, 2240]
+    MAX_SIZE: 8960
+    FLIP: True
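One more note on the two new 1280Ɨ1280 configs: `SIZE_DIVISIBILITY: 1280` makes the dataset mapper pad each image so its height and width are multiples of 1280, which together with the absolute 1280Ɨ1280 crop yields fixed-size training tensors. A hypothetical helper showing the rounding, not the repo's mapper code:

```python
import math
from typing import Tuple

def pad_to_divisibility(h: int, w: int, size_divisibility: int) -> Tuple[int, int]:
    """Round spatial dims up to the nearest multiple of size_divisibility,
    mirroring how detectron2-style dataset mappers pad images."""
    padded_h = math.ceil(h / size_divisibility) * size_divisibility
    padded_w = math.ceil(w / size_divisibility) * size_divisibility
    return padded_h, padded_w

assert pad_to_divisibility(1280, 1280, 1280) == (1280, 1280)
assert pad_to_divisibility(900, 1150, 1280) == (1280, 1280)  # smaller inputs pad up
```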
diff --git a/configs/ade20k/swin/oneformer_swin_large_bs16_160k_896x896.yaml b/configs/ade20k/swin/oneformer_swin_large_bs16_160k_896x896.yaml
index 365e6c2..4ccc081 100644
--- a/configs/ade20k/swin/oneformer_swin_large_bs16_160k_896x896.yaml
+++ b/configs/ade20k/swin/oneformer_swin_large_bs16_160k_896x896.yaml
@@ -11,7 +11,7 @@ MODEL:
     DROP_PATH_RATE: 0.3
     PATCH_NORM: True
     PRETRAIN_IMG_SIZE: 384
-  WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
+  WEIGHTS: "swin_large_patch4_window12_384_22kto1k.pkl"
   PIXEL_MEAN: [123.675, 116.280, 103.530]
   PIXEL_STD: [58.395, 57.120, 57.375]
   ONE_FORMER:
diff --git a/configs/cityscapes/swin/oneformer_swin_large_bs16_90k.yaml b/configs/cityscapes/swin/oneformer_swin_large_bs16_90k.yaml
index 9e401c8..eeb202a 100644
--- a/configs/cityscapes/swin/oneformer_swin_large_bs16_90k.yaml
+++ b/configs/cityscapes/swin/oneformer_swin_large_bs16_90k.yaml
@@ -11,7 +11,7 @@ MODEL:
     DROP_PATH_RATE: 0.3
     PATCH_NORM: True
     PRETRAIN_IMG_SIZE: 384
-  WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
+  WEIGHTS: "swin_large_patch4_window12_384_22kto1k.pkl"
   PIXEL_MEAN: [123.675, 116.280, 103.530]
   PIXEL_STD: [58.395, 57.120, 57.375]
   ONE_FORMER:
diff --git a/configs/coco/swin/oneformer_swin_large_bs16_100ep.yaml b/configs/coco/swin/oneformer_swin_large_bs16_100ep.yaml
index c665889..d3194e6 100644
--- a/configs/coco/swin/oneformer_swin_large_bs16_100ep.yaml
+++ b/configs/coco/swin/oneformer_swin_large_bs16_100ep.yaml
@@ -11,7 +11,7 @@ MODEL:
     DROP_PATH_RATE: 0.3
     PATCH_NORM: True
     PRETRAIN_IMG_SIZE: 384
-  WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
+  WEIGHTS: "swin_large_patch4_window12_384_22kto1k.pkl"
   PIXEL_MEAN: [123.675, 116.280, 103.530]
   PIXEL_STD: [58.395, 57.120, 57.375]
   ONE_FORMER:
diff --git a/images/plots.svg b/images/plots.svg
index a98b025..a38bdda 100644
--- a/images/plots.svg
+++ b/images/plots.svg
@@ -1,3 +1,3 @@
[Inline SVG source omitted: the performance plots are regenerated with the updated ADE20K numbers. The figure keeps its three panels (Cityscapes, COCO, ADE20K), each with Panoptic, Instance, and Semantic bars, comparing "Universal Architecture, Model and Dataset" entries OneFormer (Swin-L) and OneFormer (DiNAT-L) against "Panoptic Architecture BUT Specialized Models and Datasets" entries Mask2Former-Semantic, Mask2Former-Instance, and Mask2Former-Panoptic (all Swin-L).]
diff --git a/tools/README.md b/tools/README.md
index 3bfc0b0..071532a 100644
--- a/tools/README.md
+++ b/tools/README.md
@@ -16,8 +16,8 @@ It's common to initialize from backbone models pre-trained on ImageNet classific
 wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth
 python tools/convert-pretrained-model-to-d2.py swin_large_patch4_window12_384_22k.pth swin_large_patch4_window12_384_22k.pkl
 
-wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k_to_1k.pth
-python tools/convert-pretrained-model-to-d2.py swin_large_patch4_window12_384_22k_to_1k.pth swin_large_patch4_window12_384_22k_to_1k.pkl
+wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22kto1k.pth
+python tools/convert-pretrained-model-to-d2.py swin_large_patch4_window12_384_22kto1k.pth swin_large_patch4_window12_384_22kto1k.pkl
 ```
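For reference, the `convert-pretrained-model-to-d2.py` step above re-serializes a torch checkpoint into the pickle format that detectron2's checkpointer loads. Converters in this family of repos (cf. the Swin and Mask2Former tools) boil down to roughly the following; treat it as a hedged sketch, not necessarily the repo's exact script:

```python
import pickle as pkl
import sys

import torch

if __name__ == "__main__":
    input_path, output_path = sys.argv[1], sys.argv[2]
    # Load the torch checkpoint on CPU and unwrap it if it stores a
    # training-time dict rather than a bare state_dict.
    obj = torch.load(input_path, map_location="cpu")
    if "model" in obj:
        obj = obj["model"]
    # detectron2 recognizes this layout and applies its name-matching
    # heuristics when mapping backbone weights into the model.
    res = {"model": obj, "__author__": "third_party", "matching_heuristics": True}
    with open(output_path, "wb") as f:
        pkl.dump(res, f)
```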