This list is a compendium of works that use Transformer-based techniques for semantic and instance segmentation of image or video datasets.
Contributions to this repository are welcome and appreciated; please feel free to send pull requests.
Each entry uses the following structure:
- [Paper Name] (link) -Conference Name and Year -[github] (link)
- A Survey of Transformers -arXiv 2021.
- Transformers in Vision: A Survey -arXiv 2021.
- Transformers in computational visual media: A survey -SpringerLink 2022.
- A Survey on Vision Transformer -IEEE 2022.
- Vision Transformers in Medical Computer Vision - A Contemplative Retrospection -arXiv 2022.
- Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work -arXiv 2022.
- 3D Vision with Transformers: A Survey -arXiv 2022.
- A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective -arXiv 2022.
- Vision Transformers for Action Recognition: A Survey -arXiv 2022.
- Vision transformers for dense prediction: A survey -ELSEVIER 2022.
- Semantic segmentation using Vision Transformers: A survey -ELSEVIER 2023.
- A Comprehensive Survey of Transformers for Computer Vision -MDPI 2023.
- Transformers in Remote Sensing: A Survey -MDPI 2023.
- A Survey of Visual Transformers -IEEE 2023.
- Transformer-Based Visual Segmentation: A Survey -IEEE 2024.
- Mask-Attention-Free Transformer for 3D Instance Segmentation -ICCV 2023 -github
- Query Refinement Transformer for 3D Instance Segmentation -ICCV 2023 -[github]
- 2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision -ICCV 2023 -github
- CDAC: Cross-domain Attention Consistency in Transformer for Domain Adaptive Semantic Segmentation -ICCV 2023 -github
- A Good Student is Cooperative and Reliable: CNN-Transformer Collaborative Learning for Semantic Segmentation -ICCV 2023 -[github]
- Efficient 3D Semantic Segmentation with Superpoint Transformer -ICCV 2023 -github
- Adaptive Template Transformer for Mitochondria Segmentation in Electron Microscopy Images -ICCV 2023 -[github]
- CVSformer: Cross-View Synthesis Transformer for Semantic Scene Completion -ICCV 2023 -[github]
- VoxFormer: Sparse Voxel Transformer for Camera-Based 3D Semantic Scene Completion -CVPR 2023 -github
- Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation -CVPR 2023 -github
- Heat Diffusion based Multi-scale and Geometric Structure-aware Transformer for Mesh Segmentation -CVPR 2023 -[github]
- CLIP is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation -CVPR 2023 -github
- MED-VT: Multiscale Encoder-Decoder Video Transformer with Application to Object Segmentation -CVPR 2023 -github
- Contrastive Grouping with Transformer for Referring Image Segmentation -CVPR 2023 -github
- SemiCVT: Semi-Supervised Convolutional Vision Transformer for Semantic Segmentation -CVPR 2023 -[github]
- OneFormer: One Transformer to Rule Universal Image Segmentation -CVPR 2023 -github
- HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation -CVPR 2023 -github
- Incrementer: Transformer for Class-Incremental Semantic Segmentation with Knowledge Distillation Focusing on Old Class -CVPR 2023 -[github]
- MP-Former: Mask-Piloted Transformer for Image Segmentation -CVPR 2023 -github
- Transformer Scale Gate for Semantic Segmentation -CVPR 2023 -[github]
- UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer via Hierarchical Mask Calibration -CVPR 2023 -[github]
- HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation -WACV 2023 -github
- SCTS: Instance Segmentation of Single Cells Using a Transformer-Based Semantic-Aware Model and Space-Filling Augmentation -WACV 2023 -[github]
- Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation -WACV 2023 -github
- The Fully Convolutional Transformer for Medical Image Segmentation -WACV 2023 -github
- Towards Few-Annotation Learning for Object Detection: Are Transformer-based Models More Efficient? -WACV 2023 -[github]
- BEVSegFormer: Bird’s Eye View Semantic Segmentation From Arbitrary Camera Rigs -WACV 2023 -[github]
- Medical Image Segmentation via Cascaded Attention Decoding -WACV 2023 -[github]
- Unsupervised multi-object segmentation using attention and soft-argmax -WACV 2023 -github
- The Power of Fragmentation: A Hierarchical Transformer Model for Structural Segmentation in Symbolic Music Generation -IEEE 2023 -[github]
- Local-Global Context Aware Transformer for Language-Guided Video Segmentation -IEEE 2023 -github
- Medical Image Segmentation Based on Transformer and HarDNet Structures -IEEE 2023 -[github]
- A Unified Transformer Framework for Group-based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection -IEEE 2023 -github
- The Lighter The Better: Rethinking Transformers in Medical Image Segmentation Through Adaptive Pruning -IEEE 2023 -[github]
- RNGDet++: Road Network Graph Detection by Transformer with Instance Segmentation and Multi-scale Features Enhancement -IEEE 2023 -github
- RockFormer: A U-Shaped Transformer Network for Martian Rock Segmentation -IEEE 2023 -[github]
- Unsupervised Visual Representation Learning Based on Segmentation of Geometric Pseudo-Shapes for Transformer-Based Medical Tasks -IEEE 2023 -[github]
- CKD-TransBTS: Clinical Knowledge-Driven Hybrid Transformer with Modality-Correlated Cross-Attention for Brain Tumor Segmentation -IEEE 2023 -[github]
- RSSFormer: Foreground Saliency Enhancement for Remote Sensing Land-Cover Segmentation -IEEE 2023 -[github]
- Normal-Knowledge-Based Pavement Defect Segmentation Using Relevance-Aware and Cross-Reasoning Mechanisms -IEEE 2023 -[github]
- High-Resolution Swin Transformer for Automatic Medical Image Segmentation -MDPI 2023 -[github]
- Multi-Swin Mask Transformer for Instance Segmentation of Agricultural Field Extraction -MDPI 2023 -[github]
- Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation -MDPI 2023 -[github]
- Efficient Lung Cancer Image Classification and Segmentation Algorithm Based on an Improved Swin Transformer -MDPI 2023 -[github]
- Transformer-Based Weed Segmentation for Grass Management -MDPI 2023 -[github]
- RCCT-ASPPNet: Dual-Encoder Remote Image Segmentation Based on Transformer and ASPP -MDPI 2023 -[github]
- MCANet: A Multi-Branch Network for Cloud/Snow Segmentation in High-Resolution Remote Sensing Images -MDPI 2023 -[github]
- Muscle Cross-Sectional Area Segmentation in Transverse Ultrasound Images Using Vision Transformers -MDPI 2023 -[github]
- MCAFNet: A Multiscale Channel Attention Fusion Network for Semantic Segmentation of Remote Sensing Images -MDPI 2023 -[github]
- Temporal Segment Transformer for Action Segmentation -arXiv 2023 -[github]
- SeaFormer: Squeeze-Enhanced Axial Transformer for Mobile Semantic Segmentation -arXiv 2023 -[github]
- MP-Former: Mask-Piloted Transformer for Image Segmentation -arXiv 2023 -[github]
- MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer -arXiv 2023 -[github]
- SwinVFTR: A Novel Volumetric Feature-learning Transformer for 3D OCT Fluid Segmentation -arXiv 2023 -[github]
- Towards Robust Video Instance Segmentation with Temporal-Aware Transformer -arXiv 2023 -[github]
- Head-Free Lightweight Semantic Segmentation with Linear Transformer -arXiv 2023 -[github]
- FullStop: Punctuation and Segmentation Prediction for Dutch with Transformers -arXiv 2023 -[github]
- Cooperation Learning Enhanced Colonic Polyp Segmentation Based on Transformer-CNN Fusion -arXiv 2023 -[github]
- SAT: Size-Aware Transformer for 3D Point Cloud Semantic Segmentation -arXiv 2023 -[github]
- Effects of Architectures on Continual Semantic Segmentation -arXiv 2023 -[github]
- MECPformer: Multi-estimations Complementary Patch with CNN-Transformers for Weakly Supervised Semantic Segmentation -arXiv 2023 -github
- PSST! Prosodic Speech Segmentation with Transformers -arXiv 2023 -[github]
- TransAdapt: A Transformative Framework for Online Test Time Adaptive Semantic Segmentation -arXiv 2023 -[github]
- Multi-class Token Transformer for Weakly Supervised Semantic Segmentation -CVPR 2022 -[github]
- TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation -CVPR 2022 -github
- Masked-attention Mask Transformer for Universal Image Segmentation -CVPR 2022 -github
- Temporally Efficient Vision Transformer for Video Instance Segmentation -CVPR 2022 -github
- An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation -CVPR 2022 -[github]
- Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation -CVPR 2022 -github
- MPViT: Multi-Path Vision Transformer for Dense Prediction -CVPR 2022 -[github]
- Unetr: Transformers for 3d medical image segmentation -WACV 2022 -github
- AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation -WACV 2022 -[github]
- Spatial-Temporal Transformer for 3D Point Cloud Sequences -WACV 2022 -[github]
- SegViT: Semantic Segmentation with Plain Vision Transformers -NeurIPS 2022 -github
- Intermediate Prototype Mining Transformer for Few-Shot Semantic Segmentation -NeurIPS 2022 -[github]
- RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer -NeurIPS 2022 -[github]
- Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation -IEEE 2022 -[github]
- Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-high-resolution Remote Sensing Imagery -IEEE 2022 -[github]
- A novel transformer based semantic segmentation scheme for fine-resolution remote sensing images -IEEE 2022 -[github]
- LFT-Net: Local Feature Transformer Network for Point Clouds Analysis -IEEE 2022 -[github]
- Transformer-based Efficient Salient Instance Segmentation Networks with Orientative Query --Code
- Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images -IEEE 2022 -[github]
- Looking Outside the Window: Wide-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images -IEEE 2022 -[github]
- Enhanced Feature Pyramid Vision Transformer for Semantic Segmentation on Thailand Landsat-8 Corpus -MDPI 2022 -[github]
- Pyramid fusion transformer for semantic segmentation -arXiv 2022 -[github]
- TransBTSV2: Wider Instead of Deeper Transformer for Medical Image Segmentation -arXiv 2022 -github
- Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images -arXiv 2022 -[github]
- Task-Adaptive Feature Transformer with Semantic Enrichment for Few-Shot Segmentation -arXiv 2022 -github
- Inverted Pyramid Multi-task Transformer for Dense Scene Understanding -arXiv 2022 -github
- MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers -CVPR 2021 -github
- End-to-End Video Instance Segmentation With Transformers -CVPR 2021 -github
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers -CVPR 2021 -github
- Sstvos: Sparse spatiotemporal transformers for video object segmentation -CVPR 2021 -github
- Locate then Segment: A Strong Pipeline for Referring Image Segmentation -CVPR 2021 -[github]
- Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions -ICCV 2021 -github
- Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions — Supplemental Materials -ICCV 2021 -[github]
- Joint Inductive and Transductive Learning for Video Object Segmentation -ICCV 2021 -github
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows -ICCV 2021 -github
- Self-supervised Video Object Segmentation by Motion Grouping -ICCV 2021 -github
- Vision Transformers for Dense Prediction -ICCV 2021 -github
- Point Transformer -ICCV 2021 -github
- SOTR: Segmenting Objects with Transformers -ICCV 2021 -github
- A Unified Efficient Pyramid Transformer for Semantic Segmentation -ICCV 2021 -github
- Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding -ICCV 2021 -github
- Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer -ICCV 2021 -github
- Trans4Trans: Efficient Transformer for Transparent Object Segmentation to Help Visually Impaired People Navigate in the Real World -ICCV 2021 -[github]
- Vision-Language Transformer and Query Generation for Referring Segmentation -ICCV 2021 -github
- Segmenter: Transformer for Semantic Segmentation -ICCV 2021 -github
- Twins: Revisiting the Design of Spatial Attention in Vision Transformers -NeurIPS 2021 -github
- HRFormer: High-Resolution Transformer for Dense Prediction -NeurIPS 2021 -github
- SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers -NeurIPS 2021 -github
- Per-Pixel Classification is Not All You Need for Semantic Segmentation -NeurIPS 2021 -github
- Associating Objects with Transformers for Video Object Segmentation -NeurIPS 2021 -github
- Video Instance Segmentation using Inter-Frame Communication Transformers -NeurIPS 2021 -github
- Few-Shot Segmentation via Cycle-Consistent Transformer -NeurIPS 2021 -github
- Medical Transformer: Gated Axial-Attention for Medical Image Segmentation -MICCAI 2021 -github
- UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation -MICCAI 2021 -github
- Transbts: Multimodal brain tumor segmentation using transformer -MICCAI 2021 -github
- Multi-compound transformer for accurate biomedical image segmentation -MICCAI 2021 -[github]
- A multi-branch hybrid transformer network for corneal endothelial cell segmentation -MICCAI 2021 -[github]
- DC-Net: Dual Context Network for 2D Medical Image Segmentation -MICCAI 2021 -[github]
- Transfuse: Fusing transformers and cnns for medical image segmentation -MICCAI 2021 -github
- Teds-net: Enforcing diffeomorphisms in spatial transformers to guarantee topology preservation in segmentations -MICCAI 2021 -[github]
- Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation -MICCAI 2021 -github
- Boundary-aware transformers for skin lesion segmentation -MICCAI 2021 -github
- Convolution-Free Medical Image Segmentation using Transformers -MICCAI 2021 -[github]
- Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images -MDPI 2021 -[github]
- Wildfire Segmentation Using Deep Vision Transformers -MDPI 2021 -[github]
- Transformer-Based Decoder Designs for Semantic Segmentation on Remotely Sensed Images -MDPI 2021 -github
- Efficient Transformer for Remote Sensing Image Segmentation -MDPI 2021 -github
- Segmentation applying TAG type label data and Transformer -IEEE 2021 -[github]
- Local Memory Attention for Fast Video Semantic Segmentation -IEEE 2021 -[github]
- A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization -IEEE 2021 -[github]
- STransFuse: Fusing Swin Transformer and Convolutional Neural Network for Remote Sensing Image Semantic Segmentation -IEEE 2021 -[github]
- Swin-Spectral Transformer for Cholangiocarcinoma Hyperspectral Image Segmentation -IEEE 2021 -[github]
- ECT-NAS: Searching Efficient CNN-Transformers Architecture for Medical Image Segmentation -IEEE 2021 -[github]
- 3D Deep Attentive U-Net with Transformer for Breast Tumor Segmentation from Automated Breast Volume Scanner -IEEE 2021 -[github]
- Visual-Semantic Transformer for Face Forgery Detection -IEEE 2021 -[github]
- MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation -IEEE 2021 -[github]
- Multi-scale Hierarchical Transformer structure for 3D medical image segmentation -IEEE 2021 -[github]
- A Temporary Transformer Network for Guide-Wire Segmentation -IEEE 2021 -[github]
- A Transformer-Based Network for Anisotropic 3D Medical Image Segmentation -IEEE 2021 -[github]
- OffRoadTranSeg: Semi-Supervised Segmentation using Transformers on OffRoad environments -arXiv 2021 -[github]
- Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation -arXiv 2021 -github
- Self-Supervised Learning with Swin Transformers -arXiv 2021 -[github]
- GT U-Net: A U-Net Like Group Transformer Network for Tooth Root Segmentation -arXiv 2021 -[github]
- SpecTr: Spectral Transformer for Hyperspectral Pathology Image Segmentation -arXiv 2021 -[github]
- Satellite Image Semantic Segmentation -arXiv 2021 -github
- Boosting Few-shot Semantic Segmentation with Transformers -arXiv 2021 -[github]
- A Robust Volumetric Transformer for Accurate 3D Tumor Segmentation -arXiv 2021 -github
- Dynamic Convolution for 3D Point Cloud Instance Segmentation -arXiv 2021 -[github]
- Fast Point Transformer -arXiv 2021 -github
- ViTBIS: Vision Transformer for Biomedical Image Segmentation -arXiv 2021 -[github]
- Fully Transformer Networks for Semantic Image Segmentation -arXiv 2021 -[github]
- UNetFormer: A UNet-like Transformer for Efficient Semantic Segmentation of Remote Sensing Urban Scene Imagery -arXiv 2021 -[github]
- Unsupervised Brain Anomaly Detection and Segmentation with Transformers -arXiv 2021 -[github]
- Few-Shot Temporal Action Localization with Query Adaptive Transformer -arXiv 2021 -github
- Cost Aggregation Is All You Need for Few-Shot Segmentation -arXiv 2021 -[github]
- Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers -arXiv 2021 -github
- TransAttUnet: Multi-level Attention-guided U-Net with Transformer for Medical Image Segmentation -arXiv 2021 -[github]
- ASFormer: Transformer for Action Segmentation -arXiv 2021 -github
- TransClaw U-Net: Claw U-Net with Transformers for Medical Image Segmentation -arXiv 2021 -[github]
- SeqFormer: Sequential Transformer for Video Instance Segmentation -arXiv 2021 -github
- Mask2Former for Video Instance Segmentation -arXiv 2021 -github
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation -arXiv 2021 -github
- LeViT-UNet: Make Faster Encoders with Transformer for Medical Image Segmentation -arXiv 2021 -[github]
- ISTR: End-to-End Instance Segmentation with Transformers -arXiv 2021 -github
- P2T: Pyramid Pooling Transformer for Scene Understanding -arXiv 2021 -[github]
- Medical Transformer: Universal Brain Encoder for 3D MRI Analysis -arXiv 2021 -[github]
- nnFormer: Interleaved Transformer for Volumetric Segmentation -arXiv 2021 -[github]
- MISSFormer: An Effective Medical Image Segmentation Transformer -arXiv 2021 -[github]
- ViT-V-Net: Vision Transformer for Unsupervised Volumetric Medical Image Registration -arXiv 2021 -[github]
- Pyramid Medical Transformer for Medical Image Segmentation -arXiv 2021 -[github]
- U-Net Transformer: Self and Cross Attention for Medical Image Segmentation -arXiv 2021 -[github]
- Ds-transunet: Dual swin transformer u-net for medical image segmentation -arXiv 2021 -[github]
- TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation -arXiv 2021 -github
- TransVOS: Video Object Segmentation with Transformers -arXiv 2021 -[github]
- Polytransform: Deep polygon transformer for instance segmentation -CVPR 2020 -[github]
- Sct: Set constrained temporal transformer for set supervised action segmentation -CVPR 2020 -github
- Feature pyramid transformer -ECCV 2020 -github
- End-to-end object detection with transformers -ECCV 2020 -github
- Multi-task Dynamic Transformer Network for Concurrent Bone Segmentation and Large-Scale Landmark Localization with Dental CBCT -MICCAI 2020 -[github]
- Attention-Based Transformers for Instance Segmentation of Cells in Microstructures -IEEE 2020 -github
- Detecting lane and road markings at a distance with perspective transformer layers -IEEE 2020 -[github]
- Efficient aortic valve multilabel segmentation using a spatial transformer network -IEEE 2020 -[github]
- Visual transformers: Token-based image representation and processing for computer vision -arXiv 2020 -github
- Task-adaptive feature transformer for few-shot segmentation -arXiv 2020 -github
- TETRIS: Template transformer networks for image segmentation with shape priors -IEEE 2019 -[github]
- Iterative transformer network for 3d point cloud -arXiv 2019 -github
- Segmentation transformer: Object-contextual representations for semantic segmentation -arXiv 2019 -[github]
- TrSeg: Transformer for semantic segmentation -Pattern Recognition Letters 2021 -github
- Video Semantic Segmentation via Sparse Temporal Transformer -ACM 2021 -[github]
We appreciate the excellent work of the authors mentioned above.