[LDFusion] Official implementation for "Infrared and Visible Image Fusion with Language-driven Loss in CLIP Embedding Space"

wyhlaowang/LDFusion

LDFusion: Infrared and Visible Image Fusion with Language-driven Loss in CLIP Embedding Space

Yuhao Wang, Lingjuan Miao, Zhiqiang Zhou, Lei Zhang, Yajun Qiao

  • We are the first to propose using natural language to express the whole objective of infrared and visible image fusion (IVIF), which avoids the complex and explicit mathematical modeling found in current fusion loss functions.

  • A language-driven fusion model is derived in CLIP embedding space, based on which we develop a simple yet highly effective language-driven loss for IVIF. Particularly, by introducing a novel regularization and patch filtering approach, we ensure high robustness of the trained model in practice and resolve the challenge of removing textual artifacts induced by CLIP.

  • Experiments show a large improvement in fusion quality with the proposed method, revealing the superiority of language in modeling the fusion output and the potential of pre-trained vision-language models for improving IVIF performance.
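To make the idea of supervising in an embedding space concrete, here is a minimal sketch of a generic directional loss: the direction from a source image embedding to the fused image embedding is encouraged to align with the direction between two text embeddings. This is not the loss defined in the paper and it omits the regularization and patch filtering mentioned above. The function names are hypothetical, and plain Python lists stand in for CLIP embeddings.

```python
import math


def _sub(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def directional_loss(img_src, img_fused, txt_src, txt_fused):
    """Illustrative loss: 1 - cos(image direction, text direction).

    All four arguments are assumed to be embeddings from the same
    space (hypothetical stand-ins for CLIP image/text features).
    """
    image_direction = _sub(img_fused, img_src)
    text_direction = _sub(txt_fused, txt_src)
    return 1.0 - _cosine(image_direction, text_direction)
```

The loss is 0 when the two directions coincide and grows toward 2 as they point in opposite directions. In a real training loop the embeddings would be tensors produced by a frozen CLIP model so that gradients can flow back to the fusion network.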

Usage

1. Create Environment

  • Create conda environment
conda create -n LDFusion python=3.9.12
conda activate LDFusion
  • Install Dependencies
pip install -r requirements.txt

(Recommended: CUDA 11.1 and torch 1.8.2.)
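After installing, a quick check such as the following can confirm which Python, torch, and CUDA versions the environment actually picked up. It is a sketch that is not part of the repository: `describe_env` is a hypothetical helper, and it reports torch as missing rather than failing if the package is not installed.

```python
import importlib.util
import sys


def describe_env():
    """Collect Python / torch / CUDA version info for a sanity check."""
    info = {
        "python": ".".join(str(n) for n in sys.version_info[:3]),
        "torch": None,            # stays None if torch is not installed
        "cuda_build": None,       # CUDA version torch was built against
        "cuda_available": False,  # whether a usable GPU is visible
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda_build"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_env().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False on a machine with a GPU, the installed torch build and the local CUDA driver are worth comparing against the recommended versions.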

2. Data Preparation and Running

Put the test data in the test_imgs directory (infrared images in the ir subfolder, visible images in the vi subfolder), then run python src/test.py.

(Note: The weight files (*.pt) may need to be downloaded separately from the repository.)

The fused results are saved in the ./results/ folder.
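Before running the test script, it can help to verify that the two subfolders line up. The sketch below assumes that each infrared image and its visible counterpart share the same file name; the README does not state this, so adjust the matching rule if the data uses a different convention. `find_pairs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def _file_names(directory):
    """Names of regular files in a directory; empty set if it is missing."""
    if not directory.is_dir():
        return set()
    return {p.name for p in directory.iterdir() if p.is_file()}


def find_pairs(root="test_imgs"):
    """Return (paired, ir_only, vi_only) file-name lists under root.

    Assumption: an ir/vi pair is identified by an identical file name.
    """
    ir_names = _file_names(Path(root) / "ir")
    vi_names = _file_names(Path(root) / "vi")
    paired = sorted(ir_names & vi_names)
    ir_only = sorted(ir_names - vi_names)
    vi_only = sorted(vi_names - ir_names)
    return paired, ir_only, vi_only


if __name__ == "__main__":
    paired, ir_only, vi_only = find_pairs()
    print(f"{len(paired)} matched pairs")
    if ir_only:
        print("infrared images without a visible match:", ir_only)
    if vi_only:
        print("visible images without an infrared match:", vi_only)
```

Run it from the repository root so that the relative test_imgs path resolves.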

Examples

From left to right are the infrared image, visible image, and the fused image generated by LDFusion.
