UCCH

Peng Hu, Hongyuan Zhu, Jie Lin, Dezhong Peng, Yin-Ping Zhao, Xi Peng*, Unsupervised Contrastive Cross-modal Hashing, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 45, no. 3, pp. 3877-3889, 1 March 2023, doi: 10.1109/TPAMI.2022.3177356. (PyTorch Code)

Abstract

In this paper, we study how to make unsupervised cross-modal hashing (CMH) benefit from contrastive learning (CL) by overcoming two challenges. Specifically, i) to address the performance degradation caused by binary optimization for hashing, we propose a novel momentum optimizer that makes the hashing operation learnable in CL, thus making off-the-shelf deep cross-modal hashing possible. In other words, unlike most existing methods, our method does not involve binary-continuous relaxation and thus enjoys better retrieval performance; ii) to alleviate the influence of false-negative pairs (FNPs), i.e., within-class pairs that are wrongly treated as negative pairs, we propose a Cross-modal Ranking Learning loss (CRL) that utilizes the discrimination from all negative pairs instead of only the hard ones. Thanks to this global strategy, CRL endows our method with better performance because it does not overuse the FNPs while ignoring the true-negative pairs. To the best of our knowledge, the proposed method could be one of the first successful contrastive hashing methods. To demonstrate its effectiveness, we carry out experiments on five widely-used datasets and compare with 13 state-of-the-art methods. The code is available at https://github.com/penghu-cs/UCCH.
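To make the "all negatives versus only the hard negative" distinction concrete, here is a minimal pure-Python sketch of two margin-based ranking losses over an image-to-text similarity matrix. It is an illustration only, not the CRL defined in the paper or implemented in UCCH.py: the function name `ranking_losses`, the toy similarity values, and the plain hinge form are all assumptions, and only the image-to-text direction is shown.

```python
def ranking_losses(sim, margin=0.2):
    """Return (all_negatives_loss, hardest_negative_loss), averaged over queries.

    sim[i][j] is the similarity between image i and text j; the diagonal
    holds the matched (positive) pairs. Hypothetical helper for illustration,
    not the loss used in the UCCH code.
    """
    n = len(sim)
    total_all, total_hard = 0.0, 0.0
    for i in range(n):
        pos = sim[i][i]
        # One hinge term per mismatched text j != i.
        hinges = [max(0.0, margin + sim[i][j] - pos) for j in range(n) if j != i]
        total_all += sum(hinges)   # every negative contributes
        total_hard += max(hinges)  # only the hardest negative contributes
    return total_all / n, total_hard / n


# Toy similarities (made up): image 0 has two texts that violate the margin.
sim = [
    [0.9, 0.8, 0.85],
    [0.2, 0.7, 0.3],
    [0.1, 0.3, 0.6],
]
all_loss, hard_loss = ranking_losses(sim, margin=0.2)
print(all_loss, hard_loss)
```

In the hardest-negative variant a single pair drives each query's gradient, so a false-negative pair that happens to be the most similar "negative" dominates the update. Summing over all negatives dilutes that one pair among the true negatives, which is the intuition the abstract gives for the global strategy.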

Framework

Figure 1: The pipeline of the proposed method, taking a bimodal case as an example. Two modality-specific networks learn unified binary representations for the different modalities. The network outputs interact directly with the hash codes to learn the latent discrimination through instance-level contrast without continuous relaxation, i.e., contrastive hashing learning (𝓛_𝒸). The cross-modal ranking loss 𝓛_𝑟 bridges cross-modal hashing learning and cross-modal retrieval.
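As a small illustration of what "unified binary representations" buy at retrieval time, the sketch below binarizes two real-valued vectors with the sign function and compares them by Hamming distance. It assumes codes in {-1, +1} obtained by taking the sign of the network outputs; that choice, the helper names, and the example vectors are assumptions for illustration, not taken from the UCCH code.

```python
def binarize(features):
    """Map real-valued outputs to a {-1, +1} code with the sign function."""
    return [1 if x >= 0 else -1 for x in features]


def hamming(a, b):
    """Number of positions where two codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def inner(a, b):
    """Inner product of two codes."""
    return sum(x * y for x, y in zip(a, b))


img_code = binarize([0.3, -1.2, 0.8, -0.1])   # hypothetical image-network output
txt_code = binarize([0.5, -0.4, -0.9, -0.7])  # hypothetical text-network output

k = len(img_code)
# For {-1, +1} codes of length k: Hamming distance = (k - inner product) / 2.
assert hamming(img_code, txt_code) == (k - inner(img_code, txt_code)) // 2
print(img_code, txt_code, hamming(img_code, txt_code))
```

Because both modalities map into the same code space, an image query can be compared against text codes (and vice versa) with this one distance.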

Usage

To train a model with 128 bits on MIRFLICKR-25K, just run UCCH.py:

# Features
python UCCH.py --data_name mirflickr25k_fea --bit 128 --alpha 0.7 --num_hiden_layers 3 2 --margin 0.2 --max_epochs 20 --train_batch_size 256 --shift 0.1 --lr 0.0001 --optimizer Adam

# Raw data
python UCCH.py --data_name mirflickr25k --bit 128 --alpha 0.7 --num_hiden_layers 3 2 --margin 0.2 --max_epochs 20 --train_batch_size 256 --shift 0.1 --lr 0.0001 --optimizer Adam --warmup_epoch 5 --pretrain -a vgg11
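The result tables report 16, 32, 64, and 128 bits. A minimal sketch for generating one command per code length is shown below; it only builds and prints the command lines, using the flags from the feature-based command above. Reusing the same hyperparameters for every code length is an assumption, not something this README states, and `build_command` is a hypothetical helper.

```python
import shlex

# Flags copied from the feature-based command above; --bit is added per run.
BASE_ARGS = [
    "python", "UCCH.py",
    "--data_name", "mirflickr25k_fea",
    "--alpha", "0.7",
    "--num_hiden_layers", "3", "2",
    "--margin", "0.2",
    "--max_epochs", "20",
    "--train_batch_size", "256",
    "--shift", "0.1",
    "--lr", "0.0001",
    "--optimizer", "Adam",
]


def build_command(bit):
    """Return the argument list for one code length (hypothetical helper)."""
    return BASE_ARGS + ["--bit", str(bit)]


# Assumption: the same settings are reused for all four code lengths.
commands = [build_command(bit) for bit in (16, 32, 64, 128)]
for cmd in commands:
    print(shlex.join(cmd))
```

To actually launch the runs, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root.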

Training produces output like the following:

Epoch: 13 / 20
[================= 70/70 ====================>]  Step: 28ms | Tot: 2s18ms | Loss: 13.205 | LR: 0.0001                                                                                                             
Evaluation:	Img2Txt: 0.75797 	 Txt2Img: 0.759172 	 Avg: 0.758571

Epoch: 14 / 20
[================= 70/70 ====================>]  Step: 28ms | Tot: 1s951ms | Loss: 13.193 | LR: 0.0001                                                                                                            
Evaluation:	Img2Txt: 0.759404 	 Txt2Img: 0.759482 	 Avg: 0.759443

Epoch: 15 / 20
[================= 70/70 ====================>]  Step: 28ms | Tot: 1s965ms | Loss: 13.180 | LR: 0.0001                                                                                                            
Evaluation:	Img2Txt: 0.758604 	 Txt2Img: 0.75909 	 Avg: 0.758847

Epoch: 16 / 20
[================= 70/70 ====================>]  Step: 28ms | Tot: 1s973ms | Loss: 13.170 | LR: 0.0001                                                                                                            
Evaluation:	Img2Txt: 0.758019 	 Txt2Img: 0.757934 	 Avg: 0.757976

Epoch: 17 / 20
[================= 70/70 ====================>]  Step: 28ms | Tot: 1s973ms | Loss: 13.160 | LR: 0.0001                                                                                                            
Evaluation:	Img2Txt: 0.757612 	 Txt2Img: 0.758054 	 Avg: 0.757833

Epoch: 18 / 20
[================= 70/70 ====================>]  Step: 29ms | Tot: 1s968ms | Loss: 13.151 | LR: 0.0001                                                                                                            
Evaluation:	Img2Txt: 0.757199 	 Txt2Img: 0.757834 	 Avg: 0.757517

Epoch: 19 / 20
[================= 70/70 ====================>]  Step: 30ms | Tot: 2s43ms | Loss: 13.144 | LR: 0.0001                                                                                                             
Evaluation:	Img2Txt: 0.757373 	 Txt2Img: 0.757289 	 Avg: 0.757331
Test:	Img2Txt: 0.769567 	 Txt2Img: 0.746658 	 Avg: 0.758112
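The Img2Txt and Txt2Img numbers in the log are retrieval scores, and the tables below report MAP. As a rough guide to how such a score can be computed, here is a minimal sketch of mean average precision under Hamming ranking. The helper names, toy codes, and relevance flags are assumptions; the repository's evaluation code may differ in details such as how relevance is defined or how ties are broken.

```python
def hamming(a, b):
    """Number of positions where two codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def average_precision(query_code, db_codes, relevant):
    """AP for one query; relevant[i] is True if database item i matches it."""
    # Rank database items by Hamming distance; the stable sort keeps ties
    # in index order.
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming(query_code, db_codes[i]))
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if relevant[idx]:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item
    return precision_sum / hits if hits else 0.0


def mean_average_precision(query_codes, db_codes, relevance):
    """Mean of the per-query average precisions."""
    aps = [average_precision(q, db_codes, rel)
           for q, rel in zip(query_codes, relevance)]
    return sum(aps) / len(aps)


# Toy 4-bit codes (made up): e.g. image queries against a text database.
db = [[1, 1, 1, 1], [1, 1, -1, -1], [-1, -1, -1, -1]]
queries = [[1, 1, 1, -1], [-1, -1, -1, 1]]
relevance = [
    [True, False, True],
    [False, False, True],
]
score = mean_average_precision(queries, db, relevance)
print(round(score, 4))
```

Swapping the roles of the two modalities (text queries against an image database) gives the other retrieval direction.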

Comparison with the State-of-the-Art

Table 1: Performance comparison in terms of MAP scores on the MIRFLICKR-25K and IAPR TC-12 datasets. The highest score is shown in boldface.

Method	MIRFLICKR-25K								IAPR TC-12
	Image → Text				Text → Image				Image → Text				Text → Image
	16	32	64	128	16	32	64	128	16	32	64	128	16	32	64	128
CVH[20]	0.620	0.608	0.594	0.583	0.629	0.615	0.599	0.587	0.392	0.378	0.366	0.353	0.398	0.384	0.372	0.360
LSSH[59]	0.597	0.609	0.606	0.605	0.602	0.598	0.598	0.597	0.372	0.386	0.396	0.404	0.367	0.380	0.392	0.401
CMFH[60]	0.557	0.557	0.556	0.557	0.553	0.553	0.553	0.553	0.312	0.314	0.314	0.315	0.306	0.306	0.306	0.306
FSH[18]	0.581	0.612	0.635	0.662	0.576	0.607	0.635	0.660	0.377	0.392	0.417	0.445	0.383	0.399	0.425	0.451
DLFH[23]	0.638	0.658	0.677	0.684	0.675	0.700	0.718	0.725	0.342	0.358	0.374	0.395	0.358	0.380	0.403	0.434
MTFH[16]	0.507	0.512	0.558	0.554	0.514	0.524	0.518	0.581	0.277	0.324	0.303	0.311	0.294	0.337	0.269	0.297
FOMH[58]	0.575	0.640	0.691	0.659	0.585	0.648	0.719	0.688	0.312	0.316	0.317	0.350	0.311	0.315	0.322	0.373
DCH[34]	0.596	0.602	0.626	0.636	0.612	0.623	0.653	0.665	0.336	0.336	0.344	0.352	0.350	0.358	0.374	0.391
UGACH[61]	0.685	0.693	0.704	0.702	0.673	0.676	0.686	0.690	0.462	0.467	0.469	0.480	0.447	0.463	0.468	0.463
DJSRH[62]	0.652	0.697	0.700	0.716	0.662	0.691	0.683	0.695	0.409	0.412	0.470	0.480	0.418	0.436	0.467	0.478
JDSH[63]	0.724	0.734	0.741	0.745	0.710	0.720	0.733	0.720	0.449	0.472	0.478	0.484	0.447	0.477	0.473	0.486
DGCPN[64]	0.711	0.723	0.737	0.748	0.695	0.707	0.725	0.731	0.465	0.485	0.486	0.495	0.467	0.488	0.491	0.497
UCH[13]	0.654	0.669	0.679	/	0.661	0.667	0.668	/	0.447	0.471	0.485	/	0.446	0.469	0.488	/
UCCH	0.739	0.744	0.754	0.760	0.725	0.725	0.743	0.747	0.478	0.491	0.503	0.508	0.474	0.488	0.503	0.508

Table 2: Performance comparison in terms of MAP scores on the NUS-WIDE and MS-COCO datasets. The highest score is shown in boldface.

Method	NUS-WIDE								MS-COCO
	Image → Text				Text → Image				Image → Text				Text → Image
	16	32	64	128	16	32	64	128	16	32	64	128	16	32	64	128
CVH[20]	0.487	0.495	0.456	0.419	0.470	0.475	0.444	0.412	0.503	0.504	0.471	0.425	0.506	0.508	0.476	0.429
LSSH[59]	0.442	0.457	0.450	0.451	0.473	0.482	0.471	0.457	0.484	0.525	0.542	0.551	0.490	0.522	0.547	0.560
CMFH[60]	0.339	0.338	0.343	0.339	0.306	0.306	0.306	0.306	0.366	0.369	0.370	0.365	0.346	0.346	0.346	0.346
FSH[18]	0.557	0.565	0.598	0.635	0.569	0.604	0.651	0.666	0.539	0.549	0.576	0.587	0.537	0.524	0.564	0.573
DLFH[23]	0.385	0.399	0.443	0.445	0.421	0.421	0.462	0.474	0.522	0.580	0.614	0.631	0.444	0.489	0.513	0.534
MTFH[16]	0.297	0.297	0.272	0.328	0.353	0.314	0.399	0.410	0.399	0.293	0.295	0.395	0.335	0.374	0.300	0.334
FOMH[58]	0.305	0.305	0.306	0.314	0.302	0.304	0.300	0.306	0.378	0.514	0.571	0.601	0.368	0.484	0.559	0.595
DCH[34]	0.392	0.422	0.430	0.436	0.379	0.432	0.444	0.459	0.422	0.420	0.446	0.468	0.421	0.428	0.454	0.471
UGACH[61]	0.613	0.623	0.628	0.631	0.603	0.614	0.640	0.641	0.553	0.599	0.598	0.615	0.581	0.605	0.629	0.635
DJSRH[62]	0.502	0.538	0.527	0.556	0.465	0.532	0.538	0.545	0.501	0.563	0.595	0.615	0.494	0.569	0.604	0.622
JDSH[63]	0.647	0.656	0.679	0.680	0.649	0.669	0.689	0.699	0.579	0.628	0.647	0.662	0.578	0.634	0.659	0.672
DGCPN[64]	0.610	0.614	0.635	0.641	0.617	0.621	0.642	0.647	0.552	0.590	0.602	0.596	0.564	0.590	0.597	0.597
UCH[13]	/	/	/	/	/	/	/	/	0.521	0.534	0.547	/	0.499	0.519	0.545	/
UCCH	0.698	0.708	0.737	0.742	0.701	0.724	0.745	0.750	0.605	0.645	0.655	0.665	0.610	0.655	0.666	0.677

Ablation Study

Table 3: Ablation study on different datasets. The highest score is shown in boldface.

Dataset	Method	Image → Text				Text → Image
		16	32	64	128	16	32	64	128
IAPR TC-12	UCCH (with 𝓛_𝒸 only)	0.457	0.469	0.478	0.482	0.447	0.469	0.483	0.486
	UCCH (with 𝓛'_{𝑟, 𝑚=0.1} only)	0.410	0.426	0.432	0.438	0.421	0.434	0.461	0.460
	UCCH (with 𝓛'_{𝑟, 𝑚=0.5} only)	0.423	0.446	0.463	0.470	0.434	0.450	0.471	0.479
	UCCH (with 𝓛'_{𝑟, 𝑚=0.9} only)	0.444	0.460	0.472	0.480	0.450	0.472	0.469	0.476
	UCCH (with 𝓛_𝑟 only)	0.461	0.482	0.496	0.495	0.457	0.476	0.492	0.488
	Full UCCH	0.478	0.491	0.503	0.508	0.474	0.488	0.503	0.508
MS-COCO	UCCH (with 𝓛_𝒸 only)	0.577	0.605	0.621	0.624	0.579	0.610	0.626	0.627
	UCCH (with 𝓛'_{𝑟, 𝑚=0.1} only)	0.495	0.512	0.548	0.555	0.483	0.503	0.534	0.549
	UCCH (with 𝓛'_{𝑟, 𝑚=0.5} only)	0.499	0.525	0.554	0.579	0.498	0.527	0.546	0.566
	UCCH (with 𝓛'_{𝑟, 𝑚=0.9} only)	0.529	0.535	0.554	0.558	0.525	0.545	0.546	0.560
	UCCH (with 𝓛_𝑟 only)	0.563	0.574	0.599	0.602	0.563	0.576	0.606	0.609
	Full UCCH	0.605	0.645	0.655	0.665	0.610	0.655	0.666	0.677

Citation

If you find UCCH useful in your research, please consider citing:

@article{hu2022UCCH,
   title={Unsupervised Contrastive Cross-modal Hashing},
   author={Peng Hu and Hongyuan Zhu and Jie Lin and Dezhong Peng and Yin-Ping Zhao and Xi Peng},
   journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
   year={2023},
   volume={45},
   number={3},
   pages={3877-3889},
   doi={10.1109/TPAMI.2022.3177356}
}
