Releases: ssundaram21/dreamsim
v0.2.1
We're releasing 4 new variants of DreamSim! These new checkpoints are:
- DINOv2 B/14 and SynCLR B/16 as backbones
- DINOv2 B/14 and DINO B/16 trained with the original contrastive loss on both CLS and dense features.
These models (and the originals) are further evaluated in our new NeurIPS 2024 paper, When Does Perceptual Alignment Benefit Vision Representations?
We find that our perceptually aligned representations outperform the baseline models on a variety of standard downstream computer vision tasks, including semantic segmentation, depth estimation, object counting, instance retrieval, and retrieval-augmented generation. These results point towards perceptual alignment as a useful objective for learning general-purpose vision representations. See the paper and our blog post for more details.
Here's how they perform on NIGHTS:
Model | NIGHTS - Val | NIGHTS - Test
---|---|---
ensemble | 96.9% | 96.2%
dino_vitb16 | 95.6% | 94.8%
open_clip_vitb32 | 95.6% | 95.3%
clip_vitb32 | 94.9% | 93.6%
dinov2_vitb14 | 94.9% | 95.0%
synclr_vitb16 | 96.0% | 95.9%
dino_vitb16 (patch) | 94.9% | 94.8%
dinov2_vitb14 (patch) | 95.5% | 95.1%
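For context, scores on NIGHTS are two-alternative forced-choice (2AFC) agreement rates: a triplet counts as correct when the image the model places closer to the reference is the one human raters picked. A minimal sketch of that computation follows. The function name, the argument layout, and the 0/1 label convention are assumptions made for illustration; this is not the repository's evaluation code.

```python
from typing import Sequence


def two_afc_accuracy(
    d_left: Sequence[float],
    d_right: Sequence[float],
    human_choice: Sequence[int],
) -> float:
    """Fraction of triplets where the model agrees with the human vote.

    d_left[i] / d_right[i]: model distance from the reference image to the
    left / right candidate of triplet i (smaller means more similar).
    human_choice[i]: 0 if raters chose the left image, 1 if they chose the
    right image (label convention assumed for this sketch).
    """
    if not (len(d_left) == len(d_right) == len(human_choice)):
        raise ValueError("all inputs must have the same length")
    if len(human_choice) == 0:
        raise ValueError("at least one triplet is required")

    correct = 0
    for dl, dr, choice in zip(d_left, d_right, human_choice):
        # The model "chooses" the candidate with the smaller distance.
        # Ties fall to the right image, an arbitrary choice in this sketch.
        model_choice = 0 if dl < dr else 1
        correct += int(model_choice == choice)
    return correct / len(human_choice)
```

With four triplets where the model agrees with raters on three, the function returns 0.75.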
Additionally, we fixed a bug in embedding normalization. This shouldn't significantly affect model performance, but may explain very minor changes in pipelines where DreamSim (with `normalize_embeds=True`) is being used.
v0.2.1-checkpoints
Checkpoints for v0.2.1 release.
v0.2.0
We're releasing new DreamSim models that are compatible with updated versions of peft!
If you're already using one of the DreamSim models, you just need to make the following changes:
- Update the `dreamsim` package to version `0.2.0`.
- Update your environment to use `peft >= 0.2.0`.
- Remove any old local/cached dreamsim checkpoints. The next time you call the main `dreamsim` function, it will automatically download the updated checkpoints.
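The upgrade steps above could be checked from Python along these lines. This is a sketch under stated assumptions: the helper names are hypothetical, the version comparison is deliberately simplistic (it only looks at the first three numeric components), and the checkpoint directory is whatever location your own setup downloaded checkpoints to.

```python
import re
import shutil
from importlib import metadata
from pathlib import Path
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare the first three numeric components of two version strings."""

    def parse(version: str) -> tuple:
        nums = [int(n) for n in re.findall(r"\d+", version)[:3]]
        nums += [0] * (3 - len(nums))  # pad "0.2" to (0, 2, 0)
        return tuple(nums)

    return parse(installed) >= parse(minimum)


def clear_cached_checkpoints(cache_dir: str) -> bool:
    """Delete a checkpoint directory; returns True if something was removed.

    This removes the whole directory tree, so point it only at a folder
    that holds nothing but downloaded checkpoints.
    """
    path = Path(cache_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


# Example usage (the directory name is an assumption; use your own):
# for pkg, minimum in (("dreamsim", "0.2.0"), ("peft", "0.2.0")):
#     found = installed_version(pkg)
#     if found is None or not meets_minimum(found, minimum):
#         print(f"{pkg} needs upgrading (found {found}, need >= {minimum})")
# clear_cached_checkpoints("./models")
```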
Here's how the new models perform on NIGHTS:
Model | NIGHTS - Val | NIGHTS - Test
---|---|---
ensemble | 96.6% | 96.1%
dino_vitb16 | 95.7% | 94.8%
open_clip_vitb32 | 95.6% | 93.6%
clip_vitb32 | 95.5% | 95.3%
We're also releasing updates/additions to the NIGHTS dataset:
- If you're having trouble with large file sizes when downloading NIGHTS, you can now run `./dataset/download_chunked_dataset.sh` to get the dataset split into 200 smaller zips.
- We only use the 20k unanimous triplets for training and evaluation, but release all 100k triplets (many with few and/or split votes) for research purposes. Run `./dataset/download_unfiltered_dataset.sh` to download and unzip this unfiltered version of the NIGHTS dataset (289 GB).
- Download the just-noticeable difference (JND) votes by running `./dataset/download_jnd_dataset.sh`. We've also updated the DreamSim Colab with an example of loading a JND trial.
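To make the distinction between unanimous and split-vote triplets concrete, here is a small sketch of how one might filter the unfiltered release down to unanimous triplets. The `Triplet` record, its field names, and the `min_votes` threshold are all hypothetical; the actual file layout of the unfiltered dataset is not described in these notes and may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Triplet:
    """Hypothetical vote record for one triplet (field names are assumed)."""

    triplet_id: str
    votes_left: int
    votes_right: int


def is_unanimous(triplet: Triplet, min_votes: int = 1) -> bool:
    """True if every vote went to the same side and enough votes were cast."""
    total = triplet.votes_left + triplet.votes_right
    one_sided = triplet.votes_left == 0 or triplet.votes_right == 0
    return total >= min_votes and one_sided


def filter_unanimous(triplets: Iterable[Triplet], min_votes: int = 1) -> List[Triplet]:
    """Keep only triplets whose votes are unanimous."""
    return [t for t in triplets if is_unanimous(t, min_votes)]
```

Raising `min_votes` also drops triplets that are unanimous only because very few people rated them.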
v0.2.0-checkpoints
Checkpoints for v0.2.0 release.
v0.1.3
- Fixed a bug with caching ensemble model checkpoints.
v0.1.2
We're releasing three lighter-weight versions of DreamSim that each use only one ViT model (instead of the full ensemble). The backbone options are DINO-ViTB/16, CLIP-ViTB/32, and OpenCLIP-ViTB/32.
To load a single-backbone version of dreamsim, use the new `dreamsim_type` argument (defaults to `"ensemble"`). For example:
dreamsim_dino_model, preprocess = dreamsim(pretrained=True, dreamsim_type="dino_vitb16")
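If you switch between backbones in your own code, a thin wrapper that validates the name before loading can catch typos early. The sketch below is an illustration, not part of the package: `load_dreamsim` and `KNOWN_TYPES` are hypothetical names, and the identifiers for the CLIP and OpenCLIP variants are assumed to follow the same pattern as `dino_vitb16`.

```python
# Names accepted by this sketch. "dino_vitb16" appears in the example above;
# the other single-backbone identifiers are assumed to follow the same pattern.
SINGLE_BACKBONE_TYPES = ("dino_vitb16", "clip_vitb32", "open_clip_vitb32")
KNOWN_TYPES = ("ensemble",) + SINGLE_BACKBONE_TYPES


def load_dreamsim(dreamsim_type: str = "ensemble", device: str = "cpu"):
    """Validate the requested variant, then load it with the dreamsim package.

    Returns the (model, preprocess) pair produced by dreamsim().
    """
    if dreamsim_type not in KNOWN_TYPES:
        raise ValueError(
            f"unknown dreamsim_type {dreamsim_type!r}; expected one of {KNOWN_TYPES}"
        )
    # Imported lazily so that validation works even where dreamsim is not installed.
    from dreamsim import dreamsim

    return dreamsim(pretrained=True, device=device, dreamsim_type=dreamsim_type)


# Example usage (downloads checkpoints on first call):
# model, preprocess = load_dreamsim("dino_vitb16")
```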
Here's how the single-backbone finetuned models compare to the ensemble on NIGHTS:
- Ensemble: 96.2%
- OpenCLIP-ViTB/32: 95.5%
- DINO-ViTB/16: 94.6%
- CLIP-ViTB/32: 93.9%
For more details please refer to our paper.
Release
v0.1.0 Update LICENSE