clip-benchmark model init #6

escorciav · 2024-11-07T15:23:53Z

Hi guys!

Thanks for making your research accessible to the public & congrats on your CVPRW-2024 paper 🎉

Is this the boilerplate required to plug SynthCLIP into clip-bench, as mentioned in #5 or #2?

cp Training/models.py <clip-benchmark-dir>/clip_benchmark/models/synthclip.py

Append this function onto that module

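import open_clip
import torch
from torchvision import transforms


def load_synthclip(pretrained: str = "./checkpoints/synthclip-30m/checkpoint_best.pt",
                   device="cpu", **kwargs):
    # **kwargs absorbs any extra arguments the benchmark passes (unused here)
    # Load the checkpoint following the SynthCLIP example:
    # wrap in DataParallel, load checkpoint["state_dict"], then unwrap
    model = torch.nn.DataParallel(CLIP_VITB16())
    checkpoint = torch.load(pretrained, map_location="cpu")
    model.load_state_dict(checkpoint["state_dict"])
    model = model.module
    # Taken from
    # https://github.com/hammoudhasan/SynthCLIP/blob/02ef69764d8dc921650bcac4a98bd0f477790787/Training/main.py#L240
    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
    )
    transform = transforms.Compose(
        [
            transforms.Resize(224),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            # dunno why I need that but whatever XD. EOM - Victor
            lambda x: x.repeat(3, 1, 1) if x.shape[0] == 1 else x,  # force RGB
            normalize,
        ]
    )
    model = model.to(device)
    tokenizer = open_clip.get_tokenizer("ViT-B-16")
    return model, transform, tokenizer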

then register it as mentioned here
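
For context, here is a minimal sketch of what the registration step could look like. It assumes clip_benchmark dispatches on --model_type through a dict (recalled here as TYPE2FUNC in clip_benchmark/models/__init__.py, not verified against the current code), and the two stub loaders are hypothetical stand-ins so the snippet runs on its own.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for the real loaders, so this sketch is self-contained.
def load_open_clip_stub(model_name, pretrained, cache_dir=None, device="cpu"):
    return ("open_clip", model_name, pretrained, device)


def load_synthclip_stub(pretrained, device="cpu", **kwargs):
    # **kwargs swallows arguments this loader ignores (model_name, cache_dir)
    return ("synthclip", pretrained, device)


# Assumed shape of the registry: model_type string -> loader function.
TYPE2FUNC: Dict[str, Callable] = {
    "open_clip": load_open_clip_stub,
    "synthclip": load_synthclip_stub,  # the new entry
}
MODEL_TYPES = list(TYPE2FUNC.keys())


def load_clip(model_type, model_name, pretrained, cache_dir=None, device="cpu"):
    """Look up the loader for model_type and forward the common arguments."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"model_type={model_type} is invalid")
    load_func = TYPE2FUNC[model_type]
    return load_func(model_name=model_name, pretrained=pretrained,
                     cache_dir=cache_dir, device=device)


result = load_clip("synthclip", "ViT-B-16", "checkpoint_best.pt")
print(result)
```

In the real package the entry would point at load_synthclip imported from the new synthclip.py module rather than at a stub. The sketch also shows why the loader takes **kwargs: the dispatcher forwards arguments that this loader does not use.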

Thanks in advance!


escorciav · 2024-11-07T16:05:25Z

Worked for me (I believe). If needed, check out my fork of clip-bench 😉. You're welcome!

clip_benchmark eval --model "ViT-B-16" --model_type synthclip --pretrained $pretrained --dataset=$dataset --output=$output --dataset_root $dataset_root
# Debugging
# python -m ipdb clip-benchmark/clip_benchmark/cli.py eval --model_type synthclip --pretrained $pretrained --dataset=$dataset --output=$output --dataset_root $dataset_root --num_workers 0

escorciav · 2024-11-19T05:56:50Z

In case anyone is interested

escorciav · 2024-12-04T11:00:36Z

@hammoudhasan or @HaniItani, could you please check whether the following is correct?

import torch
from PIL import Image
from clip_benchmark.models.synthclip import CLIP_VITB16, load_synthclip

checkpoint_path = "./logs/synthclip-30m/checkpoint_best.pt"
# PyTorch has no "gpu" device string; the CUDA device is called "cuda"
device = "cuda" if torch.cuda.is_available() else "cpu"
use_clip_benchmark = True

if not use_clip_benchmark:
    print('Load synthclip as per example...')
    model = torch.nn.DataParallel(CLIP_VITB16())
    checkpoint = torch.load(checkpoint_path, map_location=device)
    load_status = model.load_state_dict(checkpoint["state_dict"])
    model = model.module.to(device)
    print(load_status)
    # this branch still needs the transform and tokenizer used below
    # (the model returned here is discarded)
    _, transform, tokenizer = load_synthclip(pretrained=checkpoint_path, device="cpu")
else:
    print('Load synthclip as per clip_benchmark...')
    # keyword names follow the load_synthclip(pretrained=..., device=...) signature
    model, transform, tokenizer = load_synthclip(
        pretrained=checkpoint_path,
        device=device
    )

print('Load & preprocess image...')
img_path = "./open_clip/docs/CLIP.png"
image = Image.open(img_path)
image = image.convert('RGB')
# inputs must live on the same device as the model
image = transform(image).unsqueeze(0).to(device)
print('Tokenize text...')
text = tokenizer(["a diagram", "a dog", "a cat"]).to(device)
print('Fwd-pass model...')
amp_kwargs = dict(device_type="cuda", dtype=torch.float16) if "cuda" in device else dict(device_type="cpu")

with torch.no_grad(), torch.amp.autocast(**amp_kwargs):
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    logit_scale = model.logit_scale.exp()

    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (logit_scale * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]

###############################################################################
# open_clip
###############################################################################

# model, _, preprocess = open_clip.create_model_and_transforms(model_arch, pretrained=model_path,
#                                                              load_weights_only=False)
# tokenizer = open_clip.get_tokenizer(model_arch)

# img_path = "./docs/CLIP.png"
# image = preprocess(Image.open(img_path)).unsqueeze(0)
# text = tokenizer(["a diagram", "a dog", "a cat"])

# with torch.no_grad(), torch.cuda.amp.autocast():
#     image_features = model.encode_image(image)
#     text_features = model.encode_text(text)
#     image_features /= image_features.norm(dim=-1, keepdim=True)
#     text_features /= text_features.norm(dim=-1, keepdim=True)

#     text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

# print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]

... # verbosity
Fwd-pass model...
Label probs: tensor([[0.3227, 0.2939, 0.3834]])
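
To help read numbers like these, here is a small sketch of the same formula (softmax over scaled cosine similarities) in plain Python. The vectors and the logit scale of 100 are made-up toy values, not SynthCLIP features; the point is only that nearly equal similarities give nearly uniform probabilities, while one clear match gives a peaked distribution. It does not diagnose the run above.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def label_probs(image_feat, text_feats, logit_scale):
    """Softmax over logit_scale * cosine(image, text) for each text."""
    img = l2_normalize(image_feat)
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, l2_normalize(txt)))
        for txt in text_feats
    ]
    # subtract the max logit for numerical stability
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


image = [1.0, 0.0]  # toy 2-D "image embedding"

# One text aligned with the image, the others far away: peaked output
peaked = label_probs(image, [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], logit_scale=100.0)

# All texts almost equally similar to the image: close to uniform
flat = label_probs(image, [[1.0, 0.05], [1.0, 0.0], [1.0, -0.05]], logit_scale=100.0)

print("peaked:", [round(p, 4) for p in peaked])
print("flat:  ", [round(p, 4) for p in flat])
```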

escorciav · 2024-12-04T11:04:28Z

BTW, using

output = model(image, text)
image_features, text_features = output["image_embed"], output["text_embed"]
logit_scale = output["logit_scale"]

The result is
Label probs: tensor([[0.2790, 0.3688, 0.3522]])

escorciav · 2024-12-06T11:40:54Z

Latest version polished by the grrreat @HaniItani is here 🙌

Label probs: tensor([[0.0048, 0.0878, 0.9075]], device='cuda:0') 🎉

escorciav mentioned this issue Nov 7, 2024

Support synthclip LAION-AI/CLIP_benchmark#129

Open

escorciav closed this as completed Dec 6, 2024

escorciav mentioned this issue Dec 6, 2024

Support open_clip #9

Open
