clip-benchmark model init #6
Worked for me (I believe). If needed, check my fork of clip-bench out 😉. You're welcome!
In case anyone is interested:

@hammoudhasan or @HaniItani, could you please review whether the following is correct?

import torch
from PIL import Image
from clip_benchmark.models.synthclip import CLIP_VITB16, load_synthclip
checkpoint_path = "./logs/synthclip-30m/checkpoint_best.pt"
device = 'cuda'  # 'gpu' is not a valid torch device string; use 'cuda' or 'cpu'
use_clip_benchmark = True
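# optionally, pick the device automatically (safer on CPU-only machines):
# device = 'cuda' if torch.cuda.is_available() else 'cpu'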
if not use_clip_benchmark:
    print('Load synthclip as per example...')
    model = torch.nn.DataParallel(CLIP_VITB16())
    checkpoint = torch.load(checkpoint_path, map_location=device)
    load_status = model.load_state_dict(checkpoint["state_dict"])
    model = model.module
    print(load_status)
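    # NB: this branch never defines `transform`/`tokenizer`, which the code
    # below relies on, so only the clip_benchmark path runs end to end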
else:
    print('Load synthclip as per clip_benchmark...')
    model, transform, tokenizer = load_synthclip(
        model_path="./logs/synthclip-30m/checkpoint_best.pt",
        map_location=device,
    )
print('Load & preprocess image...')
img_path = "./open_clip/docs/CLIP.png"
image = Image.open(img_path)
image = image.convert('RGB')
image = transform(image).unsqueeze(0)
print('Tokenize text...')
text = tokenizer(["a diagram", "a dog", "a cat"])
print('Fwd-pass model...')
amp_kwargs = dict(device_type="cuda", dtype=torch.float16) if "cuda" in device else dict(device_type="cpu")
with torch.no_grad(), torch.amp.autocast(**amp_kwargs):
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    logit_scale = model.logit_scale.exp()
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (logit_scale * image_features @ text_features.T).softmax(dim=-1)
print("Label probs:", text_probs) # prints: [[1., 0., 0.]]
###############################################################################
# open_clip
###############################################################################
# model, _, preprocess = open_clip.create_model_and_transforms(model_arch, pretrained=model_path,
# load_weights_only=False)
# tokenizer = open_clip.get_tokenizer(model_arch)
# img_path = "./docs/CLIP.png"
# image = preprocess(Image.open(img_path)).unsqueeze(0)
# text = tokenizer(["a diagram", "a dog", "a cat"])
# with torch.no_grad(), torch.cuda.amp.autocast():
# image_features = model.encode_image(image)
# text_features = model.encode_text(text)
# image_features /= image_features.norm(dim=-1, keepdim=True)
# text_features /= text_features.norm(dim=-1, keepdim=True)
# text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
# print("Label probs:", text_probs) # prints: [[1., 0., 0.]]
BTW, using

output = model(image, text)
image_features, text_features = output["image_embed"], output["text_embed"]
logit_scale = output["logit_scale"]

the result is:
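A minimal sketch of that comparison, assuming the three dictionary keys above are what the SynthCLIP forward pass returns and that "logit_scale" comes back already exponentiated (worth verifying against Training/models.py):

import torch.nn.functional as F

with torch.no_grad():
    output = model(image, text)
    image_embed = F.normalize(output["image_embed"], dim=-1)
    text_embed = F.normalize(output["text_embed"], dim=-1)
    text_probs = (output["logit_scale"] * image_embed @ text_embed.T).softmax(dim=-1)
print("Label probs:", text_probs)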
The latest version, polished by the grrreat @HaniItani, is here 🙌
Hi guys!
Thanks for making your research accessible to the public, and congrats on your CVPRW-2024 paper 🎉
Is this the boilerplate required to plug SynthCLIP into clip-benchmark, as mentioned in #5 or #2?
cp Training/models.py <clip-benchmark-dir>/clip_benchmark/models/synthclip.py
Append this function to that module,
then register it as mentioned here (rough sketch below).
Thanks in advance!
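For concreteness, a rough sketch of what that factory plus registration could look like. Everything below is an assumption pieced together from this thread: CLIP_VITB16 is already defined in the copied module, the preprocessing is the standard CLIP ViT-B/16 transform, the tokenizer borrows open_clip's as a stand-in for SynthCLIP's own, and the registry dict in clip_benchmark/models/__init__.py may be named differently.

# appended to clip_benchmark/models/synthclip.py -- hypothetical sketch
import torch
import open_clip
from torchvision import transforms

def load_synthclip(model_path, map_location="cpu", **kwargs):
    # CLIP_VITB16 is defined earlier in this module (copied Training/models.py)
    model = torch.nn.DataParallel(CLIP_VITB16())
    checkpoint = torch.load(model_path, map_location=map_location)
    model.load_state_dict(checkpoint["state_dict"])
    model = model.module.eval()
    # assumed: standard CLIP preprocessing for a 224px ViT-B/16
    transform = transforms.Compose([
        transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                             std=(0.26862954, 0.26130258, 0.27577711)),
    ])
    # stand-in tokenizer; prefer the one shipped in Training/ if it differs
    tokenizer = open_clip.get_tokenizer("ViT-B-16")
    return model, transform, tokenizer

# registration, e.g. in clip_benchmark/models/__init__.py (dict name assumed):
#   from .synthclip import load_synthclip
#   TYPE2FUNC["synthclip"] = load_synthclip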