feat(onnx): ViT zero-shot tasks #858
Benchmarks
- Open-CLIP
- EVA-CLIP [Submitted on 27 Mar 2023]
- DINOv2 [Submitted on 14 Apr 2023]

Datasets

- LAION-400M [Submitted on 3 Nov 2021]
- LAION-2B [Submitted on 16 Oct 2022]
- DataComp [Submitted on 27 Apr 2023 (v1), last revised 25 Jul 2023 (this version, v4)]

Demo

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load the CLIP checkpoint and build example inputs for tracing
# (checkpoint name inferred from the output filename; the prompt is arbitrary).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")
inputs = processor(
    text=["a photo of a cat"],
    images=Image.new("RGB", (224, 224)),
    return_tensors="pt",
    padding=True,
)

torch.onnx.export(
    model,  # model being run
    # model input in one of the accepted formats: a torch.Tensor (single input),
    # a tuple or list of tensors (multiple inputs), or a dict with string keys
    # and tensors as values
    dict(inputs),
    "clip-vit-base-patch16.onnx",  # where to save the model
    opset_version=14,  # the ONNX opset version to export the model to
    input_names=["input_ids", "pixel_values", "attention_mask"],  # the model's input names
    output_names=["logits_per_image", "logits_per_text", "text_embeds", "image_embeds"],  # the model's output names
    dynamic_axes={  # variable-length axes
        "input_ids": {0: "batch", 1: "sequence"},
        "pixel_values": {0: "batch", 1: "num_channels", 2: "height", 3: "width"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits_per_image": {0: "batch"},
        "logits_per_text": {0: "batch"},
        "text_embeds": {0: "batch"},
        "image_embeds": {0: "batch"},
    },
)
```
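For reference, the exported graph's `logits_per_image` output is a matrix of raw image-text similarity logits; turning a row of logits into class probabilities is a plain softmax. A minimal NumPy sketch (the logit values below are made up for illustration):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical logits_per_image for one image scored against three prompts.
logits_per_image = np.array([[24.5, 19.1, 18.3]])
probs = softmax(logits_per_image)
print(probs.argmax(axis=-1))  # -> [0], index of the best-matching prompt
```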
Intro
See the example code for details.
The CLIP multimodal model enables zero-shot image classification. I've tested it on multiple datasets, and accuracy exceeds 99.9% as long as an appropriate prompt is provided.
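Conceptually, zero-shot classification reduces to cosine similarity between the image embedding and each candidate prompt's text embedding. A minimal sketch using tiny synthetic vectors (the embeddings and prompt comments are placeholders, not real CLIP outputs):

```python
import numpy as np

def classify(image_embed: np.ndarray, text_embeds: np.ndarray):
    """Return the index of the best-matching prompt and all similarities."""
    # L2-normalize so the dot product equals cosine similarity.
    img = image_embed / np.linalg.norm(image_embed)
    txt = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    sims = txt @ img
    return int(np.argmax(sims)), sims

# Synthetic 4-dim embeddings standing in for CLIP outputs.
image_embed = np.array([1.0, 0.0, 0.2, 0.0])
text_embeds = np.array([
    [0.9, 0.1, 0.1, 0.0],   # e.g. "a photo of a cat" (hypothetical)
    [0.0, 1.0, 0.0, 0.3],   # e.g. "a photo of a dog" (hypothetical)
])
best, sims = classify(image_embed, text_embeds)
```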
We only need to write `positive_labels` and `negative_labels` based on the cue words of known challenges (`image_binary_challenge`). When a previously unseen prompt is encountered, the program automatically converts and adjusts it into a binary classification task.
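The binary decision itself can be sketched as comparing the best similarity within each label set. This is a hypothetical helper, not the repo's actual API, and the similarity values are invented:

```python
import numpy as np

def binary_answer(sims_positive: np.ndarray, sims_negative: np.ndarray) -> bool:
    """True when the image matches the positive label set better than the negative one."""
    return float(np.max(sims_positive)) > float(np.max(sims_negative))

# Hypothetical cosine similarities of one image against each prompt set.
positive = np.array([0.31, 0.27])  # e.g. prompts built from positive_labels
negative = np.array([0.18, 0.22])  # e.g. prompts built from negative_labels
answer = binary_answer(positive, negative)  # True -> the image matches the challenge
```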
We reimplemented the preprocessing module in NumPy, so PyTorch is not required at inference time.
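As a rough sketch of such a torch-free pipeline, the normalization step can be written with the published OpenAI CLIP mean/std constants. Resizing and cropping are assumed to have been done already, and this is not the repo's actual implementation:

```python
import numpy as np

# Published OpenAI CLIP normalization constants.
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def preprocess(image_hwc_uint8: np.ndarray) -> np.ndarray:
    """HWC uint8 image -> NCHW float32 tensor, CLIP-normalized.

    The caller is assumed to have already resized/cropped to 224x224.
    """
    x = image_hwc_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - CLIP_MEAN) / CLIP_STD                  # per-channel normalize
    x = x.transpose(2, 0, 1)                        # HWC -> CHW
    return x[np.newaxis, ...]                       # add batch dim -> NCHW

img = np.zeros((224, 224, 3), dtype=np.uint8)
pixel_values = preprocess(img)
print(pixel_values.shape)  # -> (1, 3, 224, 224)
```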
By default, we use the RN50.openai specification of the model for classification tasks. We encapsulate both the ONNX and VitTransformer Pipeline branches so that the program switches automatically: when torch and transformers are installed in your runtime environment and a CUDA GPU is available, the Pipeline branch is used; otherwise it defaults to ONNX running on the CPU.

hcaptcha-challenger/hcaptcha_challenger/onnx/modelhub.py
Lines 245 to 259 in 901afd1
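The switching logic described above can be sketched as a pure decision function. The function name, flag names, and return strings here are hypothetical, not the actual `modelhub.py` API:

```python
def select_backend(torch_available: bool,
                   transformers_available: bool,
                   cuda_available: bool) -> str:
    """Pick the inference branch: transformers Pipeline on GPU, else ONNX on CPU."""
    if torch_available and transformers_available and cuda_available:
        return "pipeline-cuda"
    return "onnx-cpu"
```

Keeping the check a pure function of the environment flags makes the fallback easy to test without installing torch.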
DEMO
hcaptcha-challenger/src/objects.yaml
Lines 553 to 574 in d38be1b