
Releases: unum-cloud/uform

v3.1 🍏 Apple Neural Engine Optimizations

20 Dec 12:31

Apple chips provide several functional units capable of high-throughput matrix multiplication and AI inference. Those compute units, exposed in CoreML as computeUnits, include the CPU, the GPU, and the Apple Neural Engine (ANE). One might naively hope that any typical architecture, like BERT or ViT, would work fine on all of those chips in any of the common quantization formats, such as switching from f32 single-precision to bf16 and f16 half-precision floats, or to i8 and u8 integers. That is not the case. Of all the backends UForm has been tested on, quantizing the entire model for CoreML was the most challenging task, and Apple remained the only platform where we distribute the models in the original precision. That is a pity, given a fleet of 2 billion potential target devices running iOS worldwide, almost all of them in countries and language groups natively supported by UForm multimodal multilingual embeddings.

When using @unum-cloud UForm models in Swift, we pass computeUnits: .all to let Apple's scheduler choose the target device itself, treating device placement as a black-box optimization. A better approach is to ship models explicitly tuned for the Apple Neural Engine. So, together with our friends from @TheStageAI, we've quantized our models to map cleanly onto ANE-supported operations with minimal loss in precision, reducing the model size by 2-4x and accelerating inference by up to 5x:

| Model | Text Encoder (GPU) | Text Encoder (ANE) | Image Encoder (GPU) | Image Encoder (ANE) |
|-------------------|---------|---------|----------|----------|
| english-small | 2.53 ms | 0.53 ms | 6.57 ms | 1.23 ms |
| english-base | 2.54 ms | 0.61 ms | 18.90 ms | 3.79 ms |
| english-large | 2.30 ms | 0.61 ms | 79.68 ms | 20.94 ms |
| multilingual-base | 2.34 ms | 0.50 ms | 18.98 ms | 3.77 ms |

Measured on an Apple M4 iPad running iOS 18.2, with a batch size of 1 and the model pre-loaded into memory. The original encoders use f32 single-precision numbers for maximum compatibility and rely mostly on the GPU for computation. The quantized encoders use a mixture of i8, f16, and f32 numbers for maximum performance and rely mostly on the Apple Neural Engine (ANE). Median latency is reported.


To use them in Swift, check out the docs at unum-cloud.github.io/uform/swift/ or the SwiftSemanticSearch repository for an integrated example with USearch.
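
For context on the computeUnits knob mentioned above, here is a minimal CoreML sketch, not UForm's own API, of how a compiled encoder could be pinned to the Neural Engine; the model path below is a hypothetical placeholder:

import CoreML
import Foundation

// `.all` leaves device selection to Apple's scheduler, as UForm does by default;
// `.cpuAndNeuralEngine` prefers the ANE and falls back to the CPU for unsupported layers.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

// Hypothetical path to a compiled Core ML encoder; substitute your own .mlmodelc bundle.
let modelURL = URL(fileURLWithPath: "/path/to/TextEncoder.mlmodelc")
let model = try MLModel(contentsOf: modelURL, configuration: config)

This only illustrates what the computeUnits setting controls; the UForm Swift encoders shown in the release below load their models internally.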

Thanks to @ArnoldMSU, @b1n0, @Aydarkhan, @AndreyAgeev from TheStage.ai for help πŸ‘

Release v3.0.3

01 Oct 18:33

Release: v3.0.3 [skip ci]

v3.0.2

25 Apr 03:40

3.0.2 (2024-04-25)

Make

v3.0.1

25 Apr 03:20

3.0.1 (2024-04-25)

Make

UForm v3 for 3 platforms πŸ•ΈοΈπŸπŸ

25 Apr 03:13

Multimodal Embeddings for JavaScript, Swift, and Python

How many AI models can run on-device out of the box? UForm multimodal embeddings can πŸ₯³

| Model | Parameters | Languages | Architecture |
|------------------------------------|------|----|----------------------------------------------|
| uform3-image-text-english-large 🆕 | 365M | 1 | 6 text layers, ViT-L/14, 6 multimodal layers |
| uform3-image-text-english-base | 143M | 1 | 2 text layers, ViT-B/16, 2 multimodal layers |
| uform3-image-text-english-small 🆕 | 79M | 1 | 2 text layers, ViT-S/16, 2 multimodal layers |
| uform3-image-text-multilingual-base | 206M | 21 | 8 text layers, ViT-B/16, 4 multimodal layers |

JavaScript

Load the models and preprocessors for different modalities:

import { getModel, Modality, TextProcessor, TextEncoder, ImageEncoder, ImageProcessor } from '@unum-cloud/uform';

const { configPath, modalityPaths, tokenizerPath } = await getModel({
    modelId: 'unum-cloud/uform3-image-text-english-small',
    modalities: [Modality.TextEncoder, Modality.ImageEncoder],
});

Embed images:

const imageProcessor = new ImageProcessor(configPath);
await imageProcessor.init();
const processedImages = await imageProcessor.process("path/to/image.png");

const imageEncoder = new ImageEncoder(modalityPaths.image_encoder, imageProcessor);
await imageEncoder.init();
const imageOutput = await imageEncoder.encode(processedImages);
assert(imageOutput.embeddings.dims.length === 2, "Output should be 2D");

Embed queries:

const textProcessor = new TextProcessor(configPath, tokenizerPath);
await textProcessor.init();
const processedTexts = await textProcessor.process("a small red panda in a zoo");

const textEncoder = new TextEncoder(modalityPaths.text_encoder, textProcessor);
await textEncoder.init();
const textOutput = await textEncoder.encode(processedTexts);
assert(textOutput.embeddings.dims.length === 2, "Output should be 2D");
await textEncoder.dispose();

Swift

Embed images:

import CoreGraphics
import Foundation
import ImageIO
import UForm

// A simple error type defined just for this snippet.
struct ImageLoadingError: Error { let message: String }

let imageModel = try await ImageEncoder(modelName: "unum-cloud/uform3-image-text-english-small")
let imageURL = "https://github.com/ashvardanian/ashvardanian/blob/master/demos/bbq-on-beach.jpg?raw=true"
guard let url = URL(string: imageURL),
      let imageSource = CGImageSourceCreateWithURL(url as CFURL, nil),
      let cgImage = CGImageSourceCreateImageAtIndex(imageSource, 0, nil)
else {
    throw ImageLoadingError(message: "Could not load image from URL: \(imageURL)")
}

let imageEmbedding: Embedding = try imageModel.encode(cgImage)
let imageVector: [Float32] = imageEmbedding.asFloats()

Embed queries:

let textModel = try await TextEncoder(modelName: "unum-cloud/uform3-image-text-english-small")
let text = "A group of friends enjoy a barbecue on a sandy beach, with one person grilling over a large black grill, while the other sits nearby, laughing and enjoying the camaraderie."
let textEmbedding: Embedding = try textModel.encode(text)
let textVector: [Float32] = textEmbedding.asFloats()
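
Since both encoders project into the same space, the two vectors can be compared directly. Below is a minimal sketch of cosine similarity over the imageVector and textVector arrays from the snippets above; the helper function is not part of the UForm API:

// Cosine similarity between the image and text embeddings produced above.
func cosineSimilarity(_ a: [Float32], _ b: [Float32]) -> Float32 {
    precondition(a.count == b.count, "Embeddings must have the same dimensionality")
    var dot: Float32 = 0, normA: Float32 = 0, normB: Float32 = 0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

let similarity = cosineSimilarity(imageVector, textVector)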

Python

Load model:

from uform import get_model, Modality

model_name = 'unum-cloud/uform3-image-text-english-small'
modalities = [Modality.TEXT_ENCODER, Modality.IMAGE_ENCODER]
processors, models = get_model(model_name, modalities=modalities)

Embed images:

import requests
from io import BytesIO
from PIL import Image

image_url = 'https://media-cdn.tripadvisor.com/media/photo-s/1b/28/6b/53/lovely-armenia.jpg'
image = Image.open(BytesIO(requests.get(image_url).content))

processor_image = processors[Modality.IMAGE_ENCODER]
model_image = models[Modality.IMAGE_ENCODER]
image_data = processor_image(image)
image_features, image_embedding = model_image.encode(image_data, return_features=True)

Embed queries:

text = 'a cityscape bathed in the warm glow of the sun, with varied architecture and a towering, snow-capped mountain rising majestically in the background'

model_text = models[Modality.TEXT_ENCODER]
processor_text = processors[Modality.TEXT_ENCODER]

text_data = processor_text(text)
text_features, text_embedding = model_text.encode(text_data, return_features=True)

Thanks to @xenova and @sroussey for help with JavaScript!
Thanks to @vmanot and @pcuenca for their work on Swift!

v2.1.1

16 Apr 03:55

2.1.1 (2024-04-16)

Fix

v2.1.0

14 Apr 00:50

2.1.0 (2024-04-14)

Add

Fix

  • Image preprocessing in Swift (f2772d0)

Improve

  • Fetching nested configs (729b9d9)

Make

v2.0.2

28 Mar 20:43

2.0.2 (2024-03-28)

Make

  • Fix PyPi CI version with hash (364afe6)

v2.0.1

28 Mar 20:38

2.0.1 (2024-03-28)

Make

Multimodal Matryoshka, Multimodal DPO, and ONNX πŸŽ‰

28 Mar 20:35

DPO Preview

Today we are releasing a new batch of multimodal models trained with Nebius and already available on HuggingFace πŸ€—

  1. Matryoshka-style multimodal embeddings with 64, 256, and 768 dimensions 🖼️ (see the sketch after this list)
  2. Improved 1.2B-parameter multimodal chat, tuned with Direct Preference Optimization 💬
  3. ONNX backend, making the PyTorch dependency optional for lightning-fast deployments ⚡
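
As a rough illustration of the Matryoshka property from the first item above, a shorter embedding is obtained by keeping the leading dimensions and re-normalizing. Here is a minimal Swift sketch under that assumption, with fullEmbedding standing in for a hypothetical 768-dimensional encoder output:

// Truncate a Matryoshka-style embedding to its leading dimensions and re-normalize,
// so cosine similarity still behaves sensibly in the smaller space.
func truncate(_ embedding: [Float32], to dimensions: Int) -> [Float32] {
    let prefix = Array(embedding.prefix(dimensions))
    let norm = prefix.map { $0 * $0 }.reduce(0, +).squareRoot()
    return norm > 0 ? prefix.map { $0 / norm } : prefix
}

// Hypothetical 768-dimensional embedding from one of the encoders above.
let fullEmbedding: [Float32] = Array(repeating: 0.05, count: 768)
let compact = truncate(fullEmbedding, to: 64)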