Skip to content

2.17.0

Compare
Choose a tag to compare
@xenova xenova released this 11 Apr 00:08
· 573 commits to main since this release

What's new?

💬 Improved text-generation pipeline for conversational models

This version adds support for passing an array of chat messages (with "role" and "content" properties) to the text-generation pipeline (PR). Check out the list of supported models here.

Example: Chat with Xenova/Qwen1.5-0.5B-Chat.

import { pipeline } from '@xenova/transformers';

// Create text-generation pipeline
const generator = await pipeline('text-generation', 'Xenova/Qwen1.5-0.5B-Chat');

// Define the list of messages
const messages = [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Tell me a funny joke.' }
]

// Generate text
const output = await generator(messages, {
    max_new_tokens: 128,
    do_sample: false,
})
console.log(output[0].generated_text);
// [
//   { role: 'system', content: 'You are a helpful assistant.' },
//   { role: 'user', content: 'Tell me a funny joke.' },
//   { role: 'assistant', content: "Sure, here's one:\n\nWhy was the math book sad?\n\nBecause it had too many problems.\n\nI hope you found that joke amusing! Do you have any other questions or topics you'd like to discuss?" },
// ]

We also added the return_full_text parameter, which means if you set return_full_text=false, only the newly-generated tokens will be returned (only applicable if passing the raw text prompt to the pipeline).

🔢 Binary embedding quantization support

Transformers.js v2.17 adds two new parameters to the feature-extraction pipeline ("quantize" and "precision"), enabling you to generate binary embeddings. These can be used with certain embedding models to shrink the size of the document embeddings for retrieval. This results in reductions in index size/memory usage (for storage) and improvements in retrieval speed. Surprisingly, you can still achieve up to ~95% of the original performance, but at 32x storage savings and up to 32x retrieval speeds! 🤯 Thanks to @jonathanpv for this addition in #691!

import { pipeline } from '@xenova/transformers';

// Create feature-extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Compute binary embeddings
const output = await extractor('This is a simple test.', { pooling: 'mean', quantize: true, precision: 'binary' });
// Tensor {
//   type: 'int8',
//   data: Int8Array [49, 108, 24, ...],
//   dims: [1, 48]
// }

As you can see, this produces a 32x smaller output tensor (a 4x reduction in data type with Float32Array → Int8Array, as well as an 8x reduction in dimensionality from 384 → 48). For more information, check out this PR in sentence-transformers, which inspired this update!

🛠️ Misc. improvements

🤗 New contributors

Full Changelog: 2.16.1...2.17.0