Release v0.9.0 · argmaxinc/WhisperKit

Highlights

Package Updates

With #216 the default for checking whether a model is supported on the device uses the model repo config.json as a source of truth. The need for this came about with the release of the new large-v3 turbo model, which is listed in the model repo as openai_whisper-large-v3-v20240930, which was recommended for devices that would crash if attempting to load. This situation can now be mitigated by updating this config.json without the need for a new release and can be called directly with the new static method recommendedRemoteModels:

    let recommendedModel =  await WhisperKit.recommendedRemoteModels().default
	let pipe  = WhisperKit(model: recommendModel)

The existing interface for WhisperKit.recommendedModels() remains the same, but now returns a ModelSupport object with a list of supported models for the current device.

public struct ModelSupport: Codable, Equatable {
    public let `default`: String
    public let supported: [String]
    public var disabled: [String] = []
}

Also, in an ongoing effort to improve modularity, extensibility, and code structure, there is a new way to initialize WhisperKit: using the new WhisperKitConfig class. The parameters are exactly the same and the previous init method is still in place, but this can assist in defining WhisperKit settings and protocol objects ahead of time and initialize WhisperKit more cleanly:

let pipe = try? await WhisperKit(model: "your-custom-model", modelRepo: "username/your-model-repo")

New:

let config = WhisperKitConfig(model: "your-custom-model", modelRepo: "username/your-model-repo") // Initialize config
config.model = "your-custom-model" // Alternatively set parameters directly
let pipe = try? await WhisperKit(config) // Pass into WhisperKit initializer

WhisperAX example app and CLI

Thanks to some memory and audio processing optimizations in #195, #216, and #217, (shout out to @keleftheriou for finding a big improvement there) we've updated the example implementations to use VAD by default with a concurrentWorkerCount of 4. This will significantly improve default inference speed on long files for devices that support async prediction, as well as real time streaming for devices/model combinations that are greater than 1 real-time factor.

⚠️ Deprecations and changed interfaces

The extension on Process.processor is now ProcessInfo.processor and includes a new property ProcessInfo.hwModel which will return a similar string as uname(&utsname) for non-macs.
public func modelSupport(for deviceName: String) -> (default: String, disabled: [String]) is now a disfavored overload in preference of public func modelSupport(for deviceName: String, from config: ModelSupportConfig? = nil) -> ModelSupport

What's Changed

Make additional initializers, functions, members public for extensibility by @bpkeene in #192
Fix start time logic for file loading by @ZachNagengast in #195
Change static var stored properties to static let by @fumoboy007 in #190
Add VoiceActivityDetector base class by @a2they in #199
Set default concurrentWorkerCount by @atiorh in #205
Improving modularity and code structure by @a2they in #212
Add model support config fetching from model repo by @ZachNagengast in #216
Example app VAD default + memory reduction by @ZachNagengast in #217

New Contributors

@bpkeene made their first contribution in #192
@fumoboy007 made their first contribution in #190
@a2they made their first contribution in #199
@atiorh made their first contribution in #205
@1amageek made their first contribution in #216
@keleftheriou made their first contribution in #217

Full Changelog: v0.8.0...v0.9.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.9.0