v0.9.0
Highlights
Package Updates
With #216 the default for checking whether a model is supported on the device uses the model repo config.json as a source of truth. The need for this came about with the release of the new large-v3 turbo model, which is listed in the model repo as openai_whisper-large-v3-v20240930, which was recommended for devices that would crash if attempting to load. This situation can now be mitigated by updating this config.json without the need for a new release and can be called directly with the new static method recommendedRemoteModels
:
let recommendedModel = await WhisperKit.recommendedRemoteModels().default
let pipe = WhisperKit(model: recommendModel)
The existing interface for WhisperKit.recommendedModels()
remains the same, but now returns a ModelSupport
object with a list of supported models for the current device.
public struct ModelSupport: Codable, Equatable {
public let `default`: String
public let supported: [String]
public var disabled: [String] = []
}
Also, in an ongoing effort to improve modularity, extensibility, and code structure, there is a new way to initialize WhisperKit: using the new WhisperKitConfig
class. The parameters are exactly the same and the previous init method is still in place, but this can assist in defining WhisperKit settings and protocol objects ahead of time and initialize WhisperKit more cleanly:
Previous:
let pipe = try? await WhisperKit(model: "your-custom-model", modelRepo: "username/your-model-repo")
New:
let config = WhisperKitConfig(model: "your-custom-model", modelRepo: "username/your-model-repo") // Initialize config
config.model = "your-custom-model" // Alternatively set parameters directly
let pipe = try? await WhisperKit(config) // Pass into WhisperKit initializer
WhisperAX example app and CLI
Thanks to some memory and audio processing optimizations in #195, #216, and #217, (shout out to @keleftheriou for finding a big improvement there) we've updated the example implementations to use VAD by default with a concurrentWorkerCount
of 4. This will significantly improve default inference speed on long files for devices that support async prediction, as well as real time streaming for devices/model combinations that are greater than 1 real-time factor.
⚠️ Deprecations and changed interfaces
- The extension on
Process.processor
is nowProcessInfo.processor
and includes a new propertyProcessInfo.hwModel
which will return a similar string asuname(&utsname)
for non-macs. public func modelSupport(for deviceName: String) -> (default: String, disabled: [String])
is now a disfavored overload in preference ofpublic func modelSupport(for deviceName: String, from config: ModelSupportConfig? = nil) -> ModelSupport
What's Changed
- Make additional initializers, functions, members public for extensibility by @bpkeene in #192
- Fix start time logic for file loading by @ZachNagengast in #195
- Change
static var
stored properties tostatic let
by @fumoboy007 in #190 - Add VoiceActivityDetector base class by @a2they in #199
- Set default concurrentWorkerCount by @atiorh in #205
- Improving modularity and code structure by @a2they in #212
- Add model support config fetching from model repo by @ZachNagengast in #216
- Example app VAD default + memory reduction by @ZachNagengast in #217
New Contributors
- @bpkeene made their first contribution in #192
- @fumoboy007 made their first contribution in #190
- @a2they made their first contribution in #199
- @atiorh made their first contribution in #205
- @1amageek made their first contribution in #216
- @keleftheriou made their first contribution in #217
Full Changelog: v0.8.0...v0.9.0