The Watson Speech iOS SDK has been deprecated in favor of the new Watson Developer Cloud iOS SDK, which currently supports most of the Watson services.
An SDK for iOS mobile applications enabling use of the Bluemix Watson Speech To Text and Text To Speech APIs from Watson Developer Cloud.
The SDK includes support for recording and streaming audio and receiving a transcript of the audio in response.
Using the dynamic framework
- Download the WatsonSDK.framework.zip and unzip it somewhere convenient.
- Once unzipped, drag the WatsonSDK.framework folder into your Xcode project view under the Frameworks folder. (As with any other third-party framework, make sure WatsonSDK.framework is listed in the "Copy Files" build phase of your target.)
- Download the extra dependencies: libogg.a and libopus.a.
- Drag libogg.a and libopus.a into your Xcode project view under the Frameworks folder.
Some additional iOS standard frameworks must be added.
- Select your project in the Xcode file explorer and open the "Build Phases" tab. Expand the "Link Binary With Libraries" section and click the + icon.
- Add the following frameworks:
- AudioToolbox.framework
- AVFoundation.framework
- CFNetwork.framework
- CoreAudio.framework
- Foundation.framework
- libicucore.tbd (or libicucore.dylib on older versions)
- QuartzCore.framework
- Security.framework
in Objective-C
#import <WatsonSDK/WatsonSDK.h>
in Swift
Add the Objective-C header above to a bridging header file (see SwiftSpeechHeader.h in the Swift sample).
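For example, a minimal bridging header (assuming the framework has been linked as described above; the file name is whatever your project's bridging header is, such as the sample's SwiftSpeechHeader.h) would contain just the same import:
#import <WatsonSDK/WatsonSDK.h>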
This repository contains a sample application demonstrating the SDK functionality.
To run the application, clone this repository and then navigate in Finder to the folder containing the SDK files.
Double click on the watsonsdk.xcodeproj to launch Xcode.
To run the sample application, change the compile target to 'watsonsdktest-objective-c' or 'watsonsdktest-swift' and run on the iPhone simulator.
Note that this is sample code and no security review has been performed on the code.
The Swift sample was tested in Xcode 7.2.
By default the Configuration will use the IBM Bluemix service API endpoint; custom endpoints can be set using setApiURL, although in most cases this is not required.
in Objective-C
STTConfiguration *conf = [[STTConfiguration alloc] init];
in Swift
let conf:STTConfiguration = STTConfiguration()
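For example, a custom endpoint could be configured as follows (a minimal sketch: the URL below is a placeholder, and passing the endpoint to setApiURL as a string is an assumption).
in Objective-C
STTConfiguration *conf = [[STTConfiguration alloc] init];
// hypothetical custom gateway; by default the Bluemix endpoint is used and this call is not needed
[conf setApiURL:@"https://my-gateway.example.com/speech-to-text/api"];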
There are currently two authentication options.
Basic Authentication, using the credentials provided by the Bluemix Service instance.
in Objective-C
[conf setBasicAuthUsername:@"<userid>"];
[conf setBasicAuthPassword:@"<password>"];
in Swift
conf.basicAuthUsername = "<userid>"
conf.basicAuthPassword = "<password>"
Token authentication, where a token authentication provider is running, for example at https://my-token-factory/token.
in Objective-C
[conf setTokenGenerator:^(void (^tokenHandler)(NSString *token)){
    NSURL *url = [[NSURL alloc] initWithString:@"https://my-token-factory/token"];
    NSMutableURLRequest *request = [[NSMutableURLRequest alloc] init];
    [request setHTTPMethod:@"GET"];
    [request setURL:url];

    NSError *error = nil;
    NSHTTPURLResponse *responseCode = nil;
    NSData *oResponseData = [NSURLConnection sendSynchronousRequest:request returningResponse:&responseCode error:&error];
    if ([responseCode statusCode] != 200) {
        NSLog(@"Error getting %@, HTTP status code %li", url, (long)[responseCode statusCode]);
        return;
    }
    tokenHandler([[NSString alloc] initWithData:oResponseData encoding:NSUTF8StringEncoding]);
}];
A SpeechToText instance is created using the Configuration.
in Objective-C
@property SpeechToText *stt;
...
self.stt = [SpeechToText initWithConfig:conf];
in Swift
var stt: SpeechToText?
...
self.stt = SpeechToText.init(config: conf)
A list of speech recognition models supported by the service can be obtained using the listModels function.
in Objective-C
[stt listModels:^(NSDictionary* jsonDict, NSError* err){
    if(err == nil)
        ... read values from NSDictionary ...
}];
in Swift
stt!.listModels({
    (jsonDict, err) in
    if err == nil {
        print(jsonDict)
    }
})
Details of a particular speech recognition model can be obtained using the listModel function with the model name.
[stt listModel:^(NSDictionary* jsonDict, NSError* err){
    if(err == nil)
        ... read values from NSDictionary ...
} withName:@"WatsonModel"];
The speech recognition model can be changed in the configuration.
[conf setModelName:@"ja-JP_BroadbandModel"];
By default, audio sent to the server is uncompressed PCM encoded data; compressed audio using the Opus codec can be enabled.
[conf setAudioCodec:WATSONSDK_AUDIO_CODEC_TYPE_OPUS];
Audio transcription is started with the recognize function; results are passed to the callback block as they are received.

[stt recognize:^(NSDictionary* res, NSError* err){
    if(err == nil)
        result.text = [stt getTranscript:res];
    else
        result.text = [err localizedDescription];
}];
The app must indicate to the SDK when transcription should be ended.
NSError* error = [stt endRecognize];
if(error != nil)
    NSLog(@"error is %@", error.localizedDescription);
The Speech to Text service's end-of-sentence detection can be used to detect that the user has stopped speaking. This is indicated in the transcription result and can be used to automatically end the recognize operation, as shown in the following code.
in Objective-C
// start recognize
[stt recognize:^(NSDictionary* res, NSError* err){
    if(err == nil) {
        if([self.stt isFinalTranscript:res]) {
            NSLog(@"this is the final transcript");
            [stt endRecognize];
        }
        result.text = [stt getTranscript:res];
    } else {
        result.text = [err localizedDescription];
    }
}];
in Swift
self.stt.recognize({ (res: [NSObject: AnyObject]!, err: NSError!) -> Void in
    if err == nil {
        if self.stt.isFinalTranscript(res) {
            NSLog("this is the final transcript")
            self.stt.endRecognize()
        }
        result.text = self.stt.getTranscript(res)
    } else {
        result.text = err.localizedDescription
    }
})
A confidence score is available for any final transcript (whole sentences). It can be obtained by passing the result Dictionary to getConfidenceScore.
[stt getConfidenceScore:res]
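For example, combined with the end-of-sentence detection above, the score can be logged for each final transcript (a minimal sketch; treating the returned score as an object that can be logged with %@ is an assumption about its type).
[stt recognize:^(NSDictionary* res, NSError* err){
    if(err == nil && [stt isFinalTranscript:res]) {
        // log the final transcript together with its confidence score (assumed here to be an NSNumber)
        NSLog(@"transcript: %@ (confidence %@)", [stt getTranscript:res], [stt getConfidenceScore:res]);
        [stt endRecognize];
    }
}];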
The audio input power level can be received during the recognize operation, for example to drive a simple level indicator.

[stt getPowerLevel:^(float power){
    // use the power level to drive a simple UIView graphic indicator
    CGRect frm = self.soundbar.frame;
    frm.size.width = 3 * (70 + power);
    self.soundbar.frame = frm;
    self.soundbar.center = CGPointMake(self.view.frame.size.width / 2, self.soundbar.center.y);
}];
By default the Configuration will use the IBM Bluemix service API endpoint; custom endpoints can be set using setApiURL, although in most cases this is not required.
TTSConfiguration *conf = [[TTSConfiguration alloc] init];
[conf setBasicAuthUsername:@"<userid>"];
[conf setBasicAuthPassword:@"<password>"];
You can change the voice model used for TTS by setting it in the configuration.
in Objective-C
[conf setVoiceName:@"en-US_MichaelVoice"];
in Swift
conf.voiceName = "en-US_MichaelVoice"
If you use tokens (from your own server) to get access to the service, provide a token generator to the Configuration. The userid and password will not be used if a token generator is provided.
in Objective-C
[conf setTokenGenerator:^(void (^tokenHandler)(NSString *token)){
    // get a token from your server in a secure way
    NSString *token = ...
    // provide the token to the tokenHandler
    tokenHandler(token);
}];
A TextToSpeech instance is created using the Configuration.

self.tts = [TextToSpeech initWithConfig:conf];
A list of voices supported by the service can be obtained using the listVoices function.
in Objective-C
[tts listVoices:^(NSDictionary* jsonDict, NSError* err){
    if(err == nil)
        ... read values from NSDictionary ...
}];
in Swift
tts!.listVoices({
    (jsonDict, err) in
    if err == nil {
        print(jsonDict)
    }
})
Audio is generated from text using the synthesize function and can then be played back using playAudio.
in Objective-C
[self.tts synthesize:^(NSData *data, NSError *reqErr) {
    // request error
    if(reqErr){
        NSLog(@"Error requesting data: %@", [reqErr description]);
        return;
    }
    // play audio and log when playing has finished
    [self.tts playAudio:^(NSError *err) {
        if(err)
            NSLog(@"error playing audio %@", [err localizedDescription]);
        else
            NSLog(@"audio finished playing");
    } withData:data];
} theText:@"Hello World"];
in Swift
tts!.synthesize({ (data: NSData!, reqError: NSError!) -> Void in
    if reqError == nil {
        tts!.playAudio({ (error: NSError!) -> Void in
            if error == nil {
                ... do something after the audio has played ...
            } else {
                ... audio playback error handling ...
            }
        }, withData: data)
    } else {
        ... request error handling ...
    }
}, theText: "Hello World")
If you get an error such as...
Undefined symbols for architecture x86_64
Check that all the required frameworks have been added to your project.
Find more open source projects on the IBM Github Page.
Copyright 2016 IBM Corporation under the Apache 2.0 license.