SaluteSpeechTools is a project that includes a set of .NET libraries and applications for speech synthesis and recognition. The libraries, collected in the SaluteSpeech-DotNet-Client subproject, provide access to the SaluteSpeech API methods for speech synthesis and recognition.
The SaluteSpeechClient.Auth library is responsible for authentication, while the SaluteSpeechClient.TextToSpeechService library provides access to the API methods for speech synthesis. The current version supports only streaming speech synthesis; synchronous and asynchronous synthesis are planned for the future, as is an additional library for the speech recognition API methods.
The SaluteSpeechClient.Auth and SaluteSpeechClient.TextToSpeechService libraries can be installed via NuGet for use in your project. SaluteSpeechClient.TextToSpeechService already includes the SaluteSpeechClient.Auth library.
dotnet add package SaluteSpeechClient.Auth --version 1.0.0
dotnet add package SaluteSpeechClient.TextToSpeechService --version 1.0.0
To authenticate, use SaluteSpeechClient.Auth. Before using it, you need to obtain a secret key for API access. To obtain authorization data:
- Create a SaluteSpeech project.
- Submit the project for moderation.
After moderation is complete, you will receive access to authorization data:
A field will appear on the page where you can copy the Client Id, along with a button to generate the Client Secret.
The Client Secret is displayed only once, so it needs to be saved before use.
Create a new object of type TokenProvider in the SaluteSpeechClient.Auth namespace and pass the Client Secret to its constructor.
You can also pass a typed logger ILogger<TokenProvider> to the constructor if necessary.
using SaluteSpeechClient.Auth;
ITokenProvider tokenProvider = new TokenProvider("Client Secret");
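Because the Client Secret is shown only once, it may be more convenient to keep it outside the source code. The following is a minimal sketch, not part of the library: the LoadClientSecret helper and the environment variable name in the usage comment are hypothetical, and only standard .NET calls are used.

```csharp
using System;

// Reads the Client Secret from an environment variable and fails early,
// with a readable message, if the variable is missing or empty.
static string LoadClientSecret(string variableName)
{
    var secret = Environment.GetEnvironmentVariable(variableName);
    if (string.IsNullOrWhiteSpace(secret))
    {
        throw new InvalidOperationException(
            $"Environment variable '{variableName}' is not set.");
    }
    return secret.Trim();
}

// Usage (the variable name is an illustrative choice, not a library convention):
// ITokenProvider tokenProvider =
//     new TokenProvider(LoadClientSecret("SALUTESPEECH_CLIENT_SECRET"));
```

Failing at startup keeps a missing secret from surfacing later as an opaque authentication error.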
To get the current token, use the GetTokenAsync method.
var token = await tokenProvider.GetTokenAsync();
To perform speech synthesis, use SaluteSpeechClient.TextToSpeechService.
For streaming speech synthesis, use StreamSpeechSynthesizer.
When initializing the object, pass an ITokenProvider to the constructor and, if necessary, a typed logger ILogger<StreamSpeechSynthesizer>:
using SaluteSpeechClient.TextToSpeechService.SpeechSynthesizer;
ISpeechSynthesizer streamSynthesizer = new StreamSpeechSynthesizer(tokenProvider);
To perform speech synthesis, use the SynthesizeAsync method.
Stream result = await streamSynthesizer.SynthesizeAsync(request, cancellationToken);
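SynthesizeAsync returns a Stream, so the audio still has to be consumed somewhere. One possible way to write it to disk is sketched below. SaveAudioAsync is a hypothetical helper, not a library method, and it relies only on standard .NET stream APIs.

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Copies an audio stream to a file and returns the number of bytes written.
// The caller stays responsible for disposing the source stream.
static async Task<long> SaveAudioAsync(
    Stream audio, string path, CancellationToken cancellationToken = default)
{
    await using FileStream file = File.Create(path);
    await audio.CopyToAsync(file, cancellationToken);
    await file.FlushAsync(cancellationToken);
    return file.Length;
}

// Usage with the synthesizer result (the file name is illustrative):
// await using Stream result = await streamSynthesizer.SynthesizeAsync(request, cancellationToken);
// long bytes = await SaveAudioAsync(result, "speech.wav", cancellationToken);
```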
To create a request, use the StreamSynthesisRequest class, whose constructor takes the request settings (ISynthesisRequestSettings) and the text to synthesize:
using SaluteSpeechClient.TextToSpeechService.SpeechSynthesizer;
ISynthesisRequest request = new SynthesisRequest(requestSettings, textToSynthesize);
To set the request settings for streaming speech synthesis, use the StreamSynthesisRequestSettings class.
The parameterless constructor applies the default request settings (AudioEncoding = WAV, ContentType = Text, Voice = Nec_24000):
using SaluteSpeechClient.TextToSpeechService.SpeechSynthesizer;
ISynthesisRequestSettings requestSettings = new StreamSynthesisRequestSettings();
The constructors StreamSynthesisRequestSettings(AudioEncoding audioEncoding, ContentType contentType, Voice24 voice) and StreamSynthesisRequestSettings(AudioEncoding audioEncoding, ContentType contentType, Voice8 voice) set specific request settings.
The first uses voice models with a sampling rate of 24 kHz, the second uses 8 kHz models.
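Putting the pieces together, a request with explicit settings might look like the sketch below. It rests on assumptions: that AudioEncoding, ContentType and Voice24 resolve from the namespaces shown, and that WAV, Text and Nec_24000 are valid member names (they are taken from the defaults listed above). The text and the 30-second timeout are illustrative only.

```csharp
using System;
using System.IO;
using System.Threading;
using SaluteSpeechClient.Auth;
using SaluteSpeechClient.TextToSpeechService.SpeechSynthesizer;

// Assumption: the enum types and member names below exist as written;
// they mirror the default values listed in this README.
ITokenProvider tokenProvider = new TokenProvider("Client Secret");
ISpeechSynthesizer synthesizer = new StreamSpeechSynthesizer(tokenProvider);

// Same values as the defaults, but passed explicitly through the Voice24 overload.
ISynthesisRequestSettings settings = new StreamSynthesisRequestSettings(
    AudioEncoding.WAV, ContentType.Text, Voice24.Nec_24000);

// SynthesisRequest is the class name used in this README's request snippet.
ISynthesisRequest request = new SynthesisRequest(settings, "Text to synthesize");

// Cancel the call if the service does not answer within 30 seconds (arbitrary value).
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
await using Stream audio = await synthesizer.SynthesizeAsync(request, cts.Token);
Console.WriteLine("Synthesis finished; the audio stream is ready to be read.");
```

Switching to an 8 kHz model should only require passing a Voice8 value instead, which selects the second constructor overload.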
The available voices are defined in Voice24 and Voice8. Examples of voices are available here.