Speech synthesis and recognition tools that use the SaluteSpeech API

MaximGorshunov/SaluteSpeechTools

SaluteSpeechTools

Description

SaluteSpeechTools is a comprehensive project that includes a set of .NET libraries and applications for speech synthesis and recognition. The libraries, collected in the SaluteSpeech-DotNet-Client subproject, provide access to the methods of the SaluteSpeech API for speech synthesis and recognition.

The SaluteSpeechClient.Auth library handles authentication, while the SaluteSpeechClient.TextToSpeechService library provides access to the API methods for speech synthesis. The current version supports only streaming speech synthesis; synchronous and asynchronous synthesis are planned for a future release. An additional library for the speech recognition API methods is also planned.

Installation

SaluteSpeech-DotNet-Client

The SaluteSpeechClient.Auth and SaluteSpeechClient.TextToSpeechService libraries can be installed via NuGet for use in your project. SaluteSpeechClient.TextToSpeechService already includes the SaluteSpeechClient.Auth library.

dotnet add package SaluteSpeechClient.Auth --version 1.0.0

dotnet add package SaluteSpeechClient.TextToSpeechService --version 1.0.0

Usage

SaluteSpeech-DotNet-Client

Authentication

To authenticate, use SaluteSpeechClient.Auth. Before using it, you need to obtain a secret key for API access. To obtain authorization data:

  • Create a SaluteSpeech project.
  • Submit the project for moderation.

After moderation is complete, the project page shows your authorization data: a field from which you can copy the Client Id and a button that generates the Client Secret. The Client Secret is displayed only once, so save it as soon as it is generated.

Create a new object of type TokenProvider in the SaluteSpeechClient.Auth namespace and pass the Client Secret to its constructor. You can also pass a typed logger ILogger<TokenProvider> to the constructor if necessary.

using SaluteSpeechClient.Auth;

ITokenProvider tokenProvider = new TokenProvider("Client Secret");

To get the current token, use the GetTokenAsync method.

var token = await tokenProvider.GetTokenAsync();
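Because the Client Secret is shown only once, it is usually better kept outside the source code. The sketch below reads it from an environment variable before creating the TokenProvider. The variable name SALUTE_SPEECH_CLIENT_SECRET and the AuthExample class are hypothetical names chosen for illustration and are not part of the library; only TokenProvider, ITokenProvider and GetTokenAsync come from the description above.

```csharp
using System;
using System.Threading.Tasks;
using SaluteSpeechClient.Auth;

public static class AuthExample
{
    // Reads the Client Secret from an environment variable.
    // The variable name is an assumption made for this example.
    public static string ReadClientSecret(string variableName = "SALUTE_SPEECH_CLIENT_SECRET")
    {
        var secret = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(secret))
        {
            throw new InvalidOperationException(
                $"Environment variable '{variableName}' is not set.");
        }
        return secret;
    }

    public static async Task Main()
    {
        // Create the provider exactly as shown above, but without a hard-coded secret.
        ITokenProvider tokenProvider = new TokenProvider(ReadClientSecret());

        // Request the current token; the value itself is deliberately not printed.
        var token = await tokenProvider.GetTokenAsync();
        Console.WriteLine("Token received.");
    }
}
```

Failing early when the variable is missing keeps a configuration mistake from showing up later as an authentication error.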

Speech Synthesis

To perform speech synthesis, use SaluteSpeechClient.TextToSpeechService.

Stream Speech Synthesis

For stream speech synthesis, use StreamSpeechSynthesizer. When initializing the object, pass ITokenProvider to the constructor and a typed logger ILogger<StreamSpeechSynthesizer> if necessary:

using SaluteSpeechClient.TextToSpeechService.SpeechSynthesizer;

ISpeechSynthesizer streamSynthesizer = new StreamSpeechSynthesizer(tokenProvider);

To perform speech synthesis, use the SynthesizeAsync method.

Stream result = await streamSynthesizer.SynthesizeAsync(request, cancellationToken);

To create a request, use the StreamSynthesisRequest class, which takes the request settings ISynthesisRequestSettings and the text to synthesize in its constructor:

using SaluteSpeechClient.TextToSpeechService.SpeechSynthesizer;

ISynthesisRequest request = new SynthesisRequest(requestSettings, textToSynthesize);

To set the request settings for stream speech synthesis, use the StreamSynthesisRequestSettings class. The default parameterless constructor sets the default request settings (AudioEncoding = WAV, ContentType = Text, Voice = Nec_24000):

using SaluteSpeechClient.TextToSpeechService.SpeechSynthesizer;

ISynthesisRequestSettings requestSettings = new StreamSynthesisRequestSettings();

The constructors StreamSynthesisRequestSettings(AudioEncoding audioEncoding, ContentType contentType, Voice24 voice) and StreamSynthesisRequestSettings(AudioEncoding audioEncoding, ContentType contentType, Voice8 voice) set specific request settings. The first uses voice models with a sampling rate of 24 kHz, the second uses voice models with a sampling rate of 8 kHz.

The available voices are defined by Voice24 and Voice8. Examples of voices are available here.
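Putting the pieces together, a minimal end-to-end sketch might look like the following. It uses only the types shown in the snippets above, with the default request settings, and writes the returned stream to a file. The SynthesisExample class, the SaveAsync helper, the output file name, the sample text and the 30-second timeout are assumptions made for illustration. The request is created with SynthesisRequest, as in the snippet above; if your package version exposes the class as StreamSynthesisRequest, use that name instead.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using SaluteSpeechClient.Auth;
using SaluteSpeechClient.TextToSpeechService.SpeechSynthesizer;

public static class SynthesisExample
{
    // Hypothetical helper: copies the synthesized audio stream into a file.
    public static async Task SaveAsync(Stream audio, string path, CancellationToken ct = default)
    {
        await using var file = File.Create(path);
        await audio.CopyToAsync(file, ct);
    }

    public static async Task Main()
    {
        // "Client Secret" is a placeholder; substitute your own secret.
        ITokenProvider tokenProvider = new TokenProvider("Client Secret");
        ISpeechSynthesizer synthesizer = new StreamSpeechSynthesizer(tokenProvider);

        // Default settings (WAV, plain text, default voice), as described above.
        ISynthesisRequestSettings settings = new StreamSynthesisRequestSettings();
        ISynthesisRequest request = new SynthesisRequest(settings, "Hello from SaluteSpeech.");

        // The timeout value is an arbitrary choice for this sketch.
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
        using Stream audio = await synthesizer.SynthesizeAsync(request, cts.Token);

        // The default encoding is WAV, hence the file extension.
        await SaveAsync(audio, "output.wav", cts.Token);
    }
}
```

Keeping the file-writing step in its own helper means it can be exercised with any Stream, for example a MemoryStream in a test, without calling the API.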
