Skip to content

Latest commit

 

History

History
378 lines (279 loc) · 13.7 KB

react-text-to-speech-guide.mdx

File metadata and controls

378 lines (279 loc) · 13.7 KB
title icon description
React Guide
react
Explore natural and customizable text-to-speech synthesis for React applications with ElevenLabs' developer guide.

Image: Pexels

At ElevenLabs, our goal is to help companies build real, impactful applications with our suite of advanced AI speech synthesis tools.

To do so, we’ve created a React API that has proven invaluable for multi-language customer service applications, educational programs, and even real-time communication systems that require instantaneous text-to-speech conversion.

What sets us apart is our ability to produce high-quality spoken audio in multiple languages, complete with nuances like human-like intonation, inflection, and pronunciation. The audio output is virtually indistinguishable from a human voice.

For those seeking further customization, our API even allows users to train the model using their own voice, opening the door to voice cloning capabilities across different languages, accents, and tones.

Initializing Your Journey with ElevenLabs' Text-to-Speech API

Ready to jump in? Your first step is as simple as it is crucial: obtaining your API key.

Get an API Key

  1. To get started, create a free account at ElevenLabs Signup Page. Upon successful registration, a xi-api-key will be automatically generated for you.
  2. Navigate to your profile. You’ll find your API key under your signup email address.

Image: docs.elevenlabs.io

  1. Click the eye icon to reveal the key as plain text for copying.

Image: docs.elevenlabs.io

  1. With your API key in hand, you're all set to explore ElevenLabs' suite of services, including text-to-speech conversion, voice synthesis, voice lab, and voice library.
  2. Should you need to regenerate your API key for any reason, just click the refresh icon on your profile settings page. Note that this will deactivate your prior key, so make sure it’s not being used in any mission-critical applications before you do so.

Making a simple request to our API

To interact with our API, you'll need to send a POST request to the following endpoint:

https://api.elevenlabs.io/v1/text-to-speech/

For demonstration purposes, let’s use Axios. Axios is an HTTP request utility with simple syntax.

First, let’s install it. Head to your terminal and type:

npm install axios

Now that it’s in your packages folder, let’s create a new file (you can call it whatever you want - we’re calling ours elevenlabsUtils.js) and get to work.

First, let’s import Axios:

import axios from 'axios';

Then, let’s define a function to convert text to audio with the ElevenLabs API. We’ll call it convertTextToAudio, and use arrow function syntax.

The function will first define our API key (that we pull from process.env), and then define the ID of the voice we want to use for speech. Then, we’ll set up the syntax of the API request, add the appropriate headers, and we’re off to the races!

import axios from 'axios';

// Function to convert text to audio using ElevenLabs API
const convertTextToAudio = async (textToConvert) => {
  // Set the API key for ElevenLabs API
  const apiKey = process.env.ELEVEN_LABS_API_KEY;

  // ID of voice to be used for speech
  const voiceId = '21m00Tcm4TlvDq8ikWAM';

  // API request options
  const apiRequestOptions = {
    method: 'POST',
    url: `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
    headers: {
      accept: 'audio/mpeg',
      'content-type': 'application/json',
      'xi-api-key': apiKey,
    },
    data: {
      text: textToConvert,
    },
    responseType: 'arraybuffer', // To receive binary data in response
  };

  // Sending the API request and waiting for response
  const apiResponse = await axios.request(apiRequestOptions);

  // Return the binary audio data received from API
  return apiResponse.data;
};

export default convertTextToAudio;

A simple example: converting to an audio file

Once you've received the data as an ArrayBuffer from the API you can do anything with it. But if you wanted to, say, play or save the clip with a run-of-the-mill audio player (like VLC), your next step would be to convert it into an MP3 blob.

First, let’s import useState and useEffect from the React library, so we can take advantage of React’s built-in function hooks:

import { useState, useEffect } from 'react';

Then, we’ll define a component called AudioComponent to contain the rest of our code.

const AudioComponent = () => {
  // State variable to store URL of the audio source
  const [sourceUrl, setSourceUrl] = useState(null);
};

Inside AudioComponent, we’ll create an asynchronous function called fetchAndUpdateAudioData. This function does three things:

  1. It fetches audio data from our API by calling the previously defined convertTextToAudio function.
  2. It creates a new Blob object from the fetched audio data, specifying its MIME type as audio/mpeg.
  3. It generates a URL for this blob using URL.createObjectURL() method and updates the sourceUrl state variable with this URL.

Our final code is as follows:

import { useState, useEffect } from 'react';

const AudioComponent = () => {
  // State variable to store URL of the audio source
  const [sourceUrl, setSourceUrl] = useState(null);

  // Asynchronous function to fetch audio data and update state variable
  const fetchAndUpdateAudioData = async () => {
    const audioData = await convertTextToAudio('Hello welcome');

    // Create a new Blob object from the fetched audio data with matching MIME type
    const audioBlob = new Blob([audioData], { type: 'audio/mpeg' });

    // Create a URL for the audio blob
    const blobUrl = URL.createObjectURL(audioBlob);

    // Update the sourceUrl state variable with the generated URL for the audio blob
    setSourceUrl(blobUrl);
  };

  // Call the fetchAndUpdateAudioData once when the component mounts
  useEffect(() => {
    fetchAndUpdateAudioData();
  }, []);

  // Render an audio element when source URL is available
  return (
    <div>
      {sourceUrl && (
        <audio autoPlay controls>
          <source src={sourceUrl} type='audio/mpeg' />
        </audio>
      )}
    </div>
  );
};

export default AudioComponent;

For more details on working with our API, including additional endpoints, refer to our official documentation.

Leveraging ElevenLabs' AudioStream React Component

Want to take it one step further? Beyond our API, we also offer a reusable React component called AudioStream to facilitate text-to-speech conversion directly within your React app.

This component requires the following props:

  • voiceId: The ID specifying the voice to use for text-to-speech (string)
  • text: The text to convert to speech (string)
  • apiKey: Your ElevenLabs API key (string)
  • voiceSettings: An object containing additional voice settings, such as stability and similarity boost (VoiceSettings)

For clarity, here’s the VoiceSettings interface:

interface VoiceSettings {
  stability: number;
  similarity_boost: number;
}

Using AudioStream React Component

Integrating AudioStream into your React application is a straightforward process:

  1. Create a react project using vite by running npm create vite@latest in your terminal and follow along the instructions.
  2. Open the project in your favorite code editor and install axios by running npm install axios in your terminal.
  3. Create a component in your src directory called AudioStream.tsx and copy the following code into it, this is our implementation of the AudioStream component in it's most basic form:
import React from 'react';
import axios from 'axios';

interface VoiceSettings {
  stability: number;
  similarity_boost: number;
}

interface AudioStreamProps {
  voiceId: string;
  text: string;
  apiKey: string;
  voiceSettings?: VoiceSettings;
}

const AudioStream: React.FC<AudioStreamProps> = ({
  voiceId,
  text,
  apiKey,
  voiceSettings,
}) => {
  const [loading, setLoading] = React.useState(false);

  // State variable to store URL of the audio source
  const [sourceUrl, setSourceUrl] = React.useState<string | null>(null);
  const [error, setError] = React.useState('');

  // Asynchronous function to fetch audio data and update state variables
  const convertAndStream = async () => {
    setLoading(true);
    setError('');

    const baseUrl = 'https://api.elevenlabs.io/v1/text-to-speech';
    const headers = {
      'Content-Type': 'application/json',
      'xi-api-key': apiKey,
    };

    const body = {
      text,
      voice_settings: voiceSettings,
    };

    try {
      const response = await axios.post(`${baseUrl}/${voiceId}`, body, {
        headers,
        responseType: 'blob',
      });

      if (response.status === 200) {
        /**
         * The response.data is a Blob object which looks like this:
         * Blob {size: 123456, type: "audio/mpeg"}
         */

        // Create a URL for the audio blob and update the sourceUrl state variable
        return setSourceUrl(URL.createObjectURL(response.data));
      } else {
        setError('Error: Unable to stream audio.');
      }
    } catch (error) {
      setError('Error: Unable to stream audio.');
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      {/* Render an audio element when the source URL is available */}
      {sourceUrl && (
        <audio autoPlay controls>
          <source src={sourceUrl} type='audio/mpeg' />
        </audio>
      )}
      <button type='button' onClick={convertAndStream} disabled={loading}>
        Convert and Stream
      </button>
      {error && <p>{error}</p>}
    </div>
  );
};

export default AudioStream;
  1. Import the component: Import AudioStream from its source file: import AudioStream from './AudioStream'; in your App.tsx file like so:
import React from 'react';
import AudioStream from './AudioStream';

const voiceSettings = {
  stability: 0,
  similarity_boost: 0,
};

const App = () => {
  return (
    <div>
      <AudioStream
        voiceId='304fx0Tcm4TlvDq8ikWAM'
        text='The following is an example text to test the AudioStream component.'
        apiKey='your-api-key'
        voiceSettings={voiceSettings}
      />
    </div>
  );
};

Just replace voiceId, text, apiKey, and voiceSettings with the specific settings corresponding to how you’d like to use the API.

  1. Here’s another example of using this reusable component:
import React from 'react';
import ReactDOM from 'react-dom';
import SpeechStreamComponent from './SpeechStreamComponent';

const voiceSettingsId = '304fx0Tcm4TlvDq8ikWAM';
const sampleText =
  'The following is an example text to test the AudioStream component.';
const apiAccessKey = 'your-api-key';

const audioQualitySettings = {
  stability: 0,
  similarity_boost: 0,
};

ReactDOM.render(
  <React.StrictMode>
    <SpeechStreamComponent
      voiceId={voiceSettingsId}
      text={sampleText}
      apiKey={apiAccessKey}
      voiceSettings={audioQualitySettings}
    />
  </React.StrictMode>,
  document.getElementById('root')
);

Please note that the above SpeechStreamComponent is the same as AudioStream component. We have just renamed it for the sake of providing more options for naming your component.

Once incorporated, the component will render an audio element with a "Convert and Stream" button. Just click this button to activate the text-to-speech conversion — your generated audio should play immediately.

You can also extend this reusable AudioStream component to add more features to it. Here are a few ideas to get you started:

  • Add a loading indicator to the button.
  • Use React Query to streamline the API call.
  • Add a dropdown to select the voiceIds.
  • Add a dropdown to select the voiceSettings.
  • Using Tailwind CSS or any other UI component library to style the component.

These are just a few of the many ways you can customize the AudioStream component to suit your needs. The possibilities are endless!

Wrapping Up

At ElevenLabs, we offer an unprecedented opportunity to integrate highly natural and customizable speech synthesis into your React applications.

Whether for creating authentic voiceovers, multilingual customer service, or interactive educational modules, the ElevenLabs API and our accompanying AudioStream React component offer a robust solution to meet your text-to-speech needs.

Take the first step by signing up today and explore the authentic, real human speech capabilities that await. The possibilities are bounded only by the limits of your imagination!

Take the Next Step with ElevenLabs

Intrigued by the potential that our cutting-edge speech synthesis models can bring to your React projects? The journey doesn’t have to stop here.

Check out ElevenLabs' Upcoming Projects to get a glimpse of the future innovations we’re working on to revolutionize your apps even further.

And for those who wish to delve deeper into the technical aspects, our Knowledge Base offers a wealth of articles, tutorials, and documentation that will equip you with all the tools you need to become an ElevenLabs power user.

Thanks for reading!