om-syrinx

om-syrinx (pronounced "ōmu shīrinkusu") is a Japanese speech synthesis library developed for "om", a Discord text-to-speech bot.

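Instead of processing the whole input at once, om-syrinx synthesizes and outputs audio a little at a time, in order (near-real-time speech synthesis). This shortens the delay between receiving text and being able to play audio, so the bot can start reading a message posted to a text channel sooner.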
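To illustrate why incremental output shortens the wait, here is a minimal sketch using only Node.js streams. It is not om-syrinx's implementation. `splitIntoChunks` (which splits at Japanese sentence-ending punctuation) and the `synthesizeChunk` callback are hypothetical stand-ins, because this README does not describe how om-syrinx actually divides its work. The point is that a consumer receives the first piece of audio as soon as it is ready, not after the entire text has been processed.

```typescript
import { Readable } from "node:stream";

// Hypothetical splitting rule: cut after 。！？ so each piece is one sentence.
// om-syrinx's real unit of work may differ.
function splitIntoChunks(text: string): string[] {
  return text.match(/[^。！？]+[。！？]?/g) ?? [];
}

// Hypothetical stand-in for one synthesis step (text piece -> audio bytes).
type ChunkSynthesizer = (chunk: string) => Promise<Buffer>;

// Synthesizes the pieces one after another and yields each result as soon
// as it exists, so downstream code never waits for the whole input.
async function* synthesizeIncrementally(
  text: string,
  synthesizeChunk: ChunkSynthesizer,
): AsyncGenerator<Buffer> {
  for (const chunk of splitIntoChunks(text)) {
    yield await synthesizeChunk(chunk);
  }
}

// Wraps the generator in a Readable; Readable.from() uses object mode by
// default, so each Buffer arrives as one chunk.
function toReadable(text: string, synthesizeChunk: ChunkSynthesizer): Readable {
  return Readable.from(synthesizeIncrementally(text, synthesizeChunk));
}
```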

The actual text processing and speech synthesis are handled by jpreprocess and jbonsai, respectively. om-syrinx provides Node.js bindings for these, plus thread management, buffering, and Opus encoding.

Quick start

Preparation

Complete the following three steps.

1. Install the library. Run:

   npm install github:discordjs-japan/om-syrinx#semver:^0.4.1

2. Download the dictionary for jpreprocess. Download the dictionary (naist-jdic-jpreprocess.tar.gz) from the jpreprocess releases and extract it into the current directory.

3. Download the model for jbonsai. Download the master branch of htsvoice-tohoku-f01 (https://github.com/icn-lab/htsvoice-tohoku-f01/archive/refs/heads/master.tar.gz) and extract it into the current directory.
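Before moving on, you can confirm that the files are where the usage example expects them. The following is a small preflight check. It is hypothetical and not part of om-syrinx. It looks for the `naist-jdic` folder and the `tohoku-f01-neutral.htsvoice` model using the same relative paths that appear in the `Syrinx.fromConfig()` call in the usage example. Adjust those paths if you extracted the archives somewhere else.

```typescript
import { existsSync, statSync } from "node:fs";
import { resolve } from "node:path";

// One expected path and whether it should be a folder or a file.
type Requirement = { path: string; kind: "directory" | "file" };

// Same relative paths as the Syrinx.fromConfig() call in the usage example.
const requiredPaths: Requirement[] = [
  { path: "naist-jdic", kind: "directory" },
  { path: "htsvoice-tohoku-f01-master/tohoku-f01-neutral.htsvoice", kind: "file" },
];

// Returns the entries that are missing (or have the wrong type) under baseDir.
function findMissing(requirements: Requirement[], baseDir: string = process.cwd()): string[] {
  return requirements
    .filter(({ path, kind }) => {
      const fullPath = resolve(baseDir, path);
      if (!existsSync(fullPath)) return true;
      const stats = statSync(fullPath);
      return kind === "directory" ? !stats.isDirectory() : !stats.isFile();
    })
    .map(({ path }) => path);
}

// Report anything that still needs to be downloaded or extracted.
const missing = findMissing(requiredPaths);
if (missing.length > 0) {
  console.warn(`Not found in ${process.cwd()}: ${missing.join(", ")}`);
}
```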

About jbonsai models

jbonsai synthesizes speech using .htsvoice models, the same model format used by HTS Engine.

This example uses htsvoice-tohoku-f01, a repository that contains four .htsvoice models. Other .htsvoice models can also be used.
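The repository ships more than one .htsvoice file, so you may want to see which ones were extracted before choosing one for `models`. Here is a minimal sketch using only `node:fs`. The directory name `htsvoice-tohoku-f01-master` comes from the usage example. `listHtsvoiceModels` is a hypothetical helper and not an om-syrinx API.

```typescript
import { readdirSync } from "node:fs";
import { join } from "node:path";

// Lists the .htsvoice files directly inside dir, sorted by file name.
// Hypothetical helper; om-syrinx itself only receives the resulting paths.
function listHtsvoiceModels(dir: string): string[] {
  return readdirSync(dir)
    .filter((name) => name.endsWith(".htsvoice"))
    .sort()
    .map((name) => join(dir, name));
}

// Example (after extracting the archive into the current directory):
// console.log(listHtsvoiceModels("htsvoice-tohoku-f01-master"));
```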

Usage

The following example creates a stream from inputText.

import { Syrinx, EncoderType, type SynthesisOption } from "@discordjs-japan/om-syrinx";
import { createAudioResource, StreamType } from "@discordjs/voice";
import { Readable } from "node:stream";

// Create an instance
const syrinx = Syrinx.fromConfig({
  dictionary: "naist-jdic",
  models: ["htsvoice-tohoku-f01-master/tohoku-f01-neutral.htsvoice"],
  encoder: { type: EncoderType.Opus },
});

// Synthesize speech
const inputText = "鳴管は、鳥類のもつ発声器官。";
const option: SynthesisOption = {};
const stream: Readable = syrinx.synthesize(inputText, option);

// Use the stream with @discordjs/voice
const resource = createAudioResource(stream, { inputType: StreamType.Opus });

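The following settings are required when creating an instance with Syrinx.fromConfig():

- dictionary: path to the folder containing the jpreprocess dictionary
- models: array of paths to the .htsvoice model files for jbonsai
- encoder: encoding settings
  - With EncoderType.Opus, the audio is encoded as Opus. This corresponds to StreamType.Opus in @discordjs/voice.
  - With EncoderType.Raw, the audio is converted to 16-bit PCM. This corresponds to StreamType.Raw in @discordjs/voice.
  - For other settings, see EncoderConfig.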
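The paths in the usage example are relative. They are presumably resolved against the process's current working directory, so a bot started from a different directory would not find them. One workaround is to resolve the paths against a fixed base directory first. `SyrinxPathConfig` and `resolveSyrinxPaths` are hypothetical names. They mirror only the two path settings listed above and are not part of om-syrinx.

```typescript
import { resolve } from "node:path";

// Hypothetical local type: just the two path-valued settings of
// Syrinx.fromConfig(). It is not a type exported by om-syrinx.
interface SyrinxPathConfig {
  dictionary: string;
  models: string[];
}

// Resolves the dictionary folder and every model file against baseDir,
// so the result no longer depends on where the process was started.
function resolveSyrinxPaths(baseDir: string, config: SyrinxPathConfig): SyrinxPathConfig {
  return {
    dictionary: resolve(baseDir, config.dictionary),
    models: config.models.map((model) => resolve(baseDir, model)),
  };
}
```

You can spread the result into the object passed to Syrinx.fromConfig() alongside encoder, for example `{ ...resolveSyrinxPaths(baseDir, { dictionary: "naist-jdic", models: [...] }), encoder: { type: EncoderType.Opus } }`. Here `baseDir` is the directory you extracted the files into.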

The arguments to syrinx.synthesize() are as follows:

- inputText: the text to synthesize
- option: options for adjusting the synthesized speech. For details, see SynthesisOption.

The returned stream is a Readable that emits audio data encoded according to the encoder setting.

- When encoder.type is EncoderType.Opus, stream is an object-mode Readable, and each object corresponds to one Opus frame.
- When encoder.type is EncoderType.Raw, stream is a regular (non-object-mode) Readable that emits 16-bit PCM data.
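The difference between the two modes matters when you consume the stream directly. Here is a minimal sketch using plain Node.js streams. `countFrames` treats each chunk of an object-mode stream as one frame. `pcmDurationSeconds` sums the bytes of a non-object-mode stream and converts them to seconds. The sample rate and channel count are parameters, not known values, because this section does not state what om-syrinx outputs. Check EncoderConfig for the actual format.

```typescript
import { Readable } from "node:stream";

// Object mode (e.g. EncoderType.Opus output): every chunk is one frame,
// so counting chunks counts frames.
async function countFrames(stream: Readable): Promise<number> {
  let frames = 0;
  for await (const _frame of stream) frames++;
  return frames;
}

// Byte mode (e.g. EncoderType.Raw output): chunk boundaries are arbitrary,
// so only the total byte count is meaningful. Each 16-bit sample is 2 bytes.
// sampleRate and channels are assumptions supplied by the caller.
async function pcmDurationSeconds(
  stream: Readable,
  sampleRate: number,
  channels: number,
): Promise<number> {
  let bytes = 0;
  for await (const chunk of stream) bytes += (chunk as Buffer).length;
  const bytesPerSample = 2;
  return bytes / (bytesPerSample * channels * sampleRate);
}
```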

Synthesis runs on a thread separate from the main thread, and the main thread receives the output asynchronously.
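This pattern (work on another thread, with results delivered asynchronously to a Readable on the main thread) can be sketched with `node:worker_threads`. This is an illustration under stated assumptions, not om-syrinx's code. om-syrinx manages its own threads internally, whereas here a short JavaScript worker (`workerSource`, a hypothetical name) only turns hypothetical text pieces into bytes. Backpressure is not handled in this sketch.

```typescript
import { Worker } from "node:worker_threads";
import { Readable } from "node:stream";

// Hypothetical worker body (CommonJS JavaScript, run on a separate thread).
// It stands in for the synthesis work: each piece becomes a chunk of bytes.
const workerSource = `
const { parentPort, workerData } = require("node:worker_threads");
for (const piece of workerData.pieces) {
  parentPort.postMessage(Buffer.from(piece, "utf8"));
}
parentPort.postMessage(null); // end-of-output marker
`;

// Starts the worker and returns a main-thread Readable that is filled
// asynchronously whenever a message from the worker arrives.
function synthesizeInWorker(pieces: string[]): Readable {
  const stream = new Readable({ read() {} }); // data is pushed from the message handler
  const worker = new Worker(workerSource, { eval: true, workerData: { pieces } });
  worker.on("message", (message: Uint8Array | null) => {
    // Buffers arrive as plain Uint8Array after structured cloning.
    stream.push(message === null ? null : Buffer.from(message));
  });
  worker.on("error", (error) => stream.destroy(error));
  return stream;
}
```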
