Python
Nginx
FFmpeg
MySQL
Anaconda
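This project was developed with Python 3.8.
This repository contains the backend API. The frontend is available at https://github.com/lukeewin/ASR_LLM_TTS_Front.git
The project runs entirely offline on an internal network and does not call any cloud APIs.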
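ASR engines: OpenAI Whisper and Faster Whisper
LLM: any model supported by ollama. The code uses qwen:1.8b; to switch models, find qwen:1.8b in app.py and replace it with the name of the model you have deployed.
TTS: MeloTTS (if network restrictions in mainland China prevent you from downloading the model files from huggingface, you can configure a mainland China huggingface mirror)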
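As a rough illustration of the mirror setting mentioned for TTS: the huggingface_hub library reads the HF_ENDPOINT environment variable, so one way to redirect downloads is to set it before any model-downloading library is imported. The mirror URL https://hf-mirror.com and the helper name use_hf_mirror are assumptions made for this sketch; substitute whichever mirror you actually use.

```python
import os

# Assumption: a commonly used community mirror; replace with your own.
HF_MIRROR = "https://hf-mirror.com"


def use_hf_mirror(endpoint: str = HF_MIRROR) -> str:
    """Point Hugging Face downloads at a mirror unless one is already set."""
    # setdefault keeps any value already exported in the shell.
    return os.environ.setdefault("HF_ENDPOINT", endpoint)


# Call this before importing melo or faster_whisper, because
# huggingface_hub reads HF_ENDPOINT when it is first imported.
use_hf_mirror()
```

Exporting HF_ENDPOINT in the shell before starting the service should have the same effect.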
Directory structure:
| model | openai_whisper | model files
|       | faster_whisper | model files
| app.py
Edit the nginx.conf file so that it contains the following server block:
server {
    listen 80;
    server_name localhost;

    location / {
        root D:\\Works\\Web_Projects\\ASR_LLM_TTS\\top\\lukeewin\\;
        index login.html index.htm;
    }

    location /api {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    location /tts {
        proxy_pass http://localhost:8001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
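To make the routing in this server block concrete, here is a small Python sketch of how nginx chooses among the three prefix locations: the longest matching prefix wins. It only models plain prefix matching for this particular config (no regex locations or modifiers), and the names ROUTES and route are made up for the illustration.

```python
# Prefix -> destination, mirroring the three location blocks in the config.
ROUTES = {
    "/": "static files (login.html)",
    "/api": "http://localhost:8000",
    "/tts": "http://localhost:8001",
}


def route(path: str) -> str:
    """Return the destination of the longest prefix location matching path."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    return ROUTES[max(matches, key=len)]


print(route("/api/llm"))      # handled by the backend on port 8000
print(route("/tts/stream"))   # handled by the TTS service on port 8001
print(route("/login.html"))   # served from the static root
```

Because proxy_pass is given without a URI part, the request path is forwarded unchanged, which is why the backend routes themselves start with /api and /tts.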
import io
import logging
from typing import Optional, Any
from melo.api import TTS
from fastapi.responses import StreamingResponse
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn
from enum import Enum
# https://github.com/myshell-ai/MeloTTS/pull/56/files
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
app = FastAPI()
device = 'auto'
models = {
    'EN': TTS(language='EN', device=device),
    'ES': TTS(language='ES', device=device),
    'FR': TTS(language='FR', device=device),
    'ZH': TTS(language='ZH', device=device),
    'JP': TTS(language='JP', device=device),
    'KR': TTS(language='KR', device=device),
}
class RequestJson(BaseModel):
    text: str = 'Ahoy there matey! There she blows!'
    language: str = 'EN'
    speaker: str = 'EN-US'
    speed: float = 1.0


class ResponseJson(BaseModel):
    code: int
    message: str
    data: Optional[Any] = None


class ErrorCode(Enum):
    TEXT_NOT_FOUND = 100
@app.post("/tts/stream")
async def tts_stream(payload: RequestJson):
    language = payload.language
    text = payload.text
    # Fall back to the first speaker of the selected language if none is given
    speaker = payload.speaker or list(models[language].hps.data.spk2id.keys())[0]
    speed = payload.speed
    logging.info(f"language: {language}")
    logging.info(f"text: {text}")
    logging.info(f"speaker: {speaker}")
    logging.info(f"speed: {speed}")

    def audio_stream():
        # Synthesize the whole utterance into memory, then yield it as one chunk
        bio = io.BytesIO()
        if speaker == 'None':
            models[language].tts_to_file(text, models[language].hps.data.spk2id[language], bio, speed=speed, format='wav')
        else:
            models[language].tts_to_file(text, models[language].hps.data.spk2id[speaker], bio, speed=speed, format='wav')
        audio_data = bio.getvalue()
        yield audio_data

    return StreamingResponse(audio_stream(), media_type="audio/wav")


if __name__ == '__main__':
    uvicorn.run(app, host="127.0.0.1", port=8001)
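After downloading MeloTTS, go into the melo directory and add the Python code above. It streams the synthesized audio back to the caller.
For MeloTTS installation instructions, see https://github.com/myshell-ai/MeloTTS.git
After that, you only need to run the code above.
My notes on setting up TTS: https://blog.lukeewin.top/archives/melotts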
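Once the TTS service is running, a client only has to POST a JSON body with the fields of RequestJson to /tts/stream. The following is a minimal sketch using only the standard library; the helper names build_tts_request and synthesize, the output file name, and the default speaker "ZH" are assumptions for illustration, so check the speaker names your MeloTTS model actually exposes.

```python
import json
import urllib.request


def build_tts_request(text, language="ZH", speaker="ZH", speed=1.0,
                      base_url="http://127.0.0.1:8001"):
    """Build a POST request whose JSON body matches the RequestJson model."""
    body = json.dumps({
        "text": text,
        "language": language,
        "speaker": speaker,  # assumption: "ZH" is a valid speaker for ZH
        "speed": speed,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts/stream",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="out.wav", **kwargs):
    """Send the request and save the returned WAV bytes (server must be up)."""
    with urllib.request.urlopen(build_tts_request(text, **kwargs)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the service started, calling synthesize("你好") should write the audio to out.wav. When going through nginx, pass base_url="http://localhost" so the request is proxied via the /tts location.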
The default model is qwen:1.8b. To use a different model, modify the code below:
# Excerpt from app.py; it relies on requests, json, and fastapi.Form being imported there
@app.post("/api/llm")
async def llm(question: str = Form()):
    # The history is rebuilt on every request, so each call is a single-turn chat
    dialogue_history = []
    llm_url = "http://localhost:11434/api/chat"
    headers = {
        "Content-Type": "application/json"
    }
    dialogue_history.append({
        "role": "user",
        "content": question
    })
    data = {
        "model": "qwen:1.8b",  # change this to any model served by ollama
        "messages": dialogue_history,
        "stream": False
    }
    response = requests.post(llm_url, data=json.dumps(data), headers=headers)
    if response.status_code == 200:
        response_json = response.json()
        ai_answer = response_json["message"]["content"]
        dialogue_history.append({
            "role": "assistant",
            "content": ai_answer
        })
        return ResponseJsonData(
            code=200,
            message="成功",
            data={
                "text": ai_answer
            }
        )
    else:
        return ResponseJsonData(
            code=ErrorCode.LLM_FAIL,
            message="大模型生成内容失败"
        )
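Find "model": "qwen:1.8b" and change it to a model supported by ollama, for example "model": "llama3.2".
If you want to connect to a cloud-hosted LLM instead, you will need to adapt the code yourself.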
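If you switch models often, one option is to take the model name from configuration instead of editing the source each time. The sketch below builds the same /api/chat request body as the endpoint above; the environment variable OLLAMA_MODEL and the helpers build_chat_payload and extract_answer are hypothetical names and are not part of app.py.

```python
import json
import os

# Hypothetical environment variable; the original app.py hard-codes the name.
DEFAULT_MODEL = os.environ.get("OLLAMA_MODEL", "qwen:1.8b")


def build_chat_payload(question, history=None, model=DEFAULT_MODEL):
    """Build the JSON body for ollama's /api/chat endpoint."""
    messages = list(history or [])  # copy so the caller's list is untouched
    messages.append({"role": "user", "content": question})
    return {"model": model, "messages": messages, "stream": False}


def extract_answer(response_json):
    """Pull the assistant text out of a non-streaming /api/chat response."""
    return response_json["message"]["content"]


payload = build_chat_payload("Hello", model="llama3.2")
print(json.dumps(payload, ensure_ascii=False))
```

Passing earlier turns through the history argument would also let the model see previous messages, which the endpoint above does not do because it starts from an empty history on each request.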
For how to download the OpenAI Whisper models, see my blog: https://blog.lukeewin.top/archives/openai-whisper-offline-install#toc-head-15
Faster Whisper models can be downloaded from https://huggingface.co/Systran
Run the following SQL to create the database table:
/*
Navicat Premium Data Transfer
Source Server : Localhost_MySQL5.7
Source Server Type : MySQL
Source Server Version : 50743
Source Host : localhost:3306
Source Schema : asr_llm_tts
Target Server Type : MySQL
Target Server Version : 50743
File Encoding : 65001
Date: 03/12/2024 12:05:53
*/
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for user
-- ----------------------------
DROP TABLE IF EXISTS `user`;
CREATE TABLE `user` (
  `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `username` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  `password` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 2 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;
SET FOREIGN_KEY_CHECKS = 1;
Then add a user to the user table:
INSERT INTO user(username, password) VALUES ('lukeewin', '123456');
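The backend's own login code is not shown here, so the following is only a sketch of how a credential check against this table could look with a parameterized query. It uses Python's built-in sqlite3 as an in-memory stand-in so that it runs without a MySQL server; the function name check_login is hypothetical, and with a MySQL driver such as PyMySQL the placeholder would be %s instead of ?.

```python
import sqlite3


def check_login(conn, username, password):
    """Return True if a row with this username and password exists."""
    # Placeholders keep user input out of the SQL text (avoids SQL injection).
    row = conn.execute(
        "SELECT 1 FROM user WHERE username = ? AND password = ?",
        (username, password),
    ).fetchone()
    return row is not None


# In-memory stand-in for the MySQL user table defined above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "username VARCHAR(20), password VARCHAR(20))"
)
conn.execute(
    "INSERT INTO user(username, password) VALUES (?, ?)",
    ("lukeewin", "123456"),
)

print(check_login(conn, "lukeewin", "123456"))
print(check_login(conn, "lukeewin", "' OR '1'='1"))
```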
conda env create -f asr_llm_tts_environment.yml