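Bert-vits2-v2.2 was recently released as promised. The main change in v2.2 is that the emotion model has been replaced with the CLAP multimodal model. At inference time you can now steer stylized synthesis with a text prompt or an audio prompt, which makes the generated voice more emotionally expressive. The release also ships a new preprocessing WebUI that is far easier to use.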
https://github.com/fishaudio/Bert-VITS2/releases/tag/v2.2
The FastAPI-based inference web UI project has also been updated for Bert-vits2-v2.2. Its repository is:
https://github.com/jiangyuxiaoxiao/Bert-VITS2-UI
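In this post we use these two projects to clone the English voice of the Genshin Impact character Yae Miko. The resulting model is named miko.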
Bert-vits2-v2.2: the new base model and emotion model
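First, clone the official Bert-vits2-v2.2 project. Check out the v2.2 tag specifically. The project is updated constantly, and the main branch may contain bugs.
git clone --branch v2.2 https://github.com/fishaudio/Bert-VITS2.git
Enter the project directory:
cd Bert-VITS2
Install the dependencies:
pip3 install -r requirements.txt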
Next, download the new base model and emotion model from:
https://openi.pcl.ac.cn/Stardust_minus/Bert-VITS2/modelmanage/show_model
Put the new emotion model, clap-htsat-fused, into the project's emotional directory. The structure looks like this:
E:\work\Bert-VITS2-v22\emotional>tree /f
Folder PATH listing for volume myssd
Volume serial number is 7CE3-15AE
E:.
├───clap-htsat-fused
│       .gitattributes
│       config.json
│       merges.txt
│       preprocessor_config.json
│       pytorch_model.bin
│       README.md
│       special_tokens_map.json
│       tokenizer.json
│       tokenizer_config.json
│       vocab.json
│
└───wav2vec2-large-robust-12-ft-emotion-msp-dim
        .gitattributes
        config.json
        LICENSE
        preprocessor_config.json
        pytorch_model.bin
        README.md
        vocab.json
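Note that wav2vec2-large-robust-12-ft-emotion-msp-dim is the emotion model from Bert-vits2-v2.1, and you must keep it as well. For details, see my earlier post, "Going All In on Ma Dugong: Reproducing Ma Dugong with Bert-vits2 V2.1.0 (Python 3.10)". I won't repeat them here.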
With that, the new models are in place.
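Before moving on, you can confirm that both model folders landed in the right place. The sketch below is a hypothetical helper, not part of Bert-VITS2. It checks the emotional directory for the weight, config and tokenizer files shown in the tree output above. It skips README.md, .gitattributes and LICENSE, and it assumes you run it from the project root.

```python
from pathlib import Path

# Files expected in each model folder, taken from the `tree /f` listing above
# (README.md, .gitattributes and LICENSE are skipped as non-essential).
EXPECTED = {
    "clap-htsat-fused": [
        "config.json",
        "merges.txt",
        "preprocessor_config.json",
        "pytorch_model.bin",
        "special_tokens_map.json",
        "tokenizer.json",
        "tokenizer_config.json",
        "vocab.json",
    ],
    "wav2vec2-large-robust-12-ft-emotion-msp-dim": [
        "config.json",
        "preprocessor_config.json",
        "pytorch_model.bin",
        "vocab.json",
    ],
}


def missing_emotion_files(emotional_dir: str = "emotional") -> list[str]:
    """List expected files (relative to emotional_dir) that do not exist."""
    root = Path(emotional_dir)
    return [
        f"{folder}/{name}"
        for folder, names in EXPECTED.items()
        for name in names
        if not (root / folder / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_emotion_files()
    if missing:
        print("Missing:\n  " + "\n  ".join(missing))
    else:
        print("All expected files found.")
```

A misspelled folder name, such as clap-hatsat-fused instead of clap-htsat-fused, shows up as every file in that folder being reported missing.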
Bert-vits2-v2.2 model training
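First, download the training set. This example uses the English voice lines of Yae Miko from Genshin Impact. The dataset is available at: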
https://github.com/AI-Hobbyist/Genshin_Datasets
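Next, create a directory for the character miko inside the project's Data directory:
mkdir miko
Name the transcription file esd.list and put it in the miko directory.
Put the sliced audio clips in the raw subdirectory of miko.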
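Each line of esd.list describes one clip. Bert-VITS2 uses the format {wav_path}|{speaker_name}|{language}|{text}, with language codes such as ZH, JP and EN. For this dataset the speaker name should match the miko key in spk2id below. Here is a minimal validator that assumes this four-field format. The helper names are hypothetical.

```python
from dataclasses import dataclass

# Assumed esd.list line format (Bert-VITS2 convention):
#   {wav_path}|{speaker_name}|{language}|{text}
VALID_LANGS = {"ZH", "JP", "EN"}


@dataclass
class Utterance:
    wav_path: str
    speaker: str
    language: str
    text: str


def parse_esd_line(line: str) -> Utterance:
    # Split on the first three "|" only, so a "|" inside the transcript survives.
    parts = line.rstrip("\n").split("|", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
    wav_path, speaker, language, text = (p.strip() for p in parts)
    if language.upper() not in VALID_LANGS:
        raise ValueError(f"unknown language {language!r} in: {line!r}")
    if not text:
        raise ValueError(f"empty transcript in: {line!r}")
    return Utterance(wav_path, speaker, language.upper(), text)


def load_esd_list(path: str) -> list[Utterance]:
    """Parse every non-blank line; the first malformed line raises ValueError."""
    with open(path, encoding="utf-8") as f:
        return [parse_esd_line(line) for line in f if line.strip()]
```

Running load_esd_list on the file before preprocessing surfaces formatting mistakes early, with the offending line in the error message.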
Finally, create the configuration file miko/configs/config.json:
{
  "train": {
    "log_interval": 50,
    "eval_interval": 50,
    "seed": 42,
    "epochs": 1000,
    "learning_rate": 0.0002,
    "betas": [
      0.8,
      0.99
    ],
    "eps": 1e-09,
    "batch_size": 6,
    "fp16_run": false,
    "lr_decay": 0.99995,
    "segment_size": 16384,
    "init_lr_ratio": 1,
    "warmup_epochs": 0,
    "c_mel": 45,
    "c_kl": 1.0,
    "skip_optimizer": false,
    "freeze_ZH_bert": false,
    "freeze_JP_bert": false,
    "freeze_EN_bert": false
  },
  "data": {
    "training_files": "data/miko/train.list",
    "validation_files": "data/miko/val.list",
    "max_wav_value": 32768.0,
    "sampling_rate": 44100,
    "filter_length": 2048,
    "hop_length": 512,
    "win_length": 2048,
    "n_mel_channels": 128,
    "mel_fmin": 0.0,
    "mel_fmax": null,
    "add_blank": true,
    "n_speakers": 1,
    "cleaned_text": true,
    "spk2id": {
      "miko": 0
    }
  },
  "model": {
    "use_spk_conditioned_encoder": true,
    "use_noise_scaled_mas": true,
    "use_mel_posterior_encoder": false,
    "use_duration_discriminator": true,
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "kernel_size": 3,
    "p_dropout": 0.1,
    "resblock": "1",
    "resblock_kernel_sizes": [
      3,
      7,
      11
    ],
    "resblock_dilation_sizes": [
      [
        1,
        3,
        5
      ],
      [
        1,
        3,
        5
      ],
      [
        1,
        3,
        5
      ]
    ],
    "upsample_rates": [
      8,
      8,
      2,
      2,
      2
    ],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [
      16,
      16,
      8,
      2,
      2
    ],
    "n_layers_q": 3,
    "use_spectral_norm": false,
    "gin_channels": 256
  },
  "version": "2.2"
}
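Note the "version": "2.2" field, which marks the config as the latest v2.2 format.
Adjust the other parameters to suit your hardware. For example, set batch_size to what your GPU memory can hold.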
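When you edit values, some fields must stay consistent with each other. The check below is a hypothetical sketch, not something shipped with Bert-VITS2. It verifies four things:

- the version string is "2.2";
- every speaker ID in spk2id fits within n_speakers;
- segment_size is a whole number of hop_length frames;
- the product of upsample_rates equals hop_length (8 × 8 × 2 × 2 × 2 = 512 here), so the decoder turns each spectrogram frame back into exactly hop_length audio samples.

```python
import json
import math


def check_config(cfg: dict) -> list[str]:
    """Return human-readable problems found in a Bert-VITS2 config dict."""
    problems = []
    if cfg.get("version") != "2.2":
        problems.append(f'version is {cfg.get("version")!r}, expected "2.2"')

    data, model, train = cfg["data"], cfg["model"], cfg["train"]
    hop = data["hop_length"]

    # Every speaker ID must index into a speaker table of size n_speakers.
    bad = [s for s, i in data["spk2id"].items() if not 0 <= i < data["n_speakers"]]
    if bad:
        problems.append(f"speaker IDs out of range for n_speakers={data['n_speakers']}: {bad}")

    # Training slices audio in whole spectrogram frames.
    if train["segment_size"] % hop:
        problems.append(f"segment_size={train['segment_size']} is not a multiple of hop_length={hop}")

    # The decoder upsamples one frame into hop_length samples.
    total = math.prod(model["upsample_rates"])
    if total != hop:
        problems.append(f"prod(upsample_rates)={total} != hop_length={hop}")
    return problems


def load_and_check(path: str) -> list[str]:
    """Load a config.json from disk and run check_config on it."""
    with open(path, encoding="utf-8") as f:
        return check_config(json.load(f))
```

To use it, call load_and_check on your miko/configs/config.json (adjust the path to wherever you saved it). An empty list means none of these checks found a problem.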
Then launch the preprocessing page:
python3 webui_preprocess.py
Follow the steps on the page. The process is simple and convenient.
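Training reads the two files named in the config, data/miko/train.list and data/miko/val.list, which preprocessing produces from esd.list. Here is a rough illustration of that split. It is not the WebUI's actual code, and the function names and the val_per_speaker default are assumptions. It holds out a few lines per speaker for validation.

```python
import random
from collections import defaultdict


def split_train_val(lines, val_per_speaker=4, seed=42):
    """Hold out up to val_per_speaker lines per speaker for validation.

    Assumes each line follows the esd.list layout, so the speaker name is the
    second "|"-separated field. Illustration only, not the WebUI's code.
    """
    by_speaker = defaultdict(list)
    for line in lines:
        speaker = line.split("|")[1]
        by_speaker[speaker].append(line)

    rng = random.Random(seed)  # fixed seed gives a reproducible split
    train, val = [], []
    for items in by_speaker.values():
        shuffled = items[:]
        rng.shuffle(shuffled)
        val.extend(shuffled[:val_per_speaker])
        train.extend(shuffled[val_per_speaker:])
    return train, val


def write_lists(esd_path, train_path, val_path, **kwargs):
    """Read esd.list, split it, and write the two list files."""
    with open(esd_path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    train, val = split_train_val(lines, **kwargs)
    for path, subset in ((train_path, train), (val_path, val)):
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(subset) + "\n")
```

Splitting per speaker keeps every speaker represented in validation, even in multi-speaker datasets. With a single speaker like miko, it simply holds out a few clips.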
Once preprocessing is done, start training:
python3 train_ms.py
Trained models are saved in data/miko/models, with this structure:
E:\work\Bert-VITS2-v22\Data\miko\models>tree /f
Folder PATH listing for volume myssd
Volume serial number is 7CE3-15AE
E:.
│   DUR_0.pth
│   DUR_100.pth
│   DUR_150.pth
│   DUR_50.pth
│   D_0.pth
│   D_100.pth
│   D_150.pth
│   D_50.pth
│   events.out.tfevents.1702457087.ly.13044.0
│   events.out.tfevents.1702458207.ly.12416.0
│   githash
│   G_0.pth
│   G_100.pth
│   G_150.pth
│   G_50.pth
│   train.log
│
└───eval
        events.out.tfevents.1702457087.ly.13044.1
        events.out.tfevents.1702458207.ly.12416.1
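Checkpoints come in three kinds, each tagged with the training step: G_ (generator), D_ (discriminator) and DUR_ (duration discriminator). The listing above shows that alphabetical order is misleading here, because G_50.pth sorts after G_150.pth. The small hypothetical helper below picks the newest checkpoint by its numeric step instead. The default models path is an assumption.

```python
import re
from pathlib import Path
from typing import Iterable, Optional


def pick_latest(names: Iterable[str], prefix: str = "G") -> Optional[str]:
    """Return the name with the highest step among PREFIX_<step>.pth files."""
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)\.pth$")
    best_step, best_name = -1, None
    for name in names:
        match = pattern.match(name)
        if match and int(match.group(1)) > best_step:
            best_step, best_name = int(match.group(1)), name
    return best_name


def latest_checkpoint(models_dir: str = "Data/miko/models", prefix: str = "G") -> Optional[Path]:
    """Locate the newest checkpoint of one kind in a models directory."""
    directory = Path(models_dir)
    name = pick_latest((p.name for p in directory.iterdir() if p.is_file()), prefix)
    return directory / name if name else None
```

The regex is anchored on "PREFIX_", so prefix "D" matches D_150.pth but not DUR_150.pth.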
That completes the training stage.
Bert-vits2-v2.2 model inference
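For inference we use the page from the Bert-VITS2-UI project. Clone the web project:
git clone https://github.com/jiangyuxiaoxiao/Bert-VITS2-UI
Place the web files in a Web folder in the Bert-vits2-v2.2 root directory. The directory structure is: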
E:\work\Bert-VITS2-v22_lilith\Web>tree /f
Folder PATH listing for volume myssd
Volume serial number is 7CE3-15AE
E:.
│   index.html
│
├───assets
│       index-21bc6a28.css
│       index-402c0217.js
│
└───img
        helps1.png
        helps2.png
        Hiyori.ico
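It contains the main page, the stylesheet and the JS bundle. The UI is based on Hiyori.
Then start the inference page:
python3 server_fastapi.py
Load a model in the page and run inference.
Besides the web page, you can also call the FastAPI endpoints directly: send an HTTP request and get the synthesized audio back. The API schema, which FastAPI serves at /openapi.json by default, is as follows: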
{
  "openapi": "3.1.0",
  "info": {
    "title": "FastAPI",
    "version": "0.1.0"
  },
  "paths": {
    "/": {
      "get": {
        "summary": "Index",
        "operationId": "index__get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          }
        }
      }
    },
    "/voice": {
      "post": {
        "summary": "Voice",
        "description": "Voice endpoint; to upload reference audio, use a POST request only",
        "operationId": "voice_voice_post",
        "parameters": [
          {
            "name": "model_id",
            "in": "query",
            "required": true,
            "schema": {
              "type": "integer",
              "description": "Model ID",
              "title": "Model Id"
            },
            "description": "Model ID"
          },
          {
            "name": "speaker_name",
            "in": "query",
            "required": false,
            "schema": {
              "type": "string",
              "description": "Speaker name",
              "title": "Speaker Name"
            },
            "description": "Speaker name"
          },
          {
            "name": "speaker_id",
            "in": "query",
            "required": false,
            "schema": {
              "type": "integer",
              "description": "Speaker ID; use either this or speaker_name",
              "title": "Speaker Id"
            },
            "description": "Speaker ID; use either this or speaker_name"
          },
          {
            "name": "sdp_ratio",
            "in": "query",
            "required": false,
            "schema": {
              "type": "number",
              "description": "SDP/DP mix ratio",
              "default": 0.2,
              "title": "Sdp Ratio"
            },
            "description": "SDP/DP mix ratio"
          },
          {
            "name": "noise",
            "in": "query",
            "required": false,
            "schema": {
              "type": "number",
              "description": "Emotion",
              "default": 0.2,
              "title": "Noise"
            },
            "description": "Emotion"
          },
          {
            "name": "noisew",
            "in": "query",
            "required": false,
            "schema": {
              "type": "number",
              "description": "Phoneme length",
              "default": 0.9,
              "title": "Noisew"
            },
            "description": "Phoneme length"
          },
          {
            "name": "length",
            "in": "query",
            "required": false,
            "schema": {
              "type": "number",
              "description": "Speech rate",
              "default": 1,
              "title": "Length"
            },
            "description": "Speech rate"
          },
          {
            "name": "language",
            "in": "query",
            "required": false,
            "schema": {
              "type": "string",
              "description": "Language",
              "title": "Language"
            },
            "description": "Language"
          },
          {
            "name": "auto_translate",
            "in": "query",
            "required": false,
            "schema": {
              "type": "boolean",
              "description": "Auto translate",
              "default": false,
              "title": "Auto Translate"
            },
            "description": "Auto translate"
          },
          {
            "name": "auto_split",
            "in": "query",
            "required": false,
            "schema": {
              "type": "boolean",
              "description": "Auto split",
              "default": false,
              "title": "Auto Split"
            },
            "description": "Auto split"
          },
          {
            "name": "emotion",
            "in": "query",
            "required": false,
            "schema": {
              "anyOf": [
                {
                  "type": "integer"
                },
                {
                  "type": "string"
                },
                {
                  "type": "null"
                }
              ],
              "description": "emo",
              "title": "Emotion"
            },
            "description": "emo"
          }
        ],
        "requestBody": {
          "required": true,
          "content": {
            "multipart/form-data": {
              "schema": {
                "$ref": "#/components/schemas/Body_voice_voice_post"
              }
            }
          }
        },
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      },
      "get": {
        "summary": "Voice",
        "description": "Voice endpoint",
        "operationId": "voice_voice_get",
        "parameters": [
          {
            "name": "text",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string",
              "description": "Input text",
              "title": "Text"
            },
            "description": "Input text"
          },
          {
            "name": "model_id",
            "in": "query",
            "required": true,
            "schema": {
              "type": "integer",
              "description": "Model ID",
              "title": "Model Id"
            },
            "description": "Model ID"
          },
          {
            "name": "speaker_name",
            "in": "query",
            "required": false,
            "schema": {
              "type": "string",
              "description": "Speaker name",
              "title": "Speaker Name"
            },
            "description": "Speaker name"
          },
          {
            "name": "speaker_id",
            "in": "query",
            "required": false,
            "schema": {
              "type": "integer",
              "description": "Speaker ID; use either this or speaker_name",
              "title": "Speaker Id"
            },
            "description": "Speaker ID; use either this or speaker_name"
          },
          {
            "name": "sdp_ratio",
            "in": "query",
            "required": false,
            "schema": {
              "type": "number",
              "description": "SDP/DP mix ratio",
              "default": 0.2,
              "title": "Sdp Ratio"
            },
            "description": "SDP/DP mix ratio"
          },
          {
            "name": "noise",
            "in": "query",
            "required": false,
            "schema": {
              "type": "number",
              "description": "Emotion",
              "default": 0.2,
              "title": "Noise"
            },
            "description": "Emotion"
          },
          {
            "name": "noisew",
            "in": "query",
            "required": false,
            "schema": {
              "type": "number",
              "description": "Phoneme length",
              "default": 0.9,
              "title": "Noisew"
            },
            "description": "Phoneme length"
          },
          {
            "name": "length",
            "in": "query",
            "required": false,
            "schema": {
              "type": "number",
              "description": "Speech rate",
              "default": 1,
              "title": "Length"
            },
            "description": "Speech rate"
          },
          {
            "name": "language",
            "in": "query",
            "required": false,
            "schema": {
              "type": "string",
              "description": "Language",
              "title": "Language"
            },
            "description": "Language"
          },
          {
            "name": "auto_translate",
            "in": "query",
            "required": false,
            "schema": {
              "type": "boolean",
              "description": "Auto translate",
              "default": false,
              "title": "Auto Translate"
            },
            "description": "Auto translate"
          },
          {
            "name": "auto_split",
            "in": "query",
            "required": false,
            "schema": {
              "type": "boolean",
              "description": "Auto split",
              "default": false,
              "title": "Auto Split"
            },
            "description": "Auto split"
          },
          {
            "name": "emotion",
            "in": "query",
            "required": false,
            "schema": {
              "anyOf": [
                {
                  "type": "integer"
                },
                {
                  "type": "string"
                },
                {
                  "type": "null"
                }
              ],
              "description": "emo",
              "title": "Emotion"
            },
            "description": "emo"
          }
        ],
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/models/info": {
      "get": {
        "summary": "Get Loaded Models Info",
        "description": "Get info on loaded models",
        "operationId": "get_loaded_models_info_models_info_get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          }
        }
      }
    },
    "/models/delete": {
      "get": {
        "summary": "Delete Model",
        "description": "Delete the specified model",
        "operationId": "delete_model_models_delete_get",
        "parameters": [
          {
            "name": "model_id",
            "in": "query",
            "required": true,
            "schema": {
              "type": "integer",
              "description": "ID of the model to delete",
              "title": "Model Id"
            },
            "description": "ID of the model to delete"
          }
        ],
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/models/add": {
      "get": {
        "summary": "Add Model",
        "description": "Add the specified model: the same model path may be added more than once without using extra memory",
        "operationId": "add_model_models_add_get",
        "parameters": [
          {
            "name": "model_path",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string",
              "description": "Path of the model to add",
              "title": "Model Path"
            },
            "description": "Path of the model to add"
          },
          {
            "name": "config_path",
            "in": "query",
            "required": false,
            "schema": {
              "type": "string",
              "description": "Path of the model's config file; if omitted, ./config.json or ../config.json is used",
              "title": "Config Path"
            },
            "description": "Path of the model's config file; if omitted, ./config.json or ../config.json is used"
          },
          {
            "name": "device",
            "in": "query",
            "required": false,
            "schema": {
              "type": "string",
              "description": "Device used for inference",
              "default": "cuda",
              "title": "Device"
            },
            "description": "Device used for inference"
          },
          {
            "name": "language",
            "in": "query",
            "required": false,
            "schema": {
              "type": "string",
              "description": "Default language of the model",
              "default": "ZH",
              "title": "Language"
            },
            "description": "Default language of the model"
          }
        ],
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/models/get_unloaded": {
      "get": {
        "summary": "Get Unloaded Models Info",
        "description": "Get models that are not loaded",
        "operationId": "get_unloaded_models_info_models_get_unloaded_get",
        "parameters": [
          {
            "name": "root_dir",
            "in": "query",
            "required": false,
            "schema": {
              "type": "string",
              "description": "Root directory to search",
              "default": "Data",
              "title": "Root Dir"
            },
            "description": "Root directory to search"
          }
        ],
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/models/get_local": {
      "get": {
        "summary": "Get Local Models Info",
        "description": "Get all local models",
        "operationId": "get_local_models_info_models_get_local_get",
        "parameters": [
          {
            "name": "root_dir",
            "in": "query",
            "required": false,
            "schema": {
              "type": "string",
              "description": "Root directory to search",
              "default": "Data",
              "title": "Root Dir"
            },
            "description": "Root directory to search"
          }
        ],
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/status": {
      "get": {
        "summary": "Get Status",
        "description": "Get the machine's running status",
        "operationId": "get_status_status_get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          }
        }
      }
    },
    "/tools/translate": {
      "get": {
        "summary": "Translate",
        "description": "Translate",
        "operationId": "translate_tools_translate_get",
        "parameters": [
          {
            "name": "texts",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string",
              "description": "Text to translate",
              "title": "Texts"
            },
            "description": "Text to translate"
          },
          {
            "name": "to_language",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string",
              "description": "Target language",
              "title": "To Language"
            },
            "description": "Target language"
          }
        ],
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/tools/random_example": {
      "get": {
        "summary": "Random Example",
        "description": "Get a random audio clip and its text for comparison; the audio is picked at random from a local directory.",
        "operationId": "random_example_tools_random_example_get",
        "parameters": [
          {
            "name": "language",
            "in": "query",
            "required": false,
            "schema": {
              "type": "string",
              "description": "Language to use; if not specified, a random one is returned",
              "title": "Language"
            },
            "description": "Language to use; if not specified, a random one is returned"
          },
          {
            "name": "root_dir",
            "in": "query",
            "required": false,
            "schema": {
              "type": "string",
              "description": "Root directory to search",
              "default": "Data",
              "title": "Root Dir"
            },
            "description": "Root directory to search"
          }
        ],
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/tools/get_audio": {
      "get": {
        "summary": "Get Audio",
        "operationId": "get_audio_tools_get_audio_get",
        "parameters": [
          {
            "name": "path",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string",
              "description": "Local audio path",
              "title": "Path"
            },
            "description": "Local audio path"
          }
        ],
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "Body_voice_voice_post": {
        "properties": {
          "text": {
            "type": "string",
            "title": "Text"
          },
          "reference_audio": {
            "type": "string",
            "format": "binary",
            "title": "Reference Audio"
          }
        },
        "type": "object",
        "required": [
          "text"
        ],
        "title": "Body_voice_voice_post"
      },
      "HTTPValidationError": {
        "properties": {
          "detail": {
            "items": {
              "$ref": "#/components/schemas/ValidationError"
            },
            "type": "array",
            "title": "Detail"
          }
        },
        "type": "object",
        "title": "HTTPValidationError"
      },
      "ValidationError": {
        "properties": {
          "loc": {
            "items": {
              "anyOf": [
                {
                  "type": "string"
                },
                {
                  "type": "integer"
                }
              ]
            },
            "type": "array",
            "title": "Location"
          },
          "msg": {
            "type": "string",
            "title": "Message"
          },
          "type": {
            "type": "string",
            "title": "Error Type"
          }
        },
        "type": "object",
        "required": [
          "loc",
          "msg",
          "type"
        ],
        "title": "ValidationError"
      }
    }
  }
}
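The spec above is enough to write a small client for scripted use. The sketch below uses only the Python standard library and is not part of either project:

- The GET call passes everything as query parameters.
- The POST call sends text and reference_audio as multipart/form-data, matching the Body_voice_voice_post schema, and keeps model_id and the other options in the query string.

BASE_URL is an assumption, so point it at wherever server_fastapi.py is listening. The spec does not describe the response body; it only lists application/json with an empty schema. The code therefore returns the raw bytes, which we assume to be WAV audio.

```python
import urllib.parse
import urllib.request
import uuid
from pathlib import Path

# Assumption: host/port of server_fastapi.py; adjust to your setup.
BASE_URL = "http://127.0.0.1:5000"


def encode_query(params: dict) -> str:
    """URL-encode query parameters, dropping None and lower-casing booleans."""
    clean = {}
    for key, value in params.items():
        if value is None:
            continue
        clean[key] = str(value).lower() if isinstance(value, bool) else value
    return urllib.parse.urlencode(clean)


def build_voice_url(base_url: str, **params) -> str:
    """GET /voice URL; text and model_id are the required query parameters."""
    return f"{base_url.rstrip('/')}/voice?{encode_query(params)}"


def encode_multipart(text: str, audio_name: str, audio: bytes) -> tuple[bytes, str]:
    """Build a multipart/form-data body with the text and reference_audio fields."""
    boundary = uuid.uuid4().hex
    parts = [
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="text"\r\n\r\n',
        text.encode("utf-8") + b"\r\n",
        f"--{boundary}\r\n".encode(),
        (
            'Content-Disposition: form-data; name="reference_audio"; '
            f'filename="{audio_name}"\r\n'
        ).encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ]
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def synthesize(text: str, model_id: int = 0, **params) -> bytes:
    """GET /voice and return the raw response body (assumed WAV audio)."""
    url = build_voice_url(BASE_URL, text=text, model_id=model_id, **params)
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def synthesize_with_reference(text: str, audio_path: str, model_id: int = 0, **params) -> bytes:
    """POST /voice with text and reference audio in the body, options in the query."""
    path = Path(audio_path)
    body, content_type = encode_multipart(text, path.name, path.read_bytes())
    url = f"{BASE_URL}/voice?{encode_query({'model_id': model_id, **params})}"
    req = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

For example, `Path("miko.wav").write_bytes(synthesize("Hello, Traveler.", speaker_name="miko", language="EN"))` should save one clip, assuming the server has a model loaded with ID 0. Booleans are sent as "true"/"false" to keep the query string conventional.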
Finally, here is an all-in-one package for local training and inference with Bert-vits2-v2.2:
https://pan.baidu.com/s/1OVX9seRwZR6bZ-xsE_nRLg?pwd=v3uc
Enjoy, everyone.