TartuNLP / text-to-speech · Commit ce979507
Authored Jun 17, 2022 by Liisa Rätsep

multispeaker synthesis

Parent: 08f63b41 · Changes: 7 files
README.md

# Estonian speech synthesis (Eestikeelne kõnesüntees)

Scripts for Estonian multispeaker speech synthesis from a text file. The speech synthesis was developed in collaboration with the [Estonian Language Institute](http://portaal.eki.ee/).

The synthesis can also be used through our [web demo](https://www.neurokone.ee). The components for serving the same models through an API are available [here](https://github.com/TartuNLP/text-to-speech-api) and [here](https://github.com/TartuNLP/text-to-speech-worker).

## Requirements and setup

These instructions have been tested on Ubuntu. The code works on both CPUs and GPUs, but synthesis is considerably faster on a GPU.

- Make sure the following components are installed:
    - [CUDA](https://developer.nvidia.com/cuda-downloads) if you use a GPU
    - [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)
    - GNU Compiler Collection (run `sudo apt install build-essential`)
- Clone this repository with its submodules:

    ```commandline
    git clone --recurse-submodules https://koodivaramu.eesti.ee/tartunlp/text-to-speech
    ```

- Create and activate the Conda environment. Swap in the environment file `environments/environment.gpu.yml` in the command below if you want to use a GPU.

    ```commandline
    cd text-to-speech
    conda env create -f environments/environment.yml
    conda activate transformer-tts
    python -c 'import nltk; nltk.download("punkt")'
    ```

- Download our [TransformerTTS models](https://github.com/TartuNLP/text-to-speech-worker/releases/tag/v2.0.0) and place them in the `models/` directory.

## Usage

A text file can be synthesized with the following command. Currently the script can only read plain-text files and saves the output in `.wav` format.

```commandline
python synthesizer.py --model models/albert --vocoder models/hifigan/vctk test.txt test.wav
```

More information about script usage is available with the `--help` flag:

```
synthesizer.py [-h] --model MODEL --vocoder VOCODER [--speed SPEED] [--speaker-id SPEAKER_ID] input output

positional arguments:
  input                 Input text file to synthesize.
  output                Output .wav file path.

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         The directory of the TTS model weights (must contain a .hdf5 and config.yaml file)
  --vocoder VOCODER     The directory that contains the vocoder model.
  --speed SPEED         Output speed multiplier.
  --speaker-id SPEAKER_ID
                        Speaker ID for multispeaker models.
```
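The script writes its output as 16-bit mono PCM at 22 050 Hz (in `synthesizer.py` this is done with `scipy.io.wavfile.write`). A standard-library-only sketch of the same output step, using the `wave` module instead of scipy — the function names and the sample values are illustrative, not part of the repository:

```python
import os
import struct
import tempfile
import wave

def write_wav(path, samples, rate=22050):
    # Write 16-bit mono PCM samples (a list of ints in int16 range) to a
    # .wav file; synthesizer.py does the equivalent with
    # scipy.io.wavfile.write(path, 22050, waveform.astype(np.int16)).
    with wave.open(path, 'wb') as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        f.setframerate(rate)
        f.writeframes(struct.pack(f'<{len(samples)}h', *samples))

def read_wav(path):
    # Read the file back as (sample_rate, list of int samples) to verify.
    with wave.open(path, 'rb') as f:
        n = f.getnframes()
        return f.getframerate(), list(struct.unpack(f'<{n}h', f.readframes(n)))

path = os.path.join(tempfile.gettempdir(), 'tts_sketch.wav')
write_wav(path, [0, 1000, -1000, 32767])
```

Reading the file back returns the same rate and samples, which is a quick way to sanity-check any `.wav` the synthesizer produces.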
README_eng.md

...

Speech synthesis was developed in collaboration with the [Estonian Language Institute](http://portaal.eki.ee/).

Estonian text-to-speech can also be used via our [web demo](https://www.neurokone.ee). The components to run the same models via an API can be found [here](https://github.com/TartuNLP/text-to-speech-api) and [here](https://github.com/TartuNLP/text-to-speech-worker).

## Requirements and installation

The following installation instructions have been tested on Ubuntu. The code is both CPU and GPU compatible, but synthesis is considerably faster with GPUs.

- Make sure you have the following prerequisites installed:
    - [CUDA](https://developer.nvidia.com/cuda-downloads) if you use a GPU
    - Conda (see https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)
    - GNU Compiler Collection (run `sudo apt install build-essential`)
- Clone with submodules:

    ```commandline
    git clone --recurse-submodules https://koodivaramu.eesti.ee/tartunlp/text-to-speech
    ```

- Create and activate a Conda environment with all dependencies. Use `environments/environment.gpu.yml` instead if you use a GPU.

    ```commandline
    cd text-to-speech
    conda env create -f environments/environment.yml
    conda activate transformer-tts
    python -c 'import nltk; nltk.download("punkt")'
    ```

- Download our [TransformerTTS models](https://github.com/TartuNLP/text-to-speech-worker/releases/tag/v2.0.0) and place them inside the `models/` directory.

## Usage

...

A file can be synthesized with the following command. Currently, only plain text (UTF-8) files are supported and the audio is saved in `.wav` format.

```commandline
python synthesizer.py --model models/albert --vocoder models/hifigan/vctk test.txt test.wav
```

More info about script usage can be found with the `--help` flag:

```
synthesizer.py [-h] --model MODEL --vocoder VOCODER [--speed SPEED] [--speaker-id SPEAKER_ID] input output

positional arguments:
  input                 Input text file to synthesize.
  output                Output .wav file path.

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         The directory of the TTS model weights (must contain a .hdf5 and config.yaml file)
  --vocoder VOCODER     The directory that contains the vocoder model.
  --speed SPEED         Output speed multiplier.
  --speaker-id SPEAKER_ID
                        Speaker ID for multispeaker models.
```
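The `--help` text above maps directly onto an `argparse` parser. The sketch below reconstructs that flag surface to show one detail the help text hides: argparse turns the dash in `--speaker-id` into an underscore, so the script reads the value as `args.speaker_id`. (The real script also opens `input`/`output` through `argparse.FileType`; plain strings are used here to keep the sketch self-contained.)

```python
import argparse

# Reconstruction of the CLI shown in the --help output above.
parser = argparse.ArgumentParser(prog='synthesizer.py')
parser.add_argument('input', help="Input text file to synthesize.")
parser.add_argument('output', help="Output .wav file path.")
parser.add_argument('--model', required=True,
                    help="The directory of the TTS model weights.")
parser.add_argument('--vocoder', required=True,
                    help="The directory that contains the vocoder model.")
parser.add_argument('--speed', type=int, default=1,
                    help="Output speed multiplier.")
parser.add_argument('--speaker-id', type=int, default=0,
                    help="Speaker ID for multispeaker models.")

# Parse the README's example invocation (flags may come before positionals).
args = parser.parse_args(
    ['--model', 'models/albert', '--vocoder', 'models/hifigan/vctk',
     'test.txt', 'test.wav'])
```

With no `--speaker-id` given, `args.speaker_id` falls back to its default of 0, which is why single-speaker invocations like the example above work unchanged.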
TransformerTTS @ 52eb984c

Subproject commit updated: 4a624d1054c4da34e3544b87480872e3243845d6 → 52eb984cf1eb3b6a745e573cf437bf648e5af025
config.yaml (deleted, 100644 → 0)

```yaml
### Transformer-TTS required configuration
wav_directory: ''
metadata_path: ''
log_directory: ''
train_data_directory: ''
data_config: 'TransformerTTS/config/data_config_est.yaml'
aligner_config: 'TransformerTTS/config/aligner_config.yaml'
tts_config: 'TransformerTTS/config/tts_config_est.yaml'
data_name: ''
speakers:
  albert:
    config_path: config.yaml
    checkpoint_path: models/tts/albert
    vocoder_path: models/hifigan/vctk
  kalev:
    config_path: config.yaml
    checkpoint_path: models/tts/kalev
    vocoder_path: models/hifigan/vctk
  kylli:
    config_path: config.yaml
    checkpoint_path: models/tts/kylli
    vocoder_path: models/hifigan/ljspeech
  mari:
    config_path: config.yaml
    checkpoint_path: models/tts/mari
    vocoder_path: models/hifigan/ljspeech
  meelis:
    config_path: config.yaml
    checkpoint_path: models/tts/meelis
    vocoder_path: models/hifigan/vctk
  vesta:
    config_path: config.yaml
    checkpoint_path: models/tts/vesta
    vocoder_path: models/hifigan/vctk
```
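Before this commit, the script selected one entry from the `speakers` mapping above (parsed with PyYAML's `SafeLoader`) and unpacked it straight into the constructor as `Synthesizer(**config)`, which is why each speaker entry's keys had to match the old `__init__` parameters exactly. A sketch of that mechanism, with the parsed YAML written as a Python dict to avoid a PyYAML dependency and a stand-in function instead of the real `Synthesizer`:

```python
# Two of the deleted config.yaml speaker entries, as Python data after
# YAML parsing (the old script used yaml.load(f, Loader=SafeLoader)).
speakers = {
    'albert': {'config_path': 'config.yaml',
               'checkpoint_path': 'models/tts/albert',
               'vocoder_path': 'models/hifigan/vctk'},
    'mari':   {'config_path': 'config.yaml',
               'checkpoint_path': 'models/tts/mari',
               'vocoder_path': 'models/hifigan/ljspeech'},
}

def init_synthesizer(config_path, checkpoint_path, vocoder_path):
    # Stand-in for the old Synthesizer.__init__ signature: it just echoes
    # the paths the real constructor would load models from.
    return (config_path, checkpoint_path, vocoder_path)

# The old entry point did Synthesizer(**config) with the selected entry:
selected = speakers['mari']
paths = init_synthesizer(**selected)
```

The commit replaces this indirection with explicit `--model`/`--vocoder` paths, which is why `config.yaml` could be deleted.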
environments/environment.gpu.yml (excerpt, @@ -8,6 +8,7 @@)

```yaml
dependencies:
  - python==3.7.10
  - matplotlib==3.2.2
  - librosa==0.7.1
  - numba==0.48
  - numpy==1.17.4
  - ruamel.yaml==0.16.6
  - tensorflow-gpu=2.2.0
```
environments/environment.yml (excerpt, @@ -7,6 +7,7 @@)

```yaml
dependencies:
  - python==3.7.10
  - matplotlib==3.2.2
  - librosa==0.7.1
  - numba==0.48
  - numpy==1.17.4
  - ruamel.yaml==0.16.6
  - tensorflow=2.2.0
```
synthesizer.py

```diff
@@ -5,32 +5,31 @@ import re
 import numpy as np
 from scipy.io import wavfile
 from tqdm import tqdm
-import yaml
-from yaml.loader import SafeLoader
 from nltk import sent_tokenize

 os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
 sys.path.append(f'{os.path.dirname(os.path.realpath(__file__))}/TransformerTTS')

-from TransformerTTS.utils.config_manager import Config
+from TransformerTTS.model.models import ForwardTransformer
 from vocoding.predictors import HiFiGANPredictor
 from tts_preprocess_et.convert import convert_sentence


 class Synthesizer:
-    def __init__(self, config_path: str, checkpoint_path: str, vocoder_path: str):
+    def __init__(self, tts_model_path: str, vocoder_path: str):
         self.silence = np.zeros(10000, dtype=np.int16)
-        self.config = Config(config_path=config_path)
-        self.model = self.config.load_model(checkpoint_path=checkpoint_path)
+        self.model = ForwardTransformer.load_model(tts_model_path)
         self.vocoder = HiFiGANPredictor.from_folder(vocoder_path)
         print("Transformer-TTS initialized.")

-    def synthesize(self, text: str, speed: float = 1):
+    def synthesize(self, text: str, speed: float = 1, speaker_id: str = 0):
         """Convert text to speech waveform.
         Args:
             text (str) : Input text to be synthesized
             speed (float)
+            speaker_id (int)
         """
         def clean(sent):
@@ -63,10 +62,9 @@ class Synthesizer:
         for i, sentence in enumerate(tqdm(sentences, unit="sentence")):
             sentence = clean(sentence)
-            out = self.model.predict(sentence, speed_regulator=speed)
+            out = self.model.predict(sentence, speed_regulator=speed, speaker_id=speaker_id)
             waveform = self.vocoder([out['mel'].numpy().T])
             if i != 0:
                 waveforms.append(self.silence)
             waveforms.append(waveform[0])
         waveform = np.concatenate(waveforms)
@@ -82,21 +80,21 @@ if __name__ == '__main__':
                         help="Input text file to synthesize.")
     parser.add_argument('output', type=FileType('w'),
                         help="Output .wav file path.")
-    parser.add_argument('--speaker', type=str, required=True,
-                        help="The name of the speaker to use for synthesis.")
+    parser.add_argument('--model', required=True,
+                        help="The directory of the TTS model weights (must contain a .hdf5 and config.yaml file)")
+    parser.add_argument('--vocoder', required=True,
+                        help="The directory that contains the vocoder model.")
     parser.add_argument('--speed', type=int, default=1,
                         help="Output speed multiplier.")
-    parser.add_argument('--config', type=FileType('r'), default='config.yaml',
-                        help="The config file to load.")
+    parser.add_argument('--speaker-id', type=int, default=0,
+                        help="Speaker ID for multispeaker models.")
     args = parser.parse_known_args()[0]

-    with open(args.config.name, 'r', encoding='utf-8') as f:
-        config = yaml.load(f, Loader=SafeLoader)['speakers'][args.speaker]
-    synthesizer = Synthesizer(**config)
+    synthesizer = Synthesizer(tts_model_path=args.model, vocoder_path=args.vocoder)

     with open(args.input.name, 'r', encoding='utf-8') as f:
         text = f.read()
-    waveform = synthesizer.synthesize(text, speed=args.speed)
+    waveform = synthesizer.synthesize(text, speed=args.speed, speaker_id=args.speaker_id)
     wavfile.write(args.output.name, 22050, waveform.astype(np.int16))
```
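The `synthesize` loop stitches per-sentence waveforms together with a fixed block of silence (10 000 zero samples, roughly 0.45 s at 22 050 Hz) between consecutive sentences, but not before the first. A pure-Python sketch of just that bookkeeping, with lists of ints standing in for the script's numpy `int16` arrays:

```python
SILENCE = [0] * 10000  # stands in for np.zeros(10000, dtype=np.int16)

def join_with_silence(sentence_waveforms):
    # Mirror the loop in synthesizer.py: insert a silence gap before every
    # sentence except the first, then flatten (np.concatenate equivalent).
    waveforms = []
    for i, waveform in enumerate(sentence_waveforms):
        if i != 0:
            waveforms.append(SILENCE)
        waveforms.append(waveform)
    return [sample for chunk in waveforms for sample in chunk]

# Two tiny fake "sentence waveforms" for illustration:
joined = join_with_silence([[1, 2, 3], [4, 5]])
```

For two sentences of 3 and 2 samples, the result is 3 + 10 000 + 2 samples long, with the gap entirely zero.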