Custom Voices and Phoneme Tuning in eSpeak: A Practical Guide
Introduction
eSpeak is a compact, open-source text‑to‑speech (TTS) engine that offers precise control over pronunciation through custom voices and phoneme-level tuning. This guide shows practical steps to create or modify voices, adjust phoneme mappings, and test results on Linux (commands work similarly on macOS/Windows with equivalent paths).
Prerequisites
- Install eSpeak: On Debian/Ubuntu:
bash
sudo apt update && sudo apt install espeak
- Optional: espeak-ng for newer features:
bash
sudo apt install espeak-ng
- Basic familiarity with a text editor and the command line.
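Before going further, it can help to confirm which of the two engines is actually on the PATH. The following is a minimal sketch; `find_tts` is a helper name invented for this example, and the preference for espeak-ng over espeak is an assumption rather than a requirement.

```shell
#!/usr/bin/env bash
# Print the first eSpeak binary found on PATH, preferring espeak-ng.
# find_tts is an illustrative helper name, not part of eSpeak.
find_tts() {
  local candidate
  for candidate in espeak-ng espeak; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

if tts=$(find_tts); then
  echo "Using: $tts"
else
  echo "No eSpeak binary found; install espeak or espeak-ng first." >&2
fi
```

The rest of the commands in this guide can then be run with whichever binary the check reports.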
Voice architecture overview
- Voices live in eSpeak’s voices directory (e.g., /usr/share/espeak-data/voices/ or /usr/local/share/espeak-ng-data/voices/).
- A voice is defined by a plain-text file (language name) and optional variant files. Key components:
- Phoneme set — mapping from ASCII phonemes to sound units.
- Pronunciation rules — letter-to-sound rules and exceptions.
- Prosody settings — pitch, pitch range, speed, emphasis.
- Sample or audio concatenation hints (limited in eSpeak).
Locating and copying a base voice
- List available voices:
bash
ls /usr/share/espeak-data/voices/
- Copy a close existing voice as a template:
bash
sudo cp /usr/share/espeak-data/voices/en/en /usr/local/share/espeak-data/voices/en/custom
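(If your installation uses a different data directory, adapt both paths so that the copy lands in the voices directory your eSpeak build actually reads.)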
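The locate-and-copy steps above can be sketched as two small shell functions. The names `first_existing_dir` and `clone_voice` are made up for this example, and the candidate directories are simply the ones mentioned in this guide; other systems may use different locations.

```shell
#!/usr/bin/env bash
# Print the first existing directory among the arguments.
first_existing_dir() {
  local dir
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Copy a base voice to a new file, refusing to overwrite an existing one.
# clone_voice is a hypothetical helper; run the script with sudo if the
# voices directory is owned by root.
clone_voice() {
  local base=$1 target=$2
  if [ ! -f "$base" ]; then
    echo "base voice not found: $base" >&2
    return 1
  fi
  if [ -e "$target" ]; then
    echo "refusing to overwrite: $target" >&2
    return 1
  fi
  cp -- "$base" "$target"
}

# Candidate locations taken from this guide; your system may differ.
first_existing_dir \
  /usr/share/espeak-data/voices \
  /usr/local/share/espeak-ng-data/voices \
  || echo "No voices directory found in the listed locations." >&2
```

Refusing to overwrite is a deliberate choice here: it keeps a second run of the script from silently discarding edits already made to the custom voice.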
Editing voice parameters
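Open the copied voice file (custom) in a text editor. Common attributes, as described in the eSpeak voice-file documentation:
- name: the name used to select the voice.
- language: language code.
- gender: male or female, optionally followed by an age.
- pitch: base pitch and pitch range, given as two numbers on one line.
- speed: speaking rate as a percentage of the default.
- intonation, stressLength, stressAmp: controls for phrasing and emphasis.
The 0–99 pitch scale and the words-per-minute rate belong to the command-line options -p and -s, not to the voice file.
Example (lines to adjust):
Code
name custom
language en
pitch 82 118
speed 100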
Save changes and test:
bash
espeak -v en+custom "This is a voice test."
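To make parameter edits repeatable, the voice file can be generated from a script instead of edited by hand. The sketch below writes a small file to a scratch location. `write_voice_file` is a hypothetical helper, the attribute names (name, language, gender, pitch, speed) are the ones the eSpeak voice-file documentation describes as far as can be told here, and the values are arbitrary starting points rather than recommendations.

```shell
#!/usr/bin/env bash
# Write a small voice file to the given path.
# The attribute names are assumed from the eSpeak voice-file documentation;
# the values are arbitrary starting points.
write_voice_file() {
  local path=$1
  cat > "$path" <<'EOF'
name custom
language en
gender male
pitch 82 118
speed 100
EOF
}

# Write to a scratch directory first and inspect the result before copying
# it into the real voices directory.
scratch=$(mktemp -d)
write_voice_file "$scratch/custom"
cat "$scratch/custom"
```

Generating into a scratch directory keeps experiments away from the installed data until a version is worth testing.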
Phoneme tuning fundamentals
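- eSpeak uses its own ASCII phoneme mnemonics (e.g., @ for schwa, I for the vowel in “kit”, S for “sh”). Pronunciation rules live in the dictionary source files for each language (for English, en_rules and en_list), which are compiled into the dictionary the engine loads.
- Two main areas to tune pronunciation: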
- G2P (grapheme-to-phoneme) rules — how letters map to phonemes.
- Phoneme durations and pitch targets — adjust timing and intonation.
Phoneme durations and pitch can be set with inline phoneme notation using square brackets and tags or by editing the voice’s phoneme config files.
Inline phoneme examples
The -x flag prints the phoneme mnemonics eSpeak produces for a piece of text, and -q suppresses audio, so together they show how a word is currently pronounced:
bash
espeak -v en -q -x "Hello"
To force phonemes directly, wrap the mnemonics in double square brackets (the same syntax works in espeak-ng):
bash
espeak-ng -v en+custom "[[h@l'oU]]"
Or within text (espeak-ng):
Code
Say [[h@l'oU]] to the listener.
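When several words need phoneme overrides, building the input string in a script avoids quoting mistakes. This is a minimal sketch: `as_phonemes` is an invented helper name, and the mnemonic string h@l'oU is assumed to be what the engine reports for "hello" (check with -x on your own installation).

```shell
#!/usr/bin/env bash
# Wrap an eSpeak phoneme string in [[ ]] so the engine reads it as phonemes
# rather than as ordinary text. as_phonemes is an illustrative helper name.
as_phonemes() {
  printf '[[%s]]' "$1"
}

# Build a sentence that mixes ordinary text with one phoneme override.
sentence="Say $(as_phonemes "h@l'oU") to the listener."
echo "$sentence"

# To hear it, pass the string to the engine, for example:
#   espeak-ng -v en "$sentence"
```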
Modifying G2P rules
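G2P rules are in the dictionary source files, such as en_rules for English. Rules are grouped by their initial letters under a .group line. Rule format (simplified):
Code
pre) match (post    phonemes
Example: pronounce word-initial “th” before a vowel as the voiceless [T] (here _ marks a word boundary and A stands for any vowel):
Code
.group th
_) th (A    T
After editing, recompile the dictionary so the change takes effect (for example, run espeak --compile=en from the directory that contains en_rules, typically with write access to the data directory), then test common words that exercise the rule: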
bash
espeak -v en+custom "this that thin those"
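Listening alone makes it easy to miss a rule change that affects other words. One option is to record the phoneme output for a fixed word list before and after each edit and compare the two. The sketch below assumes the engine accepts -q, -x and -v as described in this guide; `TTS` and `dump_phonemes` are names chosen for this example.

```shell
#!/usr/bin/env bash
# Print "word<TAB>phonemes" for each word, using the engine command in $TTS.
# -q suppresses audio and -x prints phoneme mnemonics.
# TTS and dump_phonemes are names chosen for this sketch.
TTS=${TTS:-espeak}

dump_phonemes() {
  local voice=$1 word
  shift
  for word in "$@"; do
    printf '%s\t%s\n' "$word" "$("$TTS" -q -x -v "$voice" "$word")"
  done
}

# Typical use: capture output before and after a rule edit, then diff.
#   dump_phonemes en+custom this that thin those > before.txt
#   ... edit and recompile the rules ...
#   dump_phonemes en+custom this that thin those > after.txt
#   diff before.txt after.txt
```

Keeping the engine command in a variable makes it straightforward to switch between espeak and espeak-ng, or to substitute a stub when testing the script itself.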
Adjusting phoneme durations and prosody
- In the voice file or phoneme definitions, durations for phonemes can be set via numeric modifiers or tags. For espeak-ng, use the SAPI-style tags or SSML for fine control.
- Example SSML snippet:
xml
<speak>
  <prosody pitch="+10%" rate="90%">
    Hello, <break time="200ms"/> world.
  </prosody>
</speak>
Use espeak-ng’s SSML support (the -m flag interprets markup, and -f reads the input from a file):
bash
espeak-ng -m -v en+custom -f file.ssml
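For trying several pitch and rate combinations, the SSML file can be generated rather than typed. This is a sketch only: `write_ssml` is a hypothetical helper, and it inserts the text unescaped, so the text must not contain &, < or >.

```shell
#!/usr/bin/env bash
# Write a small SSML document. Arguments: output path, pitch, rate, text.
# write_ssml is a hypothetical helper; the text is inserted as-is, so it
# must not contain unescaped &, <, or >.
write_ssml() {
  local path=$1 pitch=$2 rate=$3 text=$4
  cat > "$path" <<EOF
<speak>
  <prosody pitch="$pitch" rate="$rate">$text</prosody>
</speak>
EOF
}

# Example:
#   write_ssml file.ssml "+10%" "90%" "Hello, world."
#   espeak-ng -m -v en+custom -f file.ssml
```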
Creating and testing small changes iteratively
- Make one change at a time (e.g., reduce vowel duration).
- Test a list of target words and sentences.
- Keep a changelog with the voice file version and notes.
- Use recordings and compare A/B with the original voice.
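The changelog habit in the list above can be partly automated. The sketch below appends one line per change with a timestamp, a checksum of the voice file and a free-text note; `log_change` and the tab-separated layout are assumptions made for this example, not an eSpeak convention.

```shell
#!/usr/bin/env bash
# Append one line to a changelog: UTC timestamp, checksum of the voice file,
# and a note. log_change is an illustrative helper name.
log_change() {
  local voice_file=$1 log_file=$2 note=$3 sum
  # cksum prints "<crc> <size>"; keep only the CRC as a short fingerprint.
  sum=$(cksum < "$voice_file" | cut -d' ' -f1)
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$sum" "$note" >> "$log_file"
}

# For A/B listening, -w writes a WAV file instead of playing audio:
#   espeak -v en        -w original.wav "see me go"
#   espeak -v en+custom -w custom.wav   "see me go"
```

The checksum makes it possible to tell later which log entry corresponds to a given copy of the voice file.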
Example modification: lengthening final vowels
Edit the phoneme duration mapping (the exact syntax depends on eSpeak vs espeak-ng). The line below is schematic: it shows the intent of a rule that lengthens a vowel at the end of a word, not literal rule syntax:
Code
vowel _ word-end vowellong
Test:
bash
espeak-ng -v en+custom "see me go"
Troubleshooting
- If changes don’t apply, ensure you edited the correct voice path and restarted any services.
- Use the -X flag to trace which pronunciation rules were applied (with -q to suppress audio):
bash
espeak -v en+custom -q -X "text"
- Backup original files before edits.
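A timestamped backup taken before each edit makes it easy to roll back. In this sketch the backups go to a separate directory, on the assumption that the engine may treat any file inside the voices directory as a voice; `backup_file` is an invented helper name.

```shell
#!/usr/bin/env bash
# Copy a file into a backup directory as <name>.<timestamp> and print the
# path of the copy. backup_file is an illustrative helper name.
backup_file() {
  local src=$1 backup_dir=$2 dest
  mkdir -p "$backup_dir"
  dest="$backup_dir/$(basename "$src").$(date +%Y%m%d%H%M%S)"
  cp -p -- "$src" "$dest" && printf '%s\n' "$dest"
}

# Example:
#   backup_file /usr/share/espeak-data/voices/en/custom "$HOME/espeak-backups"
```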
Packaging and sharing your voice
- Bundle the voice file and any rule files into a directory named after the language/variant.
- Include a README with installation steps:
bash
sudo cp -r custom /usr/local/share/espeak-ng-data/voices/en/
- Optionally share via GitHub with usage examples.
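The bundling step can be sketched as a function that checks for a README and then creates a tarball. `package_voice` and the file name README are assumptions for this example; adjust them to match your own layout.

```shell
#!/usr/bin/env bash
# Bundle a voice directory (voice file, rule files, README) into a tarball.
# package_voice is a hypothetical helper name.
package_voice() {
  local voice_dir=$1 archive=$2
  if [ ! -f "$voice_dir/README" ]; then
    echo "missing README in $voice_dir" >&2
    return 1
  fi
  # -C keeps paths in the archive relative to the parent directory.
  tar -czf "$archive" -C "$(dirname "$voice_dir")" "$(basename "$voice_dir")"
}

# Example:
#   package_voice ./custom custom-voice.tar.gz
```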
Conclusion
Customizing eSpeak voices and phoneme tuning involves editing voice parameter files, adjusting G2P rules, and using inline phoneme/SSML controls for prosody and timing. Work iteratively, test frequently, and keep backups.