Custom Voices and Phoneme Tuning in eSpeak: A Practical Guide

Introduction

eSpeak is a compact, open-source text‑to‑speech (TTS) engine that offers precise control over pronunciation through custom voices and phoneme-level tuning. This guide shows practical steps to create or modify voices, adjust phoneme mappings, and test results on Linux (commands work similarly on macOS/Windows with equivalent paths).

Prerequisites

Install eSpeak. On Debian/Ubuntu:

```bash
sudo apt update && sudo apt install espeak
```

Optional: espeak-ng, the actively maintained fork of eSpeak, which installs a separate espeak-ng command and adds newer features:

```bash
sudo apt install espeak-ng
```

Basic familiarity with a text editor and the command line.
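The two packages install different commands (espeak and espeak-ng), so a script meant to work with either can look for whichever is present. A minimal sketch; pick_engine is a hypothetical helper, not part of either package:

```bash
# pick_engine: print the first eSpeak command found on PATH, preferring
# espeak-ng over classic espeak. Hypothetical helper, not part of eSpeak.
pick_engine() {
  local candidate
  for candidate in espeak-ng espeak; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "no eSpeak engine found on PATH" >&2
  return 1
}
```

Usage: ESPEAK=$(pick_engine) && "$ESPEAK" --version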

Voice architecture overview

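Voices live under eSpeak's data directory, e.g., /usr/share/espeak-data/voices/ for eSpeak or /usr/share/espeak-ng-data/voices/ for espeak-ng (/usr/local/share/... for a source build; Debian/Ubuntu packages may use /usr/lib/<arch>/ instead, and espeak --version or espeak-ng --version prints the actual data path). In espeak-ng the language definitions moved to espeak-ng-data/lang/, while voices/ keeps the variants in its !v subdirectory.
A language voice is a plain-text file named after the language (e.g., en); variant files, selected with a suffix such as -v en+f1, adjust voice quality on top of it. Key components: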
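- Phoneme table: the set of phoneme definitions a language uses (the phonemes attribute).
- Pronunciation rules: letter-to-sound rules and an exception list, compiled into a <lang>_dict file.
- Prosody settings: base pitch, pitch range, speed and emphasis (stress and intonation).
- Voice quality: formants, breathiness and similar settings. eSpeak is mainly a formant synthesizer; sample-based (concatenative) output is limited to using MBROLA diphone voices as a back end.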
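To find the data directory your engine actually reads, you can parse its version banner. This sketch assumes the banner contains "Data at: <path>", as recent espeak and espeak-ng builds print it; confirm with your own --version output before relying on it. espeak_data_dir is a hypothetical helper name.

```bash
# espeak_data_dir: print the data directory reported by an eSpeak binary.
# Assumes the --version banner contains "Data at: <path>"; verify this
# against your own installation. Hypothetical helper.
espeak_data_dir() {
  "${1:-espeak-ng}" --version | sed -n 's/.*Data at: *//p'
}
```

Usage, listing the variants: ls "$(espeak_data_dir espeak-ng)"/voices/'!v' (the single quotes keep interactive bash from expanding !v).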

Locating and copying a base voice

List the installed voices, and the variant files that can serve as templates:

```bash
espeak --voices
ls '/usr/share/espeak-data/voices/!v/'
```

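Copy a close existing variant as a template. The part after + in a voice name such as en+custom is looked up as a file in voices/!v/, so the copy goes there:

```bash
sudo cp '/usr/share/espeak-data/voices/!v/m1' '/usr/share/espeak-data/voices/!v/custom'
```

(If your data directory differs, adapt the paths. The single quotes stop interactive bash from treating !v as history expansion.)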
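Copying by hand can silently overwrite an earlier edit. The sketch below refuses to overwrite and rewrites the name line in the copy; make_variant is a hypothetical helper. It assumes, as with the shipped variants such as m1 and f1, that eSpeak finds the part after + in en+custom as a file in the voices/!v directory.

```bash
# make_variant: copy an existing variant to a new name without overwriting.
# Usage: make_variant <voices-dir> <template> <new-name>
# <voices-dir> is the directory that contains the "!v" variant folder.
# Hypothetical helper; '!v' is single-quoted to avoid bash history expansion.
make_variant() {
  local vdir src dst
  vdir="$1"/'!v'
  src="$vdir/$2"
  dst="$vdir/$3"
  if [ ! -f "$src" ]; then
    echo "template not found: $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "refusing to overwrite existing variant: $dst" >&2
    return 1
  fi
  # Copy the template, rewriting its "name" line so the new variant has its own name.
  sed "s/^name .*/name $3/" "$src" > "$dst"
}
```

Usage, from a shell with write access to the data directory: make_variant /usr/share/espeak-data/voices m1 custom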

Editing voice parameters

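Open the copied variant file (custom) in a text editor. Each line holds one attribute name followed by its values. Common attributes:

name: the voice's display name.
gender: male, female or none, optionally followed by an age.
pitch: two values in Hz, the base pitch and the pitch range (default pitch 82 118). This differs from the -p command-line option, which takes 0 to 99.
formant, flutter, roughness: voice-quality adjustments.
stressLength, stressAmp, stressAdd: per-stress-level length and amplitude settings that shape emphasis.

The speaking rate is normally set at run time with -s (words per minute, default 175) rather than in the file.

Example (lines to adjust):

```
name custom
gender male
pitch 90 140
```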
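Scripting the edit keeps each change reproducible. A minimal sketch, assuming attributes that appear once per file (such as name, gender and pitch); repeated attributes like formant would be collapsed into one line, so edit those by hand. set_voice_attr is a hypothetical helper:

```bash
# set_voice_attr: set (or append) a single-occurrence attribute in a voice file.
# Usage: set_voice_attr <file> <attribute> <values...>
# Replaces the first matching line, drops later duplicates, appends if absent.
# Hypothetical helper; not suitable for repeated attributes such as formant.
set_voice_attr() {
  local file="$1" key="$2" line
  shift 2
  line="$key $*"
  awk -v key="$key" -v line="$line" '
    $1 == key { if (!done) { print line; done = 1 }; next }
    { print }
    END { if (!done) print line }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Usage: set_voice_attr '/usr/share/espeak-data/voices/!v/custom' pitch 90 140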

Save changes and test:

```bash
espeak -v en+custom "This is a voice test."
```
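For a fairer comparison than a single listen, you can render the base voice and the variant to WAV files with -w and play them back to back. render_ab and the ESPEAK override (defaulting to espeak-ng) are hypothetical names used for illustration:

```bash
# render_ab: write the same sentence with the base voice and the custom
# variant to base.wav and custom.wav (-w writes a WAV file instead of playing).
# Hypothetical helper; ESPEAK defaults to espeak-ng and can be overridden.
ESPEAK=${ESPEAK:-espeak-ng}

render_ab() {
  local text="$1" outdir="${2:-.}"
  "$ESPEAK" -v en -w "$outdir/base.wav" "$text" &&
    "$ESPEAK" -v en+custom -w "$outdir/custom.wav" "$text"
}
```

Usage: render_ab "This is a voice test." /tmp, then open /tmp/base.wav and /tmp/custom.wav in any audio player.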

Phoneme tuning fundamentals

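eSpeak writes phonemes as ASCII mnemonics; for English, for example, @ is the first vowel of "about", I the vowel of "bit", S the "sh" of "ship" and T the "th" of "thin". Pronunciation rules live in per-language dictionary source files such as en_rules and en_list, not in the voice file.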
Two main areas to tune pronunciation:
1. G2P (grapheme-to-phoneme) rules — how letters map to phonemes.
2. Phoneme durations and pitch targets — adjust timing and intonation.

Inline phoneme input (phonemes written inside double square brackets) and SSML tags are handy for experiments; permanent changes to phoneme durations and pitch belong in the phoneme definitions or the voice file.

Inline phoneme examples

Use -x to print the phonemes eSpeak chooses for some text (-q suppresses audio), then feed phonemes back inside double square brackets:

```bash
espeak -q -x -v en "Hello"
```

To speak phonemes directly (the same syntax works in espeak and espeak-ng):

```bash
espeak-ng -v en+custom "[[h@l'oU]]"
```

Or within running text:

```
Say [[h@l'oU]] to the world.
```
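Printing and replaying phonemes is the usual tuning loop: see what eSpeak produced, edit the mnemonic string, and listen again. A small sketch, with show_phonemes, say_phonemes, ESPEAK and VOICE as hypothetical names; the mnemonics are language-specific, so copy them from -x output rather than guessing:

```bash
# Hypothetical helpers for the print/edit/replay loop.
ESPEAK=${ESPEAK:-espeak-ng}
VOICE=${VOICE:-en+custom}

# show_phonemes: -q suppresses audio, -x prints the phoneme mnemonics.
show_phonemes() {
  "$ESPEAK" -q -x -v "$VOICE" "$*"
}

# say_phonemes: speak a mnemonic string by wrapping it in [[ ]].
say_phonemes() {
  "$ESPEAK" -v "$VOICE" "[[$*]]"
}
```

Usage: show_phonemes "Hello", then say_phonemes "h@l'oU" with your edits applied.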

Modifying G2P rules

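G2P rules live in dictionary source files such as en_rules (letter-to-sound rules) and en_list (whole-word exceptions). In en_rules, rules are collected under .group headings, and each rule has this format (simplified):

```
pre) match (post     phonemes
```

match is the letter sequence being translated; pre and post are optional left and right contexts, in which _ marks a word boundary and A any vowel letter. When several rules match, eSpeak uses the best-scoring one, so a new rule may not override a more specific existing rule.

Example: change how "th" is pronounced, using an illustrative rule under .group t that maps "th" before a vowel to D, the voiced sound of "this":

```
        th (A        D
```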

After editing, recompile the dictionary by running espeak-ng --compile=en (or espeak --compile=en) from the directory that holds the source files, then test words that exercise the rule:

```bash
espeak -v en+custom "this that thin those"
```
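The dictionary source files (en_rules, en_list and so on) come from the eSpeak or espeak-ng source tree rather than the binary packages. A hedged sketch of the recompile step, assuming those files sit in one directory; recompile_dict and ESPEAK are hypothetical names. The compiled <lang>_dict is written to the engine's data directory, so the command usually needs write access there.

```bash
# recompile_dict: check the rule sources exist, then run --compile in that directory.
# Usage: recompile_dict <dictsource-dir> <lang>
# Hypothetical helper; ESPEAK defaults to espeak-ng.
ESPEAK=${ESPEAK:-espeak-ng}

recompile_dict() {
  local src="$1" lang="$2" f
  for f in "${lang}_rules" "${lang}_list"; do
    if [ ! -f "$src/$f" ]; then
      echo "missing $src/$f" >&2
      return 1
    fi
  done
  # --compile reads the sources from the current directory; the subshell
  # keeps the caller's working directory unchanged.
  ( cd "$src" && "$ESPEAK" --compile="$lang" )
}
```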

Adjusting phoneme durations and prosody

Permanent per-phoneme durations are set in the phoneme definitions, and the voice file can scale stress-related lengths (stressLength). For quick, per-utterance control of pitch, rate and pauses, use SSML, which both espeak and espeak-ng interpret when run with -m.

Example SSML snippet:

```xml
<speak>
  <prosody pitch="+10%" rate="90%">
    Hello, <break time="200ms"/> world.
  </prosody>
</speak>
```

Speak it with SSML parsing enabled (-m) and the text read from a file (-f):

```bash
espeak-ng -m -v en+custom -f file.ssml
```
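Writing the SSML by hand gets error-prone once you start varying values. A sketch that generates the file and speaks it; write_ssml and speak_ssml are hypothetical helpers, and the text is inserted as-is, so it may contain tags such as <break/> but any literal & or < must be escaped:

```bash
# Hypothetical helpers; ESPEAK defaults to espeak-ng.
ESPEAK=${ESPEAK:-espeak-ng}

# write_ssml <out-file> <pitch> <rate> <text>: wrap text in a prosody element.
write_ssml() {
  local out="$1" pitch="$2" rate="$3" text="$4"
  cat > "$out" <<EOF
<speak>
  <prosody pitch="$pitch" rate="$rate">$text</prosody>
</speak>
EOF
}

# speak_ssml <file> [voice]: -m enables SSML parsing, -f reads the file.
speak_ssml() {
  "$ESPEAK" -m -v "${2:-en+custom}" -f "$1"
}
```

Usage: write_ssml test.ssml "+10%" "90%" 'Hello, <break time="200ms"/> world.' && speak_ssml test.ssml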

Creating and testing small changes iteratively

Make one change at a time (e.g., reduce vowel duration).

Test a list of target words and sentences.

Keep a changelog with the voice file version and notes.

Use recordings and compare A/B with the original voice.
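The word-list test can be automated by recording the phonemes for each word and diffing against a baseline after every change; the diff also makes a convenient changelog entry. A minimal sketch, with snapshot_phonemes and ESPEAK as hypothetical names. Pronunciation rules belong to the language, so a language voice such as en is enough here; a variant that only adjusts voice quality changes how phonemes sound, not which ones are chosen.

```bash
# snapshot_phonemes <voice> <wordlist>: print "word<TAB>phonemes" per line.
# Hypothetical helper; ESPEAK defaults to espeak-ng. Blank lines are skipped.
ESPEAK=${ESPEAK:-espeak-ng}

snapshot_phonemes() {
  local voice="$1" list="$2" word phon
  while IFS= read -r word || [ -n "$word" ]; do
    if [ -z "$word" ]; then continue; fi
    # -q: no audio; -x: phoneme mnemonics. Join lines and trim spaces.
    phon=$("$ESPEAK" -q -x -v "$voice" "$word" </dev/null |
      tr '\n' ' ' | sed 's/^ *//; s/ *$//')
    printf '%s\t%s\n' "$word" "$phon"
  done < "$list"
}
```

Usage: run snapshot_phonemes en words.txt > before.tsv, make one change and recompile, run it again into after.tsv, then diff before.tsv after.tsv.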

Example modification: lengthening final vowels

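Edit the phoneme durations (the syntax depends on eSpeak vs espeak-ng). Neither engine offers a single voice-file line for this; a real change usually goes into the phoneme definitions (which must be recompiled) or the language's stress settings. The rule below is pseudo-notation that only states the intent, lengthening a vowel at the end of a word, and is not literal eSpeak syntax:

```
vowel _ word-end vowellong
```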

Test:

```bash
espeak-ng -v en+custom "see me go"
```

Troubleshooting

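If changes don't apply, make sure you edited the file in the data directory your engine actually reads (check with --version), recompiled any edited dictionary, and restarted long-running programs that use eSpeak, such as speech-dispatcher.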

Use -X for diagnostics; it prints the phonemes along with a trace of the spelling rules that were applied:

```bash
espeak-ng -q -X -v en+custom "text"
```

Backup original files before edits.
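When an edit seems to have no effect, a short check confirms the variant file is where you think it is and shows the rule trace. A sketch with diagnose_variant and ESPEAK as hypothetical names; it assumes the variant is combined with en unless a fourth argument names another base voice:

```bash
# diagnose_variant: confirm a variant file exists, then print the -X trace.
# Usage: diagnose_variant <voices-dir> <variant> [word] [base-voice]
# Hypothetical helper; <voices-dir> is the directory that contains "!v".
ESPEAK=${ESPEAK:-espeak-ng}

diagnose_variant() {
  local vfile word base
  vfile="$1"/'!v'/"$2"
  word="${3:-this}"
  base="${4:-en}"
  if [ ! -r "$vfile" ]; then
    echo "variant file not found or unreadable: $vfile" >&2
    return 1
  fi
  echo "using variant file: $vfile"
  # -q: no audio; -X: phoneme mnemonics plus the rule-matching trace.
  "$ESPEAK" -q -X -v "$base+$2" "$word"
}
```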

Packaging and sharing your voice

Bundle the voice file and any rule files into a directory named after the language/variant.

Include a README with installation steps, for example copying the variant file into the !v directory:

```bash
sudo cp custom/custom '/usr/share/espeak-ng-data/voices/!v/'
```

Optionally share via GitHub with usage examples.
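A sketch of the bundling step, producing a tarball with the variant file and a README; package_variant is a hypothetical helper, and the install path written into the README assumes the espeak-ng layout, so recipients may need to adapt it:

```bash
# package_variant <variant-file> <out-dir>: create <out-dir>/<name>.tar.gz
# containing <name>/<name> (the variant) and <name>/README.
# Hypothetical helper; adjust the README path for your distribution.
package_variant() {
  local file="$1" out="$2" name stage
  name=$(basename "$file")
  stage=$(mktemp -d)
  mkdir "$stage/$name"
  cp "$file" "$stage/$name/$name"
  cat > "$stage/$name/README" <<EOF
eSpeak voice variant: $name

Install (adjust the data directory; "espeak-ng --version" prints it):
  sudo cp $name '/usr/share/espeak-ng-data/voices/!v/'

Test:
  espeak-ng -v en+$name "This is a voice test."
EOF
  tar -czf "$out/$name.tar.gz" -C "$stage" "$name"
  rm -rf "$stage"
}
```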

Conclusion

Customizing eSpeak voices and phoneme tuning involves editing voice parameter files, adjusting G2P rules, and using inline phoneme/SSML controls for prosody and timing. Work iteratively, test frequently, and keep backups.
