Custom Voices and Phoneme Tuning in eSpeak: A Practical Guide

Introduction

eSpeak is a compact, open-source text‑to‑speech (TTS) engine that offers precise control over pronunciation through custom voices and phoneme-level tuning. This guide shows practical steps to create or modify voices, adjust phoneme mappings, and test results on Linux (commands work similarly on macOS/Windows with equivalent paths).

Prerequisites

  • Install eSpeak: On Debian/Ubuntu:

bash

sudo apt update
sudo apt install espeak
  • Optional: espeak-ng for newer features:

bash

sudo apt install espeak-ng
  • Basic familiarity with a text editor and the command line.
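
Since either engine may be installed, a script can check which one is present before running any of the commands in this guide. The sketch below is a hypothetical helper (the name pick_engine is an assumption, not part of eSpeak); it prints the first command from its arguments that exists on PATH.

```shell
# Hypothetical helper: print the first command name from the arguments that
# is available on PATH. Returns 1 if none is found.
pick_engine() {
    local name
    for name in "$@"; do
        if command -v "$name" >/dev/null 2>&1; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}

# Example: ESPEAK_BIN=$(pick_engine espeak-ng espeak) || echo "no engine found"
```

Listing espeak-ng first prefers the newer engine when both are installed.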

Voice architecture overview

  • Voices live in eSpeak’s voices directory (e.g., /usr/share/espeak-data/voices/ or /usr/local/share/espeak-ng-data/voices/).
  • A voice is defined by a plain-text file named after the language, plus optional variant files. A voice draws on several components:
    • Phoneme set: the table of phoneme mnemonics and the sound data each one maps to.
    • Pronunciation rules: letter-to-sound rules and an exceptions list, compiled into a dictionary file.
    • Prosody settings: pitch, pitch range, speed, emphasis.
    • Recorded samples: used only for some consonant sounds, because eSpeak is a formant synthesizer rather than a concatenative one.

Locating and copying a base voice

  1. List available voices:

bash

ls /usr/share/espeak-data/voices/
  2. Copy a close existing voice as a template:

bash

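sudo cp /usr/share/espeak-data/voices/en/en '/usr/share/espeak-data/voices/!v/custom'

(Adapt both paths to your installation. eSpeak reads voices from a single data directory, so the copy must land in the directory your build actually uses. The !v subdirectory holds variant files, which is where the en+custom syntax used below looks for custom; the single quotes stop an interactive shell from treating !v as a history reference.)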
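
Because the data directory varies between builds and distributions, it can help to probe for it before copying anything. The sketch below is a hypothetical helper (the name find_voices_dir and the candidate list in the comment are assumptions); it prints the voices subdirectory of the first candidate that has one.

```shell
# Hypothetical helper: print the "voices" subdirectory of the first directory
# in the argument list that contains one. Returns 1 if none match.
find_voices_dir() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir/voices" ]; then
            printf '%s\n' "$dir/voices"
            return 0
        fi
    done
    return 1
}

# Candidate locations are assumptions; extend the list for your system.
# find_voices_dir /usr/share/espeak-data /usr/lib/x86_64-linux-gnu/espeak-data \
#     /usr/share/espeak-ng-data /usr/local/share/espeak-ng-data
```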

Editing voice parameters

Open the copied voice file (custom) in a text editor. Common parameters:

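  • name: the name used to select the voice.
  • language: language code (variant files use the value variant).
  • gender: male or female, optionally followed by an age.
  • pitch: two values, the base pitch and a second value that sets the pitch range (the 0–99 scale belongs to the -p command-line option, not to the voice file).
  • speed: speaking rate as a percentage of the default (words per minute are set with the -s command-line option).
  • stressLength, stressAmp, intonation: controls for emphasis and phrasing.

Example (lines to adjust; the values are illustrative):

Code

name custom
language en
gender male
pitch 82 118
speed 100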

Save changes and test:

bash

espeak -v en+custom "This is a voice test."
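
Generating the file from a script keeps each experiment reproducible. The following is a minimal sketch, assuming the layout of the variant files that ship in the !v directory; the function name write_variant_file is hypothetical and the attribute values are illustrative starting points only.

```shell
# Hypothetical helper: write a small variant file to the path given as $1.
# Attribute names follow the eSpeak voice-file format; values are illustrative.
write_variant_file() {
    local path="$1"
    cat > "$path" <<'EOF'
language variant
name custom
gender male
pitch 82 118
speed 100
EOF
}

# Example: write_variant_file ./custom   (then copy it into the !v directory)
```

Writing to a local path first and copying afterwards avoids running the whole script with sudo.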

Phoneme tuning fundamentals

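  • eSpeak writes phonemes as ASCII mnemonics loosely based on the Kirshenbaum scheme (e.g., @ for schwa, I as in "kit", S for "sh"). Pronunciation rules live in the language's dictionary source files (en_rules and en_list for English), which are compiled into en_dict; the voice file can select which dictionary and phoneme table to use.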
  • Two main areas to tune pronunciation:
    1. G2P (grapheme-to-phoneme) rules — how letters map to phonemes.
    2. Phoneme durations and pitch targets — adjust timing and intonation.

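Stress and length can be marked inline within double square brackets (for example ' for primary stress and : for a lengthened vowel). Per-phoneme durations and pitch behaviour are defined in the phoneme source files, which must be recompiled after editing.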

Inline phoneme examples

To see the phonemes eSpeak produces for a piece of text, use the -x flag (print phoneme mnemonics) together with -q (suppress audio):

bash

espeak -v en -x -q "Hello"

To force a pronunciation, write the phoneme mnemonics inside double square brackets (the same syntax works in espeak-ng):

bash

espeak-ng -v en+custom "[[h@l'oU]]"

Or within ordinary text:

Code

Say [[h@l'oU]] to everyone.

Modifying G2P rules

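G2P rules are in the dictionary source files: en_rules for letter-to-sound rules and en_list for whole-word exceptions. Rules are grouped by their leading letters, and each rule has the form (simplified):

Code

.group <letters>
        <pre>) <match> (<post>    <phonemes>

The pre and post contexts are optional, and _ in a context stands for a word boundary.

Example: pronounce “th” as voiceless by default but voiced in the word “this” (an illustrative rule, not copied from the shipped en_rules):

Code

.group th
        th          T
     _) th (is_     D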

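After editing, recompile the dictionary (for example, run espeak --compile=en, or the espeak-ng equivalent, from the directory that contains en_rules and en_list), then test common words that exercise the rule: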

bash

espeak -v en+custom "this that thin those"
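
Listening alone makes small rule changes easy to miss, so it can be useful to print the phoneme output for each test word. This is a sketch under stated assumptions: the helper name show_phonemes and the ESPEAK_BIN variable are hypothetical, and only the -v, -x and -q flags are relied on.

```shell
# Engine binary; override with ESPEAK_BIN=espeak-ng if that is what you have.
ESPEAK_BIN="${ESPEAK_BIN:-espeak}"

# Hypothetical helper: print "word<TAB>phonemes" for each word argument,
# using the voice named in $1. -x writes phoneme mnemonics to stdout and
# -q suppresses audio output.
show_phonemes() {
    local voice="$1" word
    shift
    for word in "$@"; do
        printf '%s\t%s\n' "$word" "$("$ESPEAK_BIN" -v "$voice" -x -q "$word")"
    done
}

# Example: show_phonemes en+custom this that thin those
```

Saving the output before and after a rule change and comparing the two with diff shows exactly which words were affected.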

Adjusting phoneme durations and prosody

  • Per-phoneme durations are defined in the phoneme source files, and the voice file's stressLength attribute sets relative vowel lengths by stress level. For per-utterance control, espeak and espeak-ng also accept SSML markup.
  • Example SSML snippet:

xml

<speak>
  <prosody pitch="+10%" rate="90%">
    Hello, <break time="200ms"/> world.
  </prosody>
</speak>

Use espeak-ng’s SSML support:

bash

espeak-ng -m -v en+custom -f file.ssml
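
To try this quickly, the SSML file can be generated from the shell. The helper below is a hypothetical sketch (the name write_ssml_sample is an assumption); it writes the snippet from this section with quoted attribute values so that the file is well-formed XML.

```shell
# Hypothetical helper: write a small, well-formed SSML sample to the path in $1.
write_ssml_sample() {
    cat > "$1" <<'EOF'
<speak>
  <prosody pitch="+10%" rate="90%">
    Hello, <break time="200ms"/> world.
  </prosody>
</speak>
EOF
}

# Example (commented out so the sketch has no side effects when sourced):
# write_ssml_sample file.ssml && espeak-ng -m -v en+custom -f file.ssml
```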

Creating and testing small changes iteratively

  1. Make one change at a time (e.g., reduce vowel duration).
  2. Test a list of target words and sentences.
  3. Keep a changelog with the voice file version and notes.
  4. Use recordings and compare A/B with the original voice.
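
The A/B step can be scripted by rendering the same sentence to two WAV files. The following is a minimal sketch: render_ab, ESPEAK_BIN and the output file names are hypothetical, and it relies only on the -w flag, which writes a WAV file instead of playing audio.

```shell
# Engine binary; override with ESPEAK_BIN=espeak-ng if that is what you have.
ESPEAK_BIN="${ESPEAK_BIN:-espeak}"

# Hypothetical helper: render the same sentence with two voices so the
# results can be compared by ear.
# Arguments: base voice, test voice, text, output directory.
render_ab() {
    local base_voice="$1" test_voice="$2" text="$3" outdir="$4"
    mkdir -p "$outdir"
    "$ESPEAK_BIN" -v "$base_voice" -w "$outdir/a_base.wav" "$text"
    "$ESPEAK_BIN" -v "$test_voice" -w "$outdir/b_test.wav" "$text"
}

# Example: render_ab en en+custom "This is a voice test." ./ab
```

Keeping one output directory per change, named after the changelog entry, makes it easy to return to an earlier comparison.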

Example modification: lengthening final vowels

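Vowel length is controlled in different places depending on scope: the stressLength attribute in the voice file sets vowel lengths by stress level, while a pronunciation rule can lengthen one spelling in one position. As an illustrative sketch (it assumes the : length mark is available in your phoneme table; adapt it to the existing rules in en_rules and recompile), the rule below gives a word-final “ee” an extra length mark:

Code

.group e
        ee (_     i::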

Test:

bash

espeak-ng -v en+custom "see me go"

Troubleshooting

  • If changes don’t apply, ensure you edited the correct voice path, recompiled the dictionary after any rule change, and restarted any services.
  • Use the -X flag to trace which pronunciation rules fire for each word:

bash

espeak -v en+custom -X -q "text"
  • Backup original files before edits.
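
A backup is only useful if it preserves the pristine file, so the copy should not be overwritten on later runs. A minimal sketch, with the hypothetical name backup_once and an assumed .bak suffix:

```shell
# Hypothetical helper: copy a file to "<file>.bak" the first time it is called,
# and leave an existing backup untouched on later calls.
backup_once() {
    local file="$1"
    [ -e "$file.bak" ] || cp -p "$file" "$file.bak"
}

# Example: backup_once en_rules   (run before each editing session)
```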

Packaging and sharing your voice

  • Bundle the voice file and any rule files into a directory named after the variant (here, custom).
  • Include a README with installation steps, for example (adapt the data path to the target system):

bash

sudo cp custom/custom '/usr/local/share/espeak-ng-data/voices/!v/'
  • Optionally share via GitHub with usage examples.

Conclusion

Customizing eSpeak voices and phoneme tuning involves editing voice parameter files, adjusting G2P rules, and using inline phoneme/SSML controls for prosody and timing. Work iteratively, test frequently, and keep backups.
