Natural Sounding Text to Speech?

Question

I am looking for some easy to install text to speech software for Ubuntu that sounds natural. I've installed Festival, Gespeaker, etc., but nothing sounds very natural. All very synthetic and hard to understand.

Any recommendations out there?

Possible duplicate of How can I install and use text-to-speech software? — Organic Addict, Dec 05 '15 at 20:24

score 68 · Answer 1 · edited Aug 03 '21 at 09:11

SVOX pico2wave

sudo apt install libttspico-utils

A very minimalistic TTS that sounds better than espeak or mbrola (to my mind). Some information here.

I don't understand why pico2wave is, compared to espeak or mbrola, rarely discussed. It's small, but sounds really good (natural). Without modification you'll hear a natural sounding female voice.

And, unlike Mbrola, it recognises units and speaks them correctly!
For example:

2°C → two degrees
2m → two meters
2kg → two kilograms

After installation I use it in a script:

#!/bin/bash
pico2wave -w=/tmp/test.wav "$1"
aplay /tmp/test.wav
rm /tmp/test.wav

Then run it with the desired text:

<scriptname>.sh "hello world"

or read the contents of an entire file:

<scriptname>.sh "$(cat <filename>)"

That's all you need for a lightweight, stable TTS on Ubuntu.
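Because the wrapper above hands the whole text to pico2wave as a single argument, very long files can run into the shell's argument-length limit. A minimal sketch of one workaround, assuming bash and GNU coreutils: split the input into short chunks and synthesize them one after another. The function names `chunk_text` and `speak_file` and the 500-character chunk size are illustrative choices, not part of pico2wave.

```shell
#!/bin/bash
# Join stdin into one line, then split it into pieces of at most $1
# characters (default 500), breaking at spaces where possible.
chunk_text() {
    { tr '\n' ' '; echo; } | fold -s -w "${1:-500}"
}

# speak_file FILE: synthesize and play FILE chunk by chunk.
speak_file() {
    local wav
    wav=$(mktemp --suffix=.wav) || return 1   # pico2wave wants a .wav name
    chunk_text 500 < "$1" | while IFS= read -r chunk; do
        [ -n "$chunk" ] || continue
        pico2wave -w "$wav" "$chunk" && aplay -q "$wav"
    done
    rm -f "$wav"
}

# Usage: speak_file notes.txt
```

Chunks are cut by length rather than at sentence boundaries, so the intonation may be slightly off where a sentence is split.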

answered Aug 24 '12 at 15:12 by user85321 · edited Aug 03 '21 at 09:11 by jkoop

2

As far as I can see, it only uses cli parameters as input. Is there any way I can get pico2wave to read text from a filename? – Carlos Eugenio Thompson Pinzón Feb 15 '14 at 17:42
17

pico2wave is in package libttspico-utils in recent versions of ubuntu. @CarlosEugenioThompsonPinzón cat <filename> | xargs -I foo -0 pico2wave -w blah.wav foo – naught101 Mar 11 '14 at 09:11
3

@CarlosEugenioThompsonPinzón pico2wave -w a.wav "$(<input.txt)" =). Agree that this CLI interface is bad design: it is unlike the vast majority of CLIs, and it makes it possible to hit the OS maximum argument length. – Ciro Santilli OurBigBook.com Apr 13 '14 at 09:44
i have installed it pls tell me how to use it in espeak i cant get – user49557 Jun 19 '15 at 10:36
@user49557: this answer isn't about espeak. for this answer, install the libttspico-utils package, then put pico2wave -w ~/output.wav "the text" – Koen Jun 22 '15 at 09:14
@CiroSantilli六四事件法轮功纳米比亚胡海峰 May that CLI interface thingee be the reason that my pico2wave output stucks after 858 words, while the .txt file I provided is 10600 words? – Koen Jun 22 '15 at 09:18
1

@Koen I don't know! :-) Like any other problem, try to produce a minimal example, e.g. using echo {1..1000} – Ciro Santilli OurBigBook.com Jun 22 '15 at 09:48
@koen thanks dude but can i ask you any more questions related to this – user49557 Jun 25 '15 at 10:51
1

@user49557 We're not supposed to hijack others' questions, so maybe you can create a new question, explaining what exactly you installed, and what it is that went wrong, and then I can always try and help you (no guarantees, though, I'm not an expert :P) – Koen Jun 25 '15 at 12:24
@user85321 I don't understand why pico2wave is, compared to espeak or mbrola, rarely discussed. Because with such a name it's really not easy to find but you are perfectly right, I tried it and compare to eSpeak, that's far much better! – 2ndGAB Jan 31 '18 at 12:33
@2ndGAB, because the syntax doesn't natively support files, and the fact that it should be used in two steps(as explained in the shell) and the command pico2wave itself doesn't support pipes, it could be nice if someone gives a workaround about the pipes. Otherwise the reading almost human and is not outperformed by other than Prestigio's text to speech. – user10089632 Feb 10 '18 at 16:42
1

On Ubuntu 18.04 this sounds very unnatural when I ask it to pronounce English. "Hello world" comes out as "hee-low valud" :( – Martin Eden Oct 19 '18 at 11:05
1

@MartinEden That's because in this answer, the line in the script which calls pico2wave includes the option -l=de-DE which causes it to use a German voice. Just remove that option and it will default to a English (US) voice. I will propose an edit to this answer as it does not make sense given that both the answer, and the sample text ("hello world") are in English and not German. – Jon Bentley Jan 20 '19 at 19:13
This is GREAT but I cannot figure out how we can replace the default ubuntu TTS espeak with this one? – Binod Kalathil Oct 01 '20 at 10:02

score 36 · Answer 2 · answered Apr 25 '17 at 19:31

Pico and espeak are fun and easy to get to work, but they're not all that good. The default Festival voices are also not that good. However, Festival is a scheme-based speech framework, where a number of researchers have built much better plug-in voices. You can easily surpass the pico2wave quality on stock Ubuntu, because one of those voices is available as a ready-made package.

To make Festival sound natural, here's what to do:

sudo apt-get install festival
sudo apt-get install festvox-us-slt-hts
festival -i
festival> (voice_cmu_us_slt_arctic_hts) 
festival> (SayText "Don't hate me, I'm just doing my job!")

You can do it from the command line by using -b (or --batch) and putting each command into single quotes:

festival -b '(voice_cmu_us_slt_arctic_hts)' \
    '(SayText "The temperature is 22 degrees centigrade and there is a slight breeze from the west.")'
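A minimal sketch of wrapping the batch invocation in shell functions, and of writing a WAV file instead of playing audio (useful on machines without audio output). It assumes the `text2wave` helper that ships with Festival is installed and that Festival's Scheme reader accepts `\"` and `\\` escapes inside strings; the function names `scheme_quote`, `say` and `say_to_wav` are made up for illustration.

```shell
#!/bin/bash
# Escape backslashes and double quotes so arbitrary text can be placed
# inside a Scheme string literal.
scheme_quote() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# say TEXT: speak TEXT with the slt HTS voice.
say() {
    festival -b '(voice_cmu_us_slt_arctic_hts)' \
        "(SayText \"$(scheme_quote "$1")\")"
}

# say_to_wav TEXT FILE: write the speech to FILE instead of playing it.
say_to_wav() {
    printf '%s\n' "$1" |
        text2wave -eval '(voice_cmu_us_slt_arctic_hts)' -o "$2"
}

# Usage:
#   say 'She said "hello" to me.'
#   say_to_wav "Saved for later." /tmp/later.wav
```

Putting these functions in ~/.bashrc makes them available in every interactive shell.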

You can get other quite good voices from the Nitech repository, but installing them is finicky, and the default paths changed so the file name references in the bundled scheme files may need to be manually edited to work on stock Ubuntu.

answered Apr 25 '17 at 19:31 by Jon Watte

4

Btw, in Ubuntu 16.04, this package seems to be missing. You can download and install the deb from Debian and it will work fine: https://packages.debian.org/sid/all/festvox-us-slt-hts/download sudo dpkg -i Downloads/festvox-us-slt-hts_0.2010.10.25-2_all.deb – Jon Watte Aug 20 '17 at 02:48
Much better than pico2wave: adjustable, better separation between words, better customizability, 1-step. With pico I had to slow the output wav in order to understand it – Berry Tsakala Aug 05 '21 at 18:01
@BerryTsakala I just took script above and stuck it in ~/.bashrc as a function instead. I called the function speak and I can just use speak "something to say" and it works perfectly. – WinEunuuchs2Unix Oct 01 '21 at 00:18
3

OP asked for natural sounding TTS. Festival is still quite robotic. – Nav Mar 27 '22 at 12:55
2

Small command to read content of clipboard from bash: echo "(SayText \"$(xclip -selection clipboard -o)\")" | festival '(voice_cmu_us_slt_arctic_hts)' --pipe – Olle Härstedt Nov 05 '22 at 23:41
Is there no way to get this to go to a wav file? It crashed because my server does not have audio out... – jjxtra Feb 09 '23 at 00:35
1

@jjxtra The manual page is at https://linux.die.net/man/1/festival and documents the command. You can run in --server mode. Or you can use the festival command language to synthesize an utterance and save it to disk. See also https://www.cstr.ed.ac.uk/projects/festival/manual/festival_7.html – Jon Watte Feb 17 '23 at 17:04
1

An example on how to read a text file, and being able to pause it, would be nice, too. – Olle Härstedt Nov 03 '23 at 09:10
For me (Pop!OS 22.04, based on Ubuntu), once I've installed both packages, the (voice_cmu_us_slt_arctic_hts) command has no effect. I'm not sure whether the voice I'm hearing is the default, or your suggested "natural sounding" one. – Jonathan Hartley Dec 10 '24 at 22:34
Remember that bash helps with the CLI. For example, if you omit the (voice...) command, then you can combine festival's --tts parameter (the given files should contain just text to be spoken, rather than "SayText" commands and the like), with Bash's "Put this text in a temporary file and return the filename" operator <<<"...", like this: festival --tts <<<"hello there" – Jonathan Hartley Dec 10 '24 at 22:37
Although I can't figure out how to combine the above technique with supplying extra commands like the (voice...) command from the post. Festival seems to assume --tts makes ALL params contain text to be spoken, not just the ones following. – Jonathan Hartley Dec 10 '24 at 22:38
Festival does not seem to be currently maintained. The official site only has docs for v2.4 (whereas 2.5 is what Ubuntu currently installs), and the official site has many broken links, including all the demos, and many of the paths to download things like extra voices. – Jonathan Hartley Dec 11 '24 at 01:36
@JonathanHartley it's not surprising that a research project from ten years ago is no longer maintained, given that generative neural networks are outclassing any previous synthesis method, so there's no additional research value in the old methods. suno/bark was popular on HuggingFace but it's already being outdone by even newer models. Go to HuggingFace, browse TTS models, pick one that you like. – Jon Watte Dec 16 '24 at 03:03
@JonWatte Hey there! Yep, absolutely! Thanks for the suggestion. The point of my comment and downvote was so that that future readers of this question won't waste time investigating this answer's unqualified recommendation (as I did) without at least some hints about what they're getting into. – Jonathan Hartley Dec 16 '24 at 14:47
Yikes, though for someone used to traditional TTS, it's a bit bemusing to try the hugging face suggestion, pipx install TTS (downloads 6GB), then try and get it to say something (downloads a further 2GB, then crashes with a big traceback.) I guess that means I chose the "wrong" TTS model...? – Jonathan Hartley Dec 16 '24 at 16:36
I guess what I'm trying to say is, if the huggingface suggestion is really viable, then this question could really use a new answer laying it out for drive-by readers who just want a simple, good TTS that works. – Jonathan Hartley Dec 16 '24 at 16:37

score 21 · Accepted Answer · edited Feb 21 '19 at 17:50

SpeakIt!

I believe I've found the best free TTS software: a Google Chrome extension called "SpeakIt". For me it only works in the Chrome browser on Ubuntu; it doesn't work with Chromium for some reason. SpeakIt comes with two female voices, both of which sound very realistic compared to everything else out there. At least four more male and female voices are listed as Chrome extensions if you search the Chrome Web Store for "TTS".

Usage: on a website, highlight the text you want read, then either right-click and choose "SpeakIt" or click the SpeakIt icon docked in the Chrome top bar.

Firefox users also have two options. Within Firefox addons, do a search for TTS and you should find "Click Speak" and also "Text to Voice". The voices are not as good as the Chrome SpeakIt voices, but are definitely usable.

The SpeakIt extension uses iSpeech technology, and for $20 a year the site can convert text to MP3 audio files. You can input text, URLs, RSS feeds, and documents such as TXT, DOC, and PDF, and output MP3. You can make podcasts, embed audio, etc. Here is a link, and a sample of their audio (don't know how long the link will last).

answered Jan 27 '13 at 00:11 by I Heart Ubuntu · edited Feb 21 '19 at 17:50 by Pablo Bianchi

4

Unfortunately none of the browser options work for PDF files. Have you come across one that does? I'd like to be able to select paragraphs to read from a PDF (i.e. not have to paste bits to terminal or other) – James Owers May 07 '16 at 18:05
1

this extension works for me on chromium 50.0.2661.94 using Debian 8.4 and its great! i especially like the english female voice. my only complaint is that it pauses for too long on commas. – mulllhausen Jun 28 '16 at 21:56
It often mispronounces words and also takes time to send the text to a separate server rather then just using your own system. – Goddard Mar 04 '17 at 06:25
Link is broken. – 842Mono Feb 28 '21 at 01:33
output is terrible compared to voicerss - very mechanical – Michael Nov 12 '21 at 17:10

Pablo Bianchi · Answer 4 · 2023-05-04T04:57:09.523

Piper

A fast, local neural text-to-speech system. See the project site for installation, voice downloads, and usage. For example:

echo 'Welcome to the world of speech synthesis!' | \
  ./piper --model blizzard_lessac-medium.onnx --output_file welcome.wav

gTTS, Google Text-to-Speech

gTTS is a Python library and CLI tool that interfaces with Google Translate's text-to-speech API. It writes spoken MP3 data to a file, to a file-like object (bytestring) for further audio manipulation, or to stdout.

Cons: CLI-only. You need to be online, because it sends requests to Google's public endpoint.

sudo -H pip install gTTS  # Install

Usage

gtts-cli 'hello' --output hello.mp3
gtts-cli -l es 'Nadie es patria, todos lo somos' | play -t mp3 -

Documentation and more examples

Others

(Some were already mentioned.)

Coqui.ai TTS. Installation:
```
pip install TTS
```

Mimic. Installation:

sudo apt-get install gcc make pkg-config automake libtool libasound2-dev
git clone https://github.com/MycroftAI/mimic.git # takes a while
cd mimic
./dependencies.sh --prefix="/usr/local" # takes a while
./autogen.sh
./configure --prefix="/usr/local"
make # takes a while
make check

Mimic 3. Installation of the plugin:

sudo apt-get install libespeak-ng1  # Install system packages
mycroft-pip install --upgrade pip  # Ensure that you're using the latest pip
mycroft-pip install mycroft-plugin-tts-mimic3[all]  # Install plugin
mycroft-config set tts.module mimic3_tts_plug  # Activate plugin
mycroft-start all  # Start mycroft

eSpeak + Gespeaker (GUI) (Gespeaker source code)

Cons: Old and ugly
```
sudo apt install espeak gespeaker
```
Firefox:
- Google Translate, ImTranslator, Dictionary, TTS by Smart Link Corporation

Chromium/Brave/Chrome:
- Text to speech that brings productivity
- SpeakIt!

tacotron and mimic2, based on the Google paper

I found piper to be the best. I use this script for "speak selected text" feature: https://medium.com/@IanEdington/natural-sounding-speek-selected-text-for-linux-41025874c019 — IanEdington, Mar 11 '24 at 02:11

score 14 · Answer 5 · edited Jul 26 '21 at 22:52

Simple Google™ TTS

Update from project page (2016): This project is currently unmaintained and will remain so for the foreseeable future.

Because of the lack of a better alternative I wrote a bash script that interfaces with a perl script by Michal Fapso to provide TTS via Google Translate. From the project description:

The intention is to provide an easy to use interface to text-to-speech output via Google's speech synthesis system. A fallback option using pico2wave automatically provides TTS synthesis in case no Internet connection is found.

As it stands, the wrapper supports reading from standard input, plain text files and the X selection (highlighted text).

The main features are:

online TTS synthesis via Google translate
offline TTS synthesis via pico2wave
supports a variety of different languages
can read from CLI, text files and highlighted text
supports reading highlighted text with fixed formatting (e.g. PDF files)

Installation and usage are documented on the project page.

I'd be glad if you gave it a try. Bug reports and any other feedback are welcome!

This has to be one of the coolest projects I've ever seen. Just wow. — , Nov 30 '16 at 21:25

score 13 · Answer 6 · edited Feb 21 '19 at 17:45

I have looked high and low for text to speech for Ubuntu that is high quality. There is none. My vocal cords are paralyzed so I needed TTS to add voice instructions to my Ubuntu videos. You can get commercial high quality Linux text to speech software here. It's just really expensive. I ended up buying Natural Reader for Windows (doesn't work in Ubuntu under Wine) for $40. Maybe later I will get the Linux one.

answered Jul 20 '11 at 17:57 by Joe Steiger · edited Feb 21 '19 at 17:45 by Pablo Bianchi

dude, there is and I was using it like last week there are at least 5 or 6 and I can't for the life of me find any of them now, gotta love our community – mchid Dec 21 '15 at 10:53
1

Textaloud has instructions to make their product work under wine. see http://nextup.com/forum/viewtopic.php?t=3349 I believe that cepstral has a linux port too. I have not been able to get my favorite software balabolka to work. I have windows 10 installed mostly for tts processing. MS David is good and similar to cepstral david. The prior one is free if you have windows 10. – Bhikkhu Subhuti Jun 19 '16 at 11:34

score 8 · Answer 7 · answered Apr 24 '12 at 15:35

I have been researching the best-sounding and most easily tuned text-to-speech voices. Below are what I consider the top 5 products, in order of sound quality. Most of the products' websites have an interactive demo that lets you judge for yourself.

NeoSpeech
iVona
Acapela
AT&T Natural voices
CereProc Voices

answered Apr 24 '12 at 15:35 by Jim

3

Are these available for Linux? I don't think so – Mehdi Khademloo Dec 03 '16 at 00:36

score 6 · Answer 8 · answered Dec 15 '13 at 00:48

Combine SVOX tools (pico) with LibreOffice:

SVOX (pico) tools are easy to install and bring good-quality voices to Ubuntu. Install them:

sudo apt-get install libttspico0 libttspico-utils libttspico-data

You can use LibreOffice in combination with the SVOX (pico) tools by installing the "Read Text" extension, which gives you a "GUI" for this excellent TTS software:

Set up the Read Text extension's options with Tools - Add-ons - Read selection.... Use /usr/bin/python as the external program. Select a command line option that includes the token (PICO_READ_TEXT_PY); you may want to experiment with several of them.

Now you only have to select some text in LO Writer, Calc, Impress or Draw and click on the icon added to the toolbar (a happy face with a balloon).

score 5 · Answer 9 · answered Nov 09 '11 at 13:56

I find the Nitech HTS voices for Festival more natural and comforting than any other voices I have heard. See this link on how to set up Nitech and other voices with Festival. I have not found a good GUI for configuring those voices, but setting them via festival.scm still works. That post is very old, so you may need to find the actual installation directory using the "locate festival" command.

answered Nov 09 '11 at 13:56 by razor

Seems to be very good. Found demos here http://www.cstr.ed.ac.uk/projects/festival/onlinedemo.html – Iacchus Aug 21 '14 at 08:32
3

Yes, the Nitech voices are head and shoulders above other Festival voices (except the CMU voices, which are also very good.) Too bad they're hard to install. There is one good CMU voice that has a default package in Ubuntu, it's called cmu_us_slt_arctic_hts and comes in the package festvox-us-slt-hts. It is much better than pico or espeak! – Jon Watte Apr 25 '17 at 19:23

Ciro Santilli OurBigBook.com · Answer 10 · 2025-07-29T08:40:59.920

Comparison table of free offline CLI software

I think what we need at this point is the big summary table:

| Tool | Sounds remotely natural | Output to file | Multilingual | Tested on |
|---|---|---|---|---|
| pico2wave (libttspico-utils 1.0+git20130326-14) | y. Some weird distortions, but reasonable. | y | `-l fr-FR` | 24.04 |
| idiap/coqui-ai-TTS 0.24.1 + Tacotron2 | y. Output is randomly different each time. Most words are awesome. Punctuation timing is off. Sometimes it goes completely crazy and it is hilarious. | `--out_path tmp.wav` | | 24.04 |
| Speech Note 4.7.0 + Mimic3 Arctic Aew Low | y | from GUI only | y | 24.10 |
| Speech Note 4.7.0 + Piper Amy Low Female | y | from GUI only | y | 24.10 |
| festival 2.5.0 + festvox-us-slt-hts 2010.10.25 | y. Not amazing, but OK. Slight voice distortion and punctuation off. | n | `--language english` | 24.04 |
| spd-say (speech-dispatcher 0.12.0) | n | n | `-l fr` | 24.04 |
| say (gnustep-gui-runtime 0.30.0) | n | n | n | 24.04 |
| espeak 1.48.15 | n | `--stdout > tmp.wav` | `-v fr` | 24.04 |
| festival 2.5.0 | n | n | `--language english` | 24.04 |
| svox nanotts d8b91f3 | n | | | 24.04 |
| espeak-ng 1.51 | n | | | 24.04 |
| piper | | | | 24.04 |
| tortoise-tts 3.0.0 | | | | 24.04 |

Empty cell means "unknown, untested".

My quick test strings are:

en: "Hello, my name is John Smith. What is your name?"
fr: "Bonjour, je m'appelle Jean Jacques. Tu t'appelles comment?"

"Remotely natural" is of course extremely subjective, and will suffer from the continual moving of AI goalposts as things evolve and we get used to better systems. For now, maybe I'd consider it something along "good enough for an informal video voiceover".

Piper

Previously mentioned at: https://askubuntu.com/a/1466489/52975

On Ubuntu 24.04 in a clean virtualenv running:

pip install piper-tts

fails with:

ERROR: Cannot install piper-tts==1.1.0 and piper-tts==1.2.0 because these package versions have conflicting dependencies.

bug report: https://github.com/rhasspy/piper/issues/509

pico2wave

On Ubuntu 24.04:

sudo apt install libttspico-utils
pico2wave -w tmp.wav "Hello, my name is John Smith. What is your name?"
ffplay -autoexit tmp.wav
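For other languages, pico2wave takes a language code via -l (the table above uses `-l fr-FR`). A minimal sketch of a wrapper that rejects unknown codes before calling pico2wave; the language list is an assumption about what libttspico-data ships (check `pico2wave --help` on your system), and the names `PICO_LANGS`, `is_pico_lang` and `speak_lang` are invented for this example.

```shell
#!/bin/bash
# Assumed set of languages shipped with libttspico-data.
PICO_LANGS="en-US en-GB de-DE es-ES fr-FR it-IT"

# is_pico_lang CODE: succeed only if CODE is in the assumed list.
is_pico_lang() {
    case " $PICO_LANGS " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# speak_lang CODE TEXT: synthesize TEXT in language CODE and play it.
speak_lang() {
    is_pico_lang "$1" || { echo "unsupported language: $1" >&2; return 2; }
    local wav
    wav=$(mktemp --suffix=.wav) || return 1
    pico2wave -l "$1" -w "$wav" "$2" &&
        ffplay -autoexit -nodisp -loglevel quiet "$wav"
    rm -f "$wav"
}

# Usage: speak_lang fr-FR "Bonjour, je m'appelle Jean Jacques."
```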

idiap/coqui-ai-TTS

https://github.com/idiap/coqui-ai-TTS

pipx install coqui-tts
tts --text "Hello, my name is John Smith. What is your name?" --pipe_out | aplay

The first time you call it it installs the necessary model automatically.

Sound takes 5-10 s to start coming out on each invocation, which is unacceptable for frequent short sentences.

The default model seems to be Tacotron2: https://github.com/NVIDIA/tacotron2 but you can select other models from CLI.

coqui-ai/TTS

Previously mentioned at: https://askubuntu.com/a/1447599/52975

Does not support Python 3.12 (Ubuntu 24.04), so pip install TTS fails. Report: https://github.com/coqui-ai/TTS/issues/3257 A collaborator at https://github.com/coqui-ai/TTS/issues/3257#issuecomment-2096792618 says to use idiap/coqui-ai-TTS instead.

Based on the README similarity it seems to be a fork of https://github.com/mozilla/TTS

festival + festvox-us-slt-hts

Mentioned at: https://askubuntu.com/a/908889/52975 tested on Ubuntu 24.04:

sudo apt install festvox-us-slt-hts
festival -b '(voice_cmu_us_slt_arctic_hts)' '(SayText "Hello, my name is John Smith. What is your name?")'

tortoise-tts

https://github.com/neonbjb/tortoise-tts

On Ubuntu 24.04:

virtualenv -p python3 .venv
. .venv/bin/activate
pip install tortoise-tts==3.0.0

fails with:

ERROR: Failed building wheel for tokenizers

Bug report: https://github.com/neonbjb/tortoise-tts/issues/728

Speech Note supports it and it worked there.

Mimic3

Previously mentioned at: https://askubuntu.com/a/1447599/52975

At https://github.com/MycroftAI/mimic3/issues/83#issuecomment-2740023510 a maintainer said it's not maintained anymore.

On Ubuntu 24.10 I tried:

sudo apt-get install libespeak-ng1
pipx install 'mycroft-mimic3-tts[all]'

but that failed with:

Fatal error from pip prevented installation. Full pip output in file:
    /home/ciro/.local/pipx/logs/cmd_2025-03-20_07.57.51_pip_errors.log

pip failed to build package:
libwapiti
Some possibly relevant errors from pip install:
error: subprocess-exited-with-error
libwapiti/src/api.c:157:36: error: passing argument 4 of ‘tag_nbviterbi’ from incompatible pointer type [-Wincompatible-pointer-types]
libwapiti/src/api.c:157:46: error: passing argument 6 of ‘tag_nbviterbi’ from incompatible pointer type [-Wincompatible-pointer-types]
error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (libwapiti)
Error installing mycroft-mimic3-tts from spec 'mycroft-mimic3-tts[all]'.

Bug report: https://github.com/MycroftAI/mimic3/issues/83

Speech Note supports it and it worked there.

Speech Note

https://github.com/mkiol/dsnote

This project is a front-end for a number of backend TTS and STT models in multiple languages. That is cool because you can quickly try several models on a given text and decide which one is better, without having to install a bunch of differently broken software systems. Trying:

flatpak install flathub net.mkiol.SpeechNote
flatpak run net.mkiol.SpeechNote

opens a GUI.

Then under:

Languages
English
Text to Speech

I can download a model. Just note that some of them require "voice samples" presumably to clone from, which might or might not be what you want.

Then you can type your text in the GUI and click the "Read" button to hear it.

And to save to a file:

File
Export to a file

Now for CLI-only attempt:

flatpak run net.mkiol.SpeechNote --print-available-models tts

lists models I've downloaded:

        en_coqui_fairseq_eng          "English (Coqui MMS) / en"
        en_piper_us_amy_low           "English (Piper Amy Low Female) / en"
        en_rhvoice_alan               "English (RHVoice Alan Male) / en"
        en_whisperspeech_q4_base_enpl "English (WhisperSpeech Base) / en"

but TODO also opens the GUI. And finally TODO CLI-only TTS? This is a starting point:

flatpak run net.mkiol.SpeechNote \
  --id en_coqui_fairseq_eng \
  --text 'Hello, my name is John Smith. What is your name?'

After changing a setting under:

Settings
Accessibility
Allow external applications to invoke actions

I can also make it speak with:

flatpak run net.mkiol.SpeechNote \
  --id en_coqui_fairseq_eng \
  --text 'Hello, my name is John Smith. What is your name?' \
  --action start-reading-text

but I don't know how to save to file from the CLI.

Related: https://github.com/mkiol/dsnote/issues/83

Tested on Speech Note 4.7.0, Ubuntu 24.10.

Others

No easy CLI instructions:

Bibliography:

I like this a lot but I assume this is too far down the answers to ever get traction. Is it worth making this a community wiki post? — crantok, Feb 15 '25 at 21:30
I found many of these solutions worked on my decade old laptop even if I didn't like the results. I found idiap/coqui-ai-TTS to be too slow to be useable though, e.g. I gave up after 5 minutes of waiting for it to process 3.5K of text. — crantok, Feb 15 '25 at 21:33
@crantok I don't think there's much advantage to community wiki. The main issue is I don't get notifications for upvotes, which is demotivating. But I also want my rep. If people want to join forces, edits in a similar format to the answer and comments are very welcome. About being low down: yes, the fastest gun problem is still with us. But on the other hand, I have also fought many a gun battle and won even when I'm late to the party :-) — Ciro Santilli OurBigBook.com, Feb 15 '25 at 22:26

Pouya Sanooei · Answer 11 · 2013-12-12T02:29:16.687

Here is what I did to get natural-sounding speech for PDF and other text files (other solutions are either not natural or are paid services). It is actually a workaround using Chromium or Chrome, but it is fast and easy.

Install SpeakIt! extension on your chrome or chromium.
Install PDF Viewer if you're using chromium(chrome already has a pdf viewer for free) and check 'Allow in incognito' and 'Allow access to file URLs' options in extensions settings of chromium.
Drag and drop your pdf to browser.
Now highlight some text and right click and select SpeakIt! so you can listen to pure natural text-to-speech.

There are also ways to open other files such as .doc and .txt in Chrome and do the same. Other Chrome extensions can view PDF files too; check whether one of them suits you better. You can also upload all kinds of text to Google Drive and have SpeakIt! read it to you. Another extension called 'Speak text' works the same way and has natural speech.

Could you elaborate on how to make SpeakIt read pdf files saved in Google Drive? — Marco Lackovic, Sep 24 '14 at 15:12

score 3 · Answer 12 · edited Feb 21 '19 at 17:43

When searching for a better tts engine to use with the new firefox 49 narrative mode I found pico tts (svox) - my favorite TTS engine.

sudo apt install espeak libttspico0 libttspico-data libttspico-utils

How to change the default speech synthesis engine system wide?

People at arch linux brought me to the right path:

Uncomment the module you like and make it default in speech-dispatcher settings:

# sudo vim /etc/speech-dispatcher/speechd.conf

[...]
# -----OUTPUT MODULES CONFIGURATION-----
# Each AddModule line loads an output module.
#AddModule "espeak"       "sd_espeak"   "espeak.conf"
AddModule "pico-generic"  "sd_generic"   "pico-generic.conf"

[...]
#DefaultModule espeak
DefaultModule pico-generic

Restart the daemon:

# sudo systemctl restart speech-dispatcher.service
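Before testing in Firefox, the change can be checked from a terminal. A minimal sketch, assuming the configuration file path used above and the `spd-say` client that comes with speech-dispatcher; `default_module` is a made-up helper name.

```shell
#!/bin/bash
# Print the last uncommented DefaultModule value from a speechd.conf file.
# The path argument is optional and defaults to the file edited above.
default_module() {
    sed -n 's/^[[:space:]]*DefaultModule[[:space:]]\{1,\}//p' \
        "${1:-/etc/speech-dispatcher/speechd.conf}" | tail -n 1
}

# Usage, once speech-dispatcher is running:
#   default_module                          # should name the module you chose
#   spd-say -O                              # list the output modules that were loaded
#   spd-say -o pico-generic "Hello world"   # force the pico module for one utterance
```

If `spd-say -O` does not list pico-generic, the module itself failed to load and the problem is not specific to Firefox.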

BUT when starting Firefox again, nothing happens. According to the link above (Arch forum posts #10 and #16) this works with festival (I did not try), but speech-dispatcher does not list any available voices for pico, so it won't run.

Any idea out there would be highly appreciated ;-)

score 1 · Answer 13 · answered Mar 06 '23 at 14:18

Verbify-TTS

Yes! I ran into the exact same problem you describe. I have been using a custom TTS of my own for almost two years, and a year ago I open-sourced it. It works offline and for free, using a high-quality AI-based voice. You can use it everywhere: Firefox, PDF readers, Chrome, LibreOffice, etc. It supports both Ubuntu and Windows.

Feel free to have a look, I just created a video tutorial with installation steps and DEMO: https://youtu.be/hb1ZVwUcPCU

Download link and Project page: https://github.com/MattePalte/Verbify-TTS

Feel free to leave comment/open issue to discuss new ideas, problems or constructive criticism.

Hoping it will help you.

SouthwindCG · Answer 14 · 2016-12-16T05:38:15.360

1

My favorite text-to-speech program is called Magic English, but like Natural Reader mentioned by Joe Steiger, it is a Windows program and I'm not sure if it will run under Wine.

AT&T Natural Voices is available online as a demo, but that's more of a work-around than a solution...

answered Jul 20 '11 at 19:10 by SouthwindCG · edited Dec 16 '16 at 05:38

Vitaly Zdanevich · Answer 15 · 2017-09-17T07:55:30.403

1

For that I built Intelligent Speaker, an extension for Google Chrome. It can read pages even without a selection (when text detection is correct).

answered Sep 16 '17 at 18:02 by Vitaly Zdanevich · edited Sep 17 '17 at 07:55

Much better than Speakit! for me, thx – Vahid Pazirandeh Feb 25 '22 at 20:51

score 1 · Answer 16 · edited Jul 26 '21 at 22:54

Simple Google™ TTS

Update from project page (2016): This project is currently unmaintained and will remain so for the foreseeable future.

Pico, mbrola, cmu, festival, flite, all SUCK in 2017 (They were amazing in the 90s). AT&T natural speech (which is fantastic) isn't linux compat and it's not free, therefore we use Google

git clone https://github.com/Glutanimate/simple-google-tts.git
sudo apt install xsel libnotify-bin libttspico0 libttspico-utils libttspico-data libwww-perl libwww-mechanize-perl libhtml-tree-perl so$
cd simple-google-tts
sudo ln -s `pwd`/simple_google_tts /usr/local/bin
simple_google_tts en "Text to speech is now installed"
cd -

answered Nov 29 '17 at 05:32 by Jonathan · edited Jul 26 '21 at 22:54 by Pablo Bianchi

3

This is a duplicate of Glutanimate answer (the author of that project). Also: "Status update: This project is currently unmaintained and will remain so for the foreseeable future." He suggests some alternatives – Pablo Bianchi Feb 21 '19 at 17:59
This project is currently unmaintained and will remain so for the foreseeable future.
This script and many others like it rely on an unofficial API that has recently become increasingly difficult to support. As Google continues to lock down access to their TTS interface I see no choice other than to suspend maintaining this script for the time being.
– erwin Jul 19 '21 at 10:48
@PabloBianchi his answer did not have the install code in it – Jonathan Jul 26 '21 at 22:10

score 0 · Answer 17 · edited Feb 23 '21 at 16:53

In Linux systems, you can dump X selection (the text you have selected on your screen with the mouse) to a text file, then read with some TTS (currently I use Google Translate Python script gTTS):

#!/bin/bash
TXT="/tmp/speak.txt"

# Save the X text selection to a file
xclip -out > "$TXT"
# Remove smileys
sed -i 's/ :[pP]/./' "$TXT"
sed -i 's/ :\//./' "$TXT"
sed -i 's/ :D/./' "$TXT"
sed -i 's/ ;D/./' "$TXT"
sed -i 's/ :(/./' "$TXT"
# Abbreviations (\b keeps the surrounding spaces and punctuation; GNU sed)
sed -i 's/\bIPv6\b/I P version 6/gi' "$TXT"
sed -i 's/\bMR\b/merge request/gi' "$TXT"
sed -i 's/\bbtw\b/by the way/gi' "$TXT"
sed -i 's/\bWIP\b/work in progress/gi' "$TXT"
sed -i 's/\bCLI\b/command line/gi' "$TXT"
# Latin (dots escaped so they match literally)
sed -i 's/i\.e\./that is/gi' "$TXT"
sed -i 's/e\.g\./for example/gi' "$TXT"
# Speak the cleaned text
gtts-cli -f "$TXT" | play -t mp3 -

Bind this script to a key, for example the right menu key, and whenever you select text in any program (Firefox, Thunderbird, LibreOffice Writer, a PDF reader, or even a terminal) and press it, you will hear the text.

PS. you can also add --slow option to gtts-cli.
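The text clean-up can also be written as a stand-alone filter, which makes the substitutions easy to check without producing any audio. This is a sketch rather than a drop-in replacement: it assumes GNU sed (the `\b` word boundary and the `I` flag are GNU extensions), the function name `clean_text` is made up, and the smiley pattern is slightly broader than the one in the script above.

```shell
#!/bin/bash
# Filter: read raw text on stdin, write speech-friendly text on stdout.
clean_text() {
    sed -e 's| [:;][pPD(/]|.|g' \
        -e 's/\bIPv6\b/I P version 6/gI' \
        -e 's/\bMR\b/merge request/gI' \
        -e 's/\bbtw\b/by the way/gI' \
        -e 's/\bWIP\b/work in progress/gI' \
        -e 's/\bCLI\b/command line/gI' \
        -e 's/\bi\.e\./that is/gI' \
        -e 's/\be\.g\./for example/gI'
}

# Usage:
#   xclip -out | clean_text > /tmp/speak.txt
#   gtts-cli -f /tmp/speak.txt | play -t mp3 -
```

Running `echo "btw this MR is WIP :D" | clean_text` in a terminal is a quick way to see what will be read aloud.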
