6

I need something that can be installed on Ubuntu. I don't know how to build a program from source code, and that seems to be all I can find online. I don't need something that uses a browser either. Please include a "sudo apt-get install" command in any answer, as anything else is far too complicated, especially since typing is a problem. I'm fine with adding a PPA for it if needed, as long as a command for that is given as well. I really need a cut-and-paste way to install something via the terminal.

muru
  • 207,970
Liberated
  • 354
  • A bit of Googling told me sudo apt install julius should give you some tools for speech to text. Here is the man page. I have never used it, so YMMV. – user68186 Aug 28 '24 at 18:44
  • There are a few questions and answers on this site about this already, e.g. https://askubuntu.com/questions/5495/speech-recognition-api To be honest, I’ve never found anything very satisfactory. – Will Aug 28 '24 at 19:21
  • julius installed, but I can't find any way to make it run. The man page was all confusing to me. – Liberated Sep 05 '24 at 01:00
  • Related: https://askubuntu.com/questions/161515/speech-recognition-app-to-convert-mp3-voice-to-text – Ciro Santilli OurBigBook.com Mar 19 '25 at 13:22

2 Answers

4

You may try OpenAI's Whisper: https://github.com/openai/whisper

This is an AI model for speech recognition that runs on Python. You need Python and pip installed on your system; then install Whisper with:

pip install -U openai-whisper

It is very easy to use and quite powerful. The GitHub page provides the basic steps for using it.

Note that on Ubuntu 23.04 and later, pip by default refuses to install PyPI modules system-wide. You must create a Python virtual environment, install all the Python modules inside that environment, and run Whisper from within it.

  • My computer doesn't like the version of pip it's got. I'll have to wait till my proper computer is back from repairs to try it. – Liberated Sep 05 '24 at 00:57
  • What do you mean by 'the computer doesn't like the version of pip'? – Fernando Roig Sep 06 '24 at 05:02
  • I just did a wrapper around openai-whisper as a GNOME extension:

    https://github.com/kavehtehrani/gnome-speech2text

    – k-war Jun 13 '25 at 08:57
  • Hey folks, I've implemented (or more precisely vibe coded) a prototype solution that seems to work on Wayland: https://github.com/lbke/mic-to-keyboard It doesn't have a GUI, as it is intended to run in the background. It requires PortAudio for now; I need to find an audio lib with wider support. I would need to study @k-war's approach to see if I can blend some good ideas in. It could be a good start for an open source solution with more work. – Eric Burel Jul 24 '25 at 17:21
  • 1
    @EricBurel I have updated the extension to support Wayland as well. Due to the more restrictive security permissions on Wayland, it won't auto-insert the transcribed text, but it does provide a preview modal to copy and paste. – k-war Jul 31 '25 at 08:10
  • @k-war There seem to be libraries that work with Wayland. In my example "pynput" worked fine (https://pypi.org/project/pynput/). I've also explored the docs of lower-level tools like "/dev/input" with python-evdev, and there's a Wayland client in Rust (though a Wayland-specific solution should probably be avoided), but I haven't tested those. – Eric Burel Jul 31 '25 at 11:56
  • @EricBurel Thanks! I think regardless of the tool you would need to make a rule to access /dev/uinput with superuser privileges. There's also ydotool, which is a more general xdotool that also works on Wayland. I'll get around to it eventually, but for now the copy/paste is fairly seamless and I'd rather not complicate the extension until I have a better grasp of Wayland. – k-war Aug 03 '25 at 05:53
1

I just wrote a GNOME extension wrapper around openai-whisper that does exactly this: https://github.com/kavehtehrani/gnome-speech2text

k-war
  • 121