5

When I run text detection on my JPEG, it correctly marks all the areas where it suspects text and images, but when I export to ODT, it only creates a document with empty text and image frames.

Do I have to configure tesseract somehow?

(I use Ubuntu 14.10, 32-bit)

rubo77
  • 34,212

2 Answers

3

Try this:

Open OCRFeeder.

Edit the engine: Click Tools - OCR Engine

Select the Tesseract engine and click Edit

In the engine arguments field, replace the existing script with this:

$IMAGE $FILE -l eng -psm 3 > /dev/null 2> /dev/null; cat $FILE.txt; rm $FILE $FILE.txt

To export the document click File - Export

Select the desired output format.

If the document contains pictures, I recommend exporting to HTML.

If it contains only text, plain text (TXT) is the best choice.
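
One caveat about the -psm 3 part of the arguments above: as far as I know, Tesseract 3.x (which is what Ubuntu 14.10 ships) takes the single-dash -psm, while Tesseract 4 and later expect --psm. The sketch below builds the arguments line for a given version; the helper names psm_flag and engine_args are made up for illustration and are not part of OCRFeeder or Tesseract.

```shell
#!/bin/sh
# Sketch only: choose the page-segmentation flag spelling for a Tesseract version.
# Assumption: 3.x accepts "-psm", 4.x and later expect "--psm".
psm_flag() {
    # $1 is a version string such as "3.03" or "4.1.1"
    major=${1%%.*}
    if [ "$major" -ge 4 ]; then
        printf '%s\n' '--psm'
    else
        printf '%s\n' '-psm'
    fi
}

# Hypothetical helper: print the OCRFeeder engine-arguments line.
# $1 = language id (e.g. eng), $2 = Tesseract version, $3 = psm mode (default 3)
engine_args() {
    flag=$(psm_flag "$2")
    printf '%s\n' "\$IMAGE \$FILE -l $1 $flag ${3:-3} > /dev/null 2> /dev/null; cat \$FILE.txt; rm \$FILE \$FILE.txt"
}

# Print the line for English on Tesseract 3.03
engine_args eng 3.03
```

You can check the installed version with tesseract --version and paste the printed line into the engine arguments field.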

kyodake
  • 18,075
  • 2
    You just need to set up the engine command line in the OCRFeeder settings, replacing $LANG with -l lang_id, where lang_id is the id shown on the corresponding language package. The lang_ids can be found with apt-get search tesseract-ocr, for example spa = Spanish, fra = French, deu = German, nld = Dutch, ita = Italian, por = Portuguese, ... If you just want to scan in your language, you can stick with $LANG, which is your system language – rubo77 Jul 04 '15 at 21:17
  • 1
    @rubo77 thanks for the hint! But I think you mean apt-cache search – Murmel Oct 14 '15 at 08:20
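
To illustrate the lang_id lookup from the comments: assuming the language packs follow the tesseract-ocr-<lang_id> naming that apt-cache search tesseract-ocr shows, the id is just the package name with that prefix removed. A minimal sketch (the function name lang_id_from_package is made up for this example):

```shell
#!/bin/sh
# Sketch only: derive the Tesseract language id from a package name.
# Assumption: language packs are named tesseract-ocr-<lang_id>.
lang_id_from_package() {
    case "$1" in
        # Strip the common prefix, leaving e.g. "deu" or "spa"
        tesseract-ocr-*) printf '%s\n' "${1#tesseract-ocr-}" ;;
        # Anything else is not a language pack under this assumption
        *) return 1 ;;
    esac
}

# Print the id for the German pack
lang_id_from_package tesseract-ocr-deu
```

Not every package with that prefix is necessarily a language (there may be meta or development packages), so treat the result as a hint. Running tesseract --list-langs should show which languages are actually installed.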
0

I get better results with: $IMAGE $FILE -l eng > /dev/null 2> /dev/null; cat $FILE.txt; rm $FILE $FILE.txt, set under Tools > OCR Engines > Tesseract > Edit > engine arguments.
