I used tesseract to produce the special HTML (hOCR) to use with hocr2pdf, starting from a multi-page TIFF.
I then tried using hocr2pdf to produce a "sandwich PDF" (image + hidden text layer).
However, hocr2pdf produces a one-page PDF with all the pages superimposed.
Is there a way to solve this problem or an alternative solution?
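One workaround that might be worth trying is to process each page separately and join the results afterwards. The sketch below is only an illustration under several assumptions: that ImageMagick's `convert` is available to split the TIFF, that `pdfunite` (from poppler-utils) is available to merge the per-page PDFs, and that hocr2pdf reads the hOCR markup on standard input. The function names `page_commands` and `build_sandwich_pdf` are made up for this example, and it has not been run against the files described above.

```python
import subprocess
from pathlib import Path


def page_commands(page_image):
    """Return the (tesseract, hocr2pdf) command lists for one single-page image."""
    base = str(Path(page_image).with_suffix(""))
    # "tesseract <image> <output base> hocr" writes <base>.hocr
    # (older tesseract releases wrote <base>.html instead).
    ocr = ["tesseract", page_image, base, "hocr"]
    # hocr2pdf takes the page image with -i and the output PDF with -o;
    # the hOCR markup itself is fed in on stdin by the caller.
    pdf = ["hocr2pdf", "-i", page_image, "-o", base + ".pdf"]
    return ocr, pdf


def build_sandwich_pdf(multipage_tiff, output_pdf, workdir="pages"):
    """Split a multi-page TIFF, OCR each page, and merge the per-page PDFs."""
    work = Path(workdir)
    work.mkdir(exist_ok=True)
    # Assumption: ImageMagick writes one file per TIFF frame
    # (page-000.tif, page-001.tif, ...). Zero padding keeps sorted() in page order.
    subprocess.run(
        ["convert", multipage_tiff, str(work / "page-%03d.tif")], check=True
    )
    page_pdfs = []
    for page in sorted(work.glob("page-*.tif")):
        ocr, pdf = page_commands(str(page))
        subprocess.run(ocr, check=True)
        # Pick whichever hOCR file this tesseract version produced.
        candidates = (page.with_suffix(".hocr"), page.with_suffix(".html"))
        hocr = next(p for p in candidates if p.exists())
        with open(hocr, "rb") as markup:
            subprocess.run(pdf, stdin=markup, check=True)
        page_pdfs.append(pdf[-1])
    # Assumption: pdfunite concatenates the inputs in the order given.
    subprocess.run(["pdfunite", *page_pdfs, output_pdf], check=True)
```

Calling `build_sandwich_pdf("scan.tif", "scan.pdf")` (both file names are placeholders) would then leave the intermediate per-page files in `pages/`, which can be deleted once the merged PDF looks right.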
The only thing missing is the ability to tweak the OCR for the inevitable errors... is there any way to open the already OCR'd pdfs in OCRfeeder? I found its interface quite easy to use but I can't seem to get it to detect the existing text layer... I also tried PDFedit but its interface is way too advanced for mortals like me!
– waldyrious Jun 19 '13 at 05:11
-enforcehocr2pdf" parameter? – Pablo Bianchi Mar 08 '17 at 15:18