How to read a PDF with Tesseract?


#1

Hello, I need to develop a solution that reads the contents of a PDF via OCR. I saw that Tesseract can do the recognition, but it only reads images. Does anyone know how I can convert a PDF to images and feed them to Tesseract?

Thanks
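One way to do the conversion, sketched in Python under a few assumptions: the `pdftoppm` tool from poppler-utils and the `tesseract` command-line program are installed and on PATH, and the helper names (`pdftoppm_cmd`, `tesseract_cmd`, `ocr_pdf`) are hypothetical. Each page is rendered to a PNG (300 DPI is a common choice for OCR, not a requirement), and each image is then passed to Tesseract.

```python
import subprocess
import tempfile
from pathlib import Path


def pdftoppm_cmd(pdf_path: str, out_prefix: str, dpi: int = 300) -> list[str]:
    """Command that renders every PDF page to a PNG file (poppler-utils)."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def tesseract_cmd(image_path: str, lang: str = "eng") -> list[str]:
    """Command that OCRs one image and prints the recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_pdf(pdf_path: str, dpi: int = 300, lang: str = "eng") -> str:
    """Render each page to an image, OCR it, and join the per-page text."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = str(Path(tmp) / "page")
        subprocess.run(pdftoppm_cmd(pdf_path, prefix, dpi), check=True)
        # pdftoppm writes page-1.png, page-2.png, ... (zero-padded for longer
        # documents), so sort numerically on the page-number suffix.
        images = sorted(
            Path(tmp).glob("page-*.png"),
            key=lambda p: int(p.stem.rsplit("-", 1)[1]),
        )
        texts = []
        for image in images:
            result = subprocess.run(
                tesseract_cmd(str(image), lang),
                check=True, capture_output=True, text=True,
            )
            texts.append(result.stdout)
    # Separate pages with a form feed so page boundaries are not lost.
    return "\f".join(texts)
```

Usage would be something like `print(ocr_pdf("scan.pdf", lang="eng"))`; the language code passed to `-l` only works if the matching Tesseract language data is installed.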


#2

Text is normally stored in a PDF as text, so you can extract it directly without OCR, unless of course the text itself is an image.
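If the PDF does carry a text layer, it can be read without any OCR. A minimal sketch, assuming poppler-utils' `pdftotext` is installed; the helper names `pdftotext_cmd` and `extract_text_layer` are hypothetical:

```python
import subprocess


def pdftotext_cmd(pdf_path: str) -> list[str]:
    """Command that prints a PDF's embedded text layer (poppler-utils)."""
    # "-layout" keeps the physical page layout; the trailing "-" means stdout.
    return ["pdftotext", "-layout", pdf_path, "-"]


def extract_text_layer(pdf_path: str) -> str:
    """Return the text stored in the PDF itself; no OCR is involved."""
    result = subprocess.run(
        pdftotext_cmd(pdf_path), check=True, capture_output=True, text=True
    )
    return result.stdout
```

For a scanned PDF this typically returns little or nothing, because the pages are only images.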


#3

In case the PDF is a scan, I still need to get the contents through OCR.
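A way to decide when OCR is needed, as a sketch: extract the text layer with `pdftotext` first and treat pages with almost no text as scans. This assumes pdftotext's default behavior of writing a form feed character after each page; the function name `pages_needing_ocr` and the 20-character threshold are hypothetical and would need tuning on real documents.

```python
def pages_needing_ocr(text_layer: str, min_chars: int = 20) -> list[int]:
    """Return 1-based page numbers whose extracted text looks empty.

    Assumes text_layer is pdftotext output with its default page breaks,
    i.e. a form feed character written after every page.
    """
    pages = text_layer.split("\f")
    if pages and pages[-1] == "":
        pages.pop()  # drop the empty piece after the final form feed
    # A page with (almost) no extractable text is probably just an image.
    return [
        number
        for number, text in enumerate(pages, start=1)
        if len(text.strip()) < min_chars
    ]
```

Only the pages returned here then need to go through Tesseract; `pdftoppm` can render a single page with `-f N -l N`.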