Using ScanSnap Manager to OCR non-ScanSnap PDFs

I had some PDFs that I wanted to perform optical character recognition (OCR) processing on. I have a Fujitsu ScanSnap and wanted to use the ScanSnap Manager software to do this. The management software checks supplied PDFs and will only perform procession on those which originated using ScanSnap hardware. I wanted to circumvent this and it ended up being easy.

PDFs created with a ScanSnap have the Exif tag “creator” with the model string value. You can use ExifTool by Phil Harvey to print and modify Exif data. For example:

$ exiftool -creator ~/example.pdf
Creator                         : ScanSnap Manager #iX500

The file example.pdf has the correct tag/value pair and will be processed. The next file, covfefe.pdf, does not. You can add/modify the tag to the PDF which did not originate from a ScanSnap.

$ exiftool -creator="ScanSnap Manager #iX500" ~/covfefe.pdf 
    1 image files updated
$ exiftool -creator ~/covfefe.pdf
Creator                         : ScanSnap Manager #iX500

Voila! The ScanSnap Manager software will now process the PDF. You can certainly use free OCR software but I didn’t find any of them to be quite a slick. Plus this was more fun. 🙂

Using ScanSnap Manager to OCR non-ScanSnap PDFs