I had some PDFs that I wanted to perform optical character recognition (OCR) processing on. I have a Fujitsu ScanSnap and wanted to use the ScanSnap Manager software to do this. The management software checks supplied PDFs and will only perform procession on those which originated using ScanSnap hardware. I wanted to circumvent this and it ended up being easy.
PDFs created with a ScanSnap have the Exif tag “creator” with the model string value. You can use ExifTool by Phil Harvey to print and modify Exif data. For example:
$ exiftool -creator ~/example.pdf Creator : ScanSnap Manager #iX500
The file example.pdf has the correct tag/value pair and will be processed. The next file, covfefe.pdf, does not. You can add/modify the tag to the PDF which did not originate from a ScanSnap.
$ exiftool -creator="ScanSnap Manager #iX500" ~/covfefe.pdf 1 image files updated $ exiftool -creator ~/covfefe.pdf Creator : ScanSnap Manager #iX500
Voila! The ScanSnap Manager software will now process the PDF. You can certainly use free OCR software but I didn’t find any of them to be quite a slick. Plus this was more fun. 🙂