dChan - Q Origins Project Archive

Indisputable Proof that OCR is not perfect.

Comey/Corney was not an “intentional” placement.

Search Term: How does Adobe Acrobat OCR work

Adobe Acrobat Export PDF supports optical character recognition. OCR is the conversion of images of text (scanned text) into editable characters, so that you can search, correct, and copy the text. When OCR is enabled, Adobe Acrobat Export PDF performs OCR on PDF files that contain images, vector art, hidden text, or a combination of these elements. (For example, Adobe Acrobat Export PDF performs OCR on PDF files created from scanned documents.) Adobe Acrobat Export PDF also performs OCR on text that it can't interpret because the text was encoded incorrectly in the source application. The OCR engine uses the selected language to interpret the scanned text. Selecting the correct language improves the accuracy of the conversion, as the OCR engine uses language-specific dictionaries for conversion.

Modern OCR uses many techniques to interpret scanned document images. One method that assists this process is spell checking for close matches against a dictionary. That is why some words like "government" are read correctly: when the raw character recognition is slightly off, the dictionary pulls the result back to a known word. "Comey" is a word that is not picked up by the standard English dictionary used in Adobe Acrobat OCR, so a misread like "Corney" has nothing to be corrected against. Even with that help, basic consumer OCR tools like Adobe Acrobat's document processing are not perfect. Sometimes it works, sometimes it doesn't.
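To make the dictionary point concrete, here is a minimal sketch of close-match correction. It is not Adobe's algorithm: Python's standard difflib stands in for whatever matching the OCR engine really does, and the word list, the function name dictionary_correct and the 0.8 cutoff are all made up for illustration.

```python
import difflib

# Hypothetical mini word list; a real OCR dictionary is far larger.
DICTIONARY = ["government", "director", "former", "statement"]


def dictionary_correct(token, dictionary=DICTIONARY, cutoff=0.8):
    """Return the closest dictionary word, or the token unchanged if none is close."""
    lowered = token.lower()
    if lowered in dictionary:
        return token  # already a known word, nothing to fix
    # get_close_matches returns the best candidates scoring at or above `cutoff`.
    matches = difflib.get_close_matches(lowered, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


# "govemment" simulates an rn -> m misread of a dictionary word;
# "Corney" simulates an m -> rn misread of a name the dictionary lacks.
for raw in ["govemment", "Corney"]:
    print(raw, "->", dictionary_correct(raw))
```

In this toy setup the misread "govemment" gets pulled back to "government", while "Corney" passes through untouched because no dictionary entry is close enough to it.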

If you have ever scanned an actual printed document you would know and understand that there are variations in the coloring of the paper. A pure digital white [rgb 255:255:255] is not perfectly “white” when scanned from a hardcopy. In fact, scanned hardcopies have slight variations in coloring and gradient throughout the document. Kerning, line spacing, font variations and document skew also affect how well OCR programs will work. Just because we perceive words with our eyes as identical does not mean that the OCR software does. Couple this with a 150dpi document resolution and high compression, which introduces artifacts and edge blurring around gradient transitions, and a word printed on one part of the page will look slightly different from that same word on another section of the page. Just because it looks clear to your eye does not mean it looks the same to your computer.
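A toy model shows why the same stroke can come out as different pixels. This is only a sketch under simplifying assumptions: one row of pixels, a black stroke on pure white paper, and each scanner pixel taken as the average of the area it covers. The function scan_stroke and its parameters are hypothetical, and no compression or paper noise is modeled.

```python
def scan_stroke(start, stroke_width=20, pixels=8, oversample=10):
    """Area-average a black stroke on white paper onto a coarse pixel row.

    `start` and `stroke_width` are in fine sub-pixel units, with
    `oversample` fine units per output pixel.
    """
    fine = [255] * (pixels * oversample)  # white paper at fine resolution
    for i in range(start, min(start + stroke_width, len(fine))):
        fine[i] = 0  # ink
    row = []
    for p in range(pixels):
        cell = fine[p * oversample:(p + 1) * oversample]
        row.append(sum(cell) // oversample)  # one scanned pixel = mean of its area
    return row


aligned = scan_stroke(start=20)  # stroke lines up with the pixel grid
shifted = scan_stroke(start=25)  # same stroke, moved half a pixel
print(aligned)  # [255, 255, 0, 0, 255, 255, 255, 255]
print(shifted)  # [255, 255, 127, 0, 127, 255, 255, 255]
```

The identical stroke, shifted by half a pixel, turns from two solid black pixels into one black pixel with a gray pixel on each side. Those are the values the OCR engine has to work with, before compression artifacts are even added.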

Even with perfect high resolution sections of a document, the OCR software can and will make mistakes. The Adobe OCR software used on this document is not perfect. It does better with some words than others. It just so happens that “Comey” is one of many words that Adobe OCR has a difficult time with.

As I have said before, I am not disputing that there are coincidences, I am not saying that the “Comey/Corney” and other text anomalies are in any way insignificant to our research, and I am not denying that they give us additional items to look into.

Future proves past after all, however…

If you think this is a slide then do the analysis yourself. Do the due diligence: extract the images yourself, from any page at any resolution you wish, and run Adobe OCR on them. If you refuse to do the research yourself, then you deserve to be ignorant on this topic.

Here IS indisputable proof of this fact: