I am not disputing that there are coincidences, nor am I saying that the “Comey/Corney” and other text anomalies are in any way insignificant to our research, nor am I denying that they give us additional items to look into. However… if you think this is a slide, then do the analysis yourself. Do the due diligence and extract the images yourself, any page, any resolution you wish, and run Adobe OCR on the page. If you refuse to do the research yourself, you deserve your ignorance on this topic.
Modern OCR also uses spell check references… Comey is a name that is not in the standard English dictionary used by Adobe Acrobat OCR. That is why dictionary words like “government” come through unchanged. If the person who re-did the OCR of the report, either with better software or with Comey added as a dictionary word, is on here, I would appreciate some support from another print production expert on this topic.
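For anyone who wants to see the dictionary point in miniature, here is a toy Python sketch. It is NOT Adobe's algorithm (that is proprietary); the `CONFUSIONS` table, the function names, and the tiny dictionaries are all made up for illustration. It only shows the general idea: when "rn" and "m" look alike in a blurry scan, a dictionary hit settles the question, and a name missing from the dictionary gets no such help.

```python
# Toy illustration only: this is not how Adobe Acrobat actually works.
from itertools import product

# Hypothetical look-alike table: shapes a low-res scan tends to blur together.
CONFUSIONS = {"rn": ["rn", "m"], "m": ["m", "rn"]}

def candidates(word):
    """Return every spelling reachable by swapping 'rn' and 'm'."""
    tokens = []
    i = 0
    while i < len(word):
        if word.startswith("rn", i):
            tokens.append("rn")
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    options = [CONFUSIONS.get(tok, [tok]) for tok in tokens]
    return ["".join(combo) for combo in product(*options)]

def resolve(raw, dictionary):
    """Prefer a look-alike found in the dictionary; else keep the raw reading."""
    for cand in candidates(raw):
        if cand.lower() in dictionary:
            return cand
    return raw

stock = {"government"}          # stand-in for a stock English dictionary
custom = stock | {"comey"}      # same dictionary with the name added

print(resolve("governrnent", stock))  # misread is pulled back to a dictionary word
print(resolve("Corney", stock))       # no dictionary entry, misread survives
print(resolve("Corney", custom))      # name added, misread gets corrected
```

Add the name to the dictionary and the same blurry input comes out right, which is the whole point about re-running the OCR with Comey added.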
NORMIE VERSION:
Upon a simple analysis of the document, anyone can discover that it was scanned in at 150 dpi with the highest compression setting and no background suppression, on auto color detect, hence the color/grayscale scan :( Even by looking at just the file size, these specific settings can be inferred as well, based on years of experience doing this type of work. As you can see from this sample image taken from page 2 of the report (blue guides @0.1”, grid and black bar above for dpi reference), the baseline image that Acrobat has to work with for OCR looks like hammered ass. And for those of you who say, “That’s not what it looks like on my screen, you liar”: Adobe programs use filters and anti-aliasing to make crappy docs look better on your screen for reading.
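To put numbers on how little the OCR has to work with, here is a quick back-of-the-envelope sketch in Python. The US Letter page size and the 12-point body type are my assumptions (I have not measured the report's actual type size), and the helper names are made up for this post.

```python
def pixels(inches, dpi):
    """Pixels available across a physical length at a given scan resolution."""
    return round(inches * dpi)

def page_pixels(dpi, width_in=8.5, height_in=11.0):
    """Pixel dimensions of a page; US Letter is assumed by default."""
    return pixels(width_in, dpi), pixels(height_in, dpi)

POINT = 1 / 72  # one typographic point in inches

for dpi in (150, 300, 600):
    width, height = page_pixels(dpi)
    guide = pixels(0.1, dpi)          # the 0.1" guide spacing in the sample image
    body = pixels(12 * POINT, dpi)    # full height of 12 pt type (assumed size)
    print(f"{dpi} dpi: page {width}x{height} px, 0.1in = {guide} px, 12 pt = {body} px")
```

At 150 dpi the full height of an assumed 12 pt line is only a couple dozen pixels, before compression artifacts chew on it, so the gap that separates "rn" from "m" comes down to a pixel or two.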
MORE DETAILS:
Because of the way they scanned this document, at 150 dpi with high compression, these anomalies are neither a conspiracy nor intentional. However, they can still be used as a reference to Q posts and do indeed seem relevant; as we have found repeatedly, future proves past.
Here are the FACTS about scanning documents from a print production specialist. I’m an old grey beard at this point and I’m tired of seeing ignorant people harp on this, bread after bread, as if this were a conspiracy or done intentionally.
There is no "strange font substitution" within this report, period, full stop. OCR mis-identification of text is very common when the scanned image is low in resolution and full of compression artifacts, or when words are not in the dictionary of the OCR software used. A better source for OCR conversion would be a 300+ dpi, uncompressed image. But at 476 pages the file would be much larger and not appropriate for download distribution, if done with the auto color detect settings they used.
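The file size problem is easy to rough out. The Python sketch below computes the RAW, uncompressed pixel data for the 476 pages mentioned above, assuming US Letter pages and 3 bytes per pixel (24-bit color) on every page. Those are my assumptions, and a real PDF is compressed and far smaller, so treat the numbers as an upper bound that only shows how size scales with resolution.

```python
def raw_bytes(pages, dpi, width_in=8.5, height_in=11.0, channels=3):
    """Uncompressed pixel data for a scan job (upper bound, no compression)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return pages * width_px * height_px * channels

PAGES = 476  # page count as stated in this post

for dpi in (150, 300, 600):
    gigabytes = raw_bytes(PAGES, dpi) / 1e9
    print(f"{dpi} dpi: about {gigabytes:.1f} GB uncompressed")

# Doubling the dpi quadruples the pixel count, and the raw size with it.
print(raw_bytes(PAGES, 300) / raw_bytes(PAGES, 150))
```

Every doubling of resolution means four times the data before compression, which is why whoever ran the scan reached for 150 dpi and maximum compression.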
Additionally, the IG report, in this case, was printed out, comb bound, and then manually redacted with a permanent marker (see page 12 for the comb bound left edge). This method for redactions and distribution is VERY common and obvious to anyone who has looked over documents released by our government. That the comb bind scanning artifact is not on every page is not uncommon either. Typically when scanning a comb bound document you would specify about a 0.4" or 0.5" scanning margin to eliminate this from appearing. However, as these documents are scanned using mechanical means (i.e. from a Xerox copier) and this document is large, the pages will not always scan in perfectly aligned, and even less so if fed comb bound edge first. Slight shifting of the page edge start does occur, which is usually fixed up a bit when document processing and OCR are run.
Typically, as my OCD kicks in, for my customers I would have done 4 separate scans: at both 300 dpi and 600 dpi, with less compression, one set of the color pages scanned in color with no background suppression and one set of the B/W pages scanned in B/W with background suppression. I would have then combined the scan sets into 2 different PDFs, one for download and one for print/OCR. Both would be good for OCR; the higher the res, the more accurate the OCR. Mind you, preparing the documents this way is a bit more technical and requires a few extra steps, but it produces very reasonable file sizes of MUCH better quality.
This is the government. Does anyone think that they would take the time to perform these extra steps to get you autist grade OCR results… or would they just toss it in a copier, set it to auto detect color, 150 dpi, highest compression, push the green button, look at it digitally, click Optimize Scanned PDF, then upload?
The ignorant people who harp on this, bread after bread, as if this were a conspiracy or done intentionally, or who claim that this is gaslighting, have not done any image analysis and have no understanding of how modern OCR works… and that’s a level of stupid nobody can fix.