dChan - Q Origins Project Archive

Anonymous ID: 9a9d5c Dec. 10, 2019, 10:39 a.m. No.7473717 >>3731 >>3753

I am not disputing that there are coincidences, nor am I saying that the “Comey/Corney” and other text anomalies are insignificant to our research, nor am I denying that they give us additional items to look into. However… if you think this is a slide, then do the analysis yourself. Do the due diligence: extract the images yourself, any page, at any resolution you wish, and run Adobe OCR on the page. If you refuse to do the research yourself you deserve your ignorance on this topic.

Modern OCR also uses spell check references… Comey is a name that is not picked up by the standard English dictionary used in Adobe Acrobat OCR. That is why other words like “government” are not changed. If the person that re-did the OCR of the report, either using better software or with Comey added as a dictionary word in the OCR software, is on here, I would appreciate some support from another print production expert on this topic.

NORMIE VERSION:

Upon a simple analysis of the document, anyone can discover that it was scanned in at 150 dpi with the highest compression setting and no background suppression, on auto color detect, hence the color/grayscale scan :( Even by looking at just the file size, these specific settings can be inferred as well, based on years of experience doing this type of work. As you can see from this sample image taken from page 2 of the report (blue guides @0.1”, grid and black bar above for dpi reference), the baseline image that Acrobat has to work with for OCR looks like hammered ass. And for those of you that say, “That’s not what it looks like on my screen you liar.” Adobe programs use filters and anti-aliasing to make crappy docs look better on your screen for reading.
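For anyone who wants numbers behind the 150 dpi complaint, here is a rough back-of-the-envelope sketch. The 12 pt body text and the 8.5" page width are my assumptions for illustration, not measurements from the report, and the function names are made up for this example.

```python
POINTS_PER_INCH = 72  # standard typographic points per inch


def glyph_height_px(point_size, dpi):
    """Approximate pixel height of the type body at a given scan resolution."""
    return point_size * dpi / POINTS_PER_INCH


def scan_dpi(pixel_width, page_width_in):
    """Infer scan resolution from an extracted image width and the physical page width."""
    return pixel_width / page_width_in


# Assumed 12 pt body text (hypothetical, not measured from the report).
for dpi in (150, 300, 600):
    print(dpi, "dpi ->", glyph_height_px(12, dpi), "px tall for 12 pt type")

# A page image 1275 px wide on an assumed 8.5 inch page works out to 150 dpi.
print(scan_dpi(1275, 8.5))
```

At 150 dpi the whole type body is about 25 px tall, and lowercase letters are typically only around half of that, so the gap that separates "rn" from "m" can be a pixel or less before compression even touches it.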

MORE DETAILS:

The anomalies come from the way they scanned this document, at 150 dpi with high compression. It is not a conspiracy, nor was it done intentionally. However, these anomalies can still be used as a reference to Q posts and do indeed seem relevant; as we have found repeatedly, future proves past.

Here are the FACTS about scanning documents from a print production specialist. I’m an old grey beard at this point and I’m tired of seeing ignorant people harp on this, bread after bread, as if this were a conspiracy or done intentionally.

There is no "strange font substitution" within this report, period, full stop. OCR mis-identification of text is very common when the scanned image is low in resolution and full of compression artifacts, or when words are not in the dictionary of the OCR software used. A better source for OCR conversion would be a 300+ dpi, non-compressed image format. But at 476 pages the file would be much larger and not appropriate for download distribution, if done with the auto color detect settings they used.
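To put a rough number on "much larger", here is a quick sketch of the raw, uncompressed pixel data for 476 pages. It assumes US Letter pages (8.5" x 11") and 24-bit color on every page, which are my assumptions for illustration; the real PDF is far smaller because of compression, but the 4x ratio between 150 and 300 dpi holds either way.

```python
def raw_scan_bytes(pages, dpi, width_in=8.5, height_in=11.0, bytes_per_pixel=3):
    """Uncompressed size in bytes of a scan set.

    Assumes every page is the same size and stored as 24-bit color
    (3 bytes per pixel). Both defaults are illustrative assumptions.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return pages * width_px * height_px * bytes_per_pixel


for dpi in (150, 300):
    gb = raw_scan_bytes(476, dpi) / 1e9
    print(f"{dpi} dpi: about {gb:.1f} GB uncompressed")
```

Doubling the resolution quadruples the pixel count, so going from 150 to 300 dpi takes the raw data from roughly 3 GB to roughly 12 GB before any compression is applied.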

Additionally, the IG report, in this case, was printed out, comb bound, and then manually redacted with a permanent marker (see page 12 for the comb bound left edge). This method for redactions and distribution is VERY common and obvious to anyone who has looked over documents released by our government. That the comb bind scanning artifact is not on every page is not unusual either. Typically when scanning a comb bound document you would specify about a 0.4" or 0.5" scanning margin to keep the binding from appearing. However, as these documents are scanned using mechanical means (i.e. from a Xerox copier) and this document is large, the pages will not always scan in perfectly aligned, and even less so if they are fed comb bound edge first. Slight shifting of where the page edge starts does occur, which is usually fixed up a bit when document processing and OCR are run.

Typically, as my OCD kicks in, for my customers I would have done 4 separate scans: one pass at 300 dpi and one pass at 600 dpi, both with less compression, each split into a set of the color pages scanned in color with no background suppression and a set of the B/W pages scanned in B/W with background suppression. I would have then combined the scan sets into 2 different PDFs, one for download and one for print/OCR. Both would be good for OCR, and the higher the res, the more accurate the OCR. Mind you, preparing the documents this way is a bit more technical and requires a few extra steps, but it produces very reasonable file sizes at MUCH better quality.

This is the government. Does anyone think that they would take the time to perform these extra steps to get you autist grade OCR results… or would they just toss it in a copier, set it to auto detect color, 150dpi, highest compression, push green button, look at it digitally, click Optimize Scanned PDF then upload?

The ignorant people who harp on this, bread after bread, as if this were a conspiracy or done intentionally, or who claim that this is gaslighting, have not done any image analysis and have no understanding of how modern OCR works… and that’s a level of stupid nobody can fix.

Anonymous ID: 9a9d5c Dec. 10, 2019, 10:42 a.m. No.7473734 >>3878

Adobe Acrobat OCR uses spell check… Comey is a name that is not picked up by the standard English dictionary used in Adobe Acrobat OCR. That is why other words like “government” are not changed.

If the person that re-did the OCR of the report, either using better software or with Comey added as a dictionary word in the OCR software, is on here, I would appreciate some support from another print production expert on this topic.

I am not disputing that there are coincidences, nor am I saying that the “Comey/Corney” and other text anomalies are insignificant to our research, nor am I denying that they give us additional items to look into. However… if you think this is a slide, then do the analysis yourself. Do the due diligence: extract the images yourself, any page, at any resolution you wish, and run Adobe OCR on the page. If you refuse to do the research yourself you deserve your ignorance on this topic.

>>7473728

Adobe Acrobat OCR uses spell check… Comey is a name that is not picked up by the standard English dictionary used in Adobe Acrobat OCR. That is why other words like “government” are not changed.

If the person that re-did the OCR of the report, either using better software or with Comey added as a dictionary word in the OCR software, is on here, I would appreciate some support from another print production expert on this topic.

I am not disputing that there are coincidences, nor am I saying that the “Comey/Corney” and other text anomalies are insignificant to our research, nor am I denying that they give us additional items to look into. However… do the due diligence: extract the images yourself, any page, at any resolution you wish, and run Adobe OCR on the page.

The Comey/Corney and other deviations occur because OCR also uses spell check references… Comey is a name that is not picked up by the standard English dictionary used in Adobe Acrobat OCR. The OCR software compares what it thinks are words against a dictionary file and makes a best guess. However, if the document is of sufficient resolution, such deviations do not occur.
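Here is a toy sketch of the dictionary-assisted best guess described above. To be clear, this is NOT Adobe's actual algorithm, which I have not seen; the function name, the candidate readings, and the image scores are all made up for illustration. The idea: the recognizer proposes several readings of a blurry word with a score for each, a reading found in the dictionary wins, and if none is found the software falls back on whatever the image alone looks most like.

```python
def pick_reading(candidates, dictionary):
    """Choose one reading for a word from (text, image_score) candidates.

    A candidate found in the dictionary is preferred. If none is found,
    fall back to the candidate with the highest image score alone.
    """
    known = [c for c in candidates if c[0].lower() in dictionary]
    pool = known if known else candidates
    return max(pool, key=lambda c: c[1])[0]


# Hypothetical dictionary with no proper names in it.
english = {"government", "report", "director"}

# Hypothetical scores for a low-resolution scan where "m" and "rn" blur together.
print(pick_reading([("govemment", 0.55), ("government", 0.45)], english))
print(pick_reading([("Corney", 0.55), ("Comey", 0.45)], english))

# Same scores, but with the name added to the dictionary.
print(pick_reading([("Corney", 0.55), ("Comey", 0.45)], english | {"comey"}))
```

With these made-up scores, "government" is rescued by the dictionary while the name comes out as "Corney", and adding "comey" to the dictionary flips it back to "Comey". A cleaner, higher resolution image would give the right reading the higher score to begin with, so the dictionary would never need to break the tie.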

The baseline image that Acrobat has to work with for OCR looks like hammered ass. And for those of you that say, “That’s not what it looks like on my screen you liar.” Adobe programs use filters and anti-aliasing to make crappy docs look better on your screen for reading. Save the file as an image using auto detect resolution, which saves the document as an image at document resolution, and you will see what I mean.

>>7473878

That would be a journalist that has auto-correct on and doesn't proofread their work. C'mon, do you honestly think journalists these days even care anymore to proofread or fact check their work… lolz

Anonymous ID: 9a9d5c Dec. 10, 2019, 11:10 a.m. No.7473933 >>3940 >>3947

>>7473878

If you type Comey, even here, depending on your browser, it will show up as an incorrectly spelled word. Pair that with older or even newer spell check auto-correct: if Comey is not added to the dictionary, what do you think auto-correct will do? "Comey" → "Corney"

>>7473923

It really just sounds like you haven't done the research yourself. I am a graphic designer/print production specialist and have all the software to do my job exceptionally well. I extracted the images myself and I also understand how OCR software uses a dictionary to match words when best guessing crappy images. And Comey is seen as incorrect as it specifically is not in the dictionary used by Adobe Acrobat OCR, so it defaults to just looking at the image and takes a guess. Do the work yourself like I did this morning and you would know that this is TRUTH.

>>7473942

True, but do journalists even check their work anymore, for facts let alone spelling? What do you think auto-correct does to Comey if it is not in its dictionary? If you have it set on auto change… you think just maybe it would find a close word match and change what it thinks is wrong to what it thinks is correct? C'mon, auto-correct and texting is a bitch at times, as is having auto-correct set in Word. I spell check after, not on the fly, but auto-correct is enabled by default in many word processing programs.

>>7474002

Yes I absolutely can do this, I'll put together a poorly scanned document using words that are not in the Adobe OCR dictionary and upload it…. You can do the OCR yourself to prove this. I'll post it as soon as I get a chance

Anonymous ID: 9a9d5c Dec. 10, 2019, 11:29 a.m. No.7474069 >>4077 >>4121

>>7474004

>>7474002

can you duplicate this phenomena with another made up word that OCR shouldnt recognize also and post the results?

Yes, I will produce a poorly scanned document image and you will be able to see this phenomenon replicated directly. Hell, do the OCR yourself when I post it

Perhaps then you will not be an id10t, 6d8eed?

Anonymous ID: 9a9d5c Dec. 10, 2019, 11:31 a.m. No.7474091 >>4123

>>7474077

No not insulting you so much as, if I ran the doc you wouldn't believe me. But I'll post both the image and the converted PDF with OCR run so you can see 4 yourself faggot