Anonymous ID: 7306d4 Jan. 19, 2020, 3 p.m. No.7857304   >>7332 >>1582

I did some quick testing with pytesseract and got mixed results. I didn't spend much time tweaking it. I tried running it without any training, then with training data for the Impact font (a lot of meme images seem to use it). I didn't get clean output either way.

 

For anyone interested in trying this, you will need to install docker.

After you have docker installed:

Copy into Dockerfile:

=

FROM python:3.7

RUN apt-get update \
    && apt-get install tesseract-ocr -y

WORKDIR /app

COPY . /app

RUN pip install -r requirements.txt

=

 

Copy into requirements.txt:

=

pillow
pytesseract
opencv-python

=

 

Copy into ocr1.py:

=

from PIL import Image
import pytesseract

print(pytesseract.image_to_string(Image.open('memes/test1.jpeg')))

=

 

Then build the docker image:

docker build . -t ocr:latest

 

Then run the docker image and mount your code directory (this will allow you to make code changes without having to rebuild the image):

docker run -v ~/code/pytesseract:/app -it ocr:latest bash

 

NOTE: My code lives in ~/code/pytesseract; change this path if your code is somewhere else.

 

This will bring you into your docker container and you can run your code in there:

python ocr1.py

 

Under your code directory you can access your images. I have a 'memes' subdir. Whatever png or jpg image you copy there should be accessible for the OCR to run against.
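
If you drop several images into that subdir, a loop saves you from editing the filename each time. This is only a rough sketch: the names find_images and ocr_folder are made up for illustration, and it assumes the folder is called 'memes' like mine. The OCR imports sit inside the function so the file listing part works on its own.

```python
from pathlib import Path

# Extensions treated as images; extend this set if you use other formats.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def find_images(folder):
    """Return the image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def ocr_folder(folder="memes", lang=None):
    """Run OCR on every image in `folder` and print the text per file."""
    # Imported here so find_images can be used without the OCR stack installed.
    from PIL import Image
    import pytesseract

    for path in find_images(folder):
        text = pytesseract.image_to_string(Image.open(path), lang=lang)
        print(f"--- {path.name} ---")
        print(text)
```

Save it as something like ocr2.py, add a call to ocr_folder() at the bottom, and run it inside the container the same way as ocr1.py.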

 

This file:

http://ahijackedlife.com/wp-content/gallery/qanon-memes-3f/Anderson-its-OK.jpg

 

resulted in output:

ANDERSON, IT'S

CE

 

 

WE GOT THIS= WELL

Aa

IS A"CONSPIRACY THEORY”

 

And with the Impact font training data, the output was:

ANpERSoNl lTS

YGoNEBEoK

 

 

WEGoTATHlSNWEfLL

TELL lEM THAT THE sToRM

lsAUcoNSPlRAcv THEonU
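
To put a number on how far off each run is, one option is to type out the caption by hand and compare it against the OCR output. The sketch below uses difflib from the standard library. The function names normalize and similarity are my own, and uppercasing everything is an assumption that forgives case mix-ups like the ones in the Impact output, which may or may not be what you want.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Uppercase the text and collapse every run of whitespace to one space."""
    return " ".join(text.upper().split())


def similarity(ocr_text, expected_text):
    """Return a ratio from 0.0 to 1.0; 1.0 means the normalized texts match."""
    return SequenceMatcher(
        None, normalize(ocr_text), normalize(expected_text)
    ).ratio()
```

Calling similarity() once with the untrained output and once with the Impact output, both against the same hand-typed caption, should show which run got closer.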

 

Larger example here:

https://www.pyimagesearch.com/2017/07/10/using-tesseract-ocr-python/

 

Free online font training:

http://trainyourtesseract.com/

 

If you train fonts, copy your trained data to your native OS code directory, then in your docker environment, copy it to the tesseract data directory inside of docker:

cp training/Impact.traineddata /usr/share/tesseract-ocr/4.00/tessdata/.

 

I hope this is readable!

Anonymous ID: 7306d4 Jan. 19, 2020, 3:04 p.m. No.7857332   >>8419

>>7857304

I forgot: to use font training data, pass the 'lang' argument to the image_to_string function.

For example, lang="Impact" will use the training data file Impact.traineddata that was copied in the last step above.

 

Full training data example usage:

print(pytesseract.image_to_string(Image.open('memes/test1.jpeg'), lang="Impact"))
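
If you rebuild or restart the container and forget the cp step, the trained file will be missing from the tessdata directory. A small guard can fall back to plain English in that case. This is a sketch only: pick_lang and ocr_with_font are hypothetical helper names, the tessdata path is the one from the cp command above, and "eng" as the fallback assumes the English data that normally ships with the tesseract-ocr package.

```python
from pathlib import Path

# Same directory the cp command above copies Impact.traineddata into.
TESSDATA_DIR = Path("/usr/share/tesseract-ocr/4.00/tessdata")


def pick_lang(preferred="Impact", fallback="eng", tessdata_dir=TESSDATA_DIR):
    """Return `preferred` if its .traineddata file exists, else `fallback`."""
    if (Path(tessdata_dir) / f"{preferred}.traineddata").is_file():
        return preferred
    return fallback


def ocr_with_font(image_path, preferred="Impact"):
    """OCR one image, using the trained font data only when it is installed."""
    # Imported here so pick_lang can be checked without the OCR stack.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(
        Image.open(image_path), lang=pick_lang(preferred)
    )
```

For example, print(ocr_with_font('memes/test1.jpeg')) would use the Impact data when it is in place and the default English data otherwise.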