I did some quick testing with pytesseract and had mixed results. I didn't spend much time tweaking it. I ran it first without any training, then with training data for the Impact font (a lot of meme images seem to use it). I didn't get clean output either way.
For anyone interested in trying this, you will need to install Docker.
Once Docker is installed:
Copy into Dockerfile:
=
FROM python:3.7
RUN apt-get update \
&& apt-get install tesseract-ocr -y
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
=
Copy into requirements.txt:
=
pillow
pytesseract
opencv-python
=
Copy into ocr1.py:
=
from PIL import Image
import pytesseract
print(pytesseract.image_to_string(Image.open('memes/test1.jpeg')))
=
Then build the Docker image:
docker build . -t ocr:latest
Then start a container from the image with your code directory mounted (this lets you make code changes without having to rebuild the image):
docker run -v ~/code/pytesseract:/app -it ocr:latest bash
NOTE: My code lives in ~/code/pytesseract; change this path if your code is somewhere else.
This drops you into a shell inside the container, where you can run your code:
python ocr1.py
Because your code directory is mounted, any images under it are visible inside the container. I keep mine in a 'memes' subdir. Any PNG or JPG image you copy there is available for the OCR to run against.
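If you want to run the OCR over everything in that directory instead of one hard-coded file, something like this might work. It's only a sketch: the file name ocr_all.py, the helper names and the extension list are my own assumptions. The actual OCR call is the same pytesseract.image_to_string(Image.open(...)) used in ocr1.py, imported lazily so the file-listing part can be tried on its own.

```python
# ocr_all.py (hypothetical file name): run OCR on every image under memes/
from pathlib import Path

# Assumed set of extensions; adjust to whatever you keep in the directory.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def find_images(directory):
    """Return image files directly inside `directory`, sorted by path."""
    directory = Path(directory)
    if not directory.is_dir():
        return []  # nothing to do if the directory is missing
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def default_ocr(path):
    """Same call as ocr1.py; imported here so the helpers load without it."""
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(path))


def ocr_directory(directory="memes", ocr=default_ocr):
    """Map each image file name to the text that `ocr` extracts from it."""
    return {p.name: ocr(p) for p in find_images(directory)}


if __name__ == "__main__":
    for name, text in ocr_directory().items():
        print(f"--- {name} ---")
        print(text)
```

Run it inside the container the same way as ocr1.py (python ocr_all.py). The `ocr` argument is only there so you can swap in a different OCR call later without touching the loop.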
This file:
http://ahijackedlife.com/wp-content/gallery/qanon-memes-3f/Anderson-its-OK.jpg
resulted in output:
ANDERSON, IT'S
CE
WE GOT THIS= WELL
Aa
IS A"CONSPIRACY THEORY”
And with the Impact font training data, the output was:
ANpERSoNl lTS
YGoNEBEoK
WEGoTATHlSNWEfLL
TELL lEM THAT THE sToRM
lsAUcoNSPlRAcv THEonU
Larger example here:
https://www.pyimagesearch.com/2017/07/10/using-tesseract-ocr-python/
Free online font training:
http://trainyourtesseract.com/
If you train a font, copy the resulting .traineddata file into your code directory on the host, then, from inside the container, copy it to the Tesseract data directory (the path may differ with your Tesseract version):
cp training/Impact.traineddata /usr/share/tesseract-ocr/4.00/tessdata/.
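Once the file is in the tessdata directory, you still have to tell pytesseract to use it. As far as I know this is done with the lang argument of image_to_string, where the language name is the traineddata file name without the extension. Here is a rough sketch; the file name ocr_impact.py and both helper functions are made up for illustration, and the `ocr` parameter exists only so the wrapper can be tried without Tesseract installed.

```python
# ocr_impact.py (hypothetical file name): OCR with a custom traineddata file
from pathlib import Path


def lang_from_traineddata(path):
    """Impact.traineddata -> 'Impact' (the name Tesseract expects for lang)."""
    return Path(path).stem


def ocr_with_language(image_path, lang="Impact", ocr=None):
    """Run OCR on `image_path` using the given Tesseract language name."""
    if ocr is None:
        # Imported lazily so this file loads even without pytesseract.
        from PIL import Image
        import pytesseract

        def ocr(path, lang):
            # lang selects <lang>.traineddata from the tessdata directory
            return pytesseract.image_to_string(Image.open(path), lang=lang)

    return ocr(image_path, lang)


# Example call inside the container (assumes the test image from ocr1.py):
#   lang = lang_from_traineddata("training/Impact.traineddata")
#   print(ocr_with_language("memes/test1.jpeg", lang=lang))
```

If Tesseract complains that it can't load the language, the traineddata file is probably not in the directory it is looking in.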
I hope this is readable!