AUTOMATED PDF DIGGING
This is a method I used to dig through Hillary's emails that you may find useful.
-Requires Linux type Operating System
-Ensure you have pdftotext installed
-Put list of pdf URLs into a file, one URL on each line.
-Save as allpdfs.txt
-Run following command from terminal:
for i in $(grep -E '^htt.*\.pdf$' allpdfs.txt)
;do foo=$(basename $i);wget $i; pdftotext $foo; rm $foo; done
It fetches each pdf listed in allpdfs.txt by its URL, converts it to a text file, and saves it under the pdf's name but with the extension txt.
For example, if a line in the file "allpdfs.txt" contains:
http://some.website.com/a/b/c/d/grab_this.pdf
the output would be
grab_this.txt
which is a text conversion of the pdf document.
The original pdf is deleted to save space (remove the "rm $foo;" part if you want to keep the original pdf too).
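Before running the full loop, you can check what the grep filter will actually keep: it only accepts lines that start with "htt" (so http or https) and end in ".pdf". A quick sketch, using a throwaway file name (sample.txt here stands in for your allpdfs.txt):

```shell
# Throwaway sample list: two good PDF URLs, one bad line, one non-pdf URL
cat > sample.txt <<'EOF'
http://example.com/docs/report.pdf
not-a-url.pdf
https://example.com/files/memo.pdf
http://example.com/page.html
EOF

# Same filter as in the loop: only the two PDF URLs should survive
grep -E '^htt.*\.pdf$' sample.txt
```

Lines with trailing whitespace or query strings (e.g. ...file.pdf?id=1) will be skipped by this pattern, so clean up your URL list first if needed.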
CAVEATS: Obviously pictures are ignored. Conversions may introduce spelling mistakes, especially for scanned pdfs, depending on document clarity.
DIGGING
Now that you have a text version of your pdfs, you can simply search them all with grep:
#basic search:
grep keyword *.txt
#search for a phrase:
grep 'some phrase' *.txt
#case insensitive search (will find keyword, Keyword, kEYWORD, etc):
grep -i keyword *.txt
#consult the grep manual for other options:
man grep
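When digging through many converted files, grep's -l option (list only the names of matching files) is handy for narrowing down which documents to read. A sketch using two throwaway text files (doc1.txt and doc2.txt are illustrations, not real output):

```shell
# Two throwaway files standing in for pdftotext output
printf 'Meeting notes about the pipeline deal.\n' > doc1.txt
printf 'Nothing of interest here.\n' > doc2.txt

# -i ignores case, -l prints only the filenames that contain a match
grep -il 'pipeline' doc1.txt doc2.txt
```

Add -n instead of -l to see matching line numbers inside each file.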