Cross posting from /pol/ but of interest here.
A number of Clinton emails were released on 1 and 5 March on the FOIA website:
https:// foia.state.gov/Search/results.aspx?searchText=*&caseNumber=F-2016-07895
I clicked on each and copied the URLs, a list of which can be found here:
https:// ghostbin.com/paste/k4227
If you want to dig through them, you can use the method I made to speed things up:
AUTOMATED PDF DIGGING
This is a method I used to dig through Hillary's emails, which you may find useful.
-Requires Linux type Operating System
-Ensure you have pdftotext installed
-Put list of pdf URLs into a file, one URL on each line.
-Save as allpdfs.txt
-Run following command from terminal:
for i in $(grep -E '^htt.*\.pdf$' allpdfs.txt); do foo=$(basename $i); wget $i; pdftotext -r 300 $foo; rm $foo; done
This fetches each pdf listed in allpdfs.txt by its URL, converts it to a text file (at 300dpi), and saves it under the name of the pdf, but with the extension txt.
For example, if a line in the file "allpdfs.txt" contains:
http:// some.website.com/a/b/c/d/grab_this.pdf
the output would be
grab_this.txt
which is a text conversion of the pdf document.
The original pdf is deleted to save space (remove the "rm $foo;" part if you want to keep the original pdf too).
CAVEATS: Obviously pictures are ignored. Conversions may contain spelling mistakes, especially for scanned pdfs, and the quality depends on document clarity.
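If the one-liner gets interrupted halfway through a long list, a slightly more defensive version can help. This is only a sketch of the same idea, not a tested replacement: the function names txt_name_for_url and fetch_and_convert are made up for illustration, and it assumes wget and pdftotext are on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: name of the text file produced for a given pdf URL.
txt_name_for_url() {
    printf '%s\n' "$(basename "$1" .pdf).txt"
}

# Hypothetical helper: fetch one pdf, convert it to text, delete the pdf.
# URLs whose text file already exists are skipped, so a run can be resumed.
fetch_and_convert() {
    url=$1
    pdf=$(basename "$url")
    txt=$(txt_name_for_url "$url")
    [ -e "$txt" ] && return 0
    if ! wget -q "$url" -O "$pdf"; then
        echo "download failed: $url" >&2
        rm -f "$pdf"
        return 1
    fi
    # Only delete the pdf if the conversion succeeded.
    pdftotext -r 300 "$pdf" "$txt" && rm -f "$pdf"
}

# Usage (not run here):
#   grep -E '^htt.*\.pdf$' allpdfs.txt | while read -r url; do fetch_and_convert "$url"; done
```

Reading the URLs with "while read" instead of "for" avoids problems if a URL contains characters the shell would otherwise expand.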
DIGGING
Now that you have text versions of your pdfs, you can simply search them all with grep:
#basic search:
grep keyword *txt
#search for a phrase:
grep 'some phrase' *txt
#case insensitive search (will find keyword, Keyword, kEYWORD, etc):
grep -i keyword *txt
#consult grep manual for other options:
man grep
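With a few thousand text files it can be handy to see which ones mention a keyword most often. A rough sketch, assuming all the txt files are in the current directory; rank_hits is a made-up name, and note that grep -c counts matching lines, not individual occurrences.

```shell
#!/bin/sh
# Hypothetical helper: list text files that mention a keyword, most hits first.
# /dev/null is added so grep always prints "file:count", even for a single file.
rank_hits() {
    grep -ic -- "$1" *.txt /dev/null | grep -v ':0$' | sort -t: -k2,2nr
}

# Usage (not run here):
#   rank_hits keyword
```

Each output line has the form filename:count, so the first few lines are the files worth opening first.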
Reversing to find the URL
If you have found a match (say for keyword "B1" indicating classified emails), and want to know what the original URL was:
for i in $(grep -l B1 *.txt); do j=$(basename $i .txt); grep $j allpdfs.txt >>B1sources.txt; done
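The same lookup can be wrapped up for any keyword. Again just a sketch under the same assumptions (txt files and allpdfs.txt in the current directory); sources_for is a name invented here. It matches on "/name.pdf" as a fixed string so that a short file name cannot accidentally match part of a longer URL.

```shell
#!/bin/sh
# Hypothetical helper: print the original URLs of every text file
# that contains the given keyword.
sources_for() {
    grep -l -- "$1" *.txt | while read -r txt; do
        # Map e.g. C06160560.txt back to the line ending in /C06160560.pdf
        grep -F -- "/$(basename "$txt" .txt).pdf" allpdfs.txt
    done
}

# Usage (not run here):
#   sources_for B1 > B1sources.txt
```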
Classified emails I've found in the current set:
https:// foia.state.gov/searchapp/DOCUMENTS/Litigation_F-2016-07895_34/F-2016-07895/DOC_0C06160560/C06160560.pdf
https:// foia.state.gov/searchapp/DOCUMENTS/Litigation_F-2016-07895_34/F-2016-07895/DOC_0C06160696/C06160696.pdf
https:// foia.state.gov/searchapp/DOCUMENTS/Litigation_F-2016-07895_34/F-2016-07895/DOC_0C06160703/C06160703.pdf
https:// foia.state.gov/searchapp/DOCUMENTS/Litigation_F-2016-07895_34/F-2016-07895/DOC_0C06160705/C06160705.pdf
https:// foia.state.gov/searchapp/DOCUMENTS/Litigation_F-2016-07895_34/F-2016-07895/DOC_0C06160713/C06160713.pdf
https:// foia.state.gov/searchapp/DOCUMENTS/Litigation_F-2016-07895_33/F-2016-07895/DOC_0C06160798/C06160798.pdf
https:// foia.state.gov/searchapp/DOCUMENTS/Litigation_F-2016-07895_33/F-2016-07895/DOC_0C06160818/C06160818.pdf
https:// foia.state.gov/searchapp/DOCUMENTS/Litigation_F-2016-07895_33/F-2016-07895/DOC_0C06160824/C06160824.pdf
https:// foia.state.gov/searchapp/DOCUMENTS/Litigation_F-2016-07895_33/F-2016-07895/DOC_0C06160844/C06160844.pdf
https:// foia.state.gov/searchapp/DOCUMENTS/Litigation_F-2016-07895_33/F-2016-07895/DOC_0C06160849/C06160849.pdf
https:// foia.state.gov/searchapp/DOCUMENTS/Litigation_F-2016-07895_33/F-2016-07895/DOC_0C06160876/C06160876.pdf
https:// foia.state.gov/searchapp/DOCUMENTS/Litigation_F-2016-07895_33/F-2016-07895/DOC_0C06161014/C06161014.pdf