Anonymous ID: e39ab8 April 10, 2018, 2:31 p.m. No.987352

I'm speaking from a Linux operating system perspective (though aspects may apply to the equivalent Windows versions).

>>978477

The wget manual can be downloaded here:

>>>/pdfs/8640

(this board doesn't allow PDFs)

 

I would also recommend using the -U or --user-agent option to change how the website sees the wget program (wget can impersonate a browser when making connections). This can get around some sites that actively look for and filter wget connections.

(see p14 of attached manual)
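Something like this, for instance (an untested sketch - the user-agent string and URL are placeholders to swap for your own):

wget --user-agent="Mozilla/5.0 (X11; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0" https://example.com/page.html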

 

>>983930

It looks like you were in that poster thread anon ;^)

>it works but does not fetch the fullsize images, only the thumbnails, and I haven't figured out how to modify your wget mirror -the-whole-qresearch command to also fetch images and adjust references so the local HTML pages refer to locally-mirrored images

fetch full size: increase your recursion depth, if I recall correctly from -l1 to -l2, since with -l1 you would only be grabbing that page's content, not anything it links to (the full-size images). I think there is also a way to grab only the items at depth level 2 (the full-size content).
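Something along these lines (untested sketch - the thread URL, the extensions, and the "images" folder are placeholders):

wget -r -l2 -A jpg,jpeg,png,gif -nd -P images https://example.com/res/12345.html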

adjust references: the saved pages should contain relative links from the current page to the others, not absolute links (i.e. page1.html links to page2.html, not http:// somewebsite/fulladdress/page2.html). You may wish to look at using the -m option for site mirroring. WARNING: it has infinite recursion depth and can chew through disk space as it attempts to grab anything linked, and anything those links point to, etc, ad infinitum! MAKE SURE a -l depth is set to stop it.
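A rough sketch combining the two (the URL is again a placeholder; -k/--convert-links rewrites the saved pages so their links point at the local copies, and -l2 caps the mirror's recursion depth):

wget -m -l2 -k -p https://example.com/res/12345.html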

See this webpage for more:

https:// stackoverflow.com/questions/4602153/how-do-i-use-wget-to-download-all-images-into-a-single-folder-from-a-url

 

>>984029

>>985562

>How would a clever anon go about saving a person of interest's entire twitter feed

Off the top of my head, you would be looking at a scraper script using curl for the server requests, most likely written in Python, Perl, PHP, or similar. Search "twitter scraper" for lots of hits on the sort of thing you'd be using. There are plenty of scripts on GitHub and similar.
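As a very rough sketch of the curl side (this assumes you have registered a Twitter app and have a bearer token; the endpoint and parameters are from Twitter's v1.1 REST API as I remember it, so check the current docs before relying on it):

curl -s -H "Authorization: Bearer $TWITTER_BEARER_TOKEN" \
  "https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=TargetAccount&count=200&tweet_mode=extended" \
  > timeline.json

A ready-made scraper script will handle the authentication and paging back through older tweets for you, which is why I'd start with one of those.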

 

Downloading a list of URLs when you have one per line in a file "grab-these-URLs.txt"

wget -i grab-these-URLs.txt

 

Downloading videos from YouTube, Twitter, basically anywhere

youtube-dl -F http-URL-goes-here

will give you a list of the available formats to download, with a CODE by each (on sites like YouTube), e.g. code 22 for the 1280x720 MP4, code 18 for the 640x360 MP4, and so on.

youtube-dl -f CODE http-URL-goes-here

will download that version specified by the CODE

 

youtube-dl http-URL-goes-here

will download the best quality version of the video (= largest file size)
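A concrete sketch (the video URL is a placeholder, and the format codes you see will vary by video):

youtube-dl -F https://www.youtube.com/watch?v=VIDEO-ID     # list the available format codes
youtube-dl -f 22 https://www.youtube.com/watch?v=VIDEO-ID  # grab the 720p MP4 version
youtube-dl -a video-urls.txt                               # grab every URL listed one per line in a file (like wget -i)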

 

Creating a folder for each day of the year on Linux (not sure if you'd need this, but it was on my mind for some reason)

mkdir -p {01,03,05,07,08,10,12}/{01..31} 02/{01..28} {04,06,09,11}/{01..30}

This will create a folder for each month, 01-12. Inside each month folder will be a folder for each day of the month (30 or 31), except February, which gets 28 (change {01..28} to {01..29} for a leap year).

Anonymous ID: e39ab8 April 10, 2018, 3:03 p.m. No.987830

>>987606

>When you want to admit it, you'll need it to be certified by the court,

You may well be looking at "digital timestamping" and creating file/document hashes. You might be interacting with a time server/authentication party via OpenSSL.

The digital timestamp proves the file was created after some other document, which in turn was created after a different document, and so on, right back to the beginning of the chain - anchored in time. This proves 1) that the file existed and 2) that it existed at that point in time.

A file or document hash is a one-way mathematical operation that reduces a file to a short signature. In practice every file has a different signature, and changing even one character of the file changes the signature dramatically. This can be used to prove a document copy is identical to the original, or that a downloaded file was not corrupted or tampered with during download.
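For example, with the standard coreutils tools (the filename is a placeholder):

sha256sum document.pdf > document.pdf.sha256    # record the signature
sha256sum -c document.pdf.sha256                # later: verify a copy still matches it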

 

For example, a free service for timestamping documents (1st result from a quick search)

https:// www.freetsa.org/index_en.php
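Based on their instructions as I remember them (check their page for the exact endpoint and certificate files - document.pdf is a placeholder):

openssl ts -query -data document.pdf -sha512 -no_nonce -cert -out document.tsq    # build a timestamp request from the file's hash
curl -s -H "Content-Type: application/timestamp-query" --data-binary "@document.tsq" https://freetsa.org/tsr > document.tsr    # send it to the TSA, keep the signed reply
openssl ts -verify -in document.tsr -queryfile document.tsq -CAfile cacert.pem -untrusted tsa.crt    # verify, using the CA/TSA certs downloaded from their site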

 

There are other ways to do it, e.g. anchoring file hashes in a cryptocurrency blockchain.

Anonymous ID: e39ab8 April 10, 2018, 3:26 p.m. No.988164

A useful one-line script to grab PDFs listed one per line in the file "pdflist", and convert each into text

for i in $(grep -E '^htt.*\.pdf$' pdflist); do foo=$(basename "$i" .pdf); wget "$i"; pdftotext -r 300 "$foo.pdf"; done

What this does:

for i in $( ) : perform a loop over each line the command inside $( ) produces

grep -E '^htt.*\.pdf$' pdflist : search for any lines in the file "pdflist" using a regular expression where the line starts with "htt" and ends with ".pdf" - essentially this matches any PDF URLs listed in the file.

;do foo=$(basename "$i" .pdf) : set a variable "foo" to the basename of $i - this strips "http….some.thing/blah/whatever.pdf" down to "whatever"

;wget "$i" : grab the PDF at the URL

;pdftotext -r 300 "$foo.pdf" : convert the grabbed PDF "whatever.pdf" into text at print-level resolution (300 dpi) and save it as "whatever.txt"

;done : loop for the next URL found in the "pdflist" file.

 

I've used this when downloading Hillary Clinton Emails from the Judicial Watch website.

I end up with text versions of the thousands of emails. I can then use the grep program to search through them all and get a list of search-term matches.

 

grep -oHC4 pizza *.txt

search all files ending in ".txt" for the term pizza, print out the matched parts (-o) along with the filename (-H), and show 4 lines of context before and after each match (-C4)

 

The search term I first look for isn't "pizza" but "B1" because this indicates Classified Emails.

Anonymous ID: e39ab8 April 10, 2018, 3:31 p.m. No.988218

>>987937

I'd recommend using pastebin or similar and posting the URL.

If it is a file collection then you could use filedropper, mixtape.moe, MEGA NZ, or similar to upload the zip archive.

 

If the files are small, you could base64-encode the zip and then upload the resulting text file to pastebin, but only oldfags/nerds may understand how to decode it.
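Something like this (GNU coreutils base64; the archive name is a placeholder):

base64 archive.zip > archive.zip.b64     # encode: upload/paste the .b64 text file
base64 -d archive.zip.b64 > archive.zip  # decode: recover the original zip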

Anonymous ID: e39ab8 April 10, 2018, 4:15 p.m. No.988778

>>988648

>You actually do kind of get a timestamp, in that the creation of the files on your computer have a creation date as they are written. So long as you don't go about editing them, the date remains intact.

This would be insufficient proof, since anyone can change the clock on your computer to give any date.

The "digital timestamping" and file hashes mentioned previously are a recognized form of document authentication.

>>987830

Anonymous ID: e39ab8 April 10, 2018, 4:20 p.m. No.988857

>>988648

>1 and 2 got me his profile, 3 got me his profile in a ton of different languages and maybe a month's worth of tweets (with Chinese headings).

You may wish to see if there are filtering options, so you can ignore content that doesn't match the filter.

So a depth of 3 + filter will ignore irrelevant information like the Chinese headings.

 

This would be equivalent to

-r -l3 -A ext1,ext2,ext3

in wget (to recursively grab, to a depth of 3, files ending in "ext1", "ext2", or "ext3").
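For instance (an untested sketch - the extensions, the regex, and the URL are placeholders to adapt; --reject-regex skips URLs matching a pattern, which could stand in for the "filter"):

wget -r -l3 -A jpg,jpeg,png,mp4 --reject-regex 'lang=' https://example.com/someprofile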