Anonymous ID: 23d7ce How to archive a website offline April 9, 2018, 5:40 p.m. No.974637   🗄️.is 🔗kun   >>1764 >>0990 >>1142 >>6432 >>2172 >>2298 >>4029 >>5562 >>7600 >>0154

This is a guide for archiving websites offline using HTTrack:

https:// www.httrack.com/

 

If someone knows of a different method, please feel free to talk about it here. Also, I'm no expert on this…so any tips are welcome.

 

The benefits of copying (aka "mirroring") websites or website pages offline are myriad. For one, you know it won't be deleted unless you delete it. For another, you get a complete copy of the structure of the site–from the directory structure on down. This has actually led me to find folders that I wouldn't otherwise have known existed, as well as other things that the designer might have tucked away.

 

The downside is that it can be slightly complicated. That's why this anon is writing this guide–to help you identify common errors and overcome them. By twiddling with certain settings and understanding what they mean, you'll be getting results in no time.

Anonymous ID: 23d7ce April 9, 2018, 5:43 p.m. No.974690   🗄️.is 🔗kun   >>2298

Step 1: go to this page and select the version appropriate for your system:

https:// www.httrack.com/page/2/en/index.html

 

Go ahead and install it where you want. I can't remember what options pop up, but if it asks you where you would like to archive your stuff, give it an appropriate directory–there will be one folder for each site/page you download, so I recommend you have a completely separate directory/folder so it doesn't mess everything else up.
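If you're on Linux instead of Windows, HTTrack is usually in your distro's repositories; a minimal sketch assuming a Debian/Ubuntu-style system (package names may differ elsewhere):

sudo apt-get update
sudo apt-get install httrack webhttrack

httrack is the command-line tool and webhttrack gives you a browser-based version of the same wizard.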

Anonymous ID: 23d7ce April 9, 2018, 5:45 p.m. No.974732   🗄️.is 🔗kun   >>7620

Step 2: Once you've installed it, click "next." The blackened area in the image is where you'll see your directory structure. I've hidden mine so you don't see my 2.9 TB directory of nasty midget porn.

Anonymous ID: 23d7ce April 9, 2018, 5:49 p.m. No.974792   🗄️.is 🔗kun

Step 3: Give your project an appropriate name; I recommend naming it after the website that you're going to mirror. After that, put it into an appropriate category–in my example, I've put it under "qresearch_NK", which is where I've put my other North Korea-related mirrors as well.

 

After that, click on "next"

Anonymous ID: 23d7ce April 9, 2018, 5:51 p.m. No.974849   🗄️.is 🔗kun

Step 4: copy & paste the url from your browser into the indicated box. Before you move forward, the most important step comes up: you have to set the options.

 

These options are rarely "one size fits all." Different websites have different setups, so you've got to adapt your setup in order to get what you want. We'll get into that next.

 

After your options are set, click "next"

Anonymous ID: 23d7ce April 9, 2018, 5:53 p.m. No.974883   🗄️.is 🔗kun

Step 4a: If you aren't using a proxy, un-click the "use proxy for ftp transfers" box under the "Proxy" tab.

Anonymous ID: 23d7ce April 9, 2018, 6 p.m. No.975019   🗄️.is 🔗kun

step 4b: Under the "Scan Rules" tab, make sure you check each of the boxes if you want to download that type of media for the page. Typically you want to get the pictures that go with the site, so check the "gif, jpg, jpeg…" box. If there are movies on the site that you want, check the "mov, mpg, mpeg…" box. If the website has files that you can download, select the "zip, tar, tgz…" box.

 

What you select is really about what you're after–if you want a complete record, select all of them…but if you just need the text, don't check any. Whether or not you select these items can make a huge difference in how big the result is–but don't worry: if it's taking too long or the result is getting too huge, you can always cancel and try again.
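If you ever run HTTrack from the command line instead of the Windows GUI, those same scan rules are passed as +/- filter patterns; a rough sketch (example.com is a placeholder):

httrack "https://example.com/" -O ./mymirror "+*.gif" "+*.jpg" "+*.jpeg" "+*.png" "-*.mov" "-*.mpg"

-O sets the output folder, and each quoted pattern includes (+) or excludes (-) that file type, just like the checkboxes.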

Anonymous ID: 23d7ce April 9, 2018, 6:03 p.m. No.975096   🗄️.is 🔗kun

Step 4c: This setting will tell HTTrack how to go through the website. If you want more information, go here:

https:// moz.com/learn/seo/robotstxt

 

I'd say keep it off, but sometimes you run into issues with this setting, so I'm mentioning it here: the wrong choice often produces an error, and toggling between "follow" and "don't follow" will usually clear it up.
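For reference, the command-line version exposes the same toggle as -sN (0 = never follow robots.txt, 2 = always); a hedged example:

httrack "https://example.com/" -O ./mymirror -s0

-s0 is the "don't follow" equivalent; drop it (or use -s2) to respect robots.txt again.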

Anonymous ID: 23d7ce April 9, 2018, 6:08 p.m. No.975203   🗄️.is 🔗kun

Step 4d: Under the "Browser ID" tab, you have the option of setting your "Browser Identity". I'm guessing that, by telling the website which browser you're using, the website will present certain features in order to take advantage of that browser. If you find that you get an error almost immediately after trying to move forward (like pic related), go into your options and change these to "none" and it should clear it up.
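On the command line the equivalent is the -F option, which sets the user-agent string HTTrack sends; a sketch (the browser string shown is just an example):

httrack "https://example.com/" -O ./mymirror -F "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0"

Changing it alters what the server thinks is requesting the pages, which is sometimes enough to clear that immediate error.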

Anonymous ID: 23d7ce April 9, 2018, 6:14 p.m. No.975296   🗄️.is 🔗kun

Step 4e: If, after you've mirrored the website, you find that you didn't get what you wanted, try messing with these settings. Essentially what they do is tell HTTrack how to move about the website–can it only move downward through the directory structure, or can it go upward as well?

 

Depending on how the website is set up, you may have to mess with these…but I suggest making the settings slightly less restrictive each time, until you get only what you need. The reason I say this is that you may find yourself downloading all manner of things from every website connected to your target–every ad from the ad sites, every movie from links, etc. When I was downloading liddlekidz.org, I found myself well past 2 GB before I realized that I wasn't just getting stuff from that website–I was pulling stuff from at least a dozen websites, and most of it was being downloaded before the stuff from liddlekidz. So be conservative here, otherwise you're wasting your time and hard drive space.
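For comparison, wget bakes the same idea into a couple of flags; a minimal sketch (the URL is a placeholder) showing how to stop it climbing above your starting folder or wandering off to other hosts:

wget --mirror --page-requisites --convert-links --no-parent https://example.com/some/section/

--no-parent keeps it from going "up" past /some/section/, and because -H (span hosts) isn't given, it won't chase links off to ad servers and other domains the way my liddlekidz run did.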

Anonymous ID: 23d7ce April 9, 2018, 6:16 p.m. No.975311   🗄️.is 🔗kun

Step 5: after you've set your options and clicked "next", you'll get to this page. Just click "next," and hopefully everything goes well.

Anonymous ID: 23d7ce April 9, 2018, 6:28 p.m. No.975521   🗄️.is 🔗kun   >>0849

Step 6: After HTTrack has completed, you'll get this page. If there's an error, you'll get a flashing notifier–you can take a look at the log to get the details, and use that information to search the web for a solution. For the most part, twiddling with the settings that I've mentioned will handle any of the errors you get…and it won't take long before you get familiar with them.

 

Sometimes, an error is essentially meaningless. For instance, I often get errors that state that HTTrack couldn't get an image from an ad site because of my settings–that's not important, so I don't worry about it.

 

You can click on "browse mirrored site" to see how your copy looks. If you're unhappy, change the options and try again.

 

Finally, you can go into your archive folder, and you'll see a new folder with the project. You can go in there any time, click on "index.html," and it will open up your fresh new copy of the website.

 

Now get out there and archive offline!

 

One final note: if you find something really, like that image of Hillary sacrificing children to Moloch that we all know is floating around, make sure to archive first before you come here and tell everyone else. We know that 8ch is being watched, and by blabbing what you've found you're giving them a chance to pull their stuff offline before anyone else can get to it. But once you've got it, then by all means, tell everyone–the more people there are that have a copy, the better it is for you…after all, you don't want to be the only person with that kind of evidence on your hard drive, do you?

 

Happy archiving!

Henry Case ID: ae3e60 Archiving the Web April 9, 2018, 9:50 p.m. No.978477   🗄️.is 🔗kun   >>8833 >>2212 >>2298 >>3930 >>4029 >>5562 >>7352 >>7600

There are a LOT of options for archival, as listed in graphic (1) and article (2). There are synopses for Wget in (3) and (4).

 

I prefer Wget, for the simple reason of power and flexibility. Those of you who use *nix prolly already know this, but the mirror of choice is Wget, and that goes really well if you have access to a VPS. You can queue it up and then tgz and sftp it when it's complete. Sometimes it can take days to mirror a full site if they've got aggressive leech protection.

 

You'll want to be aware of the robots option and the retry option, if you notice a server blocking your access because of too many requests in rapid succession, or a bitchy robots.txt.

 

MAN: Wget - The non-interactive network downloader.
SYNOPSIS: wget [option]... [URL]...
OPTIONS (Download Options):
-w seconds / --wait=seconds
Wait the specified number of seconds between the retrievals. Use of this option is recommended, as it lightens the server load by making the requests less frequent. Instead of in seconds, the time can be specified in minutes using the "m" suffix, in hours using the "h" suffix, or in days using the "d" suffix.
Specifying a large value for this option is useful if the network or the destination host is down, so that Wget can wait long enough to reasonably expect the network error to be fixed before the retry. The waiting interval specified by this function is influenced by "--random-wait", which see.
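To put that into practice, a hedged polite-mirroring example (example.com is a placeholder) that spaces out requests so you're less likely to get blocked:

wget --mirror --wait=2 --random-wait --tries=3 https://example.com/

--wait=2 pauses a couple of seconds between requests and --random-wait varies the delay so the pattern looks less like a bot; --tries=3 caps the retries if the server stops responding.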

 

My recommended initial configuration is below, but I'm sure you can tailor it to suit your needs.

wget --mirror --page-requisites --adjust-extension --no-parent --no-check-certificate --convert-links -e robots=off https://example.com/

 

Happy archiving.

Anonymous ID: 9d7f52 Position on front QResearch index, AND each new bread April 10, 2018, 7:52 a.m. No.982298   🗄️.is 🔗kun

>>974690

>>974637

>>978477

>>978833

 

Brilliant. This was what I was asking for.

 

  1. It should be linked to at the top of each new bread's Resources,

 

AND

 

  2. on the first page of the qresearch index with the "ARCHIVE, ARCHIVE, ARCHIVE EVERYTHING OFFLINE" instructions for "newfags"/"normies".

Anonymous ID: 67f895 April 10, 2018, 10:12 a.m. No.983930   🗄️.is 🔗kun   >>4029 >>5562 >>7352 >>7600

>>978477

Have used wget before, but always with hesitation because the options don't always seem to do what I expect.

When testing your suggested wget + params, I don't see any images being downloaded…?

Can you speak to this and recommend a fix?

 

On Dec 17th someone sent me this wget command which scrapes all the images referenced in a single thread:

 

wget -P ./thread/ -nd -r -l 1 -H -D media.8ch.net -A png,gif,jpg,jpeg,webm https:// insert_thread_URL_here.html

 

and it works, but it does not fetch the full-size images, only the thumbnails, and I haven't figured out how to modify your mirror-the-whole-qresearch wget command to also fetch images and adjust references so the local HTML pages refer to locally-mirrored images.

 

I just don't have the time or patience to work this out.

Anonymous ID: 9d7f52 How to archive all of a "person of interest" on Twitter? April 10, 2018, 10:21 a.m. No.984029   🗄️.is 🔗kun   >>7388 >>4291 >>7352 >>3455

>>974637 (OP)

>>978477

>>978833

>>983930

 

A serious question: How would a clever anon go about saving a person of interest's entire twitter feed from inception, including pics, in case of deletion?

I get the feeling that these methods can narrowly select/focus which folders get copied, yes?

Anonymous ID: f2398b April 10, 2018, 10:48 a.m. No.984291   🗄️.is 🔗kun   >>7388 >>5562 >>7600

>>984029

I'm not really sure about twitter specifically; I don't use it. But when researching POTUS' tweets I came across a couple of good websites and figured that they were using twitter's api.

 

For people that don't program, an "api" is an "application programming interface," which is basically a set of tools for you to get what you want, designed by the maker of the app. It's in the best interest of these social media companies to develop a good api because it allows others to use and display their stuff on other websites–free advertising and spread of influence. I'll look further into it, as it relates to an app I'm working on.
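If anyone wants to poke at the api directly, here's a very rough curl sketch of what I believe the v1.1 user-timeline endpoint looked like at the time (the endpoint, parameters, and bearer token are from memory, so treat them as assumptions and check Twitter's developer docs):

curl -s -H "Authorization: Bearer YOUR_BEARER_TOKEN" "https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=Snowden&count=200" > timeline.json

You'd have to register an app with Twitter to get a token, and the endpoint only reaches back a few thousand tweets, so it's a supplement to mirroring, not a replacement.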

 

As far as using HTTrack on it, I haven't tried. For the most part I've had success with websites that don't involve user accounts; there may be a way around that, but again, I'm pretty new to it myself. Just taught myself like 4 days before Q highlighted liddlekidz.

Anonymous ID: 9d7f52 Important: How to archive a particular person's/institution's complete Twitter threads? April 10, 2018, 12:26 p.m. No.985562   🗄️.is 🔗kun   >>7185 >>7352 >>7840

>>984291

>>974637

>>978477

>>978833

>>983930

>>877198

>>122807

>>93735

>>3138

 

Thanks for your reply.

 

I know that it is possible to copy a particular Twitter discussion thread by printing a pdf, or by using 'ThreadReader' I think it is called, but again this is only for one discussion thread, not the root or branches, and in this case only if Twitter author allows it.

 

So the challenge for non-Tech anons like me is to find an easy technical way to copy all roots and branch discussion threads of a particular person/institution without copying absolutely everyone on Twitter.

 

Maybe there is an easy solution, if so great, that is why I am asking.

 

Thanking all anons in advance for their patience, consideration, and time.

 

We all know of instances where evidence has disappeared before archiving.

 

Non-Tech Anons need to archive particular person's/institution's complete thread discussions for possible use as evidence etc.

Anonymous ID: 353534 April 10, 2018, 2:17 p.m. No.987185   🗄️.is 🔗kun   >>7515 >>7600

>>985562

Newfag here. Long-time lurker, first post.

 

Wrote a beta PHP script for Twitter that works in conjunction with Youtube-DL and wget (all freeware) to archive an entire conversation piece. Vids, whether native Twitter uploads or click-thrus to Youtube, are saved in their entirety, in best res available. Pics, PDFs, same thing. Any Web URLs, their front-page HTML is saved off as an HTML file.

 

Let me know if interested. It is by no means a finished product but it works very well as long as the convo is PUBLIC. Oh yeah, Instagram too.

Anonymous ID: e39ab8 April 10, 2018, 2:31 p.m. No.987352   🗄️.is 🔗kun   >>7515 >>7600

I'm speaking from a Linux operating system perspective (though aspects may apply to the equivalent windows version)

>>978477

wget manual can be downloaded here:

>>>/pdfs/8640

this board doesn't allow pdfs

 

I would also recommend using the -U or --user-agent options to change how the website sees the wget program. (wget can impersonate a browser when making connections). This can get around some sites that actually look for and filter wget connections.

(see p14 of attached manual)
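A hedged example of what that looks like in practice (the UA string is just one common browser identity, swap in whatever you like):

wget --mirror --convert-links --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0" https://example.com/

-U / --user-agent replaces the default "Wget/x.y" identity that some servers look for and filter.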

 

>>983930

It looks like you were in that poster thread anon ;^)

>it works but does not fetch the fullsize images, only the thumbnails, and I haven't figured out how to modify your wget mirror -the-whole-qresearch command to also fetch images and adjust references so the local HTML pages refer to locally-mirrored images

fetch fullsize: adjust your recursion depth; if I recall correctly, change -l1 to -l2, since with -l1 you would only be grabbing that page's content, not anything it links to (the full-size images). I think there was a way to only get the items at depth level 2 (the full-size content)

adjust references: the pages should contain relative links from the current page to the others, not absolute links (i.e. page1.html has a link to page2.html, not http:// somewebsite/fulladdress/page2.html). You may wish to look at using the -m option for site mirroring. WARNING: it has infinite recursion depth and can chew up disk space as it attempts to grab anything linked, and anything those links point to, etc., ad infinitum! MAKE SURE the -l depth is set to stop it.
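Putting both fixes together, a hedged sketch of the single-thread grab with the depth bumped to 2 and link conversion turned on (THREAD_NUMBER is a placeholder, and the depth-2 assumption is that the full-size image sits one click behind each thumbnail):

wget -r -l 2 -H -D 8ch.net,media.8ch.net -A html,png,gif,jpg,jpeg,webm -k "https://8ch.net/qresearch/res/THREAD_NUMBER.html"

-l 2 follows the thumbnail links through to the full-size files, -k rewrites the saved page so it points at the local copies, and the -D list stops it spanning to unrelated hosts.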

See this webpage for more:

https:// stackoverflow.com/questions/4602153/how-do-i-use-wget-to-download-all-images-into-a-single-folder-from-a-url

 

>>984029

>>985562

>How would a clever anon go about saving a person of interest's entire twitter feed

Off the top of my head, you would be looking at a scraper script using curl for server requests, most likely written in Python, Perl, PHP, or similar. Search "twitter scraper" for lots of hits on the sort of thing you'd be using. There are lots of scripts on gitHub or similar.

 

Getting a list of URLs when you have 1 per line in a file "grab-these-URLs.txt"

wget -i grab-these-URLs.txt

 

Downloading videos from YouTube, Twitter, basically anywhere

youtube-dl -F http-URL-goes-here

will give you a list of the available formats to download, with a CODE by each (on sites like YouTube) (e.g. CODE 18 RESOLUTION 640x360, CODE 22 RESOLUTION 1280x720…)

youtube-dl -fCODE http-URL-goes-here

will download that version specified by the CODE

 

youtube-dl http-URL-goes-here

will download the best quality version of the video (= largest file size)

 

Creating a folder for each day of the year on Linux - not sure if you'd need this, but it was on my mind for some reason

mkdir -p {01,03,05,07,08,10,12}/{01..31} 02/{01..28} {04,06,09,11}/{01..30}

This will create a folder for each month 01-12. Inside each month folder will be 30 or 31 folders for each day of the month, except 28 for February (which can be changed to 29 for a leap year)
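If you want it to handle leap years automatically, a hedged alternative that asks GNU date which days actually exist for a given year (assumes a Linux userland; change the year as needed):

Y=2018; for m in $(seq -w 1 12); do for d in $(seq -w 1 31); do date -d "$Y-$m-$d" >/dev/null 2>&1 && mkdir -p "$m/$d"; done; done

Invalid dates like 02-30 make date exit non-zero, so only real calendar days get folders.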

Anonymous ID: 9d7f52 Possible Twitter solutions April 10, 2018, 2:42 p.m. No.987515   🗄️.is 🔗kun   >>7937

>>987185

Welcome!

A very useful, thoughtful first post. Not everyone can say that.

Sounds great.

I got nothing for Twitter now apart from pdf-printers so hell yeah, I'd love to test it if you're up for it. Anything is better than nothing. Thank you for your reply.

 

>>987352

Thanks very much for your detailed and useful post with easy to read, easy to follow explanations. Really great work.

 

Ok, I will try to follow through with all the kind anons' advice and test on sites and Twitter.

 

Thank you all anons :)

Henry Case ID: ae3e60 Twitter WebScraping with Python April 10, 2018, 2:50 p.m. No.987606   🗄️.is 🔗kun   >>7830 >>7855

Twitter scraping can be achieved with something like Python. The juridical viability for use in a legal proceeding will vary by jurisdiction, as this is a prime vulnerability for distortion.

 

If you intend to use the data for a deposition, it might be best to scrape as well as printing (with timestamp) to PDF and/or hardcopy. When you want to admit it, you'll need it to be certified by the court, so the more supporting information, the better.

 

https:// medium.com/@dawran6/twitter-scraper-tutorial-with-python-requests-beautifulsoup-and-selenium-part-1-8e76d62ffd68

Anonymous ID: e39ab8 April 10, 2018, 3:03 p.m. No.987830   🗄️.is 🔗kun   >>8012 >>8778

>>987606

>When you want to admit it, you'll need it to be certified by the court,

You may well be looking at "digital timestamping" and creating file/document hashes. You might be interacting with a time server/authentication party via OpenSSL.

The digital timestamp proves it was created after some other document, which in turn was created after a different document, and so on, right back to the beginning of the chain - anchored in time. This proves that 1) the file existed and 2) it existed at that point in time.

A file or document hash is a one-way math operation that reduces a file to a signature. Each file's signature is different, and if you change even one character of the file, the signature changes dramatically. This can be used to prove a document copy is identical to the original, or that a downloaded file was not corrupted/intercepted during download.
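A minimal sketch with the sha256sum tool that ships on most Linux systems (the filenames are placeholders):

sha256sum liddlekidz_mirror.zip > liddlekidz_mirror.zip.sha256
sha256sum -c liddlekidz_mirror.zip.sha256

The first line records the signature, the second re-checks the file against it later; if even one byte changed, the check fails.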

 

For example, a free service for timestamping documents (1st result from a quick search)

https:// www.freetsa.org/index_en.php

 

There are other ways it can be done with file hashes in a crypto currency blockchain.
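For the freetsa route, a hedged sketch of the usual OpenSSL timestamping flow (the /tsr URL and exact flags are from memory, so double-check against their own instructions before relying on it):

openssl ts -query -data liddlekidz_mirror.zip -no_nonce -sha512 -cert -out request.tsq
curl -s -H "Content-Type: application/timestamp-query" --data-binary "@request.tsq" https://freetsa.org/tsr > response.tsr
openssl ts -reply -in response.tsr -text

The .tsq is the hash you're asking them to sign, the .tsr is the signed proof-of-time, and the last line prints it so you can sanity-check the timestamp.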

Anonymous ID: 23d7ce April 10, 2018, 3:04 p.m. No.987840   🗄️.is 🔗kun   >>8012

>>985562

I think I have a simple solution. I can't quite verify yet, but so far it seems to be doing what I think you would want it to do.

 

First, you need to go to twitter's advanced search*:

https:// twitter.com/search-advanced?lang=en&lang=en

 

After entering the user's name, and the starting date from which you would like to collect tweets, click "search"

 

After that, you'll get a results page. Copy & paste the url into HTTrack. I didn't have to adjust my settings at all–I just chose to download the images, not movies.

 

From then on, it should start downloading without any serious issues. In my first two images, I did a search for @snowden's tweets from October 25th to today. In my trial run (which I'm currently still processing), I chose to grab all of @JulianAssange's tweets since 1-1-2017. Big mistake–as the poor man is locked up, he probably averages about 10-15 tweets a day. So far as I can tell, not only am I gathering his tweets, but also the tweets of those that he has retweeted and the tweets on their profiles. I'm at 9 GB so far, and almost 20k files downloaded.

 

I'm sure there's a setting somewhere that might tell it not to go too far, but I haven't figured that out yet. Regardless, it's pretty much doing what you would want–as you can see from the image, there's a separate folder for each person that Assange has interacted with. Below those folders are some completed .html files, and tons of html.tmp files–which are basically unfinished downloads.

 

When this is all done, I'll go over it and confirm how well it turned out. At this point, I can individually click on some of the .html files and they bring up profiles, so I'm pretty confident.

 

*Twitter's advanced search doesn't work well on mobile devices. If you can't find it on your mobile device and want to reach it, go into your browser's settings and click "request desktop view." Also, it may not be necessary to go to advanced search at all–if you look in the "Capture1.jpg" image I've posted, you can see "from:snowden since:2017-10-25". You may just be able to enter a query like that into their regular search to get the results you want.
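If you want to skip the advanced-search form entirely, a hedged sketch of feeding that kind of query straight to HTTrack's command line (the search-URL format is my guess at how Twitter builds it, and the depth/filters are the settings discussed above):

httrack "https://twitter.com/search?f=tweets&q=from%3ASnowden%20since%3A2017-10-25" -O ./snowden_tweets -r2 "+*.jpg" "+*.png"

%3A and %20 are just the URL-encoded ":" and space from "from:Snowden since:2017-10-25".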

Anonymous ID: 9d7f52 For legal evidence archiving, do both a pdf print and program site scrape for context. April 10, 2018, 3:05 p.m. No.987855   🗄️.is 🔗kun

>>987606

Good advice, thank you.

 

For legal evidence archiving, do both a pdf print and program site scrape for context.

 

Both a pdf print and a site scrape may be necessary for court admission to prevent defense lawyers throwing out good evidence.

 

In reality I hope that our backups won't be needed for court cases, but better safe than sorry.

 

Almost any personal pain is worth it to see these legions of evil people be served the justice they rightfully deserve for their disgusting crimes.

We have it easy, think of the multitudes of victims who have no voice, and rely on us and other good people to investigate and give them a voice.

Let alone our relatives and the dead from the world wars, or more recent affairs. The scars are deep.

Anonymous ID: 353534 April 10, 2018, 3:10 p.m. No.987937   🗄️.is 🔗kun   >>7385 >>8088 >>8218

>>987515

While I'm fluent on normie platforms (fb, twitter, insta) when it comes to what I can/can't upload, I've never attached anything on 8ch. Can I upload a ZIPped archive that contains my PHP scripts? Or is a scanner gonna see scripting in my archive and go apeshit?

 

I would upload the source in plain-text (not copyrighting anything here…) but…it's a lot.

Henry Case ID: ae3e60 No Quick and Easy eDiscovery April 10, 2018, 3:22 p.m. No.988086   🗄️.is 🔗kun

There is NO quick and easy way to do juridical data collection. The cause is worth it, so put some sweat and tears into this shit. It's worth it.

http:// technology.findlaw.com/electronic-discovery.html

Anonymous ID: 9d7f52 April 10, 2018, 3:22 p.m. No.988088   🗄️.is 🔗kun   >>0558

>>987937

Some other anons used arc hiv e.is for the first archives, but then switched to Meg aU plo ad.

Your work sounds very useful and pretty cool tbh, so I would still love to try it if you're ok with that. I don't know anything about locking down the plain-text source so it doesn't get f-d with. I assume that once it is there on Me gaU pl oad for example, then it is safe from alteration?

Gotta start some coding.

Anonymous ID: e39ab8 April 10, 2018, 3:26 p.m. No.988164   🗄️.is 🔗kun   >>4257

A useful one line script to grab pdfs listed 1-per-line in the file "pdflist", and convert each into text

for i in $(grep -E '^htt.*\.pdf$' pdflist); do foo=$(basename "$i" .pdf); wget "$i"; pdftotext -r 300 "${foo}.pdf"; done

What this does:

for i in $( … ) : perform a loop over each line the inner grep returns

grep -E '^htt.*\.pdf$' pdflist : search the file "pdflist" using a regular expression for lines that start with "htt" and end with ".pdf" - essentially this matches any PDF URLs listed in the file.

;do foo=$(basename "$i" .pdf) : set a variable "foo" to the basename of $i - this strips "http….some.thing/blah/whatever.pdf" down to "whatever"

;wget "$i" : grab the PDF at the URL

;pdftotext -r 300 "${foo}.pdf" : convert the downloaded "whatever.pdf" into text using a print-level scan resolution (300 ppi), saving it as "whatever.txt"

;done : loop to the next URL found in the "pdflist" file.

 

I've used this when downloading Hillary Clinton Emails from the Judicial Watch website.

I end up with text versions of the 1000s of emails. I can then use the grep program to search through them all and get a list of search term matches.

 

grep -HC4 pizza *.txt

search all files ending in ".txt" for the term pizza, printing the matching lines along with the filename and 4 lines of context before and after each match

 

The search term I first look for isn't "pizza" but "B1" because this indicates Classified Emails.

Anonymous ID: e39ab8 April 10, 2018, 3:31 p.m. No.988218   🗄️.is 🔗kun

>>987937

I'd recommend using pastebin or similar and post the URL.

If it is a file collection then you could use filedropper, mixtape.moe, MEGA NZ, or similar to upload the zip archive.

 

If the files are small, you could create a b64 from the zip then upload the resulting text file to pastebin, but only oldfags/nerds may understand how to decode it.

Anonymous ID: 23d7ce April 10, 2018, 4:06 p.m. No.988648   🗄️.is 🔗kun   >>8055 >>8778 >>8857

>>988012

You actually do kind of get a timestamp, in that the files on your computer get a creation date as they are written. So long as you don't go about editing them, the date remains intact. You might want to consider making a copy and storing it someplace safe; if it were for a legal case, I would perhaps throw it onto a thumb drive and give that to, say, a lawyer or notary public. You could upload it to some website, but if all of this archiving is about having information while the web is down, then that presents a problem…

 

I've found something interesting about archiving with HTTrack, but it doesn't solve the Twitter problem. You can set the "mirror depth" to a certain number, which represents the number of "clicks" away from your page you want to copy. If you don't set this (in Options > Limits > Maximum mirror depth), you may wind up with a gigantic download.

 

Consider this scenario: You're downloading the last 100 tweets from someone. In one of those tweets, they re-tweeted someone else…so that person is clicked on, which brings up all of their tweets. Each of those is clicked on…and on and on and on…

 

So this is what I recommend: for social media, start with a setting of 1 or 2; if it doesn't get enough, bump it up one until you get what you need. Leaving it unset means that it will continue onwards with infinite clicks–when it comes to social media, that means that it could go on for a -very- long time, as people quote other people, etc.

 

As far as Twitter is concerned, I've tried it with a setting of 1, 2, and 3; 1 and 2 got me his profile, 3 got me his profile in a ton of different languages and maybe a month's worth of tweets (with Chinese headings). So it doesn't look like it's necessarily a productive means of getting the info you want–more than likely it's going to be a matter of using the api to get the results you want.

 

I found the issue I've been having–it has to do with another setting.

Anonymous ID: e39ab8 April 10, 2018, 4:15 p.m. No.988778   🗄️.is 🔗kun

>>988648

>You actually do kind of get a timestamp, in that the creation of the files on your computer have a creation date as they are written. So long as you don't go about editing them, the date remains intact.

This would be insufficient proof, since anyone can change the clock on your computer to give any date.

The "digital timestamping" and file hashes mentioned previously are a recognized form of document authentication.

>>987830

Anonymous ID: e39ab8 April 10, 2018, 4:20 p.m. No.988857   🗄️.is 🔗kun   >>6231 >>8724

>>988648

>1 and 2 got me his profile, 3 got me his profile in a ton of different languages and maybe a month's worth of tweets (with Chinese headings).

You may wish to see if there are filtering options, so you can ignore content that doesn't match the filter.

So a depth of 3 + filter will ignore irrelevant information like the Chinese headings.

 

This would be equivalent to

-r -l3 -A ext1,ext2,ext3

in wget (to recursively grab, to a depth of 3, files ending in "ext1", "ext2", or "ext3").

Anonymous ID: f1dd0a April 10, 2018, 7:15 p.m. No.991469   🗄️.is 🔗kun   >>5721

Thanks for the link. I used to be a fucking wizard on a PC, but got into another platform and still need to figure out a lot on this side of things for this platform. Thanks again, and I will post back if I find something useful.

Anonymous ID: 78e364 Trump is really 17th Prez!!! 17=Q April 10, 2018, 8:31 p.m. No.992419   🗄️.is 🔗kun   >>5721 >>8062

>>990154

A new hot theory from reddit.com/r/greatawakening

 

https:// www.reddit.com/r/greatawakening/comments/8bd84s/trump_is_our_real_17th_president_since_lincoln/?st=jfuhf6fm&sh=d68afeb7

 

Trump is our real 17th President. Since Lincoln, 16th, we have only had corporate CEOs pretending to be president. self.greatawakening

 

submitted 2 hours ago by BlackSand7New arrival.

 

With "The Act Of 1871" - Our Republic became a corporation named "THE UNITED STATES". (Names in all caps represent corporations). Since then all our presidents have just been corporate CEOs. Now we can get our Republic back and have true Presidents again. Thank you Donald Trump!

 

So yes, his jersey 17=Q, but 17 might also mean our true 17th President.

Anonymous ID: 4c2f44 saving twitter info April 10, 2018, 10 p.m. No.993455   🗄️.is 🔗kun

>>984029

ask quinn michaels, he is a programmer

find him on youtu be,he programs bots that search twitter for all items related to a specific hashtag

Anonymous ID: 068feb April 11, 2018, 3:52 a.m. No.995721   🗄️.is 🔗kun   >>5745

Unbroken link test

https%3A%2F%2F8ch.net/qresearch/catalog.html

 

>>991469

No one knows who you are replying to unless they are linked in to your post. Click on the Post Number "No. XXXXXXX" that comes after the poster's "ID: xxxxxx"

to automatically have it inserted into your reply.

>>992419

Use the correct thread for your posts, regardless of how excited you feel about something. That is irrelevant to this thread. It is like your sports team wins something, so you interrupt a meeting of strangers to shout about it - i.e. rude. Yes it is good news, and I read your post about it in another thread. Putting it here too is poor form, and spamming. Spamming is always bad etiquette.

Anonymous ID: 068feb April 11, 2018, 3:57 a.m. No.995745   🗄️.is 🔗kun   >>5761

>>995721

This test successfully kept the URL intact, but is still broken as far as the browser/ search engine sees it.

You can paste the entire address into a browser and go without editing.

 

Unbroken link test 2

https:%2F%2F8ch.net/qresearch/catalog.html

Testing to see if the browser, or 8ch filters the URL based on these characters.

 

Unbroken link test 3

https:/%2F8ch.net/qresearch/catalog.html

Testing to see if the browser, or 8ch filters the URL based on these characters.

 

Unbroken link test 4

https:%2F/8ch.net/qresearch/catalog.html

Testing to see if the browser, or 8ch filters the URL based on these characters.

Anonymous ID: 068feb April 11, 2018, 4 a.m. No.995761   🗄️.is 🔗kun   >>4570 >>5964

>>995745

Tests 2, 3, and 4 all produce URLs that are not broken by 8ch's board software, but which the browser can interpret correctly to open a new tab when the URL is selected and the right-click menu activated.

Anonymous ID: a5583b April 11, 2018, 4:33 a.m. No.995964   🗄️.is 🔗kun

>>995761

Clever workaround. I think the board owner threw the word filter in to avoid drawing the attention of "sniffer" programs. I doubt they would register your altered versions.

Henry Case ID: ae3e60 Complications April 11, 2018, 6:03 p.m. No.1004570   🗄️.is 🔗kun   >>1225

>>995761

You're over complicating the issue… in most cases, there is no need to encode the URL. So if you configure a shortcut for the "Open URL" service on most systems, it's a no-brainer.

Anonymous ID: 23d7ce April 11, 2018, 7:24 p.m. No.1006231   🗄️.is 🔗kun   >>6290

>>988857

I think I misspoke when I said "headings." What I meant was that the entire page is in Chinese, except for the tweet itself.

 

What's happening is that I'm getting copies of certain .html files in every language–1000 in total for the index, login, and search .html files.

It looks like they have a typedef set up to link a language to a 4 digit hexadecimal code which is appended like so: indexffbb.html, etc. Also, the other files have different designations.

 

It would be trivial to write a program that found the right one, but you need to know the right one beforehand in order to set up the right filter so the whole purpose is defeated. Who knows how often they change it? That having been said, it really doesn't matter–the files are relatively small.

 

I think a setting of "3" is best. I pushed it to "4," and ended up getting far more than I wanted. Once your download completes, look for the folder of the person whose tweets you're collecting, and they should all be in there. It will be in "project name folder" > twitter.com > "person whose tweets you're getting" > status. For me they're in English.

 

The difference between a setting of "3" and "4", in my case, was about a 50x increase in my download size. Twitter put me on a time-out because of it, lulz.

Anonymous ID: 23d7ce April 11, 2018, 7:26 p.m. No.1006290   🗄️.is 🔗kun

>>1006231

Oh, and if this helps you wget users, the file structure is as I'd mentioned: twitter.com > (twitter handle, without the '@' in front) > status > *.html

Henry Case ID: ae3e60 Casting a wider net... April 11, 2018, 10:38 p.m. No.1008724   🗄️.is 🔗kun

>>988857

 

By default, wget grabs everything, so it's actually better to cast a wide net when mirroring unknown servers, as you said… you can find more hidden gems this way. If you add the -A flags, you'll limit your retrieval to those filetypes.

Anonymous ID: 4c2796 April 12, 2018, 12:17 a.m. No.1009935   🗄️.is 🔗kun

For local archive searching, I'm using Java DocFinder with decent results, curious about other standalone options for a private/local fulltext search supporting fuzzy and near, etc.
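For a zero-install fallback, plain grep over a mirror folder covers a lot of ground, though with no fuzzy matching; a quick sketch:

grep -RilE "clinton|podesta" ./archive/ | sort > hits.txt

-R recurses through the folder, -i ignores case, -l lists only the matching filenames, and -E lets you OR several terms together.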

Anonymous ID: 4c2796 April 12, 2018, 12:20 a.m. No.1009947   🗄️.is 🔗kun   >>1009

This works great for saving this board, it was posted a while back.

 

wget -nH -k -p -np -H -m -l 2 -e robots=off -I stylesheets,static,js,file_store,file_dl,qresearch,res -U "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4" --random-wait --no-check-certificate https://8ch.net/qresearch

Anonymous ID: e6f97a qarkzilla April 12, 2018, 1:44 a.m. No.1010366   🗄️.is 🔗kun   >>0886

Glad to see someone else noticed this limitation with wget. I do have the patience. Searched all over the web for a solution for a couple weeks now and even tried alternatives to wget but really wanted the wget solution to work. Finally wrote my own solution. It is a bash script. Couldn't get it to work with a one-liner but got it down to two wget calls and a couple sed calls to rewrite the img URLs. It's a very early iteration but I have a feeling it will have a few more before this is all done with. Calling it 'qarkzilla.sh'. I'm releasing it further down this thread so others can start tinkering with the idea and optimize it.

Anonymous ID: 3a6b7f April 12, 2018, 2:49 a.m. No.1010558   🗄️.is 🔗kun

>>988088

 

can anyone give me a FQDN, to test an archiving script? Possibly bitcoin registered outside USA (dotCOM)?

 

Archive.is/fo/today is censored. I've had more than a few archive links change the url they archived, and the domain searching yields nothing. All in the past few months, so I know it's related to Q stuff.

Anonymous ID: 2cb4f9 Introducing qarkzilla April 12, 2018, 4:35 a.m. No.1010886   🗄️.is 🔗kun   >>1009 >>7914

>>1010366

Searched for days to find a way to easily archive a thread in a scriptable manner. Finally determined that wget does not have the ability to download both the thumbnail pics embedded in the page AND the larger size images. Would love to hear of someone who got it working in a one-liner, but failing that, I created a bash script that performs two wget calls and does very minor sed URL rewriting to get the local page working with minimal transformations.

 

This is beta code. It works, but is not optimized, and can grow to be something really cool. For example, it currently downloads a single thread at a time, but you could easily get it to download from a whole list of threads, or even autodetect/update threads, etc. Anything is possible, so this is just the core of a handy little archival utility for chans. There are others, this one's merit is super lightweight yet effective.

 

Calling it Qarkzilla because it ARKives Qresearch threads and is kind of like a Zilla.

 

The script is available at https:// github.com/subqarkanon/Qarkzilla

Anonymous ID: 2030ac April 12, 2018, 5:06 a.m. No.1011009   🗄️.is 🔗kun

>>1010886

Thanks anon!

Why is the ssh key included?

 

> wget does not have the ability to download both the thumbnail pics embedded in the page AND the larger size images

? wget -l 2 would grab them, like this anon >>1009947 does; using -m gives a mirror copy.

 

I wrote a oneliner to grab only the larger images a while back, but it was probably in the early hours in the middle of an autism attack, and I can't find it in my history…Ah!! I MIGHT HAVE MADE A NOTE! HANG ON….

 

rm 11245726.html; wget https://8ch.net/pol/res/11245726.html; grep -Po '(?<=href=")[^"]*' 11245726.html | grep -vE 'html$' | sort -u | wget -nc -A jpeg,jpg,bmp,gif,png -i -

 

The leading rm is to remove the previously downloaded html, because this one-liner was run as an update. You could instead use the clobber options to overwrite the old file, or -N to "get if newer than current file", and leave off the rm entirely.

Summary of what it does:

wget - grab the thread html

grep - parse the html and find any download links

grep - keep only those links that don't end in html

sort -u - sort and remove duplicates

wget - grab the files from those links, if they are images

Anonymous ID: 2030ac April 12, 2018, 5:50 a.m. No.1011225   🗄️.is 🔗kun

>>1004570

Those who aren't technically versed in creating shortcuts would be the ones who gain the most benefit from viewing (not posting) hex coded URLs.

As you correctly implied ("in most cases" but not all), not all systems would allow shortcut configuration either, and the prescribed method works for every case.

Anonymous ID: faceb2 Snapshot everything April 12, 2018, 9:52 a.m. No.1012849   🗄️.is 🔗kun

I use zfs. This is how I archive:

 

cd /bread/qresearch/ && wget -nH -k -p -np -H -m -e robots=off -I stylesheets,static,js,file_store,file_dl,qresearch,res -U "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4" --random-wait --no-check-certificate http://8ch.net/qresearch ; snapshotname=$(date +%Y%m%d-%H%M) && zfs snapshot bread/qresearch@manual-grab-n-snap-$snapshotname

 

The four main parts are:

Change to directory

And if successful wget the board

Regardless of success set a snapshot time

And if success create a zfs snapshot
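And when you want to look back at (or recover) an earlier grab, a couple of hedged follow-ups assuming the same pool/dataset names as above (the snapshot name is just an example of the date format the command generates):

zfs list -t snapshot -r bread/qresearch
zfs rollback bread/qresearch@manual-grab-n-snap-20180412-0935

The first lists every snapshot you've taken; the second rolls the dataset back to one (add -r if it isn't the most recent snapshot, and remember a rollback discards anything written since).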

 

Pics related

Anonymous ID: 353534 Archiving a Specific Twitter Thread April 12, 2018, 4:16 p.m. No.1017385   🗄️.is 🔗kun   >>7494

>>987937

Here goes…and we'll take it from here.

 

Sorry for the wait. Several components were gonna need to be prerequisites and, considering that not everyone's a tech anon, I put some extra work into rolling up the extra components into the existing folder.

———————————–

CTTPortable beta v0.1

 

-Tested ONLY in Windows 10 and Server 2012, 64-bit, but any 64-bit windows should suffice.

-Will require running from a Command Prompt

-Instagram features only include downloading videos and pics

————————————–

INSTALLATION:

————————————–

  1. Extract the compressed folder anywhere. (My example will be c:\temp\CTTPortable)

  2. Navigate to a specific Twitter conversation or thread in a web browser, and copy the URL. (My example will be the latest [InTheMatrix] thread: https:// twitter.com/intheMatrixxx/status/984559849873780736)

  3. Open a command prompt window, navigating to C:\temp\CTTPortable and type

 

php cttURL.php https:// twitter.com/intheMatrixxx/status/984559849873780736

 

If all goes smoothly, this will:

 

  1. create a subfolder roughly named after the URL

  2. Place a WGETed copy of the original thread file "984559849873780736.html"

  3. Download locally all youtube videos referenced within the thread

  4. Download locally all pics contained within the thread

  5. Download locally any external website references (html,pdf,etc)

  6. Generate a SECOND copy of the Twitter Thread("984559849873780736new.html"), substituting remote links with the locally stored copies of each respective file.

 

I'll post a couple instructional vids later if anyone struggles with this.

Anonymous ID: 7b3ab0 April 12, 2018, 4:35 p.m. No.1017595   🗄️.is 🔗kun   >>7666 >>7335

>>1010995

 

That wget command is from:

>>493884

 

It's a 20+ GB download of images and HTML for all of qresearch and takes quite a while the first time. You may not want to run it, or you can remove file_store from the list of things to get, which reduces the size greatly.

Anonymous ID: 104106 April 27, 2018, 1:35 a.m. No.1207179   🗄️.is 🔗kun

>>1206291

Thank you BO!

 

Ripping an Instagram profile without login

The publicly viewable content in Instagram can be archived using Instalooter - a Python program.

https://github.com/althonos/InstaLooter

The usage manual is found at https://instalooter.readthedocs.io/en/latest/usage.html

 

To rip an archive for a user called "foo-bar":

python -m instalooter user foo-bar -v -d

-v option rips videos as well as images

-d option rips JSON metadata containing comments found on the Instagram posts

 

A helpful one-liner to use with the JSON files saved - Searching the JSON metadata for keywords of interest:

grep shortcode $(grep -li pizza *.json) | sed 's/^.*"shortcode":\s*"\(.*\)".*/http:\/\/www.instagram.com\/p\/\1/'

This will run a case-insensitive search for "pizza" and return all URLs in the Instagram profile that contain that term in the comments.

Anonymous ID: c305f5 April 27, 2018, 2:33 a.m. No.1207335   🗄️.is 🔗kun

>>1010995

I never completed an entire download. I think I got to 11 GB one time and figured something was off (I was just trying to get one thread). If I were to stick to a depth of one, it probably would've worked.

 

>>1017595

Knowing there's an upper limit is helpful. 20 GB isn't as bad as I would've imagined.

Anonymous ID: 35e9de April 28, 2018, 11:42 a.m. No.1224047   🗄️.is 🔗kun

>>1214842

>comms are unfolding.

May your origami be beautiful and to your liking anon.

 

Recommendation by anons to use clipconverter.cc for making video clips.

https://www.clipconverter.cc/

 

Those who are more technically minded can make use of youtube-dl as described above in this thread.
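If you'd rather not push your clips through a third-party website, a hedged sketch of doing the same thing locally with youtube-dl plus ffmpeg (URL and timestamps are placeholders):

youtube-dl -f mp4 -o source.mp4 "https://www.youtube.com/watch?v=VIDEO_ID"
ffmpeg -ss 00:01:00 -i source.mp4 -t 00:00:30 -c copy clip.mp4

-ss is the start point, -t is the clip length, and -c copy avoids re-encoding so the cut is fast and lossless (it will snap to the nearest keyframe).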

Anonymous ID: 6dde74 April 30, 2018, 1:27 p.m. No.1251551   🗄️.is 🔗kun

>>1251422

 

We need a spiritual cooking element to the whole thing. Show the entree' like one of those sick parties where someone is laying on a table.

Cocktail…blood bag?

Anonymous ID: ca2872 April 30, 2018, 1:58 p.m. No.1251964   🗄️.is 🔗kun   >>1253

Illuminati overlord Albert Pike explained in 1871 that the Third World War will focus on the mutual destruction of the Islamic World and the Political Zionists.

Cabalists planned to initiate this war that will bring ‘complete physical, moral spiritual and economic exhaustion’ for decades. The entire populist/Trump/Q movement has arisen at this time for the sole purpose of preventing the ultimate conflict before it starts.

The catalyst that will initiate the Third World War was to be an Iranian Nuclear strike on Israel, conducted with a nuclear weapon built in Syria, containing ‘Russian’ Uranium (U1). The Iran deal will ensure immediate US involvement and the Russian aspect would be used to foment war between the EU Countries and Russia, thus dragging the whole of Western Society into a Global Conflict.

https://www.youtube.com/watch?v=9RC1Mepk_Sw

https://www.youtube.com/watch?v=yWAFvIT-NHs&feature=youtu.be

North Korea was to be used as an agent for initiating conflict in the East. A Medium-Range ICBM was provided to the regime to engage Hawaii. The ‘false alert’ on Hawaii that occurred several months ago was a test missile, launched from NK. The missile passed over Japan, causing a simultaneous ‘false alert’. This secondary nuclear attack will drag NK and China into the World War.

Patriots have installed themselves in Jordan. They will attempt to intercept the Iranian missile that would strike Israel, thereby preventing the Third World War. However, war may still be declared on the grounds of breaking the terms of the Iran deal.

Everything has been leading up to this moment, this is a crossroad in our civilisation.

SA -NK.

NK -Armenia.

Armenia -Iran

Iran ->ENDGAME

Anonymous ID: c0f3a4 May 1, 2018, 12:37 a.m. No.1259141   🗄️.is 🔗kun

# OS X + httrack command line

Steps to install httrack on the OS X command line and run it. httrack has a lot of options to play with depending on your desired results.

 

# Install homebrew

https://brew.sh/

 

# Install httrack

brew install httrack

 

# Make a directory for your httrack 8ch files

cd ~/Downloads

mkdir 8ch

cd 8ch

 

# Run httrack

cd ~/Downloads/8ch

httrack https://8ch.net/qresearch/

 

# httrack update

cd ~/Downloads/8ch

httrack --update

 

# httrack help

httrack --help

man httrack

Anonymous ID: 92310b May 1, 2018, 4:05 p.m. No.1265626   🗄️.is 🔗kun   >>6775

Requesting modification to (and reminding people of) the search script here: pastebin.com/tM53Q6AM

 

The original works by searching in arcdir/qresearch/res. It only works in one folder at a time. A modification is needed to search my /bread/qresearch/.zfs/snapshot/*/qresearch/res. Note the asterisk. If I run ls /bread/qresearch/.zfs/snapshot/*/qresearch/res then I can successfully get a wall of text.

 

Not sure how to do so myself. Doesn't look like a simple oneliner to me. I've also thought to modify the temp dir to use the folder I run the script from for temp (since the snapshots are read-only and we can't modify hundreds of folders to have a temp folder).

 

This is what I get when I just change the arcdir:

 

# Error! No HTML-files found in "/bread/qresearch/.zfs/snapshot/*/qresearch/res"

# Please check if archivePath ("arcDir=…") is set correct.

 

I'd like to see if I can ~~cause any headaches~~ revive any dead posts.

Anonymous ID: 2e8bb2 May 2, 2018, 3:20 p.m. No.1276775   🗄️.is 🔗kun   >>6841

>>1265626

>This is what I get when a just change the arcdir:

I have no idea what line you changed, since there are several places in the script you could have done that, or what you changed it to, as your question is fairly vague.

 

However, I set up the necessary directory structure, and tested the script, and had no problems with it searching multiple directories.

For instance, I used a glob (i.e. * ) to pick more than one directory on line 11:

arcDir="${HOME}/../../tmp/a/*/b"

Works as expected, and searched through multiple directories.

 

My guess is you changed line 11 from:

arcDir="${HOME}/archive/qresearch/res"

to:

arcDir="/bread/qresearch/.zfs/snapshot/*/qresearch/res"

but the archive is in your home directory, not root directory???

If so, line 11 should be:

arcDir="${HOME}/bread/qresearch/.zfs/snapshot/*/qresearch/res"

which would specify the correct path.

 

Otherwise check you really do have any html files in those directories.

Anonymous ID: 33eb04 May 2, 2018, 3:26 p.m. No.1276833   🗄️.is 🔗kun   >>6872 >>7909

Help. Not tech savvy. Received a message on my computer to call Apple Support "NOW" regarding security issues. The message listed my IP address and gave me a case number, which has me freaked out b/c the end of that case number is: ….-qch8nt Could it be someone from 8ch is trying to contact me?

Anonymous ID: 2e8bb2 May 2, 2018, 3:26 p.m. No.1276841   🗄️.is 🔗kun

>>1276775

>fairly vague

..and by "vague", I mean the problem is clear ( I understand what you're trying to do, and what the end result should be), BUT the specification is vague (your description isn't specific enough to know exactly what you're doing wrong, so I'm guessing where you need to fix to get the required result.)

Anonymous ID: 2e8bb2 May 2, 2018, 3:40 p.m. No.1276987   🗄️.is 🔗kun   >>7184

>>1276914

>>1276929

bareback = connect from your home router without any proxy to cover what your home IP is.

VPN = virtual private network. You connect to any of the VPN companies proxy servers to get an IP in the country of your choice, so it makes it look like you are connecting from somewhere other than your home. (There's a lot more to it, but that is the basics)

Also, in certain countries like China, EU, UK the government collect what websites you visit and rank their citizens accordingly.

Anonymous ID: ea5f76 May 2, 2018, 4:14 p.m. No.1277388   🗄️.is 🔗kun   >>8055

>>984291

>>987600

>You guys are like geniuses.

Yes, thank you anon(s).

>>984029

>saving a person of interest's entire twitter feed

 

Just attempted. Account wasn't set to private, and I wasn't a subscriber. HTTrack hummed along for 15h, was up to 30 GB at last check, and it just finished with this error:

>15:31:39 Panic: Too many URLs, giving up..(>100000)

 

The site did mirror, but only to about 20 tweets.

So it was a fail. If you have any ideas, HTTrackanon, I'd be grateful.

 

I know NSA has it all ultimately, but I suspect this twitter account has leads to high level tech elite pedo stuff on west coast. I don't want to publish to anons w/o mirror. Meantime, I'm just copying to word from time to time. -Ty

Anonymous ID: 093d6a May 2, 2018, 5:01 p.m. No.1277909   🗄️.is 🔗kun   >>8274

>>1276833

Just FYI, there are a lot of hoax websites that will pop up a scary-looking message telling you that you've been infected. Then, they'll give you some fake contact information, you'll contact them, and they'll try to convince you to do something that ultimately will allow them to screw you. They can be very, very tricky, but believe me: no legitimate business operates that way.

 

Here's how you deal with those kinds of things: close the tab in your web browser that the scary page is on. If you only have one tab open, open another and close the bad one. If that doesn't work, close your web browser (Opera, Firefox, Chrome, or whatever you use to get on the internet). If you open it again and you get the same message, you need to change your homepage to something normal, like startpage.com.

 

Don't ever, ever contact those people–they're lying. If you're really, really worried, take a picture of the message and the web page address, then call Apple and ask them about it.

 

Hope that helps, and welcome to the internet–you'll get used to it in no time :)

Anonymous ID: 093d6a May 2, 2018, 5:13 p.m. No.1278055   🗄️.is 🔗kun   >>8294

>>1277388

It sounds like you just need to change one setting–see this post:

>>988648

 

I think a mirror depth of "2" is enough. This is the thing–you won't be able to open the folder, click on "index.html", and get all of the posts up like you would if you did an advanced search…but if you look in the folders, you'll find one that's named the same as the person you're trying to "grab." Inside that folder will be all of the tweets–you can double-click each one and take a look. It's not pretty, but it's pretty thorough.

 

Also, make sure that you do an advanced search before you start mirroring, with the start date as far back as you think you'll have to go to get them all.

 

If a "mirror depth" of 2 doesn't get you everything you want after doing both of those things, then go ahead and try "3". But don't bother going up to "4," because it gets way out of hand.

 

Another note: if you didn't use a vpn, odds are that twitter is going to rate-limit you after such a large download. You'll have to go through a proxy if that's the case. Good luck!
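If you do end up needing a proxy, a minimal sketch for the command-line tools (the address and port are placeholders for whatever proxy or VPN endpoint you actually use):

https_proxy=http://127.0.0.1:8118 http_proxy=http://127.0.0.1:8118 wget --mirror "https://example.com/"
httrack "https://example.com/" -O ./mirror -P 127.0.0.1:8118

wget honors the *_proxy environment variables, and HTTrack takes -P proxy:port on the command line; in the Windows GUI it's the same Proxy tab from step 4a.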

Anonymous ID: 33eb04 May 2, 2018, 5:32 p.m. No.1278274   🗄️.is 🔗kun   >>9675

>>1277909

Thank you, as well for helping me. Again, what really spooked me was the end numbers/letters of the "case number," which was qch8nt (Q + 8chan.) You guys and gals are the BEST

Anonymous ID: ea5f76 May 2, 2018, 5:33 p.m. No.1278294   🗄️.is 🔗kun   >>9581

>>1278055

Ty anon. lol at the large download. I should have used a VPN, but didn't. This anon has always made a clumsy operative–either sperg out on every detail or just decide fuck it and dive in headfirst.

 

I'll report back if I make progress.

Thanks again for your detailed help.

Much appreciation

Anon ID: 9a62d3 webrecorder.io May 2, 2018, 10:43 p.m. No.1282594   🗄️.is 🔗kun

Different way of archiving but it can archive all media, links, etc of a site. You can run it locally if you have any skill or run it from the webrecorder.io site.

Anonymous ID: 6db461 May 3, 2018, 4:01 a.m. No.1284257   🗄️.is 🔗kun   >>9311

>>988164

>for i in $(grep -E '^htt.*\.pdf$' pdflist); do foo=$(basename "$i" .pdf); wget "$i"; pdftotext -r 300 "${foo}.pdf"; done

No need for the basename strip, as the shell can handle the string parsing itself, so:

for i in $(grep -E '^htt.*\.pdf$' pdflist); do wget "$i"; pdftotext -r 300 "${i##*/}"; done

is equivalent.

¢♄Δ⊕$

Anonymous ID: 4981c3 May 3, 2018, 10:37 a.m. No.1287062   🗄️.is 🔗kun

Fireshot works for easily saving high-quality PDF renders of webpages. Can't handle really long webpages, however.

Anonymous ID: 093d6a May 3, 2018, 3:18 p.m. No.1289581   🗄️.is 🔗kun

>>1278294

don't worry about it–I speak from experience re: the large download. And I sperg as well. You might notice some big posts where the author calls out and corrects mistakes right after putting it up–that's me a lot of times.

Anonymous ID: 093d6a May 3, 2018, 3:29 p.m. No.1289675   🗄️.is 🔗kun

>>1278274

I would've made a double-take as well.

 

I notice that I get a "Q" in around half of my captcha challenges. If you figure 26 capital letters, 26 lower case, and ten digits, that makes for a one-in-sixty-two chance of getting a "Q" for each character. With six characters, the odds of getting one should be something like 9.3%. Superstition or not, it gives me comfort.

 

It's funny how noticing patterns can either make you an idiot or a genius. It's probably the most important predictor of both mathematical ability and paranoid schizophrenia, ha ha.

Anonymous ID: 6a246e May 6, 2018, 2:15 a.m. No.1316720   🗄️.is 🔗kun

# Twitter: Scrape / download a user's tweets on OS X

Isn't real pretty but it worked for me

Doesn't require a twitter API key

Scrapes twitter search instead

Most steps are performed in an OS X Terminal/shell

Requires basic shell experience

 

### Install homebrew

https://brew.sh/

 

### Install jq

brew install jq

 

### Install TweetScraper

https://github.com/jonbakerfish/TweetScraper

 

cd ~/Downloads/

git clone https://github.com/jonbakerfish/TweetScraper.git

cd TweetScraper/

pip install -r requirements.txt

 

### Run TweetScraper

Here, SaRaAshcraft, is an example twitter user name

 

scrapy crawl TweetScraper -a query="from:SaRaAshcraft"

cd Data/tweet/

find . -type f -print0 | while read -d $'\0' file; do jq 'select((.is_reply==false) and .is_retweet==false) | .text' $file ; done > ../saraashcraft-all.txt

 

### Open your new text file

Use any text editor to open your new saraashcraft-all.txt file

 

vim ../saraashcraft-all.txt

Anonymous ID: 4ae2db May 6, 2018, 6:57 a.m. No.1317520   🗄️.is 🔗kun

When POTUS speaks at various events he should ensure that there are strategically placed mirrors behind him that forces media to show the crowd!

Anonymous ID: feb0d5 May 7, 2018, 3:54 p.m. No.1330780   🗄️.is 🔗kun

Once you have the data mirrored, this is awesome for smaller filesets–it handles the Q archive fine, with regular expressions, "near"-phrase searches, synonyms, etc. No image searching, but it does search filenames and the content of text, html, pdf, etc.

 

Open source and Java-based; increase the Java memory limit if you're indexing dozens of gigs of text/PDF.

http://docfetcher.sourceforge.net/de/index.html
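
On that Java memory limit: DocFetcher is started through a small launcher script that passes flags to the JVM, and the heap cap is the standard -Xmx flag. The launcher's exact name varies by version/platform (a .sh on Linux, a .bat on Windows), so treat this as a rough sketch rather than gospel:

# back up the launcher, then bump any existing -Xmx value to 4 GB

sed -i.bak 's/-Xmx[0-9]*[mMgGkK]/-Xmx4g/' DocFetcher.sh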

Anonymous ID: ac9690 DownThemAll Filter May 7, 2018, 10:24 p.m. No.1334628   🗄️.is 🔗kun

If you're browsing qresearch with Tor (or older Firefox), you can add DownThemAll and configure the following filter to pull down all the full-size images of a given thread. (Ctrl-S will save the page, but it only captures the smaller thumbnail images. This gets the larger ones too.)

 

/^(?!.+\/(\d{10})-?[1-9]?..+).+(file_dl\/.+(jpg|png|jpeg|gif)\/)/i

 

pics related
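
If you'd rather pull them from a shell instead of the browser, something like this should grab the same full-size links out of a saved thread page–untested beyond the basics, and it assumes the full-size URLs contain "file_dl/" and sit inside double-quoted attributes, like the filter above:

grep -Eo 'https?://[^"]*file_dl/[^"]*' thread.html | sort -u > imagelist

wget -i imagelist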

Anonymous ID: 85358a May 21, 2018, 4:18 p.m. No.1497942   🗄️.is 🔗kun

<iframe width="504" height="283" src="https://www.youtube.com/embed/tpH5L8zCtSk" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>

Anonymous ID: 85358a May 21, 2018, 5:24 p.m. No.1498919   🗄️.is 🔗kun   >>5178 >>8228

Barack & Michelle Obama just signed a multi-year contract with Netflix.

 

Some titles for their new shows have already been released.

 

  • House of Race Cards

  • Orange is the New Barack

  • 13 Reasons Why I Was Indicted

  • Stranger Things Than Michelle

  • Better Call Saul Alinsky

Anonymous ID: 36a986 May 22, 2018, 4:25 a.m. No.1504090   🗄️.is 🔗kun   >>5178 >>5689

Kek! These are some great titles, Anon… thanks for the morning laugh.

 

Related: I cancelled Netflix because they went full-retard with their Soros-level programming. Namely, the show about how shitty white people are. When I heard Obummer and Rice had partnered with the company, that was it for me.

 

Unrelated: I can't stop seeing Biden in this pic as Simple Jack.

Anon ID: ffc98a Organized GUI Programming May 22, 2018, 8:19 a.m. No.1505538   🗄️.is 🔗kun   >>5995

>>1505178

 

This is what happens when bad programmers don't know how to program for humans…and think all humans will understand their code structure.

 

It's called updating your site beyond 2002 style.

 

Thank you

 

– The autists, the anons, and other organized diggers who have moved beyond 2002.

Anonymous ID: 7f80cb May 22, 2018, 9:33 a.m. No.1505995   🗄️.is 🔗kun   >>6006

>>1505538

>It's called updating your site beyond 2002 style.

>it's not my fault I didn't read what page I was on, it's someone else's fault.

No, it's called "reading the page you're on and posting accordingly - not being lazy and blaming someone else."

>updating your site beyond 2002 style.

This site can have ANY STYLE YOU CHOOSE!

You can write your OWN style/theme and use it under the options mentioned. So don't blame "2002 style" for your own laziness in 1) not finding this out by reading the FAQ, and 2) not writing your own style.

 

sage for off-topic

Anonymous ID: 093d6a May 24, 2018, 1:51 p.m. No.1531142   🗄️.is 🔗kun   >>4108

>>974637

Just wanted to drop off some good settings for getting one thread from 8ch.net, separate from the entire board.

 

In the options, under the Limits tab, select "Maximum Mirroring Depth" and set it equal to "2", then select "Maximum External Depth" and set it equal to "1". When selecting options under the "Scan Rules" tab, I usually select the first box (for graphics), but if you really want to make a complete copy, select the other ones as well. I believe it will download every movie that's embedded on the page…so be prepared for a long download.

 

Setting the "Maximum Mirroring Depth" to "2" will allow you to get not just the thumbnails, but also the larger images you see when clicking on thumbnails. Also, bear in mind that even if you don't download the videos, the links to the video will still be there…so you will still be able to see them so long as they're still being hosted elsewhere. But if they're taken down, not so much.

Anonymous ID: b0f309 June 7, 2018, 2:19 a.m. No.1657831   🗄️.is 🔗kun   >>9329 >>9588

What is the expiration time for images not frequently accessed on 8ch.net?

 

I'm trying to do an independent archive of the qresearch breads. While I can still pull down all the old html files, I find that most of the images and thumbnails from the early days 404 out. The ones that still work are mostly frequently used memes. This leads me to suspect that files which are inactive for a specific period of time are discarded to save space.

It's a shame. Historians researching America's second revolution 100 years from now might like to see that stuff.

Anonymous ID: 1d5a63 June 7, 2018, 8:38 a.m. No.1659311   🗄️.is 🔗kun

>>1284257

>for i in $(grep -E '^htt.*\.pdf$' pdflist); do wget $i; pdftotext -r 300 ${i%.pdf}; done

It isn't equivalent: basename strips the leading URL path as well as the .pdf extension, while ${i%.pdf} only strips the extension. That's fine for a plain filename, but not for a full URL.

You could do a more complex parameter expansion, but you might as well use basename, since it handles unusual cases more smoothly.
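
For anyone following along, here's a version that folds both points together–command substitution around the grep, basename for the local filename, and pdftotext pointed at the file wget actually creates (untested, adjust to taste):

for i in $(grep -E '^htt.*\.pdf$' pdflist); do

  foo=$(basename "$i")                        # e.g. report.pdf, the file wget will create

  wget "$i"

  pdftotext -r 300 "$foo" "${foo%.pdf}.txt"   # writes report.txt next to the pdf

done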

sK:dd

Anonymous ID: 71f475 June 7, 2018, 9:25 a.m. No.1659588   🗄️.is 🔗kun   >>6116

>>1657831

Images hosted on 8ch expire when they are bumped off the catalog into the archives. Otherwise, there is no timed expiration that I'm aware of.

 

The 8chan servers must accommodate nearly 18,000 boards, with almost 2000 posts per hour, many of which will include images, PDFs, SWFs, oekaki pictures, etc.

 

CodeMonkey would have all the details. admin@8chan.co

Anonymous ID: 093d6a June 7, 2018, 11:24 p.m. No.1666116   🗄️.is 🔗kun

>>1659588

I was going to say, that seems to be the cutoff. Threads in the archives lose images, but not all of them. I don't believe it's based on size–there are some big images left, while small ones are lost.

 

All the more reason to archive threads individually, or to archive the whole board periodically–say once a week, or whenever you think something epic is happening. Even if an image is deleted by the person that posted it, you'll still have a copy on your hard drive if you catch it in time. This is especially important when we get visitors :)
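
If you want the "once a week" part to run itself, a crontab entry along these lines would do it. "mirror-qresearch.sh" is just a placeholder for whatever HTTrack/wget command you settle on:

# minute hour day-of-month month day-of-week: every Sunday at 03:00, output logged

0 3 * * 0  /home/anon/scripts/mirror-qresearch.sh >> /home/anon/logs/mirror.log 2>&1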

Anonymous ID: c43717 June 11, 2018, 6:15 p.m. No.1706888   🗄️.is 🔗kun   >>1338

>>1706515

Not really, they are not perfect. Everyone is different. Unless you have sauce which says they can do that each and every time to anyone? Think electric chair. Not everyone responds the same. Some people are harder to kill. Read up on the Rosenbergs' deaths?

 

>>1705773

>>1705990

Jobs is likely another simulated "death?"

He's another fucking clown / actor. They own them all. Almost.

Yes POWell ["Powell is her first name? Like

"Stanley" for "Obama's" mom.] Must be a clown thang;.Looks like a "Stepford Wife" undated. Or a Pizza clone girl; one of Hillary's pals, like LdR? Plays the part of the grieving widow. yuk

Friends don't let friends read WaPOO

Anonymous ID: 65ae20 June 13, 2018, 4:52 p.m. No.1736024   🗄️.is 🔗kun

Spread the Qword into the physical world wherever you go…use this technique to get to normies on the street.

Anonymous ID: ba5deb June 13, 2018, 8:37 p.m. No.1739302   🗄️.is 🔗kun

>>1736185

 

You Clowns sure are terrible at memes….. I mean this is below the IQ of Forrest Gump.

 

Put on some clothes, put down the peanut butter, and hang your head in shame, you idiot shill.

 

Send it to corsi, he will like it and tell you how great you are…….faggot

Anonymous ID: 284301 June 13, 2018, 9:36 p.m. No.1740239   🗄️.is 🔗kun

Japan seeks meeting between Prime Minister Shinzo Abe and Kim Jong Un: Report

 

Japanese Prime Minister Shinzo Abe and North Korean leader Kim Jong Un have communicated several times over the past few months and are seeking to schedule a meeting between the two later this year, a new report says.

 

One option on the table is for Abe to travel to Pyongyang in August, but another possibility is for Abe and Kim to meet during the Eastern Economic Forum in September if Kim decided to attend, according to the Yomiuri newspaper. The report comes after a Tuesday summit between the U.S. and North Korea, where President Trump and Kim signed a joint statement agreeing to pursue a “stable peace” on the peninsula. At a press conference after Kim had left Singapore, Trump said North Korea agreed to give up its nuclear arms. Trump also claimed he was confident that the rogue regime would pursue complete denuclearization and that the process would be underway in the near future.

 

Trump also decided that U.S.-South Korean military exercises would be suspended following the meeting, but Japan’s defense minister has claimed the exercises were “vital” to national security in East Asia. Two missiles were fired over Japan in 2017 by North Korea as the rogue regime continued to develop its nuclear weapons program.

 

https:// www.washingtonexaminer.com/news/japan-seeks-meeting-between-prime-minister-shinzo-abe-and-kim-jong-un-report

Anonymous ID: 2205ac June 19, 2018, 11:07 a.m. No.1815238   🗄️.is 🔗kun   >>6733 >>7093

>>1807836

Q is not a LARP;

Q is a master disinformation spreader.

He has fooled everyone.

 

The optics are now SO bad on the administration that even I believe that Trump is Hitler for "caging" children.

Horowitz and Wray have NOT delivered. Horowitz provided the FULL report, not a REDACTED report like Q said.

 

This is now a massive goat fuck.

 

Horowitz: a deep state lifer.

Wray: a new, indoctrinated deep state lifer.

Anonymous ID: 39995a June 19, 2018, 12:44 p.m. No.1816733   🗄️.is 🔗kun

>>1815238

IT APPEARS THAT YOU ARE CORRECT.

 

ONLY TIME WILL TELL…BUT THIS IS HOW POLITICIANS OPERATE: THEY OBFUSCATE, DISTRACT, AND MUDDY THE WATER SO MUCH THAT NOBODY WANTS TO HEAR ABOUT IT ANYMORE…THAT'S HOW THEY BURY PUBLIC INTEREST, DISSENSION, CONCERN, ETC.

 

SO BY REPEATING TRUCKLOADS OF INSINUATIONS, AND BLOVIATING SO MUCH BULLCRAP, THE FACTS BECOME LOST AS A NEEDLE IN A HAYSTACK, SO MUCH SO THAT NO VOTERS KNOW WHAT TO BELIEVE.

HOPEFULLY PEOPLE LIKE GOWDY WILL BE HEARD AND PEOPLE LIKE CUMMINGS WILL BE IGNORED. THESE DEMOCRATS ARE VERY GOOD AT CONFUSING OR IGNORING THE FACTS. MIXING THE SUBJECTS TO MAKE A POINT ON TV RATHER THAN FOCUSING ON WHY THE WITNESS IS THERE.

BAIT AND SWITCH. WITH AN AGENDA…..

Anonymous ID: 558350 June 20, 2018, 7:15 a.m. No.1828553   🗄️.is 🔗kun

>>1826165 (pb)

Anon, this pic might be a good addition for your infographic. I never thought I would admire and trust the Russian president more than one of ours. The side by side memes at the time were comedy gold...shirtless Putin vs. Hussein...good stuff.

Anonymous ID: 573f7d June 20, 2018, 6:01 p.m. No.1838009   🗄️.is 🔗kun

Q–message from @prayingmedic

Congressional Research Service major swamp

Medic's contact former military

States CRS needs drained in a big way.

 

ThankQ

Anonymous ID: 0999ec June 20, 2018, 8:03 p.m. No.1840496   🗄️.is 🔗kun

>>1839974

What is that in President Trump's left hand?

Is that a Q sign?

 

I don't know much about ASL signs, but it looks like a left hand Q.

Good sign in Minn.

Anonymous ID: 940f72 June 21, 2018, 9:21 a.m. No.1847914   🗄️.is 🔗kun   >>8072

>>1010886

Thanks, you inspired me to upload a few scripts that I have been using to track all the places in the qresearch threads where someone or somebot has written a post with 1 or more lines that are of the form "X = Y".

 

>https:// github.com/daizyr/qresearch-extract-equals

 

Instructions to run it are in the readme. The resulting file is called "training.out" because at the time I wrote it, I was actually trying to demonstrate that this stuff was being used as bot training data (I still think it is, but to a lesser degree than back in the day).

 

Anyway, I think it's still relevant for tracking and archiving things. I hope others find it useful. It just downloads all thread HTML, no attachments or thumbnails. It is also a good basis for building tools that download all the HTML, then analyze the contents of each comment. Threads are inspected in numerical order (so older threads are processed first).

 

Example output for a specific thread in training.out (tho this file contains excerpts for ALL threads processed). It is suitable for posting as a comment/threadlink elsewhere on 8ch:

 

>[2018-06-20T03:13:27Z] Q Research General #2298: The Shills Should Masturbate Edition

>>>/qresearch/1823809

>css = cascading style sheet

>Trust the plan = also

> Occam's Razor = D = Donald

>pass = qanon:qanon

> As D5 = 45 I thought D meant Delta

>cut off soros funding bc of human rights abuses = no more antifa

> [1] Do you think the delta between posts and tweets could be who is posting at any given time? [1] Delta = POTUS ? Just a thought

 

I do not plan on maintaining this code, but feel free to fork it and use it. It was meant to scratch a specific itch, not to become a maintained software project.
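
If you just want a quick taste of the idea without cloning anything, something along these lines works on a single thread. This is not the repo's actual code, just a rough, throwaway sketch (the thread number is the one from the example output above):

wget -q -O thread.html "https://8ch.net/qresearch/res/1823809.html"

# crude tag strip, then keep shortish lines containing " = " and de-dupe

sed 's/<[^>]*>//g' thread.html | grep -E '^[^=]{1,60} = [^=].+$' | sort -u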

Anonymous ID: 09e5bf July 4, 2018, 3:17 p.m. No.2033086   🗄️.is 🔗kun

Conquer we must as conquer we shall.

 

Watching The Darkest Hour today. Was listening to Churchill's 1st radio broadcast to the people w/the direct address regarding matters in the world. His details might have been off, but his motive was pure. It was time to rally the people for a mighty fight. And he succeeded in doing so by using this approach. The victory was for all. So grateful for this motive being held in the hearts of my patriotic leader & his teammates.

 

So grateful for all of you. I stand ready to do my part in any way I can imagine. Happy Independence Day!

Anonymous ID: 758038 July 6, 2018, 3:26 a.m. No.2054108   🗄️.is 🔗kun   >>0069

>>1531142

Can an anon post a command line version of this using HTTrack.exe? Scriptable version = more powerful than GUI. Wget isn't getting me what I want; this might do the trick. Thank you.

Anonymous ID: 92310b July 6, 2018, 3:38 p.m. No.2060069   🗄️.is 🔗kun

>>2054108

I too have been having wget problems. It doesn't regrab the html files for me, leaving a lot of partial breads.

 

I found this page: https://www.archiveteam.org/index.php?title=HTTrack_options and this page.

 

I've been practicing on the comms board–it should be about 600 megs, but httrack is grabbing too much for my preferences. Currently it's at 3.6 gigs, so I canceled it. Gonna go back and read the manual and any examples I can find, and if I find something that works I'll report back.
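
In the meantime, here's a rough starting point. It just maps the GUI settings from the earlier thread-archiving post (mirror depth 2, external depth 1, graphics only) onto command-line flags, so treat it as a sketch rather than gospel–THREADNUMBER is a placeholder, and on Windows the same flags go to httrack.exe:

# -r2 = maximum mirroring depth 2, -%e1 = maximum external depth 1

# the +*.ext filters are the "gif, jpg, jpeg…" scan rule from the GUI

httrack "https://8ch.net/qresearch/res/THREADNUMBER.html" -O "./qresearch_thread" -r2 -%e1 "+*.gif" "+*.jpg" "+*.jpeg" "+*.png"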

Anonymous ID: b3653c Aug. 1, 2018, 9:45 a.m. No.2391452   🗄️.is 🔗kun

When you say watch the water what do you mean?

A. OCEAN beaches radioactive etc.?

B. POLITICS red wave blue wave?

C. DRINKING are they putting a drug in it, i.e. the CIA's favorite?

D. NONE please explain?

Anonymous ID: 99e44b Aug. 12, 2018, 11:19 a.m. No.2570762   🗄️.is 🔗kun

All things considered, I figured it was a good idea to bump this back to the top for folks who may not know how to archive a website offline. :)