Anonymous ID: af3481 Tech/codefaggotry: how to archive 8kun boards July 21, 2020, 10:18 p.m. No.20526

Basic writeup on how to archive 8kun boards, to be augmented and discussed further in this thread:

 

Manual archiving (for small boards)

  1. go to the catalog

  2. open each thread in the catalog in a new tab

  3. for each thread, click to expand all images (this part can be done automatically using a script in the browser console)

  4. use the Save Page WE browser extension to save each thread as a single .html file with all full-size images included. I haven't tested whether this works with video attachments, and I know it won't work with PDFs.

 

Automated archiving

  1. make requests to the catalog

  2. make requests to each individual thread in the catalog, optionally based on the time each thread was last updated

    2a. save the information for each post into a database and mark deleted posts as deleted

    2b. look through all the media files in each new post and download the full version of them (if not already done)

  3. repeat

  4. periodically review media files that are present in deleted posts and decide if they should be kept or not. It is hard for a program to tell the difference between content that was deleted by mods and things that disappeared due to site errors.
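
A rough sketch of the automated steps above, in Python with the standard-library sqlite3 module. This is an illustration under assumptions, not a tested archiver: it assumes the catalog and threads arrive as 4chan-style JSON (field names `no`, `last_modified`, `com`), which has not been verified against 8kun here. The `media` list of full-size URLs per post, the table layout, and the `fetch_catalog` / `fetch_thread` callables are all hypothetical names made up for the example. The actual file download is left out.

```python
import sqlite3

# Hypothetical schema: one row per thread, post, and media URL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS threads (no INTEGER PRIMARY KEY, last_modified INTEGER);
CREATE TABLE IF NOT EXISTS posts (
    no INTEGER PRIMARY KEY, thread INTEGER, body TEXT, deleted INTEGER DEFAULT 0);
CREATE TABLE IF NOT EXISTS media (
    url TEXT PRIMARY KEY, post INTEGER, downloaded INTEGER DEFAULT 0);
"""

def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db

def stale_threads(db, catalog):
    """Step 2: threads whose last_modified is newer than what is stored."""
    seen = dict(db.execute("SELECT no, last_modified FROM threads"))
    return [t["no"] for page in catalog for t in page["threads"]
            if t["last_modified"] > seen.get(t["no"], -1)]

def sync_thread(db, thread_no, last_modified, posts):
    """Steps 2a/2b: store posts, flag vanished ones, queue unseen media."""
    live = {p["no"] for p in posts}
    for p in posts:
        db.execute("INSERT OR IGNORE INTO posts (no, thread) VALUES (?, ?)",
                   (p["no"], thread_no))
        # A post that reappears (e.g. after a site error) is un-flagged.
        db.execute("UPDATE posts SET body = ?, deleted = 0 WHERE no = ?",
                   (p.get("com", ""), p["no"]))
        for url in p.get("media", []):  # "media" is an assumed field
            db.execute("INSERT OR IGNORE INTO media (url, post) VALUES (?, ?)",
                       (url, p["no"]))
    stored = [row[0] for row in
              db.execute("SELECT no FROM posts WHERE thread = ?", (thread_no,))]
    # Anything stored earlier but missing from the live thread is marked deleted.
    db.executemany("UPDATE posts SET deleted = 1 WHERE no = ?",
                   [(n,) for n in stored if n not in live])
    db.execute("INSERT OR REPLACE INTO threads (no, last_modified) VALUES (?, ?)",
               (thread_no, last_modified))
    db.commit()

def poll_once(db, fetch_catalog, fetch_thread):
    """Steps 1-2: one pass over the catalog; call repeatedly for step 3."""
    catalog = fetch_catalog()
    modified = {t["no"]: t["last_modified"]
                for page in catalog for t in page["threads"]}
    for no in stale_threads(db, catalog):
        sync_thread(db, no, modified[no], fetch_thread(no))

def media_for_review(db):
    """Step 4: media URLs attached to posts currently flagged as deleted."""
    return [row[0] for row in db.execute(
        "SELECT m.url FROM media m JOIN posts p ON p.no = m.post "
        "WHERE p.deleted = 1 ORDER BY m.url")]
```

For step 3, `poll_once` would be called in a loop with a delay between passes. Note that this sketch only notices deletions inside threads that are still listed in the catalog. A thread that vanishes from the catalog entirely would need a separate check.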

Anonymous ID: af3481 July 22, 2020, 7:05 p.m. No.20576   >>1054

I use an extension called "Save Page WE":

https://addons.mozilla.org/en-US/firefox/addon/save-page-we/

Works well, and much better than a PDF or PNG because there is no data loss this way.

MozArchive with MAFF or MHT should be fine too, but I like Save Page WE because it creates a plain .html file with all images inlined into it.
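
To illustrate what "inlined" means here: an image can be embedded directly in the HTML as a base64 `data:` URI, so the single file no longer depends on any external image. The Python sketch below shows the idea only. It is not how Save Page WE is implemented, the function names are made up for the example, and the regex handles only simple double-quoted `src` attributes.

```python
import base64
import mimetypes
import re

def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw file bytes as a data: URI, guessing the MIME type by name."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"

def inline_images(html: str, files: dict) -> str:
    """Replace <img src="name"> with a data: URI for each name in `files`.

    `files` maps the src value to the image bytes. Images that are not in
    the mapping are left untouched.
    """
    def swap(match):
        name = match.group(2)
        if name not in files:
            return match.group(0)
        return match.group(1) + to_data_uri(files[name], name) + match.group(3)

    return re.sub(r'(<img\b[^>]*\bsrc=")([^"]+)(")', swap, html)
```

The trade-off is file size: base64 makes each image roughly a third larger, but the resulting .html opens anywhere without a companion folder.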

 

I've done the automated archiving too. What I described works, but my version is not currently in a state where releasing it publicly would be of any use to others.