Basic writeup on how to archive 8kun boards, to be augmented/discussed further in this thread:
Manual archiving (for small boards)
1. go to the catalog
2. open each thread in the catalog in a new tab
3. for each thread, click to expand all images (this part can be done automatically using a script in the browser console)
4. use the Save Page WE browser extension to save a complete archive of the thread, with all full-size images included, as a single .html file. I haven't tested whether this works with video attachments, and I know it won't work with PDFs.
Automated archiving
1. make requests to the catalog
2. make requests to each individual thread in the catalog, optionally only those whose last-updated time has changed
2a. save the information for each post into a database and mark deleted posts as deleted
2b. look through all the media files in each new post and download the full-size version of each (if not already downloaded)
3. repeat
4. periodically review media files that belong to deleted posts and decide whether they should be kept. It is hard for a program to tell the difference between content that was deleted by mods and content that disappeared due to site errors, so this step needs a human.
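
Rough Python sketch of the bookkeeping described above (picking threads by last-updated time, saving posts, flagging vanished posts as deleted, downloading each media file only once, listing media from deleted posts for review). Treat it as a starting point, not a finished archiver. Assumptions: the catalog is a list of dicts with `no` and `last_modified`, and each post is a dict with `no`, `com` and `files` (a list of filenames). Those shapes are simplified and hypothetical; check them against the JSON the site actually returns. The network side is left out on purpose: `download` is whatever function you supply, and one database holds one board.

```python
import sqlite3

def open_db(path=":memory:"):
    """Create the two tables this sketch needs and return the connection."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS posts ("
               "thread INTEGER, no INTEGER PRIMARY KEY, body TEXT, "
               "deleted INTEGER NOT NULL DEFAULT 0)")
    db.execute("CREATE TABLE IF NOT EXISTS media ("
               "filename TEXT PRIMARY KEY, post_no INTEGER, "
               "downloaded INTEGER NOT NULL DEFAULT 0)")
    return db

def threads_to_fetch(catalog, last_seen):
    """Return thread numbers whose last_modified is newer than the stored value.
    `last_seen` maps thread number -> last_modified from the previous pass."""
    return [t["no"] for t in catalog
            if t["last_modified"] > last_seen.get(t["no"], 0)]

def sync_thread(db, thread_no, posts):
    """Store the posts from one fetch of a thread and flag posts that vanished."""
    live = {p["no"] for p in posts}
    for p in posts:
        db.execute("INSERT OR IGNORE INTO posts (thread, no, body) VALUES (?, ?, ?)",
                   (thread_no, p["no"], p.get("com", "")))
        # A post that shows up again (e.g. after a site error) is un-flagged.
        db.execute("UPDATE posts SET deleted = 0 WHERE no = ?", (p["no"],))
        for name in p.get("files", []):
            db.execute("INSERT OR IGNORE INTO media (filename, post_no) VALUES (?, ?)",
                       (name, p["no"]))
    stored = [r[0] for r in
              db.execute("SELECT no FROM posts WHERE thread = ?", (thread_no,))]
    for no in stored:
        if no not in live:
            # Kept in the database, only marked; nothing is ever removed here.
            db.execute("UPDATE posts SET deleted = 1 WHERE no = ?", (no,))
    db.commit()

def download_pending(db, download):
    """Call the caller-supplied `download(filename)` once per file not yet fetched."""
    pending = [r[0] for r in db.execute(
        "SELECT filename FROM media WHERE downloaded = 0 ORDER BY filename")]
    for name in pending:
        download(name)
        db.execute("UPDATE media SET downloaded = 1 WHERE filename = ?", (name,))
    db.commit()
    return pending

def media_for_review(db):
    """List media files attached to posts currently flagged as deleted."""
    rows = db.execute("SELECT m.filename FROM media m "
                      "JOIN posts p ON p.no = m.post_no "
                      "WHERE p.deleted = 1 ORDER BY m.filename")
    return [r[0] for r in rows]
```

One pass is then: fetch the catalog, run `threads_to_fetch`, fetch each returned thread and hand its posts to `sync_thread`, then call `download_pending`. Because a reappearing post gets un-flagged, `media_for_review` only shows files whose posts are still missing at review time, which cuts down on false alarms from temporary site errors but does not remove the need for a human to look.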