>>21608 (off-bread)
>using wget for archiving
this works. the resulting html page isn't immediately browseable (no styles, links to media are broken) but it does grab all posts and media files:
wget https://8kun.top/comms/res/21322.html --no-clobber --recursive --level=1 --span-hosts --domains=media.8kun.top --wait=0.3 --random-wait
because of --no-clobber this will not re-download files that already exist, which is what you want for media but not for the thread HTML file. so if you want to re-archive a thread (like when it gets new posts), find the old .html file and delete or rename it before running the command again.
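rough sketch of the rename step, in case it helps. backup_thread_html is a made-up helper name, and the path in the usage note assumes wget's default host-prefixed directory layout for recursive downloads, so adjust it to wherever your copy landed:

```shell
# Move an old thread HTML file out of the way so the next wget run
# fetches a fresh copy instead of skipping it under --no-clobber.
# backup_thread_html is a hypothetical helper name, not part of wget.
backup_thread_html() {
    html_file="$1"
    if [ -f "$html_file" ]; then
        # Suffix with a UTC timestamp so earlier snapshots are kept.
        mv -- "$html_file" "$html_file.$(date -u +%Y%m%dT%H%M%SZ).bak"
    fi
}
```

usage would be something like `backup_thread_html 8kun.top/comms/res/21322.html` right before re-running the wget command. it does nothing if the file isn't there, so it should be safe to call on a first run too.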
other stuff that could be improved:
- this will download both thumbnails and full versions of media files. I think the --accept-regex option is the way to fix this.
- this might not catch some errors by itself, like corrupted media files (sometimes downloads fail partway through, and since existing files are skipped, the partial file probably won't be re-downloaded).
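for the thumbnail problem, here is an untested sketch that uses --reject-regex (the opposite of --accept-regex, swapped in because excluding one pattern seemed simpler than listing everything to keep). big assumption: thumbnail URLs have "/thumb/" somewhere in their path. I haven't confirmed that, so look at a few links in the thread HTML first. archive_thread, THUMB_REGEX and WGET are names invented for this sketch:

```shell
# Same wget call as above, plus a filter that skips thumbnail URLs.
# ASSUMPTION: thumbnail URLs contain "/thumb/" in their path; check a few
# links in the thread HTML and adjust THUMB_REGEX if they look different.
# archive_thread, THUMB_REGEX and WGET are made-up names for this sketch.
THUMB_REGEX='/thumb/'

archive_thread() {
    thread_url="$1"
    # WGET can be overridden (e.g. WGET=echo) to preview the command line.
    "${WGET:-wget}" "$thread_url" \
        --no-clobber --recursive --level=1 \
        --span-hosts --domains=media.8kun.top \
        --reject-regex="$THUMB_REGEX" \
        --wait=0.3 --random-wait
}
```

run it as `archive_thread https://8kun.top/comms/res/21322.html`. setting WGET=echo first prints the arguments instead of downloading, which is a cheap way to check the flags. this does nothing about the corrupted file problem.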