Anonymous ID: 24d34a July 30, 2020, 10:59 p.m. No.21652   >>1655

>>21608 (off-bread)

>using wget for archiving

 

this works. the resulting html page isn't immediately browsable (no styles, and links to media are broken), but it does grab all posts and media files:

wget https://8kun.top/comms/res/21322.html --no-clobber --recursive --level=1 --span-hosts --domains=media.8kun.top --wait=0.3 --random-wait

because of --no-clobber, this will not re-download files that already exist, which is what you want for media but not for the thread HTML file. so if you want to re-archive a thread (like when it gets new posts), find the old .html file and delete or rename it before running that command again.
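a rough sketch of the rename step, in case it helps. backup_thread_html is a made-up helper name, not part of wget, and the path in the usage note assumes wget's default recursive layout (a directory named after the host); adjust it if your files end up somewhere else.

```shell
# Hypothetical helper: move an existing thread HTML file aside so that
# --no-clobber does not skip it on the next run. Media files are untouched.
backup_thread_html() {
    html_file="$1"
    if [ -f "$html_file" ]; then
        # Suffix with a UTC timestamp so older snapshots are kept.
        mv -- "$html_file" "$html_file.$(date -u +%Y%m%dT%H%M%SZ).bak"
    fi
}
```

usage would be something like `backup_thread_html 8kun.top/comms/res/21322.html`, then run the wget command again. renaming instead of deleting keeps the older snapshot around in case the new download fails.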

 

other stuff that could be improved:

  • this will download both thumbnails and full versions of media files. I think the --accept-regex option is the way to fix this.

  • this might not be able to catch some errors like corrupted media files by itself (sometimes downloads fail partway through and they probably won't be re-downloaded).
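
a possible starting point for both items, untested against the live site. assumptions: thumbnail URLs contain a "/thumb/" path segment (check the links in a real thread first), and --reject-regex is used here in place of --accept-regex because excluding one pattern is simpler than listing everything to keep. archive_thread and find_empty_files are made-up names. the empty-file check only catches downloads that left a zero-byte file; a file truncated partway through will not show up.

```shell
# ASSUMPTION: thumbnail URLs contain a "/thumb/" path segment. Check a real
# thread's links and adjust this pattern before relying on it.
THUMB_REGEX='/thumb/'

# Same command as in the post, plus --reject-regex to skip thumbnail URLs.
archive_thread() {
    wget "$1" --no-clobber --recursive --level=1 --span-hosts \
        --domains=media.8kun.top --wait=0.3 --random-wait \
        --reject-regex="$THUMB_REGEX"
}

# List zero-byte files under a directory. These are likely failed downloads;
# truncated (non-empty) files are NOT detected by this check.
find_empty_files() {
    find "$1" -type f -size 0
}
```

usage would be `archive_thread https://8kun.top/comms/res/21322.html`, then `find_empty_files media.8kun.top`. delete whatever it lists and run the archive command again so --no-clobber lets those files be fetched again.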

Anonymous ID: 24d34a July 30, 2020, 11:04 p.m. No.21655

>>21652

that line break in the middle of "--level=1" is not supposed to be there.

fixed command here:

https://www2.qanonbin.com/paste/1H1Wur2Fv