AutoArchiveAnon !!!OGY0YjcwNDFlOTMz ID: 6db19f Aug. 12, 2020, 3:09 p.m. No.22530   >>2532

>>22521

> I like the idea of the site trying media.8kun.top first and then other archives if that's not available any more.

Yes, that would be perfect.

 

If you keep a database record for every post and image, you could even do it within your catalog parser. You can figure out whether a video should be gone (as you surely know, when the same video is uploaded several times, the same media.8kun.top URL gets reused) and then try archive.org instead.

 

You could even start looking up Wayback Machine URLs at this point and save them for later. I showed how it's done properly. Also, wait around 2 seconds between lookups; the Wayback Machine seems to block IPs that request too much at once.
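
A rough sketch of the throttled lookup described above, in Python. The names `lookup_archived_urls` and `wayback_lookup` are made up for illustration. `wayback_lookup` assumes the public Wayback availability endpoint (`https://archive.org/wayback/available`) and is only one possible way to do the lookup, not necessarily the method I showed earlier. The 2 second default is the "around 2 seconds" from above, not an official limit.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def wayback_lookup(url: str) -> Optional[str]:
    """Ask the Wayback availability endpoint for the closest snapshot of url.

    Assumption: one possible lookup, used here only for illustration.
    Returns the snapshot URL, or None if no snapshot is reported.
    """
    query = urllib.parse.urlencode({"url": url})
    endpoint = f"https://archive.org/wayback/available?{query}"
    with urllib.request.urlopen(endpoint, timeout=30) as resp:
        data = json.load(resp)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup_archived_urls(
    urls: Iterable[str],
    lookup: Callable[[str], Optional[str]] = wayback_lookup,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Look up each URL, pausing between lookups, and keep what was found."""
    found: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # throttle so the archive does not block the IP
        archived = lookup(url)
        if archived is not None:
            found[url] = archived  # save it, e.g. write it to the database
    return found
```

Passing `lookup` and `sleep` in as parameters makes it possible to test the loop without touching the network or actually waiting.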

 

I had weird situations with the Wayback Machine. It seems stable now, but I don't really trust it. And of course we can never be sure whether someone will censor certain videos and remove them from the Wayback Machine. So going for media.8kun.top first absolutely makes sense.

 

Maybe you could even check whether any videos are missing and, if so, tell me.
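
Something like the following Python sketch would cover both the "media.8kun.top first, archives second" order and the missing-video check. Everything here is hypothetical: the names `resolve_all` and `http_head_ok`, the shape of the `videos` dict (video ID mapped to candidate URLs in order of preference), and the idea of checking availability with an HTTP HEAD request.

```python
import urllib.request
from typing import Callable, Dict, List, Tuple


def http_head_ok(url: str) -> bool:
    """Return True if a HEAD request for url succeeds (assumed check)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=30) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def resolve_all(
    videos: Dict[str, List[str]],
    is_available: Callable[[str], bool] = http_head_ok,
) -> Tuple[Dict[str, str], List[str]]:
    """Pick the first working URL per video and collect the missing ones.

    videos maps a video ID to its candidate URLs, with the
    media.8kun.top URL first and the archive URLs after it.
    """
    resolved: Dict[str, str] = {}
    missing: List[str] = []
    for video_id, candidates in videos.items():
        for url in candidates:
            if is_available(url):
                resolved[video_id] = url
                break
        else:
            # No candidate worked: report this video as missing.
            missing.append(video_id)
    return resolved, missing
```

The `missing` list is what would get reported back, so the gaps can be filled from another source.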

 

I finally fixed the Unicode decode issue ("\uXXXX"). It's not as easy as I originally thought, because you need to verify that you don't decode stuff like "\\uXXXX", which is actually the literal text "\uXXXX" and not an encoded Unicode character. Bad guys may use stuff like this sooner or later.
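
To illustrate the problem, here is a minimal Python sketch (not my actual fix). It assumes the rule is "count the backslashes in front of the u": an odd count means a real escape, an even count means the backslashes are escaped themselves and the text is literal. The name `decode_unicode_escapes` is made up. The sketch leaves escaped backslash pairs untouched and does not combine surrogate pairs.

```python
import re

# A run of backslashes followed by 'u' and exactly four hex digits.
_ESCAPE = re.compile(r"(\\+)u([0-9a-fA-F]{4})")


def decode_unicode_escapes(text: str) -> str:
    """Decode \\uXXXX escapes, but only when the backslash is not escaped."""

    def replace(match: "re.Match[str]") -> str:
        slashes, hex_digits = match.group(1), match.group(2)
        if len(slashes) % 2 == 0:
            # Even count: every backslash is part of an escaped pair,
            # so "uXXXX" is plain text and must not be decoded.
            return match.group(0)
        # Odd count: the last backslash starts a real escape; any
        # backslash pairs in front of it are kept as they are.
        return slashes[:-1] + chr(int(hex_digits, 16))

    return _ESCAPE.sub(replace, text)
```

For example, the raw text `caf\u00e9` decodes to `café`, while `\\u00e9` (two backslashes) is returned unchanged.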

 

I need to finish the notables parser now.

o7