Anonymous ID: 1df330 AUTOMAGIC ARCHIVING March 2, 2018, 6:25 p.m. No.535793   >>5808 >>5927 >>6007

I keep reading posts about archiving everything off-line. I'm trying to help!

I'm writing code to pull posts from the boards and place them into a real database with a web application.

 

I have some questions for Codemonkey that I need answered.

 

question 1

I'm pulling the board information with catalog.json, but this brings in only the most recent 25 pages of threads. How do I get the earlier threads?

question 2

Are image files duplicated for each post that uses them, or is one copy stored and merely referenced again? I'm trying to understand whether I need to call the server and download each image as a separate instance. The image identifiers are 65 bytes, but I don't know if they're copies of copies between posts.
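
Whatever the server does internally, the importer can avoid fetching the same file name twice. Below is a minimal sketch of that idea, not the code running on the site. It assumes each post is a dict carrying a numeric "no", a string "tim" (the stored file identifier), an "ext", and an optional "extra_files" list for additional attachments; those field names are assumptions about the JSON and should be checked against the real feed.

```python
def image_names(post):
    """Yield the stored file name of every attachment on one post.

    The keys "tim", "ext" and "extra_files" are assumed, not confirmed.
    """
    if "tim" in post:
        yield post["tim"] + post.get("ext", "")
    for extra in post.get("extra_files", []):
        yield extra["tim"] + extra.get("ext", "")


def unique_images(posts):
    """Map each distinct file name to the post numbers that reference it."""
    seen = {}
    for post in posts:
        for name in image_names(post):
            seen.setdefault(name, []).append(post["no"])
    return seen


if __name__ == "__main__":
    # Hypothetical sample data, shortened identifiers for readability.
    sample = [
        {"no": 1, "tim": "aaa", "ext": ".png"},
        {"no": 2, "tim": "aaa", "ext": ".png",
         "extra_files": [{"tim": "bbb", "ext": ".jpg"}]},
        {"no": 3},  # text-only post
    ]
    for name, post_numbers in unique_images(sample).items():
        print(name, "used by posts", post_numbers)
```

Each key of the result needs only one download. If two posts show the same name, that suggests (but does not prove) the server stores a single shared copy.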

 

The big issue is question 1 as we're now into the 600s for the threads just on this board!!!
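
For reference, this is roughly how the thread numbers can be pulled out of a parsed catalog.json and compared against what is already imported. It is a sketch under assumptions: the catalog is taken to be a list of pages, each with a "threads" list whose entries carry a numeric "no". It does not solve question 1; anything that has dropped off the catalog still needs another source.

```python
import json


def thread_numbers(catalog):
    """Return the sorted thread numbers listed in a parsed catalog.

    Assumed layout: [{"page": 0, "threads": [{"no": 123, ...}, ...]}, ...]
    """
    numbers = set()
    for page in catalog:
        for thread in page.get("threads", []):
            numbers.add(thread["no"])
    return sorted(numbers)


def not_yet_imported(catalog, imported):
    """Return catalog threads that are missing from the local database."""
    have = set(imported)
    return [n for n in thread_numbers(catalog) if n not in have]


if __name__ == "__main__":
    # Hypothetical catalog text standing in for the downloaded file.
    text = '[{"page": 0, "threads": [{"no": 30}, {"no": 10}]},' \
           ' {"page": 1, "threads": [{"no": 20}]}]'
    catalog = json.loads(text)
    print(thread_numbers(catalog))
    print(not_yet_imported(catalog, imported=[10]))
```

Running the comparison on every import keeps the rerun cheap, since only threads that are new to the database get fetched.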

 

I've imported the threads that I can get with the current catalog.json, and it looks beautiful for less than an hour's work.

 

NOTE that the present selection table shows only a subset of the thread data. All of it is captured in the database.

 

CM or BO can contact me at either:

n4hpg@comcast.net

n4hpg@protonmail.com

 

I'll work on importing actual non-Q posts tomorrow. Long day here.

 

You can try it out:

 

www.pavuk.com

username qanon

password qanon

 

No sense ME being anonymous…

Anonymous ID: 1df330 March 2, 2018, 6:32 p.m. No.535828

>>535808

All timestamps are in Zulu for easier comparison. I'm going to break out the message number so it will display better.
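
A small sketch of the Zulu conversion, assuming each post carries its time as Unix epoch seconds (an assumption about the JSON, not something confirmed here). The function name is made up for illustration.

```python
from datetime import datetime, timezone


def to_zulu(epoch_seconds):
    """Format Unix epoch seconds as an ISO 8601 UTC ("Zulu") string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(to_zulu(0))  # 1970-01-01T00:00:00Z
```

Strings in this form sort in time order as plain text, which is what makes side-by-side comparison easy.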

 

At 65 bytes, the image names are 2 bytes over my DBMS's default primary-key limit of 63. No worries.
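
One common way around a key-length limit is a surrogate integer key with the long name held in a unique column. The sketch below uses Python's sqlite3 purely as a stand-in, since the actual DBMS is not named here; the table and column names are hypothetical.

```python
import sqlite3


def make_image_table(conn):
    """Create an image table keyed by a short integer, not the long name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE)"
    )


def image_id(conn, name):
    """Return the id for a file name, inserting the row on first sight."""
    conn.execute("INSERT OR IGNORE INTO image (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT id FROM image WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_image_table(conn)
    long_name = "ab" * 30 + ".jpeg"  # 65 characters, like the real names
    print(len(long_name), image_id(conn, long_name))
```

Posts can then reference the integer id, and the unique constraint keeps a reused image from being stored twice.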

 

535k messages is not a big issue either. The image storage has me a bit concerned, but we'll see how it goes tomorrow.

Anonymous ID: 1df330 March 2, 2018, 6:51 p.m. No.535983

>>535927

You can search with an empty form. Also, I was rerunning the import.

 

We're at alpha-level <grin>

 

More tomorrow. Most performance issues will be related only to large pulls into data grids.