>>666471
I haven't been back in this thread for a while. I'm running the qresearch import process to get up to date. One technique it needs is re-scanning already-imported threads for posts missed during the initial scans.
Threads are imported from the catalog.json file. At that point we know the thread number and how many posts it had at that moment. The only way we know a thread is closed is when its post count is >= the official "bake" count.
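A minimal sketch of pulling the thread number and post count out of catalog.json. It assumes the vichan/4chan-style layout (a list of pages, each with a "threads" list whose entries carry "no" and "replies"); those field names and the function name `parse_catalog` are assumptions, not taken from the import program itself.

```python
import json
from typing import List, Tuple


def parse_catalog(raw: str) -> List[Tuple[int, int]]:
    """Return (thread_no, reply_count) pairs from a catalog.json payload.

    ASSUMPTION: vichan/4chan-style layout, i.e. a list of pages, each holding
    a "threads" list whose entries carry "no" (thread number) and "replies".
    """
    pages = json.loads(raw)
    result = []
    for page in pages:
        for thread in page.get("threads", []):
            # A missing "replies" field is treated as zero replies so far.
            result.append((thread["no"], thread.get("replies", 0)))
    return result


# Hypothetical payload shaped like one catalog page.
sample = ('[{"page": 0, "threads": ['
          '{"no": 655000, "replies": 750}, {"no": 655900, "replies": 12}]}]')
print(parse_catalog(sample))  # [(655000, 750), (655900, 12)]
```

The count is only a snapshot from when the catalog was fetched, which is why the threads have to be checked again later.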
So my program keeps re-checking each thread until its post count is >= the bake count, then marks the thread as complete in the threads table. Later scans select only the open threads, so we avoid re-scanning every thread each time.
Several scans per thread are needed to collect every post and to deal with duplicate threads.
I use the 8chan post number, with a board prefix, as part of the primary key for the threads and posts tables:
8GA_1 is 8chan Great Awakening post 1
8QR_655000 is 8chan Qresearch post 655000
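A sketch of building those keys and using them so repeated scans don't store a post twice. The board-name-to-prefix map, the `posts` columns and the helper names are hypothetical; only the 8GA_/8QR_ key shape comes from the examples above.

```python
import sqlite3
from typing import Iterable, Tuple

# HYPOTHETICAL board-name -> prefix map, modelled on the 8GA_/8QR_ examples.
BOARD_PREFIX = {"greatawakening": "8GA", "qresearch": "8QR"}


def post_key(board: str, post_no: int) -> str:
    # Board prefix + 8chan post number, e.g. ("qresearch", 655000) -> "8QR_655000".
    return f"{BOARD_PREFIX[board]}_{post_no}"


def store_posts(conn: sqlite3.Connection, board: str, thread_no: int,
                posts: Iterable[Tuple[int, str]]) -> int:
    """Insert (post_no, body) pairs and return how many rows were new.

    INSERT OR IGNORE skips posts whose key is already stored, so repeated
    scans of the same thread do not create duplicates.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO posts (post_id, thread_id, body) VALUES (?, ?, ?)",
        [(post_key(board, no), post_key(board, thread_no), body) for no, body in posts],
    )
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (post_id TEXT PRIMARY KEY, thread_id TEXT, body TEXT)")
first = store_posts(conn, "qresearch", 655000, [(655000, "OP"), (655001, "reply")])
second = store_posts(conn, "qresearch", 655000, [(655001, "reply"), (655002, "late reply")])
print(first, second)  # 2 1
```

Because the key is built from the post number, the second scan adds only the post that was missed the first time.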
The big problem is finding threads older than the last 25 pages that catalog.json covers. As a result, I can't get anything from before I first wrote the import.