A place for codefags to make the chans searchable.
ctrl-f as in fagg0t like 0p
Posts from #608
>>493751
>>494228
>>494299
>>494080
>>494202
>>494015
>>493888
>>493884
>>493882
>>493881
>>493886
>>493919
>>493939
>>493854
>>494489
>>494264
>>494457
>>494503
>>494460
>>494405
>>494451
>>494528
>>494471
>>493877
>>493898
>>493929
>>494283
>>494184
>>494548
One further comment from a heavy database user for what it's worth:
If we had a list of 'tags' that anons could enter as they post (in a specific format, e.g. preceded by **) covering topics that emerge (such as 'mkultra', 'bridge', etc., related to topics brought up by Q), they would serve as a way to link crumbs by subject when searching through the data, and as an additional variable/filter they would streamline any search.
These would have to be moderated by BV/BO/Baker; it would not be any more work than creating the notable posts per bread, although it would be useful to find a way to insert them after the post identifier to create the link (i.e. >>xxxxxx within/2 **xxxxxx = TRUE).
Historical posts would be an issue, but they could be handled if there were some way of batch-adding tags at the data-assimilation stage based on linked crumbs, plus specific 'meta-moderators' tagging as we run searches, etc.
However it might work, the principle is an easily assignable value identifying a crumb's subject based on Q's topics, so more information can be retrieved via a regular search.
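For illustration, a rough Python sketch of how such **tags could be pulled out of post text as it is assimilated (the ** convention and the tag names are only the proposal above, not an agreed format):

import re

# Proposed convention from the post above: tags are prefixed with ** inside
# the post body, e.g. "**mkultra" or "**bridge". This just collects them.
TAG_PATTERN = re.compile(r"\*\*([a-z0-9_-]+)", re.IGNORECASE)

def extract_tags(post_text):
    """Return the set of **tags found in a post body."""
    return {tag.lower() for tag in TAG_PATTERN.findall(post_text)}

if __name__ == "__main__":
    sample = ">>493881 digging on the **bridge crumb, possible **mkultra tie-in"
    print(extract_tags(sample))  # {'bridge', 'mkultra'}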
>One further comment
Well, you kind of lost me pretty quickly. Correct me if I'm wrong, but what you're suggesting is for posts going forward, and posts that are Q-centric.
My goal is to see ALL of the board made searchable, because much of the digging and research that was collected was not just related to items Q had in mind; many ancillary topics and pieces of evidence discovered would help build the "parallel construct".
That's what I see as important, your thoughts?
Might I suggest using SQLite as the DB for the "file format"? It's a single-file database that performs well for read-heavy workloads, is easy to distribute, is easily usable from PHP and just about any other programming language, and could easily be used to load a regular server-based DB (obviously depending on how the schema is designed). It's also multi-platform, so it should keep everybody happy irrespective of what OS you use.
I forgot to mention, SQLite also supports full-text indices via the FTS4 virtual table type.
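Not anyone's actual implementation, just a minimal sketch of the FTS4 idea using Python's bundled sqlite3 (assumes the bundled SQLite was compiled with FTS4, which is the common case; table and column names are made up):

import sqlite3

# Build a small full-text index over post bodies and query it with MATCH.
conn = sqlite3.connect("qarchive.db")
conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS posts_fts USING fts4(board, no, body)")
conn.executemany(
    "INSERT INTO posts_fts (board, no, body) VALUES (?, ?, ?)",
    [
        ("qresearch", "544985", "Example post about Loop Capital"),
        ("qresearch", "548166", "Another post, unrelated digging"),
    ],
)
conn.commit()

# MATCH runs the full-text query against the indexed body column.
for board, no, body in conn.execute(
    "SELECT board, no, body FROM posts_fts WHERE body MATCH ?", ("loop",)
):
    print(board, no, body)

Because the whole thing lives in one .db file, distributing the index is just copying the file, which is the appeal mentioned above.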
I made a thread a few minutes ago asking if a wiki could be a good format to organize findings? Could help with navigating. What do you think?
>>497784
I've been working on exactly this. I'm pulling the catalog from ga & qresearch. Finding the research general threads and saving those with q posts. Only goes back to about 2/15 when I turned the machine on. Currently working on getting old posts reconstructed. 99% sure I can grab all breads from 8ch.
A C# DLL to scrape Q posts and threads from 8ch. 8ch JSON format, but it could be serialized XML I guess.
One of the anons from the other thread.
I'm not going to jump in too much if others are doing something where we end up stepping on Pepe's toes. Couple of thoughts though…
-
Full text searches/indexes can be garbage. Only good for reserved words
-
Most likely want this in a relational database. Creating the schema would consist of a really simple data model. Not even sure I would worry about normalizing it.
-
Messages (body) could be stored in a blob and be searched with wildcards.
-
Only looking at about 10-15 different queries tops. All simple SQL statements except for a couple that would need to be hierarchical… but still easy.
-
I was thinking to use MySQL or SQL Server for the DB Engine.
-
Biggest challenge will be the parsing of the threads and crumbs into a loaded format for the database. Once in a useful format… loading will be easy.
I see three main parts to this:
1) Getting the data so it can be loaded into a database.
2) Creating the database structures (really should be first)
3) Spitting out the queries, views, and sprocs that will be used. And putting a front end on it.
-
almost doxxed myself and put a link to my web site… so close :-)
>C# dll to scrape q posts and threads from 8ch. 8ch+ json format but could be serialized XML I guess
Good call me thinks
I think that is a great thought. May be a good idea to just get one set started and loaded then look into the other boards.
We (at least I) can't see a way to search the 'board' itself, other than to create a copy of the data in the threads and make that searchable.
>Download Chan.
>Host JSON of posts.
>Build simple interface.
>Use nginx as reverse proxy.
>??????????
Profit
Why the fuck do you want a DB when it's already JSON. FFS.
Open Source, Cross Platform search engine library - xapian.org
github .com/mcmontero/php-xapian JSON support and web-friendly middleware
A better way to do this is probably to put everything client side. Make a cross platform application that just fetches new posts every so often. The browser is pretty perfect for this if we can set up a cross platform local server to host a local copy of qcodefag and this board.
Search
https:// github.com/bvaughn/js-search
Pros: Fast enough once index is built.
Cons: Have to build index, or send it from a server, ipfs, blockchain, whatever.
UI
rip it from qcodefag for q posts
Add 8ch layout to some button on qcodefag or some tab
Display the posts as normal, but add search bar for board side of new client for qcodefag and this board.
Pic related, it's easy to get .json formatted threads.
inb4 we all pwn ourselves.
Conveniently this also alleviates the clown issue should that garbage bill pass. I mean not really since we'll still own ourselves but fuck, we can try.
>>502660
>>>502453
>>Research Threads Ideas. Please claim or create yours, let us know of more subject ideas
>>Quest for Research Searchability Thread
>>>494745 (You) (You)
>Thanks for including my thread. I'm not a coder so I'm not much more than a cheerleader. I am quite sincere in my belief that we have to make it all searchable. I'm not naive enough to expect a volunteer to tackle it. Without doxxing themselves, can any anon point me to a service or company that could accomplish this Quest?
A pleasure anon. Here's wishing you all, all the very best in this noble quest. It would be Christmas for us all if you did it. GODSPEED.
The omega interface for xapian could do most of the work. wget to grab site data, a json->csv converter to translate, and you're ready to go. All free and open source software. Not quite plug and play, but a start.
Sample usage described:
xapian.org /docs/omega/overview.html
linode .com has very affordable linux shell hosting.
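A hedged sketch of the json->csv step mentioned above for feeding omega/xapian, assuming the standard thread JSON layout quoted later in this thread (a top-level posts array with no/time/name/trip/com fields):

import csv
import json
import sys

# Rough json->csv translator for one thread file, e.g. 651280.json.
# Adjust the field list if the endpoint layout differs.
def thread_json_to_csv(json_path, csv_path):
    with open(json_path, encoding="utf-8") as f:
        thread = json.load(f)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["no", "time", "name", "trip", "com"])
        for post in thread.get("posts", []):
            writer.writerow([post.get("no"), post.get("time"), post.get("name", ""),
                             post.get("trip", ""), post.get("com", "")])

if __name__ == "__main__":
    thread_json_to_csv(sys.argv[1], sys.argv[2])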
Hey just had a thought but couldn't /ourguys/ look at all bullets ( like they have to on a crime scene )?
Wouldn't "LIPPEL" the one who had been "grazed" be able to connect bullet to her dna with whatever DNA would be on her?
What about the other student who was walking after being shot in both legs by 4 rounds?
Where is the DNA for that match to bullet?
What about the dead coach, the HERO we seen at the funeral? DNA match to that?
All this stuff might not help us ATM but IMO,
would play a big handle in the game out there with Q and friends?
https:// www.youtube.com/watch?v=cPvYxTa1ph4
https:// www.youtube.com/watch?v=cPvYxTa1ph4
https:// www.youtube.com/watch?v=cPvYxTa1ph4
LIPPEL
Another thing, this video she talks about how "BREAKING THE GLASS WITH SHOTS" starting at @ 2:05.and then she says they arrived..
MAYBE AN HOUR AFTER
She then states at the end of the video then she states the "Swat team/Police" was on the ground, she aid they were banging on the doors to let them in, she "DIDN'T TRUST IT WAS THEM, BECAUSE THE POLICE WERE BANGING ON THE DOORS - NOBODY GOT UP"
==IF THE SHOOTER DRESSED IN FULL METAL GARB SHOT OUT HER WINDOW, SHE WOULD OF SEEN IT BEING POLICE, AND THEY WOULD OF SEEN HER.. AND THEN PROBABLY OPENED THE DOOR THRU THE BROKEN GLASS INSTEAD OF BANGING ON THE DOOR WOULDN'T THEY ?"
Whole story right here in the video proves it was either a False Flag or some type of fuckery
It's not really a company, but wouldn't the person running the 4plebs archive be a good place to look for tools/code in this quest? Maybe he'd even be willing to assist? The site uses some fairly powerful search tools for certain halfchan boards already. I'm not a codefag so I apologize if this hasn't been suggested already.
https:// archive.4plebs.org/_/articles/faq/
That's a good suggestion. Do you know off the top of your head how many archive sites have been used at 4ch and 8ch? I know about archive.is and 4plebs, but I've seen a lot more. I'm pretty sure the threads are scattered about the internet.
What about a bulletin board type of system like vbulletin for example? built in search and different forums and sub forums for topics.
>>4144
> excised from threads and posted in these subs?
Sorry, didn't finish my thought, and they might not be captured in a search of posts in Qresearch? Not sure why you posted these.
>>4274
In naval warfare, a "false flag" refers to an attack where a vessel flies a flag other than their true battle flag before engaging their enemy.
It is a trick, designed to deceive the enemy about the true nature and origin of an attack.
In the democratic era, where governments require at least a plausible pretext before sending their nation to war, it has been adapted as a psychological warfare tactic to deceive a government's own population into believing that an enemy nation has attacked them.
In the 1780s, Swedish King Gustav III was looking for a way to unite an increasingly divided nation and raise his own falling political fortunes.
Deciding that a war with Russia would be a sufficient distraction but lacking the political authority to send the nation to war unilaterally, he arranged for the head tailor of the Swedish Opera House to sew some Russian military uniforms.
Swedish troops were then dressed in the uniforms and sent to attack Sweden's own Finnish border post along the Russian border. The citizens in Stockholm, believing it to be a genuine Russian attack, were suitably outraged, and the Swedish-Russian War of 1788-1790 began.
In 1931, Japan was looking for a pretext to invade Manchuria. On September 18th of that year, a Lieutenant in the Imperial Japanese Army detonated a small amount of TNT along a Japanese-owned railway in the Manchurian city of Mukden.
The act was blamed on Chinese dissidents and used to justify the occupation of Manchuria just six months later. When the deception was later exposed, Japan was diplomatically shunned and forced to withdraw from the League of Nations.
In 1939 Heinrich Himmler masterminded a plan to convince the public that Germany was the victim of Polish aggression in order to justify the invasion of Poland.
It culminated in an attack on Sender Gleiwitz, a German radio station near the Polish border, by Polish prisoners who were dressed up in Polish military uniforms, shot dead, and left at the station.
The Germans then broadcast an anti-German message in Polish from the station, pretended that it had come from a Polish military unit that had attacked Sender Gleiwitz, and presented the dead bodies as evidence of the attack. Hitler invaded Poland immediately thereafter, starting World War II.
http:// www.bibliotecapleyades.net/sociopolitica/sociopol_falseflag29.htm
For hundreds of links to FF research/reports, use this link below. You are welcome Anons..
http:// www.bibliotecapleyades.net/sociopolitica/sociopol_falseflag.htm
>person running the 4plebs archive be a good place to look for tools/code in this quest?
For the archives 4plebs uses sphinx search (http:// sphinxsearch.com/). It's used to index from the database and display search results very quickly.
Easy to implement but I would say it's worth it only if you have a lot of data to search through. For smaller datasets you can use full text search included in a regular database engine.
Also you can take a look at other search engines like Solr (http:// lucene.apache.org/solr/) and elasticsearch (https:// www.elastic.co/)
been using duckduck for searches
cryptocert keys moded on puter… should i reboot or undo?
I also would Second the Idea of using Sphinx - it can be connected to a currently live database and given clues and sample queries to Index all text in the DB - https://
www.percona.com/resources/technical-presentations/how-optimally-configure-sphinx-search-mysql-percona-live-mysql and they have a video. I don't think there are any existent Docker setups to play with, although I imagine 8ch is quite custom anyway.
OK, so I think I've got my chanscraper console app working as designed.
AFAIK, I've got all the QPosts in a single JSON, and I've got complete breads starting with Bread #364 (2018-02-07). That's as far back as I've been able to reach programmatically. Each complete bread has also been filtered into another JSON file containing just Q's posts.
The complete breads have only come from 8ch. The chanscraper is set up so it could scrape 4ch as well - assuming the JSON is still available.
I'm showing 825 QPosts - 1 more than qCodeFag because I believe I have a deleted one. All counted, it's 210 threads.
I've done all the hard work of setting up the old catalog/threads/posts. It's set up so you can specify how far back to refresh (to cut down on unnecessary HTTP GETs). It reads in the existing data, finds the new threads to search for on 8ch/greatawakening and 8ch/qresearch, and then archives the threads/posts that Q has made locally.
If anybody wants the full Q archive as I have it now, here it is: 6mb https:// anonfile.com/H6B7G7dcbc/QJsonArchive.zip
I'm going to integrate the DJTweets + minute Deltas in this week.
Once I get this all cleaned up I'll cut it loose on Github if there are any C#codeFags interested.
My idea is to set up a simple HTML page using some javascript that can be run locally on a single users machine or website. Since the scraper is a C# dll it could be set up to run as a timed service on a web server to keep a site up to date.
Code at github.com/anonsw/qtmerge does some similar things. Check it out, maybe there are some useful ideas to lift from there: anonsw.github.io
Yeah I knew about that - but I'd already been getting data from QCodeFag. The QCodeFag data was the basis for what I have now since it had already done the scraping on 4ch. I wanted my own C# source going forward that I can use locally with my other C# code.
I don't know why nobody cares, but it's trivial to download threads, posts, and boards through the 8ch api in the form of JSON. There is no reason not to have the local client make the get request every so often.
Yep. That's why I did it. Getting all the JSON is easy once you know where everything is - but stuff sliding off the catalog was what made me want to keep a local archive.
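For anons following along, a minimal sketch of that grab-it-before-it-slides-off step (Python rather than the anon's C#; the thread number is only an example):

import json
import urllib.request

BOARD = "qresearch"
THREAD_NO = 651280  # illustrative; use whatever is currently in the catalog

def archive_thread(board, thread_no):
    """Fetch one thread's JSON from the 8ch endpoint and keep a local copy."""
    url = f"https://8ch.net/{board}/res/{thread_no}.json"
    with urllib.request.urlopen(url) as resp:
        thread = json.load(resp)
    with open(f"{thread_no}_archive.json", "w", encoding="utf-8") as f:
        json.dump(thread, f)
    return len(thread.get("posts", []))

if __name__ == "__main__":
    print(f"saved {archive_thread(BOARD, THREAD_NO)} posts")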
I meant the hypothetical client with which people are searching this board and staying updated. That client should search for posts all on its own instead of relying on a single source of truth. (saves infrastructure money too)
Precisely.
Once I get it finished I'll provide a single HTML page that is like QCodeFag. View on your desktop.
Run the chanscraper then view the HTML to see new posts
Cool, check out qanonmap too for posts no longer retrievable. I think they have some that qcodefag doesn't have.
github.com/qanonmap
qanonmap.github.io
not sure if thestoryofq.com is related
But they are qcodefag forks.
Yep, but I think new ones just haven't been added yet to qcodefag.
Hmm.. That doesn't help me - I've got those. I'm only showing 825
Ctrl-f is only good on a single thread. What researchers really need is a way to access the entire set of Q posts. I've built that capability for myself locally by parsing ctrl-s saves of the threads into a MySQL database and running SQL searches on that.
The best bet for a public search engine might be to cooperate with CodeMonkey to build a search capability for the boards. We'd still have to search each board separately, but at least we would be able to search all of a board's threads at once.
I've got most of the Q related posts from 4chan and 8ch locally, but I'm not sure how to make that much data publicly available. I've also got a fair amount of PHP code that I use to access and organize the raw data. I'd be willing to share it if I had a place to do it.
>>493751
Actually, I have had chan posts show up in browser search engine results, but I know this isn't what you're after. I've built the type of search capability you're after on my local machine. It still takes a lot of time to work with the posts, but it's definitely easier than anything we can do at the original sources.
>>494228
Timeline is easily generated when one has the ability to set the post time to something other than the current time. That's how I create timeline posts in my own database.
I definitely appreciate that notable posts are included in the breads on each thread. It isn't necessary for them to be updated on each and every thread, but it is good to have them updated at least every day. Right now, I'm using the links in the bread posts to mark posts in my private database as being included in the bread. Given the volume of posts that I am now working with, these links make it easier to determine what is important to include.
>>494503
I use PHP because it's free. shrug
>>494471
If you're lucky, you can find your archives on archive.org. That site saves pages with nearly the same HTML elements as the original page. Archive.is converts the classes used on the original page into their style equivalents, making for a parsing nightmare. When I've had to use the archive.is version of a page, it was a painstaking process to recreate the single post that I went to the archive to get. My parser code can parse the archive.org archives the same as the original, so it's easy to get all posts from that archive.
I've already done this. I'm willing to share my data structures and parsers, if I have a place to do it.
I've got tagging fields included in my data structure. Getting them filled is an entirely different matter. I've got a tool to help do it more efficiently than phpMyAdmin, but it needs a bit of work to make it just a bit more efficient so that more than one post can be updated in one pass.
The challenge is classifying the posts to determine which sub forum to direct them to. Not trivial.
There are over 750,000 total posts from both sites and all boards containing Q related posts. It's a large data set now.
Why not just build a 4chan archive site? That's the main thing lacking from 8ch.
Literally just build an index of tags and use fucking client side javascript. Muh databases. Jesus Christ people. You could even let users share tags.
First one with a completed project wins. Peace.
https:// 8ch.net/qresearch/archive/index.html
Here's the archive again + a handy HTML page that you can use in your browser to view the archives locally. Works fine in Chrome and IE. Readme included.
https:// anonfile.com/W3f5H6d8be/QJSONArchive.zip
OK, so why not do the fashionable, continuous integration FOSS thing and add searching to the archive site at the repo?
I expect because 8ch is not a massive corporation with a bunch of resources at their disposal. /sudo/
What difference does that make? Anons are gathered here. Why don't they just go there to assist in development instead of fragmenting and branching out to 1000 directions? Consolidate, integrate, then diverge.
>links
If it's server-based, something like http:// arborjs.org/ for data visualization/selection would then fix the mapping problem and help a lot with the search problem.
>links
There's also the Open Visual Thesaurus project (www.chuongduong.net/thinkmap/) to maybe grab code/ideas from, for viewing the data you've searched and whatever else might be related, as a way to walk through the data.
Here's a newer local archive that moves in that direction.
I've put in some UI enhancements to the JSON Viewer HTML page. Seems to be working well. With a slight mod it could work with local JSON from any QCodeFag site or even direct from 8ch.
https:// anonfile.com/5ercH3d9ba/QJSONArchive_v1.zip
Getting the posts into 2 columns should be no problem. It's getting a reliable news source that is gonna cause you trouble.
I was planning on putting 3 columns in the viewer: QPosts, Times, DJTweets. In doing all this I've discovered a few things about 8ch/halfchan. The post IDs are not guaranteed unique. The best unique key is time, and even then I've found 2 posts that dropped at the same timestamp. Thematically I've been trying to key everything to time. [qposts, tweets, news]
Jump in.
yEd can produce maps from spreadsheet data. That's one I know of.
https:// www.yworks.com/products/yed
Maybe when I get further along in the post tagging work, it'll be useful.
I'm toying with the idea of making my raw data available in some way, possibly in read only format. (Clowns can be destructive.)
I would like to be able to allow others to tag posts in my database. Any ideas on how to keep clowns from shitting everything up?
My initial thought is to allow suggesting of tags (similar to comment logic in the blog) with moderators making final decisions on them.
One of the big reasons I hesitate in making the entire database available is because a few of the images uploaded into the threads are obscene. I have no desire to inadvertently publish that sort of thing. When I'm publishing a reviewed subset, the chances of that happening are low.
Perhaps?? just a guess.
Half Past Human .com
Absolutely the capability!
Discretion and interests match? Dunno.
Is there an interest in pre-selecting data?
For example, select only posts identified on "notable posts" lists from each general #.
Plus, of course, any to-from links on those selected, chained.
Just asking. DB size, usability, etc.
Or is the data set also for researching shill/troll themes? It is a possibility, so I ask.
I'm working on that right now. I got started on this a week or so ago. I wrote a bit of code to travel back through context links, too. Hopefully, in a few days, I'll be able to repost my blog with the results of this work.
A bit more to say about that:
It's my plan to include items that reach back to a Q post together with that Q post when I can identify such. I may do a little pruning to keep the length of the entry associated with a Q post under control. Not everything in a context thread is important, after all. I may have to think about further arranging of things. I'll think more about that as I get closer to a point where I can implement such a strategy.
>There are over 750,000 total posts from both sites and all boards containing Q related posts.
Yes, and that's the challenge: making the Q "related" posts searchable. Making Q's posts searchable is arguably not as important as making the body of related posts searchable, as that's where the body of knowledge resides.
"You have more than you know" taunts us with its promise. We get pointed to Loop Capital, or Stanislav Lunev. We need to be able to search/aggregate all of the posts over weeks/months with a single search. The dedicated research threads are great as far as they go but we're missing a lot of other info posted as snippets.
>few of the images uploaded into the threads are obscene.
That does complicate it, but a lot of the information in the Q "related" posts is graphic. It seems culling of obscene content would need to be done manually to avoid throwing the baby out with the bathwater.
Good catch. I found some in my db as well.
I like the post headers in the UI. Nice and clean.
Yeah, qanonmap has had all of those for over a week now…
What is everybody using as their sources for drops? 8ch? One of the QCode forks? Something else?
How do we verify that our collections are the same?
I've been adding a Guid for each post I scrape, just to give them all a unique value.
qtmerge uses the raw JSON/HTML data where relevant from 8ch, 4plebs and trumptwitterarchive as its source data. It also merges in the JSON from qcodefag/qanonmap. It currently uses the host, board, post timestamp and post number to sync.
I like the idea of matching the GUIDs along with a post hash using some method we agree on.
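One possible hash scheme, sketched in Python (the field choice is a suggestion, not an agreed method): hash the fields both collections should hold verbatim, then compare digests post-for-post.

import hashlib
import json

def post_hash(board, thread_id, no, time, text):
    """Suggested (not agreed) post hash: sha256 over a canonical JSON of shared fields."""
    canonical = json.dumps(
        {"board": board, "thread": thread_id, "no": no, "time": time, "text": text},
        sort_keys=True, ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

if __name__ == "__main__":
    # Values taken from the cbts sample shown later in this thread.
    print(post_hash("cbts", 157461, 158078, 1514060541,
                    "SEARCH crumbs: [#2]\nWho is #2?\nNo deals.\nQ\n"))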
Oh shit. Qtmerge is scraping HTML pages? You are dedicated. I sourced stuff from qcodefag that I couldn't get json for.
Do you have the full bread sources?
Phonefag right now.
There's an md5 field as you know in the 8ch json, but it wasn't in the data I got from QCodeFag, because he'd modified the .com to strip HTML into a .text field.
My chanscraper keeps the md5 and the .com and strips HTML into .text.
Any C#fags here?
I did set up a GitHub yesterday and push the chanscraper out. Gonna get the Twitter stuff mashed in the next few days.
Just ran my chanscraper again since apparently there were new posts last night as I was jacking around with Github.
I checked my posts with what's on qresearch and I think I'm good. Showing 839 total now.
New Q posts from 828 - 839.
I found a bug in the ChanScraper code too. A thing I've been working on that I forgot to remove. I'll push it out too and then link the GitHub.
Here's the link to my new GitHub
https:// github.com/QCodeFagNet/SFW.ChanScraper
If you are going to run the ChanScraper and then view the posts locally, when you open the QJSONViewer.html page, don't open the [json_allQPosts.json] file, open the newly generated [bin\json_allQPosts.json] file.
The machine needed me to include all the existing posts/work json. It's kind of clunky the way I'm doing it because I want to keep this updated with the latest posts/work json. But for a normal user everything is kept updated automagically in the bin\json folders. The project is set up to copy new files if newer - so everything should be kept in sync.
If you are planning on running this locally you'll need the .NET framework 4.5 at least. Probably better to go with 4.5.2
https:// www.microsoft.com/net/download/dotnet-framework-runtime/net452
You'll need Visual Studio free (at least) to build it unless you are a commandline master.
https:// www.visualstudio.com/vs/visual-studio-express/
Only HTML of archive pages.
Does your scraper work on the archive.is versions? These are the most complete most of the time since that is where so many of the pages were almost immediately saved by anons.
Tedious Dayum. Think you could convert your full bread scrape into some json?
Gotta link to one of the JSON files?
Here's a mini local JSON viewer as an HTML page + allQPosts.json. @225KB
Includes all QPosts up to 2018-03-04T11:29:14
https:// anonfile.com/06HeJbdeb6/Mini_Local_JSONViewer.zip
I was just thinking that what we really need, to start off with is a single schema that we can all agree on. It will go a far way in interoperability.
I'm going to run some tests on my local QCodeFag install and see if it will work off of the ChanScraper _allQPosts.json file. I think it should.
The JSONViewer could work with straight files from 8ch or 4ch with a single minor change I forgot to put in.
The ChanScraper includes the full JSON archive as of this morning. I haven't needed to go back to any archive.is HTML archives because I've been collecting breads locally since the beginning of Feb. All the Q Posts before that I sourced from the QCodeFag forks.
Here's what the JSON schema I'm working with looks like.
[
{
"source": "qresearch",
"threadId": 544266,
"link": "https:// 8ch.net/qresearch/res/544266.html#544985",
"imageLinks": [
{
"url": "https:// media.8ch.net/file_store/ffd6128f5949e4d4f6f3480236a63be002ffc5e59c0a31714360624d8ce45170.jpeg"
},
{
"url": "https:// media.8ch.net/file_store/ffd6128f5949e4d4f6f3480236a63be002ffc5e59c0a31714360624d8ce45170.jpeg/B42CA278-6C32-4618-A856-0CB9B680CC38.jpeg"
}
],
"references": [
{
"source": "qresearch",
"threadId": 0,
"link": "https:// 8ch.net/qresearch/res/0.html#548166",
"imageLinks": [],
"references": [],
"no": 548166,
"uniqueId": "19294a1b-8cae-435d-9503-8eb70c573d6b",
"_unixEpoch": "1970-01-01T00:00:00Z",
"text": "\r\r>>548157\r\rAlso not a real Q post\r\rQ",
"postDate": "2018-03-04T11:19:47",
"time": 1520180387,
"tn_h": 0,
"tn_w": 0,
"h": 0,
"w": 0,
"tim": null,
"fsize": 0,
"filename": null,
"ext": null,
"md5": null,
"last_modified": 1520180387,
"sub": null,
"com": "<p class=\"body-line ltr \"><a onclick=\"highlightReply('548157', event);\" href=\"/qresearch/res/547414.html#548157\">>>548157</a></p><p class=\"body-line ltr \">Also not a real Q post</p><p class=\"body-line ltr \">Q</p>",
"name": "Q ",
"trip": "!UW.yye1fxo",
"replies": 0
}
],
"no": 544985,
"uniqueId": "35c759aa-4998-4009-83a7-2af1b3273f28",
"_unixEpoch": "1970-01-01T00:00:00Z",
"text": "\r\r>>548166\r\rNOT A REAL Q POST\r\rQ",
"postDate": "2018-03-04T00:17:27",
"time": 1520140647,
"tn_h": 237,
"tn_w": 255,
"h": 1114,
"w": 1200,
"tim": "ffd6128f5949e4d4f6f3480236a63be002ffc5e59c0a31714360624d8ce45170",
"fsize": 271479,
"filename": "B42CA278-6C32-4618-A856-0CB9B680CC38",
"ext": ".jpeg",
"md5": "CbsCGk0pVEahunzSuV4LKw==",
"last_modified": 1520140647,
"sub": null,
"com": "<p class=\"body-line ltr \"><a onclick=\"highlightReply('548166', event);\" href=\"/qresearch/res/547414.html#548166\">>>548166</a></p><p class=\"body-line ltr \">NOT A REAL Q POST.</p><p class=\"body-line ltr \">Q</p>",
"name": "Q ",
"trip": "!UW.yye1fxo",
"replies": 0
}
]
Let me clarify: HTML for just the archive pages (to capture threads not in catalog/threads.json). JSON for everything else.
I'm working on how to share it, currently unoptimized and around 6 GiB of data uncompressed.
http:// archive.is/https:// 8ch.net/cbts/res/*
It doesn't look like archive.is does JSON. Your parser doesn't do HTML?
Yeah I've dug thru all the html looking for a reference to a json file. Can't find a reference to one either. My guess is, that once it drops off the main thread catalog, the JSON is no longer available. Too bad because that's the meat in a simple format.
No the machine is more of a scraper (grab data and save it) than a parser. It does parse the HTML out of the .com field into .text like QCodeFag does though. It's not designed to read thru html pages to look for posts.
It has a local baseline archive of everything. It reads in that entire local archive and then figures out the json breads it needs to download from the 8ch/qresearch/catalog.json. Then it downloads all those new breads and resets itself so you don't download everything every time - only the breads from the past [x] days.
You've got a database? I assume that's with all the images as blobs?
Here's an updated mini local JSON viewer as an HTML page + allQPosts.json. @225KB
I updated it so it works with the raw json from 8ch.
https:// 8ch.net/qresearch/res/553655.json
Could probably use an [ascending/descending] button but…
Includes all QPosts up to 2018-03-04T11:29:14
https:// anonfile.com/z4U1Jdd9b9/Mini_Local_JSONViewer.zip
If folks don't like a zip, it's only 2 files they can download the HTML file (ChanScraper) and the allQPosts.json (Console\bin) file on github https:// github.com/QCodeFagNet/SFW.ChanScraper
My images are kept as separate files in original form. Only the links are kept in the database. Here's the record definition for MySQL:
CREATE TABLE `chan_posts` (
  `post_key` varchar(31) NOT NULL COMMENT 'site/board#post (post is set to length 9 with . fill.',
  `thread_key` varchar(31) NOT NULL COMMENT 'site/board#thread (thread is set to length 9 with . fill.',
  `post_site` varchar(19) NOT NULL COMMENT 'For editor post, use editor. For spreadsheet, use sheet.',
  `post_board` varchar(15) NOT NULL COMMENT 'For editor post, use editor. For spreadsheet, use sheet.',
  `post_thread_id` int(10) UNSIGNED NOT NULL COMMENT 'For editor post, use 1. For spreadsheet, use row.',
  `post_id` int(10) UNSIGNED NOT NULL COMMENT 'For editor post, use next available. For spreadsheet, use column converted to number.',
  `ghost` int(10) UNSIGNED DEFAULT NULL,
  `post_url` text,
  `local_thread_file` text,
  `post_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `post_title` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `post_thread_title` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `post_text` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `prev_post_key` varchar(31) DEFAULT NULL,
  `next_post_key` varchar(31) DEFAULT NULL,
  `wp_post_id` int(11) UNSIGNED DEFAULT NULL,
  `post_type` set('editor','q-post','anon','approved','high','mid','low','irrelevant','timeline') NOT NULL DEFAULT 'anon',
  `flag_use_in_blog` tinyint(1) NOT NULL DEFAULT '0',
  `flag_included_on_maps` tinyint(1) NOT NULL DEFAULT '0',
  `flag_included_in_bread` tinyint(1) DEFAULT NULL,
  `flag_bread_post` tinyint(1) DEFAULT NULL,
  `flag_relevant_img` tinyint(1) DEFAULT NULL,
  `flag_relevant_post` tinyint(1) DEFAULT NULL,
  `author_name` text,
  `author_trip` text,
  `author_hash` text,
  `author_type` smallint(6) DEFAULT NULL,
  `img_files` json DEFAULT NULL,
  `link_list` json DEFAULT NULL,
  `video_list` json DEFAULT NULL,
  `editor_notes` text,
  `tags` text,
  `people` text,
  `places` text,
  `organizations` text,
  `signatures` text,
  `event_date` datetime DEFAULT NULL,
  `report_date` datetime DEFAULT NULL,
  `timeline_title` tinytext
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE `chan_posts`
  ADD PRIMARY KEY (`post_key`),
  ADD KEY `post_id` (`post_id`),
  ADD KEY `thread_key` (`thread_key`),
  ADD KEY `site_board` (`post_site`, `post_board`);
I'm considering making the database publicly available. I need to figure out how much space it will take up and whether it will fit within my current hosting plan. At present, I have over 880,000 posts in the database. The size of the database file for just this table without the images is 1.1GB. There's another GB for images of Q posts, but this is only the fraction that is Q posts, bread posts, and for the context posts related to these.
I guess I should start uploading. I've got the unlimited plan. Anyone want to write the search feature for it? Preferred language is PHP.
For now it just uses a dedicated file system.
With images gathered so far this mirror's total size is 193 GiB.
holey phuck. 193 GB. That's for a full archive of all breads + images? My local scrape of Q breads and posts as text only comes in at 6mb. My local QCodeFag install with text + Q images is just under 100mb.
193GB is getting unmanageable.
Yes, unoptimized and incomplete.
Not unmanageable. Just big. Maybe every thread needs its own directory for its images. And maybe the data needs to be moved to my other drive locally.
I'm working on the export files now. I need to change the posts just a bit before I can make them public.
I promised that no links would go to 8ch and particularly qresearch, and also that I would redact mentions of them from the content. I already do this on my blog, but I simply broke the links rather than made them go somewhere else. To get the most out of the republishing of the posts, I need to convert the > and >>> links so that they link to posts stored on my own site. This is probably better anyway since many posts and threads are now missing from their original locations.
Yeah it's not totally unmanageable. It's more like moving a full grown oak tree. You can do it, but it's a huge pain in the ass. I was thinking more in terms of moving it around the internet or hosting. That's a pretty big db.
I rejiggered the ChanScraper to archive all the breads even if there isn't a Q post in that bread. It rendered 215 NEW complete breads and brought my JSON net filesize from 6MB to 200MB. Starts around "Q Research General #358".
That's with no images, just the raw JSON from 8ch. Each bread is around 700kb.
I did some research on collecting the CBTS threads from 4chan/pol the other night and the results might be useful for others. They can be found at the bottom of the page here:
https:// anonsw.github.io/qtmerge/catalog.html
It's still a work in progress.
>anonsw.github.io/qtmerge/catalog.html
I may be able to give you an list of all those links from the data I have from QCodeFag
Yes, the breads are essential. I've got them going back all the way through 4chan stuff. The breads are how you connect in the answers. If you connect up the contexts, most of them link back to a Q post at some point. Then the context of that post that was linked into the bread can be associated with the Q post. That is what I was working on before I started looking at making my entire database available for research.
Were you able to capture any of the original 4chan JSON/HTML data? I wasn't researching Q at that time so I've relied on 4plebs.
I have created a searchable application for /qresearch/.
The database is filling right now. I kept only the image attachments in order to save hard disk space.
At present 52,000 of the most recent posts on qresearch are loaded in the table with the attachments. We'll see how the storage works out.
I'll advise when anons can attempt to use the system.
I've got most of it, yes.
I don't know if y'all noticed, but I've got several columns in my database that are not part of the original data. Some of these are tagging fields: tags, people, places, organizations, and signatures. It would be difficult to automate the filling of these fields, but I don't want to entirely open up editing of these fields to anons, either, due to the potential of clown interference. There's no way I can fill all of them in myself. I have an idea to allow tags to be suggested and then allow up-voting and down-voting, with an acceptance criterion before giving them a permanent place in the data record. Or maybe just leave them in that form with their ratings.
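A minimal sketch of that suggest-then-vote idea (the threshold and data structure are arbitrary examples, not a decided policy):

ACCEPT_THRESHOLD = 5  # net up-votes needed before a tag becomes permanent (example value)

def vote(suggestions, post_key, tag, up):
    """Record one up/down vote for a suggested tag on a post."""
    current = suggestions.setdefault(post_key, {}).get(tag, 0)
    suggestions[post_key][tag] = current + (1 if up else -1)

def accepted_tags(suggestions, post_key):
    """Tags that have cleared the threshold and can be written to the post record."""
    return [tag for tag, score in suggestions.get(post_key, {}).items()
            if score >= ACCEPT_THRESHOLD]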
Excellent. Will that raw JSON data be in the DB as well?
I did notice, those are great ideas. Can I suggest letting each user have their own copy/edits of the metadata? The user-specific data could then feedback into the system for suggestions to others, etc. But primarily it gives the user some way to control the interference/noise.
What JSON are you looking for anon? Bread before 2/6/2018?
I've rejiggered the ChanScraper to produce TwitterSmashed JSON. It includes any DJT tweets within 60 minutes of a Q post. Here's what [5], [8], and [10] deltas look like (a small matching sketch follows the sample):
{
"DJTtwitterPosts": [
{
"accountId": "realDonaldTrump",
"accountName": "Donald J. Trump",
"tweetId": 944665687292817415,
"text": "How can FBI Deputy Director Andrew McCabe, the man in charge, along with leakinโ James Comey, of the Phony Hillary Clinton investigation (including her 33,000 illegally deleted emails) be given $700,000 for wifeโs campaign by Clinton Puppets during investigation?",
"delta": 5,
"link": "https:// twitter.com/realDonaldTrump/status/944665687292817415",
"uniqueId": "00e6951d-5f49-455b-bdd9-bda7f184d9c7",
"time": 1514060825,
"_unixEpoch": "1970-01-01T00:00:00Z",
"postDate": "2017-12-23T15:27:05"
},
{
"accountId": "realDonaldTrump",
"accountName": "Donald J. Trump",
"tweetId": 944666448185692166,
"text": "FBI Deputy Director Andrew McCabe is racing the clock to retire with full benefits. 90 days to go?!!!",
"delta": 8,
"link": "https:// twitter.com/realDonaldTrump/status/944666448185692166",
"uniqueId": "92fbb1a2-169e-412c-abba-6e441d3acbaa",
"time": 1514061006,
"_unixEpoch": "1970-01-01T00:00:00Z",
"postDate": "2017-12-23T15:30:06"
},
{
"accountId": "realDonaldTrump",
"accountName": "Donald J. Trump",
"tweetId": 944667102312566784,
"text": "Wow, โFBI lawyer James Baker reassigned,โ according to @FoxNews.",
"delta": 10,
"link": "https:// twitter.com/realDonaldTrump/status/944667102312566784",
"uniqueId": "eabb202f-3b59-48c9-b282-f0110b8388a5",
"time": 1514061162,
"_unixEpoch": "1970-01-01T00:00:00Z",
"postDate": "2017-12-23T15:32:42"
}
],
"no": 158078,
"name": "Q",
"trip": "!UW.yye1fxo",
"sub": null,
"com": null,
"text": "SEARCH crumbs: [#2]\nWho is #2?\nNo deals.\nQ\n",
"tim": null,
"fsize": 0,
"filename": null,
"ext": null,
"tn_h": 0,
"tn_w": 0,
"h": 0,
"w": 0,
"replies": 0,
"md5": null,
"last_modified": 0,
"source": "8chan_cbts",
"threadId": 157461,
"link": "https:// 8ch.net/cbts/res/157461.html#158078",
"imageLinks": [],
"references": [],
"uniqueId": "e22306cc-2831-453a-ae1d-16e90aa23707",
"time": 1514060541,
"_unixEpoch": "1970-01-01T00:00:00Z",
"postDate": "2017-12-23T15:22:21"
}
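A minimal sketch of the 60-minute delta matching described above (not the actual C# TwitterSmash code). It assumes both records carry a Unix time field, as in the sample; the delta is the wall-clock minute difference, which reproduces the [5]/[8]/[10] values:

WINDOW_SECONDS = 60 * 60  # only tweets within an hour after the Q post

def smash(q_posts, tweets):
    smashed = []
    for post in q_posts:
        matches = []
        for tweet in sorted(tweets, key=lambda t: t["time"]):
            offset = tweet["time"] - post["time"]
            if 0 <= offset <= WINDOW_SECONDS:
                delta = tweet["time"] // 60 - post["time"] // 60
                matches.append({**tweet, "delta": delta})
        smashed.append({**post, "DJTtwitterPosts": matches})
    return smashed

if __name__ == "__main__":
    q_post = {"no": 158078, "time": 1514060541}  # 2017-12-23 15:22:21
    djt = [
        {"tweetId": 944665687292817415, "time": 1514060825},
        {"tweetId": 944666448185692166, "time": 1514061006},
        {"tweetId": 944667102312566784, "time": 1514061162},
    ]
    result = smash([q_post], djt)[0]
    print([t["delta"] for t in result["DJTtwitterPosts"]])  # [5, 8, 10]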
4chan JSON for pol between 2017-10-30 and 2017-12-01.
I'll keep my eyes peeled. Finding old JSON for those days is hard. Is 12-1 when you started archiving? Got bread json < 2-6-2018?
I could develop an export, I suppose. But that's low on my list of priorities at the moment. The data structure is above in the list. Minor alteration needed: My host does not support JSON fields. Substitute TEXT, and you should be good. If you want to write an exporter, I can review it and include it.
But I still don't have the data up there yet. I'm working on the alterations to the data needed to keep everything on site at the host.
I was thinking of attaching the IP address to each suggestion to keep the up-votes and down-votes honest. Is that enough? Or maybe even too much? The other thing I could do is perhaps tie in the WordPress login system, since it's there anyway. It might take a bit of time for me to figure out how to limit permissions.
Thanks, 4plebs is good for now, but a second witness is preferable. Started archiving Feb 15, but some old data was still available at the time.
For 8ch these are the oldest breads I have:
pol: 10509790 (2017-08-28)
cbts: 10 (2017-11-21)
thestorm: 1 (2018-01-31)
I don't have all breads after though, it is incomplete.
I've since stopped archiving pol/cbts/thestorm to save time/space.
Not enough due to VPNs, DHCP, etc. The login may be the best way.
I think you and I started archiving those about the same time. I've got complete json breads from 2/6/2018 to now. if you want any of that.
I might already have it, is it in the QJsonArchive.zip from earlier?
Ya - you probably have the breads from the last few days eh?
I do, I'll call your dataset QCodeFagNet unless you want a different name. Instead of the zip I'll pull it from your github.
Sounds fine. I'll try to keep it updated.
Logins require email addresses. I guess it's always a choice whether to participate.
Q Research General - searchable archive breads 716-477 presently online.
www.pavuk.com
username qanon
password qanon
updates as I find them
Looking good
There's so much content being produced now that it should be compiled into a wiki in a dedicated thread. The other threads investigate and make the content; this one adds the best content into one big archive, updated in real time ofc, bc they never stop so why should we. Pic related.
BUT WHY
To take Q's work to the next level we have to increase the public's basic awareness of the criminality being exposed, investigated, and terminated, by an order of magnitude. That order of magnitude is pretty normal people.
>be a normal person
>want to do the right thing but get a link to this Q thing and there's too much complex and """scary""" info what with muh job and family and everything else
>the big load of content is overwhelming and i don't know where to begin and have it be easy
<make 1 entry point to begin browsing the entire body of accepted content
<terse organization keeps it brief and saves the details for a leaf page a click away, as deep as is necessary
<keep source of body of accepted content continuously up to date
<using https for minimal integrity protection
>now i can begin a review of the evidence contained in the case file archive with a single click! jeff bozos eat your heart out nigger
>and look at short well-organized and sourced text, and pictures, and the odd video
>and easily get a run down on whatever topics i browse my way upon
>and now even though my eyes have been opened in a pretty dramatic way, it was easy to use and i know it'll be easy to share, to the topic level
I hear you anon.
The key is the content. We have the ability to archive threads/qposts. Posts that Q references. Tweets. Known tripcodes/twitter accounts.
What is the source of all the evidence? The dedicated research threads? Notables? In order for it to be automagic, there needs to be a reliable single source here on 8ch. None of the codefag work I've seen reaches a level of what could be called AI - or the ability to discern which anon has posted a certifiable answer/evidence.
Non-automated means anonomated, but that causes its own set of issues.
I agree a wikipedia style thing would be good because it's familiar, but populating it with data may be an issue. Some of it's going to have to be entered in manually.
If all you are looking for is a location for an anon wiki, I think that's pretty easy.
No, not automated, curated.
Should I hit _allQPosts.json?
I'm stuck. I'm working on getting that database up for you, but I have to make some modifications to the post_text field so that those links don't come here to 8ch. (I promised that I wouldn't do that.) I'm trying to fix the post_text field so that the >links refer back into the database, but I'm not familiar enough with the DOMDocument and related classes in PHP. Are there any good tutorials out there on how to do advanced manipulation of HTML using these classes? The reference manual stuff just isn't doing it for me.
I should clarify something. Not only am I going to make the existing links self-reference, but I'm also going to revive those dead >links and point them back into the database. I've got many of the deleted threads in my database, too, and I can make those available.
Ya that's fine. I'm going to update that today to cover the latest.
I've been working on a new local viewer that uses the twitter smashed data. It shows the delta + alt text of the tweet + a link to the tweet. I've noticed that a lot of the image links I have are currently broken. I was thinking I'd just update those to point to one of the other QCodeFag branch archives rather than try and archive all the images as well.
Expect an update on GitHub later
Here's what it looks like. Just trying to finish off a sort idea and clean data.
Good news! I've got the code working that makes the post links compliant and refer back into the database. Almost as soon as I posted the request, it came to me that I was making things more complicated than they needed to be and a better algorithm came to mind. The algorithm is so good that in cases where good posts didn't link in 8ch, they will be linked on my site. That includes links such as the one Q pasted into the middle of a word the other day, or when they are consecutive with or without comma or white space. Anywhere there is a > followed by a bunch of digits, a link should be created. The only exception is where the post number of the link is greater than the post number of the current post. This type of error was encountered in early posts after the transition from one board to another. Anyway, I'm going to run a few more quick tests, and then I should be uploading to my host within a few hours. I still don't have code ready to search it, though.
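Not the anon's PHP, but a small Python sketch of the same rule: every >>digits reference becomes a link into the local database unless it points forward past the current post number (the base URL is a made-up example, not the actual site path):

import re

REF_PATTERN = re.compile(r"&gt;&gt;(\d+)|>>(\d+)")

def rewrite_links(html_text, current_no, base_url="/research-tool.php?post="):
    """Rewrite >>NNNN references to local links, leaving impossible forward references alone."""
    def repl(match):
        ref = int(match.group(1) or match.group(2))
        if ref > current_no:
            return match.group(0)  # forward reference: the error case described above
        return f'<a href="{base_url}{ref}">&gt;&gt;{ref}</a>'
    return REF_PATTERN.sub(repl, html_text)

if __name__ == "__main__":
    body = "&gt;&gt;493751 see also &gt;&gt;493888 and &gt;&gt;999999999"
    print(rewrite_links(body, current_no=494000))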
When you get that worked out make sure to let us know. I've been wondering about that myself. The early halfchan no's are pretty big. I've found some bugs in my code around there being multiple references per Q post. It does happen on occasion and my scraper isn't catching them all.
I've just uploaded a bunch of json data to the https:// github.com/QCodeFagNet/SFW.ChanScraper/tree/master/JSON GitHub. The json folder is what's generated when you run the ChanScraper, the smash folder when you run the TwitterSmash. Each of those folders has a Viewer.html file that can be used with just the _allQPosts.json or _allSmashPosts.json.
Like I said I need to clean up some dead image links for everything to be working right.
You MIGHT be able to get thumbnails from archives, but you won't get full size images there, for the most part.
Ya think it's bad form to go lazy and link em to one of the qcodefag archives?
Part of making those offline archives is storing the items. Plus, don't assume any platform is forever. There are too many clowns out there who don't want anyone to see this stuff.
So now I've got a bunch of export files of my database ready to upload. Next challenge: automating the import on the host.
The table of posts has been added to the database. It's all up there. (All I have, anyway.) I need to get a way to make searches available to you now.
So you have all the breads searchable as well?
Everything is searchable. The database includes all posts I could find. I'm working on the search front end right now.
This is what the front end looks like right now. I'm working now on turning that into a SQL statement that can search the database. I'm only an hour or two from putting this online.
It's up there. The paging isn't working yet, so don't anyone complain about that. I'll fix it in the morning. I also discovered that a key range of posts didn't import properly. I'll fix that in the morning, too. For now, I've set the posts per page to 2000, which may cause timeouts, but it will allow people to play with things a bit.
http:// q-questions.info/research-tool.php
ANON, great work.
HOLEY FUCK YES.
This crosses all breads? If so then this is exactly what we need. I can help you with the SQL if you need it.
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15. You should also specify an ORDER BY.
How are you getting the breads? Maybe I can work out a way to get you those. Combine up somehow
I've been thinking about this. Preliminary research shows that elasticsearch and lucene would probably be the best match for what we've got. There are a lot of tools that pile into elasticsearch. Any hostfags here with the ability to set up an elasticsearch node?
The data is big. Tons of images. A proper archive takes space. I'm holding @546 complete breads and with no images it's 250MB+. That's for like a month. By the end of the year the bread collection alone is going to be over 1.5GB.
The images I've got so far come to around 100MB, but that's just from the Q posts - and even then I know I'm missing some.
Econ Godaddy hosting is like $45 a year. I'm thinking about just putting the chanscraper/twittersmash online, then write some simple apis. Get thread#, filteredThread, qpost# that kind of thing. Useful or no?
My algorithm for getting breads is this:
-
Get the author_hash for the first post in a thread.
-
Mark the first posts in the thread that match that author_hash until the author hash doesn't match.
If someone jumps in before the baker is done, oh well. But that shouldn't be much of a problem because the breads get repeated a lot. I can mark posts as bread later, if need be.
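A rough Python restatement of that heuristic (the anon's own code lives in PHP/MySQL; the field names here mirror the chan_posts columns and are otherwise assumptions):

def mark_bread_posts(posts):
    """Flag the run of leading posts that share the OP's author hash as the bread/dough."""
    if not posts:
        return posts
    baker_hash = posts[0]["author_hash"]
    for post in posts:
        if post["author_hash"] != baker_hash:
            break  # baker is done (or someone jumped in early, as noted above)
        post["flag_bread_post"] = 1
    return posts

if __name__ == "__main__":
    thread = [
        {"no": 1, "author_hash": "abc123"},
        {"no": 2, "author_hash": "abc123"},
        {"no": 3, "author_hash": "def456"},
        {"no": 4, "author_hash": "abc123"},
    ]
    marked = mark_bread_posts(thread)
    print([p["no"] for p in marked if p.get("flag_bread_post")])  # [1, 2]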
Hmm… When I say bread I mean a full Q Research thread. Like this
https:// github.com/QCodeFagNet/SFW.ChanScraper/blob/master/JSON/json/8ch/archive/651280_archive.json
That's the straight bread/thread from 8ch. It includes all the responses whether the BV posted it or not.
I'm finding those by getting the full catalog from
https:// 8ch.net/qresearch/catalog.json, finding the breads/threads that have q research, q general etc in them, and then getting the json for that thread only from https:// 8ch.net/qresearch/res/651280.json
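That catalog walk, sketched in Python for anyone who wants to reproduce it (assumes the usual page/threads layout of catalog.json; the title pattern is an example, not an exhaustive list). Each matching thread number then goes through the same res/<no>.json fetch shown earlier.

import json
import re
import urllib.request

BOARD = "qresearch"
GENERAL_RE = re.compile(r"q\s*research\s*general", re.IGNORECASE)

def general_thread_numbers(board):
    """Pull catalog.json and keep only threads whose subject looks like a general."""
    url = f"https://8ch.net/{board}/catalog.json"
    with urllib.request.urlopen(url) as resp:
        catalog = json.load(resp)
    numbers = []
    for page in catalog:
        for thread in page.get("threads", []):
            if GENERAL_RE.search(thread.get("sub") or ""):
                numbers.append(thread["no"])
    return numbers

if __name__ == "__main__":
    print(general_thread_numbers(BOARD))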
I think I see what you are doing - going thru and trying to mark the relevant posts?
I haven't even looked at that.
Paging is fixed, plus I gave you a couple other search parameters.
I'm still working on the import issue, but I at least have put the posts I initially identified as missing up there.
>I think I see what you are doing - going thru and trying to mark the relevant posts?
Yes. Most of it is done automatically. Since I save the marks in the post records, I can go back in there and adjust it, if necessary.
>Useful or no?
I'm not the guy to ask. The discussions here went over my head immediately. Looks like there's some serious progress being made here:
One question I have for contributors here is when there is a consensus that you have created a viable search tool, how will you manage promulgation? Do it like a war room announcement on qresearch?
As many have noted, the search tool has to be hardened against tampering before release. Clowns/shills are devious and destructive.
I agree on shill proofing.
I've been playing around with a webAPI. I've got it working nice with all the q posts, looking for a specific post# like #929, and posts on a day. Returns json or xml. This is the Crumb Archive.
My plan is to expand that so that the archived breads can be accessed as well - each as a single json file. This is the Bread Archive.
I'm going to set it up where it's an autonomous machine. It will scrape and archive automagically moving forward from the current baseline. No delete. No put. No fuckery.
I'm pretty sure it would work with the QCodeFag scraper repos.
The bread archive is pretty big. I'm sure there's no way I can archive images for all the breads. An image archive isn't what I've been focused on. The focus of this is only making the json/xml available from the chanscraper.
Once I can get the breads all up and being served automagically my plan is to set up an elasticsearch node and suck all the breads in.
I figure a year of godaddy hosting is currently $12 with unmetered bandwidth. I'll throw in.
Yes, I'm concerned about that, too.
Perhaps it helps that this data does not reside only there?
In this case, it would take me about half a day to get it all up there again, if need be.
Searchable Qresearch
www.pavuk.com
username: qanon
password: qanon
Updated regularly with the messages and images from Qresearch general.
I'm beginning to wonder if I'm up against some kind of limit on my remote host. I just tried importing into it again, and I'm still missing some posts.
Remote host: 1,010,127 records
Local machine: 1,049,610 records
I'm using the 8chan JSON API endpoints. I still need to pull from the archive.json file downloaded yesterday.
My server is on a linode so I have fast response time.
Maybe I can split the table into 4chan and NewChan (my name for 8ch, since we can't link back to here) and see if they all go up.
You can search the text in the posts with wildcards. Say you want all posts with the word BOOM. Just enter boom.
Say you want the posts from Q with his tripcode and "boom"
Put !UW.yye1fxo in the trip code.
put boom in the comment
Click search button
voila.
Has anyone found a way to go back past the 25 pages in the catalog.json?
Can I access this? I'd like to add the DJT tweets into the database. Twitter is wanting more and more data before they give me an API key.
U.POSTS.NEW is the new-format table.
U.POSTS.NEW.ATT is the table of attachments for the primary table. Each one is a link to a binary.
Wow, awesome job! I knew it could be done. I'm going to need some help getting started. Could you put a qsearch for dummies tutorial together?
Did you have to create, or did this create a chronological list of all Q related threads and their titles if any? (/pol/cbts/CBTS(8ch)/The Storm/ qresearch)?
That might be a good Mnemonic to speed searches.
how about:
-archive threads as they go
-convert to text files, with links to posts
-txt files are easily searchable
I've not been back into this thread for a while. I'm running the qresearch import process to get up-to-date. One technique that is needed is to re-scan already imported threads for posts missed during initial scans.
Threads are imported from the catalog.json file. In this state, we know the thread number and the number of messages at that time. The only time we know a thread is closed is when the number of posts >= the number in the official "bake" count.
Therefore, my program keeps testing until the posts counter >= the bake counter and then marks the thread as complete in the thread table. This then prevents re-scanning all threads because we get only the open ones.
Multiple scans of posts are needed to get all of them and to deal with duplicate threads.
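A tiny sketch of that open/closed bookkeeping (the names are illustrative; bake_count is the post limit stated in the OP, imported_count is what the importer has stored so far):

def thread_is_complete(imported_count, bake_count):
    """A bread is complete once the stored posts reach the bake count from the OP."""
    return imported_count >= bake_count

def threads_to_rescan(threads):
    """Only threads still open need another pass; completed ones are skipped."""
    return [t for t in threads
            if not thread_is_complete(t["imported_count"], t["bake_count"])]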
I use the 8-chan post number as part of the primary key to the threads and posts tables.
8GA_1 is 8chan Great Awakening post 1
8QR_655000 is 8chan Qresearch post 655000
The big problem is going back to find threads BEFORE the last 25 pages in the catalog.json. Therefore, I can't get anything earlier than when I first wrote the import.
The import routine uses the JSON API endpoint from the boards. In the JSON is the Unix timestamp of the message. This is a native field/object type in Pavuk. Thus all timestamps are set to UTC internally.
NOW, if I could get DJT's Twitter feed in JSON, it also has UnixTime and this goes in directly.
Twitter wants me to give them all sorts of documentation before they will allow me to use their API. Frankly, I don't have the time to deal with them or the inclination.
I can get other boards provided the endpoints are similar and that the catalog.json file still has links to the threads.
BO has never responded to my requests on how to get older threads.
Super simple.
Entry forms are also search forms.
Enter the data that you wish to match.
Click the search button.
Pavuk creates and then executes the appropriate query and returns the items in a Kendo grid. Scroll, resort, export to excel or click on a row to return to the entry form with your data.
Searching on timestamps has issues that I need to resolve.
I'm done for the day.
The comments from the JSON API include markup and JS to go to real links. This is a problem with the storage and search. I pipe the comment string through Lynx with the -dump option and this gives me clean text in STDOUT and then a separator and then the list of actual links. I put the text in the comments and the links in a multivalue table. I'll expose the links tomorrow as a separate tab in the entry form.
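Roughly, that Lynx step could look like this from Python (a sketch only; lynx must be on the PATH, and the -stdin/-force_html flags are my assumption for feeding it the raw com string):

import subprocess

def clean_comment(com_html):
    # lynx -dump renders the HTML to plain text, then appends a "References"
    # section listing the actual links - the separator described above.
    out = subprocess.run(
        ["lynx", "-dump", "-stdin", "-force_html"],
        input=com_html, capture_output=True, text=True, check=True,
    ).stdout
    text, _, refs = out.partition("References")
    links = [line.split(". ", 1)[1].strip() for line in refs.splitlines() if ". " in line]
    return text.strip(), links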
What about 100k transactional batches?
Jesus Einstein, give us a starting point to keep up.
Yeah man hit it. I've got a github here you can browse around.
https:// github.com/QCodeFagNet/SFW.ChanScraper/tree/master/JSON
json/8ch has the filtered/unfiltered bread and archives in it. smash has the twittersmashed posts. I've been getting my twitter data from http:// www.trumptwitterarchive.com/data/realdonaldtrump/2017.json, 2018.json
I set up a test for the webAPI twittersmashed posts here https:// qcodefagnet.github.io/SmashViewer/index.html
I'm getting close on having the webAPI thing finished up. Just running some more tests and then I should be ready to go.
Yeah you could mebbe use the smashed json from me. I've already done the unix timestamp on the trump tweets. All 8ch posts and Twitter posts derive from the same Post base object with the unix timestamp built in.
I think that's because you can't really get them. There is an 8ch beta archive here, but all the Q Research threads disappeared shortly after we started archiving them. Even then, those archives are straight HTML. It's of no use to me. AFAIK, once it slides off the main catalog, it's pretty much gone. Some trial and error got me a few breads, but not many.
>BO has never responded
I'm not the board owner, just some schmuck who started a thread he thought was being overlooked. You folks are so far out of my ballpark all I can do is try to keep it inside the curbs of what my original intent was.
I'd like to see a list/catalogue/file of all Q "related" posts.
Aaand I'd like to see a list of post Q "related" posts across all platforms/threads made searchable. Plenty of focus on Q, we need the early digging and free association.
Interesting concept you have anon. You want to be able to search across ALL 8ch? Not just Q Research? By platforms are you talking 4ch/8ch? or 4ch/8ch/twitter/reddit/facebook…?
The first time I uploaded, I batched them in by 1000.
The second time, I batched them in by thread. I'm not sure how well the LIMIT clause on the SQL works.
In any case, I may have a problem on both computers. I could have sworn I had over 1.1 million records the other night. (Not to worry. I still have all of the source.) The solution may be to partition the table. I won't have to rewrite any code, but it'll chunk the table's file down into smaller sections.
This should be interesting. I've never had to partition a table before. Apparently, newer versions of MySQL do it automatically. But until then, it's gotta be done.
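In case it helps, this is roughly what a key partition looks like; the statement is standard MySQL DDL, shown here run from Python with mysql-connector. The chan_posts/post_key names match the SQL used later in this thread, the 16-partition count is arbitrary, and note the partitioning column has to be part of every unique key on the table.

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="anon", password="hunter2", database="qresearch")
cur = conn.cursor()
# Split the table's storage into 16 pieces keyed on the primary key.
cur.execute("ALTER TABLE chan_posts PARTITION BY KEY (post_key) PARTITIONS 16")
conn.close()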
Mine has 4chan, too.
If threads are missing, you have to look in archive.org/web or archive.is. Of the two, archive.org/web is better for scraping because the HTML code is about as close to the original as they can make it. I can actually use the same scraper program on it.
Since the stuff that is on archive.is is so different from the original, I will need to write a new scraper for those. On several occasions, the post was important enough that I rebuilt it by hand.
With either archive, you need to know the URL, which can be tricky sometimes. Just having the post number won't do it. You must know the thread as well.
Just thought of something: When I get threads from these archive sites, what time zone do they show? I believe my stuff is saving to GMT when I save a post directly from a chan site. I'm not sure what I'm saving when I get posts from these archives.
I would think the time is relative to the archive's home time zone. That is, unless archive.x has done some wizardry to display times in the time zone of the user requesting the original archive. That would be more problematic - but you could still deal. It should be marked what time zone it is, and then you convert into the Unix timestamp.
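A tiny sketch of that conversion in Python 3 (zoneinfo needs 3.9+); the timestamp format string and the assumption that the archive shows US Eastern wall-clock time are both placeholders for whatever the archived page actually shows.

from datetime import datetime
from zoneinfo import ZoneInfo

def to_unix(ts_string, tz_name="America/New_York", fmt="%Y-%m-%d %H:%M:%S"):
    # Interpret the archive's wall-clock time in its home zone, then convert to Unix time (UTC).
    local = datetime.strptime(ts_string, fmt).replace(tzinfo=ZoneInfo(tz_name))
    return int(local.timestamp())

print(to_unix("2017-10-29 19:35:45"))   # date format borrowed from a post quoted later in this thread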
The 4ch breads or the 4ch Q posts?
What are the chances it's hanging on a specific record? I see that all the time doing inserts. Bad data kills it off.
You could look into raising the timeout. Mebbe it's just such a long job that it's taking too long and timing out? https:// support.rackspace.com/how-to/how-to-change-the-mysql-timeout-on-a-server/
Here's a hint for how to find the post a dead thread belongs to: Go to the earliest archive of the thread on which you found the link, which will usually be on the archive.is site. If you're lucky, the link was still live when the thread was archived. The other thing to do is search earlier posts that you already have to see if someone else linked the same post.
Time out isn't the problem in this case. Since I'm working with small batches at a time, they're quite quick.
I have the vast majority of both. Go check it out.
http:// q-questions.info/research-tool.php
After I resolve the table size problem (which is what I think the real problem is), I think it would be good to work some more on my contexting program. On my local computer, I've got it so that it can look back through the links and show all available context with the post. What I haven't done yet is copy that contexting information to a Q post's context when I find one in the backward linking. It'll be ridiculously easy once I set about doing it. Then, when a Q post is pulled up, all that stuff that linked back to it can show together with it.
Hmm. Yeah, just doing some easy math I can see how you would have more than 1mm records. We're at bread 815+ something here, and with 751 posts each that's over 600k here on 8ch alone.
You may be onto something with that. Is there a limit? https:// stackoverflow.com/questions/2716232/maximum-number-of-records-in-a-mysql-database-table
Looks like the number of rows may be determined by the size of your rows.
Yes, there may be a 1GB limit on the file size, and I'm right about there now. If I partition, I can get around that.
Below is the qtmerge modified raw dataset (text-only) as of 2018-03-14 02:07 UTC.
I'm putting this out in the hopes that it may be useful to others for ETL, mining, search tools, archiving etc.
Some notes:
- The data is a synthesis of the qtmerge datasets: https:// anonsw.github.io/qtmerge/datasets.html
- For an idea of the threads that are available see: https:// anonsw.github.io/qtmerge/catalog.html
- The eventcache.json file contains the posts/tweets/etc. in chronological order. The type attribute currently dictates the local object structure (working to make this cleaner)
- refcache.json contains the detected post cross references (this is a work in progress)
- The referenceID attribute is the "primary key" between the files (see the sketch after the download link below)
- Timestamps are Unix Time and time strings are US Eastern
Extracted size: ~850 MiB
SHA-256 sum: d6ed89da05c0b714fc66b04ca66a8d701456d882d5f128ee1cef26c8d2e22eb6
http:// anonfile.com/dazfO8d4ba/qtmerge-text-2018-03-15_05.18.37.tar.bz2
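A hedged sketch of consuming the two files, in case it saves someone a few minutes; beyond type, referenceID and the chronological ordering described above, the field layout is an assumption.

import json

with open("eventcache.json") as f:
    events = json.load(f)            # posts/tweets/etc. in chronological order
with open("refcache.json") as f:
    refs = json.load(f)              # detected cross references (work in progress)

by_id = {e["referenceID"]: e for e in events}    # referenceID is the key between the files
print({e["type"] for e in events})               # which source types are present
print(sum(1 for r in refs if r.get("referenceID") in by_id), "references resolve")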
That's just the general threads. When I started linking through the breads, I found that I needed many of the other threads, too. Most of those are smaller, though.
> You want to be able to search across ALL 8ch? Not just Q Research? By platforms are you talking 4ch/8ch?
Not all 8ch. Just 4 and 8ch Q related threads. Q has posted in but a small part of all of the digging (and bullshit) threads and much info is contained in those threads. /pol/ was a cluster until adopting the /cbts/ threads, but they shouldn't be too hard to round up and include in the searchable database.
In fact, I'd only include the qresearch general threads since the GA/qresearch reset. Add the digging/ancillary threads as possible. Most of the gold is in the general's IMO.
The reason I'm pulling in other threads is because they get cited as notable posts. I'm not bothering with them unless that happens.
I can get the other boards and other threads, the issue is disk storage. Linode gives me a lot of bandwidth, but only a few gigs of disk until I change my plan with them.
The limit of an OpenQM hash file (table) is 16TB. When this becomes a problem, I can create a distributed file (table) by primary key. Say, put all 8QR in 1 portion, 8GW in another. Simply a way to have physical storage allocated
Pavuk session records are GUIDS. (don't worry, I'll purge anons out of the storage.) It was done because of commercial requirements for SOX and other audit compliance issues. Remember, I created Pavuk to build commercial apps.
The distributed file is built by using the first 2 bytes of the GUID from the primary key. Thus, it has component files:
00
01
โฆ
FE
FF
Or 256 parts.
Theoretical table size:
256 x 16TB = 4096TB
www.openqm.com
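Illustration only (plain Python, not OpenQM code): the component file a record lands in is just the two leading hex characters of its GUID primary key, the "00" through "FF" above.

import uuid

def part_for(guid):
    # "00" .. "FF" - one of the 256 component files
    return str(guid).replace("-", "")[:2].upper()

key = uuid.uuid4()
print(key, "-> part", part_for(key))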
I'm going to look at your work.
I tried 4chan/cbts/index.html and got a 404 yesterday
Brother Anons, I can find the IDs of the threads by using the search function on Archive.is. For example, research general #2 was post number 799. Once I know this, I can go back to 8chan and pull up the thread.
Sadly, I cannot get it with JSON. I can only get HTML. This means parsing the HTML.
This means writing a new string parser, but the data goes into the same table as the JSON, just with more work. Here's what the posts look like in HTML
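Not your parser, obviously, but for anyone else staring at the same archived pages, a hedged BeautifulSoup sketch. The div.post / div.body / p.body-line class names are my guess from the vichan-style markup visible in the JSON 'com' field quoted later in this thread, so adjust the selectors to whatever the saved HTML really contains.

import requests
from bs4 import BeautifulSoup

def scrape_archived_thread(url):
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    for post in soup.select("div.post"):
        no = post.get("id", "").split("_")[-1]     # e.g. "reply_799" -> "799" (id format assumed)
        body = "\n".join(p.get_text() for p in post.select("div.body p.body-line"))
        yield no, body

for no, body in scrape_archived_thread("http://archive.is/Pvbqq"):
    print(no, body[:80])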
I've put out a tweet thread showing the progress and asking if someone will step up to help lead a crowdfunding campaign so I can afford a bigger Linode.
>I tried 4chan/cbts/index.html and got a 404 yesterday
I'd expect that. Threads sunset there rather quickly. I think most everything from 4ch is in http:// archive.is/search/?q=%2Fcbts%2F
I got 22,900 hits. Some people used 4plebs and maybe even other archives. Need to know all of the archive sites used so we can add them to the soup.
A search on 4 plebs from 10-28-2017 to the night of the bans, 11-26-2017 shows 714 hits.
https:// archive.4plebs.org/pol/search/subject/cbts/start/2017-10-28/end/2017-11-26/
>crowdfunding
I don't know diddly about crowdfunding, but I will certainly contribute. Are things like that generally paypal friendly?
Belay that last link. It searches only to midnight of the day specified. This one goes through the 26th.
https:// archive.4plebs.org/pol/search/subject/cbts/start/2017-10-28/end/2017-11-27/
Here's some interesting trivia that I missed after being banned. I did see Q approving the first migration in real time, but missed this. Interesting.
I was just going to have folks send to my personal paypal account since I'm funding the site anyway. You can set up a regular monthly payment. I do that with others like Stefan Molyneux where we send $10/month.
We need to work together to get all of the data into the database. If someone could help with a Twatter feed from DJT - preferably raw and in JSON, that can be added to the posts table.
That was helpful. I would ask people in this thread to help develop the information model.
There is a "boards" table with the links to get data for each type. It can be expanded into which boards are archived where and I can automate the pulls.
>personal paypal accoun
Set up an account specifically for this, don't dox youself. (((They))) will be able to find you, but the malicious shills won't.
already doxxed. I own pavuk.
Ha, OK. Thought that might be the case.
Here's another archive with over 1,200 threads:
https:// archive.fo/search/?q=%2Fpol%2F+-+cbts
Some good ones here missing in other archives. How many more are out there?
archive.is
archive.4plebs
archive.fo
Found first CBTS thread on 8ch.
http:// archive.is/Pvbqq
Yes they will and I will add that my paypal account was subject to MUCH fuckery during the time I was posting a lot about PG on my twitter. Nov/dec 2016
I can probably get you what you need. What are you looking for specifically? All DJT tweets? Tweets with Delta's?
That sucks. I love the system, though! More user friendly than my crap attempts.
They sell "Storage Blocks" expansion way cheaper than more memory. Very fast systems already. Lots of data on the 8GB plan, buy another 100GB storage for way less than the next plan. Call linode to get info on that.
>http:// www.trumptwitterarchive.com/data/realdonaldtrump/2017.json, 2018.json
It probably won't be long until I find out if my host really means it when they say "unlimited".
Limits depend on the operating system. I'm not sure how much I'll end up needing in the end. I've got some full page web captures in my system that may bump up the size needed fairly fast. So far, I haven't outgrown the 500GB on my home system. It's about half full now. But that also includes just about all of my software. I have other drives, so I'm not limited to that 500GB. (Recalling when a 60MB hard drive was a big deal…)
Yeah, that would be cool to add to my system, too. I wonder where I should fit that into the task list. I've got to reparse anyway, so it has to be after that. (Backslashes weren't properly handled the first time around.) It was my plan to get to it eventually. So much to do! If you've got it in JSON files, I've got to believe it would be very easy to get them into my system.
>https:// yuki.la/
The archive sites are only as good as whether they're actually saving our stuff. What's the hit rate finding stuff there?
I'm not sure, but I think archive.is and archive.fo may be the same system. Mirrors, perhaps?
I don't have 4chan/cbts. Was Q posting there, too? If I recall correctly, we went from 4chan/pol to 8ch/cbts.
AllQPosts smashed with DJTwitterposts by day
https:// github.com/QCodeFagNet/SFW.ChanScraper/tree/master/JSON/smash
I got the problem with the backslashes fixed. Also, I changed the way I process emoji characters. There actually might be a few more posts that get parsed in during the reparsing. I am in the process of reprocessing everything now. This is going to take a while. I'll let you know when the uploads are done, which will probably be tomorrow afternoon.
Speaking of searchability, here is a search engine anons can use that will let you search for all those things normal search engines won't, like stringers that include punctuation / symbols or exact spellings of short words and abbreviations, without the search engine being 'helpful', excluding the results you want and returning the results it thinks you want.
http:// symbolhound.com/
> I think archive.is and archive.fo may be the same system
Yes, they sure look like the same system, as does archive.li. I must admit complete ignorance of how they are structured and how they work. I initially thought archive.is was for /pol/, but now I've found /pol/ and cbts all over the place. If any anons have any insight it would sure be appreciated.
>we went from 4chan/pol to 8ch/cbts.
4chan/pol/ first posts were 10/28/2017. We were flushed by a bot storm on 11/26/2017 and regrouped on 8chan as CBTS. When that blew up the campaign became The Storm. When that blew up is when we landed on our own board qresearch/greatawakening.
Archives and threads are all over the place, one of our fundamental challenges aggregating all the info to be searchable.
All records and images that I have should now be up on the research tool.
I thought my post count was short on the site last night, but using the following statement on both, they are equal:
SELECT COUNT(post_key) FROM chan_posts
Funny thing is that when I pull up the table in phpMyAdmin, the row count does not equal the answer to that query. It's short on both. Don't trust the row count in phpMyAdmin when you view a table.
Total number of posts in the research tool is:
1,113,968
Next up: Getting the POTUS tweets into the database.
http:// q-questions.info/research-tool.php
>Has anyone thought to take full news articles and social data dumps, per person, and do sub text matching across the entire body of text to find exact matches?
I've thought to do it. The tagging feature can get us there. The problem is that tagging posts is a lot of work. I need to find a way to get others to help with that without compromising the database.
OK brother codefags. I've stood up a simple API. It serves json and XML for your consumption pleasure.
It's currently set up to:
1) Scrape the chan automagically and keep an archive of QResearch breads and GreatAwakening.
2) Filter each bread to search for Q posts and include anything in GreatAwakening into a single QPosts list
3) Serve up access to posts/bread by list, by id, and by date.
I'm going to incorporate the TwitterSmash delta output next. I figure I can do a simple search across all Q posts easily. Searching across the breads is harder.
You can check it out here: http:// qanon.news/
McAffee says secure https:// www.mcafeesecure.com/verify?host=qanon.news
There's a sample single page app that shows how to use it. http:// qanon.news/posts.html
I still gotta set up my email account so if you spam me now, it's likely to get bounced. I'll check back in later.
My reason for doing this is twofold, I figured we could use it, and I'm looking at the job market in my area and thinking about changing it up. This is partially a learning project to open opportunities by using different tech. I'm claiming ignorance. My plan is to try out an elasticsearch node once I get this working as designed.
Let me know if you can think of a query/filter that you think would be useful. It hasn't proven too difficult to work new things in, other than the ugly local path issue I came across working on it this morning.
Try it out anons.
I think you're misunderstanding my idea. The idea is to identify sources of narrative scripts being pumped into the public consciousness. Remember when Trump's speech at the '16 RNC was immediately phrased as "dark" in dozens of articles, tweets, etc? We need to know who's putting out the scripts ("dark") and who's repeating the scripts ("""journalists""" that articles with "dark" are attributed to, shitter users with "dark" in their tweets, etc)
The code could work in different ways but trying to automate everything at the beginning is hard. The easiest way to start would be:
>anon notices a suspicious pattern of the same language being used all of a sudden
<like "dark"
>anon enters the string that's being repeated into a text box
<bonus points if it's pure JS that can run locally rather than requiring a server, at least initially
>code ingests search results of news, shitter, faceblack, etc with that string from the recent past
<configurable in near term increments like past hour, past day, past 2 days
>anon is provided a list of results
From this simple aggregated news & social search an anon can easily see by visually skimming the results to see how widespread the suspicious pattern of the same language being used all of a sudden is.
<next features
>let anons select search result items as suspect and enter them into a database that indexes on journalist/author, keyword, etc
>database can use search result item post date to build a timeline, to identify the earliest sources of the narrative script
At this point, with the database trained on common sources of narrative script repeating, it would be pretty doable to automate suspicious pattern detection by ingesting the full body of content from the sources and searching for sub text matches that exceed noise. Like if "the" is used in most of the article headlines and tweets, that doesn't mean shit because "the" is a common word, but if "dark", a much less common word, all of a sudden appears across article headlines and facebook posts, that would be pretty easy to pick up for human review.
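Not claiming this is the whole pipeline, but the first-pass "sudden uncommon word" check you're describing is basically a document-frequency filter. A toy sketch, where the item lists and thresholds are entirely made up:

from collections import Counter
import re

def suspicious_terms(recent_texts, baseline_texts, min_hits=5, max_baseline_rate=0.01):
    # Count each term at most once per item so one spammy article can't skew it.
    def per_item_counts(texts):
        c = Counter()
        for text in texts:
            c.update(set(re.findall(r"[a-z']+", text.lower())))
        return c
    recent, baseline = per_item_counts(recent_texts), per_item_counts(baseline_texts)
    n_base = max(len(baseline_texts), 1)
    # Flag terms that appear in many recent items but were rare in the baseline window.
    return sorted(t for t, hits in recent.items()
                  if hits >= min_hits and baseline.get(t, 0) / n_base <= max_baseline_rate)

# e.g. "dark" across dozens of same-day headlines but rare last month -> flagged for human review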
>We need to know who's putting out the scripts ("dark") and who's repeating the scripts ("""journalists""" that articles with "dark" are attributed to, shitter users with "dark" in their tweets, etc)
You can search the word "dark" in my database as it is right now. If that word was used in chan discussions (and it was), you can get results for it. Is there something you think we need to add? Do you have an idea for an algorithm based on what we have?
Right now, though, I changed my mind about what to do next. I want to get the contexting code finished. When I've used my personal version of it, I learned quite a lot.
After that, I will work on getting the tweets in there. If anyone can point me to php code for that, it would be appreciated. I'm not talking about chan posts that link them, but rather the tweets themselves.
I've got a suggestion for the search: enter the following in the text field:
dark%http
and also in a separate search
http%dark
Those should find posts that use the word "dark" and include a link. I don't know how to do this better with what I have without doing some extensive programming.
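For anyone wondering what those % searches actually do: the text field presumably just lands inside a LIKE pattern, something like the snippet below. The chan_posts table name follows the SQL used elsewhere in this thread; the comment column and the exact query the site generates are assumptions.

# "dark%http" means: "dark" somewhere in the comment, followed later by "http".
user_input = "dark%http"
sql = "SELECT * FROM chan_posts WHERE comment LIKE %s"
params = ("%" + user_input + "%",)    # -> '%dark%http%'
print(sql, params)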
> I've been getting my twitter data from http:// www.trumptwitterarchive.com/data/realdonaldtrump/2017.json, 2018.json
He isn't keeping it up to date.
>www.trumptwitterarchive.com/data/realdonaldtrump/2018.json
There was a 9 day gap at the beginning of the year. Otherwise it's been updated. Unfortunately I think there were 2 markers in that time. Delta anon knows about it.
I didn't see anything past January.
Refresh yer cache? I'm seeing Jan 9 - March 21 2018
>www.trumptwitterarchive.com/data/realdonaldtrump/2018.json
Reverse order. OK, I see it. Thank you.
Feckin dates. I got it all sorted out. Discovered a bug caused by my dev server and the API webserver being in different time zones.
I've been sorting out small bugs and am about to wire in the TwitterSmash. The automation part seems to be working well now that I sorted the date bug. I've got it set up to do hourly scrapes. Last run at 8:03pm 3-21 est. The scrapes themselves only take about 45 seconds - including the twittersmashing. There's a test smashpost page here to see the deltas in action. Not totally live Q post data online yet.
http:// qanon.news/smashposts.html
This is another test page using live data
http:// qanon.news/posts.html
I did this to test some code out. Get a random Q post.
http:// qanon.news/api/posts/random/?xml=true
I set up an elasticsearch node today to experiment. We'll see how that goes. Could be a huge pain in the ass to set up at a host. We'll see.
I think that's beyond the scope of what I'm doing. Hopefully, there will be enough here that what I have can help you do that research, especially after I finish the contexting work. Right now, I've had to reparse the database yet again to correct image links. I hope I've finally gotten it right because it takes an entire day to cycle through the entire set.
Update your tripcodes codefags.
public readonly string[] ConfirmedTrips = new string[] { "!ITPb.qbhqo", "!UW.yye1fxo", "!xowAT4Z3VQ" };
http:// qanon.news/api/posts/943/?xml=true
>!xowAT4Z3VQ
Thank you for the heads up. I've made the change in my code, too.
The export/import finally looks like it's ok. Please let me know if you run into issues.
I'm going to be pulling out the post range and thread range options from the form. They unnecessarily complicate things now that I've added date range capability.
I'm moving on to contexting now. Y'all are going to love that feature.
yeah that sounds like a good one.
I've done some more work on the http:// qanon.news api. I managed to work out a coupla small bugs and get the TwitterSmashed posts integrated. Everything seems to be working as designed.
Here's the smashposts.html demo page. Shows deltas to Q posts within the hour.
http:// qanon.news/smashposts.html
I'm going to add another result to the smashposts where everything is grouped by days. I'll probably put it in the posts API as well.
It's starting to look like this may be close to going on autopilot. Any interest in changes/additions before I move onto something else?
I'd love to work out a local copy of the Jan 1 2018 - Jan 9 2018 @realDonaldTrump tweets. Those are missing from the trumptwitterarchive site. Anybody got access to that?
>qanon.news/smashposts.html
It looks good so far. One thing, though: you need to save the images. You're linking directly to the 8ch images, and those have a tendency to go missing.
Hmm. Yeah I'll look into it. I can see that archive getting really big really fast. This thing's only been running for a month and it's over 400MB in JSON alone. I'll have to make sure what kind of space I've got avail.
But you're not saving more than the Q posts, right? There aren't that many Q posts, and he hasn't posted that many images. But if you're trying to save the entire thing, yes, it's really big and grows really fast. I'm not automatically saving the full size images, and there's still quite a lot in my set.
I never figured that another image archive was what we needed. Each of the QCodefag installs has its own local archive. My concern was in preserving the JSON data from QResearch before it slid off the main catalog.
I'm going to put up a more simple list to show what's been archived. I'm showing 716 total breads, but again that's only starting at 2-7-2018. Q Research General #358 is my earliest full archive - it's up to #982 now.
That's 624 breads in 47 days, or 13.2 breads per day. Est. 4,846 breads in one year at ~800k/bread = about 4GB/year in JSON bread alone. Mebbe different if I moved to a DB.
I may have enough storage, but it's so hard to say. Any image archive estimates anons?
I just saw this info. I need to convert my monthly plan to an hourly plan before they'll let me buy storage blocks.
Pavuk Searchable.
Can someone post the original json of GA post 461 which was deleted? I pulled the json data from qanon.pub, and can use pieces of it to fill in my local copy, but I'd rather have the real thing if I can get it.
As an example, below is a comparison of the original 460 from 8ch and the archived version from qanon.pub. They are close, but the 'com' field did go through a filter to get into qanon's 'text' field. Not saying there's anything wrong with it, but I have the originals for all except 461. I'm playing with python code to save all the json files locally for all relevant boards on 8ch, and can parse & search for keywords or q's trips, etc. and display in a browser. Since it's all stored locally, a search doesn't have to hit the net. It's not perfect by any means, but if I can clean it up a bit, I'll share if there's interest.
8ch original 460:
{
"com": "<p class=\"body-line ltr \">Updated Tripcode.</p><p class=\"body-line ltr \">Q</p>",
"name": "Q ",
"locked": 0,
"sticky": 0,
"time": 1521824977,
"cyclical": "0",
"bumplocked": "0",
"last_modified": 1521824977,
"no": 460,
"resto": 452,
"trip": "!xowAT4Z3VQ"
}
qanon.pub copy of 460:
{
"email": null,
"id": "460",
"images": [],
"link": "https:// 8ch.net/greatawakening/res/452.html#460",
"name": "Q",
"source": "8chan_greatawakening",
"subject": null,
"text": "Updated Tripcode.\nQ\n",
"threadId": "452",
"timestamp": 1521824977,
"trip": "!xowAT4Z3VQ",
"userId": null
},
Need 8ch original 461 please if someone has it.
Try this
http:// qanon.news/api/posts/962/
or this
http:// qanon.news/api/bread/452/?xml=true
add/remove the xml from the query string to get XML
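If anyone wants to hit it from code instead of a browser, a minimal example against the two endpoint shapes just shown (anything beyond those URLs is guesswork on my part):

import requests

post = requests.get("http://qanon.news/api/posts/962/").json()            # single post by id
bread = requests.get("http://qanon.news/api/bread/452/").json()           # whole bread by thread id
xml = requests.get("http://qanon.news/api/bread/452/?xml=true").text      # same data as XML
print(type(post), type(bread), len(xml))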
>http:// qanon.news/api/posts/962/
Perfect - thanks! The xml flag showed me the exact pieces I was missing to rebuild my entry. Much appreciated and quite a handy api…
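On the save-the-json-locally idea a few posts up: here's a bare-bones sketch of it in Python, in case it helps anyone compare notes. The requests usage and directory layout are mine; the trip list matches the ConfirmedTrips posted earlier in the thread.

import json, os, requests

TRIPS = {"!ITPb.qbhqo", "!UW.yye1fxo", "!xowAT4Z3VQ"}
BASE = "https://8ch.net/greatawakening/"

def save_thread(thread_no, outdir="json/greatawakening"):
    os.makedirs(outdir, exist_ok=True)
    data = requests.get(BASE + "res/%d.json" % thread_no).json()
    with open(os.path.join(outdir, "%d.json" % thread_no), "w") as f:
        json.dump(data, f)              # keep the original json on disk for offline search
    return data

def q_posts(data):
    return [p for p in data["posts"] if p.get("trip") in TRIPS]

print([p["no"] for p in q_posts(save_thread(452))])   # thread 452 is the one holding posts 460/461 above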
Forgive me, lads. Where do i go for info on Valerie Jarrett? Got lost.
Linode is telling me that I can get block storage, but only by migrating my VM to the Fremont data center, getting a new IP address (SSL cert. etc.)
Crickets from followers whom I've asked to donate funds for the added expenses.
>803653
The search engine on the Research Tool works well. Try searching VJ, too.
http:// q-questions.info/research-tool.php
Do you have to have block data storage? Any other options?
Glad it was useful. The posts API numbering is a bit squirrelly till you get used to it. The post ID is the post count starting from 1 on Nov 28 2017.
So finding out it was post #692 I had to view all posts (or posts.html, or any of the QCodeFag installs) to get the post#. The bread# is in the post as threadId
What other options?!?!
"Archive EVERYTHING OFF LINE"
"MAKE IT SEARCHABLE"
If I don't have enough storage, where am I going to store the data?
If you don't know about IT, you should not be in this thread.
Fuck off nigger. I'm just trying to come up with other ideas. I've been in IT for over 2 decades. I know exactly what's going on.
My point was, hosting can be found on the cheap if you look around. Not sure you NEED SSD. What you need is storage space. I was thinking drop the SSD for cheaper storage.
Whatever, it's your problem. You seem to be capable of figuring it out.
I'm sure you're really good at building PCs for your aunt Martha. Plugging in the cards and loading and reloading Windows.
Hurts me to my core!
No I write the software. Whatever. Deal with your own problem - it doesn't concern me.
I decided to prune. Too much garbage is in the chans.
Raw video.
related
The research tool is undergoing extensive overhaul at the moment.
I think I finally managed to squash the date bug in the QPosts/DJTweets.
I took the 60min delta restriction off - and it's applying each day's tweets on each Q post to allow you to see all the deltas.
http:// qanon.news/smashposts.html
Sometimes I get lucky.
The Research Tool is back up with a more concise data set. Much will be added in the next several days as I return to development of the contexting feature.
http:// q-questions.info/research-tool.php
I've been thinking about a timeline for the past few days. I looked into different solutions and found timelineJS that works pretty good.
I managed to wrangle the API data into a timeline. I'm planning on adding in the DJTwitter data and ideally news/notable events.
Once I can get the twitter data in I'll cut it loose. I was hoping to figure out an easy way to get other data into the timeline. News/notables. Any ideas? QTMergefag? You got good news/events?
Here's what it looks like:
If I can figure out how to import the twitter posts WITH the images, getting a timeline in Research Tool system is a no brainer. The JSON someone directed me to does not appear to have the image links, unfortunately. The images are essential to some of the tweets.
The plan is for POTUS to have his own post type. Then all one need do is select both q-post and potus posts in the same search, and they'll be displayed properly interleaved.
I think the timelineJS handles that for you if you add it as media/tweet to each slide.
OK. I guess I'll have to take another look at it. Right now, though, my priority is to get the contexting feature working. I do wish there was a way to safely hand off some of the work on the site I'm putting together. There's so much to do! But I have no idea how to know to trust someone. Clowns will be clowns.
Agree. I've been thinking about trying to work out a way of collab. I'm sure I could come up with a way to prove we're who we each say we are. Unless the clowns are here building community Q research tools…
Check it out. I got the twitter working.
What I can say about this timeline is that there's a lot of events on it. There's Q posts batched down to days across 98 days. Add in the Tweets and there's a lot going on. Each day/tweet == a slide. It's definitely more than it was probably designed to handle. It takes a minute to make sense of the somewhat sizable JSON data and then render the display.
FOK delete this please
>It takes a minute to make sense of the somewhat sizable JSON data and then render the display.
I just have to make sense of a few of them. Then I can come up with an algorithm to parse them into the structures I already have developed. My site is quite capable of handling multiple sources (chan, tweet, other posts) if I can do that much.
{"scale": "human","events": [{ "start_date":{"year":"2017","month":"10","day":"28","hour":"0","minute":"0","second":"0","millisecond":"0","display_date":"2017-10-28 00:00:00Z"}, "end_date":{"year":"2017","month":"10","day":"28","hour":"0","minute":"0","second":"0","millisecond":"0","display_date":"2017-10-28 00:00:00Z"}, "text":{ "headline":"HRC extradition...", "text":"The body text...<hr/>" }, "media":null,"group":"QAnon Posts", "display_date":"Saturday, October 28, 2017","background":null,"autolink":true,"unique_id":"1dba35d4-46ac-4c5f-94d7-1e6b0f53ad4d" }, { "start_date":{"year":"2017","month":"10","day":"28","hour":"21","minute":"9","second":"0","millisecond":"0","display_date":"2017-10-28 21:09:00Z"}, "end_date":{"year":"2017","month":"10","day":"28","hour":"21","minute":"9","second":"0","millisecond":"0", "display_date":"2017-10-28 21:09:00Z"}, "text":{"headline":"Δ 25","text":"2017-10-28 21:09:00Z<br/>@realDonaldTrump<br/>After strict consultation with General Kelly..."}, "media": {"url":"https:// twitter.com/realDonaldTrump/status/924382514613030912","caption":null,"credit":null,"thumbnail":null,"alt":null,"title":null,"link":null,"link_target":"_new"}, "group":"realDonaldTrump","display_date":null,"background":null,"autolink":true,"unique_id":null }]}
What is this from?
I decided to see if I could find some hidden Q:
SELECT * FROM chan_posts
WHERE post_type != "q-post"
  AND author_hash IN (SELECT author_hash FROM chan_posts WHERE post_type = "q-post")
This statement found 718 of them I hadn't identified.
Figured out quickly that I had to add a couple additional checks.
SELECT * FROM chan_posts
WHERE post_type != "q-post"
  AND author_hash IS NOT NULL
  AND LENGTH(author_hash) > 0
  AND author_hash IN (SELECT author_hash FROM chan_posts WHERE post_type = "q-post")
Still came up with 120. Perhaps a couple of them were misidentified as Q in the first place?
Interdasting. I'd have to see a list.
http:// qanon.news/timeline.html
http:// qanon.news/Help/Api/GET-api-timeline
At least one of the ones I had identified as Q, maybe 2, had been mislabeled. Plus, a known impostor got tagged as Q. Not sure how that happened. I'll have to fix it. But a few other interesting ones popped up.
I made one of my editor features available to you so that you can have a look. On the search form, go to the bottom and check "In processing list:" box. Leave the rest blank. And you can have a look for yourself.
http:// q-questions.info/research-tool.php
>q-questions.info/research-tool.php
Yeah it looks like there are some missed posts in there for sure. You may have done some good work on that one.
ID:RrydKbi3 in post 147683274 definitely looks misidentified to me.
I have to go to an appointment now. But I'll fix the known misses this afternoon, and I can tag you to have another look, if you like.
>RrydKbi3
Agree. That's the only post with that ID. Nothing ties it back to Q.
Same for Anonymous ID:9o5YWnk7 2017-10-29 19:35:45 Thread.147146601 Post.147171101
NP
There are more of them than you're seeing, actually. I've just discovered that I'm still having issues with the import/export process. Not everything I've set to export is getting up there. I'll have to run that to ground tonight and fix it. I thought I had that worked out already. When I was still thread-based, everything I was exporting from the home machine was importing just fine into the online machine. But I guess I changed the logic somehow when I went from thread-based to post-based. (It can sometimes actually be more difficult to change a program than to write it for the first time.) At the moment, some of what I've said below may not be visible. But sometime tonight, it should all be there.
>ID:RrydKbi3
He responded to Q. That's it.
>ID:9o5YWnk7
Yes, he was just responding to a Q post. He isn't Q. I'm not sure at this point if it's an approved post or just another response. I'll have to take another look at it when I'm working with the maps again. For now, I've demoted him to a regular anon. And I'm removing the posts that weren't marked as Q from the online database, at least for now.
I'm not sure what to think about ID:afa548. I had the impression that a hash was good for only one thread. And yet he shows up as a hidden Q in one thread and with his trip in another. Same with ID:4533cb, but there was only one unmarked post for that one.
ID:5ace4f has only one marked post. It looks like he got marked as Q because he's on a map, but I'm not sure it's really him. The other posts look interesting and possibly relevant, though. Still, it's possible the one should be marked as approved rather than as a Q post.
ID:071c71 got reused on a different board. On one, with a non-Q trip. But it's interesting who that ended up being.
ID:23de7f looks entirely legit and probably could be marked.
With ID:d5784a, you can see what I can do to imposter clowns.
ID:1beb61 and ID:26682f look like imposters, but I haven't heard one way or the other on those. Maybe I need to put date ranges on my trip test?
Some hashes are particularly colorful in their unmarked posts. Not sure what the story is there. But I do believe the one that's marked is legit. Maybe another should be marked, but I certainly wouldn't mark all of them.
They're all up there now. There was something weird about two of the records. In one case, someone did something to a file name that I didn't know could even be done! I'll just have to edit that in the database, and it should be fine if it ever needs to be exported again. And I don't know what the deal is with the other. I pasted the SQL statement for it directly, and it worked just fine. Slash issues, maybe.
>ID:afa548
I've been looking further at this. I don't think the one in cbts is Q. The hash just happens to be the same. But there's something like a 3 month gap in when the hash was used.
>ID:1beb61
Fairly certain he's fake, and I'm marking him as so.
A couple of the ones I'd incorrectly marked as Q had the same post number as an actual Q post on another board. So I suppose it's easy to see how that could have happened. Now that Q uses a trip, that's much less likely to happen. They're probably relics of a time when I hadn't developed my toolset so well yet. Now, it's easier because the editor mode of the research tool has drop boxes and the like for making those kinds of changes. When I had to use phpMyAdmin, I was somewhat flying blind because I couldn't see as well what was really in a post. Now I can see the posts in their final form when I'm making changes like that.
Not constructive newgro. You would do well to realize the calibre of techs that browse chans and do what you can to get their help rather than get salty.
By the way, this has not been an idle exercise. One of the things I'll be doing is keeping track (programmatically, in the data) of context chains that reach back to Q. So it's important that Q be properly identified. To that end, finding hidden Q has been valuable. Not only did I find Q gems I had not recognized (probably because they're on maps I haven't worked through yet), but I was able to recognize some misidentified posts as well as get the imposters properly marked. So it's all good.
Qanon.news bumped from the bread anons.
Somebody said that the site was serving malware and it was taken out of the bread. I posted in the meta thread to have BV check it out and he gave it the OK. I spent an hr or so trying to get it back in. No luck.
I'm not interested in begging - but I do want people to use what I've been working on. I'll see what happens after dinner I guess.
Meh. I've been thinking about it. After reading all about codefags problems, bandwidth issues, SSL certs, all the other qcodeClones… It may be better to just stay quiet and let people use it when needed. I'm a little disappointed that it was so easy to get something removed from the bread.
What I've been working on is really more backend style anyways. I have been thinking about a few different things though.
I saw one anon post something about there needing to be an RSS feed for QPosts. I think that should be pretty easy to provide. If I get some time I may whoop something out.
I've been playing around with the timelineJS. I worked it up where you can select a specific timeline. Qposts. DJTweets. Etc. Q has mentioned timelines a few times and I've been looking around trying to find threads that were timeline based. No real luck so far. Anyways, I was thinking about working on some different timelines.
I've been starting to wonder if moving to a database solution rather than file based json is going to be worthwhile. Better speed probably? Built in caching? Do I want that for an api? What does everybody else think?
>966124
Even in here.
We must be over the target.
I built a new API to get a specific post from a specific bread. Maybe I'll get it uploaded today.
Looks like ~/api/bread/981411/981444/
to get >>981444
Researching an RSS/ATOM feed. That looks to be low hanging fruit.
Very afraid they are!
Goodbye trolls and shills!
I was contacted by a guy that says he's from this site http:// we-go-all.com
Looks to have a Qcodefag repo installed on a page. He wanted to know if he could help at all and I asked him if he had posted anything in here.
He doesn't know anything of the codefags thread. He's interested in access to the api. I don't wanna dox the guy, but this name matches a guy that works for Representative Jared Polis (D-CO 2nd)
5th-term Democrat from Colorado.
http:// www.congress.org/congressorg/mlm/congressorg/bio/staff/?id=61715
Probably nothing. The QCodeFag stuff is open, 8ch is open. Nothing to worry about anons?
All updated
New Qanon ATOM feed:
I managed to throw together an ATOM feed here:
http:// qanon.news/feed
or
http:// qanon.news/feed?rss=true
It returns the last 50 Q posts. It's a work in progress. I can include referred posts, images etc.
New Timeline api: Timeline api that shows Qposts and DJTweets. I also set up an Obama timeline that another anon pointed out. I'm planning on adding more to it and some other timelines I'm thinking about. You can see a few at http:// qanon.news/timeline.html
With the contexting problem I'm working on, I'm thinking I need to also write a "mea culpa" system for when a bread (or bread-like) post is not properly identified. It would go in and recalculate context when the status of a post changes. This way, I don't have to be so concerned whether bread posts are properly identified at the outset, and I can just get on with it.
Hey CodeStuds - I was wondering if there's a quick way to find all posts in the qresearch thread by 'U'? I've run across a couple and I've really enjoyed them. I am not trying to take anything away from 'Q' drops - I owe 'Q' a ton for waking me up. But the 'U' drops always ease my mind and make things clearer for me… not sure if they're benefiting anyone else in the same way or not. I wanted to grab them all if I can find them. Thanks Patriots. #WWG1WGA
There could have been before I took everything down and then uploaded only select posts. But to do what you want, I still would have had to set up a whole word search mode, and I didn't have that yet. I abridged my public database due to obnoxious content by shills. I don't want to republish that stuff. I won't put the whole thing back up unless I have a way for visitors to flag posts for review, and right now I don't.
If all you want to search are Q posts, you could try using my system. The way it's set up, you can't force it to look at the first or last letter of the post. But you could try doing searches with a space before and after or a period before and after and other such things to force a word search. The REGEX of the LIKE statement is not strong enough for much else than this.
http:// q-questions.info/research-tool.php
Thank both of you Patriots for your responses. I will do some regexing around. Be safe anons. Love you guys.
Anything is possible.
U is the username? Any other identifying info? Do you know of a post you could point us towards?
Let me clarify something. Is U a name? Is that the whole name? If I've made it public, you can search that on my site already. If not, I can take a peek and possibly make that public for you if it isn't shill stuff.
I found 1 in qresearch and 3 in 4chan. I've added them to my public database for you. I don't see any real revelations in them, though. Enjoy!
http:// q-questions.info/research-tool.php
I've discovered the machine broke for a few hours on March 27-28 and I'm missing some json. Am I the only one saving off json or does some other codefag have some to send my way?
PageScraper to json?
Nevermind. The JSON I needed had slid off the catalog but was still avail. Thanks CM!
It probably should be part of my work eventually, but it isn't yet. It's taken some time to get to that contexting feature. I'm finalizing the algorithm now.
A context chain will begin with a post that has been listed in a bread post and go backward through the links. These are either from the top of the thread or later where the next baker is being told what to include.
Links will also be followed backward from Q posts.
Contexts will stop at bread posts and not include them. (The intent is for context chains to stick to one topic as much as possible.)
When a post that includes a map is encountered, the posts from the map will not be included in the context chain, but links from the text of the post will be included. (Same reason as above: Maps include multiple topics.)
I will keep track of context chains that include Q posts. These can be shown with the Q posts. To minimize confusion, I will be displaying the context chains in separate bordered DIVs with a display/hide button. Not sure yet which to make default. Probably the hidden state to minimize clutter. I MIGHT parse the description of the leading post of the chain from the bread post into it. In the hidden state, this would be all that would show.
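For anyone building something similar, a stripped-down sketch of that backward walk. It assumes the posts are already in memory with their outgoing >>links split into text links and map-listed links, and the is_bread/is_q flags stand in for however your own schema marks those.

def context_chain(start_id, posts):
    # posts[pid] -> {"text_links": [...], "map_links": [...], "is_bread": bool, "is_q": bool}
    chain, stack, seen = [], [start_id], set()
    while stack:
        pid = stack.pop()
        if pid in seen or pid not in posts:
            continue
        seen.add(pid)
        post = posts[pid]
        if post["is_bread"]:
            continue                         # chains stop at bread posts and don't include them
        chain.append(pid)
        stack.extend(post["text_links"])     # always follow links found in the post text
        # post["map_links"] is deliberately not followed: maps span many topics
    return chain

def chains_touching_q(start_ids, posts):
    # start_ids: posts listed in bread posts, plus Q posts themselves
    return [c for c in (context_chain(i, posts) for i in start_ids)
            if any(posts[p]["is_q"] for p in c)]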
Interesting that you should post that anon, I've been thinking the same thing. We need a crawler. Sounds like a great idea. A better way of visualizing the context thread would be great. Ya know I've been reading about Google. PageRank. How that was designed in the beginning. Links you come across that have a lot of responses can be either good or bad on 8ch.
With the new breadID/postID feature I rolled out you could find anything you were missing for sure.
So you think your initial targets are just the baker posts and the other posts that are deemed notable?
I've been wondering if we could use a hashtag internally for our own benefit. #notable. That kind of thing.
It sounds like an interesting project. If I can help at all let me know.
Hmmm… I wasn't thinking about doing an indented method of arranging things. Should I be?
And if I knew how to pass off some of the work to others, I'd do it. It's a LOT for one person to do. One of the reasons it's been taking so long is because I'm still adding to the database, etc. If I had left the entire database online, perhaps? But the clowns were shitting things up with some truly raunchy stuff, and I didn't want to republish that. Truth is, though, that I've done some preliminary with this already. It shouldn't take me long to finish the coding. But it will take a while to do the following:
-properly identify the bread and map posts on some 2300 threads (Yes, this matters.)
-identify the posts listed on the map posts
Even so, I've identified enough bread and maps already that some interesting stuff should begin floating to the surface. That's part of what is taking so long. The code is pointless without at least some of that done.
I'm eager to get to work on this. I lost an entire evening/night due to a power outage.
I think what I'm getting at is that it's difficult to share the work without putting the entire database back online again. If I do that, I may have to do the following:
-Buy dedicated hosting. If I do that, I'll be putting a donation button on the site for sure. So far, this has all been from my own time (a LOT of it) and resources.
-Including a "report this post" button. Like I said, I don't want to be republishing truly obnoxious unrelated stuff. But it's all on me right now, and I can only do so much by myself. I'd have to let the community help me control that content.
But you know, really, the way I'm doing things now has a good side to it. There's a lot of fluff in the complete database. The way I'm doing things now eliminates a lot of that. You're going to get the dense info rich posts this way.
The program can now save data for the contexting. Tomorrow (aka, after I wake up in the morning), I will be working on display.
Nice.
I bought hosting from Godaddy. Unlimited bandwidth and 100GB storage. Economy plan on sale was $12/year. I think I even got another domain with that deal for $1/year that I'm not even using.
Yet.
I hear ya on time. My shit got bumped from the bread because 1 anon got confused about a malware notification. I've got 2 pretty solid months of time in on what I've been doing and got taken out by a single post.
As we reach more and more of the masses, the information is going to appear on more sites that show ads/donations. It's a way of paying for the infrastructure needed to provide the service. I see nothing wrong with it.
The Research Tool can now display context the way I described above EXCEPT that I have not built in a show/hide button yet.
Right now, you have available to you SOME context that I calculated during my initial work putting together a contexting feature a couple months ago. I have more up through the date on the first image, but I have to get an export/import process built to get it into the online database. Since I have an export/import system for the posts, it shouldn't take much to make a modified version for the contexts.
My current task list is:
- Build the export/import process for the contexts.
- Get the contexts calculated for the 2300 or so more threads that I currently have. This could take several days.
- Then perhaps I'll look at getting that show/hide button in there. I might do it in the middle of working on getting the contexts calculated if I get bored of that.
- After that, including POTUS tweets is next.
http:// q-questions.info/research-tool.php
Wow anon. It's coming together. It will be great to see it once finished.
Interesting what you are doing with the links. I think some of my pages are linking like the qcodefag sites. The RSS I hooked up to go back into the api. Think I should change that?
That's up to you and how you want to display your data. It might be cool to automate at least the downloading of new threads for what I'm doing. But to get the contexting right, I have to go through what comes in anyway. As mentioned before, not properly identifying bread and maps can overload the context chains.
Contexting functionality is complete. The export/import process to make calculated contexts is complete.
I asked Anons on the general thread whether it is more important to calculate the contexts or to include POTUS tweets. The ones who responded want the tweets, so that's next.
I think the message 'we are being set up' is in response to the SC failing to pass the IMMIGRATION BILL. Also POTUS tweeted CA will not be accepting national guard to border.
https:// www.denverpost.com/2018/04/17/neil-gorsuch-immigration-law-vote/
Nice.
Let me know if you want to hit the smash data. I'll set you up.
I rejiggered the links on some of my pages. It was set up like the qcodefag sites where each post contained a link back here. I changed that to a self referencing link instead. I decided to not be the cause of any more traffic back here.
Statistics show that the people coming to my site are primarily interested in the presentation pages - not the API. I think what I've decided to do is remove all references to the API - but still provide it. Default to the posts page or something. I got a few ideas.
That would be great! A JSON source would speed that process along greatly.
Look at the SmashPosts
http:// qanon.news/Help/
Tell me what ya want and I'll see what I can work out.
I'm looking at the help page, but I don't understand how to actually make the call to your API. It looks like the call I would want to use is
GET api/timeline/{2}
but I don't see how to actually implement it.
I think I figured out what I need to do. I just need to add the path to the URL.
There are only 32 tweets in the JSON I got with api/timeline/2. There must have been more than that since October. Maybe I need a different call?
My search-fu is nonexistent & need help for something current:
Somewhere within the past few weeks, someone posted a manual for Mueller firing protests. Didn't see it as a notable in BoB. Think it might have been pinched from ShariaBlue or the like. Thought it was a pdf, but not sure. Couple of screengrabs posted. In any case, it was a pretty thorough treatise on how to organize the march, chants, dealing with infiltrators (:D) and other stuff.
A couple of posts appeared today where one city (Pittsburgh) police department announced they were preparing for "semi-spontaneous" Mueller firing riots. That means they have that manual (but aren't disclosing it).
If we can find that manual again and post it all over that town's (and other) social media, it will awaken many to the fact that most of these protests are always preplanned.
Anyway, sorry for the hijack, but appreciate any help.
I just can't find it.
Do you recall any words that would have been in the post?
Someone found the site where it was from in the current bread:
https:// act.moveon.org/event/mueller-firing-rapid-response-events/search/
I could have sworn it was the whole "rapid events response manual" from MoveOn or allied organizations as a standalone doc.
"Mueller" would return too many hits.
Maybe Mueller + fired + protest(s) or something. Maybe add "plans" or "manual"
This is why their Mueller firing riots plan should get out into the public domain before any protests occur:
http:// pittsburgh.cbslocal.com/2018/04/18/robert-mueller-pittsburgh-police-prepare-riots-if-trump-fires/
Normies will realize how scripted all these protest marches are.
On phone so can't grab the whole site.
TY for any help!
Try these. I was not able to find any PDF files posted recently about this.
>>208025
>>209411
>>211959
>>214550
>>674819
>>725107 (Unfortunately, I was not able to find the fullsize image of this one. Put a request on the general thread as well as Lost & Found if you really want it. Ask them to put it in the Lost & Found thread so you can find it if you look later.)
Looks like those got deleted. I'll make them available on the Research Tool for you.
http:// q-questions.info/research-tool.php
Look in a few hours. I have to run to an appointment right now.
The Smash API will give you more data you want.
You probably don't want the timeline stuff just yet. Unless you want to just stick with the default q/DJT timeline. Just do a get on the timeline API. The timeline API filters out all the tweets to just show the 5, 10, 15… deltas.
Yeah, gotta add the full path to the URL. If you are hitting it programmatically I gotta give you access. Domain you would be calling it from?
I believe you are talking about this website:
https:// act.moveon.org/survey/resistance-recess-host-materials
Yes, that's most of the material, but it had been put into a document (pdf or doc, I think) and indexed.
Much easier to forward a doc to which notes can be added than point normies to a site which is hostile-owned. That document (in whatever format) contained all the articles on that page and more. Was well done by somebody.
Somebody found it!
www.scribd.com/document/375930782/Nobody-is-Above-the-Law-Mueller-Firing-Rapid-Response-Moveonorg-Protest-Guide
This is the basic protest manual all Soros/ SEIU and associated groups use.
Great doc to hand out to redpill people. Leave the redline the Mueller title and add the protest du jour.
Found in this bread:
https:// 8ch.net/qresearch/res/1092389.html#1092719
I'm glad you found it. I'm beginning to think that I need to get the entire database back up there again, even if I have to not upload the images. We've had a couple of search requests like this for which I've had the data. In this case, the original posts had been removed, which would explain why he couldn't find it.
Since it was in a Scribd doc, not sure it would have been found anyway, unless someone commented on it using key words.
I couldn't even hazard a guess as to what percentage of information here since Day 1 is critical vs. otherwise. Throughout it all, it's painting pretty clear pictures of the players & their proclivities, even if we haven't found a smoking gun yet.
In any case, thanks again for everyone's efforts.
One of the posts I found would have led you there.
There is an awful lot of absolute garbage posts out there, to be sure. And now that there are over 1.5 million of them, there is no way one person can censor out the stuff that absolutely should not be republished. I don't like the idea of putting all of the unreviewed stuff up there without their images, either, since a lot of the intel is in those images. It's a tough call. Even though I do have a content warning on the research page, I have concerns about the legal side of just blindly posting some of those images. I most definitely couldn't do it without a reporting feature.
I was hoping for the complete set of Trump tweets since Q showed up in late October. Do any of your API calls provide that?
Well, you can get all those from the TrumpTwitterArchive. What I did was group them into the days that Q posted, and then only calculate the ones that DJT tweeted after Q posted.
If you check the API you can see the data, or look at http:// qanon.news/smashposts.html to see it more visually.
You are on it!
Pain having to get the 2017 and then the 2018 from TrumpTwitterArchive but… it's the only way.
I guess I could suck all that in and then offer it as an API… just raw twitter data.
The only thing I found with the Twitter data is that there's a 9-day gap in January at the beginning of 2018. I've been fighting off a compulsion to archive those (manually) to make it complete.
CSS: you can just use the Twitter magic.
https:// dev.twitter.com/web/overview
On the smash page I just make links and decorate with the bird and tweet. The timeline does it automagically.
Here's a question for you.
How hard would it be for you to remove all the inline style you have on q-questions.info/research-tool.php ?
Do you know about jqueryui themeroller?
Conjigger your jQuery UI theme and then download the custom CSS like magic.
I've pulled it into the same database that contains the chan posts. I don't want to make too many exceptions to how I do things. That makes it more difficult to keep track of what is for what.
Eventually, the cream of the project will be going into that WP site that's at the front of the URL. That will take care of appearances nicely.
And all of the text is back up there now. People won't have to request searches anymore.
Kinda wondering about that myself.
IMO, he was talking specifically about the NP/NK video. Many have archived that offline.
On one hand, I'm archiving online - but that makes it easier for others to archive.
On the other hand - I'm archiving at home too.
The online stuff I'm doing has no bearing on my archives. I put it online so others could use it.
Hardcopies. Print out things. Copy files to USB/CD/DVD. Place inside of safe or better yet faraday cage. Use means that are hard to destroy and items that are not online and can be erased via virus or EMP. It's not just for you, but for the Country. Think that everyone is an off line version of "the cloud" but with a hard copy.
That was the reason I finally ended up putting it online as well. It seemed a shame to keep that functionality to myself. I reworked a few things to make it better in a multiuser environment. It ended up being better for myself as well.
USB sticks are flash memory and can lose data if left unpowered for long periods. CDs and DVDs are probably your safer choices.
Don't mind me. I'm just trying to find some missing posts.
>>309741, >>309240, >>209205
Anyplace we can download your stash?
Well, now I feel stupid. I just realized there's an "Expand all images" link in the lower right of the page header. Had I realized this, I would not have lost so many full-size images. One save could have been done in thumbnail mode, and another in full-size mode, and I would have had everything on the page.
The ctrl-S method of saving a page will NOT automatically pick up the full-size images when in thumbnail mode. If the page is expanded when the save is done, then you'll get the full-size images (but not the thumbnails, though this is a minor issue).
So here's my suggestion to get the best archiving:
You can save once or twice, but one of the saves should be in expanded mode. If you want the thumbnail mode as well, then that's a separate save.
(All of the official archives so far have been in thumbnail mode.)
Huh. Anon never showed up to drop his image link on us?
Not yet, apparently. It's a lot of files. It's going to take some time to upload them all, possibly a few days. Even my thumbnail image set takes a long time to upload.
Still nada.
You are so wrong faggot
She only asks that her comments page is respected, that's who she deems as her people. Do some research before you fuck up your own opinions next time
Unfortunately, we're anonymous here. I have no idea how we can even check on something like this.
Anon asked about the JSON for all Q posts.
The API is still there, I just removed all the links.
http://qanon.news/Help/
Anon asked for a word count in all Q posts and I did it really quick. Just gonna drop this here.
Here's the results, sorted by occurrences.
https://pastebin.com/e1u1jxR2
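For anyone who wants to reproduce that kind of count locally, here's a rough C# sketch. It's not necessarily how the anon above did it: the file name is a placeholder, and it assumes a flat JSON array of 8ch-style posts with an HTML "com" field.

```csharp
// Rough word count over Q post JSON, assuming 8ch-style posts with an HTML "com" field.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;
using System.Text.RegularExpressions;

class WordCount
{
    static void Main()
    {
        // Placeholder path - point this at your own export of Q posts.
        var json = File.ReadAllText("qposts.json");
        using var doc = JsonDocument.Parse(json);

        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (var post in doc.RootElement.EnumerateArray())
        {
            if (!post.TryGetProperty("com", out var com)) continue;

            // Strip HTML tags, then split on anything that isn't a letter, digit, or apostrophe.
            var text = Regex.Replace(com.GetString() ?? "", "<[^>]+>", " ");
            foreach (var word in Regex.Split(text, @"[^A-Za-z0-9']+"))
            {
                if (word.Length < 2) continue;
                counts[word] = counts.TryGetValue(word, out var n) ? n + 1 : 1;
            }
        }

        // Top 100, sorted by occurrences like the pastebin above.
        foreach (var kv in counts.OrderByDescending(kv => kv.Value).Take(100))
            Console.WriteLine($"{kv.Value}\t{kv.Key}");
    }
}
```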
Why Did George Bush Buy Nearly 300,000 acres in Paraguay?
I finally found one by grepping around in the json files. I'm searching for more, but here's an example.
https:// 8ch.net/qresearch/res/932740.html#933285
Coincidence?
https://www.bostonglobe.com/news/nation/2018/05/04/kerry-quietly-seeking-salvage-iran-deal-helped-craft/2fTkGON7xvaNbO0YbHECUL/story.html
Sorry for popping in on you about this, but the anon I was speaking with about "time stamps and markers" said to come here. They have info for me so I can start working on it.
There was a thread dedicated to this but appears to be missing now or I keep missing it.
I'm a little behind in my archives at the moment, but that should be remedied by this evening. (I was busy working on my tools.) My site is a good one for looking at tweets vs. Q posts because I can show them on the same timeline.
http://q-questions.info/research-tool.php
That is fine and thank you. Can you tell me what is Q's marker that I should look for?
Q's trip codes are listed at the top of the general threads on this board. On my site, known Q posts are shown in green.
For some reason, posts on Q's new board aren't saving properly (except the first post on the thread). But I'll have everything else up there shortly.
Got the - http://q-questions.info/research-tool.php
Got - https://qanon.pub/
and another that has actual screenshots of Q's posts
Been doing some research on Q's markers and need to clarify a few things, then I will start on "wind the clock".
Much appreciate all the help - I need to do a better job at bookmarking important info on decoding Q.
Found this in QMap PDF thread. Going to try to locate Anon because no sense in duplicating work.
"Anonymous 01/28/18 (Sun) 10:17:16 ID 3c320a No.190706
Thank you for all this hard work. One thing that I think would really help: if the book could include all the Q posts with time stamps, including the early posts before the trip code. This needs to be searchable by time stamps (EST). The time stamps and dates could be either with each post or in the front with reference to the post. I find that the time stamps are important to first identify markers. I currently have to jump from time stamp search to marker search, and most databases I use are not complete with the latest posts. This would be extremely helpful. Thank you Anon. Truly a Patriot! One other thing: some links to Q posts are 404 when the link is clicked, so I can't find the related time stamp."
You may be looking for the Delta thread.
I think I told you to come here. I did some delta work here
http://qanon.news/smashposts.html
That delta is only considering the difference between a Q post and a DJT tweet. There is nothing in there to account for DJT corrections of deltas between tweets.
The deltas you see on the smashpost page are spread out across the Q posts - since there is a different delta for each.
IE: Q posts at 12:00p
DJT tweets at 12:10p [10] delta
Q posts at 12:05p <- this would also mean the DJT tweet at 12:10p is also a [5] delta.
I did it like that because I wasn't sure of the meaning of all deltas. Is a [29] valid? Only on the 5's? Good luck anon! Let us know what we can do to help.
I think most everything we've been doing here has all been resolved to either GMT or Zulu time. 8ch JSON comes in GMT/Zulu. The TrumpTwitterArchive comes in GMT/Zulu.
Correct me if I'm wrong codefags.
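For anyone lining the two sources up, here's a small C# sketch of normalizing both to UTC. Treating the 8ch "time" field as a Unix epoch in seconds and the TrumpTwitterArchive "created_at" strings as Twitter's usual layout are assumptions - double-check them against your own dumps.

```csharp
using System;
using System.Globalization;

static class TimeNormalize
{
    // 8ch/vichan JSON "time" appears to be a Unix timestamp in seconds (already UTC).
    public static DateTimeOffset FromChanTime(long unixSeconds) =>
        DateTimeOffset.FromUnixTimeSeconds(unixSeconds);

    // TrumpTwitterArchive created_at, e.g. "Wed Jan 03 13:37:00 +0000 2018" (assumed format).
    static readonly string[] TweetFormats =
    {
        "ddd MMM dd HH:mm:ss zzz yyyy",   // offset with colon, e.g. "+00:00"
        "ddd MMM dd HH:mm:ss +0000 yyyy", // literal "+0000" as Twitter emits it
    };

    public static DateTimeOffset FromTweetTime(string createdAt) =>
        DateTimeOffset.ParseExact(createdAt, TweetFormats, CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal).ToUniversalTime();
}
```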
So, just to confirm, the deltas are the marker?
And I will pop over there and see what is going on. Thank you.
The work being done via this thread is very important. Thank you Anons!!
Yes, my posts are saved in GMT also.
I've about got the issue with the new Q board taken care of. I just needed to tell my database about it. I'm getting those posts ready to upload now.
As it happens, I'm currently working on setting up special search types that you may find useful. One of those search types will show just the Q posts and POTUS tweets. That way, you won't have to think about the proper way to limit your searches if that is what you are after. Look for that in the next day or two. I'm still working on finalizing that feature.
The deltas are what helps you find the marker.
IE: Q posts something about "win"; 5 mins later DJT posts something about "Goodwin". That's a marker. (Just an example - I don't remember the deltas on the Goodwin marker.)
The Delta thread is where the work has been done on deltas. I'd like to see definitive documentation of confirmed markers.
It would not be difficult at all to include calculations in my displays. So let me double-check what the logic should be.
When displaying a Q post
- show delta since last Trump tweet.
When displaying a Trump tweet
- show delta since last Trump tweet
- show delta since last Q post
Is there anything else?
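A minimal sketch of that display logic, assuming the posts and tweets are already normalized to UTC and merged into one list (the Item record and field names here are placeholders, not anyone's actual schema):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

enum Source { QPost, TrumpTweet }

record Item(Source Source, DateTimeOffset Time, string Text);

static class Deltas
{
    // Walk the merged timeline once, remembering the most recent Q post and tweet,
    // and print the deltas described in the post above.
    public static void Show(IEnumerable<Item> items)
    {
        DateTimeOffset? lastQ = null, lastTweet = null;

        foreach (var item in items.OrderBy(i => i.Time))
        {
            if (item.Source == Source.QPost)
            {
                if (lastTweet.HasValue)
                    Console.WriteLine($"Q post [{(int)(item.Time - lastTweet.Value).TotalMinutes}] since last tweet");
                lastQ = item.Time;
            }
            else
            {
                if (lastTweet.HasValue)
                    Console.WriteLine($"Tweet  [{(int)(item.Time - lastTweet.Value).TotalMinutes}] since last tweet");
                if (lastQ.HasValue)
                    Console.WriteLine($"Tweet  [{(int)(item.Time - lastQ.Value).TotalMinutes}] since last Q post");
                lastTweet = item.Time;
            }
        }
    }
}
```

Because the list is walked in time order, negative deltas (a tweet before any Q post) simply never show up.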
I added delta calculations. Check it out and let me know if it's what you need.
http://q-questions.info/research-tool.php
>http://q-questions.info/research-tool.php
Looking good!
Checking the Show Delta box seemed to kill off any results for me tho. I'll try again later!
I believe you are nearly correct.
Once you have found a [marker], then the time between DJT tweets/Corrections appears to be the indicator of another marker. I don't think it goes back to a Q post delta.
Check the logic for the [5] & [1] markers.
I disregarded all negative deltas (any tweet BEFORE a Q drop). There's information there possibly - but it just introduced too much noise into the results.
I didn't even attempt to find the series. I'm simply showing the delta between the last of either. I suppose I could. So what is the pattern we are looking for?
Not sure how checking the box kills results. The logic of the check box is implemented in a way that does not affect the search logic. The deltas are calculated after the fact. The actual SQL statement that creates the results is at the top of the page. That doesn't change. Still, I've seen unusual and unexpected things before. What are you seeing that has you thinking there's a difference in the search results?
Never mind. The whole darn thing broke. I had overhauled the search logic to better support the data prep steps, and I guess stuff got messed up in the process. When I get done being disgusted about that, I'll fix it.
>>1341498
#68
I wonder if what Q is referring to is the Legal Status of the US., Macron brought a new contract to sign for Trump in conservatorship. That the old, legal status with the Rothschilds is no longer in effect due to bankruptcy.
I have no idea how defined() can return FALSE and yet the value be correctly set. Anyway, the program has been fixed, I believe.
How's it looking, you faggots? Things progressing as designed?
I got a nagging image issue sorted out. Now archiving Q images and reference images to my site. Just about ready to get back on the elasticsearch idea.
I have no idea what elasticsearch is. Would you care to explain?
I'm still working on things. At the moment, I'm adding some editing features to the research-tool version of things that I'd had in a prior tool. If you've noticed, older posts on my site have thumbnails and screenshots of links from the posts. And I've also started some work on the flagging feature so that I can feel better about putting all of the images back online rather than just selected ones.
Superfast multitenant full-text search for JSON. Clients in Java, C#, PHP, Python, Apache Groovy, Ruby, etc…
I think all I need to do is write something that will input all my JSON into my local elasticsearch instance and then all lights are go.
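One possible shape for that loader - a sketch, not the real thing. It assumes a folder of 8ch-style thread JSON (each file with a "posts" array), a local Elasticsearch on the default port, and an index name I made up; the post number is used as the document id.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

class EsLoader
{
    static async Task Main()
    {
        using var http = new HttpClient();

        // Assumed layout: a "threads" folder of 8ch-style thread JSON files.
        foreach (var file in Directory.EnumerateFiles("threads", "*.json"))
        {
            using var doc = JsonDocument.Parse(File.ReadAllText(file));
            foreach (var post in doc.RootElement.GetProperty("posts").EnumerateArray())
            {
                var id = post.GetProperty("no").GetInt64();   // post number as the doc id
                var body = new StringContent(post.GetRawText(), Encoding.UTF8, "application/json");

                // Standard document-index endpoint (ES 6.2+): PUT /<index>/_doc/<id>
                var resp = await http.PutAsync($"http://localhost:9200/breads/_doc/{id}", body);
                resp.EnsureSuccessStatusCode();
            }
        }
        Console.WriteLine("Done.");
    }
}
```

For 1.5M posts you'd want the _bulk endpoint instead of one PUT per post, but this shows the idea.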
I've heard whispers of Q + Team posting at set time intervals
Worthwhile to investigate
How to visualize?
Side by side threads (yes, whole threads!) + time lines (with colours)
Helluva job, no doubt, but who else to ask?
Saw this on Qresearch and didn't know if it had any merit. Leave it to the experts.
MMmmm Yes I have. I like the idea.
There are many services out there that will allow you to do this or you can create your own blockchain w/ ethereum.
Were you thinking just qposts or all qresearch?
Just got back from vacation and saw this. My site can display Q posts and Trump tweets in the same search results in time order.
http://q-questions.info/research-tool.php
I just got back from vacation, so my archive is over a week behind at the moment. I should be more current in a few hours.
Last night, anons were discussing the fact that the chans are part of history. Concern was expressed about the shill impacts on the boards and that perhaps there needed to be a cleaner view of it all. I suppose one answer could be to get back to the original purpose intended for the private version of my database, which is to identify what should be included in the blog that is in the root directory of the site. I haven't actually updated anything there in quite a while. Maybe it's time to get back to that.
Sounds like a good idea. Probably a lot of work!
>>1732671 (prev.)
I have heard estimates of Roth wealth in the area of 400-500 Trillion dollars.
It HAS been a lot of work and will continue to be. I've been coasting for a bit, just making sure that the general threads have been archived and made available. But there's also a lot of processing to do with the data if the ultimate goal is to be achieved as imagined. Kinda wish there was a way to safely share the work.
I heard that. I coasted about 2 weeks for the same reason. I've been working on tightening up the site and working on small bugs I've found.
I implemented a search for Q posts and am working on the big bread search now.
One of the tricky things about making my research tool available publicly is that the platforms are different. Different operating system, different database, and (apparently) different PHP. So I may have something working perfectly on my development machine, but I find there are problems when I try to share it. If the focus is to prepare the blog, which is an abridged view, then maybe I shouldn't sweat it if what I have shared publicly doesn't always work?
Ahh you've entered the big new world of internet interoperability! The internet is great, but it's not always the easiest to move data from platform to platform.
It's one of the reasons I stuck with straight JSON. Platform independent. Easily shared. Do you have the capability to transform into JSON/XML? What is your end goal? Share the database? Share the data? The app itself?
I probably do. It's all databased. I'd just have to put stuff into a structure and run an encode_json() on it. Not sure it would be all that easy to put the advanced features into the JSON, though. And it doesn't solve the problem of making something accessible for non-techie types, which is my goal.
Big bread search update.
http://YaCy.net โ distributed search engine โ has 17 hits for clean query {Q Clearance Patriot}. Kek.
But we should probably download the software and seed a lot moarโฆ
Certainly a page could be made for telling people how to search the original sources. Maybe it could include input fields as well to help people get it right. Unfortunately, original sources have been hacked from time to time, and some material is no longer available.
Hi there anons, just stumbled on this thread in my search for a collection of notables.
Anyone thought of putting them together in a thread/bread?
What were/would be pros/cons of doing such?
Data duplication, too big, etc.
Are there easy ways to make/view/access such collection?
My project has the capability of searching by threads.
As for breads, I'd been working toward that, and I'll probably get back to it soon. The challenge of breads is a bit tougher because they must be identified. So far, my own solution has been a combination of automation and inspection.
Hey TY for getting back to me about this anon.
Your solution is similar to mine, I see.
It is why I'd like to have a blogroll of exclusively notables, scraped from all breads by automation, so I could inspect the results.
I may be back to thinking Solr is not a good solution.
In trying to create a prebuilt index I've discovered that either
a) javascript just doesn't have enough memory to do it
b) javascript times out before it gets done and nothing happens.
I'm going to take a closer look at this
https://xapian.org/docs/bindings/csharp/
Moar testing today. Solr is NEVER going to work in this instance. I was hoping that I could just create an index on my dev machine and save that off and then use a worker process to add to the index. I've got one other idea to see if I can bend it to my will - but so far no workie. From what I can tell it's not possible to add to the index - it needs to be completely regenerated when you add a new document.
I don't understand how other people can add so many docs to the index and have it work. My tests were showing it to run for 12+ minutes just to generate an index and it never finished.
I'm open to new ideas if anybody has one.
The custom Google search I've got on there now does seem to work, but again it's not ideal. What I want is a list of POSTS that match, and the goog search seems to find the matches but only returns complete breads. You still have to CTRL-F to find what you were looking for within the bread.
I can put together a test harness for Solr if anybody wants to see if they can figure out a way to make it go.
My gut is telling me that my next best option is to move into a database in order to accomplish the bigbreadsearch. It's probably possible to do using a hosted elasticsearch solution (https://www.elastic.co/cloud @$50/mo)
On the other hand, I think that I can write an app to fill a database in a couple hours, and it would solve a few of the problems I was seeing in the other search tech. Most of the good search engines will plug into a database anyways so I think this is probably the direction I'm headed.
$50/month seems like a lot. My cost isn't nearly that much.
For elastic search?
>$50/month seems like a lot. My cost isn't nearly that much.
Derp. I clicked the wrong post.
I agree - which is why I haven't done anything on it. My hosting costs a bit more than that - ANNUALLY.
I feel like a DB is just going to be a better solution now. I'd hoped that I'd be able to just do everything with straight JSON - but alas! You cannot.
I guess I need to find the best search engine to plug a DB into now. I'm hoping to write the code to insert my existing data into the database today, write code to insert new data into the DB tomorrow.
That sounds like a software lease.
MySQL and MariaDB have a natural language full-text search capability built in. Have you checked to see if that meets your needs?
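For reference, roughly what that looks like from C# - a sketch with made-up table/column names (posts, com) using the MySQL Connector/NET client; the FULLTEXT index only has to be created once.

```csharp
using System;
using MySql.Data.MySqlClient;

class NaturalLanguageSearch
{
    static void Main(string[] args)
    {
        var term = args.Length > 0 ? args[0] : "Q Clearance Patriot";

        // Placeholder connection string and schema.
        using var conn = new MySqlConnection("Server=localhost;Database=breads;Uid=anon;Pwd=changeme;");
        conn.Open();

        // One-time setup (placeholder table/column names):
        //   ALTER TABLE posts ADD FULLTEXT INDEX ft_com (com);

        using var cmd = new MySqlCommand(
            @"SELECT no, com,
                     MATCH(com) AGAINST(@q IN NATURAL LANGUAGE MODE) AS score
              FROM posts
              WHERE MATCH(com) AGAINST(@q IN NATURAL LANGUAGE MODE)
              ORDER BY score DESC
              LIMIT 50", conn);
        cmd.Parameters.AddWithValue("@q", term);

        using var reader = cmd.ExecuteReader();
        while (reader.Read())
            Console.WriteLine($"{reader["no"]}  {reader["score"]}");
    }
}
```

Natural language mode ranks by relevance out of the box, so you get post-level hits instead of whole breads.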
https://resignation.info/scripts/8chan/search.php
Anons might find this useful.
Doesn't work so well with images but is good for keyword searches.
Yeah. It's a hosted service. It appears that deploying a custom elasticsearch is probably a large pain in the ass most folks don't want to deal with.
I have SQLServer currently set up and my host gives me a database so I'll probably go with that.
WTFERK? We already have like 3 bread searches now? Am I totally wasting my time?
Regardless…
Interesting! Tell me more about how you are doing this. Search seems to be pretty quick. Are you using a DB backend? Straight text search? Is all this in PHP?
I've managed to import all the JSON data I have on hand. 1,569,777 posts took 25 mins to import. My DB design is ultra simple: a single table that virtually matches the JSON data structure. There's no telling what the performance is going to be like just yet. Even getting a count takes 16 seconds. Ugh.
I'll run some simple tests later to see what I can figure out.
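Not necessarily how the anon above did the import, but a sketch of the same idea: flatten the 8ch-style JSON into a DataTable that mirrors the table and push it in with SqlBulkCopy. Table name, columns, connection string, and folder layout are all assumptions.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Text.Json;

class BulkImport
{
    static void Main()
    {
        // In-memory staging table mirroring the (hypothetical) dbo.Posts schema.
        var table = new DataTable();
        table.Columns.Add("no", typeof(long));
        table.Columns.Add("thread", typeof(long));
        table.Columns.Add("time", typeof(long));
        table.Columns.Add("com", typeof(string));

        // Assumed layout: a "threads" folder of 8ch-style thread JSON, each with a "posts" array.
        foreach (var file in Directory.EnumerateFiles("threads", "*.json"))
        {
            using var doc = JsonDocument.Parse(File.ReadAllText(file));
            var posts = doc.RootElement.GetProperty("posts");
            long threadNo = posts[0].GetProperty("no").GetInt64();   // OP number identifies the bread

            foreach (var p in posts.EnumerateArray())
            {
                table.Rows.Add(
                    p.GetProperty("no").GetInt64(),
                    threadNo,
                    p.GetProperty("time").GetInt64(),
                    p.TryGetProperty("com", out var com) ? com.GetString() : "");
            }
        }

        using var conn = new SqlConnection("Server=.;Database=breads;Integrated Security=true;");
        conn.Open();
        using var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.Posts", BatchSize = 10_000 };

        // Map by name so the destination column order doesn't matter.
        foreach (DataColumn c in table.Columns)
            bulk.ColumnMappings.Add(c.ColumnName, c.ColumnName);

        bulk.WriteToServer(table);
        Console.WriteLine($"Imported {table.Rows.Count:N0} rows.");
    }
}
```

On the slow COUNT: a clustered primary key on the post number (or any narrow index) usually helps a lot, since the engine can scan the smallest index instead of the whole heap.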