Anonymous ID: 85a843 Feb. 25, 2018, 2:11 p.m. No.495005   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0470 >>8755

Posts from #608

>>493751

>>494228

>>494299

>>494080

>>494202

>>494015

>>493888

>>493884

>>493882

>>493881

>>493886

>>493919

>>493939

>>493854

>>494489

>>494264

>>494457

>>494503

>>494460

>>494405

>>494451

>>494528

>>494471

>>493877

>>493898

>>493929

>>494283

>>494184

>>494548

Anonymous ID: 1eba68 Feb. 25, 2018, 3:56 p.m. No.495890   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>6431 >>4261 >>4511

One further comment from a heavy database user for what it's worth:

 

If we had a list of 'tags' that anons could enter as they post (in a specific format, e.g. preceded by **) covering topics that emerge (such as 'mkultra', 'bridge', etc. - topics brought up by Q), it would serve as a way to link crumbs by subject when searching through the data, and as an additional variable/filter it would streamline any search.

 

These would have to be moderated by BV/BO/Baker; it would not be any more work than creating the notable posts per bread, although it would be useful to find a way to insert them after the post identifier to create the link (i.e. >>xxxxxx within/2 **xxxxxx = TRUE).

 

Historical posts would be an issue, but perhaps there's some way of batch-adding tags at the data assimilation stage based on linked crumbs, as well as having specific 'meta-moderators' as we run searches etc.

 

However it might work, the principle is an easily assignable value to identify crumb subject based on Q's topics, so more information can be retrieved via regular search.
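A sketch of what pulling those **-prefixed tags out of a post body could look like - the tag format is the one proposed above, but the regex and function names are assumptions, not an agreed spec:

```python
import re

# Hypothetical tag convention from the post above: tokens prefixed with **.
TAG_RE = re.compile(r"\*\*([A-Za-z0-9_-]+)")

def extract_tags(body):
    """Return the lowercase set of **-prefixed tags found in a post body."""
    return {m.lower() for m in TAG_RE.findall(body)}

print(extract_tags(">>493751 digging on the **bridge and **mkultra angles"))
```

A moderator pass could then approve or reject the extracted tags before they're attached to a post record.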

Anonymous ID: 85a843 Feb. 25, 2018, 4:46 p.m. No.496431   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0470 >>9393 >>1143

>>495890

>One further comment

Well, you kind of lost me pretty quickly. Correct me if I'm wrong, but what you're suggesting is for posts going forward, and for posts that are Q-centric.

 

My goal is to see ALL of the board searchable, because much of the digging and research that was collected was not just related to items Q had in mind; many ancillary topics and pieces of evidence discovered would help build the "parallel construct".

That's what I see as important. Your thoughts?

Anonymous ID: 91c771 Feb. 25, 2018, 5:29 p.m. No.496858   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>6999

Might I suggest using SQLite as the DB for the "file format". It's a single-file DB that performs well for read-heavy workloads, so it's easy to distribute, it's easily usable from PHP and just about any other programming language, and it could easily be used to load a regular server-based DB (obv depending on how the schema is designed). Also multi-platform, so it should keep everybody happy irrespective of what OS you use.
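A sketch of the single-file idea using Python's built-in sqlite3 bindings; the table layout and file name are made up for illustration, and the same .sqlite file could be opened from PHP, C#, etc.:

```python
import os
import sqlite3
import tempfile

# One portable .sqlite file on disk doubles as the distributable "file format".
path = os.path.join(tempfile.gettempdir(), "qarchive.sqlite")
conn = sqlite3.connect(path)
conn.execute(
    "CREATE TABLE IF NOT EXISTS posts (no INTEGER PRIMARY KEY, board TEXT, body TEXT)"
)
conn.execute(
    "INSERT OR REPLACE INTO posts VALUES (495005, 'qresearch', 'Posts from #608')"
)
conn.commit()
rows = conn.execute("SELECT no, body FROM posts WHERE board = 'qresearch'").fetchall()
print(rows)
conn.close()
```

Shipping the archive is then just copying that one file around.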

Anonymous ID: dbb4a4 Feb. 25, 2018, 8:59 p.m. No.498755   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>9341

>>495005

I've been working on exactly this. I'm pulling the catalog from ga & qresearch, finding the research general threads and saving those with Q posts. It only goes back to about 2/15, when I turned the machine on. Currently working on getting old posts reconstructed. 99% sure I can grab all breads from 8ch.

 

A C# DLL to scrape Q posts and threads from 8ch. 8ch's JSON format, but it could be serialized XML I guess.

Anonymous ID: 4073db Feb. 25, 2018, 10:03 p.m. No.499327   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>1166 >>0068 >>2931

One of the anons from the other thread.

 

I'm not going to jump in too much if others are doing something where we'd end up stepping on Pepe's toes. Couple of thoughts though…

 

  • Full text searches/indexes can be garbage. Only good for reserved words

  • Most likely want this in a relational database. Creating the schema would consist of a really simple data model. Not even sure I would worry about normalizing it.

  • Messages (body) could be stored in a blob and be searched with wildcards.

  • Only looking at about 10-15 different queries tops. All simple SQL statements except for a couple that would need to be hierarchical… but still easy.

  • I was thinking to use MySQL or SQL Server for the DB Engine.

  • Biggest challenge will be the parsing of the threads and crumbs into a loadable format for the database. Once in a useful format… loading will be easy.

 

I see three main parts to this:

1) Getting the data so it can be loaded into a database.

2) Creating the database structures (really should be first)

3) Spitting out the queries, views, and sprocs that will be used. And putting a front end on it.

 

  • almost doxxed myself and put a link to my web site… so close :-)
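The bullet points above roughly translate to one flat, unnormalized table with the message body stored whole and searched by wildcard; a sketch in SQLite via Python (MySQL/SQL Server syntax would differ slightly, and the table, column names, and sample rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One flat table per the notes above: no normalization, body stored whole
# and wildcard-searched. The 10-15 queries stay simple with no joins.
conn.execute("""
    CREATE TABLE crumbs (
        post_no   INTEGER PRIMARY KEY,
        thread_no INTEGER,
        posted_at TEXT,
        body      TEXT   -- blob-ish full post text, searched with wildcards
    )
""")
conn.executemany("INSERT INTO crumbs VALUES (?, ?, ?, ?)", [
    (493884, 608, "2018-02-25", "digging on Loop Capital"),
    (494015, 608, "2018-02-25", "notable posts from this bread"),
])
hits = conn.execute(
    "SELECT post_no FROM crumbs WHERE body LIKE ?", ("%loop capital%",)
).fetchall()
print(hits)
```

SQLite's LIKE is case-insensitive for ASCII by default, which is usually what you want for this kind of digging.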

Anonymous ID: 4073db Feb. 25, 2018, 10:13 p.m. No.499393   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>496431

I think that is a great thought. May be a good idea to just get one set started and loaded then look into the other boards.

 

We (at least I) can't see a way to search the 'board' itself; the way to go is to create a copy of the data in the threads and make that searchable.

Anonymous ID: 7cdf2a Feb. 26, 2018, 7:36 a.m. No.501408   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>1440

A better way to do this is probably to put everything client side. Make a cross-platform application that just fetches new posts every so often. The browser is pretty perfect for this if we can set up a cross-platform local server to host a local copy of qcodefag and this board.

 

Search

https:// github.com/bvaughn/js-search

Pros: Fast enough once index is built.

Cons: Have to build index, or send it from a server, ipfs, blockchain, whatever.

UI

rip it from qcodefag for q posts

Add 8ch layout to some button on qcodefag or some tab

Display the posts as normal, but add search bar for board side of new client for qcodefag and this board.

 

Pic related, it's easy to get .json formatted threads.

 

inb4 we all pwn ourselves.
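js-search itself is a JavaScript library, but the tradeoff named above (Cons: build the index once; Pros: fast afterwards) can be sketched as a toy inverted index in a few lines of Python; the posts and tokenizer here are purely illustrative:

```python
from collections import defaultdict

# js-search style: pay once to build an in-memory inverted index,
# then every lookup is just a set intersection.
def build_index(posts):
    index = defaultdict(set)
    for no, text in posts.items():
        for token in text.lower().split():
            index[token].add(no)
    return index

def search(index, query):
    sets = [index.get(t.lower(), set()) for t in query.split()]
    return set.intersection(*sets) if sets else set()

posts = {
    495890: "tags for every crumb",
    496431: "make the whole board searchable",
}
index = build_index(posts)
print(search(index, "board searchable"))
```

The prebuilt index is exactly the thing you'd ship from a server, IPFS, or wherever, so clients skip the build step.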

Anonymous ID: 8143cc Feb. 26, 2018, 11:24 a.m. No.502752   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>502660

>>502453

>Research Threads Ideas. Please claim or create yours, let us know of more subject ideas

>Quest for Research Searchability Thread

>>494745 (You)

>Thanks for including my thread. I'm not a coder so I'm not much more than a cheerleader. I am quite sincere in my belief that we have to make it all searchable. I'm not naive enough to expect a volunteer to tackle it. Without doxxing themselves, can any anon point me to a service or company that could accomplish this Quest?

Anonymous ID: 8143cc Feb. 26, 2018, 11:27 a.m. No.502773   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>2802 >>2951 >>3685 >>2910

>>502660

 

>>494745 (You)

 

Thanks for including my thread. I'm not a coder so I'm not much more than a cheerleader. I am quite sincere in my belief that we have to make it all searchable. I'm not naive enough to expect a volunteer to tackle it. Without doxxing themselves, can any anon point me to a service or company that could accomplish this Quest?

Anonymous ID: 8dbdfa Feb. 26, 2018, 12:04 p.m. No.502951   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>502773

The omega interface for xapian could do most of the work: wget to grab site data, a JSON-to-CSV converter to translate, and you'd be ready to go. All free and open source software. Not quite plug and play, but a start.

 

Sample usage described:

xapian.org /docs/omega/overview.html

 

linode .com has very affordable linux shell hosting.
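The JSON-to-CSV step in the pipeline above might look like the following in Python; the field names (no/time/com) are assumed from the 8ch thread JSON discussed elsewhere in this thread, not taken from the omega docs, and the inline string stands in for a wget-fetched file:

```python
import csv
import io
import json

# Hypothetical json->csv step for feeding omega/xapian:
# flatten each post record to one CSV row.
thread_json = '{"posts": [{"no": 502773, "time": 1519673220, "com": "make it all searchable"}]}'
posts = json.loads(thread_json)["posts"]

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["no", "time", "com"])  # header row for the indexer
for p in posts:
    writer.writerow([p["no"], p["time"], p["com"]])
print(out.getvalue().strip())
```

From there, omega's scriptindex (or any other indexer) can eat the CSV.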

Anonymous ID: e4e2ff Feb. 26, 2018, 1:17 p.m. No.503420   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

Hey, just had a thought, but couldn't /ourguys/ look at all the bullets (like they have to at a crime scene)?

 

Wouldn't "LIPPEL", the one who had been "grazed", be able to connect the bullet to her DNA with whatever DNA would be on it?

 

What about the other student who was walking after being shot in both legs by 4 rounds?

 

Where is the DNA for that match to bullet?

 

What about the dead coach, the HERO we saw at the funeral? DNA match to that?

 

All this stuff might not help us ATM but IMO,

 

would it play a big hand in the game out there with Q and friends?

 

https:// www.youtube.com/watch?v=cPvYxTa1ph4

 

LIPPEL

 

Another thing: in this video she talks about the "BREAKING THE GLASS WITH SHOTS" part starting at 2:05, and then she says they arrived..

 

MAYBE AN HOUR AFTER

 

She then states at the end of the video that the "Swat team/Police" was on the ground; she said they were banging on the doors to let them in, and she "DIDN'T TRUST IT WAS THEM, BECAUSE THE POLICE WERE BANGING ON THE DOORS - NOBODY GOT UP"

 

==IF THE SHOOTER, DRESSED IN FULL METAL GARB, SHOT OUT HER WINDOW, SHE WOULD HAVE SEEN IT BEING POLICE, AND THEY WOULD HAVE SEEN HER.. AND THEN PROBABLY OPENED THE DOOR THRU THE BROKEN GLASS INSTEAD OF BANGING ON THE DOOR, WOULDN'T THEY?==

 

Whole story right here in the video proves it was either a False Flag or some type of fuckery

Anonymous ID: 1ca2ba Feb. 26, 2018, 2:06 p.m. No.503685   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>3938 >>9646

>>502773

It's not really a company, but wouldn't the person running the 4plebs archive be a good place to look for tools/code in this quest? Maybe he'd even be willing to assist? The site uses some fairly powerful search tools for certain halfchan boards already. I'm not a codefag so I apologize if this hasn't been suggested already.

 

https:// archive.4plebs.org/_/articles/faq/

Anonymous ID: 8143cc Feb. 26, 2018, 2:42 p.m. No.503938   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>503685

That's a good suggestion. Do you know off the top of your head how many archive sites have been used at 4ch and 8ch? I know about archive.is and 4plebs, but I've seen a lot more. I'm pretty sure the threads are scattered about the internet.

Anonymous ID: ec7b2a Feb. 26, 2018, 6:46 p.m. No.506133   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

In naval warfare, a "false flag" refers to an attack where a vessel flies a flag other than its true battle flag before engaging its enemy.

 

It is a trick, designed to deceive the enemy about the true nature and origin of an attack.

 

In the democratic era, where governments require at least a plausible pretext before sending their nation to war, it has been adapted as a psychological warfare tactic to deceive a government's own population into believing that an enemy nation has attacked them.

 

In the 1780s, Swedish King Gustav III was looking for a way to unite an increasingly divided nation and raise his own falling political fortunes.

 

Deciding that a war with Russia would be a sufficient distraction but lacking the political authority to send the nation to war unilaterally, he arranged for the head tailor of the Swedish Opera House to sew some Russian military uniforms.

 

Swedish troops were then dressed in the uniforms and sent to attack Sweden's own Finnish border post along the Russian border. The citizens in Stockholm, believing it to be a genuine Russian attack, were suitably outraged, and the Swedish-Russian War of 1788-1790 began.

 

In 1931 Japan was looking for a pretext to invade Manchuria. On September 18th of that year, a Lieutenant in the Imperial Japanese Army detonated a small amount of TNT along a Japanese-owned railway in the Manchurian city of Mukden.

 

The act was blamed on Chinese dissidents and used to justify the occupation of Manchuria just six months later. When the deception was later exposed, Japan was diplomatically shunned and forced to withdraw from the League of Nations.

 

In 1939 Heinrich Himmler masterminded a plan to convince the public that Germany was the victim of Polish aggression in order to justify the invasion of Poland.

 

It culminated in a staged attack on Sender Gleiwitz, a German radio station near the Polish border, using prisoners who were dressed up in Polish military uniforms, shot dead, and left at the station.

 

The Germans then broadcast an anti-German message in Polish from the station, pretended that it had come from a Polish military unit that had attacked Sender Gleiwitz, and presented the dead bodies as evidence of the attack. Hitler invaded Poland immediately thereafter, starting World War II.

 

http:// www.bibliotecapleyades.net/sociopolitica/sociopol_falseflag29.htm

 

For hundreds of links to FF research/reports, use this link below. You are welcome Anons..

 

http:// www.bibliotecapleyades.net/sociopolitica/sociopol_falseflag.htm

Anonymous ID: 8143cc Feb. 27, 2018, 5:47 a.m. No.509646   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0581 >>9706 >>4543 >>1528

>>503685

>person running the 4plebs archive be a good place to look for tools/code in this quest?

 

For the archives 4plebs uses sphinx search (http:// sphinxsearch.com/). It's used to index from the database and display search results very quickly.

Easy to implement but I would say it's worth it only if you have a lot of data to search through. For smaller datasets you can use full text search included in a regular database engine.

Also, you can take a look at other search engines like Solr (http:// lucene.apache.org/solr/) and Elasticsearch (https:// www.elastic.co/)
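For the smaller-dataset route ("full text search included in a regular database engine"), SQLite's built-in FTS5 module is one example of what that looks like; a sketch assuming the SQLite build includes FTS5 (most standard builds do), with made-up table and rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 is SQLite's built-in full-text engine; MySQL's FULLTEXT
# indexes fill the same role in that engine.
conn.execute("CREATE VIRTUAL TABLE posts USING fts5(no, body)")
conn.executemany("INSERT INTO posts VALUES (?, ?)", [
    ("509646", "4plebs uses sphinx search to index from the database"),
    ("503685", "archive tools for certain halfchan boards"),
])
# MATCH queries the full-text index instead of scanning with LIKE.
hits = conn.execute("SELECT no FROM posts WHERE posts MATCH 'sphinx'").fetchall()
print(hits)
```

Once the dataset outgrows this, a dedicated engine like Sphinx/Solr/Elasticsearch earns its setup cost.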

Anonymous ID: 9176e6 Feb. 28, 2018, 1:42 p.m. No.519706   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>509646

I also would second the idea of using Sphinx - it can be connected to a currently live database and given clues and sample queries to index all text in the DB - https:// www.percona.com/resources/technical-presentations/how-optimally-configure-sphinx-search-mysql-percona-live-mysql - and they have a video. I don't think there are any existing Docker setups to play with, although I imagine 8ch is quite custom anyway.

Anonymous ID: dbb4a4 Feb. 28, 2018, 2:24 p.m. No.520068   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0151

>>499327

>>499341

 

OK So I think I've got my chanscraper console app working as designed.

 

AFAIK, I've got all the QPosts in a single JSON, and I've got complete breads starting with Bread #364 (2018-02-07). That's as far back as I've been able to reach programmatically. Each complete bread has also been filtered into another JSON file containing just Q's posts.

 

The complete breads have only come from 8ch. The chanscraper is set up so it could scrape 4ch as well - assuming the JSON is still available.

 

I'm showing 825 QPosts - 1 more than qCodeFag, because I believe I have a deleted one. All counted, it's 210 threads.

 

I've done all the hard work of setting up the old catalog/threads/posts. It's set up so you can specify how far back to refresh (to cut down on unnecessary HTTP GETs). It reads in the existing data, finds the new threads to search for on 8ch/greatawakening and 8ch/qresearch, and then archives the threads/posts that Q has made locally.

 

If anybody wants the full Q archive as I have it now, here it is: 6mb https:// anonfile.com/H6B7G7dcbc/QJsonArchive.zip

 

I'm going to integrate the DJTweets + minute Deltas in this week.

 

Once I get this all cleaned up I'll cut it loose on Github if there are any C#codeFags interested.

 

My idea is to set up a simple HTML page using some JavaScript that can be run locally on a single user's machine or on a website. Since the scraper is a C# DLL, it could be set up to run as a timed service on a web server to keep a site up to date.

Anonymous ID: dbb4a4 Feb. 28, 2018, 2:41 p.m. No.520179   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0263

>>520151

Yeah, I knew about that - but I'd already been getting data from QCodeFag. The QCodeFag data was the basis for what I have now, since it had already done the scraping on 4ch. I wanted my own C# source going forward that I can use locally with my other C# code.

Anonymous ID: 7cdf2a Feb. 28, 2018, 2:42 p.m. No.520183   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0193

I don't know why nobody cares, but it's trivial to download threads, posts, and boards through the 8ch API in the form of JSON. There is no reason not to have the local client make the GET request every so often.
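A sketch of consuming that thread JSON, with an inline sample standing in for the actual GET against 8ch; the keys (no/time/com/trip) follow the JSON samples posted elsewhere in this thread, and the trip filter is just one plausible use:

```python
import json

# Inline stand-in for GET https:// 8ch.net/qresearch/res/<thread>.json.
raw = json.loads("""
{"posts": [
  {"no": 544266, "time": 1520140000, "com": "bread", "trip": null},
  {"no": 544985, "time": 1520140647, "com": "drop", "trip": "!UW.yye1fxo"}
]}
""")

# Keep only tripcoded posts - the usual filter when pulling Q drops
# out of a full bread.
q_posts = [p for p in raw["posts"] if p.get("trip")]
print([p["no"] for p in q_posts])
```

A local client would just repeat this fetch-and-filter on a timer and merge new posts into its archive.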

Anonymous ID: 7cdf2a Feb. 28, 2018, 2:45 p.m. No.520201   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0237

>>520193

I meant the hypothetical client with which people are searching this board and staying updated. That client should search for posts all on its own instead of relying on a single source of truth. (Saves infrastructure money too.)

Anonymous ID: 07564d March 1, 2018, 1:23 a.m. No.524371   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>9568

>>494816

Ctrl-F is only good on a single thread. What researchers really need is a way to access the entire set of Q posts. I've built that capability for myself locally by parsing Ctrl-S saves of the threads into a MySQL database and running SQL searches on that.

 

The best bet for a public search engine might be to cooperate with CodeMonkey to build a search capability for the boards. We'd still have to search each board separately, but at least we would be able to search each board all at once.

 

I've got most of the Q related posts from 4chan and 8ch locally, but I'm not sure how to make that much data publicly available. I've also got a fair amount of PHP code that I use to access and organize the raw data. I'd be willing to share it if I had a place to do it.

Anonymous ID: 07564d March 1, 2018, 1:27 a.m. No.524384   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>493751

Actually, I have had chan posts show up in browser search engine results, but I know this isn't what you're after. I've built the type of search capability you're after on my local machine. It still takes a lot of time to work with the posts, but it's definitely easier than anything we can do at the original sources.

Anonymous ID: 07564d March 1, 2018, 1:29 a.m. No.524395   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>494228

Timeline is easily generated when one has the ability to set the post time to something other than the current time. That's how I create timeline posts in my own database.

Anonymous ID: 07564d March 1, 2018, 1:36 a.m. No.524431   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0474

>>494015

I definitely appreciate that notable posts are included in the breads on each thread. It isn't necessary for them to be updated on each and every thread, but it is good to have them updated at least every day. Right now, I'm using the links in the bread posts to mark posts in my private database as being included in the bread. Given the volume of posts that I am now working with, these links make it easier to determine what is important to include.

Anonymous ID: 07564d March 1, 2018, 1:51 a.m. No.524489   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>494471

If you're lucky, you can find your archives on archive.org. That site saves pages with nearly the same HTML elements as the original page. Archive.is converts the classes used on the original page into their style equivalents, making for a parsing nightmare. When I've had to use the archive.is version of a page, it was a painstaking process to recreate the single post that I went to the archive to get. My parser code can parse the archive.org archives the same as the original, so it's easy to get all posts from that archive.

Anonymous ID: 07564d March 1, 2018, 1:58 a.m. No.524511   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>495890

I've got tagging fields included in my data structure. Getting them filled is an entirely different matter. I've got a tool to help do it more efficiently than phpMyAdmin, but it needs a bit of work to make it just a bit more efficient so that more than one post can be updated in one pass.

Anonymous ID: 7cdf2a March 1, 2018, 6:32 a.m. No.525489   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0978

Literally just build an index of tags and use fucking client side javascript. Muh databases. Jesus Christ people. You could even let users share tags.

 

First one with a completed project wins. Peace.

Anonymous ID: dbb4a4 March 1, 2018, 12:11 p.m. No.527353   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>520237

Here's the archive again + a handy HTML page that you can use in your browser to view the archives locally. Works fine in Chrome and IE. Readme included.

 

https:// anonfile.com/W3f5H6d8be/QJSONArchive.zip

Anonymous ID: 6db142 March 1, 2018, 8:26 p.m. No.530474   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0920

>>524431

>links

If it's server based, something like http:// arborjs.org/ for data visualization/selection would then fix the mapping problem and help a lot with the search problem.

 

>links

There's also the Open Visual Thesaurus project (www.chuongduong.net/thinkmap/) to maybe grab code/ideas from, to view the search data and whatever else might be related as you walk through the data.

Anonymous ID: dbb4a4 March 1, 2018, 9:04 p.m. No.530677   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>530283

Here's a newer local archive that moves there.

I've put in some UI enhancements to the JSON Viewer HTML page. Seems to be working well. With a slight mod it could work with local JSON from any QCodeFag site or even direct from 8ch.

https:// anonfile.com/5ercH3d9ba/QJSONArchive_v1.zip

 

Getting the posts into 2 columns should be no problem. It's getting a reliable news source that is gonna cause you trouble.

 

I was planning on putting 3 columns in the viewer: QPosts, Times, DJTweets. In doing all this I've discovered a few things about 8ch/halfchan. The post IDs are not guaranteed unique. The best candidate for a unique key is time, and even there I've found 2 posts that dropped at the same timestamp. Thematically I've been trying to key everything to time. [qposts, tweets, news]

Anonymous ID: 07564d March 1, 2018, 10:22 p.m. No.530920   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0994

>>530474

yEd can produce maps from spreadsheet data. That's one I know of.

https:// www.yworks.com/products/yed

Maybe when I get further along in the post tagging work, it'll be useful.

 

I'm toying with the idea of making my raw data available in some way, possibly in read only format. (Clowns can be destructive.)

Anonymous ID: 07564d March 1, 2018, 10:42 p.m. No.530978   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>525489

I would like to be able to allow others to tag posts in my database. Any ideas on how to keep clowns from shitting everything up?

 

My initial thought is to allow suggesting of tags (similar to comment logic in the blog) with moderators making final decisions on them.

Anonymous ID: 07564d March 1, 2018, 10:47 p.m. No.530994   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>8787

>>530920

One of the big reasons I hesitate in making the entire database available is because a few of the images uploaded into the threads are obscene. I have no desire to inadvertently publish that sort of thing. When I'm publishing a reviewed subset, the chances of that happening are low.

Anonymous ID: 00c874 March 2, 2018, 8:59 a.m. No.532931   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>4887

>>499327

 

Is there an interest in pre-selecting data?

For example, select only posts identified on "notable posts" lists from each general #.

Plus, of course, any to-from links on those selected, chained.

Just asking. DB size, usability, etc.

Or is the data set also for researching shill/troll themes? It is a possibility, so I ask.

Anonymous ID: 07564d March 2, 2018, 3:32 p.m. No.534887   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>4908

>>532931

I'm working on that right now. I got started on this a week or so ago. I wrote a bit of code to travel back through context links, too. Hopefully, in a few days, I'll be able to repost my blog with the results of this work.

Anonymous ID: 07564d March 2, 2018, 3:37 p.m. No.534908   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>534887

A bit more to say about that:

It's my plan to include items that reach back to a Q post together with that Q post when I can identify such. I may do a little pruning to keep the length of the entry associated with a Q post under control. Not everything in a context thread is important, after all. I may have to think about further arranging of things. I'll think more about that as I get closer to a point where I can implement such a strategy.

Anonymous ID: 8143cc March 3, 2018, 6:25 a.m. No.538775   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>524543

>There are over 750,000 total posts from both sites and all boards containing Q related posts.

Yes, and that's the challenge: making the Q "related" posts searchable. Making Q's posts searchable is arguably not as important as making the body of related posts searchable, as that's where the body of knowledge resides.

"You have more than you know" taunts us with its promise. We get pointed to Loop Capital, or Stanislav Lunev. We need to be able to search/aggregate all of the posts over weeks/months with a single search. The dedicated research threads are great as far as they go but we're missing a lot of other info posted as snippets.

Anonymous ID: 8143cc March 3, 2018, 6:29 a.m. No.538787   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>530994

>few of the images uploaded into the threads are obscene.

That does complicate it, but a lot of the information in the Q "related" posts is graphic. It seems culling of obscene content would need to be done manually to avoid throwing the baby out with the bathwater.

Anonymous ID: dbb4a4 March 3, 2018, 6:15 p.m. No.543389   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>5176

>>540555

What is everybody using as their sources for drops? 8ch? One of the QCode forks? Something else?

 

How do we verify that our collections are the same?

 

I've been adding a GUID for each post I scrape, just to give them all a unique value.

Anonymous ID: 98bd4e March 3, 2018, 9:32 p.m. No.545176   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>7789 >>7826

>>543389

qtmerge uses the raw JSON/HTML data where relevant from 8ch, 4plebs and trumptwitterarchive as its source data. It also merges in the JSON from qcodefag/qanonmap. It currently uses the host, board, post timestamp and post number to sync.

 

I like the idea of matching the GUIDs along with a post hash using some method we agree on.
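One concrete "method we agree on" could be hashing the stable identifying fields; a sketch where the field choice (host, board, post number, timestamp - the same sync keys named above) is just a suggestion:

```python
import hashlib

def post_hash(host, board, no, time):
    """Deterministic ID from host/board/post-number/timestamp, so two
    independently scraped archives can compare their collections by
    exchanging hashes instead of full posts."""
    key = f"{host}|{board}|{no}|{time}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()

h1 = post_hash("8ch.net", "qresearch", 544985, 1520140647)
h2 = post_hash("8ch.net", "qresearch", 544985, 1520140647)
print(h1 == h2, len(h1))
```

Hashing the composite key sidesteps the non-unique post ID problem mentioned earlier, and any scraper in any language can reproduce the same value.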

Anonymous ID: dbb4a4 March 4, 2018, 7:23 a.m. No.547826   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>8084

Phonefag right now.

>>545176

>>547789

There's an md5 field, as you know, in the 8ch JSON, but it wasn't in the data I got from QCodeFag, because he'd modified the .com to strip HTML into a .text field.

 

My chanscraper keeps the md5 and the .com and strips HTML into .text.
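Stripping the .com HTML down to a .text field can be done with the standard library; a sketch using sample markup in the style of the schema posted elsewhere in this thread (the class and helper names here are illustrative, not the actual ChanScraper code):

```python
from html.parser import HTMLParser

# Strip the 8ch "com" HTML down to plain text, roughly what a .text
# field would hold. Entity refs like &gt; are decoded automatically.
class ComStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "p":            # each body-line paragraph becomes a line
            self.parts.append("\n")

def com_to_text(com):
    s = ComStripper()
    s.feed(com)
    s.close()
    return "".join(s.parts)

com = ('<p class="body-line ltr ">&gt;&gt;548157</p>'
       '<p class="body-line ltr ">Also not a real Q post</p>')
print(com_to_text(com))
```

Keeping both .com and .text, as described above, means the original markup is never lost if the stripping rules change later.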

 

Any C#fags here?

I did set up a GitHub yesterday and pushed the chanscraper out. Gonna get the Twitter stuff mashed in over the next few days.

Anonymous ID: dbb4a4 March 4, 2018, 8:08 a.m. No.548084   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>8229

>>547826

Just ran my chanscraper again since apparently there were new posts last night as I was jacking around with Github.

 

I checked my posts with what's on qresearch and I think I'm good. Showing 839 total now.

New Q posts from 828 - 839.

 

I found a bug in the ChanScraper code too - a thing I've been working on that I forgot to remove. I'll push it out too and then link the GitHub.

Anonymous ID: dbb4a4 March 4, 2018, 8:26 a.m. No.548229   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>8433 >>9586

>>548084

Here's the link to my new GitHub

https:// github.com/QCodeFagNet/SFW.ChanScraper

 

If you are going to run the ChanScraper and then view the posts locally, when you open the QJSONViewer.html page, don't open the [json_allQPosts.json] file, open the newly generated [bin\json_allQPosts.json] file.

 

The machine needed me to include all the existing posts/work json. It's kind of clunky the way I'm doing it because I want to keep this updated with the latest posts/work json. But for a normal user everything is kept updated automagically in the bin\json folders. The project is set up to copy new files if newer - so everything should be kept in sync.

 

If you are planning on running this locally you'll need the .NET framework 4.5 at least. Probably better to go with 4.5.2

https:// www.microsoft.com/net/download/dotnet-framework-runtime/net452

Anonymous ID: dbb4a4 March 4, 2018, 12:29 p.m. No.550148   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0218 >>0251 >>1411

>>549377

Tedious Dayum. Think you could convert your full bread scrape into some json?

 

>>549586

Got a link to one of the JSON files?

 

>>548564

Here's a mini local JSON viewer as an HTML page + allQPosts.json. @225KB

 

Includes all QPosts up to 2018-03-04T11:29:14

 

https:// anonfile.com/06HeJbdeb6/Mini_Local_JSONViewer.zip

 

I was just thinking that what we really need, to start off with, is a single schema that we can all agree on. It will go a long way toward interoperability.

 

I'm going to run some tests on my local QCodeFag install and see if it will work off of the ChanScraper _allQPosts.json file. I think it should.

 

The JSONViewer could work with straight files from 8ch or 4ch with a single minor change I forgot to put in.

Anonymous ID: dbb4a4 March 4, 2018, 12:33 p.m. No.550167   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>549586

The ChanScraper includes the full JSON archive as of this morning. I haven't needed to go back to any archive.is HTML archives because I've been collecting breads locally since the beginning of Feb. All the Q posts before that I sourced from the QCodeFag forks.

Anonymous ID: dbb4a4 March 4, 2018, 12:42 p.m. No.550218   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>550148

Here's what the JSON schema I'm working with looks like.

 

[
  {
    "source": "qresearch",
    "threadId": 544266,
    "link": "https:// 8ch.net/qresearch/res/544266.html#544985",
    "imageLinks": [
      {
        "url": "https:// media.8ch.net/file_store/ffd6128f5949e4d4f6f3480236a63be002ffc5e59c0a31714360624d8ce45170.jpeg"
      },
      {
        "url": "https:// media.8ch.net/file_store/ffd6128f5949e4d4f6f3480236a63be002ffc5e59c0a31714360624d8ce45170.jpeg/B42CA278-6C32-4618-A856-0CB9B680CC38.jpeg"
      }
    ],
    "references": [
      {
        "source": "qresearch",
        "threadId": 0,
        "link": "https:// 8ch.net/qresearch/res/0.html#548166",
        "imageLinks": [],
        "references": [],
        "no": 548166,
        "uniqueId": "19294a1b-8cae-435d-9503-8eb70c573d6b",
        "_unixEpoch": "1970-01-01T00:00:00Z",
        "text": "\r\r>>548157\r\rAlso not a real Q post\r\rQ",
        "postDate": "2018-03-04T11:19:47",
        "time": 1520180387,
        "tn_h": 0,
        "tn_w": 0,
        "h": 0,
        "w": 0,
        "tim": null,
        "fsize": 0,
        "filename": null,
        "ext": null,
        "md5": null,
        "last_modified": 1520180387,
        "sub": null,
        "com": "<p class=\"body-line ltr \"><a onclick=\"highlightReply('548157', event);\" href=\"/qresearch/res/547414.html#548157\">&gt;&gt;548157</a></p><p class=\"body-line ltr \">Also not a real Q post</p><p class=\"body-line ltr \">Q</p>",
        "name": "Q ",
        "trip": "!UW.yye1fxo",
        "replies": 0
      }
    ],
    "no": 544985,
    "uniqueId": "35c759aa-4998-4009-83a7-2af1b3273f28",
    "_unixEpoch": "1970-01-01T00:00:00Z",
    "text": "\r\r>>548166\r\rNOT A REAL Q POST\r\rQ",
    "postDate": "2018-03-04T00:17:27",
    "time": 1520140647,
    "tn_h": 237,
    "tn_w": 255,
    "h": 1114,
    "w": 1200,
    "tim": "ffd6128f5949e4d4f6f3480236a63be002ffc5e59c0a31714360624d8ce45170",
    "fsize": 271479,
    "filename": "B42CA278-6C32-4618-A856-0CB9B680CC38",
    "ext": ".jpeg",
    "md5": "CbsCGk0pVEahunzSuV4LKw==",
    "last_modified": 1520140647,
    "sub": null,
    "com": "<p class=\"body-line ltr \"><a onclick=\"highlightReply('548166', event);\" href=\"/qresearch/res/547414.html#548166\">&gt;&gt;548166</a></p><p class=\"body-line ltr \">NOT A REAL Q POST.</p><p class=\"body-line ltr \">Q</p>",
    "name": "Q ",
    "trip": "!UW.yye1fxo",
    "replies": 0
  }
]

Anonymous ID: 98bd4e March 4, 2018, 12:48 p.m. No.550251   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>3109

>>550148

Let me clarify: HTML for just the archive pages (to capture threads not in catalog/threads.json), JSON for everything else.

 

I'm working on how to share it, currently unoptimized and around 6 GiB of data uncompressed.

Anonymous ID: dbb4a4 March 4, 2018, 6:30 p.m. No.553092   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>551411

Yeah, I've dug thru all the HTML looking for a reference to a JSON file and can't find one either. My guess is that once a thread drops off the main catalog, the JSON is no longer available. Too bad, because that's the meat in a simple format.

 

No, the machine is more of a scraper (grab data and save it) than a parser. It does parse the HTML out of the .com field into .text like QCodeFag does, though. It's not designed to read thru HTML pages to look for posts.

 

It has a local baseline archive of everything. It reads in that entire local archive and then figures out the JSON breads it needs to download from the 8ch/qresearch/catalog.json. Then it downloads all those new breads and resets itself, so you don't download everything every time - only the breads from the past [x] days.
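The refresh logic described above (read the local baseline, diff against catalog.json, fetch only new or recent breads) can be sketched like this; the function and field names and the [x]-day cutoff are illustrative, not the actual ChanScraper code:

```python
import time

def threads_to_fetch(catalog, archived_threads, refresh_days, now=None):
    """Return thread numbers to download: anything not yet archived,
    plus anything modified within the refresh window."""
    now = now or time.time()
    cutoff = now - refresh_days * 86400
    return [t["no"] for t in catalog
            if t["no"] not in archived_threads or t["last_modified"] >= cutoff]

# Toy catalog.json contents: one already-archived but recently bumped
# thread, and one brand-new thread.
catalog = [
    {"no": 544266, "last_modified": 1520300000},
    {"no": 553655, "last_modified": 1520400000},
]
print(threads_to_fetch(catalog, archived_threads={544266}, refresh_days=2,
                       now=1520450000))
```

Everything outside the window that's already archived is skipped, which is what keeps the unnecessary HTTP GETs down.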

Anonymous ID: dbb4a4 March 4, 2018, 8:30 p.m. No.554074   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

Here's an updated mini local JSON viewer as an HTML page + allQPosts.json. @225KB

I updated it so it works with the raw json from 8ch.

https:// 8ch.net/qresearch/res/553655.json

Could probably use an [ascending/descending] button butโ€ฆ

 

Includes all QPosts up to 2018-03-04T11:29:14

 

https:// anonfile.com/z4U1Jdd9b9/Mini_Local_JSONViewer.zip

 

If folks don't like a zip, it's only 2 files they can download the HTML file (ChanScraper) and the allQPosts.json (Console\bin) file on github https:// github.com/QCodeFagNet/SFW.ChanScraper

Anonymous ID: 07564d March 4, 2018, 9:10 p.m. No.554309   >>4376 >>9900

>>553109

My images are kept as separate files in original form. Only the links are kept in the database. Here's the record definition for MySQL:

 

CREATE TABLE chan_posts (

post_key varchar(31) NOT NULL COMMENT 'site/board#post (post is set to length 9 with . fill.)',

thread_key varchar(31) NOT NULL COMMENT 'site/board#thread (thread is set to length 9 with . fill.)',

post_site varchar(19) NOT NULL COMMENT 'For editor post, use editor. For spreadsheet, use sheet.',

post_board varchar(15) NOT NULL COMMENT 'For editor post, use editor. For spreadsheet, use sheet.',

post_thread_id int(10) UNSIGNED NOT NULL COMMENT 'For editor post, use 1. For spreadsheet, use row.',

post_id int(10) UNSIGNED NOT NULL COMMENT 'For editor post, use next available. For spreadsheet, use column converted to number.',

ghost int(10) UNSIGNED DEFAULT NULL,

post_url text,

local_thread_file text,

post_time datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,

post_title text CHARACTER SET utf8 COLLATE utf8_unicode_ci,

post_thread_title text CHARACTER SET utf8 COLLATE utf8_unicode_ci,

post_text text CHARACTER SET utf8 COLLATE utf8_unicode_ci,

prev_post_key varchar(31) DEFAULT NULL,

next_post_key varchar(31) DEFAULT NULL,

wp_post_id int(11) UNSIGNED DEFAULT NULL,

post_type set('editor','q-post','anon','approved','high','mid','low','irrelevant','timeline') NOT NULL DEFAULT 'anon',

flag_use_in_blog tinyint(1) NOT NULL DEFAULT '0',

flag_included_on_maps tinyint(1) NOT NULL DEFAULT '0',

flag_included_in_bread tinyint(1) DEFAULT NULL,

flag_bread_post tinyint(1) DEFAULT NULL,

flag_relevant_img tinyint(1) DEFAULT NULL,

flag_relevant_post tinyint(1) DEFAULT NULL,

author_name text,

author_trip text,

author_hash text,

author_type smallint(6) DEFAULT NULL,

img_files json DEFAULT NULL,

link_list json DEFAULT NULL,

video_list json DEFAULT NULL,

editor_notes text,

tags text,

people text,

places text,

organizations text,

signatures text,

event_date datetime DEFAULT NULL,

report_date datetime DEFAULT NULL,

timeline_title tinytext

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 

ALTER TABLE chan_posts

ADD PRIMARY KEY (post_key),

ADD KEY post_id (post_id),

ADD KEY thread_key (thread_key),

ADD KEY site_board (post_site,post_board);

 

I'm considering making the database publicly available. I need to figure out how much space it will take up and whether it will fit within my current hosting plan. At present, I have over 880,000 posts in the database. The size of the database file for just this table without the images is 1.1GB. There's another GB for images of Q posts, but this is only the fraction that is Q posts, bread posts, and for the context posts related to these.

Anonymous ID: dbb4a4 March 5, 2018, 3:27 p.m. No.560076   >>0415 >>4762

>>555095

holey phuck. 193 GB. That's for a full archive of all breads + images? My local scrape of Q breads and posts as text only comes in at 6mb. My local QCodeFag install with text + Q images is just under 100mb.

 

193GB is getting unmanageable.

Anonymous ID: 07564d March 5, 2018, 11:23 p.m. No.564862

I'm working on the export files now. I need to change the posts just a bit before I can make them public.

 

I promised that no links would go to 8ch and particularly qresearch, and also that I would redact mentions of them from the content. I already do this on my blog, but I simply broke the links rather than made them go somewhere else. To get the most out of the republishing of the posts, I need to convert the > and >>> links so that they link to posts stored on my own site. This is probably better anyway since many posts and threads are now missing from their original locations.

Anonymous ID: dbb4a4 March 6, 2018, 9:07 a.m. No.568187   >>8666 >>9170

>>564762

Yeah it's not totally unmanageable. It's more like moving a full grown oak tree. You can do it, but it's a huge pain in the ass. I was thinking more in terms of moving it around the internet or hosting. That's a pretty big db.

 

I rejiggered the ChanScraper to archive all the breads even if there isn't a Q post in that bread. It rendered 215 NEW complete breads and brought my JSON net filesize from 6MB to 200MB. Starts around "Q Research General #358".

 

That's with no images, just the raw JSON from 8ch. Each bread is around 700kb.

Anonymous ID: 98bd4e March 6, 2018, 9:44 a.m. No.568666   >>8861 >>9061

>>568187

I did some research on collecting the CBTS threads from 4chan/pol the other night and the results might be useful for others. They can be found at the bottom of the page here:

 

https:// anonsw.github.io/qtmerge/catalog.html

 

It's still a work in progress.

Anonymous ID: 07564d March 6, 2018, 10:31 a.m. No.569170   >>9329

>>568187

Yes, the breads are essential. I've got them going back all the way through 4chan stuff. The breads are how you connect in the answers. If you connect up the contexts, most of them link back to a Q post at some point. Then the context of that post that was linked into the bread can be associated with the Q post. That is what I was working on before I started looking at making my entire database available for research.

Anonymous ID: d6b0f8 March 6, 2018, 11:11 a.m. No.569596

>>494745

I have created a searchable application for /qresearch/.

 

The database is filling right now. I kept only the image attachments in order to save hard disk space.

 

At present 52,000 of the most recent posts on qresearch are loaded in the table with the attachments. We'll see how the storage works out.

 

I'll advise when anons can attempt to use the system.

Anonymous ID: 07564d March 6, 2018, 11:48 a.m. No.569900   >>0074

>>554309

I don't know if y'all noticed, but I've got several columns in my database that are not part of the original data. Some of these are tagging fields: tags, people, places, organizations, and signatures. It would be difficult to automate the filling of these fields, but I don't want to entirely open up editing of these fields to anons, either, due to the potential of clown interference. There's no way I can fill all of them in myself. My idea is to let tags be suggested, allow up-voting and down-voting, and settle on acceptance criteria before giving a tag a permanent place in the data record. Or maybe just leave them in that form with their ratings.

Anonymous ID: 98bd4e March 6, 2018, 12:10 p.m. No.570074   >>0566 >>0766 >>0809

>>569793

Excellent. Will that raw JSON data be in the DB as well?

 

>>569900

I did notice, those are great ideas. Can I suggest letting each user have their own copy/edits of the metadata? The user-specific data could then feedback into the system for suggestions to others, etc. But primarily it gives the user some way to control the interference/noise.

Anonymous ID: dbb4a4 March 6, 2018, 1:16 p.m. No.570604   >>5010

I've rejiggered the ChanScraper to produce TwitterSmashed json. It includes any DJT tweets within 60 mins of a Q post. Here's what [5], [8], and [10] deltas look like.

 

{

"DJTtwitterPosts": [

{

"accountId": "realDonaldTrump",

"accountName": "Donald J. Trump",

"tweetId": 944665687292817415,

"text": "How can FBI Deputy Director Andrew McCabe, the man in charge, along with leakinโ€™ James Comey, of the Phony Hillary Clinton investigation (including her 33,000 illegally deleted emails) be given $700,000 for wifeโ€™s campaign by Clinton Puppets during investigation?",

"delta": 5,

"link": "https:// twitter.com/realDonaldTrump/status/944665687292817415",

"uniqueId": "00e6951d-5f49-455b-bdd9-bda7f184d9c7",

"time": 1514060825,

"_unixEpoch": "1970-01-01T00:00:00Z",

"postDate": "2017-12-23T15:27:05"

},

{

"accountId": "realDonaldTrump",

"accountName": "Donald J. Trump",

"tweetId": 944666448185692166,

"text": "FBI Deputy Director Andrew McCabe is racing the clock to retire with full benefits. 90 days to go?!!!",

"delta": 8,

"link": "https:// twitter.com/realDonaldTrump/status/944666448185692166",

"uniqueId": "92fbb1a2-169e-412c-abba-6e441d3acbaa",

"time": 1514061006,

"_unixEpoch": "1970-01-01T00:00:00Z",

"postDate": "2017-12-23T15:30:06"

},

{

"accountId": "realDonaldTrump",

"accountName": "Donald J. Trump",

"tweetId": 944667102312566784,

"text": "Wow, โ€œFBI lawyer James Baker reassigned,โ€ according to @FoxNews.",

"delta": 10,

"link": "https:// twitter.com/realDonaldTrump/status/944667102312566784",

"uniqueId": "eabb202f-3b59-48c9-b282-f0110b8388a5",

"time": 1514061162,

"_unixEpoch": "1970-01-01T00:00:00Z",

"postDate": "2017-12-23T15:32:42"

}

],

"no": 158078,

"name": "Q",

"trip": "!UW.yye1fxo",

"sub": null,

"com": null,

"text": "SEARCH crumbs: [#2]\nWho is #2?\nNo deals.\nQ\n",

"tim": null,

"fsize": 0,

"filename": null,

"ext": null,

"tn_h": 0,

"tn_w": 0,

"h": 0,

"w": 0,

"replies": 0,

"md5": null,

"last_modified": 0,

"source": "8chan_cbts",

"threadId": 157461,

"link": "https:// 8ch.net/cbts/res/157461.html#158078",

"imageLinks": [],

"references": [],

"uniqueId": "e22306cc-2831-453a-ae1d-16e90aa23707",

"time": 1514060541,

"_unixEpoch": "1970-01-01T00:00:00Z",

"postDate": "2017-12-23T15:22:21"

}
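The delta field above works out to the tweet time minus the Q post time in whole minutes; a sketch of that matching, where the 60-minute window and the rounding rule are inferred from the sample data rather than taken from the TwitterSmash source:

```python
def tweet_deltas(q_time, tweet_times, window_min=60):
    """Pair a Q post (unix time) with tweets posted within window_min
    minutes after it; delta is the gap in whole minutes."""
    out = []
    for t in tweet_times:
        minutes = (t - q_time) / 60.0
        if 0 <= minutes <= window_min:
            out.append((t, round(minutes)))
    return out
```

Against the sample above (Q post time 1514060541), this reproduces the [5], [8], [10] deltas.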

Anonymous ID: 07564d March 6, 2018, 1:34 p.m. No.570766

>>570074

I could develop an export, I suppose. But that's low on my list of priorities at the moment. The data structure is above in the list. Minor alteration needed: My host does not support JSON fields. Substitute TEXT, and you should be good. If you want to write an exporter, I can review it and include it.

 

But I still don't have the data up there yet. I'm working on the alterations to the data needed to keep everything on site at the host.

Anonymous ID: 07564d March 6, 2018, 1:38 p.m. No.570809   >>0944

>>570074

I was thinking of attaching the IP address to each suggestion to keep the up-votes and down-votes honest. Is that enough? Or maybe even too much? The other thing I could do is perhaps tie in the WordPress login system, since it's there anyway. It might take a bit of time for me to figure out how to limit permissions.

Anonymous ID: 98bd4e March 6, 2018, 1:44 p.m. No.570874   >>0983

>>570660

Thanks, 4plebs is good for now, but a second witness is preferable. Started archiving Feb 15, but some old data was still available at the time.

 

For 8ch these are the oldest breads I have:

 

pol: 10509790 (2017-08-28)

cbts: 10 (2017-11-21)

thestorm: 1 (2018-01-31)

 

I don't have all breads after though, it is incomplete.

 

I've since stopped archiving pol/cbts/thestorm to save time/space.

Anonymous ID: 6a9543 March 8, 2018, 10:30 p.m. No.598094   >>2595

There's so much content being produced now that it should be compiled into a wiki in a dedicated thread. The other threads investigate and make the content; this one adds the best content into one big archive, updated in real-time ofc bc they never stop why should we pic related.

BUT WHY

To take Q's work to the next level we have to increase the public's basic awareness of the criminality being exposed, investigated, and terminated, by an order of magnitude. That order of magnitude is pretty normal people.

>be a normal person

>want to do the right thing but get a link to this Q thing and there's too much complex and """scary""" info what with muh job and family and everything else

>the big load of content is overwhelming and i don't know where to begin and have it be easy

<make 1 entry point to begin browsing the entire body of accepted content

<terse organization keeps it brief and saves the details for a leaf page a click away, as deep as is necessary

<keep source of body of accepted content continuously up to date

<using https for minimal integrity protection

>now i can begin a review of the evidence contained in the case file archive with a single click! jeff bozos eat your heart out nigger

>and look at short well-organized and sourced text, and pictures, and the odd video

>and easily get a run down on whatever topics i browse my way upon

>and now even though my eyes have been opened in a pretty dramatic way, it was easy to use and i know it'll be easy to share, to the topic level

Anonymous ID: 70e498 March 9, 2018, 10:31 a.m. No.602595   >>3402

>>598094

I hear you anon.

The key is the content. We have the ability to archive threads/qposts. Posts that Q references. Tweets. Known tripcodes/twitter accounts.

 

What is the source of all the evidence? The dedicated research threads? Notables? In order for it to be automagic, there needs to be a reliable single source here on 8ch. None of the codefag work I've seen reaches a level of what could be called AI - or the ability to discern which anon has posted a certifiable answer/evidence.

 

Non-automated means anonomated, but that causes its own set of issues.

 

I agree a wikipedia style thing would be good because it's familiar, but populating it with data may be an issue. Some of it's going to have to be entered in manually.

 

If all you are looking for is a location for an anon wiki, I think that's pretty easy.

Anonymous ID: 07564d March 9, 2018, 4 p.m. No.605608   >>5926

I'm stuck. I'm working on getting that database up for you, but I have to make some modifications to the post_text field so that those links don't come here to 8ch. (I promised that I wouldn't do that.) I'm trying to fix the post_text field so that the >links refer back into the database, but I'm not familiar enough with the DOMDocument and related classes in PHP. Are there any good tutorials out there on how to do advanced manipulation of HTML using these classes? The reference manual stuff just isn't doing it for me.

Anonymous ID: 07564d March 9, 2018, 4:26 p.m. No.605926

>>605608

I should clarify something. Not only am I going to make the existing links self-reference, but I'm also going to revive those dead >links and point them back into the database. I've got many of the deleted threads in my database, too, and I can make those available.

Anonymous ID: 70e498 March 10, 2018, 9:47 a.m. No.612945   >>3236

>>603568

Ya that's fine. I'm going to update that today to cover the latest.

 

I've been working on a new local viewer that uses the twitter smashed data. It shows the delta + alt text of the tweet + a link to the tweet. I've noticed that a lot of the image links I have are currently broken. I was thinking I'd just update those to point to one of the other QCodeFag branch archives rather than try and archive all the images as well.

 

Expect an update on GitHub later

Anonymous ID: 07564d March 10, 2018, 10:26 a.m. No.613641   >>3892

Good news! I've got the code working which makes the post links compliant and refer back into the database. Almost as soon as I posted the request, it came to me that I was making things more complicated than they needed to be and a better algorithm came to mind. The algorithm is so good that in cases where good posts didn't link in 8ch, they will be linked on my site. That includes links such as the one Q pasted into the middle of a word the other day or when they are consecutive with or without comma or white space. Anywhere there is a > followed by a bunch of digits, a link should be created. The only exception is where the post number of the link is greater than the post number of the current post. This type of error was encountered in early posts after the transition from one board to another. Anyway, I'm going to run a few more quick tests, and then I should be uploading to my host within a few hours. I still don't have code ready to search it, though.
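That linking rule might look roughly like the following; the URL pattern here is a placeholder for wherever the database viewer serves posts, not the real q-questions.info routing:

```python
import re

def relink(text, current_post_no, url_fmt="/post.php?no={}"):
    """Rewrite >12345 / >>12345 references as links back into the
    database, skipping forward references to newer post numbers
    (the board-transition noise described above)."""
    def repl(m):
        target = int(m.group(2))
        if target > current_post_no:   # forward link: leave it alone
            return m.group(0)
        return '<a href="{}">{}</a>'.format(url_fmt.format(target), m.group(0))
    # one or more '>' immediately followed by digits, even mid-word
    return re.sub(r"(>+)(\d+)", repl, text)
```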

Anonymous ID: 70e498 March 10, 2018, 10:49 a.m. No.613892   >>8146

>>613641

When you get that worked out make sure to let us know. I've been wondering about that myself. The early halfchan no's are pretty big. I've found some bugs in my code around there being multiple references per Q post. It does happen on occasion and my scraper isn't catching them all.

 

I've just uploaded a bunch of json data to the https:// github.com/QCodeFagNet/SFW.ChanScraper/tree/master/JSON GitHub. The json folder is what's generated when you run the ChanScraper, the smash folder when you run the TwitterSmash. Each of those folders has a Viewer.html file that can be used with just the _allQPosts.json or _allSmashPosts.json.

 

Like I said I need to clean up some dead image links for everything to be working right.

Anonymous ID: 07564d March 10, 2018, 7:15 p.m. No.622768   >>2903

>>620330

Part of making those offline archives is storing the items. Plus, don't assume any platform is forever. There are too many clowns out there who don't want anyone to see this stuff.

 

So now I've got a bunch of export files of my database ready to upload. Next challenge: automating the import on the host.

Anonymous ID: 07564d March 13, 2018, 12:41 a.m. No.649300   >>0810 >>3255

>>648594

It's up there. The paging isn't working yet, so don't anyone complain about that. I'll fix it in the morning. I also discovered that a key range of posts didn't import properly. I'll fix that in the morning, too. For now, I've set the posts per page to 2000, which may cause timeouts, but it will allow people to play with things a bit.

 

http:// q-questions.info/research-tool.php

Anonymous ID: 70e498 March 13, 2018, 6:27 a.m. No.650810   >>2644

>>648594

HOLEY FUCK YES.

 

This crosses all breads? If so then this is exactly what we need. I can help you with the SQL if you need it.

SELECT * FROM tbl LIMIT 5,10; # retrieves rows 6-15 (you should also specify an ORDER BY)

 

>>649300

How are you getting the breads? Maybe I can work out a way to get you those. Combine up somehow

Anonymous ID: 70e498 March 13, 2018, 7:58 a.m. No.651528   >>3255

>>509646

I've been thinking about this. Preliminary research shows that elasticsearch and lucene would probably be the best match for what we've got. There are a lot of tools that pile into elasticsearch. Any hostfags here with the ability to set up an elasticsearch node?

 

The data is big. Tons of images. A proper archive takes space. I'm holding @546 complete breads and with no images it's 250MB+. That's for like a month. By the end of the year the bread collection alone is going to be over 1.5GB.

 

The images I've got so far are around 100MB, but that's just from the Q posts - and even then I know I'm missing some.

 

Econ Godaddy hosting is like $45 a year. I'm thinking about just putting the chanscraper/twittersmash online, then write some simple apis. Get thread#, filteredThread, qpost# that kind of thing. Useful or no?

Anonymous ID: 07564d March 13, 2018, 9:51 a.m. No.652644   >>4567

>>650810

My algorithm for getting breads is this:

  1. Get the author_hash for the first post in a thread.

  2. Mark the first posts in the thread that match that author_hash until the author hash doesn't match.

 

If someone jumps in before the baker is done, oh well. But that shouldn't be much of a problem because the breads get repeated a lot. I can mark posts as bread later, if need be.
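Those two steps sketch out like this (the `author_hash` field name is an assumption about the post records, and the posts are assumed to be in thread order):

```python
def mark_bread_posts(posts):
    """Flag the baker's opening run: consecutive posts at the top of a
    thread sharing the first post's author hash. posts is a list of
    dicts in thread order."""
    if not posts:
        return []
    baker = posts[0]["author_hash"]      # step 1: hash of the first post
    flags, still_bread = [], True
    for p in posts:                      # step 2: mark until the hash changes
        if still_bread and p["author_hash"] == baker:
            flags.append(True)
        else:
            still_bread = False
            flags.append(False)
    return flags
```

A later baker post after someone jumps in stays unmarked, matching the "oh well" caveat above.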

Anonymous ID: 70e498 March 13, 2018, 1:07 p.m. No.654567   >>4852 >>4901

>>652644

Hmmโ€ฆ When I say bread I mean a full Q Research thread. Like this

https:// github.com/QCodeFagNet/SFW.ChanScraper/blob/master/JSON/json/8ch/archive/651280_archive.json

 

That's the straight bread/thread from 8ch. It includes all the responses whether the BV posted it or not.

 

I'm finding those by getting the full catalog from

https:// 8ch.net/qresearch/catalog.json, finding the breads/threads that have q research, q general etc in them, and then getting the json for that thread only from https:// 8ch.net/qresearch/res/651280.json
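Sketched in Python (the actual scraper is C#; the title regex is a guess at the filter, and the 8ch URLs are as given in the post):

```python
import json
import re
from urllib.request import urlopen

BOARD = "https://8ch.net/qresearch"
BREAD_RE = re.compile(r"q\s*research|q\s*general", re.IGNORECASE)

def fetch_breads():
    """Find the general threads in the catalog, then pull each
    thread's own JSON (the full bread, every response included)."""
    catalog = json.load(urlopen(BOARD + "/catalog.json"))
    breads = {}
    for page in catalog:
        for t in page.get("threads", []):
            title = (t.get("sub") or "") + " " + (t.get("com") or "")
            if BREAD_RE.search(title):
                url = "{}/res/{}.json".format(BOARD, t["no"])
                breads[t["no"]] = json.load(urlopen(url))
    return breads
```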

 

I think I see what you are doing - going thru and trying to mark the relevant posts?

Anonymous ID: 07564d March 13, 2018, 1:41 p.m. No.654852

>>654567

I haven't even looked at that.

 

Paging is fixed, plus I gave you a couple other search parameters.

 

I'm still working on the import issue, but I at least have put the posts I initially identified as missing up there.

Anonymous ID: 07564d March 13, 2018, 1:46 p.m. No.654901

>>654567

>I think I see what you are doing - going thru and trying to mark the relevant posts?

 

Yes. Most of it is done automatically. Since I save the marks in the post records, I can go back in there and adjust it, if necessary.

Anonymous ID: 5f9a22 March 14, 2018, 9:42 a.m. No.663255   >>4801

>>651528

>Useful or no?

I'm not the guy to ask. The discussions here went over my head immediately. Looks like there's some serious progress being made here:

>>648594

>>649300

 

One question I have for contributors here is when there is a consensus that you have created a viable search tool, how will you manage promulgation? Do it like a war room announcement on qresearch?

 

As many have noted, the search tool has to be hardened against tampering before release. Clowns/shills are devious and destructive.

Anonymous ID: 70e498 March 14, 2018, 12:36 p.m. No.664801   >>4904

>>663255

I agree on shill proofing.

 

I've been playing around with a webAPI. I've got it working nice with all the q posts, looking for a specific post# like #929, and posts on a day. Returns json or xml. This is the Crumb Archive.

 

My plan is to expand that so that the archived breads can be accessed as well - each as a single json file. This is the Bread Archive.

 

I'm going to set it up where it's an autonomous machine. It will scrape and archive automagically moving forward from the current baseline. No delete. No put. No fuckery.

 

I'm pretty sure it would work with the QCodeFag scraper repos.

 

The bread archive is pretty big. I'm sure there's no way I can archive images for all the breads. An image archive isn't what I've been focused on. The focus of this is only making the json/xml available from the chanscraper.

 

Once I can get the breads all up and being served automagically my plan is to set up an elasticsearch node and suck all the breads in.

 

I figure a year of godaddy hosting is currently $12 with unmetered bandwidth. I'll throw in.

Anonymous ID: 07564d March 14, 2018, 12:54 p.m. No.664960   >>4984

I'm beginning to wonder if I'm up against some kind of limit on my remote host. I just tried importing into it again, and I'm still missing some posts.

Remote host: 1,010,127 records

Local machine: 1,049,610 records

Anonymous ID: d6b0f8 March 14, 2018, 12:56 p.m. No.664975

I'm using the 8chan JSON API endpoints. I still need to pull from the archive.json file downloaded yesterday.

 

My server is on a linode so I have fast response time.

Anonymous ID: d6b0f8 March 14, 2018, 12:58 p.m. No.664990   >>6471

You can search the text in the posts with wildcards. Say you want all posts with the word BOOM. Just enter boom.

 

Say you want the posts from Q with his tripcode and "boom"

 

Put !UW.yye1fxo in the trip code.

put boom in the comment

Click search button

 

voila.
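Under the hood, a form like that typically turns the filled-in fields into a parameterized LIKE query; a hedged sketch (table and column names invented, not Pavuk's actual schema):

```python
def build_search(trip=None, comment=None):
    """Turn filled-in form fields into a parameterized MySQL query,
    with a wildcard match on the comment text."""
    clauses, params = [], []
    if trip:
        clauses.append("trip = %s")
        params.append(trip)
    if comment:
        clauses.append("comment LIKE %s")
        params.append("%" + comment + "%")
    sql = "SELECT * FROM posts"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

Passing the parameters separately (rather than pasting them into the SQL) is what keeps user input from breaking or injecting into the query.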

Anonymous ID: 5f9a22 March 14, 2018, 3:51 p.m. No.666471   >>6959

>>664990

Wow, awesome job! I knew it could be done. I'm going to need some help getting started. Could you put a qsearch for dummies tutorial together?

 

Did you have to create, or did this create a chronological list of all Q related threads and their titles if any? (/pol/cbts/CBTS(8ch)/The Storm/ qresearch)?

 

That might be a good Mnemonic to speed searches.

Anonymous ID: d6b0f8 March 14, 2018, 4:45 p.m. No.666959

>>666471

I've not been back into this thread for a while. I'm running the qresearch import process to get up-to-date. One technique that is needed is to re-scan already imported threads for posts missed during initial scans.

 

Threads are imported from the catalog.json file. In this state, we know the thread number and the number of messages at that time. The only time we know a thread is closed is when the number of posts >= the number in the official "bake" count.

 

Therefore, my program keeps testing until the posts counter >= the bake counter and then marks the thread as complete in the thread table. This then prevents re-scanning all threads because we get only the open ones.
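That completion test is simple to sketch; the field names here are illustrative, not Pavuk's actual thread table:

```python
def threads_to_rescan(threads, bake_count=751):
    """Return thread numbers still worth re-scanning; mark a thread
    complete once its imported post count reaches the bake count so
    it is never fetched again."""
    open_ids = []
    for t in threads:
        if t.get("complete"):
            continue                    # already fully imported
        if t["post_count"] >= bake_count:
            t["complete"] = True        # mark complete in the thread table
        else:
            open_ids.append(t["no"])
    return open_ids
```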

 

Multiple scans of posts are needed to get all of them and to deal with duplicate threads.

 

I use the 8-chan post number as part of the primary key to the threads and posts tables.

 

8GA_1 is 8chan Great Awakening post 1

8QR_655000 is 8chan Qresearch post 655000

 

The big problem is going back to find threads BEFORE the last 25 pages in the catalog.json. Therefore, I can't get anything earlier than when I first wrote the import.

Anonymous ID: d6b0f8 Timestamps March 14, 2018, 4:47 p.m. No.666983   >>7927

The import routine uses the JSON API endpoint from the boards. In the JSON is the Unix timestamp of the message. This is a native field/object type in Pavuk. Thus all timestamps are set to UTC internally.

 

NOW, if I could get DJT's Twitter feed in JSON, it also has UnixTime and this goes in directly.

 

Twitter wants me to give them all sorts of documentation before they will allow me to use their API. Frankly, I don't have the time to deal with them or the inclination.

Anonymous ID: d6b0f8 Searching with Pavuk March 14, 2018, 4:50 p.m. No.667022

Super simple.

 

Entry forms are also search forms.

Enter the data that you wish to match.

Click the search button.

 

Pavuk creates and then executes the appropriate query and returns the items in a Kendo grid. Scroll, resort, export to excel or click on a row to return to the entry form with your data.

 

Searching on timestamps has issues that I need to resolve.

Anonymous ID: d6b0f8 COMMENTS scrubbed with Lynx March 14, 2018, 4:57 p.m. No.667075   >>7776

The comments from the JSON API include markup and JS to go to real links. This is a problem with the storage and search. I pipe the comment string through Lynx with the -dump option and this gives me clean text in STDOUT and then a separator and then the list of actual links. I put the text in the comments and the links in a multivalue table. I'll expose the links tomorrow as a separate tab in the entry form.
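For anons without Lynx handy, the same scrub (visible text into one field, the href list into another) can be approximated in pure Python; this is a stand-in for the lynx -dump step, not the poster's actual pipeline:

```python
from html.parser import HTMLParser

class CommentScrubber(HTMLParser):
    """Strip markup from an 8ch 'com' field, collecting the visible
    text and the href targets separately."""
    def __init__(self):
        super().__init__()
        self.text, self.links = [], []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")
    def handle_data(self, data):
        self.text.append(data)

def scrub(com):
    p = CommentScrubber()
    p.feed(com)
    return "".join(p.text), p.links
```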

Anonymous ID: 70e498 March 14, 2018, 6:30 p.m. No.667886

>>665010

Yeah man hit it. I've got a github here you can browse around.

https:// github.com/QCodeFagNet/SFW.ChanScraper/tree/master/JSON

json/8ch has the filtered/unfiltered bread and archives in it. smash has the twittersmashed posts. I've been getting my twitter data from http:// www.trumptwitterarchive.com/data/realdonaldtrump/2017.json, 2018.json

 

I set up a test for the webAPI twittersmashed posts here https:// qcodefagnet.github.io/SmashViewer/index.html

 

I'm getting close on having the webAPI thing finished up. Just running some more tests and then I should be ready to go.

Anonymous ID: 70e498 March 14, 2018, 6:33 p.m. No.667927

>>666983

Yeah you could mebbe use the smashed json from me. I've already done the unix timestamp on the trump tweets. All 8ch posts and Twitter posts derive from the same Post base object with the unix timestamp built in.

Anonymous ID: 70e498 March 14, 2018, 6:37 p.m. No.667971

>>666995

I think that's because you can't really get them. There is an 8ch beta archive here, but all the Q Research threads disappeared shortly after we started archiving them. Even then, those archives are straight HTML. It's of no use to me. AFAIK, once it slides off the main catalog, it's pretty much gone. Some trial and error got me a few breads, but not many.

Anonymous ID: 5f9a22 March 14, 2018, 7:08 p.m. No.668375   >>8958

>>666995

>BO has never responded

I'm not the board owner, just some schmuck who started a thread he thought was being overlooked. You folks are so far out of my ballpark all I can do is try to keep it inside the curbs of what my original intent was.

I'd like to see a list/catalogue/file of all Q "related" posts.

Aaand I'd like to see a list of post Q "related" posts across all platforms/threads made searchable. Plenty of focus on Q, we need the early digging and free association.

Anonymous ID: 07564d March 14, 2018, 9:56 p.m. No.670142   >>0304 >>0317

>>667648

The first time I uploaded, I batched them in by 1000.

The second time, I batched them in by thread. I'm not sure how well the LIMIT clause on the SQL works.

 

In any case, I may have a problem on both computers. I could have sworn I had over 1.1 million records the other night. (Not to worry. I still have all of the source.) The solution may be to partition the table. I won't have to rewrite any code, but it'll chunk the table's file down into smaller sections.

 

This should be interesting. I've never had to partition a table before. Apparently, newer versions of MySQL do it automatically. But until then, it's gotta be done.

Anonymous ID: 07564d March 14, 2018, 10:06 p.m. No.670221   >>0279

>>666995

If threads are missing, you have to look in archive.org/web or archive.is. Of the two, archive.org/web is better for scraping because the HTML code is about as close to the original as they can make it. I can actually use the same scraper program on it.

 

Since the stuff that is on archive.is is so different from the original, I will need to write a new scraper for those. On several occasions, the post was important enough that I rebuilt it by hand.

 

With either archive, you need to know the URL, which can be tricky sometimes. Just having the post number won't do it. You must know the thread as well.

 

Just thought of something: When I get threads from these archive sites, what time zone do they show? I believe my stuff is saving to GMT when I save a post directly from a chan site. I'm not sure what I'm saving when I get posts from these archives.

Anonymous ID: 70e498 March 14, 2018, 10:13 p.m. No.670279

>>670221

I would think the time is relative to the archive's home timezone. That is, unless archive.x has done some wizardry to render times in the time zone of the user requesting the archive. That would be more problematic, but you could still deal with it. The time zone should be marked, and then you convert into the Unix timestamp.
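Whichever zone the archive renders, the safe move is to parse with an explicit offset and store the Unix timestamp; a sketch (the timestamp format string is an assumption about the archive page):

```python
from datetime import datetime, timezone, timedelta

def archive_time_to_unix(stamp, utc_offset_hours=0):
    """Parse an archive-page timestamp rendered in a known zone and
    return the Unix (UTC) timestamp; the offset must be supplied
    per archive site."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return int(dt.timestamp())
```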

Anonymous ID: 07564d March 14, 2018, 10:17 p.m. No.670322

>>666995

Here's a hint for how to find the post a dead thread belongs to: Go to the earliest archive of the thread on which you found the link, which will usually be on the archive.is site. If you're lucky, the link was still live when the thread was archived. The other thing to do is search earlier posts that you already have to see if someone else linked the same post.

Anonymous ID: 07564d March 14, 2018, 10:25 p.m. No.670388   >>0470

>>670289

I have the vast majority of both. Go check it out.

http:// q-questions.info/research-tool.php

After I resolve the table size problem (which is what I think the real problem is), I think it would be good to work some more on my contexting program. On my local computer, I've got it so that it can look back through the links and show all available context with the post. What I haven't done yet is copy that contexting information to a Q post's context when I find one in the backward linking. It'll be ridiculously easy once I set about doing it. Then, when a Q post is pulled up, all that stuff that linked back to it can show together with it.

Anonymous ID: 70e498 March 14, 2018, 10:28 p.m. No.670421   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0518 >>0571

>>670332

Hmm. Yeah, just doing some easy math I can see how you would have more than 1mm records. We're at bread 815+ something here, and with 751 posts each that's over 600k here on 8ch alone.

 

You may be onto something with that. Is there a limit? https:// stackoverflow.com/questions/2716232/maximum-number-of-records-in-a-mysql-database-table

 

Looks like the number of rows may be determined by the size of your rows.

Anonymous ID: 98bd4e March 14, 2018, 10:41 p.m. No.670526   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>2344

Below is the qtmerge modified raw dataset (text-only) as of 2018-03-14 02:07 UTC.

 

I'm putting this out in the hopes that it may be useful to others for ETL, mining, search tools, archiving etc.

 

Some notes:

  • The data is a synthesis of the qtmerge datasets: https:// anonsw.github.io/qtmerge/datasets.html

  • For an idea of threads that are available see: https:// anonsw.github.io/qtmerge/catalog.html

  • eventcache.json contains the posts/tweets/etc. in chronological order. The type attribute currently dictates the local object structure (working to make this cleaner)

  • refcache.json contains the detected post cross references (this is a work in progress)

  • The referenceID attribute is the "primary key" between the files

  • Timestamps are Unix Time and time strings are US Eastern

 

Extracted size: ~850 MiB

SHA-256 sum: d6ed89da05c0b714fc66b04ca66a8d701456d882d5f128ee1cef26c8d2e22eb6

 

http:// anonfile.com/dazfO8d4ba/qtmerge-text-2018-03-15_05.18.37.tar.bz2
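As an illustration of how the two files might be joined on referenceID, a toy sketch with made-up records shaped loosely like the description above (the real attribute sets should be checked against the dataset itself):

```python
import json

# Hypothetical miniature versions of the two files.
eventcache = json.loads("""[
  {"referenceID": "e1", "type": "post",  "timestamp": 1521065160},
  {"referenceID": "e2", "type": "tweet", "timestamp": 1521065200}
]""")
refcache = json.loads("""[
  {"referenceID": "e2", "references": ["e1"]}
]""")

# Index events by the shared "primary key".
events = {e["referenceID"]: e for e in eventcache}

# Resolve each cross-reference back to the event it points at.
resolved = {
    r["referenceID"]: [events[t] for t in r["references"]]
    for r in refcache
}
```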

Anonymous ID: 5f9a22 March 14, 2018, 10:56 p.m. No.670617   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0657 >>2421

>>668958

> You want to be able to search across ALL 8ch? Not just Q Research? By platforms are you talking 4ch/8ch?

Not all 8ch. Just 4 and 8ch Q related threads. Q has posted in but a small part of all of the digging (and bullshit) threads, and much info is contained in those threads. /pol/ was a cluster until adopting the /cbts/ threads, but they shouldn't be too hard to round up and include in the searchable database.

In fact, I'd only include the qresearch general threads since the GA/qresearch reset. Add the digging/ancillary threads as possible. Most of the gold is in the generals, IMO.

Anonymous ID: d6b0f8 Import procedure debugging view March 15, 2018, 5:47 a.m. No.672305   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0213

I can get the other boards and other threads, the issue is disk storage. Linode gives me a lot of bandwidth, but only a few gigs of disk until I change my plan with them.

Anonymous ID: d6b0f8 Limits March 15, 2018, 5:53 a.m. No.672334   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0263

The limit of an OpenQM hash file (table) is 16TB. When this becomes a problem, I can create a distributed file (table) by primary key. Say, put all 8QR in 1 portion, 8GW in another. Simply a way to have physical storage allocated.

 

Pavuk session records are GUIDS. (don't worry, I'll purge anons out of the storage.) It was done because of commercial requirements for SOX and other audit compliance issues. Remember, I created Pavuk to build commercial apps.

 

The distributed file is built by using the first 2 bytes of the GUID from the primary key. Thus, it has component files:

 

00

01

…

FE

FF

 

Or 256 parts.

 

Theoretical table size:

256 x 16TB = 4096TB

 

www.openqm.com
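A sketch of that bucketing, assuming the component file is chosen by the first two hex characters (00..FF) of the GUID key:

```python
import uuid

def component_file(guid):
    """Map a GUID string to its component file (00..FF) using the
    first two hex characters of the key, giving 256 partitions."""
    return guid.replace("-", "")[:2].upper()

g = "1dba35d4-46ac-4c5f-94d7-1e6b0f53ad4d"
part = component_file(g)

# A fresh random GUID always lands in one of the 256 buckets.
bucket = component_file(str(uuid.uuid4()))
```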

Anonymous ID: d6b0f8 No JSON for older threads :( March 15, 2018, 6:30 a.m. No.672572   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

Brother Anons, I can find the IDs of the threads by using the search function on Archive.is. For example, research general #2 was post number 799. Once I know this, I can go back to 8chan and pull up the thread.

 

Sadly, I cannot get it with JSON. I only can get HTML. This means parsing the HTML.

 

This means a new string parser; the data goes into the same table as the JSON, just with more work. Here's what the posts look like in HTML
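A rough sketch of that kind of string parsing with the Python standard library; the `reply_` id pattern is an assumption, and the real markup in the archived pages would need to be checked:

```python
from html.parser import HTMLParser

class PostIDParser(HTMLParser):
    """Collect post numbers from id="reply_NNN" attributes."""
    def __init__(self):
        super().__init__()
        self.post_ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value and value.startswith("reply_"):
                self.post_ids.append(int(value[len("reply_"):]))

sample = '<div id="reply_670221">...</div><div id="reply_670322">...</div>'
parser = PostIDParser()
parser.feed(sample)
```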

Anonymous ID: 41bee9 March 15, 2018, 6:43 a.m. No.672688   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>2743 >>2920

>>672421

>I tried 4chan/cbts/index.html and got a 404 yesterday

I'd expect that. Threads sunset there rather quickly. I think most everything from 4ch is in http:// archive.is/search/?q=%2Fcbts%2F

I got 22,900 hits. Some people used 4plebs and maybe even other archives. Need to know all of the archive sites used so we can add them to the soup.

A search on 4plebs from 10-28-2017 to the night of the bans, 11-26-2017, shows 714 hits.

https:// archive.4plebs.org/pol/search/subject/cbts/start/2017-10-28/end/2017-11-26/

Anonymous ID: d6b0f8 IF I GET HELP FUNDING... March 15, 2018, 7:13 a.m. No.672908   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>4321 >>0546

We need to work together to get all of the data into the database. If someone could help with a Twatter feed from DJT - preferably raw and in JSON, that can be added to the posts table.

Anonymous ID: d6b0f8 March 15, 2018, 7:15 a.m. No.672920   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>672688

That was helpful. I would ask people in this thread to help develop the information model.

 

There is a "boards" table with the links to get data for each type. It can be expanded into which boards are archived where and I can automate the pulls.

Anonymous ID: 41bee9 March 15, 2018, 9:08 a.m. No.673658   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>7084

Here's another archive with over 1,200 threads:

https:// archive.fo/search/?q=%2Fpol%2F+-+cbts

Some good ones here missing in other archives. How many more are out there?

archive.is

archive.4plebs

archive.fo

Anonymous ID: adacee March 15, 2018, 10:42 a.m. No.674536   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>1772

>>672886

They sell "Storage Blocks" expansion way cheaper than more memory. Very fast systems already. Lots of data on the 8GB plan; buy another 100GB storage for way less than the next plan. Call Linode to get info on that.

Anonymous ID: 07564d March 16, 2018, 5:03 p.m. No.690263   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>672334

Limits depend on the operating system. I'm not sure how much I'll end up needing in the end. I've got some full page web captures in my system that may bump up the size needed fairly fast. So far, I haven't outgrown the 500GB on my home system. It's about half full now. But that also includes just about all of my software. I have other drives, so I'm not limited to that 500GB. (Recalling when a 60MB hard drive was a big deal…)

Anonymous ID: 07564d March 16, 2018, 5:08 p.m. No.690320   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>8628

>>674321

Yeah, that would be cool to add to my system, too. I wonder where I should fit that into the task list. I've got to reparse anyway, so it has to be after that. (Backslashes weren't properly handled the first time around.) It was my plan to get to it eventually. So much to do! If you've got it in JSON files, I've got to believe it would be very easy to get them into my system.

Anonymous ID: 07564d March 16, 2018, 5:10 p.m. No.690349   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>9043

>>677084

>https:// yuki.la/

The archive sites are only as good as whether they're actually saving our stuff. What's the hit rate finding stuff there?

 

I'm not sure, but I think archive.is and archive.fo may be the same system. Mirrors, perhaps?

Anonymous ID: 07564d March 17, 2018, 10:37 p.m. No.704953   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

I got the problem with the backslashes fixed. Also, I changed the way I process emoji characters. There actually might be a few more posts that get parsed in during the reparsing. I am in the process of reprocessing everything now. This is going to take a while. I'll let you know when the uploads are done, which will probably be tomorrow afternoon.

Anonymous ID: b08c93 March 17, 2018, 11:26 p.m. No.705344   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

speaking of searchability, here is a search engine anons can use that will let you search for all those things normal search engines won't, like stringers that include punctuation / symbols or exact spellings of short words and abbreviations, without the search engine being 'helpful' by excluding the results you want and returning the results it thinks you want.

Anonymous ID: 59e915 March 18, 2018, 8:32 a.m. No.709043   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>690349

> I think archive.is and archive.fo may be the same system

Yes, they sure look like the same system, as does archive.li. I must admit complete ignorance of how they are structured and how they work. I initially thought archive.is was for /pol/, but now I've found /pol/ and cbts all over the place. If any anons have any insight, it would sure be appreciated.

Anonymous ID: 59e915 March 18, 2018, 8:40 a.m. No.709094   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>690481

>we went from 4chan/pol to 8ch/cbts.

4chan/pol/ first posts were 10/28/2017. We were flushed by a bot storm on 11/26/2017 and regrouped on 8chan as CBTS. When that blew up the campaign became The Storm. When that blew up is when we landed on our own board qresearch/greatawakening.

Archives and threads are all over the place; aggregating all the info to be searchable is one of our fundamental challenges.

Anonymous ID: 07564d March 19, 2018, 9:46 a.m. No.722324   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

All records and images that I have should now be up on the research tool.

 

I thought my post count was short on the site last night, but using the following statement on both, they are equal:

SELECT COUNT(post_key) FROM chan_posts

 

Funny thing is that when I pull up the table in phpMyAdmin, the row count does not equal the answer to that query. It's short on both. Don't trust the row count in phpMyAdmin when you view a table; for InnoDB tables it's an estimate from table statistics, not an exact count.

 

Total number of posts in the research tool is:

1,113,968

 

Next up: Getting the POTUS tweets into the database.

 

http:// q-questions.info/research-tool.php

Anonymous ID: 70e498 March 20, 2018, 10:24 a.m. No.733102   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

OK brother codefags. I've stood up a simple API. It serves JSON and XML for your consumption pleasure.

It's currently set up to:

1) Scrape the chan automagically and keep an archive of QResearch breads and GreatAwakening.

2) Filter each bread to search for Q posts and include anything in GreatAwakening into a single QPosts list

3) Serve up access to posts/bread by list, by id, and by date.

 

I'm going to incorporate the TwitterSmash delta output next. I figure I can do a simple search across all Q posts easily. Searching across the breads is harder.

 

You can check it out here: http:// qanon.news/

McAfee says secure https:// www.mcafeesecure.com/verify?host=qanon.news

 

There's a sample single page app that shows how to use it. http:// qanon.news/posts.html

 

I still gotta set up my email account so if you spam me now, it's likely to get bounced. I'll check back in later.

My reason for doing this is twofold: I figured we could use it, and I'm looking at the job market in my area and thinking about changing it up. This is partially a learning project to open opportunities by using different tech. I'm claiming ignorance. My plan is to try out an elasticsearch node once I get this working as designed.

 

Let me know if you can think of a query/filter that you think would be useful. It hasn't proven too difficult to work new things in, other than the ugly local path issue I came across working on it this morning.

 

Try it out anons.

Anonymous ID: 43423a March 20, 2018, 12:37 p.m. No.734330   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>7563 >>3341

>>724127

I think you're misunderstanding my idea. The idea is to identify sources of narrative scripts being pumped into the public consciousness. Remember when Trump's speech at the '16 RNC was immediately phrased as "dark" in dozens of articles, tweets, etc.? We need to know who's putting out the scripts ("dark") and who's repeating the scripts ("""journalists""" that articles with "dark" are attributed to, shitter users with "dark" in their tweets, etc.)

 

The code could work in different ways but trying to automate everything at the beginning is hard. The easiest way to start would be:

>anon notices a suspicious pattern of the same language being used all of a sudden

<like "dark"

>anon enters the string that's being repeated into a text box

<bonus points if it's pure JS that can run locally rather than requiring a server, at least initially

>code ingests search results of news, shitter, faceblack, etc with that string from the recent past

<configurable in near term increments like past hour, past day, past 2 days

>anon is provided a list of results

From this simple aggregated news & social search, an anon can easily tell by visually skimming the results how widespread the suspicious pattern of the same language being used all of a sudden is.

<next features

>let anons select search result items as suspect and enter them into a database that indexes on journalist/author, keyword, etc

>database can use search result item post date to build a timeline, to identify the earliest sources of the narrative script

At this point, with the database trained on common sources of narrative script repeating, it would be pretty doable to automate suspicious pattern detection by ingesting the full body of content from the sources and searching for sub text matches that exceed noise. Like if "the" is used in most of the article headlines and tweets, that doesn't mean shit because "the" is a common word, but if "dark", a much less common word, all of a sudden appears across article headlines and facebook posts, that would be pretty easy to pick up for human review.

Anonymous ID: 07564d March 20, 2018, 6:22 p.m. No.737563   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>7661 >>9099 >>1567

>>734330

>We need to know who's putting out the scripts ("dark") and who's repeating the scripts ("""journalists""" that articles with "dark" are attributed to, shitter users with "dark" in their tweets, etc)

 

You can search the word "dark" in my database as it is right now. If that word was used in chan discussions (and it was), you can get results for it. Is there something you think we need to add? Do you have an idea for an algorithm based on what we have?

 

Right now, though, I changed my mind about what to do next. I want to get the contexting code finished. When I've used my personal version of it, I learned quite a lot.

 

After that, I will work on getting the tweets in there. If anyone can point me to php code for that, it would be appreciated. I'm not talking about chan posts that link them, but rather the tweets themselves.

Anonymous ID: 07564d March 20, 2018, 6:29 p.m. No.737661   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>1567

>>737563

I've got a suggestion for the search: enter the following in the text field:

dark%http

and also in a separate search

http%dark

 

Those should find posts that use the word "dark" and include a link. I don't know how to do this better with what I have without doing some extensive programming.
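For anyone trying this elsewhere: `%` in a SQL LIKE pattern matches any run of characters, so (assuming the search box feeds a LIKE clause) those two searches behave like the regexes in this sketch:

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (with % wildcards) into a regex
    that matches anywhere in the post text."""
    return re.compile(".*".join(map(re.escape, pattern.split("%"))), re.S)

dark_then_link = like_to_regex("dark%http")  # "dark" before a link
link_then_dark = like_to_regex("http%dark")  # link before "dark"

post = "they called the speech dark, see https://example.com"
```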

Anonymous ID: 70e498 March 21, 2018, 8:14 a.m. No.744374   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>6289

>>742213

>www.trumptwitterarchive.com/data/realdonaldtrump/2018.json

 

There was a 9 day gap at the beginning of the year. Otherwise it's been updated. Unfortunately I think there were 2 markers in that time. Delta anon knows about it.

Anonymous ID: 70e498 March 22, 2018, 5:26 p.m. No.760314   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

Feckin dates. I got it all sorted out. Discovered a bug from the different time zones my dev server and the API webserver are on.

 

I've been sorting out small bugs and about to wire in the TwitterSmash. The automation part seems to be working well now that I sorted the date bug. I've got it set up to do hourly scrapes. Last run at 8:03pm 3-21 EST. The scrapes themselves only take about 45 seconds - including the twittersmashing. There's a test smashpost page here to see the deltas in action. Not totally live Q post data online yet.

http:// qanon.news/smashposts.html

 

This is another test page using live data

http:// qanon.news/posts.html

 

I did this to test some code out. Get a random Q post.

http:// qanon.news/api/posts/random/?xml=true

 

I set up an elasticsearch node today to experiment. We'll see how that goes. Could be a huge pain in the ass to set up at a host. We'll see.

Anonymous ID: 07564d March 22, 2018, 10:33 p.m. No.763341   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>734330

I think that's beyond the scope of what I'm doing. Hopefully, there will be enough here that what I have can help you do that research, especially after I finish the contexting work. Right now, I've had to reparse the database yet again to correct image links. I hope I've finally gotten it right because it takes an entire day to cycle through the entire set.

Anonymous ID: 07564d March 23, 2018, 6:07 p.m. No.773397   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>4681

>>771168

>!xowAT4Z3VQ

Thank you for the heads up. I've made the change in my code, too.

 

The export/import finally looks like it's ok. Please let me know if you run into issues.

 

I'm going to be pulling out the post range and thread range options from the form. They unnecessarily complicate things now that I've added date range capability.

 

I'm moving on to contexting now. Y'all are going to love that feature.

Anonymous ID: 70e498 March 23, 2018, 8:02 p.m. No.774681   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>4698 >>5587

>>773397

yeah that sounds like a good one.

 

I've done some more work on the http:// qanon.news api. I managed to work out a coupla small bugs and get the TwitterSmashed posts integrated. Everything seems to be working as designed.

 

Here's the smashposts.html demo page. Shows deltas to Q posts within the hour.

http:// qanon.news/smashposts.html

 

I've going to add another result to the smashposts where everything is grouped by days. I'll probably put it in the posts API as well.

 

It's starting to look like this may be close to going on autopilot. Any interest in changes/additions before I move onto something else?

Anonymous ID: 70e498 March 24, 2018, 1:54 p.m. No.781191   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>2643

>>775587

Hmm. Yeah I'll look into it. I can see that archive getting really big really fast. This things only been running for a month and it's over 400mb only JSON. I'll have to make sure what kind of space I've got avail.

Anonymous ID: 07564d March 24, 2018, 4:14 p.m. No.782643   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>1554

>>781191

But you're not saving more than the Q posts, right? There aren't that many Q posts, and he hasn't posted that many images. But if you're trying to save the entire thing, yes, it's really big and grows really fast. I'm not automatically saving the full size images, and there's still quite a lot in my set.

Anonymous ID: 70e498 March 25, 2018, 3:06 p.m. No.791554   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>782643

I never figured that another image archive was what we needed. Each of the QCodefag installs has its own local archive. My concern was in preserving the JSON data from QResearch before it slid off the main catalog.

 

I'm going to put up a more simple list to show what's been archived. I'm showing 716 total breads, but again that's only starting at 2-7-2018. Q Research General #358 is my earliest full archive - it's up to #982 now.

 

That's 624 breads in 47 days, 13.2 breads per day. Est. 4846 breads in one year at ~800k/bread = ~4GB/year in JSON bread alone. Mebbe different if I moved to a DB.
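The arithmetic above as a sketch, with the per-bread JSON size assumed at roughly 800k:

```python
# Back-of-the-envelope growth estimate for the JSON bread archive.
breads = 624     # archived so far
days = 47        # elapsed
bread_kb = 800   # assumed JSON size per bread, in KB

per_day = breads / days                  # ~13.3 breads/day
per_year = per_day * 365                 # ~4846 breads/year
gb_per_year = per_year * bread_kb / 1e6  # ~3.9 GB/year of JSON
```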

 

I may have enough storage, but it's so hard to say. Any image archive estimates anons?

Anonymous ID: 663ab1 March 26, 2018, 6:42 a.m. No.798718   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>9873

Can someone post the original json of GA post 461 which was deleted? I pulled the json data from qanon.pub, and can use pieces of it to fill in my local copy, but I'd rather have the real thing if I can get it.

 

As an example, below is a comparison of the original 460 from 8ch and the archived version from qanon.pub. They are close, but the 'com' field did go through a filter to get into qanon's 'text' field. Not saying there's anything wrong with it, but I have the originals for all except 461. Am playing with python code to save all the json files locally for all relevant boards on 8ch, and can parse & search for keywords or q's trips, etc. and display in a browser. Since it's all stored locally, a search doesn't have to hit the net. It's not perfect by any means, but if I can clean it up a bit, I'll share if there's interest.

 

8ch original 460:

{

"com": "<p class=\"body-line ltr \">Updated Tripcode.</p><p class=\"body-line ltr \">Q</p>",

"name": "Q ",

"locked": 0,

"sticky": 0,

"time": 1521824977,

"cyclical": "0",

"bumplocked": "0",

"last_modified": 1521824977,

"no": 460,

"resto": 452,

"trip": "!xowAT4Z3VQ"

}

 

qanon.pub copy of 460:

{

"email": null,

"id": "460",

"images": [],

"link": "https:// 8ch.net/greatawakening/res/452.html#460",

"name": "Q",

"source": "8chan_greatawakening",

"subject": null,

"text": "Updated Tripcode.\nQ\n",

"threadId": "452",

"timestamp": 1521824977,

"trip": "!xowAT4Z3VQ",

"userId": null

},

 

Need 8ch original 461 please if someone has it.
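In that spirit, a minimal sketch of filtering a locally saved thread's JSON for Q's trip. The `posts` array shape is assumed from the 8ch API, and the fetch/save step is omitted:

```python
import json

def q_posts(thread_json, trip="!xowAT4Z3VQ"):
    """Return the posts in a saved 8ch thread whose trip matches."""
    return [p for p in thread_json.get("posts", []) if p.get("trip") == trip]

# A tiny stand-in for a locally saved thread file.
saved = json.loads("""{
  "posts": [
    {"no": 452, "name": "Anonymous"},
    {"no": 460, "name": "Q ", "trip": "!xowAT4Z3VQ",
     "time": 1521824977, "com": "Updated Tripcode.<br>Q"}
  ]
}""")

matches = q_posts(saved)
```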

Anonymous ID: d6b0f8 Pavuk Searchable March 26, 2018, 5:25 p.m. No.803777   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>5300 >>9411

Linode is telling me that I can get block storage, but only by migrating my VM to the Fremont data center, getting a new IP address (SSL cert. etc.)

 

Crickets from followers whom I've asked to donate funds for the added expenses.

Anonymous ID: 70e498 March 26, 2018, 7:36 p.m. No.805321   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>803461

Glad it was useful. The posts API numbering is a bit squirrelly till you get used to it. The post ID is the post count starting from 1 on Nov 28 2017.

So finding out it was post #692, I had to view all posts (or posts.html or any of the QCodeFag installs) to get the post#. The bread# is in the post as threadId.

Anonymous ID: 70e498 March 27, 2018, 6:39 a.m. No.809048   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>9084

>>809001

Fuck off nigger. I'm just trying to come up with other ideas. I've been in IT for over 2 decades. I know exactly what's going on.

 

My point was, hosting can be found on the cheap if you look around. Not sure you NEED SSD. What you need is storage space. I was thinking drop the SSD for cheaper storage.

 

Whatever, it's your problem. You seem to be capable of figuring it out.

Anonymous ID: 70e498 March 30, 2018, 3 p.m. No.843897   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

I think I finally managed to squash the date bug in the QPosts/DJTweets.

I took the 60min delta restriction off - and it's applying each day's tweets on each Q post to allow you to see all the deltas.

http:// qanon.news/smashposts.html

Anonymous ID: 07564d April 3, 2018, 9:14 a.m. No.879844   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>838965

The Research Tool is back up with a more concise data set. Much will be added in the next several days as I return to development of the contexting feature.

http:// q-questions.info/research-tool.php

Anonymous ID: 70e498 April 3, 2018, 8:50 p.m. No.887653   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0080

I've been thinking about a timeline for the past few days. I looked into different solutions and found timelineJS that works pretty good.

 

I managed to wrangle the API data into a timeline. I'm planning on adding in the DJTwitter data and ideally news/notable events.

 

Once I can get the twitter data in I'll cut it loose. I was hoping to figure out an easy way to get other data into the timeline. News/notables. Any ideas? QTMergefag? You got good news/events?

 

Here's what it looks like:

Anonymous ID: 07564d April 4, 2018, 1:22 a.m. No.890080   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>1187

>>887653

If I can figure out how to import the twitter posts WITH the images, getting a timeline into the Research Tool system is a no-brainer. The JSON someone directed me to does not appear to have the image links, unfortunately. The images are essential to some of the tweets.

 

The plan is for POTUS to have his own post type. Then all one need do is select both q-post and potus posts in the same search, and they'll be displayed properly interleaved.

Anonymous ID: 07564d April 4, 2018, 8:27 a.m. No.891871   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>2076

>>891187

OK. I guess I'll have to take another look at it. Right now, though, my priority is to get the contexting feature working. I do wish there was a way to safely hand off some of the work on the site I'm putting together. There's so much to do! But I have no idea how to know whether to trust someone. Clowns will be clowns.

Anonymous ID: 70e498 April 4, 2018, 8:59 a.m. No.892076   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>2089 >>2772

>>891871

Agree. I've been thinking about trying to work out a way of collab. I'm sure I could come up with a way to prove we're who we each say we are. Unless the clowns are here building community Q research tools…

 

Check it out. I got the twitter working.

 

What I can say about this timeline is that there's a lot of events on it. There's Q posts batched down to days across 98 days. Add in the Tweets and there's a lot going on. Each day/tweet == a slide. It's definitely more than it was probably designed to handle. It takes a minute to make sense of the somewhat sizable JSON data and then render the display.

Anonymous ID: 07564d April 4, 2018, 10:48 a.m. No.892772   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>2975

>>892076

>It takes a minute to make sense of the somewhat sizable JSON data and then render the display.

 

I just have to make sense of a few of them. Then I can come up with an algorithm to parse them into the structures I already have developed. My site is quite capable of handling multiple sources (chan, tweet, other posts) if I can do that much.

Anonymous ID: 70e498 April 4, 2018, 11:13 a.m. No.892975   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>3062

>>892772

{
  "scale": "human",
  "events": [
    {
      "start_date": {"year": "2017", "month": "10", "day": "28", "hour": "0", "minute": "0", "second": "0", "millisecond": "0", "display_date": "2017-10-28 00:00:00Z"},
      "end_date": {"year": "2017", "month": "10", "day": "28", "hour": "0", "minute": "0", "second": "0", "millisecond": "0", "display_date": "2017-10-28 00:00:00Z"},
      "text": {"headline": "HRC extradition...", "text": "The body text...<hr/>"},
      "media": null,
      "group": "QAnon Posts",
      "display_date": "Saturday, October 28, 2017",
      "background": null,
      "autolink": true,
      "unique_id": "1dba35d4-46ac-4c5f-94d7-1e6b0f53ad4d"
    },
    {
      "start_date": {"year": "2017", "month": "10", "day": "28", "hour": "21", "minute": "9", "second": "0", "millisecond": "0", "display_date": "2017-10-28 21:09:00Z"},
      "end_date": {"year": "2017", "month": "10", "day": "28", "hour": "21", "minute": "9", "second": "0", "millisecond": "0", "display_date": "2017-10-28 21:09:00Z"},
      "text": {"headline": "&Delta; 25", "text": "2017-10-28 21:09:00Z<br/>@realDonaldTrump<br/>After strict consultation with General Kelly..."},
      "media": {"url": "https:// twitter.com/realDonaldTrump/status/924382514613030912", "caption": null, "credit": null, "thumbnail": null, "alt": null, "title": null, "link": null, "link_target": "_new"},
      "group": "realDonaldTrump",
      "display_date": null,
      "background": null,
      "autolink": true,
      "unique_id": null
    }
  ]
}
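Events in that shape can be generated mechanically. A sketch that builds one minimal event dict from a unix timestamp (only the core fields from the sample; media, unique_id, and the rest are omitted):

```python
from datetime import datetime, timezone

def timeline_event(unix_ts, headline, text, group):
    """Build a minimal timelineJS event dict from a unix timestamp."""
    dt = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    stamp = {
        "year": str(dt.year), "month": str(dt.month), "day": str(dt.day),
        "hour": str(dt.hour), "minute": str(dt.minute),
        "second": str(dt.second), "millisecond": "0",
        "display_date": dt.strftime("%Y-%m-%d %H:%M:%SZ"),
    }
    return {
        "start_date": stamp,
        "end_date": stamp,
        "text": {"headline": headline, "text": text},
        "group": group,
    }

# The delta tweet from the sample above, 2017-10-28 21:09:00Z.
ev = timeline_event(1509224940, "&Delta; 25",
                    "After strict consultation with General Kelly...",
                    "realDonaldTrump")
```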

Anonymous ID: 07564d April 4, 2018, 11:35 a.m. No.893190   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>3234

I decided to see if I could find some hidden Q:

 

SELECT * FROM chan_posts WHERE post_type != "q-post" AND author_hash IN (SELECT author_hash FROM chan_posts WHERE post_type = "q-post")

 

This statement found 718 of them I hadn't identified.

Anonymous ID: 07564d April 4, 2018, 11:50 a.m. No.893321   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>3483

>>893234

Figured out quickly that I had to add a couple additional checks.

 

SELECT * FROM chan_posts WHERE post_type != "q-post" AND author_hash IS NOT NULL AND LENGTH(author_hash) > 0 AND author_hash IN (SELECT author_hash FROM chan_posts WHERE post_type = "q-post")

 

Still came up with 120. Perhaps a couple of them were misidentified as Q in the first place?

Anonymous ID: 07564d April 4, 2018, 12:52 p.m. No.893908   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>4169 >>4303

>>893483

At least one of the ones I had identified as Q, maybe 2, had been mislabeled. Plus, a known impostor got tagged as Q. Not sure how that happened. I'll have to fix it. But a few other interesting ones popped up.

 

I made one of my editor features available to you so that you can have a look. On the search form, go to the bottom and check "In processing list:" box. Leave the rest blank. And you can have a look for yourself.

http:// q-questions.info/research-tool.php

Anonymous ID: 07564d April 4, 2018, 5:56 p.m. No.898657   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0583 >>4541

>>894345

 

There are more of them than you're seeing, actually. I've just discovered that I'm still having issues with the import/export process. Not everything I've set to export is getting up there. I'll have to run that to ground tonight and fix it. I thought I had that worked out already. When I was still thread-based, everything I was exporting from the home machine was importing just fine into the online machine. But I guess I changed the logic somehow when I went from thread-based to post-based. (It can sometimes actually be more difficult to change a program than to write it for the first time.) At the moment, some of what I've said below may not be visible. But sometime tonight, it should all be there.

 

>ID:RrydKbi3

He responded to Q. That's it.

 

>ID:9o5YWnk7

Yes, he was just responding to a Q post. He isn't Q. I'm not sure at this point if it's an approved post or just another response. I'll have to take another look at it when I'm working with the maps again. For now, I've demoted him to a regular anon. And I'm removing the posts that weren't marked as Q from the online database, at least for now.

 

I'm not sure what to think about ID:afa548. I had the impression that a hash was good for only one thread. And yet he shows up as a hidden Q in one thread and with his trip in another. Same with ID:4533cb, but there was only one unmarked post for that one.

 

ID:5ace4f has only one marked post. It looks like he got marked as Q because he's on a map, but I'm not sure it's really him. The other posts look interesting and possibly relevant, though. Still, it's possible the one should be marked as approved rather than as a Q post.

 

ID:071c71 got reused on a different board. On one, with a non-Q trip. But it's interesting who that ended up being.

 

ID:23de7f looks entirely legit and probably could be marked.

 

With ID:d5784a, you can see what I can do to imposter clowns.

 

ID:1beb61 and ID:26682f look like imposters, but I haven't heard one way or the other on those. Maybe I need to put date ranges on my trip test?

 

Some hashes are particularly colorful in their unmarked posts. Not sure what the story is there. But I do believe the one that's marked is legit. Maybe another should be marked, but I certainly wouldn't mark all of them.

Anonymous ID: 07564d April 4, 2018, 7:48 p.m. No.900583   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>898657

They're all up there now. There was something weird about two of the records. In one case, someone did something to a file name that I didn't know could even be done! I'll just have to edit that in the database, and it should be fine if it ever needs to be exported again. And I don't know what the deal is with the other. I pasted the SQL statement for it directly, and it worked just fine. Slash issues, maybe.

Anonymous ID: 07564d April 5, 2018, 2:30 a.m. No.904541   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>7276

>>898657

>ID:afa548

I've been looking further at this. I don't think the one in cbts is Q. The hash just happens to be the same. But there's something like a 3 month gap in when the hash was used.

 

>ID:1beb61

Fairly certain he's fake, and I'm marking him as such.

 

A couple of the ones I'd incorrectly marked as Q had the same post number as an actual Q post on another board. So I suppose it's easy to see how that could have happened. Now that Q uses a trip, that's much less likely to happen. They're probably relics of a time when I hadn't developed my toolset so well yet. Now, it's easier because the editor mode of the research tool has drop-down boxes and the like for making those kinds of changes. When I had to use phpMyAdmin, I was somewhat flying blind because I couldn't see as well what was really in a post. Now I can see the posts in their final form when I'm making changes like that.

Anonymous ID: 07564d April 5, 2018, 10:06 a.m. No.907276   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>904541

By the way, this has not been an idle exercise. One of the things I'll be doing is keeping track (programmatically, in the data) of context chains that reach back to Q. So it's important that Q be properly identified. To that end, finding hidden Q has been valuable. Not only did I find Q gems I had not recognized (probably because they're on maps I haven't worked through yet), but I was able to recognize some misidentified posts as well as get the imposters properly marked. So it's all good.

Anonymous ID: 70e498 April 8, 2018, 5:26 p.m. No.958418   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>5953

Qanon.news got bumped from the bread, anons.

 

Somebody said that the site was serving malware, and it was taken out of the bread. I posted in the meta thread to have BV check it out and he gave it the OK. I spent an hour or so trying to get it back in. No luck.

 

I'm not interested in begging - but I do want people to use what I've been working on. I'll see what happens after dinner I guess.

Anonymous ID: 70e498 April 9, 2018, 8:06 a.m. No.965953   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>958418

Meh. I've been thinking about it. After reading all about codefags' problems, bandwidth issues, SSL certs, all the other qcodefag clones… it may be better to just stay quiet and let people use it when needed. I'm a little disappointed that it was so easy to get something removed from the bread.

 

What I've been working on is really more backend style anyways. I have been thinking about a few different things though.

 

I saw one anon post something about there needing to be an RSS feed for QPosts. I think that should be pretty easy to provide. If I get some time I may whoop something out.

 

I've been playing around with TimelineJS. I worked it up so you can select a specific timeline: Q posts, DJT tweets, etc. Q has mentioned timelines a few times, and I've been looking around trying to find threads that were timeline based. No real luck so far. Anyways, I was thinking about working on some different timelines.

 

I've been starting to wonder if moving to a database solution rather than file-based JSON is going to be worthwhile. Better speed, probably? Built-in caching? Do I want that for an API? What does everybody else think?

Anonymous ID: 70e498 April 10, 2018, 5:56 a.m. No.981495   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>8865

I built a new API to get a specific post from a specific bread. Maybe I'll get it uploaded today.

Looks like ~/api/bread/981411/981444/

to get >>981444

 

Researching an RSS/ATOM feed. That looks to be low hanging fruit.
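For what it's worth, an Atom feed really is low-hanging fruit. Here's a minimal sketch of one using only Python's stdlib — the post fields (post_no, timestamp, text) and feed URL are made up for illustration; the real qanon.news implementation isn't shown in this thread:

```python
# Minimal Atom feed builder using only the stdlib.
# The post dict fields (post_no, timestamp, text) are hypothetical.
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "http://www.w3.org/2005/Atom"

def build_atom_feed(posts, feed_url="http://qanon.news/feed"):
    ET.register_namespace("", ATOM)          # make Atom the default namespace
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = "Q posts"
    ET.SubElement(feed, f"{{{ATOM}}}id").text = feed_url
    ET.SubElement(feed, f"{{{ATOM}}}updated").text = (
        datetime.now(timezone.utc).isoformat())
    for p in posts[-50:]:                    # last 50 posts, like the live feed
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = f"{feed_url}/{p['post_no']}"
        ET.SubElement(entry, f"{{{ATOM}}}title").text = f"Q No.{p['post_no']}"
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = p["timestamp"]
        ET.SubElement(entry, f"{{{ATOM}}}content").text = p["text"]
    return ET.tostring(feed, encoding="unicode")
```

From there it's just a matter of serving the string with an `application/atom+xml` content type.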

Anonymous ID: 70e498 April 10, 2018, 10:52 a.m. No.984329   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

I was contacted by a guy who says he's from this site http:// we-go-all.com

 

Looks to have a Qcodefag repo installed on a page. He wanted to know if he could help at all and I asked him if he had posted anything in here.

 

He doesn't know anything about the codefags thread. He's interested in access to the API. I don't wanna dox the guy, but his name matches a guy who works for Representative Jared Polis (D-CO 2nd)

5th-term Democrat from Colorado.

http:// www.congress.org/congressorg/mlm/congressorg/bio/staff/?id=61715

 

Probably nothing. The QCodeFag stuff is open, 8ch is open. Nothing to worry about anons?

Anonymous ID: 70e498 April 10, 2018, 4:20 p.m. No.988865   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>981495

All updated

New Qanon ATOM feed:

 

I managed to throw together an ATOM feed here:

 

http:// qanon.news/feed

or

http:// qanon.news/feed?rss=true

 

It returns the last 50 Q posts. It's a work in progress. I can include referred posts, images, etc.

 

New Timeline API: it shows Q posts and DJT tweets. I also set up an Obama timeline that another anon pointed out. I'm planning on adding more to it and some other timelines I'm thinking about. You can see a few at http:// qanon.news/timeline.html

Anonymous ID: 07564d April 10, 2018, 4:31 p.m. No.989046   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

With the contexting problem I'm working on, I'm thinking I need to also write a "mea culpa" system for when a bread (or bread-like) post is not properly identified. It would go in and recalculate context when the status of a post changes. This way, I don't have to be so concerned whether bread posts are properly identified at the outset, and I can just get on with it.

Anonymous ID: f47016 April 10, 2018, 4:48 p.m. No.989332   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>4355 >>9845 >>0025 >>1237

Hey CodeStuds - I was wondering if there's a quick way to find all posts in the qresearch thread by 'U'? I've run across a couple and I've really enjoyed them. I am not trying to take anything away from 'Q' drops - I owe 'Q' a ton for waking me up. But the 'U' drops always ease my mind and make things clearer for me… not sure if they're benefiting anyone else in the same way or not. I wanted to grab them all if I can find them. Thanks Patriots. #WWG1WGA

Anonymous ID: 07564d April 10, 2018, 5:24 p.m. No.989845   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>989332

There could have been before I took everything down and then uploaded only select posts. But to do what you want, I still would have had to set up a whole word search mode, and I didn't have that yet. I abridged my public database due to obnoxious content by shills. I don't want to republish that stuff. I won't put the whole thing back up unless I have a way for visitors to flag posts for review, and right now I don't.

Anonymous ID: 07564d April 10, 2018, 5:36 p.m. No.990025   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>989332

If all you want to search are Q posts, you could try using my system. The way it's set up, you can't force it to look at the first or last letter of the post. But you could try doing searches with a space before and after, or a period before and after, and other such things to force a word search. The REGEX of the LIKE statement is not strong enough for much more than this.

http:// q-questions.info/research-tool.php
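The padded-LIKE trick he's describing looks roughly like this in practice. This is a SQLite sketch with made-up table and column names — the live site is PHP/MySQL, but LIKE behaves the same way — and it shows the limitation too: words flanked by punctuation still slip through, which is exactly why you'd try several paddings:

```python
# Whole-word search via padded LIKE, the workaround described above.
# SQLite stand-in; table/column names are invented for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, body TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?)", [
    (1, "The bridge is key"),
    (2, "Cambridge Analytica"),   # should NOT match a 'bridge' word search
])

def word_search(word):
    # Pad both the column and the pattern with spaces so words at the
    # start or end of a post still match.
    pattern = f"% {word} %"
    rows = conn.execute(
        "SELECT id FROM posts WHERE ' ' || body || ' ' LIKE ?", (pattern,))
    return [r[0] for r in rows]

print(word_search("bridge"))   # -> [1]; 'Cambridge' correctly excluded
```

A post ending in "bridge." would still be missed by the space padding alone — hence also trying periods and other neighbors.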

Anonymous ID: 07564d April 13, 2018, 5:13 p.m. No.1030259   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>4522

>>1028050

It probably should be part of my work eventually, but it isn't yet. It's taken some time to get to that contexting feature. I'm finalizing the algorithm now.

 

A context chain will begin with a post that has been listed in a bread post and go backward through the links. These are either from the top of the thread or later where the next baker is being told what to include.

 

Links will also be followed backward from Q posts.

 

Contexts will stop at bread posts and not include them. (The intent is for context chains to stick to one topic as much as possible.)

 

When a post that includes a map is encountered, the posts from the map will not be included in the context chain, but links from the text of the post will be included. (Same reason as above: Maps include multiple topics.)

 

I will keep track of context chains that include Q posts. These can be shown with the Q posts. To minimize confusion, I will be displaying the context chains in separate bordered DIVs with a display/hide button. Not sure yet which to make default. Probably the hidden state to minimize clutter. I MIGHT parse the description of the leading post of the chain from the bread post into it. In the hidden state, this would be all that would show.
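As a rough sketch, the backward chain walk described above could look like the following. The post structure and field names are invented for illustration (the real implementation lives in PHP against the database); the key behaviors are that bread posts terminate a chain without being included, and a map post's listed links are kept separate and never expanded:

```python
# Sketch of the context-chain walk: follow >> links backward from a
# starting post. Field names (is_bread, text_links) are hypothetical.
def context_chain(start, posts):
    """Return the chain of post numbers reachable backward from `start`."""
    chain, stack, seen = [], [start], set()
    while stack:
        no = stack.pop()
        if no in seen or no not in posts:
            continue
        seen.add(no)
        post = posts[no]
        if post.get("is_bread"):
            continue                 # chains stop at (and exclude) bread posts
        chain.append(no)
        # Follow only links from the post text. Links that merely list a
        # map's contents would be stored separately and deliberately skipped,
        # since maps cover many topics.
        stack.extend(post.get("text_links", []))
    return chain

posts = {10: {"text_links": [5, 3]}, 5: {}, 3: {"is_bread": True}}
print(context_chain(10, posts))   # -> [10, 5]; the bread post 3 is dropped
```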

Anonymous ID: 70e498 April 13, 2018, 8:47 p.m. No.1034522   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0483

>>1030259

Interesting that you should post that, anon; I've been thinking the same thing. We need a crawler. Sounds like a great idea. A better way of visualizing the context thread would be great. Ya know, I've been reading about Google. PageRank. How that was designed in the beginning. Links you come across that have a lot of responses can be either good or bad on 8ch.

 

With the new breadID/postID feature I rolled out you could find anything you were missing for sure.

 

So you think your initial targets are just the baker posts and the other posts that are deemed notable?

 

I've been wondering if we could use a hashtag internally for our own benefit. #notable. That kind of thing.

 

It sounds like an interesting project. If I can help at all let me know.

Anonymous ID: 73cc1f April 14, 2018, 9:37 a.m. No.1040483   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0716

>>1034522

Hmmm… I wasn't thinking about doing an indented method of arranging things. Should I be?

 

And if I knew how to pass off some of the work to others, I'd do it. It's a LOT for one person to do. One of the reasons it's been taking so long is because I'm still adding to the database, etc. If I had left the entire database online, perhaps? But the clowns were shitting things up with some truly raunchy stuff, and I didn't want to republish that. Truth is, though, that I've done some preliminary work on this already. It shouldn't take me long to finish the coding. But it will take a while to do the following:

-properly identify the bread and map posts on some 2300 threads (Yes, this matters.)

-identify the posts listed on the map posts

 

Even so, I've identified enough bread and maps already that some interesting stuff should begin floating to the surface. That's part of what is taking so long. The code is pointless without at least some of that done.

 

I'm eager to get to work on this. I lost an entire evening/night due to a power outage.

Anonymous ID: 73cc1f April 14, 2018, 9:55 a.m. No.1040716   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>1320

>>1040483

I think what I'm getting at is that it's difficult to share the work without putting the entire database back online again. If I do that, I may have to do the following:

 

-Buy dedicated hosting. If I do that, I'll be putting a donation button on the site for sure. So far, this has all been from my own time (a LOT of it) and resources.

-Include a "report this post" button. Like I said, I don't want to be republishing truly obnoxious unrelated stuff. But it's all on me right now, and I can only do so much by myself. I'd have to let the community help me control that content.

 

But you know, really, the way I'm doing things now has a good side to it. There's a lot of fluff in the complete database. The way I'm doing things now eliminates a lot of that. You're going to get the dense, info-rich posts this way.

Anonymous ID: 70e498 April 15, 2018, 8:25 a.m. No.1051320   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>1049462

Nice.

>>1040716

I bought hosting from Godaddy. Unlimited bandwidth and 100GB storage. Economy plan on sale was $12/year. I think I even got another domain with that deal for $1/year that I'm not even using.

Yet.

 

I hear ya on time. My shit got bumped from the bread because one anon got confused about a malware notification. I've got 2 pretty solid months of time in on what I've been doing and got taken out by a single post.

 

As we reach more and more of the masses, the information is going to appear on more sites that show ads/donations. It's a way of paying for the infrastructure needed to provide the service. I see nothing wrong with it.

Anonymous ID: 73cc1f April 15, 2018, 7:33 p.m. No.1059305   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>9896

>>1049462

The Research Tool can now display context the way I described above EXCEPT that I have not built in a show/hide button yet.

 

Right now, you have available to you SOME context that I calculated during my initial work putting together a contexting feature a couple months ago. I have more up through the date on the first image, but I have to get an export/import process built to get it into the online database. Since I have an export/import system for the posts, it shouldn't take much to make a modified version for the contexts.

 

My current task list is:

โ€“ Build the export/import process for the contexts.

โ€“ Get the contexts calculated for the 2300 or so more threads that I currently have. This could take several days.

โ€“ Then perhaps I'll look at getting that show/hide button in there. I might do it in the middle of working on getting the contexts calculated if I get bored of that.

โ€“ After that, including POTUS tweets is next.

 

http:// q-questions.info/research-tool.php

Anonymous ID: 70e498 April 15, 2018, 8:03 p.m. No.1059896   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0038

>>1059305

Wow anon. It's coming together. It will be great to see it once finished.

 

Interesting what you are doing with the links. I think some of my pages are linking like the qcodefag sites. I hooked the RSS up to link back into the API. Think I should change that?

Anonymous ID: 73cc1f April 15, 2018, 8:11 p.m. No.1060038   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>1059896

That's up to you and how you want to display your data. It might be cool to automate at least the downloading of new threads for what I'm doing. But to get the contexting right, I have to go through what comes in anyway. As mentioned before, not properly identifying bread and maps can overload the context chains.

Anonymous ID: 73cc1f April 18, 2018, 2:01 a.m. No.1087614   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>8682

Contexting functionality is complete. The export/import process to make calculated contexts is complete.

 

I asked Anons on the general thread whether it is more important to calculate the contexts or to include POTUS tweets. The ones who responded want the tweets, so that's next.

Anonymous ID: 51250b April 18, 2018, 3:45 a.m. No.1087924   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>1080429

I think the message 'we are being set up' is in response to the SC failing to pass the IMMIGRATION BILL. Also, POTUS tweeted CA will not be accepting national guard at the border.

https:// www.denverpost.com/2018/04/17/neil-gorsuch-immigration-law-vote/

Anonymous ID: 70e498 April 18, 2018, 6:13 a.m. No.1088682   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0066

>>1087614

Nice.

Let me know if you want to hit the smash data. I'll set you up.

 

I rejiggered the links on some of my pages. It was set up like the qcodefag sites where each post contained a link back here. I changed that to a self referencing link instead. I decided to not be the cause of any more traffic back here.

 

Statistics show that people coming to my site are primarily interested in the presentation pages - not the API. I think what I've decided to do is remove all references to the API - but still provide it. Default to the posts page or something. I got a few ideas.

Anonymous ID: 73cc1f April 18, 2018, 11:13 a.m. No.1091190   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>1273

I'm looking at the help page, but I don't understand how to actually make the call to your API. It looks like the call I would want to use is

 

GET api/timeline/{2}

 

but I don't see how to actually implement it.

Anonymous ID: 4a2958 April 18, 2018, 12:15 p.m. No.1091802   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>1988

My search-fu is nonexistent & I need help with something current:

 

Somewhere within the past few weeks, someone posted a manual for Mueller firing protests. Didn't see it as a notable in BoB. Think it might have been pinched from ShariaBlue or the like. Thought it was a pdf, but not sure. Couple of screengrabs posted. In any case, it was a pretty thorough treatise on how to organize the march, chants, dealing with infiltrators (:D) and other stuff.

 

A couple of posts appeared today where one city (Pittsburgh) police department announced they were preparing for "semi-spontaneous" Mueller firing riots. That means they have that manual (but aren't disclosing it).

 

If we can find that manual again and post it all over that town's (and other) social media, it will awaken many to the fact that most of these protests are always preplanned.

 

Anyway, sorry for the hijack, but appreciate any help.

I just can't find it.

Anonymous ID: 4a2958 April 18, 2018, 12:57 p.m. No.1092170   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>2387 >>2813

>>1091988

 

Someone found the site where it was from in the current bread:

 

https:// act.moveon.org/event/mueller-firing-rapid-response-events/search/

 

I could have sworn it was the whole "rapid events response manual" from MoveOn or allied organizations as a standalone doc.

 

"Mueller" would return too many hits.

 

Maybe Mueller + fired + protest(s) or something. Maybe add "plans" or "manual"

 

This is why their Mueller firing riots plan should get out into the public domain before any protests occur:

 

http:// pittsburgh.cbslocal.com/2018/04/18/robert-mueller-pittsburgh-police-prepare-riots-if-trump-fires/

 

Normies will realize how scripted all these protest marches are.

 

On phone so can't grab the whole site.

 

TY for any help!

Anonymous ID: 73cc1f April 18, 2018, 1:21 p.m. No.1092387   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>2397

>>1092170

Try these. I was not able to find any PDF files posted recently about this.

>>208025

>>209411

>>211959

>>214550

>>674819

>>725107 (Unfortunately, I was not able to find the fullsize image of this one. Put a request on the general thread as well as Lost & Found if you really want it. Ask them to put it in the Lost & Found thread so you can find it if you look later.)

>>1003999

Anonymous ID: 70e498 April 18, 2018, 2:08 p.m. No.1092764   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>6204

>>1091428

The Smash API will give you more of the data you want.

 

You probably don't want the timeline stuff just yet. Unless you want to just stick with the default Q/DJT timeline. Just do a GET on the timeline API. The timeline API filters out all the tweets to just show the 5, 10, 15… deltas.

 

Yeah, gotta add the full path to the URL. If you are hitting it programmatically I gotta give you access. What domain would you be calling it from?

Anonymous ID: 4a2958 April 18, 2018, 2:22 p.m. No.1092887   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>1092813

 

Yes, that's most of the material, but it had been put into a document (pdf or doc, I think) and indexed.

 

Much easier to forward a doc to which notes can be added than point normies to a site which is hostile-owned. That document (in whatever format) contained all the articles on that page and more. Was well done by somebody.

Anonymous ID: 4a2958 April 18, 2018, 2:32 p.m. No.1092973   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>3667

>>1092813

 

Somebody found it!

 

www.scribd.com/document/375930782/Nobody-is-Above-the-Law-Mueller-Firing-Rapid-Response-Moveonorg-Protest-Guide

 

This is the basic protest manual all Soros/ SEIU and associated groups use.

 

Great doc to hand out to redpill people. Leave the redline the Mueller title and add the protest du jour.

 

Found in this bread:

 

https:// 8ch.net/qresearch/res/1092389.html#1092719

Anonymous ID: 73cc1f April 18, 2018, 3:40 p.m. No.1093667   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>3804

>>1092973

I'm glad you found it. I'm beginning to think that I need to get the entire database back up there again, even if I have to not upload the images. We've had a couple of search requests like this for which I've had the data. In this case, the original posts had been removed, which would explain why he couldn't find it.

Anonymous ID: 4a2958 April 18, 2018, 3:51 p.m. No.1093804   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>3832 >>3909

>>1093667

 

Since it was in a Scribd doc, not sure it would have been found anyway, unless someone commented on it using key words.

 

I couldn't even hazard a guess as to what percentage of information here since Day 1 is critical vs. otherwise. Throughout it all, it's painting pretty clear pictures of the players & their proclivities, even if we haven't found a smoking gun yet.

 

In any case, thanks again for everyone's efforts.

Anonymous ID: 73cc1f April 18, 2018, 4:01 p.m. No.1093909   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>1093804

There are an awful lot of absolute garbage posts out there, to be sure. And now that there are over 1.5 million of them, there is no way one person can censor out the stuff that absolutely should not be republished. I don't like the idea of putting all of the unreviewed stuff up there without their images, either, since a lot of the intel is in those images. It's a tough call. Even though I do have a content warning on the research page, I have concerns about the legal side of just blindly posting some of those images. I most definitely couldn't do it without a reporting feature.

Anonymous ID: 70e498 April 18, 2018, 7:22 p.m. No.1096311   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>9789

>>1096204

Well you can get all those from the trumptwitterarchive. What I did is group them into days that Q posted, and then only calculated the ones that DJT tweeted after Q posted.

 

If you check the API you can see the data, or look at http:// qanon.news/smashposts.html to see it more visually.

Anonymous ID: 70e498 April 19, 2018, 6:27 a.m. No.1100463   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0644 >>0681

>>1099789

You are on it!

Pain having to get the 2017 and then the 2018 from TrumpTwitterArchive but… it's the only way.

 

I guess I could suck all that in and then offer it as an API… just raw Twitter data.

 

The only thing I found with the Twitter data is that there's a 9-day gap in January at the beginning of 2018. I've been fighting off a compulsion to archive those (manually) to make it complete.

 

>>1099789

CSS: you can just use the Twitter magic.

https:// dev.twitter.com/web/overview

On the smash page I just make links and decorate with the bird and tweet. The timeline does it automagically.

 

Here's a question for you.

How hard would it be for you to remove all the inline style you have on q-questions.info/research-tool.php ?

 

Do you know about jqueryui themeroller?

Conjigger your jqueryUI website and then download the custom css like magic.

Anonymous ID: 70e498 April 20, 2018, 2:24 p.m. No.1119101   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>5149

>>1117987

Kinda wondering about that myself.

IMO, he was talking specifically about the NP/NK video. Many have archived that offline.

 

On one hand, I'm archiving online - but that makes it easier for others to archive.

On the other hand - I'm archiving at home too.

 

The online stuff I'm doing has no bearing on my archives. I put it online so others could use it.

Anonymous ID: 4202ae April 20, 2018, 2:30 p.m. No.1119154   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>4796

>>1117987

 

Hardcopies. Print out things. Copy files to USB/CD/DVD. Place inside of safe or better yet faraday cage. Use means that are hard to destroy and items that are not online and can be erased via virus or EMP. It's not just for you, but for the Country. Think that everyone is an off line version of "the cloud" but with a hard copy.

Anonymous ID: 73cc1f April 20, 2018, 9:04 p.m. No.1125149   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>1119101

That was the reason I finally ended up putting it online as well. It seemed a shame to keep that functionality to myself. I reworked a few things to make it better in a multiuser environment. It ended up being better for myself as well.

Anonymous ID: 73cc1f April 23, 2018, 7:22 p.m. No.1164429   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

Well, now I feel stupid. I just realized there's an "Expand all images" link in the lower right of the page header. Had I realized this, I would not have lost so many full size images. One save could have been done in thumbnail mode, and another in full size mode, and I would have had everything on the page.

Anonymous ID: 73cc1f April 23, 2018, 8:03 p.m. No.1164952   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

The ctrl-S method of saving a page will NOT automatically pick up the full size images when in thumbnail mode. If the page is expanded when the save is done, then you'll get the full size images (but not the thumbnails, though this is a minor issue).

 

So here's my suggestion to get the best archiving:

 

You can save once or twice, but one of the saves should be in expanded mode. If you want the thumbnail mode as well, then that's a separate save.

 

(All of the official archives so far have been in thumbnail mode.)

Anonymous ID: 90e281 May 1, 2018, 7:58 a.m. No.1261359   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

Anon asked for a word count in all Q posts and I did it really quick. Just gonna drop this here.

 

Here's the results, sorted by occurrences.

 

https://pastebin.com/e1u1jxR2
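For anyone wanting to reproduce that kind of count, it's only a few lines, assuming you already have the post texts pulled out of one of the archives (the sample posts below are just illustrative):

```python
# Quick word count over a list of post texts, sorted by occurrences,
# roughly how a tally like the pastebin above could be produced.
import re
from collections import Counter

def word_counts(texts):
    counts = Counter()
    for text in texts:
        # Lowercase and split on anything that isn't a letter or apostrophe.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common()      # (word, count) pairs, descending

print(word_counts(["Where we go one, we go all.", "Trust the plan."]))
```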

Anonymous ID: 19c17f May 5, 2018, 2:09 p.m. No.1310801   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>0819 >>0983 >>3635

Sorry for popping in on you re this but the Anon I was speaking with about "time stamps and markers" said to come here. They have info for me to be able to start working on it.

 

There was a thread dedicated to this, but it appears to be missing now, or I keep missing it.

Anonymous ID: 131565 May 5, 2018, 2:30 p.m. No.1310983   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>1143 >>3047

>>1310801

I'm a little behind in my archives at the moment, but that should be remedied by this evening. (I was busy working on my tools.) My site is a good one for looking at tweets vs. Q posts because I can show them on the same timeline.

http://q-questions.info/research-tool.php

Anonymous ID: 19c17f May 5, 2018, 6:19 p.m. No.1313371   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>3627

>>1311143

>>1312967

Got the - http://q-questions.info/research-tool.php

Got - https://qanon.pub/

and another that has actual screenshots of Q's posts

 

been doing some research on Q's markers and need to clarify, then will start on "wind the clock"

 

Much appreciate all the help - I need to do a better job at bookmarking important info on decoding Q.

Anonymous ID: 19c17f May 5, 2018, 6:44 p.m. No.1313627   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>3667

>>1313371

Found this in the QMap PDF thread. Going to try to locate that anon because there's no sense in duplicating work.

 

"Anonymous 01/28/18 (Sun) 10:17:16 ID 3c320a No.190706

 

>>187088

 

Thank you for all this hard work. One thing that I think would really help. If the book could include all the Q post with Time Stamps including the early post before trip code. This needs to be searchable by time stamps (EST). The time stamps and dates could be either with each post or in the front with reference to the post. I find that the time stamps are important to first identify Markers. I'm currently have to jump from time Stamp Search to Marker Search and most data bases I use are not complete with latest posts. This would be extreamly helpful. Thank you Anon. Truly a Patriot! One other thing is some links to Q posts are 404 when link is clicked so I can't find related time stamp."

Anonymous ID: 90e281 May 5, 2018, 6:45 p.m. No.1313635   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>3775

>>1310801

You may be looking for the Delta thread.

>>410413

 

I think I told you to come here. I did some Delta work here

http://qanon.news/smashposts.html

 

That Delta is only considering the difference between a Q post and a DJT tweet. There's nothing in there to account for DJT corrections of deltas between tweets.

 

The deltas you see on the smashpost page are spread out across the Q posts - since there is a different delta for each.

IE: Q posts at 12:00p

DJT tweets at 12:10p [10] delta

Q posts at 12:05p <- this would also mean the DJT tweet at 12:10p is also a [5] delta.

 

I did it like that because I wasn't sure of the meaning of all deltas. Is a [29] valid? Only on the 5's? Good luck anon! Let us know what we can do to help.
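That spread-out scheme — one delta per preceding Q post — can be sketched like this. It's simplified (no per-day grouping, naive timestamps) and is not the actual smashposts code, just an illustration of the rule in the example above:

```python
# Each DJT tweet gets one delta (in minutes) for every Q post that
# preceded it, matching the spread-out scheme described above.
from datetime import datetime

def deltas(q_times, tweet_times):
    out = []
    for t in sorted(tweet_times):
        for q in sorted(q_times):
            if q <= t:
                out.append((q, t, int((t - q).total_seconds() // 60)))
    return out

q = [datetime(2018, 5, 5, 12, 0), datetime(2018, 5, 5, 12, 5)]
tw = [datetime(2018, 5, 5, 12, 10)]
print(deltas(q, tw))   # the 12:10p tweet carries both a [10] and a [5] delta
```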

Anonymous ID: 131565 May 5, 2018, 7:12 p.m. No.1313946   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>1313667

Yes, my posts are saved in GMT also.

 

I've about got the issue with the new Q board taken care of. I just needed to tell my database about it. I'm getting those posts ready to upload now.

 

As it happens, I'm currently working on setting up special search types that you may find useful. One of those search types will show just the Q posts and POTUS tweets. That way, you won't have to think about the proper way to limit your searches if that is what you are after. Look for that in the next day or two. I'm still working on finalizing that feature.

Anonymous ID: 90e281 May 5, 2018, 7:32 p.m. No.1314163   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>9771 >>1452

>>1313775

The deltas are what helps you find the marker.

IE: Q posts something about "win"; 5 mins later, DJT posts something about "Goodwin". That's a marker. (Just an example - I don't remember the deltas on the Goodwin marker.)

 

The Delta thread is where the work has been done on deltas. I'd like to see definitive documentation of confirmed markers.

Anonymous ID: 131565 May 6, 2018, 12:22 p.m. No.1319771   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>6496

>>1314163

It would not be difficult at all to include calculations in my displays. So let me double-check what the logic should be.

 

When displaying a Q post

โ€“ show delta since last Trump tweet.

 

When displaying a Trump tweet

โ€“ show delta since last Trump tweet

โ€“ show delta since last Q post

 

Is there anything else?

Anonymous ID: 90e281 May 7, 2018, 6:36 a.m. No.1326496   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>9318 >>9416

>>1321452

>http://q-questions.info/research-tool.php

Looking good!

 

Checking the Show Delta box seemed to kill off any results for me tho. I'll try again later!

 

>>1319771

I believe you are nearly correct.

Once you have found a [marker], then the time between DJT tweets/Corrections appears to be the indicator of another marker. I don't think it goes back to a Q post delta.

Check the logic for the [5] & [1] markers.

 

I disregarded all negative deltas (any tweet BEFORE a Q drop). There's information there possibly - but it just introduced too much noise into the results.

Anonymous ID: 131565 May 7, 2018, 1:07 p.m. No.1329416   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>1326496

Not sure how checking the box kills results. The logic of the check box is implemented in a way that does not affect the search logic. The deltas are calculated after the fact. The actual SQL statement that creates the results is at the top of the page. That doesn't change. Still, I've seen unusual and unexpected things before. What are you seeing that has you thinking there's a difference in the search results?

Anonymous ID: 131565 May 8, 2018, 3:43 p.m. No.1341497   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>1398

Never mind. The whole darn thing broke. I had overhauled the search logic to better support the data prep steps, and I guess stuff got messed up in the process. When I get done being disgusted about that, I'll fix it.

Anonymous ID: 8dde8e May 8, 2018, 5:34 p.m. No.1342663   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>1341498

#68

I wonder if what Q is referring to is the Legal Status of the US., Macron brought a new contract to sign for Trump in conservatorship. That the old, legal status with the Rothschilds is no longer in effect due to bankruptcy.

Anonymous ID: 90e281 May 17, 2018, 12:01 p.m. No.1445611   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>6207

Hows it looking you faggots? Things progressing as designed?

I got a nagging image issue sorted out. Now archiving Q images and reference images to my site. Just about ready to get back on the elasticsearch idea.

Anonymous ID: 131565 May 17, 2018, 12:53 p.m. No.1446207   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun   >>6391

>>1445611

I have no idea what elasticsearch is. Would you care to explain?

 

I'm still working on things. At the moment, I'm adding some editing features to the research-tool version of things that I'd had in a prior tool. If you've noticed, older posts on my site have thumbnails and screenshots of links from the posts. And I've also started some work on the flagging feature so that I can feel better about putting all of the images back online rather than just selected ones.

Anonymous ID: 90e281 May 17, 2018, 1:06 p.m. No.1446391   ๐Ÿ—„๏ธ.is ๐Ÿ”—kun

>>1446207

Superfast multitenant full-text search for JSON. Clients in Java, C#, PHP, Python, Apache Groovy, Ruby, etc…

 

I think all I need to do is write something that will load all my JSON into my local elasticsearch instance, and then all lights are go.
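For what it's worth, the "load all my JSON into elasticsearch" step usually goes through the _bulk endpoint. Here's a minimal sketch (Python, with made-up field names following the 8ch API's `no` convention) that just builds the NDJSON payload; posting it to a local instance is then one HTTP call:

```python
import json

def build_bulk_payload(posts, index_name="breads"):
    """Build an NDJSON body for Elasticsearch's _bulk endpoint:
    one action line followed by one document line per post."""
    lines = []
    for post in posts:
        # use the board post number as the document id so re-runs overwrite
        lines.append(json.dumps({"index": {"_index": index_name, "_id": post["no"]}}))
        lines.append(json.dumps(post))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

posts = [
    {"no": 1445611, "time": 1526583660, "com": "Now archiving Q images."},
    {"no": 1446207, "time": 1526586780, "com": "I have no idea what elasticsearch is."},
]
payload = build_bulk_payload(posts)
# POST this payload to http://localhost:9200/_bulk with
# Content-Type: application/x-ndjson (e.g. via curl or a requests call)
```

Index and field names here are assumptions, not anything the site actually uses.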

Anonymous ID: 7daa5d May 22, 2018, 10:23 a.m. No.1506424   >>1211

I've heard whispers of Q + Team posting at set time intervals

Worthwhile to investigate

How to visualize?

Side by side threads (yes, whole threads!) + time lines (with colours)

Helluva job, no doubt, but who else to ask?

Anonymous ID: 131565 May 30, 2018, 4:14 p.m. No.1591211

>>1506424

Just got back from vacation and saw this. My site can display Q posts and Trump tweets in the same search results in time order.

http://q-questions.info/research-tool.php

My archive is over a week behind at the moment, but I should be more current in a few hours.
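For anyone wondering how displaying Q posts and Trump tweets in one time-ordered result set can work: if each source is already sorted by timestamp, a linear-time merge interleaves them. A minimal sketch (field names are made up, not the site's actual schema):

```python
from heapq import merge

# Hypothetical records with epoch-style timestamps; the field names
# ("time", "source", "text") are assumptions for illustration.
q_posts = [
    {"time": 100, "source": "qpost", "text": "first drop"},
    {"time": 300, "source": "qpost", "text": "second drop"},
]
tweets = [
    {"time": 200, "source": "tweet", "text": "a tweet between the drops"},
]

# Each input list is already sorted by time, so heapq.merge interleaves
# them into one chronological timeline without re-sorting everything.
timeline = list(merge(q_posts, tweets, key=lambda r: r["time"]))
```

In SQL the same effect is a UNION of the two tables with an ORDER BY on the timestamp column.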

Anonymous ID: 131565 June 13, 2018, 12:43 p.m. No.1732451   >>3098

Last night, anons were discussing the fact that the chans are part of history. Concern was expressed about the shill impacts on the boards and that perhaps there needed to be a cleaner view of it all. I suppose one answer could be to get back to the original purpose intended for the private version of my database, which is to identify what should be included in the blog that is in the root directory of the site. I haven't actually updated anything there in quite a while. Maybe it's time to get back to that.

Anonymous ID: 131565 June 13, 2018, 1:19 p.m. No.1733203

>>1733098

It HAS been a lot of work and will continue to be. I've been coasting for a bit, just making sure that the general threads have been archived and made available. But there's also a lot of processing to do with the data if the ultimate goal is to be achieved as imagined. Kinda wish there was a way to safely share the work.

Anonymous ID: 90e281 June 13, 2018, 2:02 p.m. No.1733870

>>1733098

I heard that. I coasted about 2 weeks for the same reason. I've been working on tightening up the site and working on small bugs I've found.

 

I implemented a search for Q posts and am working on the big bread search now.

Anonymous ID: 131565 June 13, 2018, 2:16 p.m. No.1734049   >>5439

One of the tricky things about making my research tool available publicly is that the platforms are different. Different operating system, different database, and (apparently) different PHP. So I may have something working perfectly on my development machine, but I find there are problems when I try to share it. If the focus is to prepare the blog, which is an abridged view, then maybe I shouldn't sweat it if what I have shared publicly doesn't always work?

Anonymous ID: 90e281 June 13, 2018, 4:12 p.m. No.1735439   >>6163

>>1734049

Ahh, you've entered the big new world of internet interoperability! The internet is great, but it's not always easy to move data from platform to platform.

 

It's one of the reasons I stuck with straight JSON. Platform independent. Easily shared. Do you have the capability to transform into JSON/XML? What is your end goal? Share the database? Share the data? The app itself?

Anonymous ID: 131565 June 13, 2018, 5 p.m. No.1736163

>>1735439

I probably do. It's all databased; I'd just have to put the data into a structure and run encode_json() on it. Not sure it would be all that easy to represent the advanced features in the JSON, though. And it doesn't solve the problem of making something accessible for non-techie types, which is my goal.
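For the encode_json() route, the whole export is roughly "select rows, turn each into a dict, serialize the list." A minimal sketch in Python with SQLite as a stand-in (the table layout here is hypothetical, since the real schema isn't shown):

```python
import json
import sqlite3

# Hypothetical stand-in schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (no INTEGER PRIMARY KEY, name TEXT, com TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(495005, "Anonymous", "Posts from #608"),
                  (495890, "Anonymous", "One further comment")])

def dump_posts_as_json(conn):
    """Read every row into a dict and serialize the list to JSON,
    mirroring a Perl encode_json() call on an array of hashrefs."""
    conn.row_factory = sqlite3.Row  # rows become name-addressable
    rows = conn.execute("SELECT no, name, com FROM posts ORDER BY no").fetchall()
    return json.dumps([dict(r) for r in rows], indent=2)

exported = dump_posts_as_json(conn)
```

The resulting file is platform-independent, which sidesteps the OS/database/PHP mismatch between dev and hosting machines.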

Anonymous ID: 965f24 June 22, 2018, 1:58 p.m. No.1865219   >>5264

http://YaCy.net – distributed search engine – has 17 hits for the clean query {Q Clearance Patriot}. Kek.

 

But we should probably download the software and seed a lot moar…

Anonymous ID: 131565 June 22, 2018, 2:50 p.m. No.1865819

>>1865264

Certainly a page could be made for telling people how to search the original sources. Maybe it could include input fields as well to help people get it right. Unfortunately, original sources have been hacked from time to time, and some material is no longer available.

Anonymous ID: cabbaa June 23, 2018, 3:29 a.m. No.1873487   >>6296

Hi there anons, just stumbled on this thread in my search for a collection of notables.

Anyone thought of putting them together in a thread/breads?

What would be the pros and cons of doing so?

Data duplication, too big, etc.

Are there easy ways to make/view/access such collection?

Anonymous ID: 131565 June 23, 2018, 10:42 a.m. No.1876296   >>9714

>>1873487

My project has the capability of searching by threads.

 

As for breads, I'd been working toward that, and I'll probably get back to it soon. The challenge of breads is a bit tougher because they must be identified. So far, my own solution has been a combination of automation and inspection.

Anonymous ID: cabbaa June 23, 2018, 4:09 p.m. No.1879714

>>1876296

Hey TY for getting back to me about this anon.

Your solution is similar to mine, I see.

That's why I'd like a blogroll of exclusively notables, scraped from all breads by automation, so I could inspect the results.

Anonymous ID: 90e281 June 29, 2018, 4:42 p.m. No.1963374   >>1866

>>1774255

I may be back to Solr not being a good solution.

 

In trying to create a prebuilt index, I've discovered that either:

a) JavaScript just doesn't have enough memory to do it, or

b) JavaScript times out before it gets done and nothing happens.

 

I'm going to take a closer look at this

https://xapian.org/docs/bindings/csharp/

Anonymous ID: 90e281 June 30, 2018, 10:44 a.m. No.1971866   >>2297

>>1963374

Moar testing today. Solr is NEVER going to work in this instance. I was hoping I could create an index on my dev machine, save it off, and then use a worker process to add to the index. I've got one other idea to see if I can bend it to my will, but so far no workie. From what I can tell, it's not possible to add to the index; it has to be completely regenerated whenever you add a new document.

 

I don't understand how other people can add so many docs to the index and have it work. My tests were showing it to run for 12+ minutes just to generate an index and it never finished.

 

I'm open to new ideas if anybody has one.

 

The custom Google search I've got on there now does seem to work, but again, it's not ideal. What I want is a list of POSTS that match, and the goog search seems to find the matches but only returns complete breads. You still have to CTRL+F to find what you were looking for within the bread.

 

I can put together a test harness for Solr if anybody wants to see if they can figure out a way to make it go.

Anonymous ID: 90e281 June 30, 2018, 11:16 a.m. No.1972297   >>3030 >>7608

>>1971866

My gut is telling me that my next best option is to move into a database in order to accomplish the bigbreadsearch. It's probably possible to do using a hosted elasticsearch solution (https://www.elastic.co/cloud @$50/mo)

 

On the other hand, I think that I can write an app to fill a database in a couple hours, and it would solve a few of the problems I was seeing in the other search tech. Most of the good search engines will plug into a database anyways so I think this is probably the direction I'm headed.

Anonymous ID: 90e281 July 2, 2018, 6:08 a.m. No.1997626   >>4084

>>1993030

>$50/month seems like a lot. My cost isn't nearly that much.

 

Derp. I clicked the wrong post.

 

I agree - which is why I haven't done anything on it. My hosting costs a bit more than that - ANNUALLY.

I feel like a DB is just going to be a better solution now. I'd hoped I'd be able to do everything with straight JSON, but alas! You cannot.

 

I guess I need to find the best search engine to plug a DB into now. I'm hoping to write the code to insert my existing data into the database today, write code to insert new data into the DB tomorrow.
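That insert-existing-data step can be sketched pretty compactly: flatten each thread's JSON dump into rows and bulk-insert them inside one transaction. A rough sketch using SQLite as a stand-in for SQLServer (field names follow the 8ch API's `no`/`time`/`com` convention; the schema here is illustrative, not the actual one):

```python
import sqlite3

def import_breads(conn, json_docs):
    """Bulk-insert thread-dump JSON into one flat posts table.
    One transaction plus executemany keeps a large import fast."""
    conn.execute("""CREATE TABLE IF NOT EXISTS posts
                    (no INTEGER PRIMARY KEY, thread INTEGER,
                     time INTEGER, com TEXT)""")
    rows = []
    for doc in json_docs:
        thread_no = doc["posts"][0]["no"]  # the OP's number identifies the bread
        for p in doc["posts"]:
            rows.append((p["no"], thread_no, p["time"], p.get("com", "")))
    with conn:  # single transaction; avoids per-row commit overhead
        conn.executemany("INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
doc = {"posts": [{"no": 1, "time": 10, "com": "OP"},
                 {"no": 2, "time": 20, "com": "reply"}]}
count = import_breads(conn, [doc])
```

INSERT OR REPLACE on the post number makes re-importing the same bread idempotent, which matters when the daily "insert new data" job overlaps older dumps.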

Anonymous ID: 90e281 July 3, 2018, 10:37 a.m. No.2013368

>>2004075

Yeah. It's a hosted service. It appears that deploying a custom elasticsearch is probably a large pain in the ass most folks don't want to deal with.

 

>>2004084

I have SQLServer currently set up and my host gives me a database so I'll probably go with that.

 

>>2009885

WTFERK? We already have like 3 bread searches now? Am I totally wasting my time?

 

Regardless…

Interesting! Tell me more about how you are doing this. Search seems to be pretty quick. Are you using a DB backend? Straight text search? Is all this in PHP?

 

I've managed to import all the JSON data I have on hand: 1,569,777 posts took 25 minutes to import. My DB design is ultra simple, a single table that virtually matches the JSON data structure. There's no telling what the performance is going to be like just yet; even getting a count takes 16 seconds. Ugh.

 

I'll run some simple tests later to see what I can figure out.
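One cheap way to get post-level matches out of a big flat table, without Solr or hosted elasticsearch, is the database's own full-text module: a LIKE '%term%' scan over 1.5M rows is slow, but an inverted index stays fast as the table grows. A minimal sketch using SQLite's FTS5 as a stand-in (SQL Server would use a full-text index and CONTAINS() instead; the rows here are made up):

```python
import sqlite3

# FTS5 builds an inverted index over the com column;
# "no UNINDEXED" stores the post number without indexing it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE post_fts USING fts5(no UNINDEXED, com)")
conn.executemany("INSERT INTO post_fts VALUES (?, ?)", [
    (1971866, "Solr is never going to work in this instance"),
    (1972297, "move into a database for the bigbreadsearch"),
    (1997626, "a DB is just going to be a better solution"),
])

# MATCH consults the index rather than scanning every row,
# and returns individual posts instead of whole breads.
hits = conn.execute(
    "SELECT no FROM post_fts WHERE post_fts MATCH ? ORDER BY rank",
    ("database",)).fetchall()
```

The slow COUNT(*) is a separate issue; keeping a running row count (or adding appropriate indexes on the real table) usually fixes that.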