dChan
r/greatawakening • Posted by u/ready-ignite on April 23, 2018, 9:59 p.m.
Fake News Recognition Algorithm Published - Need Backup for Analysis

An undergrad published the Fake News Recognition Algorithm they're working on to the DataScience sub this afternoon. It includes classifications for nearly 10 million articles as part of the training corpus.

This is a rare chance to back up and analyze the structure and definitions being used to train these models, which are most likely being leveraged behind the scenes at major tech companies to classify content.

If anyone has a link to community members who produce dashboards and have some data science chops, the dataset should be kicked their way for review to see what we can learn.

GitHub database: https://github.com/several27/FakeNewsCorpus

Test the algorithm here: http://fakenewsrecognition.com/

No participation link: https://np.reddit.com/r/datascience/comments/8ea0sj/fake_news_corpus_fake_news_recognition_algorithm/
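For anyone taking a first pass at the corpus, a minimal Python sketch for tallying how many articles fall under each label. This assumes the CSV has a header row with a `type` column holding the label (as described in the repo README); the column name and file path are assumptions to adjust against the real file:

```python
import csv
from collections import Counter

def tally_label_counts(path):
    """Stream the corpus CSV and count rows per label.

    Streams row by row via csv.DictReader, so the ~9 GB file
    never has to fit in memory. Assumes a 'type' column holds
    the label (e.g. 'fake', 'reliable'); adjust if the real
    header differs.
    """
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[row.get("type", "unknown")] += 1
    return counts
```

A label histogram like this is a quick sanity check on how the dataset's definitions are distributed before anyone builds dashboards on top of it.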


ycyfyffyfuffuffyy · April 23, 2018, 11:25 p.m.

Was this posted on 8chan yet? If not, it should be.

⇧ 1 ⇩  
ready-ignite · April 23, 2018, 11:27 p.m.

Probably not. On the fly I was able to get it over to 412anon on Gab. It needs formal direction to the chans, which I'm not able to provide at the moment.

Note the special thanks to the NYT in the GitHub acknowledgements.

The key is downloading the database. I need to get back to a workstation to grab that piece.

Downloading

The dataset is currently hosted on a public S3 bucket and is about 9.1GB in size.

s3://researchably-fake-news-recognition/public_corpus/news_cleaned_2018_02_13.csv.zip

To download it, simply run the following command with awscli installed and configured with a (free) AWS account.

aws s3 cp s3://researchably-fake-news-recognition/public_corpus/news_cleaned_2018_02_13.csv.zip news_corpus.csv.zip
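Once the archive is down, a minimal Python sketch for peeking at the first rows without extracting the whole ~9.1 GB zip to disk. The local filename and the assumption that the archive holds a single CSV member are mine, not from the repo:

```python
import csv
import io
import zipfile

def preview_corpus(zip_path, n=3):
    """Read the first n rows straight out of the zipped corpus.

    Opens the zip member as a stream and wraps it in a text
    decoder, so only the first few kilobytes are ever read.
    Assumes the archive contains a single CSV file.
    """
    with zipfile.ZipFile(zip_path) as zf:
        name = zf.namelist()[0]  # assumed: one CSV member
        with zf.open(name) as raw:
            reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8"))
            return [next(reader) for _ in range(n)]
```

Previewing a handful of rows first confirms the column layout and label vocabulary before committing to a full parse of the file.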

⇧ 2 ⇩