Anonymous ID: de850f Dec. 13, 2020, 8:40 p.m. No.12017045   🗄️.is 🔗kun   >>7088 >>7224 >>7281 >>7383 >>7484 >>7614 >>7616

I've been working on and am putting in the finishing touches on a program that uses NLP fuzzy name comparison algorithms parallelized with tensorflow to quickly compare names across two different databases to locate potential matches. The hope is that we can use this to efficiently cross reference and narrow down CCP agents (from the recent dump) with potential matches from any organization's roster database that can be dug up. From there, it targets the search of who we should look at closely and potentially out.

 

I'm hoping we can crowdsource this project. Not only can we each dig up and out potential people of interest in different places, but also other anons can dig into the standouts and see if we find a potential match. First thing though is I'll need the original CCP data dump in its english form, I only found it in Chinese so I've been testing using other databases with somewhat similar format. We'll of course need the original data for this to be any use.

 

Additionally, right now the program is in a py script and pretty user unfriendly to non coders. I work mostly in Jupyter notebooks, then dump it all into a script once the prototype is working. Perhaps some anon with a background in it could convert what I'm making into a simple UI to allow anons to more easily use it regardless of tech expertise, maybe even including some easy way to install the necessary libraries.

 

This is a call out for digital solders. Lets find these Commies so we can drive them from our shores.

Anonymous ID: de850f Dec. 13, 2020, 9:45 p.m. No.12017616   🗄️.is 🔗kun

>>12017045

##In progress; Early, non-optimized functional version. Proof of concept.

##Name CSV of known commies as 'commies.csv', dataset to test against 'test2.csv'. Both must have name as first column

 

import hmni

import pandas as pd

import tensorflow as tf

 

matcher = hmni.Matcher()

d1 = pd.read_csv('commies.csv')

d2 = pd.read_csv('test2.csv')

commies = d1.iloc[:,0].tolist()

to_compare = d2.iloc[:,0].tolist()

com_rows = list()

compd_rows = list()

 

##Wanting to parallelize loop with tensorflow

for i in range(len(commies)-1):

for j in range(len(to_compare)-1):

name = to_compare[j]

name = name.replace('.','')

if ',' in name:

n = name.split(',')

name = n[1] + ' ' + n[0]

if ' ' in name:

name = name.replace(' ', ' ')

if matcher.similarity(commies[i],name) >= .5:

if j not in compd_rows:

compd_rows.append(j)

 

compd_rows.sort()

compd_rows = list(set(compd_rows))

to_check = d2.iloc[compd_rows,:]

to_check.to_csv('potential_commies.scv')