CCP leak statistics analysis
Also interesting: the latest version of my sqlite-utils CLI tool adds a new "sqlite-utils analyze-tables" command, run here as "sqlite-utils analyze-tables shanghai-ccp-member.db", which outputs summary statistics about each table column. Documentation here: https://sqlite-utils.readthedocs.io/en/stable/cli.html#analy…
Here's partial output from running it against that database file:
[English translation]
member.name: (2/11)
Total rows: 1956731
Null rows: 0
Blank rows: 0
Distinct values: 1072319
Most common:
650: Zhang Wei
621: Zhang Min
620: Wang Wei
579: Zhang Lei
507: Wang Lei
488: Chen Jie
479: Zhang Jie
448: Wang Yong
442: Li Wei
434: Zhang Jing
member.sex: (3/11)
Total rows: 1956731
Null rows: 0
Blank rows: 0
Distinct values: 2
Most common:
1229406: Male
727325: Female
member.ethnicity: (4/11)
Total rows: 1956731
Null rows: 0
Blank rows: 0
Distinct values: 51
Most common:
1935764: Han
7636: Hui
4588: Manchu
2282: Mongolian
1468: Tujia
954: Zhuang
741: Korean
638: Hmong
351: Bai
319: Dong
member.hometown: (5/11)
Total rows: 1956731
Null rows: 0
Blank rows: 0
Distinct values: 2875
Most common:
846850: Shanghai
350361: null
139903: Jiangsu
80958: Zhejiang
29935: Shanghai Baoshan
29705: Shandong
26366: Shanghai Chongming
26120: Anhui
17697: Shanghai Jiading
14745: Shanghai Nanhui
member.organization: (6/11)
Total rows: 1956731
Null rows: 0
Blank rows: 0
Distinct values: 77453
Most common:
3533: Retirement Branch
1925: First Party Branch
1796: Party branch of the agency
1759: School overdue and did not return
1552: Retired a department
1528: Second Party Branch
1508: Retired Second Branch
1443: Retired Party Branch
1243: Third Party Branch
955: Retired Party Branch
member.education: (11/11)
Total rows: 1956731
Null rows: 0
Blank rows: 0
Distinct values: 22
Most common:
589531: University
360788: College
300370: Junior High
230117: Ordinary high school
161803: Master's degree students
158041: Secondary specialist
93859: Elementary School
25460: PhD student
14729: Technical School
8894: Other
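For anyone curious how per-column figures like these can be derived, here is a rough sketch using only Python's standard sqlite3 module. It is not how sqlite-utils implements analyze-tables; the analyze_column function, its common_limit parameter, and the dictionary keys it returns are hypothetical names chosen for illustration, and the sample rows in the demo are made up.

```python
import sqlite3


def analyze_column(conn, table, column, common_limit=10):
    """Summarize one column: totals, nulls, blanks, distinct values, most common.

    Hypothetical helper for illustration, not part of sqlite-utils.
    """
    # Identifiers cannot be bound as SQL parameters, so quote them by hand.
    t = '"' + table.replace('"', '""') + '"'
    c = '"' + column.replace('"', '""') + '"'

    total, nulls, blanks, distinct = conn.execute(
        f"""
        select
            count(*),
            sum(case when {c} is null then 1 else 0 end),
            sum(case when {c} = '' then 1 else 0 end),
            count(distinct {c})
        from {t}
        """
    ).fetchone()

    # Most frequent values first; ties are broken by the value itself.
    most_common = conn.execute(
        f"select {c}, count(*) as n from {t} "
        f"group by {c} order by n desc, {c} limit ?",
        (common_limit,),
    ).fetchall()

    return {
        "total_rows": total,
        # sum() yields NULL on an empty table, so fall back to 0.
        "null_rows": nulls or 0,
        "blank_rows": blanks or 0,
        "distinct_values": distinct,
        "most_common": most_common,
    }


if __name__ == "__main__":
    # Tiny in-memory table with invented rows, standing in for the real file.
    demo = sqlite3.connect(":memory:")
    demo.execute("create table member (name text, hometown text)")
    demo.executemany(
        "insert into member values (?, ?)",
        [("Zhang Wei", "Shanghai"), ("Zhang Wei", "null"), ("Wang Wei", "Jiangsu")],
    )
    for col in ("name", "hometown"):
        print(col, analyze_column(demo, "member", col))
```

Note that a literal string "null" counts as an ordinary value here, not as a SQL NULL, which is presumably why the hometown column above can report "Null rows: 0" while still listing 350361 rows under "null".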
https://news.ycombinator.com/item?id=25410608
https://news.ycombinator.com/item?id=25411571