>>11724269
https://www3.nd.edu/~nchawla/papers/ECML03.pdf
SMOTEBoost: Improving Prediction
of the Minority Class in Boosting
1 Motivation and Introduction
Rare events are events that occur very infrequently, i.e., with a frequency ranging from, say, 5% down to less than 0.1%, depending on the application. Classification of rare events
is a common problem in many domains, such as detecting fraudulent transactions,
network intrusion detection, Web mining, direct marketing, and medical diagnostics.
For example, in the network intrusion detection domain, the number of intrusions on
the network is typically a very small fraction of the total network traffic. In medical
databases, when classifying the pixels in mammogram images as cancerous or not [1],
abnormal (cancerous) pixels represent only a very small fraction of the entire image.
The nature of the application requires a fairly high detection rate of the minority class
and allows for a small error rate in the majority class since the cost of misclassifying
a cancerous patient as non-cancerous can be very high.
In all these scenarios, where the majority class typically represents 98-99% of the
entire population, a trivial classifier that labels everything with the majority class can
achieve high accuracy. It is apparent that for domains with imbalanced and/or skewed
distributions, classification accuracy is not sufficient as a standard performance measure. ROC analysis [2] and metrics such as precision, recall and F-value [3, 4] have
been used to understand the performance of the learning algorithm on the minority
class. The prevalence of class imbalance in various scenarios has caused a surge in
research dealing with the minority classes. Several approaches for dealing with
imbalanced data sets were recently introduced [1, 2, 4, 9-15].
A confusion matrix as shown in Table 1 is typically used to evaluate performance
of a machine learning algorithm for rare class problems. In classification problems,
assuming class “C” as the minority class of interest, and “NC” as a conjunction of
all the other classes, there are four possible outcomes when detecting class “C”.
Table 1. Confusion matrix defines four possible scenarios when classifying class “C”
                     Predicted Class “C”      Predicted Class “NC”
Actual class “C”     True Positives (TP)      False Negatives (FN)
Actual class “NC”    False Positives (FP)     True Negatives (TN)
From Table 1, recall, precision and F-value may be defined as follows:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-value = ((1 + β²) · Recall · Precision) / (β² · Recall + Precision),
where β corresponds to the relative importance of precision vs. recall and is usually set
to 1. The main focus of all learning algorithms is to improve recall without sacrificing precision. However, the recall and precision goals are often conflicting, and
attacking them simultaneously may not work well, especially when one class is rare.
The F-value incorporates both precision and recall, and the “goodness” of a learning
algorithm for the minority class can be measured by the F-value. While ROC curves
represent the trade-off between the true positive and false positive rates, the F-value incorporates the relative effects/costs of recall and precision into a single number.
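As a minimal sketch (not from the paper), the snippet below shows how the Table 1 counts and the precision, recall, and F-value definitions above can be computed for a binary rare-class problem. The labels “C”/“NC”, the helper names, and the toy 5% minority data are illustrative assumptions only.

# Minimal sketch: confusion-matrix counts and F-value for a rare class "C".
# All names and the toy data below are illustrative assumptions, not the paper's code.

def confusion_counts(y_true, y_pred, positive="C"):
    # Count TP, FP, FN, TN treating `positive` as the minority class of interest.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn

def f_value(tp, fp, fn, beta=1.0):
    # F-value = (1 + beta^2) * Recall * Precision / (beta^2 * Recall + Precision)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return (1 + beta**2) * recall * precision / (beta**2 * recall + precision)

# Toy 5% minority example: 5 rare "C" instances among 100 samples.
y_true = ["C"] * 5 + ["NC"] * 95
y_pred = ["C", "C", "C", "NC", "NC"] + ["NC"] * 93 + ["C"] * 2  # 3 hits, 2 misses, 2 false alarms
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)                  # 3 2 2 93
print(round(f_value(tp, fp, fn), 3))   # 0.6 (precision = recall = 0.6 with beta = 1)

Note that the trivial classifier discussed above, which labels everything “NC”, would score 95% accuracy on this toy data yet an F-value of 0 for the rare class, which is exactly why accuracy alone is an insufficient measure here.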
It is well known in machine learning that a combination of classifiers can be an effective technique for improving prediction accuracy. As one of the most popular combining techniques, boosting [5] uses adaptive sampling of instances to generate a
highly accurate ensemble of classifiers whose individual global accuracy is only moderate. There has been significant interest in the recent literature in embedding cost-sensitivity into the boosting algorithm. The CSB [6] and AdaCost [7] boosting algorithms
update the weights of examples according to the misclassification costs. Karakoulas
and Shawe-Taylor’s ThetaBoost adjusts the margins in the presence of unequal loss
functions [8]. Alternatively, Rare-Boost [4, 9] updates the weights of the examples
differently for all four entries shown in Table 1.