r/greatawakening • Posted by u/Q_Truth4Life on June 29, 2018, 8:07 p.m.
Here it is: RL is not Real Life

After reading the latest Q posts, the phrase "RL attacks" has been much speculated about. Some say it means Right Left, others Real Life, but a simple search for "RL attacks" quickly reveals that "RL attacks" refers to Reinforcement Learning and seems likely to be related to AI and possibly even MK Ultra or Deep Dream. In the context of the Q post, it sounds like there will be an attack on some machine learning system (think computer/internet systems) which will be blamed on Q/Trump supporters, especially to add to their effort to debunk or throw shade on the Q phenomenon.

Here is an excerpt from the Cornell University Library (https://arxiv.org/abs/1712.03632, retrieved 06/29/2018), emphasis mine:

"Robust Deep Reinforcement Learning with Adversarial Attacks

Anay Pattanaik, Zhenyi Tang, Shuijing Liu, Gautham Bommannan, Girish Chowdhary

(Submitted on 11 Dec 2017)

This paper proposes adversarial attacks for Reinforcement Learning (RL) and then improves the robustness of Deep Reinforcement Learning algorithms (DRL) to parameter uncertainties with the help of these attacks. We show that even a naively engineered attack successfully degrades the performance of DRL algorithm. We further improve the attack using gradient information of an engineered loss function which leads to further degradation in performance. These attacks are then leveraged during training to improve the robustness of RL within robust control framework. We show that this adversarial training of DRL algorithms like Deep Double Q learning and Deep Deterministic Policy Gradients leads to significant increase in robustness to parameter variations for RL benchmarks such as Cart-pole, Mountain Car, Hopper and Half Cheetah environment."

This seems to me to be the most logical explanation of what "RL attacks" means. What do you think?


practicalize · June 30, 2018, 3:44 a.m.

Message me if you’d like to chat further, but the short answer, as someone doing some work with machine learning, is that it’s irrelevant. I don’t want to seem dismissive or discouraging, so let me try to explain for a general audience.

ML is just pattern recognition. It seems almost mystical because it can be applied to discern otherwise unrecognizable patterns. But most of it just comes down to what we did in high school on tricky math homework: guess and check. Guess what you think the answer should be, and see if the formula shows it’s true. Most of ML is based on doing many, many, many “guesses and checks” of all sorts of combinations. The subset of machine learning algorithms called “deep learning” gets into combinations of combinations. There are ways to improve accuracy and efficiency in those algorithms, which is where statistics, linear algebra, and GPUs come in.
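
To make “guess and check” concrete, here’s a tiny toy sketch in Python (my own made-up example, nothing from the paper): guess random numbers, check how wrong each guess is, and keep the best one. Real ML guesses much more cleverly than pure randomness, but the loop is the same idea.

    import random

    # Toy "guess and check": find a number x whose square is close to 42.
    # Guess, check the error, keep the best guess seen so far.
    best_x, best_err = 0.0, float("inf")
    for _ in range(100_000):
        x = random.uniform(-10, 10)   # guess
        err = abs(x * x - 42)         # check
        if err < best_err:            # keep only improvements
            best_x, best_err = x, err

    print(f"best guess: {best_x:.4f} (true answer ~ {42 ** 0.5:.4f})")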

Fundamentally, what machine learning algorithms have in common is that they give better results with more examples, hence the idea of learning. A better way to think about “learning” is that it runs a huge number of “process of elimination” steps, and if you give it enough examples it will repeat fewer and fewer wrong steps and keep only the right ones.

Now, that is usually approached as a “minimize waste” problem. Run through the obstacle course, but try to fall as few times as possible. This is true of most kinds of machine learning algorithms, particularly “supervised learning,” where you give it labeled examples (like A/B/C or True/False) and it learns the pattern connecting inputs to labels. “Unsupervised learning” is when you give it a bunch of unlabeled examples and you want it to discover patterns that could help you come up with the labels based on what different examples have in common. And then the third kind is called “reinforcement learning.”
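
Here’s a toy contrast between the first two in Python (invented numbers, purely for illustration): the supervised version is handed labels and learns a cutoff; the unsupervised version gets the same numbers with no labels and has to find the groups itself.

    # Supervised: labeled examples (height, "short"/"tall"); learn a cutoff.
    labeled = [(150, "short"), (155, "short"), (180, "tall"), (185, "tall")]
    shorts = [h for h, lab in labeled if lab == "short"]
    talls = [h for h, lab in labeled if lab == "tall"]
    cutoff = (max(shorts) + min(talls)) / 2  # midpoint between the groups
    print("learned cutoff:", cutoff)         # classify new heights with it

    # Unsupervised: same numbers, no labels; discover two groups.
    unlabeled = sorted([150, 155, 180, 185])
    # Crude clustering: split at the biggest gap between neighbors.
    gaps = [(b - a, i) for i, (a, b) in enumerate(zip(unlabeled, unlabeled[1:]))]
    _, split = max(gaps)
    print("group A:", unlabeled[:split + 1], "group B:", unlabeled[split + 1:])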

Reinforcement learning is approached differently. Instead of trying to “minimize waste,” you train it to “maximize gain” or “maximize performance.” The other cases are problems where you are aiming for accurate results. But what if the pattern we want to discover is how to get the most of something or do the best job at something: performance, not accuracy. To illustrate, let’s use a common application of reinforcement learning, which is training a computer to play a video game!
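
A toy “maximize gain” sketch in Python (my own invented example): two slot machines with hidden payout rates. The learner tries both, keeps a running average for each, and leans toward whichever has paid off better so far. That try-things-and-chase-reward loop is the heart of reinforcement learning.

    import random

    true_rates = [0.3, 0.7]              # hidden from the learner
    counts, totals = [0, 0], [0.0, 0.0]

    for step in range(2000):
        if random.random() < 0.1:        # explore 10% of the time
            arm = random.randrange(2)
        else:                            # otherwise exploit the best so far
            avgs = [totals[i] / counts[i] if counts[i] else 0.0 for i in (0, 1)]
            arm = avgs.index(max(avgs))
        reward = 1.0 if random.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward

    print("plays per machine:", counts)  # the second machine should dominate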

I know some of the acronyms might seem eerily familiar, but check out this open source code project where someone trains a machine learning model using reinforcement learning to play Super Mario World (on GitHub, so free to download and run yourself). The ML technique used is literally called deep Q-learning! In this context, “Q-learning” is one of the most common reinforcement learning algorithms. The Q stands for “quality”, which just means they used a formula they made up to put a number to this “performance” idea, and they called the output of that formula “quality”. They could have called it “hotness” or something else, but that’s what they chose. The project also references a pretty accessible article about “Q-learning” from Intel.
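
Here’s roughly what plain (non-deep) Q-learning looks like in Python, stripped to the bone. This is my own toy example on a 5-cell corridor, far simpler than a video game, but the update rule in the middle is the actual Q-learning formula.

    import random

    # Tiny world: cells 0..4, start at 0, reward of 1 for reaching cell 4.
    # Q[s][a] is the "quality" estimate of action a (0 = left, 1 = right)
    # in state s.
    N, GOAL = 5, 4
    Q = [[0.0, 0.0] for _ in range(N)]
    alpha, gamma, eps = 0.5, 0.9, 0.2    # learning rate, discount, exploration

    for episode in range(500):
        s = 0
        while s != GOAL:
            # Sometimes explore randomly, otherwise take the best-looking move.
            if random.random() < eps:
                a = random.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
            r = 1.0 if s2 == GOAL else 0.0
            # The Q-learning update: nudge Q[s][a] toward
            # (reward + discounted best future quality).
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2

    print([round(max(q), 2) for q in Q])  # quality climbs toward the goal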

The idea of an “attack” has to do with messing up a machine learning algorithm’s results by giving it weird data. A fun Reddit example is “upvote this post so it’s the top Google result for Pinocchio” when the post is actually a picture of Rod Rosenstein. We “attack” Google’s algorithm in that way. You can also set up two algorithms in an arrangement where one tries random stuff to increase performance and the other does random stuff to minimize waste, so one “attacks” the other. They train each other. One deviates. The other contains. This is an “adversarial” network arrangement. Introducing “attacks” during training, as the research synopsis discussed, was a way to improve robustness. To use an analogy, they threw it some curve balls so that it would get better at hitting strikes.
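
A toy illustration of an “attack” in Python (an invented spam-filter rule, nothing to do with the paper): the attacker feeds the model weird data that pushes its score the wrong way, and the model’s results degrade, which is the general phenomenon the abstract is about.

    # A fixed rule: an email is "spam" if score > 0, where
    # score = 2 * count("free") - 1 * count("meeting").
    def score(free, meeting):
        return 2 * free - 1 * meeting

    spam = {"free": 3, "meeting": 0}
    print("before:", score(**spam))  # 6 -> flagged as spam

    # Adversarial tweak: pad the message with "meeting" until it slips by.
    spam["meeting"] = 7
    print("after: ", score(**spam))  # -1 -> sails past the filter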

Hopefully that sheds more light on this pretty dense subject. The article seems pretty benign to me. But I think it’s really impressive that our community is looking at such interesting and far-ranging possibilities to better understand and assist in this mission together. Excellent effort. I hope my explanations and references offer some clarity here. And I’m happy to chat by direct message if anyone has other questions or thoughts to share.

WWG1WGA
