Anonymous ID: f4ddea March 8, 2019, 12:21 p.m. No.5578131   >>8161 >>8173 >>8220 >>8446 >>8666 >>8687

Q-Learning

 

Have anons seen this before? Related to machine learning and game theory.

 

https://qresear.ch/?q=q-learning returns no exact results, so posting here.

 

-

 

Q-learning is a model-free reinforcement learning algorithm. The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances. It does not require a model (hence the connotation "model-free") of the environment, and it can handle problems with stochastic transitions and rewards, without requiring adaptations.

 

For any finite Markov decision process (FMDP), Q-learning finds a policy that is optimal in the sense that it maximizes the expected value of the total reward over any and all successive steps, starting from the current state.[1] Q-learning can identify an optimal action-selection policy for any given FMDP, given infinite exploration time and a partly-random policy.[1] "Q" names the function that returns the reward used to provide the reinforcement and can be said to stand for the "quality" of an action taken in a given state.[2]

 

https://en.wikipedia.org/wiki/Q-learning
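
For anons who want the nuts and bolts, here is a minimal sketch of tabular Q-learning in Python. The toy 5-state chain world, the reward of 1 for reaching the end, and the hyperparameters (alpha, gamma, epsilon) are all invented for illustration; only the update rule itself comes from the article.

import random

# Toy chain world (invented for illustration): states 0..4, actions 0 = left,
# 1 = right; reaching state 4 ends the episode with reward 1, else reward 0.
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

# Q-table: estimated "quality" of each action in each state, initialized to 0.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

for episode in range(500):
    state = 0
    done = False
    while not done:
        # Partly-random (epsilon-greedy) action selection, ties broken at random.
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)
        else:
            best = max(Q[state])
            action = random.choice([a for a in range(N_ACTIONS) if Q[state][a] == best])
        next_state, reward, done = step(state, action)
        # The model-free update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

for s, row in enumerate(Q):
    print(s, [round(q, 2) for q in row])  # "right" should score higher in every state

Note the single line updating Q[state][action] is the whole algorithm; everything else is scaffolding. No transition model of the world is ever built, hence model-free.
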

 

Part 1 of 2

Anonymous ID: f4ddea March 8, 2019, 12:21 p.m. No.5578135   >>8161 >>8220 >>8446 >>8687

Q-Learning

 

Collection of interdasting sauces for learning about Q-learning and MARL (Multi-Agent Reinforcement Learning).

 

-

 

https://en.wikipedia.org/wiki/Q-learning

 

https://en.wikipedia.org/wiki/Model-free_(reinforcement_learning)

 

https://en.wikipedia.org/wiki/Reinforcement_learning

 

https://en.wikipedia.org/wiki/Multi-agent_system

 

https://en.wikipedia.org/wiki/Quantum_machine_learning#Quantum-enhanced_reinforcement_learning

 

-

 

https://github.com/LantaoYu/MARL-Papers

 

Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations

http://www.masfoundations.org/mas.pdf (532-page book/PDF, Stanford)

 

An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning

https://www.cs.cmu.edu/~mmv/papers/00TR-mike.pdf (sponsored by the United States Air Force, mentions DARPA)

 

A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

https://arxiv.org/pdf/1711.00832.pdf

 

Multi-Agent Reinforcement Learning: An Overview

http://www.dcsc.tudelft.nl/~bdeschutter/pub/rep/10_003.pdf

 

EAQR: A Multiagent Q-Learning Algorithm for Coordination of Multiple Agents

http://downloads.hindawi.com/journals/complexity/2018/7172614.pdf

 

Extending Q-Learning to General Adaptive Multi-Agent Systems

https://papers.nips.cc/paper/2503-extending-q-learning-to-general-adaptive-multi-agent-systems.pdf

 

Part 2 of 2

Anonymous ID: f4ddea March 8, 2019, 1:07 p.m. No.5578666   >>8701 >>8773

>>5578131

 

Q-Learning hand-waving and breakdown re: https://en.wikipedia.org/wiki/Q-learning

 

>Q-learning is a model-free reinforcement learning algorithm.

 

Knowledge is Power. Anons reinforce learning as Q drops are decoded and future proves past.

 

>The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances. It does not require a model (hence the connotation "model-free") of the environment, and it can handle problems with stochastic transitions and rewards, without requiring adaptations.

 

Q continues to provide drops and says anons have more than they know. However, Q doesn't explicitly outline connections between drops, i.e., no model of the environment is provided. Model-free.

 

>For any finite Markov decision process (FMDP), Q-learning finds a policy that is optimal in the sense that it maximizes the expected value of the total reward over any and all successive steps, starting from the current state.[1]
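
In symbols (standard RL notation, not from the article text quoted above): the return is the discounted sum of rewards, the optimal Q-function satisfies the Bellman optimality equation, and the optimal policy just takes the highest-Q action. Here \gamma \in [0, 1) is the discount factor weighting future rewards.

\[ G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1} \]

\[ Q^*(s, a) = \mathbb{E}\!\left[ r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \,\middle|\, s_t = s,\ a_t = a \right] \]

\[ \pi^*(s) = \arg\max_{a} Q^*(s, a) \]
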

 

Speculation: most Q drops were created long in advance. Military/intelligence planning at its finest, years in advance. Q thus selects the comm||action that provides the maximal expected total return at the time, given the current state of the "world" (all actors, good and bad) and anons' current knowledge (i.e., state), and posts it as a drop here. Why? Moves and counter-moves. Comm selection requires continual adjustment and fine-tuning/selection. Thinking Stingers here… see last point below.

 

>Q-learning can identify an optimal action-selection policy for any given FMDP, given infinite exploration time and a partly-random policy.[1]
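
That "partly-random policy" is usually implemented as epsilon-greedy selection: exploit the best-known action most of the time, but keep a floor of random exploration so no action is ever starved. A quick sketch (the Q-values and epsilon here are invented):

import random
from collections import Counter

def epsilon_greedy(q_values, epsilon=0.1):
    """Best-known action most of the time, a uniformly random one otherwise."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Even when action 0 looks clearly best, actions 1 and 2 keep getting sampled,
# which is what the "infinite exploration time" condition requires in the limit.
counts = Counter(epsilon_greedy([1.0, 0.0, 0.0]) for _ in range(10_000))
print(counts)  # roughly {0: 9330, 1: 330, 2: 330}
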

 

Hivemind has infinite exploration time and infinite resources; all actors acting anonymously and partly randomly. We have no idea who the anon beside us is (unless they filled out the damn email field), but all have the same end goal, achieved by partly-random, but directed, action(s)/policy.

 

>"Q" names the function that returns the reward used to provide the reinforcement and can be said to stand for the "quality" of an action taken in a given state.[2]

 

Interdasting that "Q" is quoted in the Wikipedia article. "Q" names the function that returns the reward = non-stinger drops? Why? Stingers align to actions taken, e.g., satcom knockout, SJAH [-48], etc. (see speculation above). Non-stinger drops enlighten anons and enhance their knowledge and power. Re-read: used to provide the reinforcement for the "quality" of an action taken in a given state. I.e., the Stinger is the action which produces the result; the non-stinger comm reinforces the quality/result of that action in the minds of anons.

 

Summary: Q's drop/comm methodology is based on Q-Learning.