dChan - Q Origins Project Archive

Q-Learning

Have anons seen this before? It's related to machine learning and game theory.

https://qresear.ch/?q=q-learning returns no exact results so posting here.

-

Q-learning is a model-free reinforcement learning algorithm. The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances. It does not require a model (hence the connotation "model-free") of the environment, and it can handle problems with stochastic transitions and rewards, without requiring adaptations.
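For reference, a sketch of the standard one-step update behind that description. The learning rate \alpha and discount factor \gamma are the usual textbook symbols, not anything quoted above. The update uses only one observed transition (s_t, a_t, r_{t+1}, s_{t+1}), which is why no model of the environment is needed:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```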

For any finite Markov decision process (FMDP), Q-learning finds a policy that is optimal in the sense that it maximizes the expected value of the total reward over any and all successive steps, starting from the current state.[1] Q-learning can identify an optimal action-selection policy for any given FMDP, given infinite exploration time and a partly-random policy.[1] "Q" names the function that returns the reward used to provide the reinforcement and can be said to stand for the "quality" of an action taken in a given state.[2]

https://en.wikipedia.org/wiki/Q-learning
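A minimal sketch of tabular Q-learning with a partly-random (epsilon-greedy) policy, as described in the quoted text. Everything here is an assumption made for illustration: the 5-state chain, the slip probability, the names `step`, `greedy` and `q_learning`, and the hyperparameter values. None of it comes from the Wikipedia article.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
SLIP = 0.1            # chance the chosen move is reversed (stochastic transition)


def step(state, action, rng):
    """Toy environment. The learner only ever sees samples from this function."""
    if rng.random() < SLIP:
        action = 1 - action
    nxt = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1


def greedy(q_row, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Partly-random behaviour: explore with probability epsilon.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q[state], rng)
            nxt, reward, done = step(state, action, rng)
            # Terminal states contribute no future value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q_table = q_learning()
# Greedy policy read off the learned table for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

The table is updated from sampled transitions only; `step` is never inspected by the learner, which is the "model-free" part. The finite episode count stands in for the "infinite exploration time" in the quote, so the result is an approximation rather than a guaranteed optimum.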

Part 1 of 2

Q-Learning

Collection of interdasting sauces for learning about Q-learning and MARL (Multi-Agent Reinforcement Learning).

-

https://en.wikipedia.org/wiki/Q-learning

https://en.wikipedia.org/wiki/Model-free_(reinforcement_learning)

https://en.wikipedia.org/wiki/Reinforcement_learning

https://en.wikipedia.org/wiki/Multi-agent_system

https://en.wikipedia.org/wiki/Quantum_machine_learning#Quantum-enhanced_reinforcement_learning

-

https://github.com/LantaoYu/MARL-Papers

Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations

http://www.masfoundations.org/mas.pdf (532-page book, PDF, Stanford)

An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning

https://www.cs.cmu.edu/~mmv/papers/00TR-mike.pdf (sponsored by the United States Air Force, mentions DARPA)

A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

https://arxiv.org/pdf/1711.00832.pdf

Multi-Agent Reinforcement Learning: An Overview

http://www.dcsc.tudelft.nl/~bdeschutter/pub/rep/10_003.pdf

EAQR: A Multiagent Q-Learning Algorithm for Coordination of Multiple Agents

http://downloads.hindawi.com/journals/complexity/2018/7172614.pdf

Extending Q-Learning to General Adaptive Multi-Agent Systems

https://papers.nips.cc/paper/2503-extending-q-learning-to-general-adaptive-multi-agent-systems.pdf

Part 2 of 2

Oops, my bad, posts were supposed to be linked

>>5578131, >>5578135 Q-Learning

Anonymous ID: f4ddea March 8, 2019, 12:28 p.m. No.5578220 >>8307

>>5578173

Copypasta error, Baker. Got the POTUS/FLOTUS post included twice, errant insert into Q-Learning:

>5578133 POTUS and FLOTUS wheels up to Maralago

>5578131 , >5578133 Q-Learning

Should be:

>>5578131, >>5578135 Q-Learning

ThankQ

>>5578131

Q-Learning hand-waving and breakdown re: https://en.wikipedia.org/wiki/Q-learning

>Q-learning is a model-free reinforcement learning algorithm.

Knowledge is Power. Anons reinforce learning as Q drops are decoded and future proves past.

>The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances. It does not require a model (hence the connotation "model-free") of the environment, and it can handle problems with stochastic transitions and rewards, without requiring adaptations.

Q continues to provide drops and says anons have more than they know. However, Q doesn't explicitly outline connections between drops, i.e. a model-free environment.

>For any finite Markov decision process (FMDP), Q-learning finds a policy that is optimal in the sense that it maximizes the expected value of the total reward over any and all successive steps, starting from the current state.[1]

Speculation: most Q drops were created long in advance. Military/intelligence planning at its finest, years in advance. Q thus selects the comm||action that provides the maximal expected total return at the time, given the current state of the "world" (all actors, good and bad) and anons' current knowledge (i.e. state), and provides it as a drop here. Why? Moves and counter-moves. Comm selection requires continual adjustment and fine-tuning/selection. Thinking Stingers here… see last point below.

>Q-learning can identify an optimal action-selection policy for any given FMDP, given infinite exploration time and a partly-random policy.[1]

Hivemind has infinite exploration time and infinite resources; all actors act anonymously and partly at random. We have no idea who the anon beside us is (unless they filled out the damn email field), but all have the same end goal, achieved by partly-random, but directed, action(s)/policy.

>"Q" names the function that returns the reward used to provide the reinforcement and can be said to stand for the "quality" of an action taken in a given state.[2]

Interdasting: Q is quoted in the Wikipedia article. "Q" names the function that returns the reward = Non-stinger Drops? Why? Stingers align to actions taken, e.g. satcom knockout, SJAH [-48], etc. (see speculation above). Non-stinger drops enlighten anons and enhance their knowledge and power. Re-read: used to provide the reinforcement for the "quality" of an action taken in a given state. I.e. the Stinger is the action which produces the result; the non-stinger comm reinforces the quality/result of the action taken in the minds of anons.

Summary: Q's drop/comm methodology is based on Q-Learning.

>>5578666

devilishly clever? trips would seem to confirm, kek!

>>5578038

Flag looks familiar, anon. I think we're the anons who lifted each other's graphs not too far back! kek