Citation count: 394

Goal of combining the improvements: faster and more stable learning.

Shared base architecture: DQN

| Method | Idea | Problem addressed |
| --- | --- | --- |
| Double DQN | decouple selection and evaluation of actions | overestimation bias |
| Prioritized DDQN | prioritized experience replay | data efficiency |
| Dueling Prioritized DDQN | dueling network architecture | generalization across actions |
| A3C | multi-step learning with truncated n-step discounted returns (see the target below this table) | faster reward propagation; shifts the bias-variance trade-off |
| Distributional DQN | learn the distribution of returns | intrinsic randomness of returns |
| Noisy DQN | stochastic network layers | exploration |
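
For the multi-step row, the truncated n-step return and the corresponding target are roughly as in the Rainbow paper, with $\gamma_t^{(k)}$ the product of per-step discounts:

$$ R_t^{(n)} = \sum_{k=0}^{n-1} \gamma_t^{(k)} R_{t+k+1}, \qquad \Delta = R_t^{(n)} + \gamma_t^{(n)} \max\limits_{a'} q_{\bar{\theta}} (S_{t+n}, a') - q_{\theta} (S_t, A_t) $$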

Q-learning

$$ \Delta = R_{t+1} + \gamma_{t+1} \max\limits_{a'} q_{\bar{\theta}} (S_{t+1}, a') - q_{\theta} (S_t, A_t) $$
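
A minimal tabular sketch of this update, assuming placeholder sizes and hyperparameters (`n_states`, `n_actions`, `alpha`, `gamma` are assumptions, and `Q_target` stands in for the frozen parameters $\bar{\theta}$):

```python
import numpy as np

n_states, n_actions = 16, 4      # placeholder sizes
alpha, gamma = 0.1, 0.99         # placeholder learning rate and discount

Q = np.zeros((n_states, n_actions))   # online estimate q_theta
Q_target = Q.copy()                   # periodically synced target q_theta_bar

def q_learning_update(s, a, r, s_next, done=False):
    """One Q-learning step: bootstrap with the max over next actions under the target table."""
    bootstrap = 0.0 if done else Q_target[s_next].max()   # gamma_{t+1} is effectively 0 at episode end
    td_error = r + gamma * bootstrap - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error
```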

Double Q-learning

$$ \Delta = R_{t+1} + \gamma_{t+1} q_{\bar{\theta}} (S_{t+1}, \argmax\limits_{a'} q_{\theta} (S_{t+1}, a')) - q_{\theta} (S_t, A_t) $$
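
Continuing the tabular sketch above (same placeholder tables `Q` and `Q_target`), the only change is that the next action is selected by the online table and evaluated by the target table:

```python
def double_q_update(s, a, r, s_next, done=False):
    """Double Q-learning step: online table selects a', target table evaluates it."""
    a_star = int(Q[s_next].argmax())                        # selection with q_theta
    bootstrap = 0.0 if done else Q_target[s_next, a_star]   # evaluation with q_theta_bar
    td_error = r + gamma * bootstrap - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error
```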

Prioritized Replay

$$ p_t \propto |\Delta|^\omega $$
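
A rough sketch of proportional prioritization (list-based for clarity; `omega` and the importance-sampling exponent `beta` are assumed values, and a real implementation would use a sum-tree for efficient sampling):

```python
import numpy as np

omega, beta = 0.5, 0.4      # priority and IS-weight exponents (placeholder values)
buffer, priorities = [], []

def add(transition, td_error):
    """Store a transition with priority proportional to |delta|^omega."""
    buffer.append(transition)
    priorities.append(abs(td_error) ** omega)

def sample(batch_size):
    """Sample indices with probability proportional to priority; return IS weights."""
    p = np.asarray(priorities)
    probs = p / p.sum()
    idx = np.random.choice(len(buffer), size=batch_size, p=probs)
    weights = (len(buffer) * probs[idx]) ** (-beta)   # importance-sampling correction
    weights /= weights.max()                          # normalize so weights are at most 1
    return [buffer[i] for i in idx], idx, weights
```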

Dueling network https://arxiv.org/abs/1511.06581.pdf
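
For reference, the aggregation used by the dueling architecture splits a shared encoder $f_\xi$ into a value stream $v_\eta$ and an advantage stream $a_\psi$ (notation as in the Rainbow paper):

$$ q_\theta(s, a) = v_\eta(f_\xi(s)) + a_\psi(f_\xi(s), a) - \frac{1}{N_{\text{actions}}} \sum_{a'} a_\psi(f_\xi(s), a') $$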
