All posts
-
3. Week_4-3_Actor-Critic for continuing tasks · Uncategorized · 2021. 6. 21. 05:12
3. Actor-Critic for Continuing Tasks | Estimating the Policy Gradient | Actor-Critic Algorithm · Estimating the Policy Gradient | Derive a sample-based estimate for the gradient of the average reward objective. We have an objective for policy optimization. We also have the policy gradient theorem, which gives us a simple expression for the gradient of that objective. In this video, we'll complete..
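For reference, the usual sample-based estimate in the continuing setting replaces $q_\pi(S, A) - \hat{v}(S)$ with the differential TD error $\delta = R - \bar{R} + \hat{v}(S') - \hat{v}(S)$, giving the actor step $\theta \leftarrow \theta + \alpha^\theta \delta \nabla \ln \pi(A \mid S, \theta)$. Below is a minimal sketch of that one-step actor-critic, assuming tabular (one-hot) features, a softmax actor, and a hypothetical two-state task. The task, the helper names `step` and `actor_critic`, and all step sizes are illustrative assumptions, not the course's code.

```python
import numpy as np

def softmax(prefs):
    """Action probabilities from a vector of action preferences."""
    z = prefs - prefs.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def step(state, action):
    """Hypothetical two-state continuing task: action 1 pays +1, action 0 pays 0,
    and the chosen action also becomes the next state."""
    reward = 1.0 if action == 1 else 0.0
    return reward, action

def actor_critic(num_steps=5000, alpha_theta=0.1, alpha_w=0.1, alpha_rbar=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n_states, n_actions = 2, 2
    theta = np.zeros((n_states, n_actions))  # actor: tabular action preferences
    w = np.zeros(n_states)                   # critic: tabular differential values v(s)
    r_bar = 0.0                              # running estimate of the average reward
    s = 0
    for _ in range(num_steps):
        pi = softmax(theta[s])
        a = int(rng.choice(n_actions, p=pi))
        r, s_next = step(s, a)
        delta = r - r_bar + w[s_next] - w[s]  # differential TD error
        r_bar += alpha_rbar * delta           # average-reward estimate update
        w[s] += alpha_w * delta               # critic: TD(0) update on v(s)
        grad_ln_pi = -pi                      # grad of ln pi(a|s) for tabular softmax: onehot(a) - pi
        grad_ln_pi[a] += 1.0
        theta[s] += alpha_theta * delta * grad_ln_pi  # actor: step along delta * grad ln pi
        s = s_next
    return theta, w, r_bar

theta, w, r_bar = actor_critic()
```

The critic's $\delta$ serves double duty here: it updates $\bar{R}$ and $\hat{v}$, and it scales the actor's step, which is why the actor and critic are updated from the same transition.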
-
3. Week_4-2_Policy Gradient for Continuing Tasks · Uncategorized · 2021. 6. 21. 05:11
2. Policy Gradient for Continuing Tasks | The Objective for Learning Policies | The Policy Gradient Theorem · The Objective for Learning Policies | Describe the objective for policy gradient algorithms. Now that we've introduced the idea of parameterizing policies directly, we're ready to talk about how we can learn to improve a parameterized policy. Just like with action-value based methods, the..
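In the continuing setting the objective is the average reward, $r(\pi) = \sum_s \mu(s) \sum_a \pi(a \mid s, \theta) \sum_{s', r} p(s', r \mid s, a)\, r$, where $\mu$ is the stationary state distribution under $\pi$. The sketch below evaluates it exactly for a small, fully known MDP, assuming a tabular softmax policy and a hypothetical two-state, two-action MDP stored as arrays `P[s, a, s']` (transition probabilities) and `R[s, a]` (expected rewards). A learning agent never has this model; the point is only to make concrete what is being maximized.

```python
import numpy as np

def softmax_policy(theta):
    """Tabular softmax policy: pi[s, a] = exp(theta[s, a]) / sum_b exp(theta[s, b])."""
    z = theta - theta.max(axis=1, keepdims=True)  # shift each row for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def stationary_distribution(P_pi):
    """Solve mu @ P_pi = mu with sum(mu) = 1.
    Assumes the chain under pi has a unique stationary distribution."""
    n = P_pi.shape[0]
    A = P_pi.T - np.eye(n)
    A[-1, :] = 1.0  # swap one redundant balance equation for the normalization constraint
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

def average_reward(pi, P, R):
    """r(pi) = sum_s mu(s) sum_a pi(a|s) R[s, a], where R[s, a] already holds
    the expected reward sum_{s', r} p(s', r | s, a) r."""
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi
    mu = stationary_distribution(P_pi)
    return float(mu @ (pi * R).sum(axis=1))

# Hypothetical MDP: action a always moves the agent to state a.
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
R = np.array([[0.0, 1.0],   # state 0: stay -> 0, move to state 1 -> +1
              [2.0, 0.0]])  # state 1: move to state 0 -> +2, stay -> 0
```

With all preferences at zero, `average_reward(softmax_policy(np.zeros((2, 2))), P, R)` comes out at about 0.75, while the deterministic policy that alternates between the two states earns 1.5 per step; policy gradient methods climb $r(\pi)$ by adjusting $\theta$ toward the latter.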
-
3. Week_4-1_Learning Parameterized Policies · Uncategorized · 2021. 6. 21. 05:10
Week_4 INDEX | Learning Parameterized Policies | Learning Policies Directly | Advantages of Policy Parameterization | Policy Gradient for Continuing Tasks | The Objective for Learning Policies | The Policy Gradient Theorem | Actor-Critic for Continuing Tasks | Estimating the Policy Gradient | Actor-Critic Algorithm | Policy Parameterizations | Actor-Critic with Softmax Policies | Demonstration with Actor-Critic | Gaussia..
-
3. Week_3-4_Week 3 Review · Uncategorized · 2021. 6. 21. 05:10
Week 3 Review | This week, we talked about how to do control when using function approximation. Let's go over the main ideas. Summary | That's all for this week. We extended our tabular control algorithms to function approximation, discussed how exploration changes, and introduced a new way to think about the control problem. Next week, we'll talk all about how to do reinforcement learning without l..
-
3. Week_3-3_Average Reward · Uncategorized · 2021. 6. 21. 05:08
3. Average Reward | Average Reward: A New Way of Formulating Control Problems | Satinder Singh: On Intrinsic Rewards | Week 3 Review · Average Reward: A New Way of Formulating Control Problems | Describe the average reward setting. Explain when average reward optimal policies are different from policies obtained under discounting. Understand different value function..
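A quick worked calculation shows how discounting and average reward can disagree. The two-loop task below is a hypothetical example (loop lengths and rewards are chosen for illustration): from a start state, "left" enters a 5-step loop paying +1 on its first step and "right" a 5-step loop paying +2 on its last. The discounted values are $1/(1-\gamma^5)$ and $2\gamma^4/(1-\gamma^5)$, so discounting prefers left exactly when $2\gamma^4 < 1$, i.e. $\gamma < 2^{-1/4} \approx 0.84$, while the average reward ($0.2$ vs. $0.4$ per step) always prefers right.

```python
def discounted_value(cycle_rewards, gamma):
    """Discounted value at the start of a reward cycle that repeats forever:
    (sum_k gamma^k * r_k over one cycle) / (1 - gamma^len(cycle))."""
    n = len(cycle_rewards)
    one_cycle = sum(gamma**k * r for k, r in enumerate(cycle_rewards))
    return one_cycle / (1.0 - gamma**n)

def average_reward(cycle_rewards):
    """Long-run reward per step of the same repeating cycle."""
    return sum(cycle_rewards) / len(cycle_rewards)

# Hypothetical two-loop task (illustrative numbers).
LEFT = [1, 0, 0, 0, 0]   # +1 on the first step of the loop
RIGHT = [0, 0, 0, 0, 2]  # +2 on the last step of the loop

for gamma in (0.5, 0.9):
    choice = "left" if discounted_value(LEFT, gamma) > discounted_value(RIGHT, gamma) else "right"
    print(f"gamma={gamma}: discounting prefers {choice}")
print("average reward prefers", "left" if average_reward(LEFT) > average_reward(RIGHT) else "right")
```

With $\gamma = 0.5$ the discounted agent settles on the left loop and collects only half the reward per step that the right loop offers; the average reward criterion has no discount to tune, so it picks the right loop directly.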
-
3. Week_3-2_Exploration under Function Approximation · Uncategorized · 2021. 6. 21. 05:08
2. Exploration under Function Approximation | Exploration under Function Approximation · Exploration under Function Approximation | Describe how optimistic initial values and $\epsilon$-greedy can be used with function approximation. The need to balance exploration and exploitation is one of the defining characteristics of the sequential decision-making problem. We've talked about several si..
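Both ideas carry over to a linear action-value function $\hat{q}(s, a, \mathbf{w}) = \mathbf{w}_a^\top \mathbf{x}(s)$; a minimal sketch follows. The stacked weight matrix `w[a]` and the helper names are assumptions for illustration. The optimistic initialization assumes binary features with exactly `n_active` features on in every state (as with tile coding); with other features, large initial weights do not guarantee optimistic estimates, and because of generalization, an update in one state can lower the optimistic estimates of states not yet visited.

```python
import numpy as np

def q_values(w, x):
    """Linear action values: q(s, a) = w[a] . x(s) for every action a."""
    return w @ x

def epsilon_greedy(w, x, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else a greedy one."""
    q = q_values(w, x)
    if rng.random() < epsilon:
        return int(rng.integers(len(q)))      # explore
    best = np.flatnonzero(q == q.max())       # exploit, breaking ties at random
    return int(rng.choice(best))

def optimistic_weights(n_actions, n_features, n_active, optimistic_value):
    """Assumes binary features with exactly n_active ones per state (e.g. tile coding),
    so every initial q(s, a) equals optimistic_value."""
    return np.full((n_actions, n_features), optimistic_value / n_active)

rng = np.random.default_rng(0)
w = optimistic_weights(n_actions=3, n_features=8, n_active=2, optimistic_value=10.0)
x = np.array([1, 0, 0, 1, 0, 0, 0, 0], dtype=float)  # hypothetical binary feature vector
```

Here every action starts at $\hat{q} = 10$, so an agent that sees smaller real rewards is pushed to try each action before the estimates come down; $\epsilon$-greedy keeps some exploration going after that.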
-
3. Week_3-1_Episodic SARSA with Function Approximation · Uncategorized · 2021. 6. 21. 05:07
Week_3 INDEX | Episodic SARSA with Function Approximation | Episodic SARSA with Function Approximation | Episodic SARSA in Mountain Car | Expected SARSA with Function Approximation | Exploration under Function Approximation | Exploration under Function Approximation | Average Reward | Average Reward: A New Way of Formulating Control Problems | Satinder Singh: On Intrinsic Rewards | Week 3 Rev..
-
3. Week_2-4_David Silver _ Week 2 Summary · Uncategorized · 2021. 6. 21. 05:05
· David Silver: Deep Learning + RL = AI? · Week 2 Summary | This week, we discussed methods for representing large and possibly continuous state spaces! → Ways to construct features. Week 2. Feature representation & updates through neural networks | A representation is an agent's internal encoding of the state! The agent constructs features to summarize the cu..
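As one concrete way of constructing features, here is a hedged sketch of tile coding for a scalar state (our choice of technique; the summary does not name one). Several offset tilings each contribute exactly one active binary feature, so nearby states share features and generalize to each other while distant states share none. The function name, its arguments, and the offset scheme (tiling $i$ shifted by $i/n$ of a tile width) are illustrative assumptions, not the course's tile coder.

```python
import numpy as np

def tile_code(x, low, high, n_tiles, n_tilings):
    """Binary feature vector for a scalar state x in [low, high].

    Each tiling cuts [low, high] into n_tiles equal tiles and is shifted by
    i / n_tilings of a tile width; one extra tile per tiling covers the shift.
    Exactly n_tilings features are active for any x.
    """
    x = min(max(x, low), high)                  # clip states outside the range
    width = (high - low) / n_tiles
    tiles_per_tiling = n_tiles + 1
    features = np.zeros(n_tilings * tiles_per_tiling)
    for i in range(n_tilings):
        offset = i * width / n_tilings
        idx = int((x - low + offset) // width)  # tile index within tiling i (0..n_tiles)
        features[i * tiles_per_tiling + idx] = 1.0
    return features

a = tile_code(0.10, 0.0, 1.0, n_tiles=4, n_tilings=2)
b = tile_code(0.15, 0.0, 1.0, n_tiles=4, n_tilings=2)
c = tile_code(0.90, 0.0, 1.0, n_tiles=4, n_tilings=2)
```

States 0.10 and 0.15 fall in the same tile of the first tiling, so `a @ b` is 1, while 0.10 and 0.90 share no tiles (`a @ c` is 0). A linear value function over these features therefore generalizes only between nearby states.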