3. Week_3-4_Week 3 Review

Uncategorized 2021. 6. 21. 05:10

Week 3 Review

This week, we talked about how to do control when using function approximation.
Let's go over the main ideas.

Representing actions

First, we showed you how to estimate action values with function approximation.

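If the action space is discrete,
it's probably easiest to stack the state features: the full feature vector has one block per action, the state features are copied into the block of the chosen action, and every other block is zero. Each action then effectively has its own weights, so nothing generalizes across actions.

If the action space is continuous or you want to generalize over actions,
the action can be passed as an input like any other state variable.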
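Both representations can be sketched in a few lines. This is a minimal illustration, assuming the state features are already a NumPy vector and a linear estimate q(s, a, w) = w · x(s, a); the function names are hypothetical, not from the course code.

```python
import numpy as np

def stacked_features(state_features, action, num_actions):
    """Build x(s, a) for a discrete action by stacking one block per action.

    The state features go into the block for `action`; every other block is
    zero, so each action effectively gets its own set of weights.
    """
    d = len(state_features)
    x = np.zeros(d * num_actions)
    x[action * d:(action + 1) * d] = state_features
    return x

def action_as_input(state_vars, action):
    """Continuous (or generalizing) alternative: treat the action as one more input."""
    return np.append(np.asarray(state_vars, dtype=float), action)

def linear_q(weights, x_sa):
    # Linear action-value estimate q(s, a, w) = w . x(s, a)
    return float(np.dot(weights, x_sa))

x = stacked_features(np.array([1.0, 0.0, 1.0]), action=1, num_actions=3)
print(x)  # [0. 0. 0. 1. 0. 1. 0. 0. 0.]
```

With stacking, the weights for action 0 never change when action 1 is updated; with the action as an input, a neural network (or other nonlinear model) can share what it learns between nearby actions.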

TD Control with Approximation

Let's get some context about the next part of the module by looking at the algorithm map.

Function approximation puts us on the left side of the map.
The focus of the first lecture was on the control algorithms in the bottom-left corner:
SARSA, Expected SARSA, and Q-learning.

These are all extensions of the tabular control algorithms we covered in Course 2.

The only difference between these algorithms and their tabular counterparts is in the update equations.
The updates are all adapted for function approximation in the same way: they use the gradient of the action-value estimate to update the weights.
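Since the three algorithms share the same weight update and differ only in the bootstrap value for the next state, they can be sketched together. This assumes a linear approximator q(s, a, w) = w · x(s, a), whose gradient with respect to w is just x(s, a); the function names and the string labels for the algorithms are hypothetical.

```python
import numpy as np

def bootstrap_value(next_qs, algorithm, next_action=None, next_policy=None):
    """Value of the next state S' used in the target.

    next_qs: array of q(S', a, w) for every action a
    """
    if algorithm == "sarsa":
        return float(next_qs[next_action])          # action actually selected in S'
    if algorithm == "expected_sarsa":
        return float(np.dot(next_policy, next_qs))  # expectation under the policy
    if algorithm == "q_learning":
        return float(np.max(next_qs))               # greedy action in S'
    raise ValueError(f"unknown algorithm: {algorithm}")

def semi_gradient_update(w, x_sa, reward, next_value, alpha, gamma, terminal=False):
    """One semi-gradient step for a linear estimate q(s, a, w) = w . x(s, a).

    The target is treated as a constant (that is what makes it a semi-gradient
    method), and the gradient of a linear q with respect to w is x(s, a).
    """
    target = reward if terminal else reward + gamma * next_value
    td_error = target - np.dot(w, x_sa)
    return w + alpha * td_error * x_sa
```

Swapping the algorithm only changes which `bootstrap_value` branch feeds the target; `semi_gradient_update` is identical for all three.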

We also saw how episodic SARSA could be used to solve the Mountain Car problem.
In this case, the larger step size of $0.5$ learned more quickly.

Exploration with Approximation

Next,
we talked about exploration.

Optimistic initialization
Optimistic initialization can be used with some structured feature representations, like tile coding.

But in general,
it's not clear how to optimistically initialize values with non-linear function approximators like neural networks,
and it might not behave as expected.

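For example,
the optimism may fade too quickly, because an update in one state can generalize and lower the estimates of states the agent has not visited yet.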
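For the tile-coding case, optimistic initialization is easy to get right. With tile coding, exactly one tile per tiling is active for any input, so the estimate is the sum of `num_tilings` weights; setting every weight to the optimistic value divided by the number of tilings makes every initial estimate equal that value. A minimal sketch, where the tile indices are hypothetical placeholders for what a real tile coder would return:

```python
import numpy as np

def optimistic_weights(num_features, num_tilings, optimistic_value):
    """Initial weights that make every tile-coded estimate equal optimistic_value.

    Exactly num_tilings tiles are active for any (s, a), so q(s, a, w) is the
    sum of num_tilings weights, each equal to optimistic_value / num_tilings.
    """
    return np.full(num_features, optimistic_value / num_tilings)

def q_from_active_tiles(w, active_tiles):
    # active_tiles: indices of the active tiles for (s, a), one per tiling
    return float(np.sum(w[active_tiles]))

w = optimistic_weights(num_features=4096, num_tilings=8, optimistic_value=1.0)
active = [3, 70, 512, 900, 1500, 2048, 3000, 4095]  # hypothetical tile indices
print(q_from_active_tiles(w, active))  # 1.0
```

Because each update only changes the weights of the few active tiles, states that share no tiles with visited ones keep their optimistic estimates. A neural network offers no such locality guarantee, which is why its optimism can wash out unpredictably.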

$\epsilon$-greedy
Epsilon-greedy can be used regardless of the function approximator, because it only needs the current action-value estimates.
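A minimal sketch of $\epsilon$-greedy selection makes the point concrete: the only input is the list of current estimates for the available actions, however they were computed. The function name and the random tie-breaking are illustrative choices, not taken from the course code.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Select an action from the current action-value estimates q(s, a, w).

    Only the estimates are needed, so the same code works with tile coding,
    a neural network, or any other function approximator.
    """
    q_values = np.asarray(q_values, dtype=float)
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: uniform random action
    best = np.flatnonzero(q_values == q_values.max())
    return int(rng.choice(best))  # exploit: greedy, ties broken at random

rng = np.random.default_rng(0)
action = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.1, rng=rng)
```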

Average Reward

After that,
we talked about a new way to think about the continuing control problem.

Instead of maximizing the discounted return from the current state,
we can think about maximizing the average reward that a policy receives over time.
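For reference, the average reward of a policy $\pi$ can be written as follows, in the Sutton and Barto notation the course follows, where $\mu_\pi$ is the steady-state distribution of states under $\pi$ and $p(s', r \mid s, a)$ gives the environment dynamics:

$$
r(\pi) \doteq \lim_{h \to \infty} \frac{1}{h} \sum_{t=1}^{h} \mathbb{E}\left[R_t \mid S_0, A_{0:t-1} \sim \pi\right]
= \sum_{s} \mu_\pi(s) \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\, r
$$

As a small worked example with made-up numbers: if, in steady state, a policy collects reward $2$ on 95% of steps and reward $1$ on the other 5%, then $r(\pi) = 0.95 \cdot 2 + 0.05 \cdot 1 = 1.95$.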

We defined differential returns and differential values.

These enable the agent to assess the relative value of actions in the average reward setting!
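Written out in the same notation (a standard formulation, with no discounting), the differential return subtracts $r(\pi)$ from every future reward, and the differential action value is its expectation, which satisfies a Bellman equation of the usual form:

$$
G_t \doteq R_{t+1} - r(\pi) + R_{t+2} - r(\pi) + R_{t+3} - r(\pi) + \cdots
$$

$$
q_\pi(s, a) \doteq \mathbb{E}_\pi\left[G_t \mid S_t = s, A_t = a\right]
= \sum_{s', r} p(s', r \mid s, a) \Big[ r - r(\pi) + \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a') \Big]
$$

Because $r(\pi)$ is subtracted at every step, the expected differential return stays bounded in a continuing task (under the usual ergodicity assumption), and comparing $q_\pi(s, a)$ across actions tells the agent which action leads to more reward than average.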

Differential semi-gradient SARSA

Finally,
we introduced differential semi-gradient SARSA,
which approximates differential action values to learn policies.
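A minimal sketch of the algorithm with a linear approximator and $\epsilon$-greedy action selection. The environment interface `env_step(state, action) -> (reward, next_state)`, the `features` function, and the default step sizes `alpha` (weights) and `beta` (average-reward estimate) are hypothetical choices for illustration. Each step uses the differential TD error $\delta = R - \bar{R} + q(S', A', w) - q(S, A, w)$, moves $\bar{R}$ by $\beta\delta$, and moves the weights by $\alpha\delta$ times the gradient.

```python
import numpy as np

def differential_sarsa(env_step, features, num_actions, start_state,
                       alpha=0.1, beta=0.01, epsilon=0.1, num_steps=10_000, seed=0):
    """Differential semi-gradient SARSA with a linear approximator (sketch).

    env_step(state, action) -> (reward, next_state)  # hypothetical continuing task
    features(state, action) -> 1-D array x(s, a)
    Returns the learned weights and the average-reward estimate R-bar.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(len(features(start_state, 0)))
    avg_reward = 0.0  # R-bar, the running estimate of r(pi)

    def q(s, a):
        return float(np.dot(w, features(s, a)))

    def choose(s):
        # epsilon-greedy over the current estimates, ties broken at random
        if rng.random() < epsilon:
            return int(rng.integers(num_actions))
        qs = np.array([q(s, a) for a in range(num_actions)])
        return int(rng.choice(np.flatnonzero(qs == qs.max())))

    s = start_state
    a = choose(s)
    for _ in range(num_steps):
        r, s_next = env_step(s, a)
        a_next = choose(s_next)
        # Differential TD error: no discount, the average reward is subtracted
        delta = r - avg_reward + q(s_next, a_next) - q(s, a)
        avg_reward += beta * delta
        w += alpha * delta * features(s, a)  # gradient of a linear q is x(s, a)
        s, a = s_next, a_next
    return w, avg_reward


if __name__ == "__main__":
    # Toy continuing task (hypothetical): one state, action 0 pays 1, action 1 pays 2.
    def step(s, a):
        return (1.0 if a == 0 else 2.0), 0

    def one_hot(s, a):
        return np.eye(2)[a]

    w, r_bar = differential_sarsa(step, one_hot, num_actions=2, start_state=0)
    print("greedy action:", int(np.argmax(w)), "R-bar:", round(r_bar, 2))
```

On this toy task the greedy action should become action 1, and $\bar{R}$ should settle near the $\epsilon$-greedy policy's average reward, roughly $0.95 \cdot 2 + 0.05 \cdot 1 = 1.95$ with $\epsilon = 0.1$ (a random pick chooses action 0 half the time). Note that there is no discount factor anywhere: the subtraction of $\bar{R}$ takes its place.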

Differential SARSA is also in the left half of the algorithm map.

But unlike the algorithms we covered earlier in the week,
it uses the average reward framework.

Summary

That's all for this week.

We extended our tabular control algorithms to function approximation,

discussed how exploration changes,

and introduced a new way to think about the control problem.

Next week,
we'll talk all about how to do reinforcement learning without learning a value function!

ABOUT ME

돗토리
