Value Network PPO agent #924
Hello. I was wondering if there is any strategy to (more or less) guess a proper size for the Value Network of a PPO agent. For instance, I'm currently trying to solve a problem where the PPO agent takes 18 features as input and outputs 3 values. From this, I can (more or less) guess that an Actor Network with two hidden layers of 16 and 8 neurons might be appropriate. However, for the size of the Value Network, I'm not sure which numbers I should look at to guess a suitable depth and width.
The optimal choice depends heavily on the specific case and requires some practical testing to see what produces better results. That said, apart from the output layer, which has a single neuron because it must represent the estimated state value, a general rule of thumb is to give the Value Network slightly less complex hidden layers than the Actor Network. The Actor Network has to map states to actions, which is the harder problem, especially when there are many actions. If the Actor Network has a simple structure, though, you can also reuse the same number of neurons for the Value Network's first and second hidden layers. So, in your case, you could go with a 16-8 configuration for the first and second hidden layers respectively, or start lower with 16-4 and see how it behaves, then decide whether it is really necessary to increase capacity or better to stay small to prevent overfitting and reduce compute.
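For concreteness, here is a minimal PyTorch sketch of the two networks discussed above, assuming the 3 outputs are logits over 3 discrete actions; the class names, layer sizes beyond those mentioned in the thread, and the Tanh activations are illustrative choices, not from any particular codebase:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Policy network: 18 input features -> 16 -> 8 -> 3 action logits."""
    def __init__(self, obs_dim=18, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 16), nn.Tanh(),
            nn.Linear(16, 8), nn.Tanh(),
            nn.Linear(8, n_actions),  # raw logits; sample via Categorical
        )

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Value network: 18 input features -> 16 -> 4 -> 1 state value."""
    def __init__(self, obs_dim=18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 16), nn.Tanh(),
            nn.Linear(16, 4), nn.Tanh(),
            nn.Linear(4, 1),  # single output neuron: estimated V(s)
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

# Usage example with a batch of 5 random observations
actor, critic = Actor(), Critic()
obs = torch.randn(5, 18)
dist = torch.distributions.Categorical(logits=actor(obs))
action = dist.sample()   # shape (5,), one action per observation
value = critic(obs)      # shape (5,), one V(s) estimate per observation
```

If the smaller 16-4 critic underfits, the second hidden layer can simply be widened to 8 to mirror the actor, per the rule of thumb above.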