Value Network PPO agent #924
Hello. I was wondering if there is any strategy to (more or less) guess a proper size for the Value Network of a PPO agent. For instance, I'm currently trying to solve a problem where the PPO agent takes 18 features as input and outputs 3 values. From this, I can (more or less) guess that an Actor Network with two hidden layers of 16 and 8 neurons might be appropriate. However, for the size of the Value Network, I'm not sure which numbers I should look at to guess a suitable depth and width.
The optimal choice depends heavily on the specific case and requires some practical testing to see what produces better results. That said, apart from the output layer, which has a single neuron because it must represent the estimated state value, a general rule of thumb is to give the Value Network slightly less complex hidden layers than the Actor Network. The Actor Network has to map states to actions, which is the harder problem, especially when there are many actions. If the Actor Network has a simple structure, though, you can also reuse the same number of neurons for the Value Network's first and second hidden layers. So, in your case, you could go with a 16-8 configuration for the first and second hidden layers respectively, or start lower with 16-4 and see how it behaves, then decide whether it is really necessary to increase capacity or better to stay small to prevent overfitting and reduce compute.
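For concreteness, here is a minimal PyTorch sketch of the two networks discussed above, assuming the 3 outputs are logits over 3 discrete actions; the class names, layer sizes beyond those mentioned in the thread, and the Tanh activations are illustrative choices, not from any particular codebase:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Policy network: 18 input features -> 16 -> 8 -> 3 action logits."""
    def __init__(self, obs_dim=18, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 16), nn.Tanh(),
            nn.Linear(16, 8), nn.Tanh(),
            nn.Linear(8, n_actions),  # raw logits; sample via Categorical
        )

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Value network: 18 input features -> 16 -> 4 -> 1 state value."""
    def __init__(self, obs_dim=18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 16), nn.Tanh(),
            nn.Linear(16, 4), nn.Tanh(),
            nn.Linear(4, 1),  # single output neuron: estimated V(s)
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

# Usage example with a batch of 5 random observations
actor, critic = Actor(), Critic()
obs = torch.randn(5, 18)
dist = torch.distributions.Categorical(logits=actor(obs))
action = dist.sample()   # shape (5,), one action per observation
value = critic(obs)      # shape (5,), one V(s) estimate per observation
```

If the smaller 16-4 critic underfits, the second hidden layer can simply be widened to 8 to mirror the actor, per the rule of thumb above.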