Simulating an environment for reinforcement learning has some constraints, afaik:
**Efficient network interaction**
After each simulation step, we have to provide new feedback to the network
and get new actions from it. This may require breaking out of numba mode, as
we have to interact with some other library (see the sketch after the questions below).
- can we run the simulation entirely in numba and use shared memory as I/O to an ML thread?
- does the ML implementation provide a C function we can call from numba?
- is it possible to exchange data between simcompyl and the ML framework purely on the GPU?
- check which frameworks we have to talk to and the ways to interact: pytorch, tensorflow, ...
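One way to keep the step loop compiled would be numba's `objmode` escape hatch: the loop stays in nopython mode and only drops to the interpreter once per step to query the network. A minimal sketch, assuming a toy PyTorch policy and placeholder env dynamics (`query_policy`, `step_env`, and the state layout are illustrative, not simcompyl API):

```python
import numpy as np
import torch
from numba import njit, objmode

policy = torch.nn.Linear(4, 2)  # stand-in for the real network (assumption)

def query_policy(state):
    # Runs in the interpreter (object mode), so any ML library is usable here.
    with torch.no_grad():
        return policy(torch.from_numpy(state)).numpy()

@njit
def step_env(state, actions):
    # Placeholder dynamics: apply the actions to part of the state.
    state[:, :2] = state[:, :2] + actions
    return state

@njit
def run_episode(state, n_steps):
    for _ in range(n_steps):
        # Leave nopython mode only for the network call, once per step.
        with objmode(actions='float32[:, :]'):
            actions = query_policy(state)
        state = step_env(state, actions)
    return state

state = np.zeros((8, 4), dtype=np.float32)  # 8 agents, 4 state variables each
run_episode(state, 100)
```

The per-step transition into object mode has overhead, which is why the shared-memory and GPU-only options above are still worth checking.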
**Relatively small population**
I guess it is not ideal to have millions of parallel instances of the
env, each training its own network, or giving one network feedback from
hundreds of envs, but this may be possible. (For example, you may have to swap
the state of the LSTM units per env, but train on the same weights; see the sketch after the questions below.)
Furthermore, rewards could be slow/delayed, so for the agents to learn,
we need more steps inside the same env rather than many worlds with fewer
steps.
A small population may be problematic, as we may have to leave numba
code after each step to call the network for the next action.
- do we really need small populations?
- how can a single network learn from multiple envs in parallel?
- can we train many networks and exchange knowledge (see parameter server)?
- can the network run in numba mode (numba implementation)?
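On the "one network, many envs" question: a common pattern is to batch the observations of all envs, share the network weights, and keep a separate recurrent state per env. A minimal PyTorch sketch (all names and sizes are assumptions for illustration):

```python
import torch

n_envs, obs_dim, hidden_dim, n_actions = 64, 4, 32, 2

lstm = torch.nn.LSTMCell(obs_dim, hidden_dim)  # weights shared by all envs
head = torch.nn.Linear(hidden_dim, n_actions)  # shared policy head

# Per-env recurrent state: one row of hidden/cell state per environment.
h = torch.zeros(n_envs, hidden_dim)
c = torch.zeros(n_envs, hidden_dim)

def act(obs):
    """obs: (n_envs, obs_dim) batch of observations, one row per env."""
    global h, c
    h, c = lstm(obs, (h, c))  # same weights, per-env state swapped in as a batch
    return head(h)            # (n_envs, n_actions) action scores

obs = torch.zeros(n_envs, obs_dim)
actions = act(obs)
```

Gradient updates then touch a single set of weights, while the hidden/cell state rows act as the per-env memory that gets swapped in and out each step.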