Simulating an environment for reinforcement learning has some constraints, afaik:
**Efficient network interaction**
After each simulation step, we have to provide new feedback to the network
and get new actions from it. This may require breaking out of numba mode, as
we have to interact with some other library (see the sketch after the questions below).
- can we run the simulation entirely in numba and use shared memory as I/O to an ML thread?
- does the ML implementation provide a C function we can call from numba?
- is it possible to exchange data between simcompyl and the ML framework purely on the GPU?
- check which frameworks we have to talk to and the ways to interact: pytorch, tensorflow, ...
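One way to keep the step loop compiled would be numba's `objmode` escape hatch: the loop stays in nopython mode and only drops to the interpreter once per step to query the network. A minimal sketch, assuming a toy PyTorch policy and placeholder env dynamics (`query_policy`, `step_env`, and the state layout are illustrative, not simcompyl API):

```python
import numpy as np
import torch
from numba import njit, objmode

policy = torch.nn.Linear(4, 2)  # stand-in for the real network (assumption)

def query_policy(state):
    # Runs in the interpreter (object mode), so any ML library is usable here.
    with torch.no_grad():
        return policy(torch.from_numpy(state)).numpy()

@njit
def step_env(state, actions):
    # Placeholder dynamics: apply the actions to part of the state.
    state[:, :2] = state[:, :2] + actions
    return state

@njit
def run_episode(state, n_steps):
    for _ in range(n_steps):
        # Leave nopython mode only for the network call, once per step.
        with objmode(actions='float32[:, :]'):
            actions = query_policy(state)
        state = step_env(state, actions)
    return state

state = np.zeros((8, 4), dtype=np.float32)  # 8 agents, 4 state variables each
run_episode(state, 100)
```

The per-step transition into object mode has overhead, which is why the shared-memory and GPU-only options above are still worth checking.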
**Relatively small population**
I guess it is not ideal to have millions of parallel instances of the
env, each training its own network, or giving one network feedback from
hundreds of envs, but this may be possible. (For example, you may have to swap
the state of the LSTM units per env, but train on the same weights; see the sketch after the questions below.)
Furthermore, rewards could be slow/delayed, so for the agents to learn,
we need more steps inside the same env rather than many worlds with fewer
steps.
A small population may be problematic, as we may have to leave numba
code after each step to call the network for the next action.
- do we really need small populations?
- how can a single network learn from multiple envs in parallel?
- can we train many networks and exchange knowledge (see parameter server)?
- can the network run in numba mode (numba implementation)?
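On the "one network, many envs" question: a common pattern is to batch the observations of all envs, share the network weights, and keep a separate recurrent state per env. A minimal PyTorch sketch (all names and sizes are assumptions for illustration):

```python
import torch

n_envs, obs_dim, hidden_dim, n_actions = 64, 4, 32, 2

lstm = torch.nn.LSTMCell(obs_dim, hidden_dim)  # weights shared by all envs
head = torch.nn.Linear(hidden_dim, n_actions)  # shared policy head

# Per-env recurrent state: one row of hidden/cell state per environment.
h = torch.zeros(n_envs, hidden_dim)
c = torch.zeros(n_envs, hidden_dim)

def act(obs):
    """obs: (n_envs, obs_dim) batch of observations, one row per env."""
    global h, c
    h, c = lstm(obs, (h, c))  # same weights, per-env state swapped in as a batch
    return head(h)            # (n_envs, n_actions) action scores

obs = torch.zeros(n_envs, obs_dim)
actions = act(obs)
```

Gradient updates then touch a single set of weights, while the hidden/cell state rows act as the per-env memory that gets swapped in and out each step.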