use for reinforcement learning env sim #3

Open
8 tasks
wabu opened this issue Jun 21, 2019 · 0 comments
Labels
community input: Would like to gain more insights from the community.

Comments

@wabu
Contributor

wabu commented Jun 21, 2019

Simulating an environment for reinforcement learning has some constraints, AFAIK:

  • efficient network interaction

    After each simulation step, we have to provide new feedback to the network
    and get new actions from it. This may require breaking out of numba mode,
    as we have to interact with some other library.

    • can we run the simulation entirely in numba and use shared memory as IO
      to the ML thread? (sketched at the end of this item)
    • does the ML implementation provide a C function we can call from numba?
    • is it possible to exchange data between simcompyl and the ML side purely
      on the GPU?
    • check whom we have to talk to and ways to interact: PyTorch,
      TensorFlow, ...
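A minimal sketch of the shared-buffer idea from the first point above, assuming
a toy dynamics function and a placeholder `policy()` standing in for the
network call (neither is simcompyl or framework API). The whole per-step
simulation work stays inside one jitted function operating on preallocated
arrays, so the numba/Python boundary is crossed exactly once per step:

```python
import numpy as np
from numba import njit

N_AGENTS, OBS_DIM = 256, 1

@njit(cache=True)
def step(state, actions, obs, rewards):
    # toy dynamics: the action nudges the state, reward is distance to zero
    for i in range(state.shape[0]):
        state[i] += 0.1 * actions[i]
        obs[i, 0] = state[i]
        rewards[i] = -abs(state[i])

def policy(obs):
    # stand-in for the network call; PyTorch/TensorFlow would sit here
    return -np.sign(obs[:, 0])

state = np.random.randn(N_AGENTS)
obs = np.zeros((N_AGENTS, OBS_DIM))
actions = np.zeros(N_AGENTS)
rewards = np.zeros(N_AGENTS)

for t in range(100):
    actions[:] = policy(obs)            # the single boundary crossing per step
    step(state, actions, obs, rewards)  # everything else runs in nopython mode
```

For the C-function route, numba's nopython mode can call functions exposed via
ctypes or cffi, so a forward pass reachable as a C symbol could in principle be
invoked without leaving jitted code. For the GPU question, numba device arrays
expose `__cuda_array_interface__`, which PyTorch can consume, so the exchange
could plausibly stay on-device.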
  • relatively small population

    I guess it is not ideal to have millions of parallel instances of the
    env, each training its own network, or giving one network feedback from
    hundreds of envs, but this may be possible. (For example, you may have to
    swap the state of LSTM units per env, but train on the same weights.)

    Furthermore, rewards could be sparse or delayed, so for the agents to
    learn, we need more steps inside the same env rather than many worlds
    with fewer steps.

    A small population may be problematic, as we may have to leave numba
    code after each step to call the network for the next action.

    • do we really need small populations?
    • how can a single network learn from multiple envs in parallel? (see the
      batched sketch after this list)
    • can we train many networks and exchange knowledge (see parameter
      servers)?
    • can the network run in numba mode (a numba implementation)?
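A hedged sketch of the "one network, many envs" question above, using a toy
recurrent cell in place of a real LSTM (none of these names come from simcompyl
or an ML framework): the weights are shared across all envs, each env keeps its
own hidden state carried between steps, and one batched forward pass serves the
whole population:

```python
import numpy as np

N_ENVS, OBS_DIM, HIDDEN = 64, 4, 32
rng = np.random.default_rng(0)

# weights shared by every env: the part that would actually be trained
W_in = 0.1 * rng.standard_normal((OBS_DIM, HIDDEN))
W_h = 0.1 * rng.standard_normal((HIDDEN, HIDDEN))

# per-env recurrent state: the part that would be swapped per env
hidden = np.zeros((N_ENVS, HIDDEN))

def act(obs_batch, hidden):
    # toy recurrent update standing in for an LSTM cell: shared weights,
    # per-env state, one batched matmul for the whole population
    hidden = np.tanh(obs_batch @ W_in + hidden @ W_h)
    actions = hidden.mean(axis=1)  # toy action head
    return actions, hidden

obs = rng.standard_normal((N_ENVS, OBS_DIM))
actions, hidden = act(obs, hidden)  # one forward pass feeds all 64 envs
```

This is essentially how vectorized-env setups amortize the network call over a
population; a parameter server would generalize it to many learners that
periodically merge their weights.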
@wabu wabu added the "community input" label (Would like to gain more insights from the community.) Jun 21, 2019
@wabu wabu changed the title from "use for reinforcement learning env sim" to "[Feedback/Input welcome] use for reinforcement learning env sim" Jun 21, 2019
@wabu wabu pinned this issue Jun 24, 2019
@wabu wabu unpinned this issue Jun 28, 2019