Use of custom environment and agents #10

Open · Adaickalavan opened this issue Apr 26, 2021 · 6 comments

@Adaickalavan (Member)

I am interested in using XingTian for multi-agent training with the PPO algorithm in the SMARTS environment. An example of using the SMARTS environment is available here.

Could you provide detailed step-by-step instructions and an example of how to use XingTian with our own custom environment for multi-agent training?

@hustqj (Collaborator) commented Apr 29, 2021

I will upload a multi-agent PPO example; you can refer to it.

@hustqj (Collaborator) commented Apr 30, 2021

I have added a new example; you can find it in xingtian/examples/ma_cases/ppo_share_catch_pigs.yaml.

@Adaickalavan (Member, Author)

I have several questions as follows:

[1] Could you explain the difference between the settings self.env_info["api_type"] == "standalone" and self.env_info["api_type"] == "unified", using an example? When should each of them be used?

[2] I tried using a custom environment and a custom agent in XingTian. There were 2 agents (i.e., multi-agent) training with 1 environment and 1 learner. The custom environment accepts agent actions in dict format {"0": Action_of_agent_0, "1": Action_of_agent_1} and returns observations in dict format {"0": Observation_of_agent_0, "1": Observation_of_agent_1} on reset and on each step. The custom agent implements an infer_action function, which accepts an input raw_state of format Observation_of_agent_x and returns an action of format Action_of_agent_x.
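
For concreteness, here is a minimal sketch of the environment and agent interfaces just described; apart from infer_action, all names, observation shapes, and return signatures below are hypothetical illustrations, not XingTian's API.

import numpy as np

class MyMultiAgentEnv:
    # Custom environment keyed by agent id, as described in [2].
    def reset(self):
        # Returns {"0": Observation_of_agent_0, "1": Observation_of_agent_1}.
        return {"0": np.zeros(4), "1": np.zeros(4)}

    def step(self, actions):
        # `actions` is {"0": Action_of_agent_0, "1": Action_of_agent_1}.
        obs = {"0": np.zeros(4), "1": np.zeros(4)}
        rewards = {"0": 0.0, "1": 0.0}
        done, info = False, {}
        return obs, rewards, done, info

class MyAgent:
    # Custom agent: infer_action handles a single agent's observation.
    def infer_action(self, raw_state):
        # `raw_state` is Observation_of_agent_x for one agent only.
        return int(np.argmax(raw_state))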

[2a] When the training was run with api_type==unified, the following error message was printed:

  • [screenshot of error trace]
  • From the error trace, this error appears to occur because the code sidesteps the Agent block and proceeds directly to the Algorithm block: the states from the Environment block are fed directly to self.algs[0].predict(states).
  • Consider the line self.algs[0].predict(states) and assume we want the two agents to use different algorithms. How can this be achieved with api_type==unified, given that self.algs has length 1?

[2b] On the other hand, when the training was run with api_type==standalone, the following error message was printed:

  • [screenshot of error trace]
  • From the error trace, this error appears to occur because a dictionary of all agents' states (i.e., {"0": Observation_of_agent_0, "1": Observation_of_agent_1}) is fed to the infer_action function. However, infer_action accepts a raw_state for a single agent at a time, of format Observation_of_agent_x (see the sketch below).
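
To make the mismatch concrete, here is a small runnable illustration; the wrapper is only one possible workaround under the stated assumptions, not an XingTian feature.

import numpy as np

class SingleAgentPolicy:
    # Stands in for the custom agent from [2]: infer_action takes one agent's observation.
    def infer_action(self, raw_state):
        return int(np.argmax(raw_state))

class PerAgentWrapper:
    # Hypothetical workaround: remember this agent's id and select its own
    # observation whenever a dict of all agents' observations is passed in.
    def __init__(self, agent, agent_id):
        self.agent = agent
        self.agent_id = agent_id

    def infer_action(self, raw_state):
        if isinstance(raw_state, dict):
            raw_state = raw_state[self.agent_id]
        return self.agent.infer_action(raw_state)

states = {"0": np.array([0.1, 0.9]), "1": np.array([0.7, 0.3])}
agent_0 = PerAgentWrapper(SingleAgentPolicy(), "0")
print(agent_0.infer_action(states))  # selects agent 0's observation and prints 1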

[2c] What should I do to achieve training of two agents (i.e., multi-agent) with N environments and M learners, given the above custom environment and custom agent interfaces?

[3] Refer to this portion of the code.

def explore(self, episode_count):
    """
    Explore the environment.
    agent_num impact on the api about run interaction with environment.
    == 1: use standalone api, `run_one_episode`
    >= 2 and env.api_type == "standalone": agent.run_one_episode
    >= 2 and env.api_type == "unified": agent.do_one_interaction.
    :param episode_count:
    :return:
    """
    # single agent, always use the `run_one_episode` api.
    # multi agent with `standalone` api_type, use the `run_one_episode` api.
    if self.env_info["api_type"] == "standalone":
        # (use_explore, collect)
        _paras = [
            (True, False if _ag.alg.async_flag else True) for _ag in self.agents
        ]
        job_funcs = [agent.run_one_episode for agent in self.agents]
        for _epi_index in range(episode_count):
            _start2 = time()
            self.env.reset()
            for agent in self.agents:
                agent.reset()
            trajectory_list = self.bot.do_multi_job(job_funcs, _paras)

Assume we are training 2 agents (i.e., multi-agent) with 1 environment and 1 learner. When using api_type==standalone, each agent appears to be executed in the same environment instance for one full episode using separate threads via self.bot.do_multi_job(job_funcs, _paras).

  • So, are the agents stepped independently, at different speeds, in the same environment instance?
  • In other words, is each agent not guaranteed to take one step together at each time point?
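
As a rough illustration of this concern (a schematic only; the fake environment and thread setup below are assumptions for illustration, not XingTian code), two threads stepping the same environment object at their own pace are not synchronized step-for-step:

import random
import threading
import time

class FakeSharedEnv:
    # Stand-in for a single shared environment instance.
    def __init__(self):
        self.lock = threading.Lock()
        self.steps = {"0": 0, "1": 0}

    def step(self, agent_id):
        with self.lock:
            self.steps[agent_id] += 1
            return dict(self.steps)  # snapshot of both agents' step counts

def run_one_episode(env, agent_id):
    # Each "agent" advances at its own pace; nothing forces the two to stay in lockstep.
    for _ in range(5):
        time.sleep(random.uniform(0.0, 0.01))
        counts = env.step(agent_id)
        print(f"agent {agent_id} stepped, counts now {counts}")

env = FakeSharedEnv()
threads = [threading.Thread(target=run_one_episode, args=(env, i)) for i in ("0", "1")]
for t in threads:
    t.start()
for t in threads:
    t.join()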

@hustqj (Collaborator) commented May 7, 2021

[1] "standalone" means the simulator provides an independent interface for each agent; "unified" means all agents share one interface, like SMARTS.
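
As a rough sketch of that distinction (the two environment classes below are illustrative assumptions, not XingTian's actual interfaces):

# "standalone": the simulator exposes an independent interface per agent.
class StandaloneStyleEnv:
    def step(self, agent_id, action):
        # Each agent interacts through its own call; agents can be driven
        # from separate threads, one full episode per agent.
        ...

# "unified": all agents share one interface, as in SMARTS.
class UnifiedStyleEnv:
    def step(self, actions):
        # One call takes every agent's action, e.g. {"0": a0, "1": a1},
        # advances the whole simulation, and returns all observations together.
        ...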

@hustqj (Collaborator) commented May 7, 2021

[2a] You should convert the observation to a numpy array in your agent module.
[2c] You can set env_num=N to interact with N environments, but we only support one learner for now; all training data from the N environments will be sent to the learner for training.
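
For instance, something along these lines inside the agent's infer_action (where exactly the conversion belongs, and the _predict helper, are assumptions for illustration):

import numpy as np

class MyAgent:
    def infer_action(self, raw_state):
        # Convert the raw observation to a numpy array before any further
        # processing, per the advice in [2a] above.
        state = np.asarray(raw_state, dtype=np.float32)
        # ... continue with the agent's existing inference logic on `state` ...
        return self._predict(state)  # placeholder for the agent's own prediction step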

@hustqj (Collaborator) commented May 7, 2021

[3] In "standalone" mode, each agent runs in an independent thread. Whether they run synchronously depends on the environment: some environments guarantee that all agents step at the same time point, and some environments are completely asynchronous.
