Use of custom environment and agents #10

Open · Adaickalavan opened this issue Apr 26, 2021 · 6 comments

@Adaickalavan (Member)

I am interested in using XingTian for multi-agent training with the PPO algorithm in the SMARTS environment. An example of using the SMARTS environment is available here.

Could you provide detailed step-by-step instructions and an example of how to use XingTian with our own custom environment for multi-agent training?

@hustqj (Collaborator) commented Apr 29, 2021

I will upload a multi-agent PPO example; you can refer to it.

@hustqj (Collaborator) commented Apr 30, 2021

I have added a new example; you can find it in xingtian/examples/ma_cases/ppo_share_catch_pigs.yaml.

@Adaickalavan (Member, Author)

I have several questions as follows:

[1] Could you explain the difference between the settings self.env_info["api_type"] == "standalone" and self.env_info["api_type"] == "unified", using an example? When should each of them be used?

[2] I tried using a custom environment and a custom agent in XingTian. There were 2 agents (i.e., multi-agent) training with 1 environment and 1 learner. The custom environment accepts agent actions in dict format {"0": Action_of_agent_0, "1": Action_of_agent_1} and returns observations in dict format {"0": Observation_of_agent_0, "1": Observation_of_agent_1} on reset and on each step. The custom agent implements an infer_action function, which accepts an input raw_state of format Observation_of_agent_x and returns an action of format Action_of_agent_x.
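
For concreteness, here is a minimal sketch of the environment and agent interfaces just described; apart from infer_action, all names, observation shapes, and return signatures below are hypothetical illustrations, not XingTian's API.

import numpy as np

class MyMultiAgentEnv:
    # Custom environment keyed by agent id, as described in [2].
    def reset(self):
        # Returns {"0": Observation_of_agent_0, "1": Observation_of_agent_1}.
        return {"0": np.zeros(4), "1": np.zeros(4)}

    def step(self, actions):
        # `actions` is {"0": Action_of_agent_0, "1": Action_of_agent_1}.
        obs = {"0": np.zeros(4), "1": np.zeros(4)}
        rewards = {"0": 0.0, "1": 0.0}
        done, info = False, {}
        return obs, rewards, done, info

class MyAgent:
    # Custom agent: infer_action handles a single agent's observation.
    def infer_action(self, raw_state):
        # `raw_state` is Observation_of_agent_x for one agent only.
        return int(np.argmax(raw_state))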

[2a] When the training was run with api_type==unified, the following error message was printed:

  • [screenshot of error trace]
  • From the error trace, this error appears to occur because the code sidesteps the Agent block and proceeds directly to the Algorithm block: the states from the Environment block are fed directly to self.algs[0].predict(states).
  • Consider the line self.algs[0].predict(states) and assume we want the two agents to use different algorithms. How can this be achieved with api_type==unified, given that self.algs has length 1?

[2b] On the other hand, when the training was run with api_type==standalone, the following error message was printed:

  • [screenshot of error trace]
  • From the error trace, this error appears to occur because a dictionary of all agents' states (i.e., {"0": Observation_of_agent_0, "1": Observation_of_agent_1}) is fed to the infer_action function. However, infer_action accepts a raw_state for a single agent at a time, of format Observation_of_agent_x (see the sketch below).
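
To make the mismatch concrete, here is a small runnable illustration; the wrapper is only one possible workaround under the stated assumptions, not an XingTian feature.

import numpy as np

class SingleAgentPolicy:
    # Stands in for the custom agent from [2]: infer_action takes one agent's observation.
    def infer_action(self, raw_state):
        return int(np.argmax(raw_state))

class PerAgentWrapper:
    # Hypothetical workaround: remember this agent's id and select its own
    # observation whenever a dict of all agents' observations is passed in.
    def __init__(self, agent, agent_id):
        self.agent = agent
        self.agent_id = agent_id

    def infer_action(self, raw_state):
        if isinstance(raw_state, dict):
            raw_state = raw_state[self.agent_id]
        return self.agent.infer_action(raw_state)

states = {"0": np.array([0.1, 0.9]), "1": np.array([0.7, 0.3])}
agent_0 = PerAgentWrapper(SingleAgentPolicy(), "0")
print(agent_0.infer_action(states))  # selects agent 0's observation and prints 1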

[2c] What should I do to achieve training of two agents (i.e., multi-agent) with N environments and M learners, given the above custom environment and custom agent interfaces?

[3] Refer to this portion of the code.

def explore(self, episode_count):
    """
    Explore the environment.
    agent_num impact on the api about run interaction with environment.
    == 1: use standalone api, `run_one_episode`
    >= 2 and env.api_type == "standalone": agent.run_one_episode
    >= 2 and env.api_type == "unified": agent.do_one_interaction.
    :param episode_count:
    :return:
    """
    # single agent, always use the `run_one_episode` api.
    # multi agent with `standalone` api_type, use the `run_one_episode` api.
    if self.env_info["api_type"] == "standalone":
        # (use_explore, collect)
        _paras = [
            (True, False if _ag.alg.async_flag else True) for _ag in self.agents
        ]
        job_funcs = [agent.run_one_episode for agent in self.agents]
        for _epi_index in range(episode_count):
            _start2 = time()
            self.env.reset()
            for agent in self.agents:
                agent.reset()
            trajectory_list = self.bot.do_multi_job(job_funcs, _paras)

Assume we are training 2 agents (i.e., multi-agent) with 1 environment and 1 learner. When using api_type==standalone, each agent appears to be executed in the same environment instance for one full episode using separate threads via self.bot.do_multi_job(job_funcs, _paras).

  • So, are the agents stepped independently, at different speeds, in the same environment instance?
  • In other words, is each agent not guaranteed to take one step together at each time point?
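
As a rough illustration of this concern (a schematic only; the fake environment and thread setup below are assumptions for illustration, not XingTian code), two threads stepping the same environment object at their own pace are not synchronized step-for-step:

import random
import threading
import time

class FakeSharedEnv:
    # Stand-in for a single shared environment instance.
    def __init__(self):
        self.lock = threading.Lock()
        self.steps = {"0": 0, "1": 0}

    def step(self, agent_id):
        with self.lock:
            self.steps[agent_id] += 1
            return dict(self.steps)  # snapshot of both agents' step counts

def run_one_episode(env, agent_id):
    # Each "agent" advances at its own pace; nothing forces the two to stay in lockstep.
    for _ in range(5):
        time.sleep(random.uniform(0.0, 0.01))
        counts = env.step(agent_id)
        print(f"agent {agent_id} stepped, counts now {counts}")

env = FakeSharedEnv()
threads = [threading.Thread(target=run_one_episode, args=(env, i)) for i in ("0", "1")]
for t in threads:
    t.start()
for t in threads:
    t.join()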

@hustqj (Collaborator) commented May 7, 2021

[1] "standalone" means the simulator provides an independent interface for each agent; "unified" means all agents share one interface, like SMARTS.
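
As a rough sketch of that distinction (the two environment classes below are illustrative assumptions, not XingTian's actual interfaces):

# "standalone": the simulator exposes an independent interface per agent.
class StandaloneStyleEnv:
    def step(self, agent_id, action):
        # Each agent interacts through its own call; agents can be driven
        # from separate threads, one full episode per agent.
        ...

# "unified": all agents share one interface, as in SMARTS.
class UnifiedStyleEnv:
    def step(self, actions):
        # One call takes every agent's action, e.g. {"0": a0, "1": a1},
        # advances the whole simulation, and returns all observations together.
        ...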

@hustqj (Collaborator) commented May 7, 2021

[2a] You should convert the observation to a numpy array in your agent module.
[2c] You can set env_num=N to interact with N environments, but we only support one learner for now; all training data from the N environments will be sent to the learner for training.
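
For instance, something along these lines inside the agent's infer_action (where exactly the conversion belongs, and the _predict helper, are assumptions for illustration):

import numpy as np

class MyAgent:
    def infer_action(self, raw_state):
        # Convert the raw observation to a numpy array before any further
        # processing, per the advice in [2a] above.
        state = np.asarray(raw_state, dtype=np.float32)
        # ... continue with the agent's existing inference logic on `state` ...
        return self._predict(state)  # placeholder for the agent's own prediction step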

@hustqj (Collaborator) commented May 7, 2021

[3] In "standalone" mode, each agent runs in an independent thread. Whether they run synchronously depends on the environment: some environments guarantee that all agents step at the same time point, and some environments are completely asynchronous.
