What is the difference between FP and EP? #49

Open
lordemyj opened this issue Aug 16, 2024 · 1 comment

Comments

@lordemyj

```python
if self.state_type == "EP":
    data = (
        share_obs[:, 0],  # (n_threads, share_obs_dim)
        obs,  # (n_agents, n_threads, obs_dim)
        actions,  # (n_agents, n_threads, action_dim)
        available_actions,  # None or (n_agents, n_threads, action_number)
        rewards[:, 0],  # (n_threads, 1)
        np.expand_dims(dones_env, axis=-1),  # (n_threads, 1)
        valid_transitions.transpose(1, 0, 2),  # (n_agents, n_threads, 1)
        terms,  # (n_threads, 1)
        next_share_obs[:, 0],  # (n_threads, next_share_obs_dim)
        next_obs.transpose(1, 0, 2),  # (n_agents, n_threads, next_obs_dim)
        next_available_actions,  # None or (n_agents, n_threads, next_action_number)
    )
elif self.state_type == "FP":
    data = (
        share_obs,  # (n_threads, n_agents, share_obs_dim)
        obs,  # (n_agents, n_threads, obs_dim)
        actions,  # (n_agents, n_threads, action_dim)
        available_actions,  # None or (n_agents, n_threads, action_number)
        rewards,  # (n_threads, n_agents, 1)
        np.expand_dims(dones, axis=-1),  # (n_threads, n_agents, 1)
        valid_transitions.transpose(1, 0, 2),  # (n_agents, n_threads, 1)
        terms,  # (n_threads, n_agents, 1)
        next_share_obs,  # (n_threads, n_agents, next_share_obs_dim)
        next_obs.transpose(1, 0, 2),  # (n_agents, n_threads, next_obs_dim)
        next_available_actions,  # None or (n_agents, n_threads, next_action_number)
    )
```

When `self.state_type == "EP"`, why is only the first agent's reward kept (`rewards[:, 0]`), and why are the rewards of the other agents ignored?

@Ivan-Zhong
Collaborator

Hi, sorry for the late reply. EP and FP were first introduced in the MAPPO paper (Figure 4). EP stands for the environment-provided global state: the critic receives the same global state input for all actors. FP is the agent-specific global state: the critic receives a different global state input for each actor. That is why, for critic-related data in FP, we always keep an extra `n_agents` dimension to hold the per-agent inputs. As for the rewards: since we consider fully cooperative scenarios, every agent's reward is the same total team reward. Thus, in EP we save only the first agent's reward, while in FP we save all agents' rewards for the convenience of data processing.
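To make the shape difference concrete, here is a minimal NumPy sketch (not code from this repo; the dimensions `n_threads`, `n_agents`, and `share_obs_dim` are illustrative assumptions). It shows why `rewards[:, 0]` loses no information in the fully cooperative case, and how FP keeps the extra agent axis that EP drops:

```python
import numpy as np

n_threads, n_agents, share_obs_dim = 4, 2, 6

# Fully cooperative: every agent receives the identical team reward.
team_reward = np.random.rand(n_threads, 1)
rewards = np.repeat(team_reward[:, None, :], n_agents, axis=1)  # (n_threads, n_agents, 1)

# FP stores agent-specific global states, one per agent.
share_obs_fp = np.random.rand(n_threads, n_agents, share_obs_dim)

# EP stores a single environment-provided state; taking agent 0 is enough
# because every agent would get the same critic input anyway.
share_obs_ep = share_obs_fp[:, 0]  # (n_threads, share_obs_dim)

# Likewise, keeping only rewards[:, 0] discards nothing: all agent rewards match.
rewards_ep = rewards[:, 0]  # (n_threads, 1)
assert np.allclose(rewards[:, 0], rewards[:, 1])

print(share_obs_ep.shape, rewards_ep.shape, rewards.shape)
```

In EP the critic sees one state per environment thread, so the buffer drops the agent axis everywhere on the critic side; in FP that axis survives so each agent's critic input can differ.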
