What is the difference between FP and EP? #49

Open
lordemyj opened this issue Aug 16, 2024 · 1 comment

Comments

@lordemyj

```python
if self.state_type == "EP":
    data = (
        share_obs[:, 0],  # (n_threads, share_obs_dim)
        obs,  # (n_agents, n_threads, obs_dim)
        actions,  # (n_agents, n_threads, action_dim)
        available_actions,  # None or (n_agents, n_threads, action_number)
        rewards[:, 0],  # (n_threads, 1)
        np.expand_dims(dones_env, axis=-1),  # (n_threads, 1)
        valid_transitions.transpose(1, 0, 2),  # (n_agents, n_threads, 1)
        terms,  # (n_threads, 1)
        next_share_obs[:, 0],  # (n_threads, next_share_obs_dim)
        next_obs.transpose(1, 0, 2),  # (n_agents, n_threads, next_obs_dim)
        next_available_actions,  # None or (n_agents, n_threads, next_action_number)
    )
elif self.state_type == "FP":
    data = (
        share_obs,  # (n_threads, n_agents, share_obs_dim)
        obs,  # (n_agents, n_threads, obs_dim)
        actions,  # (n_agents, n_threads, action_dim)
        available_actions,  # None or (n_agents, n_threads, action_number)
        rewards,  # (n_threads, n_agents, 1)
        np.expand_dims(dones, axis=-1),  # (n_threads, n_agents, 1)
        valid_transitions.transpose(1, 0, 2),  # (n_agents, n_threads, 1)
        terms,  # (n_threads, n_agents, 1)
        next_share_obs,  # (n_threads, n_agents, next_share_obs_dim)
        next_obs.transpose(1, 0, 2),  # (n_agents, n_threads, next_obs_dim)
        next_available_actions,  # None or (n_agents, n_threads, next_action_number)
    )
```

When `self.state_type == "EP"`, why is only the first agent's reward kept (`rewards[:, 0]`), and why are the rewards of the other agents ignored?

@Ivan-Zhong
Collaborator

Hi, sorry for the late reply. EP and FP were first introduced in the MAPPO paper (Figure 4). EP stands for the environment-provided global state: the critic receives the same global state input for all actors. FP is the agent-specific global state: the critic receives a different global state input for each actor. That is why, for critic-related data in FP, we always keep an extra `n_agents` dimension to hold the per-agent inputs. As for the rewards: since we consider fully cooperative scenarios, every agent's reward is the same total team reward. Thus, in EP we save only the first agent's reward, while in FP we save all agents' rewards for the convenience of data processing.
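To make the shape difference concrete, here is a minimal NumPy sketch (not code from this repo; the dimensions `n_threads`, `n_agents`, and `share_obs_dim` are illustrative assumptions). It shows why `rewards[:, 0]` loses no information in the fully cooperative case, and how FP keeps the extra agent axis that EP drops:

```python
import numpy as np

n_threads, n_agents, share_obs_dim = 4, 2, 6

# Fully cooperative: every agent receives the identical team reward.
team_reward = np.random.rand(n_threads, 1)
rewards = np.repeat(team_reward[:, None, :], n_agents, axis=1)  # (n_threads, n_agents, 1)

# FP stores agent-specific global states, one per agent.
share_obs_fp = np.random.rand(n_threads, n_agents, share_obs_dim)

# EP stores a single environment-provided state; taking agent 0 is enough
# because every agent would get the same critic input anyway.
share_obs_ep = share_obs_fp[:, 0]  # (n_threads, share_obs_dim)

# Likewise, keeping only rewards[:, 0] discards nothing: all agent rewards match.
rewards_ep = rewards[:, 0]  # (n_threads, 1)
assert np.allclose(rewards[:, 0], rewards[:, 1])

print(share_obs_ep.shape, rewards_ep.shape, rewards.shape)
```

In EP the critic sees one state per environment thread, so the buffer drops the agent axis everywhere on the critic side; in FP that axis survives so each agent's critic input can differ.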
