A tiny confusion about the 'state' concept #51

Open
AquaHorseM opened this issue Sep 7, 2024 · 4 comments

Comments

@AquaHorseM

I'm trying to adapt my own environment to the gym-style interface so that I can use the algorithms, but the step() function needs to return a 'state'. Is this the global state that contains all the information in the game, or the shared observation that is visible to all agents? If it is the former, should I include the shared observation in each agent's own returned observation?

@Ivan-Zhong
Collaborator

Hi. The 'state' is the global state containing all the information in the game, which is what makes the problem a valid MDP. The observation of each agent should follow your environment design. Hope it helps.
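To make the distinction concrete, here is a minimal sketch of a toy two-agent environment whose `step()` returns both per-agent observations and a global state. Everything in it is hypothetical: the class name `TwoAgentLineEnv`, the `view` parameter, and the order of the returned tuple are assumptions for illustration, not this project's actual interface, which may expect additional return values.

```python
from typing import Dict, List, Tuple


class TwoAgentLineEnv:
    """Hypothetical toy env: two agents move on a line and must both reach a goal cell."""

    def __init__(self, size: int = 5, view: int = 1) -> None:
        self.size = size
        self.view = view  # how many cells away an agent can see the goal
        self.goal = size // 2
        self.positions: List[int] = [0, size - 1]

    def _global_state(self) -> List[float]:
        # Global state: everything needed to predict the next step (Markov property).
        return [float(p) for p in self.positions] + [float(self.goal)]

    def _local_obs(self, agent: int) -> List[float]:
        # Local observation: own position, plus the goal only when it is in view.
        pos = self.positions[agent]
        goal_seen = float(self.goal) if abs(self.goal - pos) <= self.view else -1.0
        return [float(pos), goal_seen]

    def reset(self) -> Tuple[List[List[float]], List[float]]:
        self.positions = [0, self.size - 1]
        return [self._local_obs(i) for i in range(2)], self._global_state()

    def step(
        self, actions: List[int]
    ) -> Tuple[List[List[float]], List[float], List[float], List[bool], List[Dict]]:
        # Each action is -1, 0, or +1; positions are clipped to the line.
        for i, a in enumerate(actions):
            self.positions[i] = min(self.size - 1, max(0, self.positions[i] + a))
        done = all(p == self.goal for p in self.positions)
        reward = 1.0 if done else 0.0
        obs = [self._local_obs(i) for i in range(2)]
        # Assumed return order for this sketch only.
        return obs, self._global_state(), [reward, reward], [done, done], [{}, {}]
```

In this sketch the state holds both positions and the goal, while each agent's observation holds only what that agent can see. Whether to also place shared information in each observation is a design choice of the environment.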

@AquaHorseM
Author

Thank you for your reply! Sorry to bother you again: what is the most convenient way to modify the algorithms (HASAC, for example) for offline training? Do you have any suggestions?

@Ivan-Zhong
Collaborator

These algorithms are not inherently designed for offline settings, so they neither come with large curated offline datasets nor include conservative training constraints. I think you may need to design new algorithms and construct offline datasets to achieve offline training.
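As a rough illustration of what "offline" changes in the training loop, the sketch below draws minibatches from a fixed list of transitions and never calls the environment. The names `FixedDataset`, `train_offline`, and `update_fn` are hypothetical and not part of this codebase. It covers only the data-feeding side; it does not add the conservative constraints mentioned above, so plugging an unmodified off-policy update into it may behave poorly on actions the dataset does not cover.

```python
import random
from typing import Callable, Dict, List, Sequence

Transition = Dict[str, object]  # e.g. {"obs": ..., "state": ..., "action": ..., "reward": ...}


class FixedDataset:
    """Hypothetical read-only buffer over pre-collected transitions."""

    def __init__(self, transitions: Sequence[Transition], seed: int = 0) -> None:
        if not transitions:
            raise ValueError("dataset must not be empty")
        self._data = list(transitions)
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._data)

    def sample(self, batch_size: int) -> List[Transition]:
        # Sample with replacement; nothing is ever appended to the dataset.
        return [self._rng.choice(self._data) for _ in range(batch_size)]


def train_offline(
    dataset: FixedDataset,
    update_fn: Callable[[List[Transition]], None],
    num_updates: int,
    batch_size: int,
) -> int:
    """Run `num_updates` gradient-style updates using only the fixed dataset."""
    for _ in range(num_updates):
        update_fn(dataset.sample(batch_size))  # no env.step() anywhere in this loop
    return num_updates
```

Here `update_fn` stands in for whatever update the algorithm performs on a batch. An online loop would interleave environment steps with these updates.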

@AquaHorseM
Author

Okay, thanks! Actually, I am only looking for a way to apply the algorithm to existing datasets, and I have found where that should be done in the codebase. I will try it myself. Thank you anyway.
Moreover, I found this definition here:
image
It seems that the 'state' returned by the 'step' function should be the shared observations? That sounds odd, but it makes sense according to the code. I will try the implementation to see if it works. I hope there will be a more precise guide on how to adapt the algorithms to a custom environment (or I would be glad to contribute one if I succeed).
Again, thanks for your work!
