question about strategy for sampling goals for replay? #24

Open
whynpt opened this issue Mar 30, 2022 · 4 comments

Comments

whynpt commented Mar 30, 2022

Thanks a lot! This project works well with my own robotic environment. But I am confused about her.her_sampler.sample_her_transitions, because it seems quite different from the future strategy as I understand it.
[Screenshot from 2022-03-30 17-16-02]
In the paper, k goals are sampled for every transition in the buffer, and the resulting k new transitions are stored in the buffer, which looks like data augmentation. In the code, replay_k is the ratio of goals to replace, not the number of goals. As her.her_sampler.sample_her_transitions shows, when updating the network, 256 transitions are sampled and a fraction of their goals are replaced with achieved goals.
Does replacing goals proportionally equal the future strategy?
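
For reference, here is a minimal sketch of the proportional relabelling described above, assuming an episode-wise buffer layout with keys such as 'actions', 'ag' (achieved goals) and 'g' (desired goals), and future_p = 1 - 1/(1 + replay_k) (so replay_k = 4 relabels roughly 80% of each minibatch). The field names and shapes are assumptions based on common HER implementations, not necessarily the exact code in this repo.

```python
import numpy as np

def sample_her_transitions(episode_batch, batch_size, replay_k=4):
    """Sample a minibatch and relabel a fraction of goals with the 'future' strategy.

    episode_batch: dict of arrays shaped (num_episodes, T, dim) for keys such as
    'actions' and 'g', and (num_episodes, T + 1, dim) for 'ag' (one achieved goal
    per state, including the final state). These names/shapes are assumptions.
    """
    future_p = 1 - (1.0 / (1 + replay_k))   # replay_k = 4 -> relabel ~80% of samples
    num_episodes, T = episode_batch['actions'].shape[:2]

    # pick random (episode, timestep) pairs for the minibatch
    episode_idxs = np.random.randint(0, num_episodes, batch_size)
    t_samples = np.random.randint(T, size=batch_size)
    transitions = {key: episode_batch[key][episode_idxs, t_samples].copy()
                   for key in episode_batch.keys()}

    # choose the subset of the minibatch whose goals get replaced
    her_idxs = np.where(np.random.uniform(size=batch_size) < future_p)
    # for those, pick a random *future* timestep in the same episode
    future_offset = (np.random.uniform(size=batch_size) * (T - t_samples)).astype(int)
    future_t = (t_samples + 1 + future_offset)[her_idxs]

    # relabel: the desired goal becomes an achieved goal reached later in the episode
    transitions['g'][her_idxs] = episode_batch['ag'][episode_idxs[her_idxs], future_t]
    return transitions
```

Nothing extra is stored in the buffer here: relabelling happens at sampling time, and replay_k only controls the relabelled fraction. In expectation the learner still sees roughly the same k:1 mixture of hindsight and original goals as it would if k extra transitions were stored per real transition.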

@captainzhu123

Yes, I have the same question. I have seen other projects, and they use different sampling methods for the future strategy. However, the results of this project are very good. Like you, I am confused about whether goals should be replaced with a specific number k of targets or simply replaced at a ratio derived from k.


whynpt commented Apr 23, 2022

I did not change the implementation of the future strategy, but there is another project which replaces the desired goal with a specific number k of targets. See https://github.com/kaixindelele/DRLib.git @captainzhu123
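
For comparison, here is a minimal sketch of the formulation described in the original HER paper, where k hindsight copies of every transition are stored in the buffer at the end of an episode. buffer.store_transition, compute_reward and the episode field names are hypothetical placeholders, not the actual API of DRLib or this repo.

```python
import numpy as np

def store_episode_with_her(buffer, episode, compute_reward, k=4):
    """Store each real transition plus k hindsight copies ('future' strategy).

    episode: list of dicts with keys 'obs', 'action', 'obs_next', 'ag_next', 'g'.
    buffer.store_transition and compute_reward are hypothetical placeholders.
    """
    T = len(episode)
    for t, tr in enumerate(episode):
        # store the original transition with the real desired goal
        buffer.store_transition(tr)
        # sample k achieved goals from the future of the same episode
        for _ in range(k):
            future_t = np.random.randint(t, T)       # 'future' strategy: t <= future_t < T
            new_goal = episode[future_t]['ag_next']  # an achieved goal from later in the episode
            relabeled = dict(tr)
            relabeled['g'] = new_goal
            relabeled['reward'] = compute_reward(tr['ag_next'], new_goal)
            buffer.store_transition(relabeled)
```

Both schemes aim to train on a similar mixture of original and hindsight goals; the main difference is whether relabelling happens when transitions are stored or when they are sampled.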

@captainzhu123

I think it is the author of this project who modified the way hindsight goals are selected. You also raised this question earlier. I also think that the sampled experience should not be modified directly according to a ratio; the paper's approach is more like a kind of data augmentation. @whynpt

@ChrisZonghaoLi

Okay, so after some research I think the author explains the reasoning behind this code in this paper (page 32): https://link.springer.com/content/pdf/10.1007/978-3-030-89370-5.pdf?pdf=button. The paper is titled "Diversity-Based Trajectory and Goal Selection with Hindsight Experience Replay". The code is also available here, where the same HER sampling method is used: https://github.com/TianhongDai/div-hindsight/blob/master/baselines/her/her.py
