PR: Allow curriculum learning in grocery ground goal task #82
Conversation
resolve gym.Space.DiscreteSequence not supported when _spec_from_gym_space (cherry picked from commit 213c46a7d4b6d4388d5df5b710b24d4bb9e2f42b)
make grocery ground goal task more configurable
address Jiangtao's comments on GroceryGoundGoalTask initialization
make goal task configurable via gin
    start_range (float): for curriculum learning, the starting random_range to set the goal.
        Enables curriculum learning if start_range > 1.2 * success_distance_thresh.
        NOTE: Because curriculum learning is implemented using teacher in the environment,
        currently teacher status are not stored in model checkpoints. Resuming is not supported.
Since the curriculum range is increased automatically according to the reward_thresh_to_increase_range parameter, is this supposed to kind of support resuming?
This is described in Issue #79
python/social_bot/teacher_tasks.py (outdated)
def _push_reward_queue(self, value):
    if (not self.should_use_curriculum_training() or
            self._is_full_range_in_curriculum):
        return
    while len(self._q) >= self._max_reward_q_length:
        self._q.popleft()
    self._q.append(value)
    if (value > 0 and len(self._q) == self._max_reward_q_length and
            sum(self._q) >= self._max_reward_q_length *
            self._reward_thresh_to_increase_range):
        self._random_range *= 1. + self._increase_range_by_percent
        if self._random_range > self._orig_random_range:
            self._random_range = self._orig_random_range
        logging.info("Raising random_range to %f", self._random_range)
        self._q.clear()
I suggest using a Polyak (exponential moving) average here; it would make the logic simpler and cheaper to compute, with a similar effect:
alpha = 0.001
self.polyak_reward = value * alpha + self.polyak_reward * (1 - alpha)
if self.polyak_reward > reward_thresh_to_increase_range:
    self._random_range += self._random_range * self._increase_range_by_percent
    self.polyak_reward = 0
Here, alpha plays a role similar to 'max_reward_q_length': it controls how much reward history the running average effectively covers (roughly the last 1/alpha episodes).
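As a rough self-contained sketch of this EMA alternative (not code from this PR; the class and attribute names below are illustrative):

    class EMACurriculum:
        """Sketch: advance the curriculum when an exponential moving average
        of episode rewards exceeds a threshold (illustrative only)."""

        def __init__(self, start_range, max_range, alpha=0.001,
                     reward_thresh=0.9, increase_percent=0.1):
            self.random_range = start_range
            self.max_range = max_range
            self.alpha = alpha
            self.reward_thresh = reward_thresh
            self.increase_percent = increase_percent
            self.ema_reward = 0.0

        def push_reward(self, value):
            # Update the running average; alpha ~ 1 / effective window length.
            self.ema_reward = self.alpha * value + (1.0 - self.alpha) * self.ema_reward
            if self.ema_reward > self.reward_thresh:
                # Widen the goal range and reset the average so the agent has to
                # prove itself again at the new difficulty.
                self.random_range = min(
                    self.random_range * (1.0 + self.increase_percent), self.max_range)
                self.ema_reward = 0.0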
I think using a queue is OK; it's easier to reason about the effect, plus the code isn't that complex if maxlen is used.
Interesting, Jiangtao, I'll keep this in mind for the future. For this one, I'll just use success rate?
Yes, of course, just a simple suggestion :-)
python/social_bot/teacher_tasks.py (outdated)
    if (not self.should_use_curriculum_training() or
            self._is_full_range_in_curriculum):
        return
    while len(self._q) >= self._max_reward_q_length:
deque has an argument maxlen. You don't need to pop it if maxlen is provided.
Nice!
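For reference, a quick standalone illustration of the maxlen behavior being suggested (standard library only):

    from collections import deque

    q = deque(maxlen=3)          # bounded queue: holds at most 3 items
    for value in [1, 0, 1, 1, 0]:
        q.append(value)          # oldest item is dropped automatically when full
    print(list(q))               # [1, 1, 0]
    print(sum(q) / q.maxlen)     # success rate over the window: 0.666...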
    while len(self._q) >= self._max_reward_q_length:
        self._q.popleft()
    self._q.append(value)
    if (value > 0 and len(self._q) == self._max_reward_q_length and
I think the check "len(self._q) == self._max_reward_q_length" can be removed. The sum is unlikely to exceed the reward threshold before the queue is full.
Sounds good.
Oh, actually, after the curriculum advances we clear the deque, and it's very likely the agent will succeed in a few episodes right away (since it can already pass the earlier level of the curriculum), so without the length check it could pass the next level by accident.
Thanks for the comments, guys. Please take another look.
@@ -509,6 +536,9 @@ def __init__(self,
             agent_type='pioneer2dx_noplugin',
             world_time_precision=None,
             step_time=0.1,
             random_goal=None,
It seems that random_goal, fail_distance_thresh, and max_steps are not used in this class. These parameters are configured by gin files.
Good point. Forgot to remove them.
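For context, configuring the task through gin rather than constructor arguments might look roughly like this (a hedged sketch; the class must be registered as gin-configurable, and the parameter names below are illustrative, not verified against this repo):

    import gin

    # Illustrative gin bindings for the goal task; names are assumptions for
    # the example, not the repo's confirmed configurables.
    gin.parse_config("""
    GroceryGroundGoalTask.max_steps = 200
    GroceryGroundGoalTask.fail_distance_thresh = 0.5
    GroceryGroundGoalTask.random_goal = True
    """)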
remove unused code
Good catch Jiangtao. I also added capability to allow specifying a subset of goals.
Hey Jiangtao, Wei and Haonan, this change allows curriculum teaching of the goal task by increasing random_range every time the agent's success rate rises above a certain threshold (e.g. 0.9).
It also allows mixing in some percentage (e.g. 20%) of full-random_range goals; those episodes are not used to compute the agent's success rate.
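A rough self-contained sketch of the mechanism described above (not the PR's actual code; names and defaults are illustrative):

    import random
    from collections import deque


    class CurriculumGoalSampler:
        """Sketch: widen random_range when the recent success rate clears a
        threshold, and mix in some full-range goals that do not count toward
        the success rate (illustrative only)."""

        def __init__(self, start_range=0.5, max_range=5.0,
                     reward_thresh_to_increase_range=0.9,
                     increase_range_by_percent=0.1,
                     percent_full_range_in_curriculum=0.2,
                     max_reward_q_length=100):
            self._random_range = start_range
            self._max_range = max_range
            self._reward_thresh = reward_thresh_to_increase_range
            self._increase_percent = increase_range_by_percent
            self._percent_full_range = percent_full_range_in_curriculum
            self._q = deque(maxlen=max_reward_q_length)
            self._is_full_range_episode = False

        def sample_range(self):
            # Occasionally sample a goal from the full range; such episodes
            # are excluded from the success-rate statistics.
            self._is_full_range_episode = random.random() < self._percent_full_range
            return self._max_range if self._is_full_range_episode else self._random_range

        def push_reward(self, value):
            if self._is_full_range_episode:
                return
            self._q.append(value)
            # Advance the curriculum once the window is full and the success
            # rate clears the threshold, then reset the window for the new level.
            if (len(self._q) == self._q.maxlen and
                    sum(self._q) >= self._q.maxlen * self._reward_thresh):
                self._random_range = min(
                    self._random_range * (1.0 + self._increase_percent), self._max_range)
                self._q.clear()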
Could you take a look?
Thanks,
Le