Pr allow curriculum learning in grocery ground goal task #82

Merged
merged 8 commits into master from PR_grocery_vocab_curriculum on Oct 11, 2019

Conversation

le-horizon (Contributor)

Hey Jiangtao, Wei and Haonan, this change enables curriculum teaching of the goal task by increasing random_range every time the agent's success rate rises above a certain threshold (e.g. 0.9).

It also allows mixing in some percentage (e.g. 20%) of full-random_range goals; those episodes are not counted toward the agent's success rate.

Could you take a look?

Thanks,
Le
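
As a minimal sketch of the sampling side of the mechanism described above (the class name and attributes such as _percent_full_range are illustrative assumptions; only random_range and _is_full_range_in_curriculum come from the PR itself):

import random

class CurriculumGoalSampler:
    """Illustrative sketch: grow the goal range as the agent succeeds."""

    def __init__(self, start_range, max_range, percent_full_range=0.2):
        self._random_range = start_range        # current curriculum range
        self._orig_random_range = max_range     # full range used for mixed-in goals
        self._percent_full_range = percent_full_range
        self._is_full_range_in_curriculum = False

    def sample_goal_range(self):
        # With some probability, sample from the full range; these episodes are
        # flagged so their outcome is excluded from the success-rate estimate.
        if random.random() < self._percent_full_range:
            self._is_full_range_in_curriculum = True
            return self._orig_random_range
        self._is_full_range_in_curriculum = False
        return self._random_range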

Le Zhao added 4 commits October 7, 2019 16:25
…resolve gym.Space.DiscreteSequence not supported when _spec_from_gym_space

(cherry picked from commit 213c46a7d4b6d4388d5df5b710b24d4bb9e2f42b)

make groceryground goal task more configurable

address Jiangtao's comments on GroceryGoundGoalTask initialization

make goal task configurable via gin
start_range (float): for curriculum learning, the starting random_range to set the goal
Enables curriculum learning if start_range > 1.2 * success_distance_thresh.
NOTE: Because curriculum learning is implemented using teacher in the environment,
currently teacher status are not stored in model checkpoints. Resuming is not supported.
Contributor

Since the curriculum range is increased automatically according to the reward_thresh_to_increase_range parameter, is it supposed to kind of support resuming?

Contributor

This is described in Issue #79

Comment on lines 91 to 105

def _push_reward_queue(self, value):
    if (not self.should_use_curriculum_training() or
            self._is_full_range_in_curriculum):
        return
    while len(self._q) >= self._max_reward_q_length:
        self._q.popleft()
    self._q.append(value)
    if (value > 0 and len(self._q) == self._max_reward_q_length and
            sum(self._q) >= self._max_reward_q_length *
            self._reward_thresh_to_increase_range):
        self._random_range *= 1. + self._increase_range_by_percent
        if self._random_range > self._orig_random_range:
            self._random_range = self._orig_random_range
        logging.info("Raising random_range to %f", self._random_range)
        self._q.clear()
Contributor

I suggest using a Polyak (exponential moving) average here; it would make the logic simpler, cost less computation, and have a similar effect.

alpha = 0.001
self.polyak_reward = value * alpha + self.polyak_reward * (1 - alpha)
if self.polyak_reward > self._reward_thresh_to_increase_range:
    self._random_range += self._random_range * self._increase_range_by_percent
    self.polyak_reward = 0

alpha has a similar effect to 'max_reward_q_length' here.
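
As a rough, standalone illustration of that equivalence (not code from the PR): an exponential moving average with alpha ≈ 1/N behaves similarly to a length-N moving average of the success signal.

from collections import deque

def compare_averages(rewards, window=100):
    alpha = 1.0 / window          # alpha plays the role of 1 / max_reward_q_length
    ema = 0.0
    q = deque(maxlen=window)
    for r in rewards:
        ema = r * alpha + ema * (1 - alpha)   # Polyak / exponential moving average
        q.append(r)
    return ema, sum(q) / len(q)               # (smoothed estimate, windowed success rate)

# A stream that is mostly successful, then starts failing:
print(compare_averages([1] * 500 + [0] * 50))   # both estimates fall as recent successes taper off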

Contributor

I think using a queue is OK; it's easier to reason about the effect, plus the code isn't that complex if maxlen is used.

Contributor Author

Interesting, Jiangtao, I'll keep this in mind for the future. For this one, I'll just use success rate?

Contributor

Yes, of course, just a simple suggestion :-)


    if (not self.should_use_curriculum_training() or
            self._is_full_range_in_curriculum):
        return
    while len(self._q) >= self._max_reward_q_length:
Contributor

deque has a maxlen argument. You don't need to popleft manually if maxlen is provided.
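
A minimal, self-contained illustration of that suggestion (the values are made up):

from collections import deque

q = deque(maxlen=3)
for value in [1, 0, 1, 1]:
    q.append(value)   # once the deque is full, the oldest entry is dropped automatically
print(list(q))        # [0, 1, 1] -- no explicit popleft() needed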

Contributor Author

Nice!

    while len(self._q) >= self._max_reward_q_length:
        self._q.popleft()
    self._q.append(value)
    if (value > 0 and len(self._q) == self._max_reward_q_length and
Contributor

I think "len(self._q) == self._max_reward_q_length" can be removed. It's unlikely to exceed the reward_thresh without the queue being full.

Contributor Author

Sounds good.

Contributor Author

Oh, actually after the curriculum advances we clear the deque, and it's very likely the agent will complete a few episodes successfully (because it can already pass the earlier level of the curriculum) and pass the next level by accident.
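
Putting the two review points together, a hypothetical reconstruction of how _push_reward_queue might look with deque(maxlen=...) while keeping the full-queue check (a sketch, not the merged code):

import logging
from collections import deque

# Assumes the surrounding task class created the queue as:
#     self._q = deque(maxlen=self._max_reward_q_length)
def _push_reward_queue(self, value):
    if (not self.should_use_curriculum_training() or
            self._is_full_range_in_curriculum):
        return
    # maxlen makes the explicit while/popleft loop unnecessary.
    self._q.append(value)
    # Keep the full-queue check: right after the queue is cleared, a few easy
    # successes would otherwise advance the curriculum prematurely.
    if (value > 0 and len(self._q) == self._max_reward_q_length and
            sum(self._q) >= self._max_reward_q_length *
            self._reward_thresh_to_increase_range):
        self._random_range = min(
            self._random_range * (1. + self._increase_range_by_percent),
            self._orig_random_range)
        logging.info("Raising random_range to %f", self._random_range)
        self._q.clear()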


@HorizonRobotics HorizonRobotics deleted a comment from emailweixu Oct 9, 2019
@le-horizon (Contributor Author)

Thanks for the comments guys. Please take another look.

@@ -509,6 +536,9 @@ def __init__(self,
agent_type='pioneer2dx_noplugin',
world_time_precision=None,
step_time=0.1,
random_goal=None,
Contributor

It seems that random_goal, fail_distance_thresh and max_steps are not used in this class. These parameters are configured by gin files.

Contributor Author

Good point. Forgot to remove them.
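
For context on "configured by gin files", a hedged example of what such a binding could look like (the class and parameter names below are assumptions based on this discussion, not the repository's actual gin file):

import gin

@gin.configurable
class GoalTask:
    def __init__(self, success_distance_thresh=0.5, fail_distance_thresh=0.5,
                 random_range=10.0):
        self._success_distance_thresh = success_distance_thresh
        self._fail_distance_thresh = fail_distance_thresh
        self._random_range = random_range

# Bindings that would normally live in a .gin file:
gin.parse_config("""
GoalTask.fail_distance_thresh = 3.0
GoalTask.random_range = 10.0
""")

task = GoalTask()   # picks up the gin-bound values instead of constructor defaults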

@le-horizon le-horizon (Contributor Author) left a comment

Good catch, Jiangtao. I also added the capability to specify a subset of goals.


@le-horizon le-horizon merged commit 1751b97 into master Oct 11, 2019
@witwolf witwolf deleted the PR_grocery_vocab_curriculum branch November 20, 2019 01:42