PR: Allow curriculum learning in grocery ground goal task #82
Conversation
resolve gym.Space.DiscreteSequence not supported when _spec_from_gym_space (cherry picked from commit 213c46a7d4b6d4388d5df5b710b24d4bb9e2f42b)
make grocery ground goal task more configurable
address Jiangtao's comments on GroceryGoundGoalTask initialization
make goal task configurable via gin
    start_range (float): for curriculum learning, the starting random_range to set the goal.
        Enables curriculum learning if start_range > 1.2 * success_distance_thresh.
        NOTE: Because curriculum learning is implemented using teacher in the environment,
        currently teacher status are not stored in model checkpoints. Resuming is not supported.
Since the curriculum range is increased automatically according to the reward_thresh_to_increase_range parameter, is this supposed to kind of support resuming?
This is described in Issue #79
python/social_bot/teacher_tasks.py (outdated)
def _push_reward_queue(self, value):
    if (not self.should_use_curriculum_training() or
            self._is_full_range_in_curriculum):
        return
    while len(self._q) >= self._max_reward_q_length:
        self._q.popleft()
    self._q.append(value)
    if (value > 0 and len(self._q) == self._max_reward_q_length and
            sum(self._q) >= self._max_reward_q_length *
            self._reward_thresh_to_increase_range):
        self._random_range *= 1. + self._increase_range_by_percent
        if self._random_range > self._orig_random_range:
            self._random_range = self._orig_random_range
        logging.info("Raising random_range to %f", self._random_range)
        self._q.clear()
I suggest using a Polyak (exponential moving) average here; it would make the logic simpler and cheaper to compute, with a similar effect:
alpha = 0.001
self.polyak_reward = value * alpha + self.polyak_reward * (1 - alpha)
if self.polyak_reward > reward_thresh_to_increase_range:
    self._random_range += self._random_range * self._increase_range_by_percent
    self.polyak_reward = 0
Here, alpha plays a role similar to 'max_reward_q_length': it controls how much reward history the running average effectively covers (roughly the last 1/alpha episodes).
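As a rough self-contained sketch of this EMA alternative (not code from this PR; the class and attribute names below are illustrative):

    class EMACurriculum:
        """Sketch: advance the curriculum when an exponential moving average
        of episode rewards exceeds a threshold (illustrative only)."""

        def __init__(self, start_range, max_range, alpha=0.001,
                     reward_thresh=0.9, increase_percent=0.1):
            self.random_range = start_range
            self.max_range = max_range
            self.alpha = alpha
            self.reward_thresh = reward_thresh
            self.increase_percent = increase_percent
            self.ema_reward = 0.0

        def push_reward(self, value):
            # Update the running average; alpha ~ 1 / effective window length.
            self.ema_reward = self.alpha * value + (1.0 - self.alpha) * self.ema_reward
            if self.ema_reward > self.reward_thresh:
                # Widen the goal range and reset the average so the agent has to
                # prove itself again at the new difficulty.
                self.random_range = min(
                    self.random_range * (1.0 + self.increase_percent), self.max_range)
                self.ema_reward = 0.0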
I think using a queue is OK; it's easier to reason about the effect, plus the code isn't that complex if maxlen is used.
Interesting, Jiangtao, I'll keep this in mind for the future. For this one, I'll just use success rate?
Yes, of course, just a simple suggestion :-)
python/social_bot/teacher_tasks.py (outdated)
    if (not self.should_use_curriculum_training() or
            self._is_full_range_in_curriculum):
        return
    while len(self._q) >= self._max_reward_q_length:
deque has an argument maxlen. You don't need to pop it if maxlen is provided.
Nice!
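For reference, a quick standalone illustration of the maxlen behavior being suggested (standard library only):

    from collections import deque

    q = deque(maxlen=3)          # bounded queue: holds at most 3 items
    for value in [1, 0, 1, 1, 0]:
        q.append(value)          # oldest item is dropped automatically when full
    print(list(q))               # [1, 1, 0]
    print(sum(q) / q.maxlen)     # success rate over the window: 0.666...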
    while len(self._q) >= self._max_reward_q_length:
        self._q.popleft()
    self._q.append(value)
    if (value > 0 and len(self._q) == self._max_reward_q_length and
I think the check "len(self._q) == self._max_reward_q_length" can be removed. The sum is unlikely to exceed the reward threshold before the queue is full.
Sounds good.
Oh, actually, after the curriculum advances we clear the deque, and it's very likely the agent will succeed in a few episodes right away (since it can already pass the earlier level of the curriculum), so without the length check it could pass the next level by accident.
Thanks for the comments, guys. Please take another look.
@@ -509,6 +536,9 @@ def __init__(self,
             agent_type='pioneer2dx_noplugin',
             world_time_precision=None,
             step_time=0.1,
             random_goal=None,
It seems that random_goal, fail_distance_thresh, and max_steps are not used in this class. These parameters are configured by gin files.
Good point. Forgot to remove them.
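For context, configuring the task through gin rather than constructor arguments might look roughly like this (a hedged sketch; the class must be registered as gin-configurable, and the parameter names below are illustrative, not verified against this repo):

    import gin

    # Illustrative gin bindings for the goal task; names are assumptions for
    # the example, not the repo's confirmed configurables.
    gin.parse_config("""
    GroceryGroundGoalTask.max_steps = 200
    GroceryGroundGoalTask.fail_distance_thresh = 0.5
    GroceryGroundGoalTask.random_goal = True
    """)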
remove unused code
Good catch Jiangtao. I also added capability to allow specifying a subset of goals.
Hey Jiangtao, Wei and Haonan, this change allows curriculum teaching of the goal task by increasing random_range every time the agent's success rate rises above a certain threshold (e.g. 0.9).
It also allows mixing in some percentage (e.g. 20%) of full-random_range goals; those episodes are not used to compute the agent's success rate.
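A rough self-contained sketch of the mechanism described above (not the PR's actual code; names and defaults are illustrative):

    import random
    from collections import deque


    class CurriculumGoalSampler:
        """Sketch: widen random_range when the recent success rate clears a
        threshold, and mix in some full-range goals that do not count toward
        the success rate (illustrative only)."""

        def __init__(self, start_range=0.5, max_range=5.0,
                     reward_thresh_to_increase_range=0.9,
                     increase_range_by_percent=0.1,
                     percent_full_range_in_curriculum=0.2,
                     max_reward_q_length=100):
            self._random_range = start_range
            self._max_range = max_range
            self._reward_thresh = reward_thresh_to_increase_range
            self._increase_percent = increase_range_by_percent
            self._percent_full_range = percent_full_range_in_curriculum
            self._q = deque(maxlen=max_reward_q_length)
            self._is_full_range_episode = False

        def sample_range(self):
            # Occasionally sample a goal from the full range; such episodes
            # are excluded from the success-rate statistics.
            self._is_full_range_episode = random.random() < self._percent_full_range
            return self._max_range if self._is_full_range_episode else self._random_range

        def push_reward(self, value):
            if self._is_full_range_episode:
                return
            self._q.append(value)
            # Advance the curriculum once the window is full and the success
            # rate clears the threshold, then reset the window for the new level.
            if (len(self._q) == self._q.maxlen and
                    sum(self._q) >= self._q.maxlen * self._reward_thresh):
                self._random_range = min(
                    self._random_range * (1.0 + self._increase_percent), self._max_range)
                self._q.clear()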
Could you take a look?
Thanks,
Le