MDP exploration #227

Open
cdondrup opened this issue Apr 19, 2016 · 7 comments

Comments

@cdondrup
Member

We currently have an issue in AAF regarding the MDP path planning. The info terminal runs at certain locations in the building that are close to walls, so that the robot is not an obstacle in the centre of the corridor while waiting for people to info terminate. This means that these waypoints are visited much more frequently than "good waypoints" that are close to them but nearer the centre of the corridor. See picture below:

[Image: screenshot from 2016-04-19 17 36 31]

The robot has an info_terminal waypoint at WayPoint68, which is the second from the right. Apparently it learned that even if it doesn't have an info terminal task there but wants to go to Kindergarten (first waypoint on the left), it should still go via WayPoint68 instead of skipping it and going from WayPoint13 directly to WayPoint88. This means that the robot drives very close to the wall even when it doesn't have to. This led to the robot getting stuck before a walking group because it came too close to the wall. Moving these points closer to the centre of the corridor is sadly not possible.

The question now is, how could we solve this? @bfalacerda offered to change the forbidden nodes implementation to allow going to forbidden nodes when the task is at these nodes but never traverse them when the task is at a different node. However, this only works if the robot is not coming from a forbidden node. Since the robot is always infoterminating, it will come from a forbidden node most of the time. Hence, this behaviour will not help us.

Another question is, why is the robot learning that this route is better? Is it because it has more training data? Also, what is the exploration strategy, if there is one, or will it always be greedy and therefore always get close to the wall? @marc-hanheide and I had a brief discussion about this and came up with a few ideas. Exploration should help, but obviously just randomly exploring edges (e.g. epsilon-greedy) will lead to illegible navigation behaviour and will never bring the robot to its goal. To mitigate this, the topological route could also be calculated, and exploration would mean taking the topo route instead of the MDP one. This would still bring the robot to the goal but would at least explore the topo route in case it differs from the MDP one. This is still not perfect, because it will always choose either the MDP route or the route with the fewest nodes traversed, and might never find a route that is neither but still quicker. We couldn't really come up with a better exploration idea, but maybe @bfalacerda or @gestom have one.
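To make the exploration idea above concrete, here is a minimal sketch, not the existing strands_executive behaviour. It assumes the MDP route is already available as a list of waypoint names, and that the topological route is simply the route with the fewest nodes (found here with a breadth-first search). The function names, the `epsilon` parameter and the three-node example graph are all hypothetical.

```python
import random
from collections import deque


def topo_route(edges, start, goal):
    """Breadth-first search: the route with the fewest nodes traversed."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            route = []
            while node is not None:
                route.append(node)
                node = parents[node]
            return route[::-1]
        for nxt in edges.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None  # goal not reachable


def choose_route(mdp_route, edges, start, goal, epsilon=0.1, rng=random):
    """With probability epsilon take the topo route, otherwise the MDP route.

    Either way the robot still reaches the goal, unlike per-edge random
    exploration.
    """
    if rng.random() < epsilon:
        alternative = topo_route(edges, start, goal)
        if alternative is not None:
            return alternative
    return mdp_route


# Hypothetical fragment of the map from the screenshot (assumed connectivity).
EDGES = {
    "WayPoint13": ["WayPoint68", "WayPoint88"],
    "WayPoint68": ["WayPoint88"],
    "WayPoint88": [],
}
MDP_ROUTE = ["WayPoint13", "WayPoint68", "WayPoint88"]

if __name__ == "__main__":
    print(choose_route(MDP_ROUTE, EDGES, "WayPoint13", "WayPoint88", epsilon=0.1))
```

As noted above, this only ever alternates between two candidate routes, so a third route that is neither but quicker would still go unexplored.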

Regarding our current predicament, if @bfalacerda is willing to do so, the forbidden node behaviour could be changed so it will never go through forbidden nodes even when coming from a forbidden node. Since we will solve night time exploration differently in AAF, this would be fine for us. We could then restart the whole learning process, keep the current data, and add the info terminal waypoints to the forbidden zones and have a new learning outcome for the last 4 weeks of the deployment that might still be useful for @bfalacerda ?!

I hope I didn't forget anything @marc-hanheide

@bfalacerda
Contributor

I can do the change you asked for. I didn't understand that the quick solution from yesterday would have a small impact, and was focusing on other stuff I need to sort out for TSC deployment... Should have something ready tomorrow before lunchtime.

I'll also take a look in Werner to see if there's anything looking strange in the MDP, will get back to you in a bit.

Regarding exploration idea: to learn edge WayPoint1_WayPoint2, our edge learning last year added topological_navigation tasks with starting_node=WayPoint1, and goal.target=WayPoint2.
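As a rough illustration of that edge-learning scheme, the sketch below builds one task description per directed edge. It uses plain dictionaries rather than the real task message type, and the function name and the `edge_id` key are hypothetical. Only the `starting_node` / `goal.target` naming and the `WayPoint1_WayPoint2` edge naming are taken from the comment above.

```python
def edge_exploration_tasks(edges):
    """Build one topological_navigation task description per directed edge.

    `edges` maps a waypoint name to the list of waypoints reachable from it.
    The returned dicts are illustrative only; a real scheduler would need
    them converted into its own task type.
    """
    tasks = []
    for origin in sorted(edges):
        for target in edges[origin]:
            tasks.append({
                "action": "topological_navigation",
                "starting_node": origin,       # robot must start the task here
                "goal": {"target": target},    # and traverse exactly one edge
                "edge_id": "%s_%s" % (origin, target),
            })
    return tasks


if __name__ == "__main__":
    for task in edge_exploration_tasks({"WayPoint1": ["WayPoint2"]}):
        print(task)
```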

@cdondrup
Member Author

I didn't really realise that this would still be a problem when we talked about it yesterday, sorry.

Regarding the exploration, how would that be triggered?

@bfalacerda
Contributor

well, you'd need a node adding tasks to the scheduler at certain times or something. @hawesie might be able to tell you where to find last year's script

@hawesie
Member

hawesie commented Apr 19, 2016

@bfalacerda
Contributor

I looked at the last MDP that was generated for execution (got it at 18h30 BST, more or less), and the prediction values at that time made sense. I also generated a policy to reach KinderGarten, and it doesn't do the stupid detour.

Do you always see this behaviour @cdondrup ?

@bfalacerda
Contributor

see https://github.com/strands-project/strands_executive/tree/aaf-forb for policy generation that always avoids forbidden waypoints (unless they are either start or target wps), and ignores them during execution.
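The rule described here (avoid forbidden waypoints unless they are the start or the target) can be sketched on a simplified model. The code below is not the code in the linked branch: it swaps the MDP policy generation for a deterministic shortest-path search (Dijkstra) over expected edge costs, and the function name and the cost values are made up for illustration.

```python
import heapq


def route_avoiding_forbidden(costs, start, target, forbidden):
    """Cheapest route that skips forbidden nodes unless they are start/target.

    `costs` maps node -> {neighbour: expected traversal cost}.
    Returns (route, total_cost), or (None, inf) if no route exists.
    """
    blocked = set(forbidden) - {start, target}
    best = {start: 0.0}
    parents = {start: None}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            route = []
            while node is not None:
                route.append(node)
                node = parents[node]
            return route[::-1], cost
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, edge_cost in costs.get(node, {}).items():
            if nxt in blocked:
                continue  # never traverse a forbidden waypoint
            new_cost = cost + edge_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parents[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return None, float("inf")


# Hypothetical expected costs in which the detour via WayPoint68 looks cheaper.
COSTS = {
    "WayPoint13": {"WayPoint68": 5.0, "WayPoint88": 20.0},
    "WayPoint68": {"WayPoint88": 5.0},
    "WayPoint88": {},
}

if __name__ == "__main__":
    print(route_avoiding_forbidden(COSTS, "WayPoint13", "WayPoint88", {"WayPoint68"}))
```

With WayPoint68 forbidden, the search falls back to the direct WayPoint13 to WayPoint88 edge even though its assumed cost is higher, while a task whose target is WayPoint68 can still reach it.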

@bfalacerda
Contributor

So @Jailander did some nice plots on predictions for yesterday that seem to confirm that the problem is the predictions. He also checked the amount of nav stats for each of the edges, and that doesn't seem to be a problem:

WayPoint68_WayPoint88: 26
WayPoint13_WayPoint68: 61
WayPoint13_WayPoint88: 45

14h CEST is 46800 in these plots:

13->88: [plot: wp13_wp88]

13->68: [plot: wp13_wp68]

68->88: [plot: wp68-wp88]

Note that the success probability for 13->88 around 46800 is pretty low.

This means that with the current predictions, the use of forbidden nodes (see my previous post) is a good way of avoiding the unwanted route.

This raises the question of how to predict successful edge traversals. I think what is happening is that fremen is trying to model stupid move_base failures, which are not really periodic, so the models become inaccurate. I think we should be a bit more selective about which edges to fremenise, and just do "non-temporal" predictions for the others. We should think about this for TSC; does anyone have any suggestions?
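One possible reading of a "non-temporal" prediction is a single time-independent success estimate per edge, computed from all recorded traversals. The sketch below uses a smoothed success fraction; the function name, the prior pseudo-counts and the sample outcomes are assumptions for illustration, not values taken from the nav stats.

```python
def non_temporal_success_probability(outcomes, prior_successes=1.0, prior_failures=1.0):
    """Time-independent estimate of an edge's traversal success probability.

    `outcomes` is an iterable of booleans, one per recorded traversal.
    The pseudo-counts keep the estimate away from 0 and 1 when an edge
    has very few nav stats.
    """
    outcomes = list(outcomes)
    successes = sum(1 for ok in outcomes if ok)
    total = len(outcomes) + prior_successes + prior_failures
    return (successes + prior_successes) / total


if __name__ == "__main__":
    # Hypothetical outcomes for one edge: three successes and one failure.
    print(non_temporal_success_probability([True, True, True, False]))
```

Unlike a periodic model, this estimate cannot dip at a particular time of day, so a handful of sporadic failures would not produce a low predicted success probability around one specific hour.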
