MDP exploration #227
I can make the change you asked for. I didn't understand that the quick solution from yesterday would have a small impact, and was focusing on other stuff I need to sort out for the TSC deployment... I should have something ready tomorrow before lunchtime. I'll also take a look at Werner to see if anything looks strange in the MDP, and will get back to you in a bit. Regarding the exploration idea: to learn edge …
I didn't really realise that this would still be a problem when we talked about it yesterday, sorry. Regarding the exploration, how would that be triggered?
Well, you'd need a node that adds tasks to the scheduler at certain times, or something similar. @hawesie might be able to tell you where to find last year's script.
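A minimal sketch of what such a node could compute, assuming exploration is triggered purely by time: evenly spaced slots inside a night window, each paired round-robin with an edge to explore. The function `exploration_schedule` and its parameters are hypothetical; actually submitting the resulting tasks to the strands_executive scheduler is left out, since that API is not described here.

```python
from datetime import datetime, timedelta
from itertools import cycle


def exploration_schedule(edges, start, end, interval):
    """Return (time, edge) pairs for exploration tasks in [start, end).

    Hypothetical helper: slots are `interval` apart and edges are assigned
    round-robin, so every edge is explored at least once if the window
    holds at least len(edges) slots.
    """
    if not edges:
        return []
    edge_iter = cycle(edges)
    schedule = []
    t = start
    while t < end:
        schedule.append((t, next(edge_iter)))
        t += interval
    return schedule


if __name__ == "__main__":
    # Illustrative window 22:00-02:00 (the date is arbitrary), one task per hour.
    night = datetime(2016, 1, 1, 22, 0)
    for when, (src, dst) in exploration_schedule(
        [("WayPoint13", "WayPoint88"), ("WayPoint13", "WayPoint68")],
        night, night + timedelta(hours=4), timedelta(hours=1),
    ):
        print(when.strftime("%H:%M"), src, "->", dst)
```

Round-robin keeps the sketch deterministic; a real node might instead prioritise edges with few nav stats.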
I looked at the last MDP that was generated for execution (at around 18:30 BST), and the prediction values at that time made sense. I also generated a policy to reach KinderGarten, and it doesn't do the stupid detour. Do you always see this behaviour, @cdondrup?
See https://github.com/strands-project/strands_executive/tree/aaf-forb for policy generation that always avoids forbidden waypoints (unless they are the start or target waypoint) and ignores them during execution.
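The linked branch works at the level of MDP policy generation. As a rough deterministic analogue, the rule "avoid forbidden waypoints unless they are the start or the target" can be sketched as a Dijkstra search that prunes forbidden nodes except those two. The graph format and edge costs below are assumptions for illustration, not the topological map format used by strands_executive.

```python
import heapq


def shortest_path_avoiding(graph, start, target, forbidden):
    """Dijkstra search that never passes through forbidden waypoints.

    graph maps each node to a dict {neighbour: edge_cost}. A forbidden node
    may still be used as the start or the target of the route.
    Returns (cost, path), or (float("inf"), []) if the target is unreachable.
    """
    def allowed(node):
        return node not in forbidden or node in (start, target)

    best = {start: 0.0}
    prev = {}
    visited = set()
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            # Walk predecessors back to the start to rebuild the path.
            path = [node]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return cost, path[::-1]
        for nbr, edge_cost in graph.get(node, {}).items():
            if nbr in visited or not allowed(nbr):
                continue
            new_cost = cost + edge_cost
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    return float("inf"), []


if __name__ == "__main__":
    # Hypothetical costs: the direct edge 13->88 is made expensive on purpose.
    topo = {
        "WayPoint13": {"WayPoint68": 1.0, "WayPoint88": 5.0},
        "WayPoint68": {"WayPoint88": 1.0},
        "WayPoint88": {"KinderGarten": 1.0},
    }
    print(shortest_path_avoiding(topo, "WayPoint13", "KinderGarten", {"WayPoint68"}))
    # prints (6.0, ['WayPoint13', 'WayPoint88', 'KinderGarten'])
```

Starting at a forbidden node (e.g. WayPoint68 after an info terminal session) is still allowed here, because the start is exempt and the route only leaves it.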
So @Jailander made some nice plots of yesterday's predictions that seem to confirm that the problem lies in the predictions. He also checked the number of nav stats for each of the edges, and that doesn't seem to be a problem:
14h CEST is 46800 in these plots. Note that the success probability for 13->88 around 46800 is pretty low. This means that, with the current predictions, using forbidden nodes (see my previous post) is a good way of avoiding the unwanted route. It also raises the question of how to predict successful edge traversals. I think what is happening is that fremen is trying to model stupid move_base failures, which are not really periodic, so the models become inaccurate. I think we should be more selective about which edges to fremenise, and just do "non-temporal" predictions for the others. We should think about this for TSC; does anyone have any suggestions?
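One way to make "be selective about which edges to fremenise" concrete is sketched below. It is a simplified FreMEn-style spectral model, not the fremen package's API: it estimates the static success rate, measures the strength of a few daily periodicities, and drops any component weaker than a threshold, so an edge whose failures are sporadic falls back to a non-temporal prediction. The candidate `periods`, `order` and `min_amplitude` are hypothetical tuning parameters. Times are seconds since local midnight; 46800 s is 13:00, which equals 14:00 CEST if the plots' clock is BST.

```python
import cmath
import math

DAY = 86400.0  # seconds


def fit_edge_model(times, outcomes, periods=(DAY, DAY / 2, DAY / 3),
                   order=1, min_amplitude=0.1):
    """Fit a FreMEn-style model to binary traversal outcomes.

    times: seconds since local midnight (or any consistent epoch).
    outcomes: 1 for a successful traversal, 0 for a failure.
    Returns (p0, components) with components as (amplitude, omega, phase).
    Components weaker than `min_amplitude` are dropped, so an edge without
    clear periodicity falls back to the static success rate p0.
    """
    n = len(times)
    if n == 0:
        raise ValueError("need at least one observation")
    p0 = sum(outcomes) / n
    components = []
    for period in periods:
        omega = 2.0 * math.pi / period
        # Spectral coefficient of the mean-removed outcome signal.
        gamma = sum((s - p0) * cmath.exp(-1j * omega * t)
                    for t, s in zip(times, outcomes)) / n
        components.append((2.0 * abs(gamma), omega, cmath.phase(gamma)))
    components.sort(reverse=True)  # strongest amplitude first
    kept = [c for c in components[:order] if c[0] >= min_amplitude]
    return p0, kept


def predict_success(model, t):
    """Predicted traversal success probability at time t, clipped to [0, 1]."""
    p0, components = model
    p = p0 + sum(a * math.cos(w * t + phi) for a, w, phi in components)
    return min(1.0, max(0.0, p))
```

An edge that reliably fails around midday keeps its daily component, so `predict_success(model, 46800.0)` is low, while an edge with a handful of random move_base failures ends up with an empty component list and a constant prediction.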
We currently have an issue in AAF regarding the MDP path planning. The info terminal runs at certain locations in the building that are close to walls, so that the robot is not an obstacle in the centre of the corridor while it waits for people to use the info terminal. This means that these waypoints are visited much more frequently than nearby "good waypoints" closer to the centre of the corridor. See picture below:
The robot has an info_terminal waypoint at WayPoint68, which is the second from the right. Apparently it has learned that even when it has no info terminal task there but wants to go to Kindergarten (first waypoint on the left), it should still go via WayPoint68 instead of skipping it and going from WayPoint13 directly to WayPoint88. This means that the robot drives very close to the wall even when it does not have to. This led to the robot getting stuck before a walking group because it came too close to the wall. Sadly, moving these points more towards the centre of the corridor is not possible.
The question now is: how can we solve this? @bfalacerda offered to change the forbidden nodes implementation to allow going to forbidden nodes when the task is at these nodes, but never traversing them when the task is at a different node. However, this only works if the robot is not coming from a forbidden node. Since the robot is always running the info terminal, it will come from a forbidden node most of the time. Hence, this behaviour will not help us.
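To see how a low predicted success on WayPoint13->WayPoint88 can make the detour via WayPoint68 look better, here is a minimal sketch assuming that a failed traversal is simply retried, so each edge costs duration / p_success in expectation (a geometric number of attempts). The durations and probabilities are illustrative, not values from the deployment, and the actual MDP model is richer than this.

```python
def expected_edge_time(duration, p_success):
    """Expected time for one edge if every attempt takes `duration` seconds
    and succeeds independently with probability `p_success`."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    return duration / p_success


def expected_route_time(route, edges):
    """Sum of expected edge times along consecutive waypoints in `route`."""
    return sum(expected_edge_time(*edges[(a, b)])
               for a, b in zip(route, route[1:]))


# Illustrative (duration in s, success probability) per edge.
EDGES = {
    ("WayPoint13", "WayPoint88"): (20.0, 0.3),
    ("WayPoint13", "WayPoint68"): (12.0, 0.95),
    ("WayPoint68", "WayPoint88"): (12.0, 0.95),
}
DIRECT = ["WayPoint13", "WayPoint88"]
DETOUR = ["WayPoint13", "WayPoint68", "WayPoint88"]

if __name__ == "__main__":
    print(f"direct: {expected_route_time(DIRECT, EDGES):.1f} s")  # 66.7 s
    print(f"detour: {expected_route_time(DETOUR, EDGES):.1f} s")  # 25.3 s
```

With these numbers the detour wins; raising the predicted success of the direct edge to 0.9 would bring it to about 22 s and flip the choice, which is why the quality of the edge predictions matters here.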
Another question is: why is the robot learning that this route is better? Is it because it has more training data? Also, what is the exploration strategy, if there is one, or will it always be greedy and therefore always get close to the wall? @marc-hanheide and I had a brief discussion about this and came up with a few ideas. Exploration should help, but simply exploring edges at random (e.g. epsilon-greedy) will lead to illegible navigation behaviour and will never bring the robot to its goal. To mitigate this, the topological route could also be calculated, and exploring would mean taking the topological route instead of the MDP one. This would still bring the robot to its goal, and would at least explore the topological route in cases where it differs from the MDP one. It is still not perfect: it will always choose either the MDP route or the route with the minimum number of traversed nodes, and might never find a route that is neither but still quicker. We couldn't come up with a better exploration idea, but maybe @bfalacerda or @gestom have one.
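The "take the topological route instead of the MDP one" idea can be sketched as an epsilon-greedy choice between two complete routes, so the robot still reaches its goal whichever is picked. `min_hop_route` is a plain breadth-first search for the route with the fewest edges; `choose_route`, `epsilon` and the graph format are assumptions for illustration, and the MDP route is taken as given.

```python
import random
from collections import deque


def min_hop_route(graph, start, target):
    """Breadth-first search for the route with the fewest edges (costs ignored).

    graph maps each node to an iterable of neighbours (a dict of
    {neighbour: cost} also works). Returns [] if the target is unreachable.
    """
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            route = []
            while node is not None:
                route.append(node)
                node = prev[node]
            return route[::-1]
        for nbr in graph.get(node, ()):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    return []


def choose_route(graph, start, target, mdp_route, epsilon=0.1, rng=random):
    """Epsilon-greedy choice between two complete routes.

    With probability epsilon the min-hop topological route is returned
    ("explore"), otherwise the MDP route ("exploit"). Either way the robot
    follows a full route to the target.
    """
    if rng.random() < epsilon:
        topo_route = min_hop_route(graph, start, target)
        if topo_route:
            return topo_route, "explore"
    return mdp_route, "exploit"
```

Passing a seeded `random.Random(...)` as `rng` makes the exploration decisions reproducible when replaying a day of tasks.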
Regarding our current predicament, if @bfalacerda is willing to do so, the forbidden node behaviour could be changed so it will never go through forbidden nodes even when coming from a forbidden node. Since we will solve night time exploration differently in AAF, this would be fine for us. We could then restart the whole learning process, keep the current data, and add the info terminal waypoints to the forbidden zones and have a new learning outcome for the last 4 weeks of the deployment that might still be useful for @bfalacerda ?!
I hope I didn't forget anything @marc-hanheide