smallGridWorld.csv

The data below shows the number of steps (actions) the agent needed to reach
the terminal state after the algorithm was run for the given number of iterations.
Iterations,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
Value Iteration,652,285,79,7,9,11,9,7,10,12,8,11,8,8,7,7,9,7,8,7,8,9,10,8,11,11,7,8,7,9,8,7,7,22,11,8,10,9,8,16,9,10,10,7,7,7,15,11,8,7,8,10,9,8,8,13,9,8,10,7,8,11,10,7,9,8,10,8,12,7,8,9,12,8,7,15,9,7,13,15,10,9,7,7,11,9,7,14,8,8,7,14,7,7,7,12,9,9,19,9
Policy Iteration,389,112,32,7,7,12,9,13,9,7,7,7,7,9,7,12,9,10,15,7,7,7,10,8,8,8,7,9,10,8,9,7,10,7,9,7,9,9,11,7,13,7,9,9,14,7,7,7,7,9,7,11,12,8,9,9,11,8,9,9,7,7,8,12,7,8,11,7,10,8,9,16,11,8,9,9,10,7,12,7,9,7,10,10,7,9,9,8,9,10,14,9,7,7,7,8,11,12,7,7
Q Learning,47,45,268,22,23,21,25,18,7,7,10,9,8,9,9,28,10,9,61,13,9,32,8,23,25,10,7,73,10,9,8,108,8,23,12,24,9,38,11,23,10,9,7,9,10,15,11,8,7,7,9,8,32,23,9,8,23,7,9,7,10,8,7,7,33,12,8,7,14,7,13,20,10,8,13,13,12,15,12,26,38,16,27,44,41,10,15,12,11,8,7,11,21,30,12,31,21,13,11,57

The data below shows the number of milliseconds the algorithm needed to generate
its policy when run for the given number of iterations.
Iterations,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
Value Iteration,34,18,3,6,7,7,6,5,6,7,8,24,6,6,6,7,6,8,7,7,8,9,14,15,12,16,13,18,12,32,11,10,9,10,13,13,14,26,10,10,10,14,16,9,9,9,25,12,12,9,9,11,13,14,18,12,14,11,10,10,12,13,22,14,15,16,13,13,16,17,13,15,15,13,13,25,14,16,15,14,14,11,13,10,8,15,17,16,18,22,19,16,14,16,17,21,28,22,21,17
Policy Iteration,4,2,4,4,4,11,6,6,4,7,5,29,6,8,7,8,8,49,8,8,8,51,11,12,11,33,21,11,11,19,16,94,15,14,15,14,14,15,16,34,17,18,20,24,34,570,18,55,22,24,30,22,33,32,45,30,170,40,436,23,78,25,30,27,26,27,30,30,31,24,27,28,33,30,32,40,41,27,46,35,35,34,32,30,34,36,36,39,36,38,41,47,41,44,40,38,39,44,34,31
Q Learning,17,5,10,2,3,2,6,5,5,9,6,5,20,4,11,13,3,3,14,8,4,6,8,5,7,7,3,22,5,9,7,6,5,6,5,8,13,8,9,31,8,8,11,6,13,6,9,8,9,31,9,6,6,7,7,9,12,9,21,5,6,8,17,8,13,11,18,9,10,17,13,12,17,11,13,10,12,11,20,12,15,13,16,10,12,6,12,11,11,20,20,7,9,10,4,5,6,7,11,8

The data below shows the total reward gained by following
the policy generated when the algorithm was run for the given number of iterations.
Iterations,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
Value Iteration Rewards,-550.0,-183.0,23.0,95.0,84.0,91.0,84.0,95.0,92.0,90.0,94.0,91.0,94.0,94.0,95.0,95.0,93.0,95.0,94.0,95.0,94.0,93.0,92.0,94.0,91.0,91.0,95.0,94.0,95.0,93.0,94.0,95.0,95.0,71.0,91.0,94.0,92.0,93.0,94.0,86.0,93.0,92.0,92.0,95.0,95.0,95.0,87.0,91.0,94.0,95.0,94.0,83.0,93.0,94.0,94.0,89.0,93.0,94.0,92.0,95.0,94.0,91.0,92.0,95.0,93.0,94.0,92.0,94.0,90.0,95.0,94.0,93.0,90.0,94.0,95.0,78.0,93.0,95.0,89.0,87.0,92.0,93.0,95.0,95.0,91.0,93.0,95.0,79.0,94.0,94.0,95.0,88.0,95.0,95.0,95.0,81.0,93.0,93.0,83.0,93.0
Policy Iteration Rewards,-287.0,-10.0,70.0,95.0,95.0,90.0,93.0,89.0,93.0,95.0,95.0,95.0,95.0,93.0,95.0,90.0,93.0,92.0,78.0,95.0,95.0,95.0,74.0,94.0,94.0,94.0,95.0,93.0,92.0,94.0,93.0,95.0,92.0,95.0,93.0,95.0,93.0,93.0,91.0,95.0,89.0,95.0,93.0,93.0,88.0,95.0,95.0,95.0,95.0,93.0,95.0,91.0,81.0,94.0,93.0,93.0,82.0,94.0,93.0,93.0,95.0,95.0,94.0,90.0,95.0,94.0,82.0,95.0,92.0,94.0,93.0,86.0,82.0,94.0,93.0,93.0,92.0,95.0,90.0,95.0,93.0,95.0,92.0,92.0,95.0,93.0,93.0,94.0,93.0,92.0,79.0,93.0,95.0,95.0,95.0,94.0,91.0,90.0,95.0,95.0
Q Learning Rewards,37.0,12.0,-247.0,80.0,79.0,81.0,77.0,75.0,95.0,95.0,92.0,84.0,94.0,93.0,93.0,65.0,92.0,93.0,41.0,89.0,93.0,43.0,94.0,79.0,59.0,92.0,95.0,11.0,92.0,93.0,94.0,-42.0,94.0,79.0,90.0,51.0,93.0,64.0,91.0,70.0,92.0,93.0,95.0,93.0,92.0,51.0,91.0,94.0,95.0,95.0,84.0,94.0,70.0,79.0,93.0,94.0,79.0,95.0,93.0,95.0,92.0,94.0,95.0,95.0,69.0,90.0,94.0,95.0,88.0,95.0,89.0,82.0,92.0,94.0,89.0,89.0,90.0,87.0,90.0,67.0,55.0,86.0,75.0,58.0,61.0,92.0,87.0,90.0,91.0,94.0,95.0,91.0,81.0,54.0,90.0,35.0,81.0,89.0,91.0,45.0