There are three terminal states:
- V(S1) = sqrt(3) ≈ 1.732
- V(S5) = 100
- V(S9) = sqrt(3) ≈ 1.732
Since 100 > sqrt(3), the best choice is to move towards S5. So:
- from S2, S3, S4, the optimal action is Right
- from S6, S7, S8, the optimal action is Left
Note: For simplicity I will drop the "S" in the equations, i.e. V(S2) becomes V2, V(S3) becomes V3, and so on.
The problem is also symmetric around S5, so the following values are equal:
V2 = V8, V3 = V7, V4 = V6
So we only need to solve the equations for S2, S3, and S4.
For S2: 40% to S3, 50% to S4, 10% to stay in S2, which becomes:
V2 = 0.9*(0.4*V3+0.5*V4+0.1*V2) --> 0.91*V2 = 0.36*V3+0.45*V4
For S3: 40% to S4, 50% to S5, 10% to stay in S3, which becomes:
V3 = 0.9*(0.4*V4+0.5*(100)+0.1*V3) --> 0.91*V3 = 0.36*V4+45
For S4: 40% to S5, 50% to S6, 10% to stay in S4, which becomes:
V4 = 0.9*(0.4*(100)+0.5*V6+0.1*V4)
Recall that V4 = V6 by symmetry, so the S4 equation simplifies to:
V4 = 0.9*(40+0.5*V4+0.1*V4) --> V4 = 36+0.54*V4 --> 0.46*V4 = 36 --> V4 = 78.26
Now substitute the value of V4 into the V3 equation above:
0.91*V3=0.36*(78.26)+45 --> V3=80.41
Then substitute the values of V3 and V4 into the V2 equation above:
0.91*V2=0.36*(80.41)+0.45*(78.26) --> V2=70.51
Thus, we have V2 = 70.51, V3 = 80.41, V4 = 78.26. Also, recall that by symmetry:
V8 = V2 = 70.51, V7 = V3 = 80.41, V6 = V4 = 78.26
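The back-substitution above can be checked numerically. The following is a minimal Python sketch, not part of the original derivation: the names `GAMMA`, `V5`, `v2`, `v3`, and `v4` are chosen here for illustration, and the 0.9 factor and the transition probabilities are read directly off the equations above.

```python
# Back-substitution for the three linear equations derived above.
GAMMA = 0.9   # the 0.9 factor that multiplies every equation above
V5 = 100.0    # value of the terminal state S5

# S4: V4 = GAMMA*(0.4*V5 + 0.5*V6 + 0.1*V4), with V6 = V4 by symmetry
v4 = GAMMA * 0.4 * V5 / (1 - GAMMA * (0.5 + 0.1))

# S3: V3 = GAMMA*(0.4*V4 + 0.5*V5 + 0.1*V3)
v3 = GAMMA * (0.4 * v4 + 0.5 * V5) / (1 - GAMMA * 0.1)

# S2: V2 = GAMMA*(0.4*V3 + 0.5*V4 + 0.1*V2)
v2 = GAMMA * (0.4 * v3 + 0.5 * v4) / (1 - GAMMA * 0.1)

print(round(v2, 2), round(v3, 2), round(v4, 2))
```

Each line moves the self-loop term (and, for S4, the symmetric V6 term) to the left-hand side before dividing, which is the same rearrangement done by hand above.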
We have now found all values. The final values are:
| S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 |
|---|---|---|---|---|---|---|---|---|
| 1.732 | 70.51 | 80.41 | 78.26 | 100 | 78.26 | 80.41 | 70.51 | 1.732 |
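As a cross-check that does not rely on the symmetry shortcut, the same values can be approximated by repeatedly applying the update equations to all nine states under the fixed policy (Right in S2 to S4, Left in S6 to S8). This is only a sketch under the assumptions already used in the equations above: a 0.9 factor, no intermediate reward, and terminal values held fixed. The function name `evaluate` and the sweep count are illustrative choices.

```python
import math

GAMMA = 0.9  # the 0.9 factor from the equations above
TERMINAL = {1: math.sqrt(3), 5: 100.0, 9: math.sqrt(3)}  # fixed terminal values

def evaluate(sweeps=200):
    """Iteratively evaluate the fixed policy that always moves towards S5."""
    v = {s: TERMINAL.get(s, 0.0) for s in range(1, 10)}
    for _ in range(sweeps):
        for s in range(1, 10):
            if s in TERMINAL:
                continue  # terminal values never change
            step = 1 if s < 5 else -1  # Right for S2-S4, Left for S6-S8
            # 40% one step, 50% two steps, 10% stay in place
            v[s] = GAMMA * (0.4 * v[s + step] + 0.5 * v[s + 2 * step] + 0.1 * v[s])
    return v

values = evaluate()
print([round(values[s], 2) for s in range(1, 10)])
```

After enough sweeps the printed list should agree with the table to two decimal places, including V2 = V8, V3 = V7, and V4 = V6, without the symmetry being imposed in the code.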