Jumpy Car - Estimate/Solve the V* values for the following MDP

Question

Jumpy Car - Estimate/Solve the V* values for the following MDP

You are traveling on a straight road, but have a jumpy car. The car sometimes "jumps" (moves double). At other times, it doesn't move at all. The following MDP has been created to model this behavior and the landscape.

Estimate the V* values (optimal values for the states) for this MDP:

S1	S2	S3	S4	S5	S6	S7	S8	S9
sqrt(3)				100				sqrt(3)

MDP is defined as follows: There are two actions L (Left) and R (Right). When moving left, there is a 40% chance moving left, 50% chance of moving DOUBLE (2 spots left), and 10% chance of not moving at all. Similarly, when moving right, there is a 40% chance moving right, 50% chance of moving DOUBLE (2 spots right), and 10% chance of not moving at all.

S1, S5 and S9 are terminal states with values sqrt(3), 100 and sqrt(3) respectively.

Discount factor is 0.9 (so, gamma = 0.9).

There is no living reward, that is R(s,a,s’) = 0.

asked Mar 1, 2021 in MDP by amrinderarora AlgoMeister (2.1k points)
edited Mar 1, 2021 by amrinderarora

14 Answers

Page:

Related questions

0 votes

1 answer

Solve the V* values for this MDP

asked Feb 23, 2021 in MDP by amrinderarora AlgoMeister (2.1k points)

0 votes

1 answer

Solve the V* values for this MDP - 5x5

asked Mar 30, 2021 in MDP by amrinderarora AlgoMeister (2.1k points)

0 votes

1 answer

Solve the V* values for this grid world MDP - 3 x 4

asked Apr 16, 2023 in Informed Search by amrinderarora AlgoMeister (2.1k points)

0 votes

0 answers

Solve this grid world.

asked May 6, 2025 in Informed Search by amrinderarora AlgoMeister (2.1k points)

0 votes

16 answers

MLE for the given MDP and Observation Sequence (RGB Color Cycler)

asked Apr 24 in HMM by amrinderarora AlgoMeister (2.1k points)

muradvaliyev · Answer 1 · 2026-03-29T15:45:47+0000

Hello professor,

Terminal states: V*(S1)=sqrt(3)≈1.732, V*(S5)=100, V*(S9)=sqrt(3)≈1.732

Discount: γ=0.9, R(s,a,s')=0

TRANSITIONS:

Action R: 40%->s+1, 50%->s+2, 10%->stay

Action L: 40%->s-1, 50%->s-2, 10%->stay

SYMMETRY:

S1=S9 and both equal sqrt(3), so: V*(S4)=V*(S6)=a, V*(S3)=V*(S7)=b, V*(S2)=V*(S8)=c

Optimal action in S2,S3,S4 = R (right), since S5=100 dominates.

SOLVING S4: Action R: 40%->S5, 50%->S6=a, 10%->S4=a

a = 0.9[0.4(100) + 0.5a + 0.1a]

a = 36 + 0.54a

0.46a = 36

a ≈ 78.26

SOLVING S3: Action R: 40%->S4=a, 50%->S5, 10%->S3=b

b = 0.9[0.4(78.26) + 0.5(100) + 0.1b]

b = 73.17 + 0.09b

0.91b = 73.17

b ≈ 80.41

SOLVING S2: Action R: 40%->S3=b, 50%->S4=a, 10%->S2=c

c = 0.9[0.4(80.41) + 0.5(78.26) + 0.1c]

c = 64.16 + 0.09c

0.91c = 64.16

c ≈ 70.51

FINAL VALUES

S1=1.732 | S2=70.51 | S3=80.41 | S4=78.26 | S5=100 | S6=78.26 | S7=80.41 | S8=70.51 | S9=1.732

NamiqPlanov · Answer 2 · 2026-03-29T17:16:32+0000

Hello Sir,
Terminal States: S1 = sqrt(3), S5 = 100, S9 = sqrt(3)
Actions = L(left) and R(right)

Transition probabilities: 40% - 1 step; 50% - 2 steps; 10% - stay

Discount factor - 0.9

Reward - 0

S4 = V4=0.9(0.4V5+0.5V6+0.1V4) = 89.4

S6 = V6=0.9(0.4V5+0.5V4+0.1V6) = 89.4

S3 = V3=0.9(0.4V4+0.5V5+0.1V3) = 77.8

S7 = V7=0.9(0.4V6+0.5V5+0.1V7) = 77.8

S2 = V2=0.9(0.4V3+0.5V4+0.1V2) = 65

S8 = V8=0.9(0.4V7+0.5V6+0.1V8) = 65

Ilaha_Behbudova · Answer 3 · 2026-03-29T19:00:11+0000

Hello everyone,

States: S1, S2, S3, S4, S5, S6, S7, S8, S9

Terminal states: S1 = √3 = 1,732 , S5 = 100, S9 = √3 = 1.732

Actions: L (Left), R (Right)

Transition probabilities:

40% normal move (1 step)

50% double move (2 steps)

10% no move (0 steps)

gamma = 0.9

R(s,a,s') = 0

Now, according to Bellman equation, if we want to calculate S2, for instance:

Action Right :

40% → S3: 0.4 × V(S3)

50% → S4: 0.5 × V(S4)

10% → S2: 0.1 × V(S2)

V_R(S2) = 0.9 x (0.4×V(S3) + 0.5×V(S4) + 0.1×V(S2))

Action Left:

40% → S1: 0.4 × √3

50% → S0 : 0.5 × V(S2) (There is no S0, so it stays at S2)

10% → S2: 0.1 × V(S2)

V_L(S2) = 0.9 x (0.4×1.732 + 0.6×V(S2) = 0.693 + 0.6×V(S2))

Optimal choice:

V(S2) = max(V_R, V_L) = V_R, because the optimal policy goes toward S5.

Iteration

Initially: V(S2) = 0

İter 1:

V(S3) = 0.9×(0.4×0 + 0.5×100) = 45 S4+S5+S3

V(S4) = 0.9×(0.4×100 + 0) = 36 S5 + S6 + S4

V(S2) = 0.9×(0.4×45 + 0.5×36) = 32.4 S3 + S4 + S2

After a few iterations, we get the answer, but all are just approximations:

V(S1) = 1.732* (√3)

V(S2) ≈ 81.0*

V(S3) ≈ 90.0*

V(S4) ≈ 94.5*

V(S5) = 100.0*

V(S6) ≈ 94.5*

V(S7) ≈ 90.0*

V(S8) ≈ 81.0*

V(S9) = 1.732* (√3)

Symmetric structure: S2≈S8, S3≈S7, S4≈S6

l.khudadatzada · Answer 4 · 2026-03-30T05:37:34+0000

The optimal policy is like this:
S2, S3, S4: choose R
S6, S7, S8: choose L

By symmetry;

V(S6)=V(S4)
V(S7)=V(S3)
V(S8)=V(S2)

and gamma = .9

So, the Bellman equations are:

V(S2)=0.9(0.1V(S2)+0.4V(S3)+0.5V(S4))
V(S3)=0.9(0.1V(S3)+0.4V(S4)+0.5*100)
V(S4)=0.9(0.1V(S4)+0.4*100+0.5V(S4))

To solve it easily replacing it with x, y and z:

x=S2=S8; y=S3=S7; z=S4=S6

So,
x = .9 (0.4y + 0.5z + 0.1x)
y = .9(0.4z + 50 + 0.1y)
z = .9(40 + 0.5z + 0.1z)

First, we solve z:
z = .9(40 + 0.6z)
z = 36 + 0.54z
z = 36/0.46 ≈ 78.26

Meaning S4 and S6 are approximately 78.26
------------------------------------------------------------------
Then, y:
y = .9(0.4*78.26 + 50 + 0.1y)
0.91y = 73.18
y ≈ 80.42

Meaning S3 and S7 are approximately 80.42
------------------------------------------------------------------
Lastly, for x:
x = .9(0.4*80.42 + 0.5*78.26 + 0.1x)
x = 0.9(71.3 +0,1x)
0.91x = 64.17
x ≈ 70.52

Meaning S2 and S8 are approximately 70.52
------------------------------------------------------------------

Result:

V(S2)=V(S8) ≈ 70.52
V(S3)=V(S7) ≈ 80.42
V(S4)=V(S6) ≈ 78.26

nijatjafarov · Answer 5 · 2026-03-30T08:26:17+0000

Hello!

Gamma = 0.9

By symmetry:
V(S2)=V(S8)
V(S3)=V(S7)
V(S4)=V(S6)

# V(S4) and V(S6)
From S4:
0.4 to S5
0.5 to S6
0.1 for S4

V(S4)=0.9(0.4⋅100+0.5V(S6)+0.1V(S4))
Since V(S4)=V(S6):
V(S4)=0.9(0.4⋅100+0.5V(S4)+0.1V(S4))
V(S4)=0.9(40+0.6V(S4))
V(S4)=36+0.54V(S4))
V(S4)≈78.26
Therefore, V(S6)≈78.26

# V(S3) and V(S7)
From S3:
0.4 to S4
0.5 to S5
0.1 for S3

V(S3)=0.9(0.4V(S4)+0.5⋅100+0.1V(S3))
Since V(S4)≈78.26:
V(S3)=0.9(0.4⋅78.26+0.5⋅100+0.1V(S3))
V(S3)=73.1736+0.09V(S3)
V(S3)≈80.41
Therefore, V(S7)≈80.41

# V(S2) and V(S8)
From S2:
0.4 to S3
0.5 to S4
0.1 for S2

V(S2)=0.9(0.4V(S3)+0.5V(S4)+0.1V(S2))
Since V(S3)≈80.41, V(S4)≈78.26:
V(S2)=0.9(0.4⋅80.41+0.5⋅78.26+0.1V(S2))
V(S2)=0.9(32.164+39.130+0.1V(S2))
V(S2)=64.1646+0.09V(S2)
V(S2)≈70.51
Therefore, V(S8)≈70.51

# Final V* values
Terminal states
V*(S1)=sqrt(3)≈1.732
V*(S5)=100
V*(S9)=sqrt(3)≈1.732
Non-terminal states
V*(S2)≈70.51
V*(S3)≈80.41
V*(S4)≈78.26
V*(S6)≈78.26
V*(S7)≈80.41
V*(S8)≈70.51

mehribanaliyeva · Answer 6 · 2026-03-30T08:40:24+0000

Hello Professor,

Optimal policy for all states here is to move toward S5 as its terminal value is highest.

S2, S3, S4 - go right

S6, S7, S8- go left

V(S2)=0.9(0.1V(S2)+0.4V(S3)+0.5V(S4))

V(S3)=0.9(0.1V(S3)+0.4V(S4)+0.5⋅100)

V(S4)=0.9(0.1V(S4)+0.5V(S6)+0.4⋅100)

V(S6)=0.9(0.1V(S6)+0.5V(S4)+0.4⋅100)

V(S7)=0.9(0.1V(S7)+0.4V(S6)+0.5⋅100)

V(S8)=0.9(0.1V(S8)+0.4V(S7)+0.5V(S6))

By symmetry, V(S2) = V(S8)=X, V(S3)=V(S7)=Y, V(S4)=V(S6)=Z

Solving for X,Y,Z:

X=0.9(0.1X+0.4Y+0.5Z)

0.91X=0.36Y+0.45Z

Y=0.09Y+0.36Z+45

0.91Y=0.36Z+45

Z=0.9(0.6Z+40)

Z= 78.26

Substitute into

0.91Y=0.36Z+45

y = 80.41

Substitute

0.91X=0.36Y+0.45Z

x = 70.51

As a result, V2= 70.51, V3 = 80.41, V4 = 78.26, V6, = 78.26, V7 = 80.41, V8= 70.51

Soltan · Answer 7 · 2026-03-30T10:06:08+0000

At each state, we can move either to the left or to the right. Since the reward at S5 is 100, it will be more optimal for all states to the left of it to move towards S5, given that the reward sqrt(3) at S1, located on the opposite side, is significantly smaller. We also consider the symmetry in the state space and find that S6 = S4, S7 = S3, and S8 = S2. Hence, we can solve the following system of equations to find the values for all 6 unknown states:

10*S2 = 9*0.4*S3 + 9*0.5*S4 + 9*0.1*S2
10*S3 = 9*0.4*S4 + 9*0.5*100 + 9*0.1*S3
10*S4 = 9*0.4*100 + 9*0.5*S4 + 9*0.1*S4

Solution:

10*S4 = 360 + 5.4*S4 => S4 = 360/4.6 = 78.26
10*S3 = 3.6*78.26 + 4.5*100 + 0.9*S3 => S3 = (3.6*78.26 + 450)/9.1 = 80.41
10*S2 = 3.6*80.41 + 4.5*78.26 + 0.9*S2 => (3.6*80.41+4.5*78.26)/9.1 = 70.51

Results:

S2 = S8 = 70.51
S3 = S7 = 80.41
S4 = S6 = 78.26

AysuS25 · Answer 8 · 2026-03-30T20:12:31+0000

Hi,

Since S5 has the highest terminal value, the optimal policy of all non-terminal states is to move toward it.

So the optimal actions will be:

S2, S3, S4 - right

S6, S7, S8 - left

Using the Bellman equations:

V(S2)=0.9x(0.1V(S2)+0.4V(S3)+0.5V(S4))

V(S3)=0.9x(0.1V(S3)+0.4V(S4)+0.5x100)

V(S4)=0.9x(0.1V(S4)+0.5V(S6)+0.4x100)

V(S6)=0.9x(0.1V(S6)+0.5V(S4)+0.4x100)

V(S7)=0.9x(0.1V(S7)+0.4V(S6)+0.5x100)

V(S8)=0.9x(0.1V(S8)+0.4V(S7)+0.5V(S6))

By symmetry,

V(S2)=V(S8)=X, V(S3)=V(S7)=Y, V(S4)=V(S6)=Z

Thus, we will get these equations:

X=0.9(0.1X+0.4Y+0.5Z)

0.91X=0.36Y+0.45Z

Y=0.9(0.1Y+0.4Z+50)

0.91Y=0.36Z+45

Z=0.9(0.6Z+40) - > it gives -> Z=78.26

Then after substitution, 0.91Y=0.36Z+45 gives -> Y=80.41

Then -> 0.91X=0.36Y+0.45Z gives -> X=70.51

So for the state values we will get:

V(S2)=V(S8)=70.51

V(S3)=V(S7)=80.41

V(S4)=V(S6)=78.26

Final results:

V(S2)=70.51

V(S3)=80.41

V(S4)=78.26

V(S6)=78.26

V(S7)=80.41

V(S8)=70.51

omarimamverdiyev · Answer 9 · 2026-03-30T21:20:52+0000

There are 3 terminal states.

V(S1)=sqrt(3) = 1.732,
V(S5)=100,
V(S9)=sqrt(3) = 1.732

As 100 > sqrt(3) --> the best choice is moving towards S5. So:

from S2, S3, S4, the optimal action is Right
from S6, S7, S8, the optimal action is Left

Note: For simplicity I will skip "S" letter in equations. I.e., V(S2) will become V2, V(S3) will become V3, and so on.

Also this problem is symmetric around S5, thus we will have below values same:
V2 = V8 V3 = V7 V4 = V6

So we need to solve equations for S2, S3, and S4

For S2: 40% to S3, 50% to S4, 10% to stay in S2, which becomes:
V2=0.9(0.4*V3+0.5*V4+0.1*V2) --> 0.91*V2 = 0.36*V3+0.45*V4

For S3: 40% to S4, 50% to S5, 10% to stay in S3, which becomes: V3=0.9(0.4*V4+0.5*(100)+0.1*V3) --> 0.91*V3 = 0.36*V4+45

For S4: 40% to S5, 50% to S6, 10% to stay in S4, which becomes: V4=0.9*(0.4*(100)+0.5*V6+0.1*V4)

Recall that V(S4) = V(S6). Then we can simplify above equation by:
V4 = 0.9(40+0.5*V4+0.1*V4) --> 0.46*V4=36 --> V4=78.26

Now substitute value of V4 into V3 equation above:
0.91*V3=0.36*(78.26)+45 --> V3=80.41

Then substitute values of V3 and V4 into V2 equation above:

0.91*V2=0.36*(80.41)+0.45*(78.26) --> V2=70.51

Thus, we have V2 = 70.51, V3 = 80.41, V4 = 78.26. Also, recall that by symmetry:
V8 = V2 = 70.51, V7 = V3 = 80.41, V6 = V4 = 78.26

Therefore, we found all values. Below are final values:

S1	S2	S3	S4	S5	S6	S7	S8	S9
1.732	70.51	80.41	78.26	100	78.26	80.41	70.51	1.732

kamalaghazada · Answer 10 · 2026-03-31T07:57:29+0000

Hello Professor,

The board perfectly mirrors around S5. This means S4 equals S6, S3 equals S7, and S2 equals S8. Moving Right is always the best choice here. We just need to solve the left side using the 0.9 discount factor.

For S4, moving right gives a 40% chance of hitting S5 (worth 100), 50% of double jumping to S6 (which equals S4), and 10% of staying put.

V(S4) = 0.9 * [0.4 * 100 + 0.5 * V(S4) + 0.1 * V(S4)]

V(S4) = 36 + 0.54 * V(S4)

V(S4) = 78.26

For S3, moving right gives a 40% chance to hit S4, a 50% chance to land right on S5, and 10% to stay. It is actually worth more than S4 because of that big 50% chance to hit the jackpot without overshooting.

V(S3) = 0.9 * [0.4 * 78.26 + 0.5 * 100 + 0.1 * V(S3)]

V(S3) = 73.17 + 0.09 * V(S3)

V(S3) = 80.41

For S2, moving right means a 40% chance to hit S3, 50% to jump to S4, and 10% to stay.

V(S2) = 0.9 * [0.4 * 80.41 + 0.5 * 78.26 + 0.1 * V(S2)]

V(S2) = 64.16 + 0.09 * V(S2)

V(S2) = 70.51

Mirroring the values gives the final estimates:

S1: 1.73
S2: 70.51
S3: 80.41
S4: 78.26
S5: 100
S6: 78.26
S7: 80.41
S8: 70.51
S9: 1.73

Categories

Most popular tags

Jumpy Car - Estimate/Solve the V* values for the following MDP

Please log in or register to add a comment.

Please log in or register to answer this question.

14 Answers

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Related questions