QUESTION


help
1. Value Iteration for Markov Decision Process

Homework due Aug 17, 2022 14:59 +03

Consider the following problem through the lens of a Markov Decision Process (MDP), and answer questions 1 - 3 accordingly.

Damilola is a soccer player for the ML United under-15s who is debating whether to sign for the NLP Albion youth team or the Computer Vision Wanderers youth team. After three years, signing for NLP Albion has two possibilities: he will still be in the youth team, earning 10,000 (60% chance), or he will make the senior team and earn 70,000 (40% chance). Lastly, he is assured of making the Computer Vision Wanderers senior team after three years, with a salary of 37,000.
Q2
1 point possible (graded)

Let us now assume that Damilola cares about the utility derived from the salary as opposed to the salary $S$ itself. His utility function, which baffles economists, is given by $U=\Psi S^{2}+\zeta$, where $\Psi$ and $\zeta$ are constants with $\Psi, \zeta>0$. In this scenario, the optimal policy $\pi^{*}$ (ML United under-15s) would be to sign for NLP Albion.

True
False

Q3
1 point possible (graded)

There are 3 unique states defined in total in this setting.

True
False
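The Q2 comparison can be checked numerically. The following is a minimal sketch, not part of the original problem: the helper `expected_utility` and the values chosen for `psi` and `zeta` are hypothetical (the problem only states that both constants are positive), while the probabilities and salaries are taken from the problem statement.

```python
def expected_utility(outcomes, utility):
    """Expected utility of a list of (probability, salary) outcomes."""
    return sum(p * utility(s) for p, s in outcomes)


# Outcomes after three years, taken from the problem statement.
nlp_albion = [(0.6, 10_000), (0.4, 70_000)]
cv_wanderers = [(1.0, 37_000)]

# Hypothetical constants: the problem only says psi > 0 and zeta > 0.
psi, zeta = 1.0, 1.0


def salary(s):
    """Utility equal to the salary itself."""
    return s


def quadratic(s):
    """Utility U = psi * S^2 + zeta from Q2."""
    return psi * s ** 2 + zeta


for name, u in [("salary", salary), ("quadratic utility", quadratic)]:
    print(name,
          expected_utility(nlp_albion, u),
          expected_utility(cv_wanderers, u))
```

In terms of raw salary, NLP Albion has an expected value of 0.6 × 10,000 + 0.4 × 70,000 = 34,000, below the guaranteed 37,000. Under the quadratic utility, the expected value of $S^{2}$ is 2.02 × 10^9 for NLP Albion against 1.369 × 10^9 for Computer Vision Wanderers. Since $\zeta$ is added to both sides and $\Psi$ scales both by the same positive factor, that ordering does not depend on the particular constants chosen here.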
b
3 points possible (graded)

If we initialize the value function with 0, enter the value of state $B$ after:

one value iteration, $V_{B 1}^{*}$
two value iterations, $V_{B 2}^{*}$
infinite value iterations, $V_{B}^{*}$

c
1 point possible (graded)

Select all that are true:

In an MDP, the optimal policy for a given state $s$ is unique
The problem of determining the value of a state is solved recursively by the value iteration algorithm
For a given MDP, the value function $V^{*}(s)$ of each state is known a priori
$V^{*}(s)=\sum_{s^{\prime}} T\left(s, a, s^{\prime}\right)\left[R\left(s, a, s^{\prime}\right)+\gamma V^{*}\left(s^{\prime}\right)\right]$
$Q^{*}(s, a)=\sum_{s^{\prime}} T\left(s, a, s^{\prime}\right)\left[R\left(s, a, s^{\prime}\right)+\gamma V^{*}\left(s^{\prime}\right)\right]$
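The MDP that defines state $B$ is not included in the text above, so part b cannot be computed from it. As a rough illustration of what "one", "two", and "infinite" value iterations mean, here is a sketch of standard value iteration starting from an all-zero value function. The two-state MDP `toy_mdp`, its state names `s0` and `s1`, its rewards, and the discount `gamma = 0.5` are all hypothetical and unrelated to the homework; the backup used is the usual one, taking the maximum over actions of the $Q^{*}$ expression listed in part c.

```python
def q_value(V, outcomes, gamma):
    """Q(s, a) = sum over s' of T(s,a,s') * [R(s,a,s') + gamma * V(s')]."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)


def value_iteration(mdp, gamma, n_iters):
    """Run n_iters synchronous backups starting from V = 0 everywhere.

    mdp maps state -> action -> list of (probability, next_state, reward).
    A state with no actions is treated as terminal and keeps value 0.
    """
    V = {s: 0.0 for s in mdp}
    for _ in range(n_iters):
        # Every new value is computed from the previous iterate V.
        V = {
            s: max((q_value(V, out, gamma) for out in acts.values()),
                   default=0.0)
            for s, acts in mdp.items()
        }
    return V


# Hypothetical toy MDP, NOT the one from the homework.
toy_mdp = {
    "s0": {"stay": [(1.0, "s0", 1.0)], "go": [(1.0, "s1", 0.0)]},
    "s1": {"stay": [(1.0, "s1", 2.0)]},
}

for k in (1, 2, 200):
    print(k, value_iteration(toy_mdp, gamma=0.5, n_iters=k))
```

For this toy example the values of `s1` after one and two iterations are 2.0 and 3.0, and with many iterations they approach 2 / (1 - 0.5) = 4. The same pattern (a fixed number of backups versus the fixed point of the backup) is what the three requested quantities refer to.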

Public Answer

LTBDTZ The First Answerer