Mathematics of Operations Research, Vol. 2, No. 4 (Nov., 1977), pp. 360-381 (22 pages). This paper considers undiscounted Markov Decision Problems. For the general multichain case, we obtain necessary ...
Define state-value and (true) state value of an MDP. Define Q-value and (true) Q value of an MDP. The idea of discounting stems from the common idea that a reward now is better than the same reward ...
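As a rough illustration of the discounting idea, the sketch below computes a discounted sum of rewards and compares a reward received immediately with the same reward received two steps later. The function name `discounted_return`, the discount factor 0.9, and the reward sequences are assumptions chosen for the example; they do not come from the excerpt.

```python
def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t] for a finite reward sequence."""
    total = 0.0
    # Work backwards through the sequence: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical reward sequences: the same unit reward, now versus two steps later.
now = discounted_return([1.0, 0.0, 0.0], gamma=0.9)
later = discounted_return([0.0, 0.0, 1.0], gamma=0.9)

print(now > later)  # the immediate reward is worth more when gamma < 1
```

With a discount factor below 1, the delayed reward is scaled by gamma squared (about 0.81 here), so it contributes less than the immediate one. With gamma equal to 1, the undiscounted case, the two sequences would be valued equally.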