Double-Factored Decision Theory for Markov Decision Processes with Multiple Scenarios of the Parameters

Expand
  • Software Research and Data Science, Amazon Robotics, North Reading, MA 01864, USA

Received date: 2022-03-16

  Revised date: 2022-12-20

  Online published: 2025-07-07

Supported by

This research was originally supported by the (United States) National Science Foundation (No. 1409214).

Abstract

The double-factored decision theory for Markov decision processes with multiple scenarios of the parameters is proposed in this article. We introduce scenario belief to describe the probability distribution of scenarios in the system, and scenario expectation to formulate the expected total discounted reward of a policy. We establish a new framework named as double-factored Markov decision process (DFMDP), in which the physical state and scenario belief are shown to be the double factors serving as the sufficient statistics for the history of the decision process. Four classes of policies for the finite horizon DFMDPs are studied and it is shown that there exists a double-factored Markovian deterministic policy which is optimal among all policies. We also formulate the infinite horizon DFMDPs and present its optimality equation in this paper. An exact solution method named as double-factored backward induction for the finite horizon DFMDPs is proposed. It is utilized to find the optimal policies for the numeric examples and then compared with policies derived from other methods from the related literatures.

Cite this article

Cheng-Jun Hou . Double-Factored Decision Theory for Markov Decision Processes with Multiple Scenarios of the Parameters[J]. Journal of the Operations Research Society of China, 2025 , 13(2) : 484 -514 . DOI: 10.1007/s40305-023-00457-5

References

[1] Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming, 2nd edn. Wiley, New Jersey (2005)
[2] Beemsterboer, B.J., Land, M.J., Teunter, R.H.: Flexible lot sizing in hybrid make-to-order/make-to-stock production planning. Eur. J. Oper. Res. 260(3), 1014-1023 (2017)
[3] Chen, N., Teven, K., Wang, C.: A partitioning algorithm for markov decision processes with applications to market microstructure. Manag. Sci. 64(2), 784-803 (2018)
[4] Steimle, L.N., Kaufman, D.L., Denton, B.T.: Multi-model Markov decision processes. IISE Trans. 53(10), 1124-1139 (2022)
[5] Buchholz, P., Scheftelowitsch, D.: Computation of weighted sums of rewards for concurrent MDPs. Math. Methods Oper. Res. 89(1), 1-42 (2019)
[6] Givan, R., Leach, S., Dean, T.: Bounded-parameter Markov decision processes. Artif. Intell. 122(1-2), 71-109 (2000)
[7] Delgado, K.V., Sanner, S., de Barros, L.N.: Efficient solutions to factored MDPs with imprecise transition probabilities. Artif. Intell. 175(9-10), 1498-1527 (2011)
[8] Witwicki, S.J., Melo, F.S., Capitan, J., Spaan, M.T.J.: A flexible approach to modeling unpredictable events in MDPs. In: Proceedings of Twenty-Third International Conference on Automated Planning and Scheduling ICAPS2013, pp. 260-268 (2013)
[9] Duff, M.: Optimal learning: computational procedures for bayes-adaptive markov decision processes. Ph.D. thesis, University of Massachusetts Amherst, Amherst, MA (2002)
[10] Castro, P. S., Precup, D.: Using linear programming for Bayesian exploration in Markov decision processes. In: International Joint Conference on Artificial Intelligence IJCAI2007, pp. 2437-2442 (2007)
[11] Kumar, P.: Information theoretic learning methods for Markov decision processes with parametric uncertainty. Ph.D. thesis, University of Washington (2018).
[12] Iyengar, G.: Robust dynamic programming. Math. Oper. Res. 30(2), 257-280 (2005)
[13] Nilim, A., El Ghaoui, L.: Robust control of Markov decision processes with uncertain transition matrices. Oper. Res. 53(5), 780-798 (2005)
[14] Delgado, K.V., de Barros, L.N., Dias, D.B., Sanner, S.: Real-time dynamic programming for Markov decision process with imprecise probabilities. Artif. Intell. 230(8), 192-223 (2016)
[15] Moreira, D.A.M., Delgado, K.V., de Barros, L.N.: Robust probabilistic planning with ilao. Appl. Intell. 45(3), 662-672 (2016)
[16] Delage, E., Shie, M.: Percentile optimization for markov decision processes with parameter uncertainty. Oper. Res. 58(1), 203-213 (2010)
[17] Adulyasak, Y., Varakantham, P., Ahmed, A., Jaillet, P.: Solving uncertain MDPs with objectives that are separable over instantiations of model uncertainty. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, Texas, AAAI Press, pp. 3454-3460 (2015)
[18] Ahmed, A., Varakantham, P., Lowalekar, M., Adulyasak, Y., Jaillet, P.: Sampling based approaches for minimizing regret in uncertain Markov decision processes (MDPs). J. Artif. Intell. Res. 59, 229-264 (2017)
[19] Meraklı, M., Küçükyavuz, S.: Risk aversion to parameter uncertainty in Markov decision processes with an application to slow-onset disaster relief. IISE Trans. 52(8), 811-831 (2019)
[20] Shani, G., Heckerman, D., Brafman, R.: An MDP-based recommender system. J. Mach. Learn. Res. 6(43), 1265-1295 (2005)
[21] Chen, Q., Ayer, T., Chhatwal, J.: Sensitivity analysis in sequential decision models: a probabilistic approach. Med. Decis. Making 37(2), 243-252 (2017)
[22] Bala, M.V., Mauskopf, J.A.: Optimal assignment of treatments to health states using a Markov decision model. Pharmacoeconomics 24(4), 345-354 (2006)
Options
Outlines

/