2023 | vol. 23, iss. 1 | 1-15
Article title

Solving Finite-Horizon Discounted Non-Stationary MDPs

Title variants
Publication languages
EN
Abstracts
EN
Research background: Markov Decision Processes (MDPs) are a powerful framework for modeling many real-world finite-horizon problems in which a reward is maximized over a sequence of actions. However, many problems, such as investment and financial market problems in which the value of a reward decreases exponentially with time, require the introduction of interest rates. Purpose: This study investigates non-stationary finite-horizon MDPs with a discount factor to account for fluctuations in rewards over time. Research methodology: To capture the fluctuation of rewards over time, the authors define new non-stationary finite-horizon MDPs with a discount factor. First, the existence of an optimal policy for the proposed finite-horizon discounted MDPs is proven. Next, a new Discounted Backward Induction (DBI) algorithm is presented to find it. To illustrate the value of the proposal, a financial model is used as an example of a finite-horizon discounted MDP, and an adaptive DBI algorithm is used to solve it. Results: The proposed method calculates the optimal values of the investment that maximize its expected total return while accounting for the time value of money. Novelty: No previous studies have examined dynamic finite-horizon problems that account for temporal fluctuations in rewards. (original abstract)
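
The abstract names a Discounted Backward Induction (DBI) algorithm but does not reproduce it. The sketch below is a minimal, generic illustration of backward induction with a discount factor over time-dependent rewards and transitions, which is the standard construction the abstract alludes to; the function name, array layout, and toy data are assumptions made for illustration, not the authors' implementation.

    import numpy as np

    def discounted_backward_induction(P, R, gamma):
        """Backward induction with a discount factor for a non-stationary
        finite-horizon MDP (illustrative sketch, not the paper's code).

        P: (T, S, A, S) array, P[t, s, a, s'] = time-dependent transition prob.
        R: (T, S, A) array,   R[t, s, a]      = time-dependent expected reward
        gamma: discount factor in (0, 1]
        Returns V of shape (T+1, S) and a greedy policy of shape (T, S).
        """
        T, S, A, _ = P.shape
        V = np.zeros((T + 1, S))            # terminal values V[T] = 0
        policy = np.zeros((T, S), dtype=int)
        for t in range(T - 1, -1, -1):      # sweep backward from the horizon
            # Q[s, a] = r_t(s, a) + gamma * sum_{s'} p_t(s' | s, a) * V[t+1, s']
            Q = R[t] + gamma * P[t] @ V[t + 1]
            policy[t] = np.argmax(Q, axis=1)
            V[t] = np.max(Q, axis=1)
        return V, policy

    # Toy usage: 2 states, 2 actions, horizon 3, discount factor 0.9
    rng = np.random.default_rng(0)
    T, S, A = 3, 2, 2
    P = rng.random((T, S, A, S))
    P /= P.sum(axis=-1, keepdims=True)      # normalize into distributions
    R = rng.random((T, S, A))
    V, pi = discounted_backward_induction(P, R, gamma=0.9)

Because the rewards and transitions are indexed by the stage t, the same sweep handles the non-stationary case; the discount factor enters only in the one-step lookup against the next stage's values.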
Year
2023
Pages
1-15
Physical description
Authors
  • Sultan Moulay Slimane University
  • Sultan Moulay Slimane University, Béni Mellal, Morocco
Bibliography
  • Allamigeon, X., Boyet, M., Gaubert, S. (2021). Piecewise Affine Dynamical Models of Petri Nets-Application to Emergency Call Centers. Fundamenta Informaticae, 183(3-4), 169-201. DOI: 10.3233/FI-2021-2086.
  • Asadi, A., Pinkley, S.N., Mes, M. (2022). A Markov decision process approach for managing medical drone deliveries. Expert Systems With Applications, 204, 117490. DOI: 10.1016/j.eswa.2022.117490.
  • Bellman, R. (1958). Dynamic programming and stochastic control processes. Information and Control, 1(3), 228-239. DOI: 10.1016/S0019-9958(58)80003-0.
  • Bertsekas, D. (2012). Dynamic programming and optimal control: Volume I (vol. 1). Athena Scientific.
  • Bertsimas, D., Mišić, V.V. (2016). Decomposable Markov decision processes: A fluid optimization approach. Operations Research, 64(6), 1537-1555. DOI: 10.1287/opre.2016.1531.
  • Dulac-Arnold, G., Levine, N., Mankowitz, D.J., Li, J., Paduraru, C., Gowal, S., Hester, T. (2021). Challenges of real-world reinforcement learning: Definitions, benchmarks and analysis. Machine Learning, 110(9), 2419-2468. DOI: 10.1007/s10994-021-05961-4.
  • El Akraoui, B., Daoui, C., Larach, A. (2022). Decomposition Methods for Solving Finite-Horizon Large MDPs. Journal of Mathematics, 2022. DOI: 10.1155/2022/8404716.
  • Emadi, H., Atkins, E., Rastgoftar, H. (2022). A Finite-State Fixed-Corridor Model for UAS Traffic Management. ArXiv Preprint ArXiv:2204.05517.
  • Feinberg, E.A. (2016). Optimality conditions for inventory control. In Optimization Challenges in Complex, Networked and Risky Systems (pp. 14-45). INFORMS. DOI: 10.1287/educ.2016.0145.
  • Hordijk, A., Kallenberg, L.C.M. (1984). Transient policies in discrete dynamic programming: Linear programming including suboptimality tests and additional constraints. Mathematical Programming, 30(1), 46-70. DOI: 10.1007/BF02591798.
  • Howard, R.A. (1960). Dynamic Programming and Markov Processes. MIT Press, Cambridge, MA. https://books.google.co.ma/books?id=fXJEAAAAIAAJ.
  • Kallenberg, L.C.M. (1983). Linear programming and finite Markovian control problems, Math. Centre Tracts, 148, 1-245.
  • Larach, A., Chafik, S., Daoui, C. (2017). Accelerated decomposition techniques for large discounted Markov decision processes. Journal of Industrial Engineering International, 13(4), 417-426. DOI: 10.1007/s40092-017-0197-7.
  • Mao, W., Zheng, Z., Wu, F., Chen, G. (2018). Online Pricing for Revenue Maximization with Unknown Time Discounting Valuations. IJCAI, 440-446. DOI: 10.24963/ijcai.2018/61.
  • Pavitsos, A., Kyriakidis, E.G. (2009). Markov decision models for the optimal maintenance of a production unit with an upstream buffer. Computers & Operations Research, 36(6), 1993-2006. DOI: 10.1016/j.cor.2008.06.014.
  • Peng, H., Cheng, Y., Li, X. (2023). Real-Time Pricing Method for Spot Cloud Services with Non-Stationary Excess Capacity. Sustainability, 15(4), 3363.
  • Puterman, M.L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons Inc. DOI: 10.1002/9780470316887.
  • Rimélé, A., Grangier, P., Gamache, M., Gendreau, M., Rousseau, L.-M. (2021). E-commerce warehousing: Learning a storage policy. ArXiv Preprint ArXiv:2101.08828. DOI: 10.48550/arXiv.2101.08828.
  • Spieksma, F., Nunez-Queija, R. (2015). Markov Decision Processes. Adaptation of the Text by R. Nunez-Queija, 55.
  • Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9-44. DOI: 10.1007/BF00115009.
  • White III, C.C., White, D.J. (1989). Markov decision processes. European Journal of Operational Research, 39(1), 1-16. DOI: 10.1016/0377-2217(89)90348-2.
  • Wu, Y., Zhang, J., Ravey, A., Chrenko, D., Miraoui, A. (2020). Real-time energy management of photovoltaic-assisted electric vehicle charging station by Markov decision process. Journal of Power Sources, 476, 228504.
  • Ye, G., Lin, Q., Juang, T.-H., Liu, H. (2020). Collision-free Navigation of Human-centered Robots via Markov Games. 2020 IEEE International Conference on Robotics and Automation (ICRA), 11338-11344. DOI: 10.1109/ICRA40945.2020.9196810.
  • Ye, Y. (2011). The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Mathematics of Operations Research, 36(4), 593-603. DOI: 10.1287/moor.1110.0516.
  • Zhang, Y., Kim, C.-W., Tee, K.F. (2017). Maintenance management of offshore structures using Markov process model with random transition probabilities. Structure and Infrastructure Engineering, 13(8), 1068-1080. DOI: 10.1080/15732479.2016.1236393.
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.ekon-element-000171668929
