Reinforcement learning

An abstract diagram of reinforcement learning

Reinforcement learning (Cantonese: 強化學習, Jyutping: koeng4 faa3 hok6 zaap6; abbreviated "RL") is a learning paradigm in machine learning. In a reinforcement learning process, the researcher does not give the machine learning program a dataset to study and learn from, as in supervised or unsupervised learning. Instead, the program interacts with its surrounding environment on its own (the environment can be the real world or a simulation): at each time step, the program produces an action, expressed as output numbers, and the environment then returns some feedback, which, roughly speaking, tells the program whether that action was good or not. The program then uses this feedback to work out how to adjust its parameters, so that the next time it acts it has a higher chance of getting a positive response.[1][2]
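The interaction loop just described can be sketched in a few lines of code. The example below is only an illustration under invented names (SlotMachineEnv, run, and the payoff probabilities are assumptions, not part of any standard library): a toy two-action environment pays a reward of 1 or 0, and the learning program keeps an adjustable value estimate per action, nudging it toward each piece of feedback so that better-paying actions become more likely to be chosen.

```python
import random

# A minimal sketch of the interaction loop described above, using a made-up
# two-armed "environment" whose arms pay off with different probabilities.
# All names and numbers here are invented purely for illustration.

class SlotMachineEnv:
    """Environment: two actions (0 or 1); feedback is a reward of 1 or 0."""
    def __init__(self):
        self.pay_prob = [0.3, 0.7]   # hidden from the learning program

    def step(self, action):
        # Return +1 with the chosen arm's payoff probability, otherwise 0.
        return 1.0 if random.random() < self.pay_prob[action] else 0.0

def run(steps=5000, epsilon=0.1):
    env = SlotMachineEnv()
    value = [0.0, 0.0]   # the program's adjustable parameters: estimated value of each action
    count = [0, 0]
    for _ in range(steps):
        # Act: mostly pick the action currently believed best, sometimes explore.
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = max(range(2), key=lambda a: value[a])
        reward = env.step(action)              # feedback from the environment
        count[action] += 1
        # Adjust the parameters toward the feedback (incremental average).
        value[action] += (reward - value[action]) / count[action]
    return value

print(run())   # the estimate for action 1 should end up near 0.7
```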

Concept

A reinforcement learning process can be modelled as a Markov decision process (MDP); its pieces are listed below, followed by a small worked sketch.[3]

  • Environment:
    • The environment has some number of possible states s,
    • the environment's state can change because of the agent's actions,
    • and, to the experimenter, the rule governing how the environment's state changes (as a function of the current state and the action taken) may be known (if the environment is simulated) or unknown (if it is the real world).
  • Agent:
    • The agent has some number of possible states,
    • the agent has some number of possible actions a,
    • a policy: the mapping from "the state s perceived from the outside world" to "the action a to be taken",
    • P_a(s, s'): the probability that, at time point t, taking action a makes the environment change from state s to state s',
    • and the reinforcement, also called the reward R_a(s, s'), which, put simply, reflects how much the agent "likes" the outcome when action a changes the environment from s to s'.
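As a concrete illustration of the items in this list, the sketch below writes out a made-up two-state, two-action MDP; all state names, transition probabilities and rewards are invented for the example.

```python
# A toy Markov decision process with two states and two actions, written out
# directly as the objects in the list above. All numbers here are invented.

states  = ["cold", "hot"]                 # possible states s of the environment
actions = ["heat", "wait"]                # possible actions a of the agent

# P[a][s][s'] : probability that action a moves the environment from s to s'
P = {
    "heat": {"cold": {"cold": 0.2, "hot": 0.8}, "hot": {"cold": 0.0, "hot": 1.0}},
    "wait": {"cold": {"cold": 1.0, "hot": 0.0}, "hot": {"cold": 0.6, "hot": 0.4}},
}

# R[a][s][s'] : reward the agent gets when action a moves the environment from s to s'
R = {
    "heat": {"cold": {"cold": -1.0, "hot": 1.0}, "hot": {"cold": 0.0, "hot": -0.5}},
    "wait": {"cold": {"cold": -1.0, "hot": 0.0}, "hot": {"cold": 0.0, "hot": 1.0}},
}

# A policy: which action to take in each perceived state.
policy = {"cold": "heat", "hot": "wait"}

def expected_reward(s, a):
    """Expected one-step reward of taking action a in state s."""
    return sum(P[a][s][s2] * R[a][s][s2] for s2 in states)

for s in states:
    a = policy[s]
    print(s, "->", a, "expected reward:", expected_reward(s, a))
```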

Reinforcement learning has a very wide range of uses. For example, it can be used to teach an artificial-intelligence program to play video games: as long as the researcher gives the program some way to perceive the state of the game and some way to send inputs to the game, then, if all goes well, reinforcement learning can make the program learn to play it.[4][5]
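A minimal sketch of that idea is shown below, using tabular Q-learning (one common reinforcement-learning method) on a deliberately tiny invented "game": a five-cell corridor in which reaching the rightmost cell earns a reward of +1. The game, the reward, and all parameter values are assumptions made purely for illustration.

```python
import random

# Tabular Q-learning on a made-up corridor "game": states are positions 0..4,
# actions are "left"/"right", and reaching position 4 ends the episode with reward +1.

N_STATES, GOAL = 5, 4
ACTIONS = ["left", "right"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1        # learning rate, discount, exploration rate

def step(state, action):
    """The game itself: return (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == "left" else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy(state):
    # Pick the currently best-looking action, breaking ties at random.
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for _ in range(500):                         # play many episodes of the game
    state, done = 0, False
    for _ in range(100):                     # cap episode length, just in case
        # Perceive the game state and choose an input (epsilon-greedy).
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(state)
        nxt, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted best future value.
        best_next = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt
        if done:
            break

# After training, the greedy action in every non-goal state should be "right".
print({s: greedy(s) for s in range(GOAL)})
```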

Example applications

A before-and-after demonstration of a program being taught, by reinforcement learning, to crawl in a virtual space

See also

References

  • Auer, Peter; Jaksch, Thomas; Ortner, Ronald (2010). "Near-optimal regret bounds for reinforcement learning". Journal of Machine Learning Research. 11: 1563–1600.
  • Busoniu, Lucian; Babuska, Robert; De Schutter, Bart; Ernst, Damien (2010). Reinforcement Learning and Dynamic Programming using Function Approximators. Taylor & Francis CRC Press. ISBN 978-1-4398-2108-4.
  • François-Lavet, Vincent; Henderson, Peter; Islam, Riashat; Bellemare, Marc G.; Pineau, Joelle (2018). "An Introduction to Deep Reinforcement Learning". Foundations and Trends in Machine Learning. 11 (3–4): 219–354. arXiv:1811.12560. Bibcode:2018arXiv181112560F. doi:10.1561/2200000071.
  • Powell, Warren (2007). Approximate dynamic programming: solving the curses of dimensionality. Wiley-Interscience. ISBN 978-0-470-17155-4.
  • Sutton, Richard S.; Barto, Andrew G. (1998). Reinforcement Learning: An Introduction. MIT Press. ISBN 978-0-262-19398-6.
  • Sutton, Richard S. (1988). "Learning to predict by the method of temporal differences". Machine Learning. 3: 9–44. doi:10.1007/BF00115009.
  • Szita, Istvan; Szepesvari, Csaba (2010). "Model-based Reinforcement Learning with Nearly Tight Exploration Complexity Bounds" (PDF). ICML 2010. Omnipress. pp. 1031–1038. Archived from the original (PDF) on 2010-07-14.

Citations

  1. Kaelbling, Leslie P.; Littman, Michael L.; Moore, Andrew W. (1996). "Reinforcement Learning: A Survey". Journal of Artificial Intelligence Research. 4: 237–285.
  2. Dominic, S.; Das, R.; Whitley, D.; Anderson, C. (July 1991). "Genetic reinforcement learning for neural networks". IJCNN-91-Seattle International Joint Conference on Neural Networks. Seattle, Washington, USA: IEEE.
  3. François-Lavet, Vincent; Henderson, Peter; Islam, Riashat; Bellemare, Marc G.; Pineau, Joelle (2018). "An Introduction to Deep Reinforcement Learning". Foundations and Trends in Machine Learning. 11 (3–4): 219–354.
  4. Dubey, R.; Agrawal, P.; Pathak, D.; Griffiths, T. L.; Efros, A. A. (2018). "Investigating Human Priors for Playing Video Games". arXiv:1802.10217.
  5. Algorta, S.; Şimşek, Ö. (2019). "The Game of Tetris in Machine Learning". arXiv:1905.01652.
