Evaluating the Efficacy of Deep Neural Networks in Reinforcement Learning Problems

Amir Girgis, Jumeirah College, Sheikh Zayed Road, Dubai, UAE
Keywords: Deep Neural Networks, Deep Reinforcement Learning, Model-based methods, Monte Carlo Tree Search, AlphaGo, Q-learning, Deep-Q Networks.


The deep learning community has made great progress in integrating deep neural networks with reinforcement learning, in what is termed ‘deep reinforcement learning.’ This project investigates the importance of deep neural networks in reinforcement learning by analyzing the role that deep learning plays in tackling a range of reinforcement learning problems. By analyzing and evaluating alternative methods (such as Monte Carlo Tree Search and model-based methods), the project refutes the popular claim that deep reinforcement learning is always the best option for a given problem, and it explores research papers that support this position. It identifies the current limitations of deep neural nets, such as overfitting, difficulties with sparse and shaped reward functions, and sample inefficiency. The project also discusses the potential of Deep Q-Networks and surprising results in various domains. By comparing the merits and problems of deep learning, the project determines the degree to which neural networks are useful in reinforcement learning problems, both now and in the future. Taking the AlphaGo algorithm, and its defeat of world Go champion Lee Sedol, as a case study and starting point, the project unveils the potential of deep reinforcement learning despite the many challenges it faces today. Finally, it aims to reach a conclusion about how the use of deep neural nets in reinforcement learning is likely to develop as data becomes increasingly available and hardware becomes cheaper.

