Deep Reinforcement Learning Methods - Summary

Shangtong Zhang, 21 August 2017

This blog post summarizes my GSoC project: the implementation of popular deep reinforcement learning methods. During the project I implemented deep (double) Q-learning, asynchronous one-step and n-step Q-learning, asynchronous one-step SARSA, and asynchronous advantage actor-critic (A3C, in progress), as well as two classical control problems, Mountain Car and Cart Pole, to test my implementations.


My work is mainly located in the methods/reinforcement_learning folder:

  • q_learning.hpp: the entry point for (double) Q-learning
  • async_learning.hpp: the entry point for the asynchronous methods
  • training_config.hpp: a wrapper for the hyperparameters
  • environment: implementations of the two classical control problems, Mountain Car and Cart Pole
  • policy: implementations of several behavior policies
  • replay: implementation of experience replay
  • network: wrappers for non-standard networks (e.g. an actor-critic network without shared layers)
  • worker: implementations of the workers used by the asynchronous RL methods

Refactoring the existing neural network components was another important part of my work:

  • Decoupling of modules and optimizers: this affects all the optimizers and most test cases.
  • Pass-by-value convention: many mlpack components used to pass by reference, which is less flexible. I proposed passing by value in combination with std::move (see the sketch after this list). Since this is a very large change, only newly added components adopt the convention so far; Ryan is working on the older codebase.
  • Exposure of Forward and Backward: previously we only had Predict and Train, which can lead to duplicate computation in some cases. Exposing Forward and Backward addresses this issue.
  • Support for shared layers: this is still in progress, but I think it is essential for A3C to work on Atari. We proposed an Alias layer to address it. This is also a large change that will affect all the visitors.
  • Miscellaneous updates of old APIs.
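
As a minimal illustration of the pass-by-value convention (the class and member names below are hypothetical, not actual mlpack code): a constructor takes a heavy argument by value and moves it into the member, so a caller can either keep its own copy or hand over ownership with std::move, without the class needing separate const-reference and rvalue-reference overloads.

```cpp
#include <utility>
#include <armadillo>

// Hypothetical component illustrating pass-by-value + std::move.
class ExampleAgent
{
 public:
  // The parameter is taken by value and moved into the member.
  ExampleAgent(arma::mat network) : network(std::move(network)) { }

 private:
  arma::mat network;
};

int main()
{
  arma::mat weights(100, 100, arma::fill::randu);
  ExampleAgent keepCopy(weights);                  // caller keeps its copy.
  ExampleAgent takeOwnership(std::move(weights));  // no copy at all.
  return 0;
}
```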

Detailed usage can be found in the two test cases, async_learning_test.cpp and q_learning_test.cpp. You can run them with bin/mlpack_test -t QLearningTest and bin/mlpack_test -t AsyncLearningTest.
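
To give a feel for the API, here is a rough sketch of training a Q-learning agent on Cart Pole, adapted from my recollection of q_learning_test.cpp; the exact header paths, template parameters, and TrainingConfig setter names may differ from the current code, so treat the test case itself as the authoritative reference.

```cpp
#include <mlpack/core.hpp>
#include <mlpack/core/optimizers/adam/adam_update.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <mlpack/methods/ann/init_rules/gaussian_init.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/methods/reinforcement_learning/q_learning.hpp>
#include <mlpack/methods/reinforcement_learning/environment/cart_pole.hpp>
#include <mlpack/methods/reinforcement_learning/policy/greedy_policy.hpp>
#include <mlpack/methods/reinforcement_learning/replay/random_replay.hpp>
#include <mlpack/methods/reinforcement_learning/training_config.hpp>

using namespace mlpack::ann;
using namespace mlpack::optimization;
using namespace mlpack::rl;

int main()
{
  // Feed-forward network approximating the action-value function
  // (4 state dimensions in, 2 action values out for Cart Pole).
  FFN<MeanSquaredError<>, GaussianInitialization> model(
      MeanSquaredError<>(), GaussianInitialization(0, 0.001));
  model.Add<Linear<>>(4, 128);
  model.Add<ReLULayer<>>();
  model.Add<Linear<>>(128, 2);

  // Epsilon-greedy behavior policy and uniform experience replay.
  GreedyPolicy<CartPole> policy(1.0, 1000, 0.1);
  RandomReplay<CartPole> replayMethod(10, 10000);

  // Hyperparameters live in TrainingConfig.
  TrainingConfig config;
  config.StepSize() = 0.01;
  config.Discount() = 0.9;
  config.TargetNetworkSyncInterval() = 100;
  config.ExplorationSteps() = 100;
  config.DoubleQLearning() = false;
  config.StepLimit() = 200;

  // Assemble the agent; heavy objects are moved in (pass-by-value + std::move).
  QLearning<CartPole, decltype(model), AdamUpdate, decltype(policy)>
      agent(std::move(config), std::move(model), std::move(policy),
            std::move(replayMethod));

  // Train one episode at a time; Episode() returns the episode return.
  double averageReturn = 0.0;
  for (size_t episode = 0; episode < 100; ++episode)
    averageReturn += agent.Episode();
  averageReturn /= 100;

  return 0;
}
```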

In total, I contributed the following PRs:


The most challenging parts are:

  • Making the number of threads independent of the number of workers in the async RL methods: this is a really nice idea. To the best of my knowledge, there is no public implementation of it; all the implementations I could find on the Internet simply assume the two are equal. To achieve it, we need to build a worker pool and use a step, rather than an episode, as the unit of work (see the sketch after this list).
  • Alias layer: this blocked me the most and is still blocking me. It requires a deep understanding of Armadillo memory management, boost::variant, and #include directives.
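
The sketch below is only a conceptual illustration of that idea, not the actual mlpack worker pool: a fixed number of threads repeatedly pop a worker index from a shared queue, advance that worker by one step, and push it back, so the number of logical workers is decoupled from the number of threads.

```cpp
#include <atomic>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Placeholder worker: one call to Step() stands in for a single environment
// step of an async method (act, accumulate gradients, occasionally sync).
struct Worker
{
  void Step() { /* interact with the environment for one step */ }
};

int main()
{
  const size_t numWorkers = 16;   // logical agents
  const size_t numThreads = 4;    // OS threads actually running them
  const size_t totalSteps = 100000;

  std::vector<Worker> workers(numWorkers);
  std::queue<size_t> jobs;        // worker indices waiting to be stepped
  for (size_t i = 0; i < numWorkers; ++i)
    jobs.push(i);

  std::mutex jobsMutex;
  std::atomic<size_t> stepsDone(0);

  auto run = [&]()
  {
    while (stepsDone.fetch_add(1) < totalSteps)
    {
      size_t id;
      {
        std::lock_guard<std::mutex> lock(jobsMutex);
        if (jobs.empty())
          return;
        id = jobs.front();
        jobs.pop();
      }
      workers[id].Step();         // the job unit is a step, not an episode.
      std::lock_guard<std::mutex> lock(jobsMutex);
      jobs.push(id);
    }
  };

  std::vector<std::thread> pool;
  for (size_t t = 0; t < numThreads; ++t)
    pool.emplace_back(run);
  for (std::thread& t : pool)
    t.join();

  return 0;
}
```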

Future Work

Clearly, mlpack's RL support is far from complete. Supporting classical control problems is an important milestone, and we are almost there. However, we are still far from the next milestone: Atari games. At a minimum, we need GPU support, infrastructure for basic image processing, and an efficient way to communicate with popular simulators (e.g. OpenAI Gym, ALE).


I thank Marcus for his mentorship and detailed code reviews during the project. I also want to thank Ryan for his thoughtful suggestions, and I appreciate the generous funding from Google.