[mlpack] [gsoc-2019] Interested in adding RL algorithms to the codebase

Kaiqiang Xu rickllykqxu at gmail.com
Sat Apr 6 11:50:52 EDT 2019


Hello mentors,
	I am a second-year postgraduate student majoring in CS. I have contributed several PRs to mlpack, mainly to the decision tree / decision stump code. I was glad to see that implementing state-of-the-art (SOTA) RL algorithms is listed as a GSoC 2019 project, and I am very interested in contributing RL code to mlpack. I am familiar with the actor-critic family of algorithms, such as A3C, DDPG, and PPO. Of course, to implement them I will need to study them in more depth.
	Here are my thoughts on the project. First, I will build an actor-critic framework so that various algorithms can be integrated into it conveniently later. Then I will implement ACKTR and TRPO/PPO. Counting May, there are around four months to finish this task. My proposal is as follows:

0. Weeks -6 to -4 (April): I am already familiar with the framework and logic of mlpack. However, I should still spend time reading the implementation in methods/reinforcement_learning and in zoq/gym_tcp_api <https://github.com/zoq/gym_tcp_api>. I think it is better to first understand the system design done by earlier contributors.
1. Weeks -3 to 0 (May): Review these methods and consult other implementations based on PyTorch to summarize the common structure of actor-critic algorithms. I will present a comprehensive report, together with a draft design for a scalable reinforcement learning module.
2. Weeks 1 to 4 (project start): Finish the actor-critic framework, either making it compatible with the existing code or refactoring the existing code where that helps future scalability.
3. Weeks 5 to 8: Finish PPO and ACKTR, which are SOTA methods. Some simple tests should be written to check that the code is correct.
4. Weeks 9 to 12: Write comprehensive unit tests and post the corresponding documentation and tutorials. The results should be comparable to those of other implementations based on PyTorch. The API documentation should be clear, and a tutorial containing an RL example or demo should also be posted.

What do you think of it? Can you give me some suggestions?