[mlpack] Reinforcement Learning Project - GSOC21

Wahyu Guntara wahyu.guntara at gmail.com
Mon Mar 29 03:03:31 EDT 2021


Hello everyone,

I am planning on contributing to mlpack under GSOC 2021 (Reinforcement
Learning project ideas
<https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas#reinforcement-learning>).
Currently, mlpack has only one policy gradient method implemented, namely
SAC. PPO is listed in the project ideas, but there is already a PR for it
<https://github.com/mlpack/mlpack/pull/2788>. So, I would like to propose
implementing other policy gradient methods as my GSOC 2021 project.

There are tons of policy gradient methods
<https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html>,
but as a starting point I would like to implement the basic ones first.
OpenAI's Spinning Up
<https://spinningup.openai.com/en/latest/user/introduction.html> has
starter code for several policy gradient methods, namely vanilla policy
gradient (actor-critic), TRPO, PPO, DDPG, TD3, and SAC. Following this, I
wish to implement the vanilla policy gradient methods (REINFORCE and
actor-critic), TRPO, and DDPG. What do you think about that as my
potential GSOC 2021 project?
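
For concreteness, the gradient that both REINFORCE and the actor-critic
variant estimate is the standard policy gradient (textbook form, nothing
mlpack-specific):

    \nabla_\theta J(\theta)
        = \mathbb{E}_{\tau \sim \pi_\theta}\left[
            \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \Psi_t
          \right]

where \Psi_t is the return-to-go G_t for REINFORCE and an advantage
estimate A(s_t, a_t) when a critic is used; TRPO and PPO then constrain
how far each update can move the policy.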

Besides that, I also have a question about mlpack's reinforcement
learning code: why does it use template parameters everywhere instead of
inheritance? For example, *prioritized_replay* and *random_replay* could
inherit from a *base_replay_buffer* class, and the *q_networks* classes
could inherit from a *base_q_network* class. The former would make it
easier to customise the replay buffer (e.g. to try new prioritization
formulas), while the latter would avoid confusion about how to use the
*q_learning* class (e.g. confusion like this one
<https://github.com/mlpack/mlpack/issues/2849>).
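
To make the suggestion concrete, below is a minimal sketch of the
inheritance-based replay buffer design I have in mind. The names
(BaseReplayBuffer, RandomReplay, Agent, Store, Sample, Transition) are
purely illustrative and not mlpack's actual API; the point is only that a
virtual interface would let users plug in their own buffers without
touching the agent:

// Rough sketch only; none of these names exist in mlpack today.
#include <algorithm>
#include <cstddef>
#include <vector>

// Common interface every replay buffer would implement.
template<typename TransitionType>
class BaseReplayBuffer
{
 public:
  virtual ~BaseReplayBuffer() = default;

  // Store a new transition.
  virtual void Store(const TransitionType& transition) = 0;

  // Sample a mini-batch of stored transitions.
  virtual std::vector<TransitionType> Sample(std::size_t batchSize) = 0;
};

// Uniform replay buffer: keep everything and sample from it.
template<typename TransitionType>
class RandomReplay : public BaseReplayBuffer<TransitionType>
{
 public:
  void Store(const TransitionType& transition) override
  {
    buffer.push_back(transition);
  }

  std::vector<TransitionType> Sample(std::size_t batchSize) override
  {
    // Real code would sample uniformly at random; returning the first
    // batchSize elements just keeps the sketch short.
    const std::size_t n = std::min(batchSize, buffer.size());
    return std::vector<TransitionType>(buffer.begin(), buffer.begin() + n);
  }

 private:
  std::vector<TransitionType> buffer;
};

// An agent (e.g. q_learning) could then hold the interface by reference, so
// a new prioritization scheme only needs to subclass BaseReplayBuffer.
template<typename TransitionType>
class Agent
{
 public:
  explicit Agent(BaseReplayBuffer<TransitionType>& replay) : replay(replay) { }

 private:
  BaseReplayBuffer<TransitionType>& replay;
};

// Tiny usage example with a dummy transition type.
struct Transition { int state, action; double reward; int nextState; };

int main()
{
  RandomReplay<Transition> replay;
  replay.Store({0, 1, 1.0, 2});
  Agent<Transition> agent(replay);  // The agent only sees the base interface.
  (void) agent;
}

The obvious downside is virtual-call overhead and losing some
compile-time optimisation, which may well be why templates were chosen; I
would be happy to hear the reasoning behind that design.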

-- 
Best Regards,
Tri Wahyu Guntara