[mlpack] Reinforcement Learning Project - GSOC21

Wahyu Guntara wahyu.guntara at gmail.com
Mon Mar 29 03:03:31 EDT 2021

Hello everyone,

I am planning to contribute to mlpack under GSOC 2021 (the Reinforcement
Learning project idea).
Currently, there is only one implementation of policy gradient methods in
mlpack, namely SAC. The PPO method is listed in the project ideas, but
there's already a PR for that <https://github.com/mlpack/mlpack/pull/2788>. So, I
would like to propose the implementation of other policy gradient methods
as my GSOC 2021 project.

There are tons of policy gradient methods,
but as a starting point, I would like to implement the basic ones
first. OpenAI's
Spinning Up <https://spinningup.openai.com/en/latest/user/introduction.html>
has starter code for several policy gradient methods, namely vanilla policy
gradient (actor-critic), TRPO, PPO, DDPG, TD3, and SAC. Building on
this, I would like to implement the vanilla policy gradient methods
(REINFORCE and actor-critic), TRPO, and DDPG. What do you think of that as
a potential GSOC 2021 project?

Besides that, I have a question about mlpack's reinforcement learning
methods. Why do they use template parameters everywhere? Why not use
inheritance? For example, *prioritized_replay* and *random_replay* could
inherit from a *base_replay_buffer* class, and the *q_networks* classes
could inherit from a *base_q_network* class. The former would make replay
buffer customisation easier (e.g. trying a new prioritization formula),
while the latter would avoid confusion about how to use the *q_learning*
class (e.g. confusion like this one).
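
To make the two designs concrete, here is a minimal sketch of both. Every
name in it (Transition, BaseReplayBuffer, SimpleRingReplay, CountingReplay,
DynamicAgent, StaticAgent) is hypothetical and made up for illustration;
none of this is mlpack's actual API, and the real replay classes have a
richer interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical transition record; NOT an mlpack type.
struct Transition { int state; int action; double reward; };

// Design 1: inheritance (run-time polymorphism).
class BaseReplayBuffer
{
 public:
  virtual ~BaseReplayBuffer() = default;
  virtual void Store(const Transition& t) = 0;
  virtual std::size_t Size() const = 0;
};

// Fixed-capacity buffer that overwrites the oldest entry once full.
class SimpleRingReplay : public BaseReplayBuffer
{
 public:
  explicit SimpleRingReplay(const std::size_t cap) : capacity(cap)
  {
    assert(cap > 0);
  }

  void Store(const Transition& t) override
  {
    if (data.size() < capacity)
      data.push_back(t);
    else
      data[next] = t;
    next = (next + 1) % capacity;
  }

  std::size_t Size() const override { return data.size(); }

 private:
  std::size_t capacity;
  std::size_t next = 0;
  std::vector<Transition> data;
};

// The agent only knows the base class; Store() is a virtual call.
class DynamicAgent
{
 public:
  explicit DynamicAgent(BaseReplayBuffer& replay) : replay(replay) {}
  void Observe(const Transition& t) { replay.Store(t); }

 private:
  BaseReplayBuffer& replay;
};

// Design 2: template parameter (compile-time polymorphism).
// Any type with a Store(const Transition&) method works; no base class.
class CountingReplay
{
 public:
  void Store(const Transition&) { ++count; }
  std::size_t Size() const { return count; }

 private:
  std::size_t count = 0;
};

template<typename ReplayType>
class StaticAgent
{
 public:
  explicit StaticAgent(ReplayType& replay) : replay(replay) {}
  void Observe(const Transition& t) { replay.Store(t); }

 private:
  ReplayType& replay;
};
```

In the template version the call to Store() is resolved at compile time, so
there is no virtual-call overhead and the buffer type does not need a common
base class. In the inheritance version the buffer can be swapped at run time
through a BaseReplayBuffer reference.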

Best Regards,
Tri Wahyu Guntara