[mlpack] Reinforcement Learning Project - GSOC21

Marcus Edel marcus.edel at fu-berlin.de
Mon Mar 29 09:38:28 EDT 2021


Hello Tri,

welcome, and thanks for getting in touch. The methods you proposed fit nicely
into the current codebase, so if you are interested in implementing them,
please feel free to submit a proposal.

About inheritance: in general, we like to avoid inheritance in favor of
templates, because virtual functions incur runtime overhead, especially in
critical inner loops where they are called many, many times; that overhead is
not negligible. Of course, there are exceptions. For instance, we are currently
restructuring the network code to use inheritance instead of boost::variant,
because boost::variant turned out to introduce some complexity and could be
slow.
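
To make that a bit more concrete, here is a minimal sketch of the idea; the
class names below are invented for illustration and are not mlpack's actual
classes. With the replay buffer passed as a template parameter, the call inside
the training loop is resolved at compile time and can be inlined, whereas the
virtual version pays for an indirect call every time:

#include <iostream>
#include <utility>

// Dynamic polymorphism: every Sample() goes through a virtual call.
struct ReplayBase
{
  virtual double Sample() = 0;
  virtual ~ReplayBase() { }
};

struct RandomReplayVirtual : ReplayBase
{
  double Sample() override { return 1.0; }
};

// Static polymorphism: the replay type is a template parameter, so the
// compiler sees the concrete type and can inline Sample() in the loop.
struct RandomReplay
{
  double Sample() { return 1.0; }
};

template<typename ReplayType>
class Agent
{
 public:
  Agent(ReplayType replay) : replay(std::move(replay)) { }

  // Direct, inlinable call; no vtable lookup in the training loop.
  double Step() { return replay.Sample(); }

 private:
  ReplayType replay;
};

int main()
{
  Agent<RandomReplay> agent(RandomReplay{});
  double total = 0.0;
  for (int i = 0; i < 1000000; ++i)
    total += agent.Step();
  std::cout << total << std::endl;
  return 0;
}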

Let me know if I should clarify anything further.

Thanks,
Marcus


> On 29. Mar 2021, at 03:03, Wahyu Guntara <wahyu.guntara at gmail.com> wrote:
> 
> Hello everyone,
> 
> I am planning on contributing to mlpack under GSOC 2021 (Reinforcement Learning project ideas <https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas#reinforcement-learning>). Currently, there is only one implementation of a policy gradient method in mlpack, namely SAC. The PPO method is listed in the project ideas, but there is already a PR on that <https://github.com/mlpack/mlpack/pull/2788>. So, I would like to propose the implementation of other policy gradient methods as my GSOC 2021 project.
> 
> There are tons of policy gradient methods <https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html>, but as a starting point, I would like to implement the basic ones first. OpenAI's Spinning Up <https://spinningup.openai.com/en/latest/user/introduction.html> has starter code for some policy gradient methods, i.e. vanilla policy gradient (actor-critic), TRPO, PPO, DDPG, TD3, and SAC. Following on from this, I wish to implement the vanilla policy gradient methods (REINFORCE and actor-critic), TRPO, and DDPG. What do you think about that as my potential GSOC 2021 project?
> 
> Besides that, I actually have a question about mlpack's reinforcement learning methods. Why does it use template parameters everywhere? Why not use inheritance? For example, prioritized_replay and random_replay could inherit from a base_replay_buffer class, and the q_networks classes could inherit from a base_q_network class. The former would allow easier replay buffer customisation (e.g. maybe there are some new prioritization formulas, etc.), while the latter would avoid confusion about how to use the q_learning class (e.g. confusion like this one <https://github.com/mlpack/mlpack/issues/2849>).
> 
> -- 
> Best Regards,
> Tri Wahyu Guntara
> _______________________________________________
> mlpack mailing list
> mlpack at lists.mlpack.org
> http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
