[mlpack] Extending RL codebase

Tue Mar 17 08:17:21 EDT 2020

Hello mentors!

I am Nishant, a sophomore at IIT (BHU), Varanasi, India. I would like to
spend this summer working with the mlpack library under GSoC.

I have been working with mlpack for quite a while and have been
familiarizing myself with the RL codebase. After exploring it and going
through the tests, I felt that implementations of many state-of-the-art
algorithms are still missing. From what I could infer, we currently
have only DQN (with Double DQN), multistep DQN, async multistep DQN, and
SARSA in the codebase. Also, I believe PR #1912 (PPO) is in the process
of getting merged.

So, I would like to extend the library by adding implementations of
some relatively recent model-free learning algorithms, along with proper
tests and documentation, and a dedicated tutorial if time permits. I have
the following in mind:
1) *Soft Actor-Critic (SAC) and A2C/A3C*: both are quite versatile, and
most people would want to use them. SAC is a relatively new idea.
2) *Twin Delayed DDPG (TD3)*: a newer, more stable variant of DDPG.
3) *ACKTR and Hindsight Experience Replay (HER) support for DQNs*: these are
also recent ideas, although I am not sure of their practical use cases.
4) *Rainbow DQN*: three of the six extensions are already implemented, so I
think adding the remaining ones would probably not take much time.
Implementing Rainbow together with one of the above algorithms should
therefore be enough for the 12-week program.

Which of these should I proceed with when writing my proposal? Kindly let
me know your thoughts on them. Also, please suggest any other algorithms
that you might have in mind.
I would also like to know whether anyone else is working on the same
ideas, so that we can avoid duplicating work.

Please let me know your thoughts.

Thanks for reading!