[mlpack] GSoC Idea - Reinforcement Learning

Eshaan Agarwal eshaan060202 at gmail.com
Wed Mar 30 18:42:40 EDT 2022


Hello everyone,

I'm Eshaan, a second-year student at IIT (BHU) Varanasi, India. I would like
to spend the coming summer working on the mlpack library under GSoC.

I have been working with mlpack for quite a while, and have been
familiarizing myself with the RL codebase. I want to propose a potential
idea for a large project (~350 hours) and get the community's feedback to
strengthen my proposal.

As far as I know, there have been various attempts at adding algorithms such
as DDPG at https://github.com/mlpack/mlpack/pull/2912 and PPO at
https://github.com/mlpack/mlpack/pull/2788 and
https://github.com/mlpack/mlpack/pull/1912.

So, I would like to extend the library by implementing some popular
algorithms, along with proper tests, documentation, and a dedicated tutorial.
I have the following in mind:

1) PPO - PPO is one of the most sought-after algorithms that has not been
implemented yet. More specifically, I intend to implement the clipped version
of PPO (a sketch of the clipped surrogate objective is given after this
list).

2) Twin Delayed DDPG (TD3) - While DDPG can achieve great performance, it is
brittle to hyperparameters and other kinds of tuning. TD3 counters this with
three major improvements: clipped double-Q learning, delayed policy updates,
and target policy smoothing (the resulting critic target is written out after
the list).

3) ACKTR - Actor Critic using Kronecker-Factored Trust Region, which applies
K-FAC natural-gradient updates to actor-critic training.

4) Hindsight Experience Replay (HER) - Particularly helpful in multi-task and
sparse-reward settings, which are often encountered in practical scenarios
such as robotics. It can also be added as a component to DQN, QR-DQN, SAC,
TQC, TD3, DDPG, etc. (a minimal relabelling sketch follows the list as well).

5) Revisiting and Improving Rainbow:

   - Implement various flavours of DQN such as QR-DQN, IDQN and the modified
     Rainbow described in https://arxiv.org/abs/2011.14826.

   - Benchmark DQN, Rainbow and the other flavours against each other.

   - Benchmark our implemented algorithms against existing implementations
     such as OpenAI's Baselines and Google's Dopamine.
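
To make item 1 above a bit more concrete, this is the clipped surrogate
objective from the PPO paper that I would implement; the notation follows
Schulman et al. (2017) and is not tied to any existing mlpack class:

  r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

  L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\,\hat{A}_t,\;
      \mathrm{clip}(r_t(\theta),\, 1 - \epsilon,\, 1 + \epsilon)\,\hat{A}_t \right) \right]

Clipping the probability ratio to [1 - \epsilon, 1 + \epsilon] (\epsilon = 0.2
in the paper) keeps a single update from moving the policy too far from the
old one, which is what makes the clipped variant comparatively simple to
implement and tune.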

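For item 2, the three TD3 improvements show up most clearly in the critic
target; again this is the standard formulation from Fujimoto et al. (2018),
not existing mlpack code:

  \tilde{a} = \mu_{\phi'}(s') + \mathrm{clip}(\epsilon,\, -c,\, c), \qquad \epsilon \sim \mathcal{N}(0, \sigma)

  y = r + \gamma \min_{i = 1, 2} Q_{\theta'_i}(s', \tilde{a})

The clipped noise on the target action is the target policy smoothing, the
minimum over the two target critics is the clipped double-Q estimate, and on
top of that the actor and target networks are updated only once every d
critic updates (d = 2 in the paper).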

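For item 4, HER is essentially a relabelling step on top of whatever replay
buffer the off-policy learner already uses. Below is a minimal, self-contained
C++ sketch of the "final"-goal strategy; the Transition struct, the
RecomputeReward function, and the use of std::vector are hypothetical
placeholders for illustration, not existing mlpack types (an actual
integration would use Armadillo containers and mlpack's replay classes):

#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical transition layout: states, actions and goals are plain
// vectors of doubles here purely for illustration.
struct Transition
{
  std::vector<double> state;
  std::vector<double> action;
  std::vector<double> nextState;
  std::vector<double> goal;
  double reward;
};

// Hypothetical sparse reward: 0 if the achieved state is within a small
// tolerance of the goal, -1 otherwise.
double RecomputeReward(const std::vector<double>& achieved,
                       const std::vector<double>& goal,
                       const double tolerance = 0.05)
{
  double distSq = 0.0;
  for (std::size_t i = 0; i < goal.size(); ++i)
    distSq += (achieved[i] - goal[i]) * (achieved[i] - goal[i]);
  return std::sqrt(distSq) <= tolerance ? 0.0 : -1.0;
}

// HER "final" strategy: once an episode ends, store each transition a second
// time with its goal replaced by the state actually reached at the end of the
// episode, recomputing the reward for that substituted goal.
void RelabelEpisode(const std::vector<Transition>& episode,
                    std::vector<Transition>& replayBuffer)
{
  if (episode.empty())
    return;

  const std::vector<double>& achievedGoal = episode.back().nextState;

  for (const Transition& t : episode)
  {
    replayBuffer.push_back(t); // Original transition, original goal.

    Transition relabeled = t;  // Copy, then substitute the achieved goal.
    relabeled.goal = achievedGoal;
    relabeled.reward = RecomputeReward(relabeled.nextState, achievedGoal);
    replayBuffer.push_back(relabeled);
  }
}

In a real integration the relabelled transitions would go into the agent's
existing replay buffer rather than a plain std::vector, and the achieved goal
would usually be extracted from the state by an environment-specific mapping
instead of being the full next state.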
Besides that, I actually have a question - I noticed that all the components
of Rainbow are already present in the library, but I am not sure why it still
appears as a subtopic in the Reinforcement Learning section of the GSoC ideas
list. Is there anything left to do for Rainbow?

Which of these should I proceed with when making a proposal? Also, please do
suggest any other algorithms that you may have in mind. What would be an
ideal number of deliverables for a large-sized project on this topic? Please
let me know your thoughts.


Looking forward to hearing back from the community :)



Thanks for reading!