[mlpack] Gsoc Idea - Reinforcement Learning

Marcus Edel marcus.edel at fu-berlin.de
Wed Mar 30 22:47:32 EDT 2022


Hello Eshaan,

Thanks for the introduction and your interest in the project. PPO, TD3, ACKTR,
HER, and improving the Rainbow implementation are all interesting methods that
would provide a good baseline. My suggestion would be to pick one or two; I
don't think it's feasible to implement everything over the summer, especially
if we want proper tests and a dedicated tutorial; those often take more time
than anticipated. You are right about the existing Rainbow features; there is
no need to mention them on the GSoC ideas page anymore. I'll go ahead and
update the section.

I hope this was helpful. Let me know if there is anything I should clarify.

Thanks
Marcus

> On Mar 30, 2022, at 6:42 PM, Eshaan Agarwal <eshaan060202 at gmail.com> wrote:
> 
> Hello everyone,
> 
> I'm Eshaan, a 2nd-year student at IIT (BHU) Varanasi, India. I would like to spend the coming summer working on the mlpack library under GSoC.
> 
> I have been working with mlpack for quite a while, and have been familiarizing myself with the RL codebase. I want to propose a potential idea for a large project (~350 hours) and get the community's feedback to strengthen my proposal. 
> As far as I know, there have been various attempts at adding algorithms: DDPG at https://github.com/mlpack/mlpack/pull/2912, and PPO at https://github.com/mlpack/mlpack/pull/2788 and https://github.com/mlpack/mlpack/pull/1912.
> 
> So, I would like to extend the library by implementing some popular algorithms, along with proper tests, documentation, and a dedicated tutorial. I have the following in mind:
> 
>   1) PPO: One of the most sought-after algorithms that has not been implemented yet. More specifically, I intend to implement the clipped version of PPO (see the sketch after this list).
>   2) Twin Delayed DDPG (TD3): While DDPG can achieve great performance, it is brittle with respect to hyperparameters and other tuning. TD3 counters this with three major improvements: clipped double-Q learning, delayed policy updates, and target policy smoothing (see the sketch after this list).
>   3) ACKTR (Actor-Critic using Kronecker-Factored Trust Region): an actor-critic method that applies natural gradient updates via Kronecker-factored approximate curvature (K-FAC).
>   4) Hindsight Experience Replay (HER): Particularly helpful in multi-task and sparse-reward settings, which are often encountered in practical scenarios such as robotics. It can also be added as a component to off-policy methods such as DQN, QR-DQN, SAC, TQC, TD3, or DDPG (see the relabeling sketch after this list).
>   5) Revisiting and Improving Rainbow:
>      - Implement various flavours of DQN, such as QR-DQN, IDQN, and a modified Rainbow as per https://arxiv.org/abs/2011.14826 (see the quantile-loss sketch after this list).
>      - Benchmark DQN, Rainbow, and the other flavours against each other.
>      - Benchmark our implementations against existing ones such as OpenAI's Baselines and Google's Dopamine.
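> 
> To make item 1 concrete, here is a minimal sketch of the clipped surrogate objective from the PPO paper, written against Armadillo since mlpack builds on it. The function name and signature are illustrative only, not existing mlpack API:
> 
>     #include <armadillo>
> 
>     // Clipped PPO surrogate (illustrative sketch, not mlpack API).
>     // ratio = pi_theta(a|s) / pi_theta_old(a|s); advantage = estimated A(s, a).
>     double ClippedSurrogateLoss(const arma::vec& ratio,
>                                 const arma::vec& advantage,
>                                 const double epsilon = 0.2)
>     {
>       // L^CLIP = E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)].
>       const arma::vec unclipped = ratio % advantage;
>       const arma::vec clipped =
>           arma::clamp(ratio, 1.0 - epsilon, 1.0 + epsilon) % advantage;
>       // Negated because we minimize a loss to maximize the objective.
>       return -arma::mean(arma::min(unclipped, clipped));
>     }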
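> 
> For item 2, this plain C++ sketch (hypothetical names) shows where each of the three TD3 improvements acts when forming the critic target:
> 
>     #include <algorithm>
> 
>     // TD3 critic target (illustrative sketch).  q1Target and q2Target are
>     // the two target critics evaluated at the smoothed next action
>     // a' = clip(mu(s') + clip(noise, -c, c), low, high), which is
>     // improvement 2 (target policy smoothing).
>     double TD3Target(const double reward, const bool terminal,
>                      const double q1Target, const double q2Target,
>                      const double discount = 0.99)
>     {
>       // Improvement 1 (clipped double-Q): take the smaller of the two
>       // target critics to curb overestimation bias.
>       const double minQ = std::min(q1Target, q2Target);
>       // Improvement 3 (delayed updates) lives outside this function: the
>       // actor and target networks are updated once every d critic updates.
>       return reward + (terminal ? 0.0 : discount * minQ);
>     }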
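> 
> For item 4, HER is mostly a replay-buffer transformation. A minimal sketch of the "final" relabeling strategy, with hypothetical types (again, nothing here is existing mlpack API):
> 
>     #include <vector>
>     #include <armadillo>
> 
>     // One goal-conditioned transition (illustrative).
>     struct Transition
>     {
>       arma::vec state, action, nextState, goal;
>       double reward;
>     };
> 
>     // Store each transition twice: once with the original goal and once
>     // with the episode's final achieved state substituted as the goal.
>     void RelabelFinal(std::vector<Transition> episode,
>                       std::vector<Transition>& replayBuffer)
>     {
>       const arma::vec achievedGoal = episode.back().nextState;
>       for (Transition t : episode)
>       {
>         replayBuffer.push_back(t);  // Original goal, original reward.
>         t.goal = achievedGoal;
>         // Sparse reward: 0 if the relabeled goal was reached, -1 otherwise.
>         t.reward = arma::approx_equal(t.nextState, t.goal, "absdiff", 1e-3)
>             ? 0.0 : -1.0;
>         replayBuffer.push_back(t);  // Hindsight goal.
>       }
>     }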
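> 
> And for the QR-DQN part of item 5, the core change over DQN is the quantile Huber loss; a simplified sketch under the same caveats:
> 
>     #include <armadillo>
>     #include <cmath>
> 
>     // Core of the quantile Huber loss from QR-DQN (illustrative sketch).
>     // u: TD errors (target quantile minus predicted quantile);
>     // tau: quantile midpoints, e.g. (2i + 1) / (2N) for i = 0..N-1.
>     double QuantileHuberLoss(const arma::vec& u, const arma::vec& tau,
>                              const double kappa = 1.0)
>     {
>       arma::vec loss(u.n_elem);
>       for (arma::uword i = 0; i < u.n_elem; ++i)
>       {
>         const double absU = std::abs(u[i]);
>         // Huber component: quadratic near zero, linear in the tails.
>         const double huber = (absU <= kappa)
>             ? 0.5 * u[i] * u[i]
>             : kappa * (absU - 0.5 * kappa);
>         // Asymmetric quantile weighting |tau - 1{u < 0}|.
>         loss[i] = std::abs(tau[i] - (u[i] < 0.0 ? 1.0 : 0.0)) * huber / kappa;
>       }
>       return arma::mean(loss);
>     }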
> 
> Besides that, I have a question: I noticed that all components of Rainbow are already present in the library, but I am not sure why it remains a subtopic in the Reinforcement Learning section of the GSoC ideas page. Is there anything left to do for Rainbow?
> 
> Which of these should I proceed with for my proposal? Also, please suggest any other algorithms you might have in mind. What would be an ideal number of deliverables for a large-sized project on this topic? Please let me know your thoughts.
> 
> Looking forward to hearing back from the community :)
> 
> 
> Thanks for reading!
> 
> _______________________________________________
> mlpack mailing list
> mlpack at lists.mlpack.org
> http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
