[mlpack] Gsoc Idea - Reinforcement Learning
Marcus Edel
marcus.edel at fu-berlin.de
Wed Mar 30 22:47:32 EDT 2022
Hello Eshaan,
Thanks for the introduction and your interest in the project. PPO, TD3, ACKTR,
HER, and improving the Rainbow implementation are all interesting methods that
would provide a good baseline. My suggestion would be to pick one or two; I
don't think it's feasible to implement everything over the summer, especially
if we want to implement proper tests and a dedicated tutorial; those things
often take more time than anticipated. You are right about the existing Rainbow
features; there is no need to mention them on the GSoC idea page anymore. I'll
go ahead and update the section.
I hope this was helpful. Let me know if there is anything I should clarify.
Thanks
Marcus
> On Mar 30, 2022, at 6:42 PM, Eshaan Agarwal <eshaan060202 at gmail.com> wrote:
>
> Hello everyone,
>
> I'm Eshaan, a second-year student at IIT (BHU) Varanasi, India. I would like to spend the coming summer working with the mlpack library under GSoC.
>
> I have been working with mlpack for quite a while and have been familiarizing myself with the RL codebase. I want to propose a potential idea for a large project (~350 hours) and get the community's feedback to strengthen my proposal.
> As far as I know, there have been several attempts at adding these algorithms: DDPG at https://github.com/mlpack/mlpack/pull/2912, and PPO at https://github.com/mlpack/mlpack/pull/2788 and https://github.com/mlpack/mlpack/pull/1912.
>
> So, I would like to extend the library by adding implementations of some popular algorithms, along with proper tests, documentation, and a dedicated tutorial. I have the following in mind:
>
> 1) PPO - One of the most sought-after algorithms that has not been implemented yet. More specifically, I intend to implement the clipped-surrogate version of PPO.
> 2) Twin Delayed DDPG (TD3) - While DDPG can achieve great performance, it is brittle to hyperparameters and other kinds of tuning. TD3 counters this with three major improvements: clipped double-Q learning, delayed policy updates, and target policy smoothing.
> 3) ACKTR - Actor-Critic using Kronecker-Factored Trust Region, a more sample-efficient alternative to standard actor-critic methods.
> 4) Hindsight Experience Replay (HER) - Particularly helpful in multi-task and sparse-reward settings, which are often encountered in practical scenarios like robotics. It can also be added as a component to DQN, QR-DQN, SAC, TQC, TD3, or DDPG.
> 5) Revisiting and Improving Rainbow -
> Implement various flavours of DQN, like QR-DQN, IDQN, and the modified Rainbow described in https://arxiv.org/abs/2011.14826.
> Benchmark DQN, Rainbow, and the other flavours against each other.
> Benchmark our implementations against other existing versions like OpenAI's Baselines, Google's Dopamine, etc.
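Since the clipped version of PPO comes up in the list above, here is a minimal NumPy sketch of the clipped surrogate objective, for reference. This is not mlpack code; the function name and the epsilon default are illustrative assumptions, and in practice the objective is maximized via gradient ascent on the policy parameters.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate: mean of min(r * A, clip(r, 1-eps, 1+eps) * A).

    ratio     -- per-sample probability ratio pi_new(a|s) / pi_old(a|s)
    advantage -- per-sample advantage estimate
    epsilon   -- clip range (0.2 is a common choice, assumed here)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Taking the elementwise minimum caps the incentive to push the
    # ratio outside [1 - eps, 1 + eps].
    return np.minimum(unclipped, clipped).mean()

# With a positive advantage, a ratio of 1.5 is capped at 1.2 (= 1 + eps);
# with a negative advantage, the unclipped term is already the minimum.
obj = ppo_clip_objective(np.array([1.5, 0.9]), np.array([1.0, -1.0]))
# obj == (1.2 + (-0.9)) / 2 == 0.15
```

The min with the clipped term is what makes PPO's updates conservative: once the ratio drifts past the clip range in the direction favored by the advantage, the gradient through that sample vanishes.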
>
> Besides that, I have a question - I noticed that all components of Rainbow are present in the library, but I am not sure why it remains a subtopic in the Reinforcement Learning section of the GSoC ideas page. Is there anything left to do for Rainbow?
>
> Which of these should I proceed with for my proposal? Also, please suggest any other algorithms you might be thinking of. What would be an ideal number of deliverables for a large-sized project on this topic? Please let me know your thoughts.
>
> Looking forward to hearing back from the community :)
>
>
> Thanks for reading!
>
> _______________________________________________
> mlpack mailing list
> mlpack at lists.mlpack.org
> http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack