[mlpack] Evolution strategies along with policy gradients

Marcus Edel marcus.edel at fu-berlin.de
Tue Mar 6 15:28:49 EST 2018


Hello Chirag,

> I could implement a basic evolution strategies module within the
> src/mlpack/methods/reinforcement_learning module or as a separate module itself,
> and test it on sample functions for a start (reference:
> https://gist.github.com/karpathy/77fbb6a8dac5395f1b73e7a89300318d)

It might make sense to implement Natural Evolution Strategies as an
optimizer; see mlpack.org/docs/mlpack-git/doxygen/optimizertutorial.html and
arxiv.org/abs/1711.06581 for more information. Let me know what you think.
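
To make the optimizer idea concrete, below is a rough, untested sketch of
what a bare-bones ES optimizer could look like against the
Optimize()/Evaluate() shape from the optimizer tutorial. SimpleES,
QuadraticFunction, and all parameter values are made up for illustration,
and a real NES implementation would add fitness shaping, antithetic
sampling, and search-distribution adaptation:

#include <mlpack/core.hpp>
#include <iostream>

// Hypothetical black-box ES optimizer following the shape described in
// the optimizer tutorial: a single Optimize() method, and the function
// only has to expose Evaluate(const arma::mat&) -- no gradients needed.
class SimpleES
{
 public:
  SimpleES(const size_t populationSize = 50,
           const size_t maxIterations = 500,
           const double sigma = 0.1,
           const double stepSize = 0.01) :
      populationSize(populationSize),
      maxIterations(maxIterations),
      sigma(sigma),
      stepSize(stepSize) { }

  template<typename FunctionType>
  double Optimize(FunctionType& function, arma::mat& iterate)
  {
    for (size_t i = 0; i < maxIterations; ++i)
    {
      // Estimate the gradient of the Gaussian-smoothed objective:
      // grad ~= 1 / (n * sigma) * sum_k F(theta + sigma * eps_k) * eps_k.
      arma::mat grad(iterate.n_rows, iterate.n_cols, arma::fill::zeros);
      for (size_t k = 0; k < populationSize; ++k)
      {
        arma::mat noise(iterate.n_rows, iterate.n_cols, arma::fill::randn);
        grad += function.Evaluate(iterate + sigma * noise) * noise;
      }
      grad /= (populationSize * sigma);

      // mlpack optimizers minimize, so step against the estimate.
      iterate -= stepSize * grad;
    }

    return function.Evaluate(iterate);
  }

 private:
  size_t populationSize;
  size_t maxIterations;
  double sigma;
  double stepSize;
};

// A toy "sample function" for a first test, in the spirit of the gist
// linked above: minimize f(x) = ||x||^2.
class QuadraticFunction
{
 public:
  double Evaluate(const arma::mat& coordinates)
  {
    return arma::accu(arma::square(coordinates));
  }
};

int main()
{
  QuadraticFunction f;
  arma::mat coordinates = arma::ones<arma::mat>(10, 1);

  SimpleES es;
  std::cout << "objective after ES: " << es.Optimize(f, coordinates)
      << std::endl;
}

The nice part of fitting this into the optimizer abstraction is that
anything exposing Evaluate() -- a test function or the episodic return of
a rolled-out policy -- can be plugged in without changing the optimizer.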

> All in all, I feel I can form a proper timeline to try to fit this in the
> timeframe of the summer.

Agreed, I really like the idea of combining RL with neuroevolution; also,
https://github.com/mlpack/mlpack/wiki/Google-Summer-of-Code-Application-Guide
might be helpful.

Let me know if I should clarify anything.

Thanks,
Marcus

> On 3. Mar 2018, at 16:31, Chirag Ramdas <chiragramdas at gmail.com> wrote:
> 
> Hello Marcus,
> 
> Following up on my previous email, where I mentioned finding this idea very interesting:
> https://arxiv.org/abs/1802.04821
> 
> So in the past three days, I have been going through OpenAI's blog post on evolution strategies as well as their paper:
> https://arxiv.org/abs/1703.03864
> https://blog.openai.com/evolution-strategies/
> 
> The blog post is very well written and brings out the simple yet beautiful way in which evolution strategies work.
> 
> As for the paper itself, which combines evolution strategies with policy gradients, I feel it would be a nice addition to the existing mlpack code base.
> 
> I could implement a basic evolution strategies module within the src/mlpack/methods/reinforcement_learning module, or as a separate module itself, and test it on sample functions for a start (reference: https://gist.github.com/karpathy/77fbb6a8dac5395f1b73e7a89300318d).
> 
> After that, I could go on and implement the idea suggested in the paper, which combines evolution strategies with a policy gradient technique.
> 
> Since the paper suggests that their results are on par with state-of-the-art TRPO/PPO, we could also benchmark the performance of this technique on a standard MuJoCo environment.
> 
> All in all, I feel I can form a proper timeline to fit this into the timeframe of the summer.
> 
> Do let me know what you feel about this, and if it appeals to you!
> 
> Thanks a lot!
> 
