[mlpack] Evolution strategies along with policy gradients
Marcus Edel
marcus.edel at fu-berlin.de
Tue Mar 6 15:28:49 EST 2018
Hello Chirag,
> I could implement a basic evolution strategies module within the
> src/mlpack/methods/reinforcement_learning module or as a separate module itself,
> and test it on sample functions for a start (reference:
> https://gist.github.com/karpathy/77fbb6a8dac5395f1b73e7a89300318d)
It might make sense to implement Natural Evolution Strategies as an
optimizer; see mlpack.org/docs/mlpack-git/doxygen/optimizertutorial.html and
arxiv.org/abs/1711.06581 for more information. Let me know what you think.
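For reference, the core ES update from the gist you linked boils down to a few lines of NumPy. The sketch below is only an illustration of that update rule on a toy quadratic reward; the objective, target vector, and hyperparameters are illustrative choices in the spirit of the gist, not anything from mlpack's optimizer API:

```python
import numpy as np

np.random.seed(0)

solution = np.array([0.5, 0.1, -0.3])  # target the parameters should find

def reward(w):
    # Toy reward: negative squared distance to the target (to be maximized).
    return -np.sum((w - solution) ** 2)

npop = 50      # population size
sigma = 0.1    # standard deviation of the parameter noise
alpha = 0.001  # learning rate
w = np.zeros(3)

for _ in range(300):
    noise = np.random.randn(npop, len(w))             # one perturbation per member
    rewards = np.array([reward(w + sigma * n) for n in noise])
    # Standardize rewards so the update is invariant to reward scale/shift.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Stochastic estimate of the reward gradient, as in the OpenAI ES paper.
    w = w + alpha / (npop * sigma) * noise.T @ adv

print(np.round(w, 2))  # should end up close to `solution`
```

Wrapping exactly this loop behind the optimizer interface, so it can be swapped in wherever SGD or L-BFGS is used, is what I had in mind.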
> All in all, I feel I can form a proper timeline to try to fit this in the
> timeframe of the summer.
Agreed; I really like the idea of combining RL with Neuroevolution. Also,
https://github.com/mlpack/mlpack/wiki/Google-Summer-of-Code-Application-Guide
might be helpful.
Let me know if I should clarify anything.
Thanks,
Marcus
> On 3. Mar 2018, at 16:31, Chirag Ramdas <chiragramdas at gmail.com> wrote:
>
> Hello Marcus,
>
> Following up on my previous email, where I mentioned finding this idea very interesting:
> https://arxiv.org/abs/1802.04821
>
> So in the past three days, I have been going through OpenAI's blog post on evolution strategies as well as their paper.
> https://arxiv.org/abs/1703.03864
> https://blog.openai.com/evolution-strategies/
>
> The blog post is very well written, and brings out the simple yet beautiful way in which evolution strategies work.
>
> As for the paper itself, which combines evolution strategies with policy gradients, I feel it would be a nice addition to the existing mlpack codebase.
>
> I could implement a basic evolution strategies module within the src/mlpack/methods/reinforcement_learning module or as a separate module itself, and test it on sample functions for a start (reference: https://gist.github.com/karpathy/77fbb6a8dac5395f1b73e7a89300318d)
>
> After that, I could go on and implement the idea suggested in the paper, which combines it with a policy gradient technique.
>
> Since the paper suggests that their results are on par with state-of-the-art TRPO/PPO, we could also benchmark the performance of this technique against a standard MuJoCo environment.
>
> All in all, I feel I can form a proper timeline to try to fit this in the timeframe of the summer.
>
> Do let me know what you feel about this, and if it appeals to you!
>
> Thanks a lot!
>