[mlpack] Greetings GSOC 19 : Idea Reinforcement Learning

Rohan Raj rajrohan1108 at gmail.com
Wed Mar 27 00:40:14 EDT 2019


Hello all,

Apologies for the delay in replying. I have started writing my proposal
for the coming GSoC year, and I would like to ask the authors a few
things. For the PPO reinforcement learning algorithm, we can either have
two separate neural networks for policy and value estimation, or combine
them into a single model with two output heads (as OpenAI Baselines and
DeepMind do). The first option is straightforward in mlpack. However, I am
unsure how to implement the second approach. I suspect the following lines
(https://github.com/mlpack/mlpack/blob/2635297c8793396e57469bc731451fbe18bed656/src/mlpack/methods/ann/layer/add_merge.hpp#L127-L128)
might be helpful for this purpose, but I am not completely sure.

Could you please let me know how we can achieve parameter sharing in
mlpack?
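
To make the first option concrete, here is a rough sketch of how I
currently picture the two-network setup using mlpack's ANN module. The
state/action dimensions, layer sizes and loss functions below are only
placeholders (in particular, the policy loss would have to be replaced by
the clipped PPO surrogate objective), so this is just an illustration, not
a tested implementation:

#include <mlpack/core.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/methods/ann/loss_functions/mean_squared_error.hpp>

using namespace mlpack::ann;

// Placeholder problem sizes (e.g. CartPole).
const size_t stateDim = 4;
const size_t actionDim = 2;

// Policy network: maps a state to log-probabilities over actions.
// NegativeLogLikelihood is only a stand-in for the PPO surrogate loss.
FFN<NegativeLogLikelihood<>> policyNetwork;
policyNetwork.Add<Linear<>>(stateDim, 64);
policyNetwork.Add<ReLULayer<>>();
policyNetwork.Add<Linear<>>(64, actionDim);
policyNetwork.Add<LogSoftMax<>>();

// Value network: maps a state to a single scalar estimate V(s).
FFN<MeanSquaredError<>> valueNetwork;
valueNetwork.Add<Linear<>>(stateDim, 64);
valueNetwork.Add<ReLULayer<>>();
valueNetwork.Add<Linear<>>(64, 1);

My confusion is only about the second option, where the two heads would
have to share the hidden layers (and hence their parameters) inside a
single FFN.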

Thanks,

Rohan Raj
Indian Institute of Technology Guwahati
Assam, India
Phone : +91 8723990557



On Mon, 11 Mar 2019 at 01:11, Ryan Curtin <ryan at ratml.org> wrote:

> On Fri, Mar 08, 2019 at 04:31:55AM +0530, Rohan Raj wrote:
> > Hello Ryan, Marcus and fellow contributors of mlpack,
> >
> > I am Rohan Raj (GitHub: mirraaj) <https://github.com/mirraaj>, an
> > undergraduate student from the Indian Institute of Technology (IIT)
> > Guwahati. I am writing this email to express my interest in becoming a
> > part of *mlpack* for the coming *Google Summer of Code 2019*.
> >
> > I sincerely congratulate mlpack on being accepted as a mentor
> > organization for the coming Google Summer of Code 2019. I am interested
> > in the reinforcement learning project for the coming year. In
> > particular, I plan to implement Rainbow and PPO during the coming
> > coding season.
> >
> > My tentative schedule is presented below:
> >
> > Week 1-6: Implement the different Rainbow DQN components
> >
> > Week 6-10: PPO algorithm
> >
> > Week 11-12: Bug fixing and final submission
> >
> > I believe it is really important to test any function/feature added to
> > the mlpack codebase. I have been working on RL and mlpack for quite a
> > long time, and I personally think it is sometimes difficult to
> > reproduce results. It is also time-consuming to stabilize statistical
> > test results in the mlpack codebase. Hence I would like to go ahead
> > with two algorithms so that I have enough time to test them on
> > different environments.
> >
> > Please let me know your valuable inputs on this short proposal. I will
> > definitely add the details of the project in my actual proposal.
>
> Hi Rohan,
>
> Thanks for the congratulations and we're happy to have you involved.
> Although I am not a reinforcement learning expert and I won't be the
> mentor for that project, I will at least say that two weeks set aside
> for 'bug fixing' is a bit vague---it's definitely hard to predict when
> you'll have bugs, but as you prepare your proposal I'd encourage you to
> spend a bit of time thinking about how you will write the tests to catch
> all potential bugs you might have during implementation.
>
> You're right that testing is a very important part, so often when I am
> reviewing proposals, I look for a lot of detail about how the proposed
> algorithm will be implemented and things of this nature.
>
> I hope this is helpful. :)
>
> Thanks!
>
> Ryan
>
> --
> Ryan Curtin    | "None of your mailman friends can hear you."
> ryan at ratml.org |   - Alpha
>

