[mlpack] Greetings GSOC 19 : Idea Reinforcement Learning

Rohan Raj rajrohan1108 at gmail.com
Tue Apr 2 11:31:42 EDT 2019


Dear Marcus,

Could you please let me know whether I should try to implement the second
method as well in my GSoC proposal? The ideas page (
https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas#reinforcement-learning)
doesn't make it clear whether parameter sharing is an objective of the
project.

Thanks,

Rohan Raj
Indian Institute of Technology Guwahati
Assam, India
Phone : +91 8723990557



On Tue, 2 Apr 2019 at 20:21, Marcus Edel <marcus.edel at fu-berlin.de> wrote:

> Hello Rohan,
>
> sorry for the slow response. The first approach is just fine; for the
> second approach, we could use the Sequential layer, which is basically a
> feedforward network that exposes the layer interface. Anyway, as I said,
> I think the first approach might have some advantages.
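>
> Just to make that concrete, here is a rough, untested sketch of how the
> Sequential layer could serve as a shared trunk; stateDim and actionDim
> are placeholders, and splitting the joint output into its policy and
> value parts would still have to happen downstream:
>
>   #include <mlpack/methods/ann/ffn.hpp>
>   #include <mlpack/methods/ann/layer/layer.hpp>
>   // Depending on the mlpack version, the loss function and
>   // initialization rule headers may need to be included separately.
>
>   using namespace mlpack::ann;
>
>   void BuildSharedModel()
>   {
>     const size_t stateDim = 4;   // placeholder, e.g. CartPole
>     const size_t actionDim = 2;  // placeholder
>
>     // Shared trunk: a Sequential layer is essentially a small
>     // feedforward network that exposes the layer interface, so it can
>     // be added to a network like any other layer.
>     Sequential<>* trunk = new Sequential<>();
>     trunk->Add<Linear<>>(stateDim, 64);
>     trunk->Add<ReLULayer<>>();
>
>     FFN<MeanSquaredError<>, GaussianInitialization> model;
>     model.Add(trunk);
>     // One joint output vector: actionDim policy outputs plus one value.
>     model.Add<Linear<>>(64, actionDim + 1);
>   }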
>
> Thanks,
> Marcus
>
> On 2. Apr 2019, at 16:24, Rohan Raj <rajrohan1108 at gmail.com> wrote:
>
> Dear Ryan and Marcus,
>
> When you get a chance, could you please answer the questions from my
> previous mail?
>
> Thanks,
>
> Rohan Raj
> Indian Institute of Technology Guwahati
> Assam, India
> Phone : +91 8723990557
>
>
>
> On Wed, 27 Mar 2019 at 10:10, Rohan Raj <rajrohan1108 at gmail.com> wrote:
>
>> Hello all,
>>
>> Apologies for the delay in replying. I have started writing my proposal
>> for this GSoC year, and I sincerely wanted to ask the authors a few
>> things. For the PPO reinforcement learning algorithm, we can either use
>> two separate neural networks for policy and value estimation, or combine
>> them into a single model with two output heads (as OpenAI Baselines and
>> DeepMind do). The first option is straightforward in mlpack; however, I
>> am unsure how to realize the second. I suspect the following lines (
>> https://github.com/mlpack/mlpack/blob/2635297c8793396e57469bc731451fbe18bed656/src/mlpack/methods/ann/layer/add_merge.hpp#L127-L128)
>> might be helpful for this purpose, but I am not completely sure.
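>>
>> For concreteness, a rough, untested sketch of the first approach (two
>> independent networks, no parameter sharing) could look like the
>> following; stateDim and actionDim are placeholders:
>>
>>   #include <mlpack/methods/ann/ffn.hpp>
>>   #include <mlpack/methods/ann/layer/layer.hpp>
>>   // Depending on the mlpack version, the loss function and
>>   // initialization rule headers may need to be included separately.
>>
>>   using namespace mlpack::ann;
>>
>>   void BuildSeparateNetworks()
>>   {
>>     const size_t stateDim = 4;   // placeholder, e.g. CartPole
>>     const size_t actionDim = 2;  // placeholder
>>
>>     // Policy network: maps a state to log action probabilities.
>>     FFN<NegativeLogLikelihood<>, GaussianInitialization> policy;
>>     policy.Add<Linear<>>(stateDim, 64);
>>     policy.Add<ReLULayer<>>();
>>     policy.Add<Linear<>>(64, actionDim);
>>     policy.Add<LogSoftMax<>>();
>>
>>     // Value network: maps a state to a scalar value estimate.
>>     FFN<MeanSquaredError<>, GaussianInitialization> value;
>>     value.Add<Linear<>>(stateDim, 64);
>>     value.Add<ReLULayer<>>();
>>     value.Add<Linear<>>(64, 1);
>>   }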
>>
>> Could you please let me know how we can achieve parameter sharing in
>> mlpack?
>>
>> Thanks,
>>
>> Rohan Raj
>> Indian Institute of Technology Guwahati
>> Assam, India
>> Phone : +91 8723990557
>>
>>
>> On Mon, 11 Mar 2019 at 01:11, Ryan Curtin <ryan at ratml.org> wrote:
>>
>>> On Fri, Mar 08, 2019 at 04:31:55AM +0530, Rohan Raj wrote:
>>> > Hello Ryan, Marcus, and fellow contributors of mlpack,
>>> >
>>> > I am Rohan Raj (GitHub: mirraaj <https://github.com/mirraaj>), an
>>> > undergraduate student at the Indian Institute of Technology (IIT)
>>> > Guwahati. I am writing to express my interest in becoming a part of
>>> > mlpack for the coming Google Summer of Code 2019.
>>> >
>>> > I sincerely congratulate mlpack on being accepted as a mentor
>>> > organization for Google Summer of Code 2019. I am interested in the
>>> > reinforcement learning project for the coming year. In particular, I
>>> > plan to implement Rainbow and PPO during the coming coding season.
>>> >
>>> > My tentative schedule is presented below:
>>> >
>>> > Week 1-6 : Implement the different Rainbow DQN components
>>> >
>>> > Week 6-10 : PPO algorithm
>>> >
>>> > Week 11-12 : Bug fixing and final submission
>>> >
>>> > I believe it is really important to test any function/feature added
>>> > to the mlpack codebase. I have been working on RL and mlpack for
>>> > quite a long time, and I personally think results can sometimes be
>>> > difficult to reproduce. Stabilizing statistical test results in the
>>> > mlpack codebase also takes time. Hence I would like to stick to two
>>> > algorithms, so that I have enough time to test them properly in
>>> > different environments.
>>> >
>>> > Please let me know your thoughts on this short proposal. I will
>>> > definitely add more detail in my actual proposal.
>>>
>>> Hi Rohan,
>>>
>>> Thanks for the congratulations and we're happy to have you involved.
>>> Although I am not a reinforcement learning expert and I won't be the
>>> mentor for that project, I will at least say that two weeks set aside
>>> for 'bug fixing' is a bit vague---it's definitely hard to predict when
>>> you'll have bugs, but as you prepare your proposal I'd encourage you to
>>> spend a bit of time thinking about how you will write the tests to catch
>>> all potential bugs you might have during implementation.
>>>
>>> You're right that testing is a very important part, so often when I am
>>> reviewing proposals, I look for a lot of detail about how the proposed
>>> algorithm will be implemented and things of this nature.
>>>
>>> I hope this is helpful. :)
>>>
>>> Thanks!
>>>
>>> Ryan
>>>
>>> --
>>> Ryan Curtin    | "None of your mailman friends can hear you."
>>> ryan at ratml.org |   - Alpha
>>>
>>
>