[mlpack] Reinforcement Learning GSOC

Sahith D sahithdn at gmail.com
Tue Mar 13 15:09:27 EDT 2018


Hello Marcus,

> I see we could definitely introduce a metric that is related, e.g. that
> counts the number of evaluations/iterations.
>

Yes, that seems like a good metric, though as I said before it might be a bit
redundant for some environments.
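
Just to make that concrete, all I have in mind is a small piece of bookkeeping
collected next to the score; the names below are purely illustrative and
nothing like this exists in mlpack yet:

    #include <algorithm>
    #include <cstddef>

    // Hypothetical per-run metrics reported alongside the game score.
    struct RunMetrics
    {
      size_t episodes = 0;    // number of evaluations/rollouts performed
      size_t steps = 0;       // total environment steps across all episodes
      double bestScore = 0.0; // best episode return seen so far

      void Record(const double episodeReturn, const size_t episodeSteps)
      {
        steps += episodeSteps;
        bestScore = (episodes == 0) ? episodeReturn
                                    : std::max(bestScore, episodeReturn);
        ++episodes;
      }
    };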


> I like both ideas, we should just make sure it is manageable; as you already
> pointed out, the Advanced Policy Gradient method might take more time.
>

I'll start putting together a more concrete plan for both of these methods.
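
To give an idea of how little new code the first one might need, below is a
rough sketch of Double DQN on mlpack's built-in CartPole environment. It
follows the Q-Learning test code from last year's GSoC, and it assumes the
TrainingConfig, GreedyPolicy, and RandomReplay interfaces stay roughly as they
are in the current repository, so treat the exact names and constructor
arguments as approximate rather than final:

    #include <iostream>
    #include <mlpack/core.hpp>
    #include <mlpack/core/optimizers/adam/adam_update.hpp>
    #include <mlpack/methods/ann/ffn.hpp>
    #include <mlpack/methods/ann/init_rules/gaussian_init.hpp>
    #include <mlpack/methods/ann/layer/layer.hpp>
    #include <mlpack/methods/reinforcement_learning/q_learning.hpp>
    #include <mlpack/methods/reinforcement_learning/environment/cart_pole.hpp>
    #include <mlpack/methods/reinforcement_learning/policy/greedy_policy.hpp>
    #include <mlpack/methods/reinforcement_learning/replay/random_replay.hpp>
    #include <mlpack/methods/reinforcement_learning/training_config.hpp>

    using namespace mlpack::ann;
    using namespace mlpack::optimization;
    using namespace mlpack::rl;

    int main()
    {
      // Small feed-forward network as the action-value approximator.
      FFN<MeanSquaredError<>, GaussianInitialization> network(
          MeanSquaredError<>(), GaussianInitialization(0, 0.001));
      network.Add<Linear<>>(4, 128);
      network.Add<ReLULayer<>>();
      network.Add<Linear<>>(128, 2);

      // Epsilon-greedy exploration, annealed from 1.0 to 0.1 over 1000 steps.
      GreedyPolicy<CartPole> policy(1.0, 1000, 0.1);
      RandomReplay<CartPole> replayMethod(32, 10000);

      TrainingConfig config;
      config.StepSize() = 0.01;
      config.Discount() = 0.99;
      config.TargetNetworkSyncInterval() = 100;
      config.ExplorationSteps() = 100;
      config.StepLimit() = 200;
      config.DoubleQLearning() = true;  // The only change from plain DQN.

      QLearning<CartPole, decltype(network), AdamUpdate, decltype(policy)>
          agent(std::move(config), std::move(network), std::move(policy),
                std::move(replayMethod));

      // Train until the running average return looks reasonable.
      arma::running_stat<double> averageReturn;
      for (size_t episode = 0; episode < 1000; ++episode)
      {
        averageReturn(agent.Episode());
        if (averageReturn.mean() > 35)
          break;
      }

      std::cout << "Average return: " << averageReturn.mean() << std::endl;
      return 0;
    }

If something along these lines works, the Advanced Policy Gradient method is
the part that needs real design work, so that is where most of the proposal
text will go.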


> Are you talking about the additional RL method?
>

Not exactly. I was referring to the wrapper that we will be building around
OpenAI gym. So I was wondering whether we will integrate that into the main
mlpack repository or whether it'll be a completely separate project.
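
To make the question less abstract, the rough shape I picture for the C++ side
of such a wrapper is sketched below; every name here is hypothetical and only
meant to illustrate the surface we would have to maintain, wherever the code
ends up living:

    #include <armadillo>
    #include <string>

    // Hypothetical client-side interface to a gym environment served by a
    // small Python process; nothing with these names exists yet.
    namespace gymwrap {

    class Environment
    {
     public:
      // Connect to the server hosting the requested gym environment.
      Environment(const std::string& host,
                  const std::string& port,
                  const std::string& envName);

      // Start a new episode and return the initial observation.
      arma::colvec Reset();

      // Apply an action and return the next observation; the reward and the
      // termination flag are available through the accessors below.
      arma::colvec Step(const arma::colvec& action);

      double Reward() const;
      bool Done() const;
    };

    } // namespace gymwrap

If it lives in the main repository, it could sit next to the existing RL
environments so that agents can be templated on it the same way they are on
CartPole; if it stays a separate project, mlpack would only need to document
the expected interface.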

I will start adding these details to my proposal and will send it to you for
your thoughts soon.

Thanks,
Sahith

> On 11. Mar 2018, at 04:54, Sahith D <sahithdn at gmail.com> wrote:
>
>
> Hello Marcus,
> Apologies for the long delay in my reply. I had my midsem examinations
> going on and was unable to respond.
>
> The time metric I had in mind was more related to how long the actual
> in-game time is, which I think is independent of the system and is part of
> the environment itself. However, I realized that most games already have a
> score that accounts for time, so this might be redundant.
>
> In one of your previous mails you mentioned we should initially focus on
> existing mlpack methods for the training. The only mlpack RL method
> currently present is the Q-Learning implementation from last year's GSoC,
> which includes policies and experience replay. While this is good for the
> basic environments in OpenAI gym, we should implement at least one more
> method to supplement it.
>
> 1. Double DQN could be a good fit, as it builds directly on top of the
> current method and hence would be the most practical to pursue.
> 2. An advanced Policy Gradient method, which would take more time but could
> also extend the number of environments that can be solved in the future.
>
> Also, regarding building an API, I would like to know whether you want to
> focus on building on top of the methods already present in mlpack and
> extending them as much as we can, or to build something from scratch while
> still using the mlpack methods whenever we need them.
>
> Thanks
>
>
>
> On Sat, Mar 3, 2018 at 5:39 PM Marcus Edel <marcus.edel at fu-berlin.de>
> wrote:
>
>> Hello Sahith,
>>
>> I'm not sure about the time metric; it might be meaningless if not run on
>> the same or a similar system. If we only compare our own methods, that
>> should be fine though. The rest sounds reasonable to me.
>>
>> Best,
>> Marcus
>>
>> On 2. Mar 2018, at 22:34, Sahith D <sahithdn at gmail.com> wrote:
>>
>> Hi Marcus,
>>
>> Making pre-trained models sounds good; however, we'll have to pick the most
>> popular or easiest environments for this, at least at the start.
>> For meaningful metrics other than iterations we could use the *score* of
>> the game, which is the best possible metric, and also the *time* it takes
>> to reach that score. Depending on the environment, a lower or a higher time
>> could be better. The user-controlled parameters could also include:
>> 1. Exploration rate / exploration rate decay
>> 2. Learning rate
>> 3. Reward size
>> Perhaps a few more, but these are essential.
>>
>> I like the idea of creating an API to upload results. We could include the
>> metrics that we've talked about and perhaps a bit more, like the recording
>> that you mentioned, possibly one where users can watch the agent learn
>> through each iteration and see it become better.
>>
>> Thanks,
>> Sahith
>>
>> On Fri, Mar 2, 2018 at 6:11 PM Marcus Edel <marcus.edel at fu-berlin.de>
>> wrote:
>>
>>> Hello Sahith,
>>>
>>> I think merging ideas from both sides is a neat idea; the first step
>>> should focus on the existing mlpack methods, provide pre-trained models
>>> for specific parameter sets, and output some metrics. Providing a
>>> recording of the environment is also a neat feature. Note that the
>>> optimizer visualization allows a user to finely control the optimizer
>>> parameters, but only because the time to find a solution is low; in the
>>> case of RL methods we are talking about minutes or hours, so providing
>>> pre-trained models is essential. If you like the idea, we should think
>>> about some meaningful metrics besides the number of iterations.
>>>
>>> For other frameworks, one idea is to provide an API to upload the results;
>>> based on that information, we could generate the metrics.
>>>
>>> Let me know what you think.
>>>
>>> Thanks,
>>> Marcus
>>>
>>> On 2. Mar 2018, at 13:08, Sahith D <sahithdn at gmail.com> wrote:
>>>
>>> Hi Marcus,
>>> This looks very feasible along with being cool and intuitive. We could
>>> implement a system where a beginner can just choose an environment, pick a
>>> particular pre-built method, and compare different methods through
>>> visualizations and the actual emulation of the game environment. Other
>>> users can have more control, call only the specific functions of the API
>>> which they need, and modify everything; these people would be the ones who
>>> would benefit most from having a leaderboard for comparison with other
>>> users on OpenAI gym.
>>> Though I would like to know how in-depth you would want this to be. The
>>> optimizer tutorial seems to have pretty much all the major optimizers
>>> currently being used. Do you think we should try something that's as
>>> extensive, or just set up a framework for future contributors?
>>>
>>> Thanks,
>>> Sahith
>>>
>>> On Thu, Mar 1, 2018 at 3:35 PM Marcus Edel <marcus.edel at fu-berlin.de>
>>> wrote:
>>>
>>>> Hello Sahith,
>>>>
>>>> I like the idea; also, since OpenAI abandoned the leaderboard, this could
>>>> be a great opportunity. I'm a fan of giving a user the opportunity to
>>>> test the methods without much hassle, so one idea is to provide a web
>>>> interface that exposes a minimal set of settings, something like:
>>>>
>>>> www.mlpack.org/docs/mlpack-git/doxygen/optimizertutorial.html
>>>>
>>>> Let me know what you think; there are a bunch of interesting features
>>>> that we could look into, but we should make sure each is tangible and
>>>> useful.
>>>>
>>>> Thanks,
>>>> Marcus
>>>>
>>>> On 28. Feb 2018, at 23:03, Sahith D <sahithdn at gmail.com> wrote:
>>>>
>>>> A playground-type project sounds like a great idea. We could start with
>>>> the current Q-Learning method already present in the mlpack repository
>>>> and then apply it to the environments in gym as a sort of tutorial. We
>>>> could then move on to more complex methods like Double Q-Learning and
>>>> Monte Carlo Tree Search (just suggestions) to get started, so that more
>>>> people are encouraged to try their hand at solving the environments in
>>>> more creative ways using C++, since the Python community is already
>>>> pretty strong. If we could build something like the leaderboard that
>>>> OpenAI gym already has, it could foster a creative community of people
>>>> who want to try more RL. Does this sound good, or can it be improved
>>>> upon?
>>>>
>>>> Thanks,
>>>> Sahith.
>>>>
>>>> On Wed, Feb 28, 2018 at 3:50 PM Marcus Edel <marcus.edel at fu-berlin.de>
>>>> wrote:
>>>>
>>>>> Hello Sahith,
>>>>>
>>>>> 1. We could implement all the fundamental RL algorithms like those
>>>>> over here
>>>>> https://github.com/dennybritz/reinforcement-learning . This
>>>>> repository contains
>>>>> nearly all the algorithms that are useful for RL according to David
>>>>> Silver's RL
>>>>> course. They're all currently in python so it could just be a matter
>>>>> of porting
>>>>> them over to use mlpack.
>>>>>
>>>>>
>>>>> I don't think implementing all the methods is something we should pursue
>>>>> over the summer; writing the method itself and coming up with some
>>>>> meaningful tests takes time. Also, in my opinion, instead of
>>>>> implementing all methods we should pick methods that make sense in a
>>>>> specific context and make them as fast and easy to use as possible.
>>>>>
>>>>> 2. We could implement fewer algorithms but work more on solving the
>>>>> OpenAI gym
>>>>> environments using them. This would require tighter integration of the
>>>>> gym
>>>>> wrapper that you have already written. If enough environments can be
>>>>> solved then
>>>>> this could become a viable C++ library for comparing RL algorithms in
>>>>> the
>>>>> future.
>>>>>
>>>>>
>>>>> I like the idea; this could be a great way to present the RL
>>>>> infrastructure to a wider audience, in the form of a playground.
>>>>>
>>>>> Let me know what you think.
>>>>>
>>>>> Thanks,
>>>>> Marcus
>>>>>
>>>>> On 27. Feb 2018, at 23:01, Sahith D <sahithdn at gmail.com> wrote:
>>>>>
>>>>> Hi Marcus,
>>>>> Sorry for not updating you earlier; I had some exams that I needed to
>>>>> finish first.
>>>>> I've been working on the policy gradient in this repository, which you
>>>>> can see here: https://github.com/SND96/mlpack-rl
>>>>> I also had some ideas on what this project could be about.
>>>>>
>>>>> 1. We could implement all the fundamental RL algorithms like those
>>>>> over here https://github.com/dennybritz/reinforcement-learning . This
>>>>> repository contains nearly all the algorithms that are useful for RL
>>>>> according to David Silver's RL course. They're all currently in python so
>>>>> it could just be a matter of porting them over to use mlpack.
>>>>> 2. We could implement fewer algorithms but work more on solving the
>>>>> OpenAI gym environments using them. This would require tighter integration
>>>>> of the gym wrapper that you have already written. If enough environments
>>>>> can be solved then this could become a viable C++ library for comparing RL
>>>>> algorithms in the future.
>>>>>
>>>>> Right now I'm working on solving one of the environments in gym using a
>>>>> Deep Q-Learning approach similar to what is already in the mlpack
>>>>> library from last year's GSoC. It's taking a bit longer than I hoped, as
>>>>> I'm still familiarizing myself with some of the server calls being made
>>>>> and how to properly get information about the environments. I would
>>>>> appreciate your thoughts on the ideas that I have and anything else that
>>>>> you had in mind.
>>>>>
>>>>> Thanks!
>>>>> Sahith
>>>>>
>>>>> On Fri, Feb 23, 2018 at 1:50 PM Sahith D <sahithdn at gmail.com> wrote:
>>>>>
>>>>>> Hi Marcus,
>>>>>> I've been having difficulties compiling mlpack, which has stalled my
>>>>>> progress. I've opened an issue about it and would appreciate any help.
>>>>>>
>>>>>> On Thu, Feb 22, 2018 at 10:09 AM Sahith D <sahithdn at gmail.com> wrote:
>>>>>>
>>>>>>> Hey Marcus,
>>>>>>> No problem about the slow response; in the meantime I was
>>>>>>> familiarizing myself better with the codebase and the methods present.
>>>>>>> I'll start working on what you mentioned and will notify you when I
>>>>>>> finish.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> On Thu, Feb 22, 2018 at 4:56 AM Marcus Edel <
>>>>>>> marcus.edel at fu-berlin.de> wrote:
>>>>>>>
>>>>>>>> Hello Sahith,
>>>>>>>>
>>>>>>>> thanks for getting in touch and sorry for the slow response.
>>>>>>>>
>>>>>>>> > My name is Sahith. I've been working on Reinforcement Learning
>>>>>>>> for the past year
>>>>>>>> > and am interested in coding with mlpack on the RL project for
>>>>>>>> this summer. I've
>>>>>>>> > been going through the codebase and have managed to get the Open
>>>>>>>> AI gym api up
>>>>>>>> > and running on my computer. Is there any other specific task I
>>>>>>>> can do while I
>>>>>>>> > get to know more of the codebase?
>>>>>>>>
>>>>>>>> Great that you got it all working. Another good entry point is to
>>>>>>>> write a simple RL method; one simple method that comes to mind is the
>>>>>>>> Policy Gradients method. Another idea is to write an example for
>>>>>>>> solving a gym environment with the existing codebase, something in
>>>>>>>> the vein of the Kaggle Digit Recognizer example Eugene wrote
>>>>>>>> (https://github.com/mlpack/models/tree/master/Kaggle/DigitRecognizer).
>>>>>>>>
>>>>>>>> Let me know if I should clarify anything.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Marcus
>>>>>>>>
>>>>>>>> > On 19. Feb 2018, at 20:41, Sahith D <sahithdn at gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > Hello Marcus,
>>>>>>>> > My name is Sahith. I've been working on Reinforcement Learning
>>>>>>>> for the past year and am interested in coding with mlpack on the RL project
>>>>>>>> for this summer. I've been going through the codebase and have managed to
>>>>>>>> get the Open AI gym api up and running on my computer. Is there any other
>>>>>>>> specific task I can do while I get to know more of the codebase?
>>>>>>>> > Thanks!
>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>>>
>>