[mlpack] Reinforcement Learning GSOC

Marcus Edel marcus.edel at fu-berlin.de
Tue Mar 13 19:11:54 EDT 2018


Hello Sahith,

> Not exactly. I was referring to the wrapper that we will be building around
> OpenAI gym. So I was wondering whether we will integrate that into the main
> mlpack repository or whether it'll be a completely separate project

I think we should keep the wrapper in a separate repository; perhaps someone
finds it useful for another project. It's not a completely separate project,
though: every RL method in mlpack already follows the OpenAI gym interface
(action, step), so the integration is straightforward.
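
To make this concrete, here is a minimal sketch of the environment contract the
existing mlpack RL agents program against (InitialSample / Sample / IsTerminal);
the class and member names below are only illustrative stand-ins, not the actual
mlpack or wrapper API. A gym-backed environment would just need to expose the
same shape:

#include <armadillo>
#include <cstdlib>
#include <iostream>

class ToyEnvironment
{
 public:
  // State: a small continuous observation vector.
  struct State { arma::colvec data = arma::zeros<arma::colvec>(2); };

  // Action: a discrete choice, in the spirit of CartPole's left/right.
  struct Action { int action = 0; };

  // Return the initial state of a new episode.
  State InitialSample() { return State(); }

  // Apply an action, fill in the next state, and return the reward.
  double Sample(const State& state, const Action& action, State& nextState)
  {
    nextState.data = state.data + (action.action == 0 ? -0.1 : 0.1);
    return 1.0;
  }

  // An episode ends once the observation drifts too far from the origin.
  bool IsTerminal(const State& state) const
  {
    return arma::norm(state.data) > 1.0;
  }
};

// Generic episode loop: any environment with this interface (including a
// gym-backed one) can be dropped in here or handed to an RL agent.
template<typename EnvironmentType>
double RunEpisode(EnvironmentType& env)
{
  typename EnvironmentType::State state = env.InitialSample();
  double totalReward = 0.0;
  size_t steps = 0;
  while (!env.IsTerminal(state) && steps++ < 1000)
  {
    typename EnvironmentType::Action action;
    action.action = std::rand() % 2;  // Random policy, just for illustration.
    typename EnvironmentType::State nextState;
    totalReward += env.Sample(state, action, nextState);
    state = nextState;
  }
  return totalReward;
}

int main()
{
  ToyEnvironment env;
  std::cout << "Episode return: " << RunEpisode(env) << std::endl;
}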

Let me know if I should clarify anything.

Thanks,
Marcus


> On 13. Mar 2018, at 20:09, Sahith D <sahithdn at gmail.com> wrote:
> 
> Hello Marcus,
> 
> I see; we could definitely introduce a related metric, e.g. one that counts
> the number of evaluations/iterations.
> 
> Yes, that seems like a good metric, though as I said before, it might be a bit redundant for some environments.
>  
> I like both ideas; we should just make sure it is manageable, as you already
> pointed out, the Advanced Policy Gradient method might take more time.
> 
> I'll start working out a more concrete approach to both of these methods.
>  
> Are you talking about the additional RL method?
> 
> Not exactly. I was referring to the wrapper that we will be building around OpenAI gym. So I was wondering whether we will integrate that into the main mlpack repository or whether it'll be a completely separate project.
> 
> I will start adding stuff to my proposal and send it to you for your thoughts soon.
> 
> Thanks,
> Sahith 
>> On 11. Mar 2018, at 04:54, Sahith D <sahithdn at gmail.com> wrote:
> 
>> 
>> Hello Marcus,
>> Apologies for the long delay in my reply. I had my midsem examinations going on and was unable to respond.
>> 
>> The time metric I had in mind was more related to how long the actual in-game time is, which I think is independent of the system and is part of the environment itself. However, I realized that most games already have a score that takes time into account, so this might be redundant.
>> 
>> In one of your previous mails you mentioned we should initially focus on existing mlpack methods for the training. The only mlpack RL method currently present is a Q-Learning model from last year's GSoC, which includes policies and also experience replay. While this is good for the basic environments in OpenAI gym, we should implement at least one more method to supplement it.
>> 
>> 1. Double DQN could be a good fit, as it just builds on top of the current method and hence would be the best to pursue (a rough sketch of the change is below).
>> 2. An advanced Policy Gradient method, which would take more time but could also extend the number of environments that can be solved in the future.
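>> 
>> As a rough sketch (illustrative only, not the actual mlpack API), the only change
>> Double DQN needs on top of the existing Q-Learning target is that the online
>> network selects the next action while the target network evaluates it:
>> 
>> #include <armadillo>
>> 
>> // Standard DQN target:  r + gamma * max_a Q_target(s', a)
>> // Double DQN target:    r + gamma * Q_target(s', argmax_a Q_online(s', a))
>> double DoubleDQNTarget(const arma::colvec& qOnlineNext,
>>                        const arma::colvec& qTargetNext,
>>                        const double reward,
>>                        const double discount,
>>                        const bool terminal)
>> {
>>   if (terminal)
>>     return reward;
>> 
>>   const arma::uword bestAction = qOnlineNext.index_max();
>>   return reward + discount * qTargetNext(bestAction);
>> }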
>> 
>> Also, regarding building an API, I would like to know whether you want to focus on building on top of the methods already present in mlpack and extending them as much as we can, or building something from scratch while using the existing mlpack methods wherever we need them.
>> 
>> Thanks
>> 
>> 
>> 
>> On Sat, Mar 3, 2018 at 5:39 PM Marcus Edel <marcus.edel at fu-berlin.de> wrote:
>> Hello Sahith,
>> 
>> I'm not sure about the time metric; it might be meaningless if not run on the
>> same or a similar system. If we only compare our own methods, that should be
>> fine, though. The rest sounds reasonable to me.
>> 
>> Best,
>> Marcus
>> 
>>> On 2. Mar 2018, at 22:34, Sahith D <sahithdn at gmail.com> wrote:
>>> 
>>> Hi Marcus,
>>> 
>>> Making pre-trained models sounds good; however, we'll have to pick the most popular or easiest environments for this, at least at the start.
>>> For meaningful metrics other than iterations, we could use the score of the game, which is the best possible metric, and also the time it takes to reach that score. Depending on the environment, a lower or a higher time could be better. The user-controlled parameters could also include:
>>> 1. Exploration rate/ Exploration rate decay
>>> 2. Learning rate 
>>> 3. Reward size
>>> Perhaps a few more, but these are essential; a small sketch of how these could be grouped is below.
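>>> 
>>> Just to illustrate the grouping (the struct and field names are hypothetical,
>>> not an existing mlpack type), something like this could be what the playground
>>> exposes to the user:
>>> 
>>> struct PlaygroundConfig
>>> {
>>>   double explorationRate = 1.0;     // Initial epsilon for an epsilon-greedy policy.
>>>   double explorationDecay = 0.995;  // Multiplicative decay applied per episode.
>>>   double learningRate = 0.001;      // Step size handed to the optimizer.
>>>   double rewardScale = 1.0;         // Scaling applied to the raw environment reward.
>>> };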
>>> 
>>> I like the idea of creating an API to upload results. We could include the metrics that we've talked about and perhaps a bit more, like the recording that you mentioned, possibly one where users can watch the agent learn through each iteration and see it become better.
>>> 
>>> Thanks,
>>> Sahith  
>>> 
>>> On Fri, Mar 2, 2018 at 6:11 PM Marcus Edel <marcus.edel at fu-berlin.de> wrote:
>>> Hello Sahith,
>>> 
>>>> This looks very feasible along with being cool and intuitive. We could
>>>> implement a system where a beginner can just choose an environment, pick a
>>>> particular pre-built method, and compare different methods through
>>>> visualizations and the actual emulation of the game environment. Other users
>>>> can have more control and call only the specific functions of the API that
>>>> they need and modify everything; these people would be the ones who would
>>>> most benefit from having a leaderboard for comparison with other users on
>>>> OpenAI gym.
>>> 
>>> I think merging ideas from both sides is a neat idea; the first step should
>>> focus on the existing mlpack methods, provide pre-trained models for specific
>>> parameter sets, and output some metrics. Providing a recording of the
>>> environment is also a neat feature. Note that the optimizer visualization
>>> allows a user to finely control the optimizer parameters, but only because
>>> the time to find a solution is low; in the case of RL methods we are talking
>>> about minutes or hours, so providing pre-trained models is essential. If you
>>> like the idea, we should think about some meaningful metrics besides the
>>> number of iterations.
>>> 
>>> For other frameworks, one idea is to provide an API to upload the results;
>>> based on that information, we could generate the metrics.
>>> 
>>> Let me know what you think.
>>> 
>>> Thanks,
>>> Marcus
>>> 
>>>> On 2. Mar 2018, at 13:08, Sahith D <sahithdn at gmail.com> wrote:
>>>> 
>>>> Hi Marcus,
>>>> This looks very feasible along with being cool and intuitive. We could implement a system where a beginner can just choose an environment, pick a particular pre-built method, and compare different methods through visualizations and the actual emulation of the game environment. Other users can have more control and call only the specific functions of the API that they need and modify everything; these people would be the ones who would most benefit from having a leaderboard for comparison with other users on OpenAI gym.
>>>> Though I would like to know how in-depth you would want this to be. The optimizer tutorial seems to have pretty much all the major optimizers currently being used. Do you think we should try something that's as extensive, or just set up a framework for future contributors?
>>>> 
>>>> Thanks,
>>>> Sahith
>>>> 
>>>> On Thu, Mar 1, 2018 at 3:35 PM Marcus Edel <marcus.edel at fu-berlin.de> wrote:
>>>> Hello Sahith,
>>>> 
>>>> I like the idea; also, since OpenAI abandoned the leaderboard, this could be a
>>>> great opportunity. I'm a fan of giving a user the opportunity to test the
>>>> methods without much hassle, so one idea is to provide a web interface that
>>>> exposes a minimal set of settings, something like:
>>>> 
>>>> http://www.mlpack.org/docs/mlpack-git/doxygen/optimizertutorial.html
>>>> 
>>>> Let me know what you think; there are a bunch of interesting features that we
>>>> could look into, but we should make sure each is tangible and useful.
>>>> 
>>>> Thanks,
>>>> Marcus
>>>> 
>>>>> On 28. Feb 2018, at 23:03, Sahith D <sahithdn at gmail.com> wrote:
>>>>> 
>>>>> A playground-type project sounds like a great idea. We could start by using the current Q-Learning method already present in the mlpack repository and then apply it to environments in gym as a sort of tutorial. We could then move on to more complex methods like Double Q-Learning and Monte Carlo Tree Search (just suggestions), just to get started, so that more people are encouraged to try their hand at solving the environments in more creative ways using C++, as the Python community is already pretty strong. If we could build something of a leaderboard similar to what OpenAI gym already has, it could foster a creative community of people who want to try more RL. Does this sound good, or can it be improved upon?
>>>>> 
>>>>> Thanks,
>>>>> Sahith.
>>>>> 
>>>>> On Wed, Feb 28, 2018 at 3:50 PM Marcus Edel <marcus.edel at fu-berlin.de> wrote:
>>>>> Hello Sahith,
>>>>> 
>>>>>> 1. We could implement all the fundamental RL algorithms like those over here:
>>>>>> https://github.com/dennybritz/reinforcement-learning . This repository contains
>>>>>> nearly all the algorithms that are useful for RL according to David Silver's RL
>>>>>> course. They're all currently in Python, so it could just be a matter of porting
>>>>>> them over to use mlpack.
>>>>> 
>>>>> I don't think implementing all the methods is something we should pursue over
>>>>> the summer; writing the method itself and coming up with some meaningful tests
>>>>> takes time. Also, in my opinion, instead of implementing all methods, we should
>>>>> pick methods that make sense in a specific context and make them as fast and
>>>>> easy to use as possible.
>>>>> 
>>>>>> 2. We could implement fewer algorithms but work more on solving the OpenAI gym
>>>>>> environments using them. This would require tighter integration of the gym
>>>>>> wrapper that you have already written. If enough environments can be solved then
>>>>>> this could become a viable C++ library for comparing RL algorithms in the
>>>>>> future.
>>>>> 
>>>>> I like the idea; this could be a great way to present the RL infrastructure to a
>>>>> wider audience, in the form of a playground.
>>>>> 
>>>>> Let me know what you think.
>>>>> 
>>>>> Thanks,
>>>>> Marcus
>>>>> 
>>>>>> On 27. Feb 2018, at 23:01, Sahith D <sahithdn at gmail.com> wrote:
>>>>>> 
>>>>>> Hi Marcus,
>>>>>> Sorry for not updating you earlier as I had some exams that I needed to finish first.
>>>>>> I've been working on the policy gradient in this repository, which you can see here: https://github.com/SND96/mlpack-rl
>>>>>> I also had some ideas on what this project could be about.
>>>>>> 
>>>>>> 1. We could implement all the fundamental RL algorithms like those over here: https://github.com/dennybritz/reinforcement-learning . This repository contains nearly all the algorithms that are useful for RL according to David Silver's RL course. They're all currently in Python, so it could just be a matter of porting them over to use mlpack.
>>>>>> 2. We could implement fewer algorithms but work more on solving the OpenAI gym environments using them. This would require tighter integration of the gym wrapper that you have already written. If enough environments can be solved then this could become a viable C++ library for comparing RL algorithms in the future.
>>>>>> 
>>>>>> Right now I'm working on solving one of the environments in gym using a Deep Q-Learning approach similar to what is already there in the mlpack library from last year's GSoC. It's taking a bit longer than I hoped, as I'm still familiarizing myself with some of the server calls being made and how to properly get information about the environments. I would appreciate your thoughts on the ideas that I have and anything else that you had in mind.
>>>>>> 
>>>>>> Thanks!
>>>>>> Sahith
>>>>>> 
>>>>>> On Fri, Feb 23, 2018 at 1:50 PM Sahith D <sahithdn at gmail.com> wrote:
>>>>>> Hi Marcus,
>>>>>> I've been having difficulties compiling mlpack, which has stalled my progress. I've opened an issue about it and would appreciate any help.
>>>>>> 
>>>>>> On Thu, Feb 22, 2018 at 10:09 AM Sahith D <sahithdn at gmail.com> wrote:
>>>>>> Hey Marcus,
>>>>>> No problem with the slow response; in the meantime I was familiarizing myself with the codebase and the methods present. I'll start working on what you mentioned and notify you when I finish.
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>>> On Thu, Feb 22, 2018 at 4:56 AM Marcus Edel <marcus.edel at fu-berlin.de> wrote:
>>>>>> Hello Sahith,
>>>>>> 
>>>>>> thanks for getting in touch and sorry for the slow response.
>>>>>> 
>>>>>> > My name is Sahith. I've been working on Reinforcement Learning for the past year
>>>>>> > and am interested in coding with mlpack on the RL project for this summer. I've
>>>>>> > been going through the codebase and have managed to get the OpenAI gym API up
>>>>>> > and running on my computer. Is there any other specific task I can do while I
>>>>>> > get to know more of the codebase?
>>>>>> 
>>>>>> Great that you got it all working. Another good entry point is to write a simple
>>>>>> RL method; one simple method that comes to mind is the Policy Gradients
>>>>>> method. Another idea is to write an example for solving a gym environment with
>>>>>> the existing codebase, something in the vein of the Kaggle Digit Recognizer
>>>>>> Eugene wrote
>>>>>> (https://github.com/mlpack/models/tree/master/Kaggle/DigitRecognizer).
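>>>>>> 
>>>>>> In case it helps as a starting point, here is a rough sketch of the
>>>>>> REINFORCE-style policy gradient update for a linear softmax policy over
>>>>>> discrete actions (purely illustrative; the function and variable names are
>>>>>> not part of mlpack's API):
>>>>>> 
>>>>>> #include <armadillo>
>>>>>> #include <vector>
>>>>>> 
>>>>>> // theta: (numActions x numFeatures) parameters of a linear softmax policy.
>>>>>> // For each step t of an episode, ascend the gradient of log pi(a_t | s_t)
>>>>>> // scaled by the return G_t observed from that step onwards.
>>>>>> void ReinforceUpdate(arma::mat& theta,
>>>>>>                      const std::vector<arma::colvec>& states,
>>>>>>                      const std::vector<size_t>& actions,
>>>>>>                      const std::vector<double>& returns,
>>>>>>                      const double stepSize)
>>>>>> {
>>>>>>   for (size_t t = 0; t < states.size(); ++t)
>>>>>>   {
>>>>>>     // Softmax action probabilities for this state.
>>>>>>     arma::colvec logits = theta * states[t];
>>>>>>     arma::colvec probs = arma::exp(logits - logits.max());
>>>>>>     probs /= arma::accu(probs);
>>>>>> 
>>>>>>     // grad log pi(a | s) = (onehot(a) - probs) * s^T for a linear policy.
>>>>>>     arma::colvec onehot = arma::zeros<arma::colvec>(theta.n_rows);
>>>>>>     onehot(actions[t]) = 1.0;
>>>>>>     theta += stepSize * returns[t] * (onehot - probs) * states[t].t();
>>>>>>   }
>>>>>> }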
>>>>>> 
>>>>>> Let me know if I should clarify anything.
>>>>>> 
>>>>>> Thanks,
>>>>>> Marcus
>>>>>> 
>>>>>> > On 19. Feb 2018, at 20:41, Sahith D <sahithdn at gmail.com> wrote:
>>>>>> >
>>>>>> > Hello Marcus,
>>>>>> > My name is Sahith. I've been working on Reinforcement Learning for the past year and am interested in coding with mlpack on the RL project for this summer. I've been going through the codebase and have managed to get the OpenAI gym API up and running on my computer. Is there any other specific task I can do while I get to know more of the codebase?
>>>>>> > Thanks!
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
