[mlpack] Reinforcement learning GSOC' 18

ROHAN SAPHAL rohansaphal at gmail.com
Fri Mar 2 14:44:54 EST 2018


Hi Marcus,

I am hoping to discuss how exactly to shape the proposal, keeping the
interests of the organization in mind.
Although implementing state-of-the-art algorithms is the main focus, I
would also like to propose building a simulator for time series data.
The simulator could allow users to introduce latency into the system, use
it for backtesting, and so on. It would also allow users to test deep
learning algorithms against the simulator, widening the reach for people
to try out new things.
As part of my proposal, I intend to focus on the following areas:

   1. Building a time series simulator so that algorithms can be tested
   easily and benchmarked. For the project, cryptocurrency data can be
   used since it's openly available (a rough sketch of such an
   environment is included below).
   2. Reinforcement learning algorithms (one of the three below):


   - Recurrent reinforcement learning algorithm with discrete state and
   action space
   - Multi-agent reinforcement learning algorithm with discrete state and
   action space
   - Dyna-Q algorithm with discrete state and action space

Of these three types of algorithms, I would like to know which is of
interest to the organization. Based on that, I can elaborate on the exact
algorithm to be implemented.
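
Coming back to the simulator idea, below is a minimal sketch of what such
a time series environment could look like, loosely modelled on the
State/Action/Sample/InitialSample layout used by the existing environments
in mlpack's reinforcement learning module. All class and member names here
are only placeholders, not existing mlpack code; the actual interface
would be aligned with the code base during the project.

    #include <algorithm>
    #include <armadillo>

    // Illustrative time series trading environment with a configurable
    // order latency (names are assumptions, not existing mlpack classes).
    class TimeSeriesEnvironment
    {
     public:
      // A state is a window of the most recent prices ending at `step`.
      struct State { arma::vec window; size_t step; };

      // Discrete action space: sell, hold, or buy one unit.
      enum class Action { SELL = 0, HOLD = 1, BUY = 2 };

      TimeSeriesEnvironment(const arma::vec& prices,
                            const size_t windowSize,
                            const size_t latency) :
          prices(prices), windowSize(windowSize), latency(latency) { }

      // The first observable state is the first full price window.
      State InitialSample() const
      {
        return State{ prices.subvec(0, windowSize - 1), windowSize - 1 };
      }

      // Execute an action; the order is only filled `latency` steps
      // later, so the reward reflects the price movement after the delay.
      double Sample(const State& state,
                    const Action action,
                    State& nextState) const
      {
        const size_t fillStep =
            std::min<size_t>(state.step + latency, prices.n_elem - 2);
        const double position =
            static_cast<double>(static_cast<int>(action) - 1); // -1, 0, +1
        const double reward =
            position * (prices(fillStep + 1) - prices(fillStep));

        nextState.step = state.step + 1;
        nextState.window =
            prices.subvec(nextState.step - windowSize + 1, nextState.step);
        return reward;
      }

      // The episode ends when no future price is left to fill an order.
      bool IsTerminal(const State& state) const
      {
        return state.step + latency + 2 >= prices.n_elem;
      }

     private:
      arma::vec prices;
      size_t windowSize;
      size_t latency;
    };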

I have been reading through the threads and noticed that you have asked
for an implementation of policy gradient. I would like to do the same,
though I have seen that a few people are already working on it. Please
suggest whether I should proceed with an implementation of policy
gradient or with something new like Double DQN.
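
If Double DQN is the preferred direction, the main conceptual change
relative to vanilla DQN is the target computation, roughly as in the
self-contained sketch below (the function and parameter names are only
illustrative, not mlpack's actual API):

    #include <armadillo>

    // Double DQN target: the online network selects the greedy action for
    // the next state, but the target network evaluates it. This decouples
    // selection from evaluation and reduces the overestimation bias of the
    // plain max-based DQN target, which would instead be
    // reward + discount * qTargetNext.max().
    double DoubleDQNTarget(const arma::rowvec& qOnlineNext, // online Q(s',.)
                           const arma::rowvec& qTargetNext, // target Q(s',.)
                           const double reward,
                           const double discount,
                           const bool terminal)
    {
      if (terminal)
        return reward;

      const arma::uword bestAction = qOnlineNext.index_max();
      return reward + discount * qTargetNext(bestAction);
    }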

Hope to hear from you soon.

Thanks for your time.

Regards,

Rohan Saphal
Graduate Technical Intern
Intel Labs

On Tue, Feb 27, 2018 at 3:39 AM, Marcus Edel <marcus.edel at fu-berlin.de>
wrote:

> Hello Rohan,
>
> I apologize for having used MailTrack. I was unaware that it was part
> of the mail. The previous mails I had sent were without it, and I have
> no intention of tracking who is reading the mail.
>
>
> No worries, just wanted to point this out.
>
> With regard to the proposal, how would you like me to proceed, and
> what can I do to increase my chances of getting the proposal accepted?
>
>
> If you like, you can go through the list of open issues on GitHub and
> maybe you'll find something interesting; you can also always pick an
> existing method and think about how it could be improved or extended,
> but don't feel obligated.
>
> Thanks,
> Marcus
>
> On 26. Feb 2018, at 06:36, ROHAN SAPHAL <rohansaphal at gmail.com> wrote:
>
> Hi Marcus,
>
> I apologize for having used MailTrack. I was unaware that it was part
> of the mail. The previous mails I had sent were without it, and I have
> no intention of tracking who is reading the mail.
>
> With regard to the proposal, how would you like me to proceed, and what
> can I do to increase my chances of getting the proposal accepted?
>
> Regards,
>
> Rohan Saphal
>
> On Sun, Feb 25, 2018 at 7:03 PM, Marcus Edel <marcus.edel at fu-berlin.de>
> wrote:
>
>> Hello Rohan,
>>
>> The competition was part of the coursework. It was a classification
>> problem. The data we were given had 2500 features, 10,000 samples and 20
>> classes that we had to classify them into. This was a hand-engineered
>> dataset created by our teaching assistants. The data also had missing
>> information. Some of the competition-winning algorithms like XGBoost
>> failed to perform well. I won the competition by using an SVM and neural
>> networks.
>>
>>
>> Wow, I expected that XGBoost would provide some reasonably good results.
>>
>> However, using RL for trading has its own advantages. An RL agent can
>> learn directly from a simulation environment and can adapt to the
>> latency of the environment, because it receives a negative reward during
>> the latency period. This is not possible with supervised deep learning
>> techniques, which cannot work around this latency. Also, using deep
>> learning techniques, the agent can learn complex policies that cannot be
>> learned by humans.
>>
>>
>> Thanks for the information; it sounds like you have already put some
>> time into the idea. I'll see if I can take a closer look at the papers
>> in the next few days.
>>
>> Sent with MailTrack
>>
>>
>> Tracking who is reading your emails without their consent is unethical.
>> Please consider not using software like this.
>>
>> Thanks,
>> Marcus
>>
>>
>>
>>
>> On 25. Feb 2018, at 06:12, ROHAN SAPHAL <rohansaphal at gmail.com> wrote:
>>
>> Hi Marcus,
>>
>> Sorry for the delayed reply. I was traveling the past two days.
>>
>> That sounds really cool, what kind of competition was that?
>>
>> The competition was part of the coursework. It was a classification
>> problem. The data we were given had 2500 features, 10,000 samples and 20
>> classes that we had to classify them into. This was a hand-engineered
>> dataset created by our teaching assistants. The data also had missing
>> information. Some of the competition-winning algorithms like XGBoost
>> failed to perform well. I won the competition by using an SVM and neural
>> networks.
>>
>> The idea sounds interesting. Do you have some particular methods/papers
>> in mind that you would like to work on? Since the methods listed on the
>> ideas page are just suggestions, this could be a GSoC project.
>>
>>
>> The two most frequently used algorithms are Q-learning and recurrent
>> reinforcement learning. First, I want to mention the challenges in the
>> trading domain:
>>
>>    - Environment: The trading system is a POMDP and contains multiple
>>    other trading agents. We can take two approaches here: we can treat
>>    the other agents as part of the environment and let our agent learn
>>    in that environment, or we can take a multi-agent approach where we
>>    try to reverse-engineer the trading strategies of the other agents
>>    and learn to exploit them. The latter moves into the multi-agent RL
>>    domain, which is an active research field.
>>    - Action spaces: We can give the agent a discrete action space,
>>    essentially the three actions buy, hold, or sell. Increasing the
>>    complexity, the agent can also decide the amount to be invested,
>>    which makes the action space continuous. Further, the agent could
>>    decide when to place orders and in what quantity, which makes the
>>    problem much more complex. This level of complexity is required if
>>    the goal is to make profits on a regular basis.
>>    - Reward function: Although it may seem intuitive to use realized
>>    profit/loss as the reward, it may not be a good idea, since that
>>    reward is sparse and infrequent. We could instead feed in unrealized
>>    profit/loss, which is less sparse and still lets the agent learn to
>>    trade profitably, but it biases the agent when an actual profit/loss
>>    is realized. Another possibility is to choose a reward function that
>>    reduces the risk involved, such as the Sharpe ratio or maximum
>>    drawdown; we might have to combine multiple reward functions to
>>    trade off between profit and risk (a small sketch of a Sharpe-style
>>    reward follows this list).
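>>
>> To make the last point a bit more concrete, a minimal sketch of a
>> risk-adjusted reward based on the Sharpe ratio over a rolling window of
>> per-step returns could look like this (purely illustrative; the returns
>> are assumed to be collected by the environment):
>>
>>     #include <armadillo>
>>
>>     // Sharpe-ratio style reward over a window of recent per-step returns.
>>     double SharpeReward(const arma::vec& recentReturns)
>>     {
>>       const double mean = arma::mean(recentReturns);
>>       const double stddev = arma::stddev(recentReturns);
>>
>>       // Guard against a zero-variance window (e.g. an all-hold policy).
>>       return (stddev > 0.0) ? mean / stddev : 0.0;
>>     }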
>>
>> However, using RL for trading has its own advantages. An RL agent can
>> learn directly from a simulation environment and can adapt to the
>> latency of the environment, because it receives a negative reward during
>> the latency period. This is not possible with supervised deep learning
>> techniques, which cannot work around this latency. Also, using deep
>> learning techniques, the agent can learn complex policies that cannot be
>> learned by humans.
>>
>> I feel a good starting point would be to implement a state-of-the-art
>> recurrent reinforcement learning algorithm and then improve on it by
>> incorporating multiple agents, continuous action spaces, etc. I am
>> hoping to hear suggestions from the mentors.
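>>
>> For reference, the trading signal in Moody and Saffell style recurrent
>> reinforcement learning depends on a window of recent returns and on the
>> previous position, roughly
>> F_t = tanh(w^T [1, r_{t-m+1}, ..., r_t, F_{t-1}]); a minimal sketch
>> (parameter names are only illustrative) could look like this:
>>
>>     #include <cmath>
>>     #include <armadillo>
>>
>>     // Recurrent trading signal: the new position depends on a window of
>>     // returns and on the previous position (the recurrent feedback term).
>>     double RecurrentPosition(const arma::vec& weights,       // length m + 2
>>                              const arma::vec& recentReturns, // length m
>>                              const double previousPosition)
>>     {
>>       arma::vec input(recentReturns.n_elem + 2);
>>       input(0) = 1.0;                                     // bias term
>>       input.subvec(1, recentReturns.n_elem) = recentReturns;
>>       input(recentReturns.n_elem + 1) = previousPosition; // F_{t-1}
>>
>>       return std::tanh(arma::dot(weights, input));        // in [-1, 1]
>>     }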
>>
>> Please find attached some relevant papers.
>>
>>
>> Regards,
>>
>> Rohan Saphal
>>
>>
>>
>>
>>
>> On Tue, Feb 20, 2018 at 11:35 PM, ROHAN SAPHAL <rohansaphal at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am Rohan Saphal, a pre-final-year undergraduate at the Indian
>>> Institute of Technology Madras.
>>>
>>> My research interest is in artificial intelligence, and specifically
>>> in deep reinforcement learning.
>>> I have been working with Prof. Balaraman Ravindran
>>> <https://scholar.google.co.in/citations?user=nGUcGrYAAAAJ&hl=en> on
>>> multi-agent reinforcement learning and will continue to do my final
>>> degree thesis project under his guidance.
>>> I am currently a graduate research intern at Intel Labs working on
>>> reinforcement learning.
>>> Previously, I was a computer vision intern at Caterpillar Inc. As part
>>> of the machine learning course, a competition was organized among the
>>> students, and I secured 1st place in that competition
>>> <https://www.kaggle.com/c/iitm-cs4011/leaderboard>.
>>> I am familiar with deep learning and have completed the fast.ai MOOC
>>> along with a course offered at our institute.
>>>
>>> I have read the papers related to the reinforcement learning
>>> algorithms mentioned on the ideas page. I am interested in working on
>>> the reinforcement learning module.
>>>
>>> I have compiled mlpack from source and am looking at the code
>>> structure of the reinforcement learning module. I am unable to find
>>> any tickets at present and hope that someone could direct me on how to
>>> proceed.
>>>
>>> I have been interested in using reinforcement learning for equity
>>> trading, and recurrent reinforcement learning algorithms in particular
>>> have caught my interest. I believe the stock market is a good
>>> environment (a POMDP) for testing and evaluating the performance of
>>> such algorithms, as it is a highly challenging setting. There are many
>>> agents involved in the environment, and I feel that developing
>>> reinforcement learning algorithms that can trade efficiently in such a
>>> setting is an interesting problem. Deep learning models like LSTMs
>>> cannot capture the latency involved in the system and hence cannot
>>> make real-time predictions. Reinforcement learning algorithms,
>>> however, could learn how to act under the latency constraint to make
>>> real-time predictions. Some areas where I see possible work are:
>>>
>>>    - Implement recent work in multi-agent reinforcement learning
>>>    algorithms.
>>>    - Implement recurrent reinforcement learning algorithm(s) that
>>>    capture the temporal nature of the environment; modifications can
>>>    be made to existing work.
>>>
>>> I would like to hear from the mentors what they think about the
>>> suggested idea and whether it seems like an acceptable project to
>>> propose for GSoC.
>>>
>>> Thanks for your time.
>>>
>>> Hope to hear from you soon. Feel free to ask for any more details about
>>> me or my work.
>>>
>>> Regards,
>>>
>>> Rohan Saphal
>>>
>>>
>> <rrl.pdf><RRL .pdf><07376685.pdf><LvDuZhai.pdf><SSRN-id2594477.pdf>
>>
>>
>
>

