[mlpack] Reinforcement learning GSOC' 18

ROHAN SAPHAL rohansaphal at gmail.com
Sun Feb 25 00:12:43 EST 2018


Hi Marcus,

Sorry for the delayed reply. I was traveling the past two days.

> That sounds really cool, what kind of competition was that?

The competition was part of the coursework. It was a classification
problem: the data we were given had 2,500 features, 10,000 samples and 20
classes to classify into. It was a hand-engineered dataset created by our
teaching assistants, and it also had missing values. Some competition-winning
algorithms like XGBoost failed to perform well; I won the competition using
an SVM and neural networks.

> The idea sounds interesting, do you have some particular methods/papers in mind
> you would like to work on? Since the methods listed on the ideas page are just
> suggestions, this could be a GSoC project.


The two approaches most frequently seen in this domain are Q-learning
and recurrent reinforcement learning. First, I want to mention the
challenges in the trading domain:

   - Environment: The trading system is a POMDP and contains multiple
   other trading agents. We can take two approaches here: we can treat the
   other agents as part of the environment and let our agent learn in that
   environment, or we can take a multi-agent approach where we try to
   reverse-engineer the trading strategies of the other agents and then
   learn to exploit them. The latter moves into the multi-agent RL domain,
   which is currently an active research field.
   - Action spaces: The agent can use a discrete action space, which is
   basically three actions: buy, hold or sell. Increasing the complexity, the
   agent could also decide the amount to be invested, which makes the action
   space continuous. Going further, the agent could decide when to place
   orders and in what quantity, which is more complex still. This level of
   complexity is likely required to make profits on a regular basis.
   - Reward function: Although it may seem intuitive to feed the realized
   profit/loss as the reward, that is not ideal because the signal is sparse
   and infrequent. We could instead feed the unrealized profit/loss, which is
   not sparse and still lets the agent learn to trade profitably, but it can
   bias the agent relative to the points where an actual profit/loss is
   realized. Another possibility is to choose a reward function that penalizes
   risk, such as the Sharpe ratio or maximum drawdown. We might have to
   combine multiple reward functions to trade off profit against risk. (A
   minimal environment sketch tying these points together follows this list.)
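
To make the environment, action-space and reward points above concrete, here
is a minimal single-asset trading environment sketch in C++ (mlpack's
language). This is not mlpack's environment API: the class itself, the use of
a historical price series as the simulator, the mapping of sell/hold/buy to a
short/flat/long position, and the fixed transaction cost are all illustrative
assumptions.

    // Minimal sketch of a single-asset trading environment (illustrative only).
    #include <cmath>
    #include <cstddef>
    #include <cstdlib>
    #include <utility>
    #include <vector>

    // Discrete action space: sell (short), hold (flat) and buy (long).
    enum class Action { Sell = -1, Hold = 0, Buy = 1 };

    class TradingEnv
    {
     public:
      // `prices` is a historical price series used as the simulator;
      // `cost` is an assumed proportional transaction cost.
      explicit TradingEnv(std::vector<double> prices, double cost = 0.001) :
          prices(std::move(prices)), cost(cost), t(1), position(0) { }

      // Apply an action and advance one step.  The reward is the position
      // times the log-return, minus a cost for changing the position; this
      // gives a dense per-step signal rather than sparse realized profit/loss.
      double Step(Action a)
      {
        const int newPosition = static_cast<int>(a);
        const double logReturn = std::log(prices[t] / prices[t - 1]);
        const double reward = position * logReturn -
            cost * std::abs(newPosition - position);
        position = newPosition;
        ++t;
        return reward;
      }

      bool Done() const { return t >= prices.size(); }

     private:
      std::vector<double> prices;  // Price series acting as the environment.
      double cost;                 // Transaction cost per unit position change.
      std::size_t t;               // Current time index.
      int position;                // -1 short, 0 flat, +1 long.
    };

A Q-learning or policy-gradient agent would then call Step() once per
simulated time step; the log-return reward above could be swapped for a
Sharpe-ratio-based signal as discussed in the reward-function point.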

However, using RL for trading has its own advantages. The agent can learn
directly from a simulation environment and can adapt to the latency of the
environment, since it receives negative reward during the latency period.
This is not possible with supervised deep learning techniques, which cannot
work around this latency. The agent can also learn, with deep networks as
function approximators, complex policies that cannot be hand-crafted by
humans.

I feel a good starting point would be to implement a state-of-the-art
recurrent reinforcement learning algorithm and then improve on it by
incorporating multiple agents, continuous action spaces, etc. I am hoping to
hear suggestions from the mentors.
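
As a rough illustration of the recurrent reinforcement learning idea (in the
spirit of direct-reinforcement traders such as Moody and Saffell's, which I
believe the attached papers cover), the key point is that the previous
position is fed back as an input to the next decision, and the per-step
trading return provides a dense reward. The snippet below is only a
forward-pass sketch with placeholder weights; a real implementation would
learn the weights by gradient ascent on a risk-adjusted measure such as the
differential Sharpe ratio.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Position in (-1, 1): F_t = tanh(w . x_t + u * F_{t-1} + b).
    // The recurrent term u * F_{t-1} is what distinguishes RRL from a plain
    // feed-forward policy. All weights here are placeholders to be learned.
    double Position(const std::vector<double>& x,  // Features, e.g. recent returns.
                    const std::vector<double>& w,  // Feature weights.
                    double u,                      // Recurrent weight.
                    double b,                      // Bias.
                    double prevPosition)           // F_{t-1}.
    {
      double z = b + u * prevPosition;
      for (std::size_t i = 0; i < x.size(); ++i)
        z += w[i] * x[i];
      return std::tanh(z);
    }

    // Per-step trading return used as the reward:
    // R_t = F_{t-1} * r_t - cost * |F_t - F_{t-1}|.
    double Reward(double prevPosition, double position,
                  double assetReturn, double cost)
    {
      return prevPosition * assetReturn -
          cost * std::abs(position - prevPosition);
    }

Because the position appears both in the reward and in the next decision, the
gradient has a recurrent term; extending this to multiple agents or a
continuous order size would build on the same structure.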

Please find attached some relevant papers.


Regards,

Rohan Saphal





On Tue, Feb 20, 2018 at 11:35 PM, ROHAN SAPHAL <rohansaphal at gmail.com>
wrote:

> Hi,
>
> I am Rohan Saphal, a pre-final-year undergraduate at the Indian Institute of
> Technology Madras.
>
> My research interest is in artificial intelligence, specifically deep
> reinforcement learning.
> I have been working with Prof. Balaraman Ravindran
> <https://scholar.google.co.in/citations?user=nGUcGrYAAAAJ&hl=en> on
> multi-agent reinforcement learning and will continue to do my final degree
> thesis project under his guidance.
> I am currently a graduate research intern at Intel Labs working on
> reinforcement learning.
> Previously, I was a computer vision intern at Caterpillar Inc. As part of
> the machine learning course, a competition was organized among the
> students, and I secured first place in that competition
> <https://www.kaggle.com/c/iitm-cs4011/leaderboard>.
> I am familiar with deep learning and have completed the fast.ai MOOC
> along with the course offered at our institute.
>
> I have read the papers related to the reinforcement learning algorithms
> mentioned on the ideas page, and I am interested in working on the
> reinforcement learning module.
>
> I have compiled mlpack from source and am looking at the code structure of
> the reinforcement learning module. I am unable to find any tickets at
> present and am hoping that someone could direct me on how to proceed.
>
> I have been interested in using reinforcement learning for equity trading,
> and recurrent reinforcement learning algorithms have interested me. I
> believe the stock market is a good environment (a POMDP) to test and evaluate
> the performance of such algorithms, as it is a highly challenging setting.
> There are many agents involved in the environment, and I feel that developing
> reinforcement learning algorithms that could trade efficiently in such a
> setting would be an interesting problem. Deep learning algorithms like
> LSTMs cannot capture the latency involved in the system and hence cannot
> make real-time predictions. Reinforcement learning algorithms could, however,
> learn how to interact under the latency constraint to make real-time
> predictions. Some areas where I see potential work are:
>
>    - Implement recent work(s) in multi-agent reinforcement learning
>    algorithms
>    - Implement recurrent reinforcement learning algorithm(s) that capture the
>    temporal nature of the environment. Modifications can be made to existing
>    work.
>
> I would like to hear what the mentors feel about the suggested idea and
> whether it seems like an acceptable project to propose for GSoC.
>
> Thanks for your time.
>
> Hope to hear from you soon. Feel free to ask for any more details about me
> or my work.
>
> Regards,
>
> Rohan Saphal
>
>
-------------- next part --------------
Attached papers (non-text attachments scrubbed by the archive):
- rrl.pdf (application/pdf, 440970 bytes): <http://knife.lugatgt.org/pipermail/mlpack/attachments/20180225/67b11a7f/attachment-0005.pdf>
- RRL .pdf (application/pdf, 4136994 bytes): <http://knife.lugatgt.org/pipermail/mlpack/attachments/20180225/67b11a7f/attachment-0006.pdf>
- 07376685.pdf (application/pdf, 1187895 bytes): <http://knife.lugatgt.org/pipermail/mlpack/attachments/20180225/67b11a7f/attachment-0007.pdf>
- LvDuZhai.pdf (application/pdf, 474178 bytes): <http://knife.lugatgt.org/pipermail/mlpack/attachments/20180225/67b11a7f/attachment-0008.pdf>
- SSRN-id2594477.pdf (application/pdf, 1113007 bytes): <http://knife.lugatgt.org/pipermail/mlpack/attachments/20180225/67b11a7f/attachment-0009.pdf>

