[mlpack] Reinforcement learning GSOC' 18

Marcus Edel marcus.edel at fu-berlin.de
Sun Feb 25 08:33:05 EST 2018


Hello Rohan,

> The competition was part of the coursework. It was a classification problem. The
> data we were given had 2,500 features, 10,000 samples, and 20 classes that we had
> to classify into. The dataset was hand-engineered by our teaching
> assistants. The data also had missing values. Some competition-winning
> algorithms like XGBoost failed to perform well. I won the competition using
> SVMs and neural networks.

Wow, I expected that XGBoost would provide reasonably good results.

> However, using RL for trading has its own advantages. It can learn directly from
> a simulation environment and can adapt to the latency of the environment,
> because it receives a negative reward during the latency period. This is not
> possible with deep learning techniques alone, which cannot work around this
> latency. Also, using deep learning techniques, the agent can learn complex
> policies that can't be learned by humans.

Thanks for the information; it sounds like you have already put some time into the
idea. I'll see if I can take a closer look at the papers in the next few days.
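
By the way, in case it helps to make the environment discussion concrete: below is
a very rough, untested sketch of how a discrete buy/hold/sell trading environment
could be laid out so that it roughly follows the State/Action/Sample() pattern of
the existing environments (e.g. CartPole). All names, the feature encoding and the
reward terms below are placeholders, not a proposal for a final interface:

    // Illustrative sketch only; nothing here is a final API.
    #include <armadillo>

    class TradingEnv
    {
     public:
      // Observation: a window of recent price features plus the current position.
      struct State
      {
        arma::colvec data;
      };

      // Discrete action space: buy, hold or sell.
      enum Action { buy, hold, sell };

      // Advance one time step; return the reward and fill in the next state.
      double Sample(const State& state, const Action action, State& nextState)
      {
        // The next state would be built from the next tick of the simulated or
        // historical data feed; it is just copied here as a placeholder.
        nextState = state;

        // Reward: change in (unrealized) profit minus a transaction cost.  During
        // a latency period the environment would simply keep returning the
        // negative cost, as you describe above.
        const double pnlChange = 0.0;  // placeholder
        const double transactionCost = (action == hold) ? 0.0 : 0.001;
        return pnlChange - transactionCost;
      }

      State InitialSample() { return State(); }

      // The episode ends at the end of the data feed (or, say, on bankruptcy).
      bool IsTerminal(const State& /* state */) const { return false; }
    };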

> Sent with MailTrack


Tracking who is reading your emails without their consent is unethical.
Please consider not using software like this.

Thanks,
Marcus




> On 25. Feb 2018, at 06:12, ROHAN SAPHAL <rohansaphal at gmail.com> wrote:
> 
> 
> Hi Marcus,
> 
> Sorry for the delayed reply. I was traveling the past two days.
> 
> That sounds really cool, what kind of competition was that?
> The competition was part of the coursework. It was a classification problem. The data we were given had 2,500 features, 10,000 samples, and 20 classes that we had to classify into. The dataset was hand-engineered by our teaching assistants. The data also had missing values. Some competition-winning algorithms like XGBoost failed to perform well. I won the competition using SVMs and neural networks.
> 
> The idea sounds interesting, do you have some particular methods/papers in mind
> you'd like to work on? Since the methods listed on the ideas page are just
> suggestions, this could be a GSoC project.
> 
> The two most frequently used algorithms are Q-learning and recurrent reinforcement learning. First, I want to mention the challenges in the trading domain:
> Environment: The trading system is a POMDP and contains multiple other trading agents. There are two approaches here: we can treat the other agents as part of the environment and let our agent learn in that environment, or we can take a multi-agent approach where we try to reverse-engineer the trading strategies of the other agents and learn to exploit them. The latter moves into multi-agent RL, which is an active research field.
> Action spaces: We can give the agent a discrete action space, which is basically three actions: buy, hold, or sell. Increasing the complexity, we can also have the agent decide the amount to invest, which is a continuous action space. Further, the agent could decide when to place orders and in what quantity, which makes the problem much more complex. This level of complexity is required to make profits on a regular basis.
> Reward function: Although it may seem intuitive to feed the profit/loss as the reward, it may not be a good idea because this reward is sparse and infrequent. We could instead feed the unrealized profit/loss, which is less sparse and still lets the agent learn to trade profitably, but it biases the agent when an actual profit/loss is realized. The other possibility is to choose a reward function that reduces the risk involved, like the Sharpe ratio or maximum drawdown. We might have to combine multiple reward functions to trade off profit against risk.
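
A minimal sketch of the risk-adjusted reward mentioned above, using the
differential Sharpe ratio from the Moody & Saffell recurrent reinforcement
learning papers: it keeps exponential moving estimates of the first and second
moments of the trading returns, so it yields a dense per-step reward instead of a
sparse realized profit/loss. The class and parameter names below are illustrative
only:

    #include <cmath>

    // Differential Sharpe ratio (Moody & Saffell); eta is the adaptation rate
    // of the moving moment estimates A (mean) and B (second moment).
    class DifferentialSharpe
    {
     public:
      explicit DifferentialSharpe(const double eta) : eta(eta), A(0.0), B(0.0) { }

      // Feed the latest trading return r; returns the per-step reward.
      double Step(const double r)
      {
        const double deltaA = r - A;
        const double deltaB = r * r - B;
        const double denom = std::pow(B - A * A, 1.5);
        const double reward = (denom > 0.0)
            ? (B * deltaA - 0.5 * A * deltaB) / denom
            : 0.0;
        A += eta * deltaA;
        B += eta * deltaB;
        return reward;
      }

     private:
      double eta, A, B;
    };
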
> However, using RL for trading has its own advantages. It can learn directly from a simulation environment and can adapt to the latency of the environment, because it receives a negative reward during the latency period. This is not possible with deep learning techniques alone, which cannot work around this latency. Also, using deep learning techniques, the agent can learn complex policies that can't be learned by humans.
> 
> I feel a good starting point would be to implement a state-of-the-art recurrent reinforcement learning algorithm and then improve on it by incorporating multiple agents, continuous action spaces, etc. I'm hoping to hear suggestions from the mentors.
> 
> PFA some relevant papers.
> 
> 
> 
> Regards,
> 
> Rohan Saphal
> 
> 
> 
> 
> 
>> Sent with Mailtrack <https://mailtrack.io/>
> 
> On Tue, Feb 20, 2018 at 11:35 PM, ROHAN SAPHAL <rohansaphal at gmail.com> wrote:
> Hi,
> 
> I am Rohan Saphal, a pre-final-year undergraduate at the Indian Institute of Technology Madras.
> 
> My research interest is in artificial intelligence, specifically deep reinforcement learning.
> I have been working with Prof. Balaraman Ravindran <https://scholar.google.co.in/citations?user=nGUcGrYAAAAJ&hl=en> on multi-agent reinforcement learning and will continue to do my final degree thesis project under his guidance.
> I am currently a graduate research intern at Intel Labs working on reinforcement learning.
> Previously, I was a computer vision intern at Caterpillar Inc. As part of the machine learning course, a competition was organized among the students, and I secured 1st place in that competition <https://www.kaggle.com/c/iitm-cs4011/leaderboard>.
> I am familiar with deep learning and have completed the fast.ai <http://fast.ai/> MOOC along with a course offered at our institute.
> 
> I have read the papers related to the reinforcement learning algorithms mentioned on the ideas page. I am interested in working on the reinforcement learning module.
> 
> I have compiled mlpack from source and am looking at the code structure of the reinforcement learning module. I am unable to find any tickets at present and am hoping that someone could direct me on how to proceed.
> 
> I have been interested in using reinforcement learning for equity trading, and recurrent reinforcement learning algorithms have interested me. I believe the stock market is a good environment (a POMDP) to test and evaluate the performance of such algorithms, as it is a highly challenging setting. There are many agents involved in the environment, and I feel that developing reinforcement learning algorithms that can trade efficiently in such a setting is an interesting problem. Deep learning algorithms like LSTMs cannot capture the latency involved in the system and hence cannot make real-time predictions. Reinforcement learning algorithms, however, could learn how to interact under the latency constraint to make real-time predictions. Some areas where I see potential work are:
> Implement recent work in multi-agent reinforcement learning algorithms
> Implement recurrent reinforcement learning algorithm(s) that capture the temporal nature of the environment; modifications can be made to existing work.
> I would like to hear from the mentors what they feel about the suggested idea and whether it seems like an acceptable project for GSoC.
> 
> Thanks for your time
> 
> Hope to hear from you soon. Feel free to ask for any more details about me or my work.
> 
> Regards,
> 
> Rohan Saphal
> 
> 
> Attachments: <rrl.pdf> <RRL .pdf> <07376685.pdf> <LvDuZhai.pdf> <SSRN-id2594477.pdf>

