[mlpack] GSoC-2021

Gopi Manohar Tatiraju deathcoderx at gmail.com
Mon Mar 22 09:35:10 EDT 2021


Hey,

This is regarding the Trading Environment idea.

I am a bit confused here, actually. Until now, all the examples we implemented
are based on a gym-style env; is that a requirement? I just wanted to clear this up.

Apart from this, I am trying to decide on the skeleton of the code and how
the agent will use it.
I am planning on having something like this for the environment:

We will have a class named *StockTradingEnv*, which will contain:

   1. *class Action*: three enum values, i.e. *BUY*, *SELL*, and *HOLD*
   2. *class State*: for a state in the trading environment, we generally
   have *price data*; this can be either OHLCV data alone or OHLCV data plus
   technical indicators
   3. a function named *step* which will drive the env
   4. different reward functions
   5. other utility functions like *buy_stock()* and *sell_stock()*

Does this sound good for starters? I am considering this as the base and
building from this point.
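
A very rough sketch of what I mean, loosely following the pattern of the
existing mlpack RL environments (e.g. CartPole). All names, members, and the
reward logic below are placeholders and assumptions on my part, not a final
design:

   #include <mlpack/core.hpp>

   class StockTradingEnv
   {
    public:
     // 1. Action: BUY, SELL or HOLD one unit of the stock.
     class Action
     {
      public:
       enum actions { BUY, SELL, HOLD };
       actions action = HOLD;
       static constexpr size_t size = 3;
     };

     // 2. State: one column of price data (OHLCV, optionally plus
     //    technical indicators) at the current time step.
     class State
     {
      public:
       State() { }
       State(const arma::colvec& data) : data(data) { }
       const arma::colvec& Data() const { return data; }
      private:
       arma::colvec data;
     };

     // `prices` holds one column of OHLCV(+indicator) data per time step.
     StockTradingEnv(const arma::mat& prices) : prices(prices), t(0) { }

     // 3. step(): apply the action, advance time, and return the reward.
     double Sample(const State& /* state */, const Action& action,
                   State& nextState)
     {
       const double price = prices(3, t);  // Assume row 3 is the close.
       if (action.action == Action::BUY)  BuyStock(price);
       if (action.action == Action::SELL) SellStock(price);

       ++t;
       nextState = State(prices.col(t));

       // 4. Placeholder reward: change in net worth over this step.
       const double newWorth = cash + shares * prices(3, t);
       const double reward = newWorth - netWorth;
       netWorth = newWorth;
       return reward;
     }

     State InitialSample() { t = 0; return State(prices.col(0)); }
     bool IsTerminal(const State& /* state */) const
     { return t >= prices.n_cols - 1; }

    private:
     // 5. Utility functions.
     void BuyStock(const double price)
     { if (cash >= price) { cash -= price; ++shares; } }
     void SellStock(const double price)
     { if (shares > 0) { cash += price; --shares; } }

     arma::mat prices;
     size_t t;
     double cash = 10000.0;
     double netWorth = 10000.0;
     size_t shares = 0;
   };

The buy/sell logic and the reward functions would of course be more involved
than this; the point is just the overall shape of the class.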

Regards,
Gopi



On Fri, Mar 12, 2021 at 5:01 PM Gopi Manohar Tatiraju <deathcoderx at gmail.com>
wrote:

> Hey Marcus,
>
> Yes, you got it correct. We will have a single environment, but we can have
> multiple agents and reward schemes. I have added more info; maybe this will
> make things clearer.
>
> These are the building blocks for solving any DRL problem. I tried to keep
> it as simple as possible for now; once we know exactly what we are getting
> into, we can talk about the implementation details.
>
>
>    - Environment: The environment would be a simulated stock exchange that
>      will contain the functionality of any common exchange and some driver
>      functions:
>       - Buy Stock
>       - Sell Stock
>       - Step
>       - Reset
>       - Other needed functions will be implemented as required.
>
>
>
>    - Action Scheme: The agent can buy or sell n shares of any company,
>      which can be denoted as:
>
> {-k, ..., -1, 0, 1, ..., k}
>
> For example, "Buy 10 shares of KO" and "Sell 10 shares of KO" are 10 and
> -10 respectively. So we basically have 3 actions: Buy, Sell, and Hold.
>
>
>    - Reward Scheme: As I described in my last mail, we should implement
>      different types of reward functions so that we can mimic different
>      types of strategies. Currently, I am planning to implement:
>       - A simple reward scheme based on the percent change over a fixed
>         window (a rough sketch is below, after this list).
>
> We can implement more reward schemes later. The risk-managed scheme I
> explained earlier cannot be based only on net worth; good risk management
> will be key there to get more reward, and to implement it we need
> functionality like stop-loss in our environment.
>
>
>    - State Space: The state space is what our environment sends to the
>      agent for observation. It will contain OHLCV (Open, High, Low, Close,
>      Volume) data and some indicators for technical analysis.
>
>
>
>    - We will use the agents already available in mlpack to implement an
>      example, and if any new agent like A2C is added during GSoC, we can
>      use that as well.
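>
> To keep the reward schemes pluggable, one option is to put each scheme in a
> small separate class that the environment calls from its step function. A
> minimal sketch of the simple percent-change scheme (just an assumption
> about the design; the names here are placeholders, not an existing mlpack
> API):
>
>    #include <cstddef>
>    #include <vector>
>
>    // Sketch: reward based on the relative (percent) change over a fixed
>    // window. Assumes the environment tracks its net worth at every step.
>    class SimpleReward
>    {
>     public:
>      SimpleReward(const size_t window = 10) : window(window) { }
>
>      double Reward(const std::vector<double>& netWorthHistory) const
>      {
>        if (netWorthHistory.size() < 2)
>          return 0.0;
>        // Compare the latest value against the start of the window.
>        const size_t start = netWorthHistory.size() > window ?
>            netWorthHistory.size() - window : 0;
>        return (netWorthHistory.back() - netWorthHistory[start]) /
>            netWorthHistory[start];
>      }
>
>     private:
>      size_t window;
>    };
>
> A risk-managed scheme could then be another class exposing the same
> Reward() function, so swapping strategies would not touch the environment
> itself.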
>
>
> The example will be fully documented and will explain how each and every
> component works so that users can understand and get familiar quickly.
>
> Let me know if I need to clarify anything else, or if there are points that
> you think are still missing.
>
> Thanks,
> Gopi
>
> On Thu, Mar 11, 2021 at 9:49 PM Marcus Edel <marcus.edel at fu-berlin.de>
> wrote:
>
>> Hello Gopi,
>>
>> thanks for the clarification; to me, this sounds like different reward
>> functions in the same environment. So I guess the way I would integrate
>> such a task into the existing codebase is to add a separate task for each
>> scenario. Maybe you have another idea?
>>
>> Regarding the first idea, I will soon implement a basic structure and
>> make a PR, I will
>> also send a detailed mail of what I am planning regarding the
>> pre-processing tool.
>>
>>
>> Sounds good.
>>
>> Thanks,
>> Marcus
>>
>>
>> On 10. Mar 2021, at 01:09, Gopi Manohar Tatiraju <deathcoderx at gmail.com>
>> wrote:
>>
>> Hey Marcus Edel,
>>
>> Thanks for your feedback.
>>
>> When we frame trading as an RL problem, on the surface it seems like the
>> goal of the agent is to *maximize the net worth*. But there are many ways
>> to reach this goal, and there are *different groups of traders who work on
>> different principles*.
>>
>> Let's compare some:
>>
>>    - *Day trader:* The goal of any day trader is to maximize profit but
>>    also to minimize risk (Trading 101: always cap your losses). So for
>>    this use case, we want to encourage the agent to use something called a
>>    stop-loss: more reward should be given to trades made with a stop-loss
>>    than to trades made without one. This will make sure that our agents
>>    learn to cap their losses, which is very important in a real-world
>>    scenario.
>>    - *Institutional traders:* They consider VWAP (Volume Weighted Average
>>    Price) the best price at which to acquire a stock. So regardless of the
>>    current price, they always try to buy at VWAP. For cases like this, we
>>    can penalize the agent for not following VWAP, thus teaching it that
>>    VWAP is the best price.
>>
>>
>> Different reward_schemes will be tailored for different use cases; based
>> on how one wants to trade, one can choose a different reward scheme. A
>> rough sketch of the VWAP idea is below.
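>>
>> As a rough illustration of the VWAP idea (just a sketch; the function
>> name and the penalty weighting are made up, not an existing API), the
>> reward could be the realized profit minus a penalty proportional to how
>> far the fill price strayed from VWAP:
>>
>>    #include <cmath>
>>
>>    // Sketch: reward profit, but penalize executions far from VWAP.
>>    double VWAPReward(const double profit, const double fillPrice,
>>                      const double vwap, const double penaltyWeight)
>>    {
>>      return profit - penaltyWeight * std::abs(fillPrice - vwap) / vwap;
>>    }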
>>
>> Regarding the first idea, I will soon implement a basic structure and
>> make a PR, I will also send a detailed mail of what I am planning regarding
>> the pre-processing tool.
>>
>> Let me know if you have any more doubts regarding reward_schemes or
>> anything else.
>>
>> Thanks,
>> Gopi
>>
>> On Wed, Mar 10, 2021 at 5:37 AM Marcus Edel <marcus.edel at fu-berlin.de>
>> wrote:
>>
>>> Hello Gopi M. Tatiraju,
>>>
>>> thanks for reaching out; I like both ideas. I can see the first idea
>>> would integrate perfectly into the preprocessing pipeline; that said, it
>>> would be useful to discuss the project's scope in more detail.
>>> Specifically, what functionality would you like to add? In #2727 you
>>> already implemented some features, so I'm curious to hear what other
>>> features you have in mind.
>>>
>>> The RL idea sounds interesting as well, and I think it could also fit
>>> into the RL codebase that is already there. I'm curious what you mean by
>>> "reward schemes"?
>>>
>>> Thanks,
>>> Marcus
>>>
>>> On 9. Mar 2021, at 14:55, Gopi Manohar Tatiraju <deathcoderx at gmail.com>
>>> wrote:
>>>
>>> Hello mlpack,
>>>
>>> I am Gopi Manohar Tatiraju, currently in my final year of engineering in
>>> India.
>>>
>>> I've been working on mlpack for quite some time now. I've tried to
>>> contribute and learn from the community. I've received ample support from
>>> the community, which made learning really fun.
>>>
>>> Now, as GSoC is back with its 2021 edition, I want to take this
>>> opportunity to learn from the mentors and contribute to the community.
>>>
>>> I am planning to contribute to mlpack under GSoC 2021. Currently, I am
>>> working on creating a pandas *dataframe-like class* that can be used to
>>> analyze datasets in a better way.
>>>
>>> Having a class like this would help in working with datasets, as ML is
>>> not only about the model but about the data as well.
>>>
>>> I have a PR already open for this:
>>> https://github.com/mlpack/mlpack/pull/2727
>>>
>>> I wanted to know if I can work on this for GSoC. It was not listed on the
>>> ideas page, but I think this would be the start of something useful and
>>> big.
>>>
>>> If this idea doesn't seem workable right now, I want to implement *RL
>>> Environments for Trading and some working examples for each env*.
>>>
>>>
>>> What exactly I am planning to implement are the building blocks of any
>>> RL system:
>>>
>>>    - *rewards schemes*
>>>    - *action schemes*
>>>    - *env*
>>>
>>>
>>> Fin-Tech is a growing field, and there are a lot of applications of
>>> Deep Q-Learning there.
>>>
>>> I am planning to implement different *strategies* like *Buy-Sell-Hold,
>>> Long only, Short only*...
>>> This will make the examples repo rich in terms of DRL examples.
>>> We can even build a small *backtesting module* that can be used to run
>>> backtests on our predictions.
>>>
>>> There are some libraries currently working on such models in Python; we
>>> can use them as a *reference* to go forward.
>>> *FinRL*: https://github.com/AI4Finance-LLC/FinRL-Library
>>>
>>> *Planning to implement:*
>>>
>>> Different types of *envs* for different kinds of financial tasks:
>>>
>>>    - single stock trading env
>>>    - multi stock trading env
>>>    - portfolio selection env
>>>
>>> Some example envs in Python:
>>> https://github.com/AI4Finance-LLC/FinRL-Library/tree/master/finrl/env
>>>
>>> Different types of *action_schemes*:
>>>
>>>
>>>    - make only long trades
>>>    - make only short trades
>>>    - make both long and short
>>>    - BHS (Buy Hold Sell)
>>>
>>> Example action_schemes:
>>> https://github.com/tensortrade-org/tensortrade/blob/master/tensortrade/env/default/actions.py
>>>
>>> We can see classes like BHS, SimpleOrder, etc. A rough sketch of what I
>>> have in mind for mlpack is below.
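>>>
>>> As a very rough sketch (the class and function names here are
>>> placeholders, not an existing mlpack or tensortrade API), an action
>>> scheme would just map the agent's discrete action to an order, so
>>> long-only, short-only, and BHS variants differ only in which actions
>>> they allow:
>>>
>>>    // Sketch: a Buy-Hold-Sell (BHS) action scheme. A long-only scheme
>>>    // would simply map the sell/short action to HOLD instead.
>>>    enum class OrderType { BUY, SELL, HOLD };
>>>
>>>    class BHSActionScheme
>>>    {
>>>     public:
>>>      // Map the agent's discrete action index to an order type.
>>>      OrderType Order(const int action) const
>>>      {
>>>        if (action == 0) return OrderType::BUY;
>>>        if (action == 1) return OrderType::SELL;
>>>        return OrderType::HOLD;
>>>      }
>>>
>>>      int NumActions() const { return 3; }
>>>    };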
>>>
>>> Different types of *reward_schemes*:
>>>
>>>
>>>    - simple reward
>>>    - risk-adjusted reward
>>>    - position based reward
>>>
>>>
>>> For the past 3 months, I've been working as an ML Researcher at a
>>> Fin-Tech startup and have been working on exactly this.
>>>
>>> I would love to hear your feedback and suggestions.
>>>
>>> Regards.
>>> Gopi M. Tatiraju
>>>
>>