[mlpack] GSoC-2021

Gopi Manohar Tatiraju deathcoderx at gmail.com
Fri Mar 12 06:31:14 EST 2021


Hey Marcus,

Yes, you got it right. We will have a single environment, but we can have
multiple agents and reward schemes. I have added more info below; hopefully
this makes things clearer.

These are the building blocks for solving any DRL problem. I tried to keep
them as simple as possible for now; once we know exactly what we are getting
into, we can talk about the implementation details.


   - Environment: The environment will be a simulated stock exchange that
   provides the core functionality of a typical exchange, along with some
   driver functions (a rough interface sketch follows this list):
      - Buy stock
      - Sell stock
      - Step
      - Reset
      - Other functions will be added as needed



   - Action Scheme: The agent can buy or sell n shares of any company, with
   the action denoted as an integer in:

{-k, ..., -1, 0, 1, ..., k}

For example, “Buy 10 shares of KO” is encoded as 10 and “Sell 10 shares of
KO” as -10, while 0 means hold. So we essentially have three actions: Buy,
Sell, and Hold.
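To make the encoding concrete, here is a minimal C++ sketch of how such an
action scheme could decode an integer action into an order. All names here
(Order, OrderType, DecodeAction) are hypothetical and only illustrate the
idea, not an existing mlpack API:

#include <cstddef>

enum class OrderType { Buy, Sell, Hold };

struct Order
{
  OrderType type;
  size_t shares;
};

// Translate a discrete action a in {-k, ..., -1, 0, 1, ..., k} into an order:
// positive values buy a shares, negative values sell |a| shares, 0 holds.
inline Order DecodeAction(const int action)
{
  if (action > 0)
    return { OrderType::Buy, static_cast<size_t>(action) };
  else if (action < 0)
    return { OrderType::Sell, static_cast<size_t>(-action) };
  else
    return { OrderType::Hold, 0 };
}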


   - Reward Scheme: As I described in my last mail, we should implement
   different types of reward functions so that we can mimic different types
   of strategies. Currently, I am planning to implement:
      - A simple reward scheme based on the percent change over a fixed
      window (a rough sketch follows the next paragraph).

We can implement more reward schemes later. The risk-managed scheme I
explained earlier would not be based on net worth alone; good risk management
would be key to earning more reward there. To implement it, we need
functionality like stop-loss orders in our environment.
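As a rough sketch of the simple scheme, assuming the reward is computed from
the agent's net-worth history (the SimpleReward name and the use of arma::vec
are illustrative assumptions, not an existing API):

#include <armadillo>

// Percent change of net worth over a fixed look-back window.
class SimpleReward
{
 public:
  explicit SimpleReward(const size_t windowSize) : windowSize(windowSize) { }

  // netWorthHistory holds the agent's net worth at each past time step.
  double Reward(const arma::vec& netWorthHistory) const
  {
    if (netWorthHistory.n_elem < 2)
      return 0.0;

    // Look back at most windowSize steps.
    const size_t start = (netWorthHistory.n_elem > windowSize) ?
        (netWorthHistory.n_elem - windowSize) : 0;

    const double past = netWorthHistory(start);
    const double current = netWorthHistory(netWorthHistory.n_elem - 1);

    return (current - past) / past;
  }

 private:
  size_t windowSize;
};

A risk-adjusted scheme could follow the same interface but scale the reward by
a drawdown or volatility penalty.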


   - State Space: The state space is what our environment sends to the agent
   as an observation. It will contain OHLCV (Open, High, Low, Close, Volume)
   values and some technical-analysis indicators.



   - We will use the agents already available in mlpack to implement an
   example, and if a new agent such as A2C is added during GSoC, we can use
   that as well.
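To tie the pieces above together, here is a rough sketch of how the trading
environment could mirror the interface of the existing mlpack RL environments
(a State class, an Action class, Sample(), InitialSample(), and IsTerminal()).
All class and member names are assumptions for illustration, not a finished
design:

#include <mlpack/core.hpp>

class StockTradingEnv
{
 public:
  // Observation sent to the agent: OHLCV values plus technical indicators.
  class State
  {
   public:
    State() { }
    State(const arma::colvec& data) : data(data) { }

    arma::colvec& Data() { return data; }
    const arma::colvec& Encode() const { return data; }

   private:
    arma::colvec data;
  };

  // Discrete action: an integer in {-k, ..., k}; positive buys, negative
  // sells, zero holds (see the action scheme above).
  struct Action
  {
    int action = 0;
  };

  StockTradingEnv(const arma::mat& priceData, const size_t maxShares) :
      priceData(priceData), maxShares(maxShares), currentStep(0) { }

  // Apply the action, advance one time step, and return the reward.
  double Sample(const State& /* state */, const Action& /* action */,
                State& nextState)
  {
    // Order execution (via the action scheme) and reward computation (via
    // the chosen reward scheme) would go here.
    ++currentStep;
    nextState = State(priceData.col(currentStep));
    return 0.0; // Placeholder: delegate to the reward scheme.
  }

  // Reset the environment to the beginning of the price series.
  State InitialSample()
  {
    currentStep = 0;
    return State(priceData.col(0));
  }

  // The episode ends when we run out of historical data.
  bool IsTerminal(const State& /* state */) const
  {
    return currentStep >= priceData.n_cols - 1;
  }

 private:
  arma::mat priceData;   // One column of OHLCV + indicators per time step.
  size_t maxShares;      // Upper bound k on shares per trade.
  size_t currentStep;
};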


The example will be fully documented and will explain how each component
works, so that users can understand it and get familiar with it quickly.

Let me know if I need to clarify anything else, or if there are points you
think are still missing.

Thanks,
Gopi

On Thu, Mar 11, 2021 at 9:49 PM Marcus Edel <marcus.edel at fu-berlin.de>
wrote:

> Hello Gopi,
>
> thanks for the clarification; to me, this sounds like different reward
> functions in the same environment. So I guess the way I would integrate
> such a task into the existing codebase is to add a separate task for each
> scenario. Maybe you have another idea?
>
> Regarding the first idea, I will soon implement a basic structure and make
> a PR; I will also send a detailed mail about what I am planning regarding
> the pre-processing tool.
>
>
> Sounds good.
>
> Thanks,
> Marcus
>
>
> On 10. Mar 2021, at 01:09, Gopi Manohar Tatiraju <deathcoderx at gmail.com>
> wrote:
>
> Hey Marcus Edel,
>
> Thanks for your feedback.
>
> When we frame trading as an RL problem, on the surface it seems like the
> goal of the agent is to *maximize net worth*. But there are many ways to
> reach this goal, and there are *different groups of traders who work on
> different principles*.
>
> Let's compare some:
>
>    - *Day trader:* The goal of any day trader is to maximize profit while
>    also minimizing risk (Trading 101: always cap your losses). For this
>    use-case, we want to encourage the agent to use a stop-loss: more reward
>    should be given to trades made with a stop-loss than to trades made
>    without one. This will make sure our agents learn to limit their losses,
>    which is very important in a real-world scenario.
>    - *Institutional traders:* These traders consider VWAP (Volume Weighted
>    Average Price) the best price at which to acquire stocks, so regardless
>    of the current price they always try to buy at VWAP. For cases like
>    this, we can penalize the agent for not following VWAP, teaching it that
>    VWAP is the best price.
>
>
> Different reward_schemes will be tailored to different use-cases; based on
> how one wants to trade, one can choose a different reward scheme.
>
> Regarding the first idea, I will soon implement a basic structure and make
> a PR; I will also send a detailed mail about what I am planning regarding
> the pre-processing tool.
>
> Let me know if you have any more doubts regarding reward_schemes or
> anything else.
>
> Thanks,
> Gopi
>
> On Wed, Mar 10, 2021 at 5:37 AM Marcus Edel <marcus.edel at fu-berlin.de>
> wrote:
>
>> Hello Gopi M. Tatiraju,
>>
>> thanks for reaching out; I like both ideas. I can see the first idea would
>> integrate perfectly into the preprocessing pipeline; that said, it would be
>> useful to discuss the project's scope in more detail. Specifically, what
>> functionality would you like to add? In #2727 you already implemented some
>> features, so I'm curious to hear what other features you have in mind.
>>
>> The RL idea sounds interesting as well, and I think it could also fit into
>> the RL codebase that is already there. I'm curious what you mean by
>> "reward schemes"?
>>
>> Thanks,
>> Marcus
>>
>> On 9. Mar 2021, at 14:55, Gopi Manohar Tatiraju <deathcoderx at gmail.com>
>> wrote:
>>
>> Hello mlpack,
>>
>> I am Gopi Manohar Tatiraju currently in my final year of Engineering from
>> India.
>>
>> I've been working on mlpack for quite some time now. I've tried to
>> contribute and learn from the community. I've received ample support from
>> the community which made learning really fun.
>>
>> Now, as GSoC is back with its 2021 edition, I want to take this
>> opportunity to learn from the mentors and contribute to the community.
>>
>> I am planning to contribute to mlpack under GSoC 2021. Currently, I am
>> working on creating a pandas *dataframe-like class* that can be used to
>> analyze datasets in a better way.
>>
>> Having a class like this would help with working with datasets, as ML is
>> not only about the model but about the data as well.
>>
>> I have a pr already open for this:
>> https://github.com/mlpack/mlpack/pull/2727
>>
>> I wanted to know if I can work on this during GSoC, as it was not listed
>> on the ideas page, but I think this would be the start of something useful
>> and big.
>>
>> If this idea doesn't seem workable right now, I want to implement *RL
>> Environments for Trading and some working examples for each env*.
>>
>>
>> What exactly I am planning to implement are the building blocks of any RL
>> system:
>>
>>    - *rewards schemes*
>>    - *action schemes*
>>    - *env*
>>
>>
>> Fin-Tech is a growing field, and there are many applications of Deep
>> Q-Learning there.
>>
>> I am planning to implement different *strategies* like *Buy-Sell-Hold,
>> Long only, Short only*...
>> This will make the examples repo rich in terms of DRL examples.
>> We can even build a small *backtesting module* that can be used to run
>> backtests on our predictions.
>>
>> There are some libraries currently working on such models in Python; we
>> can use them as a *reference* going forward.
>> *FinRL*: https://github.com/AI4Finance-LLC/FinRL-Library
>>
>> *Planning to implement:*
>>
>> Different types of *envs* for different kinds of financial tasks:
>>
>>    - single stock trading env
>>    - multi stock trading env
>>    - portfolio selection env
>>
>> Some example env in python:
>> https://github.com/AI4Finance-LLC/FinRL-Library/tree/master/finrl/env
>>
>> Different types of *action_schemes*:
>>
>>
>>    - make only long trades
>>    - make only short trades
>>    - make both long and short
>>    - BHS(Buy Hold Sell)
>>
>> Example action_schemes:
>> https://github.com/tensortrade-org/tensortrade/blob/master/tensortrade/env/default/actions.py
>>
>> We can see classes like BHS, SimpleOrder, etc.
>>
>> Different types of *reward_schemes*:
>>
>>
>>    - simple reward
>>    - risk-adjusted reward
>>    - position based reward
>>
>>
>> For the past 3 months, I've been working as an ML researcher at a
>> Fin-Tech startup, working on exactly this.
>>
>> I would love to hear your feedback and suggestions.
>>
>> Regards.
>> Gopi M. Tatiraju
>>
>> _______________________________________________
>> mlpack mailing list
>> mlpack at lists.mlpack.org
>> http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
>>
>>
>>
>