[mlpack] GSoC in Reinforcement Learning

András Attila Sulyok sulyok.a.attila at gmail.com
Mon Mar 26 18:24:58 EDT 2018


Hi All,

I'm Attila Sulyok, a second-year MSc computer engineering student at the PPCU
in Budapest. I am interested in participating in Google Summer of Code this
year, specifically in developing the reinforcement learning modules of
mlpack.

One of my ideas is implementing the modification to the DQN algorithm
described in [1], which learns (discretised) value distributions instead of
value functions. The straightforward approach would be to implement it as a
separate algorithm (like QLearning) or to modify the existing one, but I
think it's more general than that: it should be possible to use it with any
value-function-based algorithm. One idea is to implement it as a layer (I'm
not sure that is possible); the other is to extract the Q-update part of the
code into a template parameter, much like a loss function.
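To make the idea concrete, here is a minimal, self-contained sketch of the
distribution projection step from [1] (the part that would replace the usual
scalar Q update). The function name and signature are purely illustrative,
not mlpack API; it projects the Bellman-updated atoms back onto the fixed
support:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the categorical projection from [1]: shift the
// fixed atoms z_j by the Bellman update r + gamma * z_j, then distribute
// each atom's probability onto its two nearest neighbours on the support.
std::vector<double> ProjectDistribution(
    const std::vector<double>& probs,  // target distribution over the atoms
    double reward,
    double discount,
    double vMin,
    double vMax)
{
  const size_t n = probs.size();
  const double dz = (vMax - vMin) / (n - 1);
  std::vector<double> projected(n, 0.0);
  for (size_t j = 0; j < n; ++j)
  {
    // Bellman-updated atom, clipped to the support [vMin, vMax].
    const double tz = std::min(vMax,
        std::max(vMin, reward + discount * (vMin + j * dz)));
    const double b = (tz - vMin) / dz;  // fractional index into the support
    const size_t l = (size_t) std::floor(b);
    const size_t u = (size_t) std::ceil(b);
    if (l == u)
    {
      projected[l] += probs[j];
    }
    else
    {
      // Split the mass proportionally between the two neighbouring atoms.
      projected[l] += probs[j] * (u - b);
      projected[u] += probs[j] * (b - l);
    }
  }
  return projected;
}
```

If this step lives behind a template parameter, the rest of the learning loop
only needs to know how to turn a distribution into an expected Q-value for
action selection.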

As I understand it, the current state-of-the-art algorithm for learning
continuous actions with value functions is NAF [2]; it may also benefit
from value distributions.

The third idea I found is Hindsight Experience Replay [3], which wraps a
learning algorithm like DQN or NAF and creates additional goals to learn
from.
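The wrapping is fairly mechanical: after each episode, every transition is
replayed a second time with the desired goal replaced by a goal that was
actually achieved, and the sparse reward is recomputed. A minimal sketch of
the "final" relabelling strategy from [3] (struct and function names are
illustrative, not mlpack API):

```cpp
#include <cmath>
#include <vector>

// Hypothetical goal-conditioned transition; in practice state, action, and
// goal would be vectors, but scalars keep the sketch short.
struct Transition
{
  double state, action, achievedGoal, desiredGoal, reward;
};

// Sparse reward: 0 if the achieved goal matches the desired goal, else -1.
double SparseReward(double achieved, double desired, double tol = 1e-6)
{
  return (std::fabs(achieved - desired) < tol) ? 0.0 : -1.0;
}

// "final" strategy from [3]: relabel each transition with the goal that was
// actually reached at the end of the episode, keeping the originals too.
std::vector<Transition> HindsightRelabel(const std::vector<Transition>& episode)
{
  std::vector<Transition> augmented = episode;  // original transitions
  const double finalGoal = episode.back().achievedGoal;
  for (Transition t : episode)                  // copy, then relabel
  {
    t.desiredGoal = finalGoal;
    t.reward = SparseReward(t.achievedGoal, finalGoal);
    augmented.push_back(t);
  }
  return augmented;
}
```

Because the relabelling only touches the replay buffer, the underlying
learner (DQN, NAF, ...) doesn't need to change at all, which is why HER fits
naturally as a wrapper.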

Would mlpack benefit from implementing these? Since the reinforcement
learning part of the codebase is not large, they shouldn't require large
modifications to the existing code.

I built the code and tested it with some small algorithms, and one thing I
noticed (having only used keras-rl before) is the lack of metrics output
during training. Is that intentional? I've never used RL in industry, only
for research (in my current thesis project), so I'm not sure how useful it
would be. The same goes for the current state of the RL agent not being
visible.

Thanks,
Attila

[1]: https://arxiv.org/pdf/1707.06887.pdf
[2]: https://arxiv.org/abs/1603.00748
[3]: https://arxiv.org/pdf/1707.01495.pdf