[mlpack] Doubt in deep Q network implementation in mlpack

Shangtong Zhang zhangshangtong.cpp at gmail.com
Thu Feb 15 14:03:47 EST 2018


Hi Chirag,

I think it’s better to also CC the mailing list.

For your questions:
1) Yes.
2) deterministic is a flag that distinguishes training mode from test mode. If it’s false, indicating we are training, the policy is epsilon-greedy and we do experience replay to update the network. If it’s true, indicating we are testing, the policy is greedy and we don’t update the network; in that case we only want to evaluate the learned Q network. (A rough sketch of this control flow is included after point 3 below.)
3)  
> is learningNetwork only changing its parameters during the experience mechanism part in Step() 
Yes, it only happens on this line: https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/reinforcement_learning/q_learning_impl.hpp#L156
> In the sense that the learning (parameter updates) only starts once totalSteps has exceeded config.ExplorationSteps(), right?

Yes, before that it doesn’t do any learning; it just collects experience.
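
To make points 2 and 3 a bit more concrete, here is a rough, self-contained sketch of that control flow. It is plain C++ with made-up names (SketchAgent, Interact, UpdateFromReplay, etc.), not the actual mlpack classes; it only illustrates how the deterministic flag and the exploration-step threshold gate action selection and learning:

// Minimal, hypothetical sketch of the DQN Step() control flow described above.
// None of these names are mlpack's real API; they are for illustration only.
#include <cstddef>
#include <random>
#include <vector>

struct Experience { int state, action, nextState; double reward; };

class SketchAgent
{
 public:
  SketchAgent(std::size_t explorationSteps, double epsilon) :
      explorationSteps(explorationSteps), epsilon(epsilon), totalSteps(0),
      deterministic(false), gen(std::random_device{}()) { }

  // One environment step.  deterministic == false means training:
  // epsilon-greedy action selection plus replay-based updates.
  // deterministic == true means evaluation: greedy action, no updates.
  void Step()
  {
    const int action = deterministic ? GreedyAction() : EpsilonGreedyAction();
    const Experience e = Interact(action); // act in the environment

    if (!deterministic)
    {
      replay.push_back(e); // store the transition
      ++totalSteps;
      // Learning only starts once enough exploration steps have been taken;
      // before that the agent just collects experience.
      if (totalSteps > explorationSteps)
        UpdateFromReplay(); // the only place where parameters change
    }
  }

  bool& Deterministic() { return deterministic; }

 private:
  int GreedyAction() { return 0; /* argmax over Q-values (placeholder) */ }

  int EpsilonGreedyAction()
  {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    return (u(gen) < epsilon) ? RandomAction() : GreedyAction();
  }

  int RandomAction()
  {
    std::uniform_int_distribution<int> a(0, 1);
    return a(gen);
  }

  Experience Interact(int action) { return {0, action, 0, 0.0}; }

  void UpdateFromReplay() { /* sample a minibatch and do a gradient step */ }

  std::size_t explorationSteps;
  double epsilon;
  std::size_t totalSteps;
  bool deterministic;
  std::vector<Experience> replay;
  std::mt19937 gen;
};

int main()
{
  SketchAgent agent(/* explorationSteps */ 100, /* epsilon */ 0.1);
  for (int i = 0; i < 500; ++i)
    agent.Step();            // training: collect experience, then learn
  agent.Deterministic() = true;
  agent.Step();              // evaluation: greedy action, no update
}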

I hope my answers help, and feel free to reach out if you have any questions about my code.

Best regards,

Shangtong Zhang,
Second year graduate student,
Department of Computing Science,
University of Alberta
GitHub: https://github.com/ShangtongZhang | Stack Overflow: http://stackoverflow.com/users/3650053/slardar-zhang
> On Feb 15, 2018, at 04:48, Chirag Ramdas <chiragramdas at gmail.com> wrote:
> 
> Hello Shangtong,
> 
> I have been going through your implementation of deep Q-networks in src/mlpack/methods/reinforcement_learning/. The code is really well written and generalises beautifully to a variety of RL control problems.
> 
> I had a few questions; it would be great if you could clarify them.
> 
> So in q_learning_impl.hpp, you maintain a totalSteps instance variable, as well as a steps variable initialised on each call to the Episode() method.
> 
> 1) Assume totalSteps exceeds config.ExplorationSteps() over a number of initial episode runs. Thereafter, in future calls to Episode(), will every step involve the experience replay mechanism, i.e. will every step cause a change of parameters in learningNetwork? (There is an if-condition check in Step().)
> 
> 2) What does the variable deterministic mean in the implementation? I see that if deterministic is true, you never do experience replay, i.e. the learningNetwork parameters never get changed. What, then, is happening?
> 
> 3) The last question is a bit fundamental: is learningNetwork only changing its parameters during the experience replay part of Step()? The parameters don't change anywhere else, right? In the sense that the learning (parameter updates) only starts once totalSteps has exceeded
> config.ExplorationSteps(), right? Until then it's just a sequence of steps over the initial, randomly assigned parameters.
> 
> Thank you so much! really appreciate it!
> 
> 
> 
> Yours Sincerely,
> 
> Chirag Pabbaraju,
> B.E.(Hons.) Computer Science Engineering,
> BITS Pilani K.K. Birla Goa Campus,
> Off NH17B, Zuarinagar,
> Goa, India
> chiragramdas at gmail.com | +91-9860632945
