18:12 < sumedhghaisas> zoq: Hey Marcus, had a couple of questions about the architecture.
18:13 < sumedhghaisas> I observed that the 'parameter' input to the evaluate function is not used in FFN
18:13 < sumedhghaisas> So we are assuming that only gradient-descent-based optimizers will be used?
18:46 < zoq> sumedhghais: That's true; ignoring the input was an easy way to reuse the existing optimizer classes without writing a wrapper. But if you have something in mind that you think is worth a change, feel free.
18:49 < sumedhghaisas> zoq: yeah I agree. But I am a bit confused about how it works. So Evaluate returns the loss based on the current network parameters
18:51 < sumedhghaisas> but the Gradient function builds the gradient as a matrix
18:51 < sumedhghaisas> for the update
18:51 < sumedhghaisas> so where are the parameters updated? usually they are updated in the optimizer, right?
18:59 < sumedhghaisas> ahh okay... the 'iterate' matrix is passed as a reference to the 'parameters' object of FFN
19:00 < sumedhghaisas> so it gets updated in the vanilla update...
19:00 < zoq> yeah, absolutely right
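
The mechanism confirmed above can be sketched roughly as follows. This is a minimal illustration in Python, not mlpack's code: TinyNetwork, vanilla_sgd, and the quadratic loss are hypothetical stand-ins. It only shows how an optimizer can update a network through an aliased 'iterate' while Evaluate and Gradient ignore their parameter argument.

```python
class TinyNetwork:
    """Hypothetical stand-in for a network; not mlpack's FFN."""

    def __init__(self, weights):
        # Flat list of weights owned by the network.
        self.parameters = weights

    def evaluate(self, _unused_parameters):
        # The argument is ignored; the loss is read from self.parameters.
        # Assumed toy loss: sum of (w - 1)^2.
        return sum((w - 1.0) ** 2 for w in self.parameters)

    def gradient(self, _unused_parameters):
        # Gradient of the toy loss, again using the network's own state.
        return [2.0 * (w - 1.0) for w in self.parameters]


def vanilla_sgd(function, iterate, step_size=0.1, max_iterations=100):
    """Plain gradient descent that updates `iterate` in place."""
    for _ in range(max_iterations):
        grad = function.gradient(iterate)
        for i, g in enumerate(grad):
            # In-place update: `iterate` is the same list object as
            # function.parameters, so the network sees the new values.
            iterate[i] -= step_size * g
    return function.evaluate(iterate)


net = TinyNetwork([0.0, 3.0])
final_loss = vanilla_sgd(net, net.parameters)
print(net.parameters, final_loss)
```

Because the optimizer never rebinds 'iterate', every step is visible to the network, which is why ignoring the parameter argument still works in this sketch.
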
19:01 < sumedhghaisas> but then maybe we can somehow parameterize the update policy to accept the actual update operation and bypass the entire gradient matrix creation?
19:02 < sumedhghaisas> what do you think?
19:02 < sumedhghaisas> that update operation will implement a forward pass through all the layers and update their individual parameters?
19:07 < zoq> I mean you could do that, I guess the benefit is you would save memory, since you only have to hold the current gradient of layer x.
19:08 < sumedhghaisas> yeah... that's what I was thinking. And we can compute and update at the same time... without actually saving the gradient
19:10 < sumedhghaisas> So the update function will do the work of gradient and update
19:12 < sumedhghaisas> but then we will need to change the gradient function of all the layers... uffff
19:13 < zoq> I like the idea; not sure there is an easy way to achieve this. The idea was to avoid implementing a special optimizer for the ann code.
19:14 < zoq> modifying the Gradient function should be straightforward
19:14 < zoq> but it takes some time, yes
19:15 < sumedhghaisas> yeah... We will save a lot of memory access... and also the creation of the gradient matrix... which involves a lot of matrix reshaping
19:17 < sumedhghaisas> okay I will create a github issue for this and try to work it out
19:18 < sumedhghaisas> also ... Should I use the BatchNorm pull request and modify it... cause except for some small changes and adding support for convolutional layers, the code looks good to me
19:19 < zoq> opening a new issue is a good idea
19:19 < zoq> yeah, the BatchNorm PR looks good to me too
19:20 < sumedhghaisas> okay... I will try to make some architectural changes on top of the batch norm implementation... cause most of my work is already done by him there :P
19:21 < zoq> if you like, sure :)
20:10 < sumedhghaisas> zoq: On a separate note, do you think building the network statically rather than dynamically would give a speedup?
20:11 < sumedhghaisas> just a curiosity, I don't yet have an architecture to do that :)
21:23 < zoq> sumedhghais: I think the performance boost you'd get is probably negligible.
