mlpack IRC logs, 2018-11-05
Logs for the day 2018-11-05 (starts at 0:00 UTC) are shown below.
--- Log opened Mon Nov 05 00:00:01 2018
00:00 -!- cjlcarvalho [~caio@2804:d47:1d0d:a700:95d6:5d67:8b41:79ad] has joined #mlpack
00:07 -!- cjlcarvalho [~caio@2804:d47:1d0d:a700:95d6:5d67:8b41:79ad] has quit [Ping timeout: 264 seconds]
01:08 -!- cjlcarvalho [~email@example.com] has joined #mlpack
01:13 -!- cjlcarvalho [~firstname.lastname@example.org] has quit [Ping timeout: 252 seconds]
01:47 -!- cjlcarvalho [~email@example.com] has joined #mlpack
04:31 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
05:23 -!- cjlcarvalho [~firstname.lastname@example.org] has quit [Ping timeout: 240 seconds]
08:06 -!- akshay [b4974b12@gateway/web/freenode/ip.18.104.22.168] has joined #mlpack
08:07 -!- akshay [b4974b12@gateway/web/freenode/ip.22.214.171.124] has quit [Client Quit]
11:47 -!- cjlcarvalho [~caio@2804:d47:1d0d:a700:95d6:5d67:8b41:79ad] has joined #mlpack
12:09 -!- rahul [0e8bd103@gateway/web/freenode/ip.126.96.36.199] has joined #mlpack
12:10 -!- rahul is now known as Guest80569
12:10 < Guest80569> hello
12:17 -!- Guest80569 [0e8bd103@gateway/web/freenode/ip.188.8.131.52] has quit [Ping timeout: 256 seconds]
12:19 < zoq> Guest80569: Hello there!
12:38 -!- cjlcarvalho [~caio@2804:d47:1d0d:a700:95d6:5d67:8b41:79ad] has quit [Ping timeout: 252 seconds]
14:01 < davida> For Convolution<> layers is it possible to apply asymmetric padding? If not, how would I apply a 4x4 filter with stride=1 to a 64x64 input and get "SAME" padding so the output is also a 64x64 matrix?
14:03 < davida> The typical formula ... 1+(Hin-f+2p)/s = Hout ... requires padding = 1.5 which is clearly not possible.
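A minimal sketch of the output-size formula quoted above, written as a hypothetical helper `convOutSize` (not an mlpack function). It assumes the same padding on both sides and lets integer division act as the floor. For a 4x4 filter at stride 1 the total padding needed to keep the size is 3, which is odd, so no symmetric padding gives a 64-pixel output:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of mlpack): output length along one
// dimension for input length `in`, filter length `f`, symmetric padding `p`
// per side, and stride `s`. Integer division plays the role of the floor.
std::size_t convOutSize(std::size_t in, std::size_t f, std::size_t p,
                        std::size_t s)
{
  assert(in + 2 * p >= f);
  return (in + 2 * p - f) / s + 1;
}

// Total padding (both sides combined) that keeps the output length equal to
// the input length at stride 1. An odd total cannot be split evenly.
std::size_t samePadTotalStride1(std::size_t f)
{
  return f - 1;
}
```

With a 64-pixel input, `convOutSize(64, 4, 1, 1)` is 63 and `convOutSize(64, 4, 2, 1)` is 65, while the odd-sized filters land exactly on 64: `convOutSize(64, 3, 1, 1)` and `convOutSize(64, 5, 2, 1)`.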
15:33 -!- travis-ci [~email@example.com] has joined #mlpack
15:33 < travis-ci> mlpack/mlpack#5570 (master - 922950f : Ryan Curtin): The build has errored.
15:33 < travis-ci> Change view : https://github.com/mlpack/mlpack/compare/971d78b6da7e...922950f25073
15:33 < travis-ci> Build details : https://travis-ci.org/mlpack/mlpack/builds/450886097
15:33 -!- travis-ci [~firstname.lastname@example.org] has left #mlpack 
17:35 < ShikharJ> davida: Be sure that the division on the formula would use a floor function.
17:41 < ShikharJ> davida: My suggestion would be to use an odd-length filter like 3x3 or 5x5. Asymmetric padding is not used that often as far as I'm aware, but maybe we can provide support for it. But it would mean a lot of parameters for padding alone, and correspondingly one might also argue that we need to provide for asymmetric strides as well.
17:41 < ShikharJ> rcurtin: The website for ensmallen looks good!
17:43 < davida> ShikharJ: Thx for the reply. I was trying to replicate the deeplearning.ai course exercises, which use these even-sized Conv filters. Anyway, I am having much larger problems as I cannot
17:43 < davida> get the Conv model they proposed to converge at all.
17:45 < ShikharJ> davida: I'm guessing they would be trimming the output to get to 64x64? Can you give a link to that?
17:45 < davida> Does anyone know if there are major differences between the application of Tensorflow's AdamOptimizer and the one provided in MLPACK if I use that with SGD?
17:46 < davida> The exercise is in Python.
17:46 < davida> It is a Jupyter notebook.
17:47 < ShikharJ> Can you point me to the week of the course? I might have access to the specialization.
17:48 < davida> It is Course 4. Week 1. The second assignment.
17:48 < davida> Title is "Convolution Model - Application"
17:49 < davida> I printed a PDF of the notebook if it can help or you can go to https://www.coursera.org/learn/convolutional-neural-networks/notebook/0TkXB/convolutional-model-application
17:50 < davida> NOTE::: this link may be my own notebook so you might not get access.
17:50 < ShikharJ> davida: Ah, I can't access the notebook as I have finished the course. I'll have to re-enroll for the access.
17:51 < davida> Can I post the PDF somewhere you can access it?
17:52 < ShikharJ> Yeah, maybe post the code in a pastebin and give a link here?
17:55 < davida> https://drive.google.com/open?id=1-7O79ZgdfpfjWUwW7sWleP0M0Q_5e-oW
18:07 -!- cjlcarvalho [~email@example.com] has joined #mlpack
18:35 < ShikharJ> davida: Ah, using the conv2D function directly does make use of arbitrary padding, so my guess is it adds an odd pad column to the right and row to the bottom as required.
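A sketch of the padding split ShikharJ is guessing at, using a hypothetical helper `samePadding`. It follows the rule TensorFlow documents for "SAME" padding (output length is ceil(in / stride), and any odd leftover pixel goes to the bottom/right); treat it as an illustration under that assumption, not as code from either library:

```cpp
#include <cassert>
#include <cstddef>

// Padding before (top/left) and after (bottom/right) along one dimension.
struct PadSplit
{
  std::size_t before;
  std::size_t after;
};

// Hypothetical helper: "SAME"-style padding for input length `in`, filter
// length `f`, and stride `s`. When the total is odd, the extra pixel is
// placed after the data (bottom/right).
PadSplit samePadding(std::size_t in, std::size_t f, std::size_t s)
{
  const std::size_t out = (in + s - 1) / s;       // ceil(in / s)
  const std::size_t needed = (out - 1) * s + f;   // span the filter must cover
  const std::size_t total = needed > in ? needed - in : 0;
  return PadSplit{total / 2, total - total / 2};
}
```

For the case discussed here, `samePadding(64, 4, 1)` yields 1 pixel before and 2 after, so a 4x4 filter at stride 1 keeps the 64x64 size only with asymmetric padding.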
18:35 < ShikharJ> zoq: This might be something we should look into as well.
18:36 < davida> They have the option to set the padding to SAME or VALID rather than specify an actual padding amount.
18:37 < ShikharJ> rcurtin: I'm curious though, when you started your PhD in ML, what resources did you refer to back in the day, when there were hardly any resources for the field?
18:38 < davida> Anyway, I tried using the 5x5 and 3x3 with padding of 2 and 1 respectively, instead of 4x4 and 2x2 in the exercise but I cannot get the MLPACK version to converge beyond ~50% on training. It seems something
18:38 < davida> is quite different in the optimizer since the network setup is almost identical.
18:39 < ShikharJ> davida: Yes, they take care of the computation, but that in a way is a restriction as you only have two padding options, and you'll have to apply tf.pad() to the image to set the padding of your choice.
18:40 < davida> I used AdamUpdate with SGD as an approximation for the Tensorflow AdamOptimizer.
18:40 -!- cjlcarvalho [~firstname.lastname@example.org] has quit [Ping timeout: 272 seconds]
18:41 < ShikharJ> davida: I'm not sure about the optimizer framework, zoq would be a better person to ask.
19:11 < rcurtin> ShikharJ: the Bishop "Pattern Recognition and Machine Learning" book, CLRS for algorithms, and the AIMA book by Russell and Norvig
19:11 < rcurtin> that plus recent papers in the particular field of study (which for me was nearest neighbor search)
19:12 < rcurtin> so there was a lot of reading of labmates' papers, etc., as I came up to speed
19:12 < rcurtin> davida: what batch size were you using?
19:13 < davida> AdamUpdate adamUpdate(1e-8, 0.9, 0.999);
19:13 < davida> SGD<AdamUpdate> optimizer(0.009, 64, 100000, 1e-05, true, adamUpdate);
19:13 < rcurtin> try with a batch size of 1... yesterday I noticed some strange results for convolutional layers with larger batch sizes
19:13 < davida> The parameters were taken from the exercise
19:13 < rcurtin> (so change "64" to "1")
19:13 < davida> OK - will try now.
19:15 < davida> One more question. If I create a layer like this...
19:15 < davida> model.Add<Convolution<> >(3, 8, 4, 4, 1, 1, 2, 2, 64, 64);
19:16 < davida> ... what will the output size be? 65x65x8 ?
19:17 < davida> ... so applying a pooling layer with these parameters:
19:17 < davida> model.Add<MaxPooling<> >(8, 8, 8, 8, true);
19:17 < rcurtin> what's the input size for that convolution layer?
19:17 < rcurtin> oh sorry that is the last two parameters
19:18 < davida> Should reduce this to 8x8x8, since the 65th pixel will be ignored, right?
19:18 < davida> 64*64*3 images
19:18 < davida> But I am getting memory access violation with this
19:19 < davida> I am worried that the flattening of the image is not correctly taken care of in MaxPooling since there is no input for the width and height
19:21 < davida> The 1,080 images of size 64*64*3 are read in to a matrix of 1288x1080.
19:22 < davida> 12288x1080
19:22 < davida> With (64x64)(64x64)(64x64) layout.
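A small sketch of the layout davida describes: each 64*64*3 image is flattened into one column of 12288 values, stored as three consecutive 64x64 channel blocks. The helper `flatIndex` is hypothetical, and the column-major ordering inside each channel block is an assumption (it is not stated in the log):

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kHeight = 64;
constexpr std::size_t kWidth = 64;
constexpr std::size_t kChannels = 3;

// Number of rows in the data matrix: one flattened image per column.
constexpr std::size_t kPixelsPerImage = kHeight * kWidth * kChannels;

// Hypothetical helper: position of (row, col, channel) inside one flattened
// image. Channels are stored one after another; within a channel the pixels
// are ASSUMED to be column-major.
std::size_t flatIndex(std::size_t row, std::size_t col, std::size_t channel)
{
  assert(row < kHeight && col < kWidth && channel < kChannels);
  return channel * kHeight * kWidth + col * kHeight + row;
}
```

Under these assumptions the second channel starts at index 4096 and the last pixel of the third channel sits at index 12287, which matches the 12288 rows mentioned above.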
19:23 < rcurtin> so, the output size I would *expect* for the convolution layer you described is 65x65x8 exactly like you wrote
19:23 < davida> ... so then they become 8 layers of (65x65).....(65x65) after the Conv layer, how does MaxPool know that the image input is now 65x65x8?
19:24 < rcurtin> however, the size is being calculated by the function ConvOutSize() in convolution.hpp at line 185... and if I am reading it correctly, it will give a size of ... 32x32x8 ??
19:24 < rcurtin> let me read the rest of what you wrote, hang on...
19:24 < davida> ... because MaxPool should ignore the 65th col and row since they are not complete sets of 8x8 pixels?
19:27 < rcurtin> I see that the MaxPooling layer will use the size of the previous layer's output, but I am not too familiar with this part of the code (which is why I am kind of slow to respond)
19:28 < davida> Hmmm. Then I am not sure why I get a memory access violation when I run that.
19:28 < rcurtin> I'm really kind of hung up on the ConvOutSize() issue. It seems like the output size is being computed incorrectly
19:28 < rcurtin> I think that we should open a Github issue for this. To me it seems clear there is a problem of some sort
19:28 < rcurtin> would you like to do this, or would you like me to?
19:28 < davida> If it was not correct it would explain a lot of issues I am facing right now....
19:28 < rcurtin> I don't have the time this afternoon to dig in too deep to this
19:29 < rcurtin> but I think that I can find time to address it soon (or maybe someone will beat me to it)
19:30 < davida> Do you also suspect a problem with the batch management in the optimizer?
19:31 < rcurtin> I think overall the batch management is okay, but I suspect that the convolution layer is not using the memory for its output correctly
19:31 < rcurtin> basically each layer will compute some big output matrix, and pass (memory, rows, cols, slices) or something like this to the following layer
19:31 < rcurtin> (sometimes the passing of rows/cols/slices is implicit and done through a different mechanism)
19:32 < rcurtin> I personally think right now after a quick glance that the convolution layer is *saying* to other layers that rows/cols/slices is one thing, but then it's acting as though those values are *different* inside of the layer
19:32 < rcurtin> and this may also be the case for max pooling
19:32 < rcurtin> now it would seem odd for you to encounter this, because I know there are tests for this code
19:32 < rcurtin> but... it's software. anything can happen...
19:48 < ShikharJ> rcurtin: What is the issue with the batch sizes that you are observing?
19:50 < ShikharJ> rcurtin: Also, when you eventually started your PhD, did you take time to go back over Linear Algebra and Probability all over again, or did you do something else?
19:50 < davida> rcurtin: Setting batch size to 1 does not seem to complete even one iteration.
19:59 < ShikharJ> davida: You are correct regarding the output of the MaxPooling layer, the output should be 8x8x8
20:01 < davida> ShikharJ: then there is a problem in the code somewhere as it is throwing an error during Train. Access Violation, so most likely some matrix memory reads/writes are not taking care of the sizes correctly.
20:02 < davida> It works fine (no exception) if you match the layers perfectly by using 5x5 filter with pad=2 giving an exact 64x64x8 output.
20:04 < davida> rcurtin: I tried: SGD<AdamUpdate> optimizer(0.009, 1, 1080, 1e-05, true, adamUpdate); and it fails to come out of the optimizer loop. Even set maxIterations to 1 and got the same problem.
20:04 < ShikharJ> davida: I suspect that would be because of an issue with the Pooling class, unfortunately, I'm not familiar with its working as of yet. I'll dig deeper and let you know tomorrow?
20:04 < davida> rcurtin: I take that back. It stopped the optimizer loop after 5mins.
20:12 < ShikharJ> davida: For the pooling layer, my suspicion kind of grows, given that it is not tested, apart from being used in an example convolutional network.
20:19 < davida> rcurtin: Just reading the code on the ConvOutSize() at line 181 of convolution.hpp. Looks correct to me unless it is being called incorrectly.
20:22 < davida> looks like it is used correctly in the calls from convolution_impl.hpp at lines 123 & 124.
20:32 < davida> As for MaxPooling, the code calculating the outputWidth and outputHeight on lines 61 & 62 of max_pooling_impl.hpp is correct. In my case it correctly calculates the output 8x8x8 for an input dimension of 65x65x8.
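The pooling arithmetic davida checks can be sketched with a hypothetical helper `poolOutSize` (not the mlpack code itself). It assumes the final `true` argument of the MaxPooling layer selects floor rounding, so an incomplete trailing window is dropped:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: output length along one dimension for input length
// `in`, pooling window `k`, and stride `s`. With `useFloor` set, a partial
// window at the end is ignored; otherwise it counts as one more output.
std::size_t poolOutSize(std::size_t in, std::size_t k, std::size_t s,
                        bool useFloor)
{
  assert(in >= k && s > 0);
  const double steps = static_cast<double>(in - k) / static_cast<double>(s);
  const double rounded = useFloor ? std::floor(steps) : std::ceil(steps);
  return static_cast<std::size_t>(rounded) + 1;
}
```

For a 65-pixel input with an 8-pixel window and stride 8, `poolOutSize(65, 8, 8, true)` is 8 (the 65th pixel is ignored), whereas ceil rounding would give 9.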
20:45 < davida> ... so referring to your suggestion to open a GitHub issue I am not sure what to say.
20:46 < zoq> davida: A simple example to reproduce the issue and the expected output would be enough here.
20:57 < davida> K
20:58 < zoq> davida: thanks
21:04 < rcurtin> davida: zoq: ShikharJ: it's always possible I got confused and there is no bug in ConvOutSize(), so take everything I have written with a grain of salt of course :)
21:05 < rcurtin> ShikharJ: I observed some strange behavior with the runtime and accuracy changing for batch size in this PR: https://github.com/mlpack/mlpack/pull/1554
21:06 < rcurtin> however, I am not actually sure that anything is necessarily *wrong* with the code there. It may just be the expected behavior for that dataset/network combination
21:06 < zoq> rcurtin: If there is an issue, I'm sure we can figure it out.
21:15 < rcurtin> oh! I see. I did read ConvOutSize() wrong
21:15 < rcurtin> the code is 'return std::floor(size + p * 2 - k) / s + 1;'
21:15 < rcurtin> but I understood this as 'return std::floor(size + p * 2 - k) / (s + 1);'
21:15 < rcurtin> which is entirely different
21:16 < rcurtin> I must not have gotten enough sleep last night...
21:17 < rcurtin> davida: so, my statement that the output of the convolution layer is 32x32x8 is totally wrong. The actual size will be 65x65x8 like you originally said
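The two readings rcurtin contrasts differ only in operator precedence. The sketch below uses hypothetical function names and a parameter order chosen for illustration, with explicit casts added so the arithmetic is unambiguous:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// The expression as quoted in the log: division binds tighter than "+ 1",
// so the result is (size + 2p - k) / s, plus one.
std::size_t outSizeAsWritten(std::size_t size, std::size_t k, std::size_t s,
                             std::size_t p)
{
  const double span = std::floor(static_cast<double>(size + p * 2 - k));
  return static_cast<std::size_t>(span / s + 1);
}

// The misreading: the same span divided by (s + 1).
std::size_t outSizeMisread(std::size_t size, std::size_t k, std::size_t s,
                           std::size_t p)
{
  const double span = std::floor(static_cast<double>(size + p * 2 - k));
  return static_cast<std::size_t>(span / (s + 1));
}
```

For the layer in question (size 64, kernel 4, stride 1, padding 2) the expression as written gives 65, and the misreading gives 32, which is where the 32x32x8 figure came from.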
21:17 < rcurtin> ShikharJ: I remembered linear algebra well enough to not take a class on it, but I did consult some various linear algebra textbooks. Actually one thing that was really useful to me was the matrix cookbook:
21:17 < rcurtin> https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
21:20 < davida> rcurtin: Thanks for confirming.
--- Log closed Tue Nov 06 00:00:03 2018