mlpack IRC logs, 2018-06-12
Logs for the day 2018-06-12 (starts at 0:00 UTC) are shown below.
--- Log opened Tue Jun 12 00:00:00 2018
01:54 -!- wenhao [731bc011@gateway/web/freenode/ip.126.96.36.199] has quit [Ping timeout: 260 seconds]
06:32 -!- manish7294 [8ba73011@gateway/web/freenode/ip.188.8.131.52] has joined #mlpack
06:32 < manish7294> zoq: You there?
06:33 < manish7294> I guess Ryan must be sleeping right now.
06:33 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 260 seconds]
06:36 < manish7294> I was debugging the convergence of SGD, AMSGrad and BigBatchSGD on LMNN and what I found in common is that they don't converge but terminate after max iterations. To make them converge even on iris, the tolerance needs to be on the order of at least 1e-03.
06:36 < manish7294> And this is not only the case with LMNN, NCA suffers the same issue.
06:36 < manish7294> In case it is an issue.
06:38 < manish7294> Whereas L-BFGS converges successfully.
06:38 < manish7294> rcurtin: And this may have to do with the 100 iterations idea as L-BFGS works just fine with it.
06:45 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
06:46 < manish7294> zoq: In the adaptive search of BigBatchSGD, there is a stepSize calculation at https://github.com/mlpack/mlpack/blob/0128ef719418edd90c2c6cdcfd651f75a044d914/src/mlpack/core/optimizers/bigbatch_sgd/adaptive_stepsize.hpp#L95 , I was wondering what will happen if batchSize is kept at 1.
08:22 -!- manish7294 [8ba73011@gateway/web/freenode/ip.184.108.40.206] has quit [Ping timeout: 260 seconds]
09:27 -!- sulan_ [~sulan_@563BE0E4.catv.pool.telekom.hu] has joined #mlpack
09:43 < ShikharJ> zoq: I posted the results of the 10,000 image dataset on the DCGAN PR. It seems to be a bt slower than vanilla GAN, primarily because of the Transposed Convolutions, but the results are good. Please take a look.
09:44 < ShikharJ> *bit
09:48 < ShikharJ> zoq: I'll spend some more time finding better hyper-parameters. Unfortunately, the O'Reilly example for DCGAN doesn't test on MNIST, so we have no way of checking for competitiveness.
09:50 < ShikharJ> zoq: DCGAN uses a lot more Convolutions and Transposed Convolutions, and is also a bit deeper than the vanilla implementation, so I guess that made the difference. Probably, something we need to keep an eye on from now on is the performance of the Convolutional toolbox of mlpack.
09:54 < ShikharJ> zoq: Also in the O'Reilly example, they haven't implemented the same model of DCGAN as in the paper, like we have, so the difference may even come from there. Now the CelebA dataset is all that remains.
09:58 < jenkins-mlpack> Yippee, build fixed!
09:58 < jenkins-mlpack> Project docker mlpack nightly build build #347: FIXED in 2 hr 44 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20nightly%20build/347/
10:45 -!- witness_ [uid10044@gateway/web/irccloud.com/x-jnctwawtyglzhirn] has joined #mlpack
12:05 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 240 seconds]
12:26 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
12:51 -!- wenhao [731bc1e7@gateway/web/freenode/ip.220.127.116.11] has joined #mlpack
13:05 < rcurtin> manish7294: that sounds about right, because SGD uses a different batch every iteration, it's hard to get convergence
13:06 < rcurtin> I think this is why most neural network training doesn't converge based on a tolerance but instead a maximum number of iterations (epochs)
13:06 < rcurtin> anyway, I would not be surprised if using a larger tolerance (like 1e-3) would give essentially equivalent kNN accuracy results in a fraction of the time, since it takes so many fewer iterations
13:16 < rcurtin> I just did a very quick simulation with the covertype-5k dataset; with the regular SGD optimizer, if I just set max_iterations to take five full passes over the data (so, --max_iterations 25000) the resulting kNN accuracy is basically just as good as with a million iterations
13:16 < rcurtin> let me try again with the full covertype dataset
13:17 < rcurtin> in any case, the idea here would just be that we can set a smaller tolerance or a smaller default number of maximum iterations, and LMNN will converge much quicker but the quality of the solution will be about the same
13:29 -!- manish7294 [8ba73011@gateway/web/freenode/ip.18.104.22.168] has joined #mlpack
13:35 < manish7294> rcurtin: Everything you said looks right on point.
13:35 < zoq> ShikharJ: Perhaps it makes sense to write an executable (gan_main.cpp), and to use that to run some parameter ranges? I can run some tests on another machine too.
13:38 < manish7294> rcurtin: After a number of iterations sgd just wanders around the minima.
13:38 < manish7294> Hence, as you said, it's best to have low max iterations.
13:39 < zoq> ShikharJ: I think there are a couple of ideas we could look into to improve the conv operations, "Deep Tensor Convolution on Multicores" might be interesting here.
13:45 < rcurtin> manish7294: right, exactly. so maybe we can try with a handful of datasets and calculate or plot learning curves
13:46 < rcurtin> (i.e. x axis == number of passes over data, y axis = resulting kNN accuracy)
13:46 < rcurtin> and then we can see what a good 'default' number of epochs is
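One way to read a "good default" off such a learning curve is to take the fewest passes whose kNN accuracy is within a small slack of the best accuracy seen. The following is only a minimal sketch of that idea; the function name, the `slack` parameter, and the 0.01 default are assumptions for illustration, not something agreed on in the channel.

```python
def smallest_sufficient_passes(curve, slack=0.01):
    """Return the fewest passes whose accuracy is within `slack` of the best.

    `curve` is a list of (passes, knn_accuracy) pairs, in any order.
    `slack` is a hypothetical tolerance on accuracy, not an mlpack setting.
    """
    if not curve:
        raise ValueError("curve must contain at least one point")
    # Best accuracy reached anywhere on the curve.
    best = max(acc for _, acc in curve)
    # Cheapest setting that is "about as good" as the best one.
    return min(p for p, acc in curve if acc >= best - slack)
```

Running this over the curves of a handful of datasets and comparing the answers would show whether a single default is reasonable.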
13:47 < manish7294> rcurtin: Is there any way to plot curve while running through command line?
13:48 < manish7294> rcurtin: And should we replace sgd with AMSGrad
13:50 < rcurtin> manish7294: I would say, probably the best way is to write a bash script to cycle the max_iterations, then extract the resulting kNN accuracy into a CSV file or something
13:50 < rcurtin> then you could use octave or matplotlib or whatever your favorite plotting library is to plot it (or just look at the numbers directly)
13:50 < rcurtin> I don't know many good C++ plotting libraries that are easy to use though... it's hard to beat Python for that...
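The sweep described above (cycle max_iterations, collect the kNN accuracy, write a CSV) can be sketched in Python instead of bash. This is a sketch under assumptions: `evaluate` is a hypothetical callable that would run LMNN with the given iteration count and return the resulting kNN accuracy; how it invokes the program is deliberately left out, since the exact command line is not given in the log.

```python
import csv
import io


def sweep_to_csv(iteration_counts, evaluate, out):
    """Write one CSV row per max_iterations setting.

    `evaluate` is a hypothetical callable: it takes a max_iterations value
    and returns the kNN accuracy obtained with it. `out` is any writable
    text file object.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["max_iterations", "knn_accuracy"])
    for n in iteration_counts:
        writer.writerow([n, evaluate(n)])


if __name__ == "__main__":
    # Stand-in evaluator so the sketch runs on its own; it is NOT a real
    # measurement, just a placeholder returning a made-up number.
    buf = io.StringIO()
    sweep_to_csv([1000, 5000, 25000], lambda n: 0.0, buf)
    print(buf.getvalue(), end="")
```

The resulting file has two columns and can be loaded directly by octave or matplotlib for plotting.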
13:50 < rcurtin> for the second bit, I'd suggest maybe adding an extra option for --optimizer to include amsgrad and perhaps bbsgd also
13:51 < ShikharJ> zoq: I'll run for different parameters, and then set the defaults for the most appropriate ones. Though you can clone the dcgan_test.cpp code and run yourself if you wish, that should work fine.
13:51 < manish7294> rcurtin: There is an option for optimizer but it currently only supports sgd and lbfgs
13:52 < ShikharJ> zoq: Regarding the paper, I only had a brief look through it, though I guess it can come in handy. I'll have to take a deeper look.
13:53 < zoq> ShikharJ: I'll see if I can implement the idea over the next days.
13:53 < zoq> ShikharJ: Do you use the same parameter for the Celeb dataset?
13:55 < ShikharJ> zoq: I'm yet to run for CelebA, I was digging into the code for support of mini-batches. For CelebA, the layer and the kernel sizes are pertaining to the ones in Soumith Chintala's DCGAN implementation.
13:56 < ShikharJ> zoq: Here (https://github.com/soumith/dcgan.torch)
13:56 < zoq> ShikharJ: I see, I think batch support should be the next big 'milestone' right now.
13:59 < ShikharJ> zoq: Agreed, I'll let you know if I face any doubts.
13:59 < zoq> ShikharJ: Sounds good.
14:02 < rcurtin> manish7294: right, do you think it would be easy to add more there? it should be straightforward I think
14:02 < rcurtin> also it may be useful to add a --passes option for SGD-like optimizers (so --max_iterations is only used for L-BFGS)
14:03 < manish7294> Ya, no problem. Which ones do you suggest should be there? I personally don't want to keep sgd.
14:03 < rcurtin> and --passes would just specify the number of passes over the data. so then maxIterations would be set to data.n_cols * passes
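The proposed --passes arithmetic can be sketched as below. The function and argument names are hypothetical, and the sketch assumes one data point is visited per iteration, so that one pass equals data.n_cols iterations. With 5000 points (assuming that is what "covertype-5k" means) and five passes it gives the 25000 iterations used in the earlier experiment.

```python
def effective_max_iterations(optimizer, n_points, passes, max_iterations):
    """Pick the iteration limit as proposed for a --passes option.

    For L-BFGS the explicit max_iterations value is used unchanged.
    For SGD-like optimizers the limit is n_points * passes, assuming
    each iteration visits a single point (an assumption of this sketch).
    """
    if optimizer == "lbfgs":
        return max_iterations
    return n_points * passes


# Five full passes over a 5000-point dataset.
print(effective_max_iterations("sgd", 5000, 5, max_iterations=1000))
```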
14:03 < rcurtin> that's fair. I think it might be useful to leave SGD because people know what it is
14:03 < manish7294> right, I will make that change
14:03 < rcurtin> but surely AMSgrad and BBSGD are better approaches
14:03 < rcurtin> so it's up to you how you'd like to do it
14:04 < manish7294> so we can have AMSGrad as default and sgd in the secondary options
14:04 < rcurtin> if you want to remove SGD I would suggest adding comments mentioning that AMSgrad or BBSGD are better alternatives than SGD,
14:04 < manish7294> It's just because of divergence
14:04 < rcurtin> and if you want to leave it I would suggest adding comments saying that AMSgrad or BBSGD might be better choices :)
14:05 < rcurtin> right, understood. the divergence is a hard thing to solve with stock SGD
14:05 < rcurtin> (back in a bit)
14:10 < zoq> manish7294: It might be worth starting with Adam (or another flavour like AMSgrad) and using SGD afterwards: https://arxiv.org/pdf/1712.07628.pdf
14:10 < zoq> I'll see if I can implement SWATS over the next days, but you could hardcode something similar.
14:14 < manish7294> zoq: That's a good thing to do but I fear we may face divergence as we move from Adam to SGD.
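A hardcoded version of "Adam first, SGD afterwards" might look like the sketch below. It is not SWATS itself: the switch point and the SGD step size are fixed by hand here rather than chosen by the optimizer, and every name and default value is an assumption made for illustration. It works on a scalar parameter with an exact gradient, so it says nothing about the divergence concern with noisy gradients.

```python
import math


def adam_then_sgd(grad, x0, adam_steps, sgd_steps,
                  adam_lr=0.1, sgd_lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam for a fixed number of steps, then plain gradient steps.

    `grad` returns the gradient at a scalar point x. The switch happens
    after exactly `adam_steps` updates (hand-picked, unlike SWATS).
    """
    x = x0
    m = 0.0  # first-moment (mean) estimate
    v = 0.0  # second-moment (uncentered variance) estimate
    for t in range(1, adam_steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= adam_lr * m_hat / (math.sqrt(v_hat) + eps)
    # Second phase: fixed-step gradient descent from where Adam stopped.
    for _ in range(sgd_steps):
        x -= sgd_lr * grad(x)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
print(adam_then_sgd(lambda x: 2.0 * (x - 3.0), 0.0, 50, 50))
```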
14:16 < manish7294> zoq: Have you seen the batchSize issue in stepSizeDecay calculation https://github.com/mlpack/mlpack/blob/0128ef719418edd90c2c6cdcfd651f75a044d914/src/mlpack/core/optimizers/bigbatch_sgd/adaptive_stepsize.hpp#L95
14:22 < manish7294> rcurtin: I am going to remove the gradient batch precalculation part as it is not going to work with the new optimizers. I think it won't affect much.
14:22 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
14:22 < zoq> manish7294: hm, not sure someone is going to use BigBatchSGD with a batch size of 1, in this case https://github.com/mlpack/mlpack/blob/0128ef719418edd90c2c6cdcfd651f75a044d914/src/mlpack/core/optimizers/bigbatch_sgd/bigbatch_sgd_impl.hpp#L139 has the same issue.
14:23 < zoq> manish7294: Might be a good idea to raise at least a warning.
14:24 < manish7294> zoq: Ya no problem, I was foolish enough to do that and got nan as my coordinates matrix.
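The NaN with a batch size of 1 is what one would expect if the step size estimate relies on an unbiased sample variance, which divides by batchSize - 1. That the linked lines do exactly this is an assumption here, not something confirmed in the log. A minimal sketch of the warning suggested above, with hypothetical names:

```python
import math
import warnings


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1).

    With fewer than two values the estimate is undefined, so this warns
    and returns NaN instead of dividing by zero.
    """
    n = len(values)
    if n < 2:
        warnings.warn("sample variance needs a batch size of at least 2; "
                      "returning NaN")
        return math.nan
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)
```

A check of this kind at the start of optimization would turn a silent NaN coordinates matrix into a visible warning.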
14:27 < ShikharJ> zoq: If you wish to experiment around with the DCGAN code for MNIST, there are only two parameters you can really search around (stepSize and multiplier).
14:30 < zoq> ShikharJ: Okay, I guess I'll just write a simple executable since I can't pass any parameter to the test without a rebuild.
14:34 < ShikharJ> zoq: What I used to do was make different builds in different tmux sessions.
14:37 < ShikharJ> zoq: By increasing the step size and multiplier we may be able to speed up the tests, but it could potentially lead to lower quality outputs. So instead of searching for better hyper-parameters, on second thought, I feel that introducing support for the tasks mentioned is what we must spend time working on.
14:40 < zoq> ShikharJ: Agreed, as I said on the PR the results are good and they show that it works fine.
14:41 < ShikharJ> zoq: Also the hyper-parameters are anyways going to be set by a user, and need not be similar to what we use by default.
14:45 < zoq> ShikharJ: Yeah, the settings have to be tailored to the task, the defaults are just a good starting point.
15:38 -!- manish7294 [8ba73011@gateway/web/freenode/ip.22.214.171.124] has quit [Ping timeout: 260 seconds]
16:08 -!- killer_bee[m] [killerbeem@gateway/shell/matrix.org/x-foupilldhzggqzrk] has quit [Remote host closed the connection]
16:08 -!- prakhar_code[m] [prakharcod@gateway/shell/matrix.org/x-xneighayurqdzswt] has quit [Remote host closed the connection]
16:36 -!- prakhar_code[m] [prakharcod@gateway/shell/matrix.org/x-ruuqpvptcjubqfjv] has joined #mlpack
16:51 < ShikharJ> zoq: I also noticed that the CelebA dataset is over 700 MB (for 200,000 images). So I don't think it would be wise to run the test on the full dataset. I'd rather work on a subset, if you're fine with that?
17:05 < zoq> ShikharJ: sounds reasonable
17:12 -!- killer_bee[m] [killerbeem@gateway/shell/matrix.org/x-yxahgfdfqvznvuui] has joined #mlpack
17:26 < rcurtin> ShikharJ: I took a look at your blog post, the images look great
17:26 < rcurtin> I think that the images are really helpful, I suspect this is the reason why deep learning got so popular---the papers had cool pictures ;)
17:27 < rcurtin> much more exciting than a bunch of theory :(
17:27 < ShikharJ> rcurtin: What's your stand on Geoffrey Hinton?
17:28 < ShikharJ> Like his views that Deep Learning is useless and would be replaced by something more radical?
17:29 < ShikharJ> Given that Deep Learning craze itself started after Hinton developed the CD-K algorithm for training Deep Belief Networks?
17:39 < rcurtin> (I'm getting lunch, let me finish then I'll respond :))
17:42 < ShikharJ> Haha, sure.
18:11 < rcurtin> ShikharJ: hmm, I'm not sure about Geoffrey Hinton. I can see where he is coming from---deep learning is just curve fitting, so if you want artificial intelligence, maybe something more radical is needed (but you could even debate that point)
18:11 < rcurtin> I have heard some interesting things about capsule networks, but I haven't investigated them
18:11 < rcurtin> I think a lot of big people in the machine learning field like to say controversial things :)
18:13 < ShikharJ> I often fail to see why deep learning is considered so different from statistics itself as well.
18:15 < ShikharJ> I sometimes feel with statements like these, that probably no one knows why things work in ML.
18:15 < ShikharJ> Obviously leaving aside the statistical ML part.
18:17 < ShikharJ> It's almost like some people are adamant, that we're going in a very wrong direction with Deep Learning.
18:17 < rcurtin> right, I think that many people come from fields that aren't deep learning, and now that deep learning has the spotlight, the feeling is a little like jealousy or envy
18:17 < rcurtin> it's pretty easy to get any paper about deep learning accepted somewhere, but if you do something more niche
18:17 < rcurtin> like... for instance... dual-tree algorithms :)
18:18 < rcurtin> it can be very hard to get those papers accepted
18:18 < rcurtin> I think the same was true before deep learning with SVMs and kernel machines
18:18 < ShikharJ> Haha :)
18:18 < rcurtin> just a reaction to hype and trends I guess
18:19 < rcurtin> but I do agree with your statement... deep learning isn't really different than statistics
18:19 < rcurtin> just an application of a particularly complex set of curve fitters :)
18:20 < ShikharJ> Even Statistical Machine Translation people are unhappy with this. It can be seen easily. Pretty much every new grad student is doing Neural Machine Translation, as it is a lot easier to get a paper accepted in the NMT domain :)
18:23 < ShikharJ> Though I'd say even GANs were considered a niche area when they first came out. So there's that.
19:06 -!- sulan_ [~sulan_@563BE0E4.catv.pool.telekom.hu] has quit [Quit: Leaving]
19:27 < ShikharJ> rcurtin: I was wondering if there are any thoughts for moving mlpack repository from GitHub to GitLab or some other place (like armadillo has been moved)?
20:49 -!- killer_bee[m] [killerbeem@gateway/shell/matrix.org/x-yxahgfdfqvznvuui] has quit [Remote host closed the connection]
20:50 -!- prakhar_code[m] [prakharcod@gateway/shell/matrix.org/x-ruuqpvptcjubqfjv] has quit [Remote host closed the connection]
20:51 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
20:52 < rcurtin> ShikharJ: I don't see any particular reason to move away from Github, but if the majority of mlpack developers want to move it, I'm certainly not opposed
20:52 < rcurtin> it would be a bit of work to make the transition though
21:10 -!- prakhar_code[m] [prakharcod@gateway/shell/matrix.org/x-bkbnuvpjcakcreed] has joined #mlpack
23:16 -!- prakhar_code[m] [prakharcod@gateway/shell/matrix.org/x-bkbnuvpjcakcreed] has quit [Remote host closed the connection]
23:36 -!- prakhar_code[m] [prakharcod@gateway/shell/matrix.org/x-qvpztfxkelkcscdj] has joined #mlpack
--- Log closed Wed Jun 13 00:00:01 2018