mlpack IRC logs, 2018-06-06

Logs for the day 2018-06-06 (starts at 0:00 UTC) are shown below.

--- Log opened Wed Jun 06 00:00:51 2018
01:44 -!- witness_ [uid10044@gateway/web/] has quit [Quit: Connection closed for inactivity]
03:17 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
06:47 < ShikharJ> zoq: Should we continue the testing? It's been three days today, and the network still continues to train, on the full test.
07:08 -!- witness_ [uid10044@gateway/web/] has joined #mlpack
07:30 < ShikharJ> zoq: It turns out that a single iteration of the optimizer takes about a second, so for 70,000 images iterated over 2000 epochs, this takes way too much time (more than what is required to train probably). I'll see if I'm able to get good results within a day of training or not.
07:41 < ShikharJ> zoq: I have tmux'd another session which should take a day at most, I'll also spawn some other sessions with different hyperparameters.
10:03 < jenkins-mlpack> Project docker mlpack nightly build build #341: SUCCESS in 2 hr 49 min:
11:59 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 265 seconds]
12:10 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
12:24 -!- wenhao [731bc1ca@gateway/web/freenode/ip.] has joined #mlpack
13:58 < rcurtin> ShikharJ: 2000 epochs seems like a lot to me, do GANs really take that long to train? all of the work I have ever done with neural networks and MNIST reaches maximum accuracy usually within 100 epochs
13:58 < rcurtin> note, I am not an expert, so maybe 2k epochs is totally reasonable, I am just curious
14:00 < ShikharJ> rcurtin: The O'Reilly test example ran for that many epochs (100,000 epochs on a 50 batch), so we just went by 2,000. Hopefully the per epoch evaluation is a lot better with mlpack. Let's see.
14:00 < rcurtin> wow, 100k epochs... and was that really necessary to get the performance they got?
14:00 < rcurtin> I'm not familiar with the example by the way, so I'd be interested in glancing at the paper or reference if you have it handy
14:02 < ShikharJ> Yeah, but seemingly, we got better results with a smaller dataset, far fewer epochs, and pre-training. So hopefully things even out.
14:05 < ShikharJ> rcurtin: Take a look here for the slightly modified example (
14:06 < ShikharJ> rcurtin: Here is the original paper, though I don't think they have specified the code anywhere (
14:07 < rcurtin> oh, I see, actually that is 100k batches, not 100k epochs (if 'epoch' is defined as one full pass over the data)
14:08 < rcurtin> since the MNIST training data is 55k points, that actually comes out to roughly 91 full passes over the dataset
14:08 < rcurtin> if I am understanding it right
14:08 < ShikharJ> Training data is 60,000 points if I'm not wrong and 10,000 test data points for a total of 70,000 images.
14:09 < rcurtin> that's how I've typically seen it, but if they are using the same mnist package in Python that I've used before, it's 55k training, 5k validation, 10k test
14:09 < rcurtin> if it is 60k points, that's ~83 full passes, which to me seems a lot less crazy than 100k passes :)
14:10 < ShikharJ> But still, it's a lot. A single pass over the entire dataset of 70,000 images would approximately take over 19 hours.
14:11 < rcurtin> right, that seems really long compared to what I would expect
14:11 < rcurtin> if the batch size support is not yet ready for your GAN implementation, that can make a huge difference
14:12 < ShikharJ> I've currently spawned a new job with 10,000 images and 10 epochs to see if we get somewhere. Should be done by tomorrow.
14:13 < rcurtin> cool, hopefully it performs well :)
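
The batch-versus-epoch arithmetic discussed above can be checked with a short sketch. It assumes the figures quoted in the conversation (100,000 batches of 50 points, a training set of 55,000 or 60,000 points, and roughly one second per optimizer iteration on a single image); the function names are hypothetical and are not part of mlpack.

```python
def full_passes(num_batches, batch_size, dataset_size):
    """Number of full passes over the data covered by num_batches batches."""
    return num_batches * batch_size / dataset_size


def hours_per_pass(num_points, seconds_per_iteration=1.0):
    """Hours for one pass, assuming one optimizer iteration per point."""
    return num_points * seconds_per_iteration / 3600.0


# 100k batches of 50 points over the 55k-point and 60k-point training splits.
print(round(full_passes(100_000, 50, 55_000), 1))  # 90.9
print(round(full_passes(100_000, 50, 60_000), 1))  # 83.3

# One pass over all 70,000 images at about one second per iteration.
print(round(hours_per_pass(70_000), 1))  # 19.4
```

Under these assumptions the 100k-batch run corresponds to roughly 83 to 91 passes over the data rather than 100k passes, and a single pass at batch size 1 takes a little over 19 hours, which matches the estimates given in the discussion.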
14:26 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
15:31 -!- travis-ci [] has joined #mlpack
15:32 < travis-ci> ShikharJ/mlpack#172 (GAN - 9014d65 : Shikhar Jaiswal): The build has errored.
15:32 < travis-ci> Change view :
15:32 < travis-ci> Build details :
15:32 -!- travis-ci [] has left #mlpack []
15:59 -!- wenhao [731bc1ca@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
17:37 < zoq> ShikharJ: Let's see if we can get good results on a smaller subset, we can always run more experiments on the side.
19:34 < ShikharJ> zoq: Posted some results which I got, I'll post more tomorrow.
19:35 < zoq> ShikharJ: Looks good, do you mind posting the test script you used?
19:37 < ShikharJ> zoq: Test script as in the code that got me the output? It's the same as the one in the PR (GANMNISTTest), with the mentioned hyper-parameters changed. Like epochs limited to 10, ganPreTrain set to 300 and datasetMaxCols set to 10000.
19:38 < zoq> ShikharJ: ahh, okay
19:39 < ShikharJ> This was about 3 times faster than I was expecting it to take, so probably a larger dataset can also be tested. Let me try full dataset with 20 epochs. It should take a day, and if the results are just as good as the output for O'Reilly example, then we're all good to merge.
19:39 < zoq> ShikharJ: I think the results are really good for the current settings, finding good parameters for a GAN is difficult
19:40 < zoq> agreed
19:40 < zoq> as I said before, we can always run more experiments on the side
19:41 < ShikharJ> zoq: I had also spawned a couple of jobs for 15 and 20 epochs (10,000 images), let's see how the outputs change for those cases as well.
19:41 < ShikharJ> I'll post them, as they become available.
19:42 < zoq> great, nice to see some load on the machine :)
19:44 < zoq> let me install htop :)
19:44 < ShikharJ> zoq: I'm sorry this took a while longer than I had planned, I'll get all the tests done before the evaluations.
19:45 < zoq> No worries at all, we should take all the time we need to get some good results before we move forward
19:45 < zoq> Load average: 2.52
19:46 < zoq> still some room left
19:46 < ShikharJ> zoq: What's load average?
19:46 < zoq> system utilization
19:46 < ShikharJ> zoq: I just started the full job, so 3 jobs running now.
19:47 < zoq> you can run htop, to see some nice results
19:49 < ShikharJ> Load Average 3.05 now.
19:49 < zoq> on a 4 core system max is 4.0
19:49 < ShikharJ> I guess that's it, so now we can just wait :)
19:50 < zoq> right :)
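
The load average figures quoted above can also be read programmatically. The following is a minimal sketch, assuming a Unix-like system where Python's `os.getloadavg()` is available; the helper `load_per_core` is a hypothetical name used only for illustration. A load average close to the core count roughly corresponds to full utilization, and it can exceed that value when processes are waiting for CPU time.

```python
import os


def load_per_core(load_average, cores):
    """Load average divided by core count; about 1.0 means fully busy."""
    return load_average / cores


# The value reported in the conversation: 3.05 on a 4-core machine.
print(load_per_core(3.05, 4))

# Read the live values where the platform supports it (not on Windows).
if hasattr(os, "getloadavg"):
    try:
        one_min, five_min, fifteen_min = os.getloadavg()
        cores = os.cpu_count() or 1
        print(f"1-minute load per core: {load_per_core(one_min, cores):.2f}")
    except OSError:
        print("load average is not obtainable on this system")
```

With three training jobs running, a value around 0.76 per core suggests the machine still has some headroom, consistent with the remark that there was room left.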
20:13 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 245 seconds]
20:32 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
20:33 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
20:43 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 240 seconds]
20:47 -!- witness_ [uid10044@gateway/web/] has quit [Quit: Connection closed for inactivity]
--- Log closed Thu Jun 07 00:00:53 2018