mlpack IRC logs, 2018-06-01

Logs for the day 2018-06-01 (starts at 0:00 UTC) are shown below.

--- Log opened Fri Jun 01 00:00:44 2018
00:18 -!- manish7294 [~yaaic@2405:205:230e:5f8b:58de:366:8945:38e6] has joined #mlpack
00:21 < manish7294> rcurtin: I am trying to load the covertype dataset as a csv file, but I keep getting a "file extension not recognised" error. Do you have any idea what could be going wrong here?
00:23 < manish7294> I am using kaggle's covertype.csv
02:09 < rcurtin> manish7294: how are you loading it?
03:07 < manish7294> rcurtin: using command line -i covertype.csv
03:17 < rcurtin> there should not be a problem with that, but I don't really have enough information to help you debug here
03:17 < rcurtin> what is the full command line that you are using?
03:19 < manish7294> rcurtin: bin/mlpack_lmnn -i covertype.csv -k 5 -a 0.01 -o output.csv --verbose
03:20 < manish7294> rcurtin: I will try to debug more.
03:20 < rcurtin> I think you'll have to use a debugger, I don't easily see any reason why that should give you problems with file extensions
03:21 < rcurtin> you could catch when the exception is thrown and step through the backtrace from there, I imagine that will help you figure out where the issue is
03:21 < rcurtin> if it's an issue in the mlpack loading code, let's definitely fix it, but that's not one I've seen before
03:21 < rcurtin> I'm going to head to bed now---good night! :)
03:22 < manish7294> good night :)
03:44 -!- Netsplit *.net <-> *.split quits: ShikharJ, petris
03:57 < manish7294> converting dataset to .txt worked.
03:58 < jenkins-mlpack> Project docker mlpack weekly build build #44: STILL UNSTABLE in 3 hr 11 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20weekly%20build/44/
06:16 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
06:23 -!- ShikharJ [6a332755@gateway/web/freenode/ip.106.51.39.85] has joined #mlpack
06:24 < ShikharJ> zoq: Apart from the epochs and the pre-training mentioned, the parameters were unchanged. So, at least we have some meaningful defaults.
06:24 < ShikharJ> zoq: Thanks for the link, but I figured out a way to plot using matplotlib :)
06:24 < ShikharJ> zoq: I got some results, I'll comment on the PR.
06:24 < ShikharJ> This was really fast :)
06:29 -!- mikeling [uid89706@gateway/web/irccloud.com/x-cftmbciyshhazfyz] has joined #mlpack
06:31 -!- ShikharJ [6a332755@gateway/web/freenode/ip.106.51.39.85] has quit [Quit: Page closed]
06:37 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
06:48 -!- ShikharJ [6a332755@gateway/web/freenode/ip.106.51.39.85] has joined #mlpack
06:50 < ShikharJ> zoq: Ack, I set the sampling size to 400 instead of 10, so the output ended up incomprehensible. I have tmux'd it again with the changes; let's see.
06:53 < ShikharJ> zoq: Sorry if I missed any messages after I said that I'll comment on the PR. EliteBNC is supposedly down, so I'm back on freenode webchat.
06:54 < Atharva> ShikharJ: You can always check the irc logs :)
06:54 < ShikharJ> Atharva: Yeah, did that. Thankfully, I didn't miss anything :)
06:55 < Atharva> Good to know.
06:59 < ShikharJ> zoq: If you have some hyper-parameter suggestions, we can also explore them. It takes less than 12 hours on the full dataset with the maxed out O'Reilly test parameters. We should really test our output for the single optimizer case, and then contrast them with the dual optimizer case output.
07:08 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
07:16 < ShikharJ> zoq: The optimization support, I feel, would be better done along with support for batch-sizes, as I really can't think of much benefit in having two separate optimizers that iterate on singular inputs (apart from the check reduction that you mentioned). Let's push in the support for GAN and DCGAN till Phase 1, so that we can have the basic infrastructure ready, and then we can focus on this task.
07:19 < ShikharJ> In the meantime, we can also collect a lot of output data on different parameters, so in case the separation leads to a worse output than before, we wouldn't have to worry, as the GAN infrastructure would be already incorporated into mlpack.
07:47 -!- ShikharJ is now known as 43UAB3PW7
07:47 -!- ShikharJ [Elite21812@gateway/shell/elitebnc/x-rrlewaeogpvgutmn] has joined #mlpack
07:47 -!- petris [quassel@2600:3c02::f03c:91ff:fe25:b576] has joined #mlpack
07:51 -!- 43UAB3PW7 [6a332755@gateway/web/freenode/ip.106.51.39.85] has quit [Quit: Page closed]
08:39 -!- mikeling [uid89706@gateway/web/irccloud.com/x-cftmbciyshhazfyz] has quit [Quit: Connection closed for inactivity]
09:35 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
09:49 < jenkins-mlpack> Yippee, build fixed!
09:49 < jenkins-mlpack> Project docker mlpack nightly build build #336: FIXED in 2 hr 35 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20nightly%20build/336/
09:55 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Read error: Connection reset by peer]
10:00 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
10:04 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Read error: Connection reset by peer]
10:06 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
11:36 < Atharva> rcurtin: armadillo doesn't overload the ? : operators, does it?
11:38 < Atharva> In the softplus activation function, for vectors it has been done by looping over all the elements, and it doesn't support matrices.
11:40 < manish7294> Atharva: You may use for_each or transform for this.
11:42 < Atharva> manish7294: Performance wise, is for_each comparable to for loops or operator overloading (like we have for + - / ...)?
11:48 < manish7294> Atharva: I am not sure about this, but I think it may just be a plain for loop underneath. I think Ryan can tell you more. You can also take a quick look at the Armadillo code; it may help.
11:48 < Atharva> Anyway, I think I will go with for_each. Thanks for that! Yeah, I will have a look over the armadillo code.
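On the `? :` question: C++ does not allow the conditional operator to be overloaded at all, so no library (Armadillo included) can make it work element-wise. The per-element branch has to live inside a function applied to each element. Below is a minimal sketch, assuming plain `std::vector<double>` storage and `std::transform` so it compiles without Armadillo; `Softplus` and `SoftplusInPlace` are hypothetical names, not mlpack's existing softplus code. Armadillo's `.transform()` member also accepts a lambda in the same way, and since a matrix is one contiguous buffer, the same loop covers matrices as well as vectors.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Numerically stable softplus:
//   log(1 + exp(x)) = max(x, 0) + log1p(exp(-|x|)),
// which never evaluates exp() of a large positive number.
// The ternary does the per-element branch that cannot be overloaded.
double Softplus(const double x)
{
  const double maxPart = (x > 0.0) ? x : 0.0;
  return maxPart + std::log1p(std::exp(-std::abs(x)));
}

// Applies softplus to every element in place. Any contiguous buffer works,
// e.g. the memory of a column-major matrix, not only a vector.
void SoftplusInPlace(std::vector<double>& values)
{
  std::transform(values.begin(), values.end(), values.begin(),
                 [](const double x) { return Softplus(x); });
}
```

For large inputs the naive `std::log(1 + std::exp(x))` overflows to infinity, while the rewritten form returns `x` itself, which is why the branch is worth keeping even inside a vectorized path.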
11:49 < manish7294> great :)
12:21 < ShikharJ> zoq: It turns out that on the benchmark systems, training runs a lot faster than I expected (probably less than 5 hours with maxed-out hyperparameters).
12:25 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
12:34 < ShikharJ> zoq: Looks like I messed up by providing the wrong input dataset. Gosh!
12:50 < ShikharJ> rcurtin: Do we have a ready to deploy mnist dataset available for use in mlpack? Would the one in mlpack/models work?
13:27 < rcurtin> yeah, that one could work. a subset of mnist is in src/mlpack/tests/data/, but it's not very large
13:29 < ShikharJ> rcurtin: Yeah, I worked with that and got some promising results, which I've put up on the PR. But I'm not sure whether the pixel layout (along rows or along columns) is the same in both.
13:37 < rcurtin> hmm, I think the subset may actually be transposed, but I'm not sure. it's stored as a binary Armadillo matrix (.arm), so you could load it and see
13:37 < rcurtin> it should be 784 rows, N columns (where N is the number of points)
13:38 < ShikharJ> rcurtin: In mlpack/models, the test dataset is 784 columns, so definitely transposed.
13:39 < rcurtin> ah, ok... not sure why it is transposed
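mlpack expects one point per column, so MNIST should load as 784 rows by N columns; a 784-column matrix just needs a transpose (in Armadillo, `data = data.t()` or `arma::inplace_trans(data)`). The sketch below makes that check explicit. It assumes a hypothetical `ColMajorMatrix` stand-in for `arma::mat` so it runs without Armadillo; `EnsurePointsAsColumns` is likewise an illustrative name, not an mlpack function.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for arma::mat: column-major storage,
// so element (r, c) lives at data[c * rows + r].
struct ColMajorMatrix
{
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<double> data;

  double& At(std::size_t r, std::size_t c) { return data[c * rows + r]; }
  double At(std::size_t r, std::size_t c) const { return data[c * rows + r]; }
};

ColMajorMatrix Transpose(const ColMajorMatrix& in)
{
  ColMajorMatrix out;
  out.rows = in.cols;
  out.cols = in.rows;
  out.data.resize(in.data.size());
  for (std::size_t c = 0; c < in.cols; ++c)
    for (std::size_t r = 0; r < in.rows; ++r)
      out.At(c, r) = in.At(r, c);
  return out;
}

// Returns the data with one point per column and `dims` rows
// (dims = 784 for MNIST), transposing if points were stored as rows.
ColMajorMatrix EnsurePointsAsColumns(const ColMajorMatrix& m, std::size_t dims)
{
  if (m.rows == dims)
    return m;
  if (m.cols == dims)
    return Transpose(m);
  throw std::invalid_argument("neither dimension matches the feature count");
}
```

Checking the row count against the known dimensionality (784) is cheaper and more reliable than inspecting pixel values, although it cannot disambiguate the rare case where N also equals 784.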
13:39 < ShikharJ> zoq: It seems it'll take me some time to prepare the dataset and upload the results; nevertheless, I have begun work on DCGAN.
13:44 -!- manish7294 [~yaaic@2405:205:230e:5f8b:58de:366:8945:38e6] has quit [Ping timeout: 245 seconds]
13:55 -!- wenhao [731bc1e7@gateway/web/freenode/ip.115.27.193.231] has joined #mlpack
14:09 < Atharva> rcurint: zoq: Do take a look at #1414
14:20 < rcurtin> Atharva: yeah, I saw the email, I'll try to get to it today
14:22 < Atharva> rcurtin: sure, whenever you get time
14:27 -!- Trion [~trion@122.173.213.239] has joined #mlpack
14:31 < wenhao> lozhnikov: Hi Mikhail. I am trying to do neighbor search with cosine distance but NeighborSearch with KDTree may not work with cosine distance. I guess one way to do that is to first normalize all query vectors and reference set vectors to unit length, and then perform NeighborSearch with Euclidean distance.
14:32 < wenhao> That's because, with normalized vectors, neighbor search with Cosine Distance is equivalent to neighbor search with Euclidean Distance. But I am not sure whether my proof/calculation is correct. What do you think?
14:34 < rcurtin> wenhao: that's correct
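The equivalence wenhao describes follows from expanding the squared Euclidean distance between unit vectors. Writing $\hat{u} = u / \lVert u \rVert$ and $\hat{v} = v / \lVert v \rVert$ for nonzero $u, v$, and $d_{\cos}(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert \lVert v \rVert}$:

```latex
\lVert \hat{u} - \hat{v} \rVert^2
  = \lVert \hat{u} \rVert^2 + \lVert \hat{v} \rVert^2 - 2\,\hat{u} \cdot \hat{v}
  = 2 - 2\,\frac{u \cdot v}{\lVert u \rVert \lVert v \rVert}
  = 2\, d_{\cos}(u, v)
```

Since $\lVert \hat{u} - \hat{v} \rVert = \sqrt{2\, d_{\cos}(u, v)}$ is strictly increasing in $d_{\cos}$, ranking reference points by Euclidean distance to the normalized query gives the same neighbors as ranking by cosine distance, provided both the query and reference sets are normalized and contain no zero vectors.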
14:35 < rcurtin> if you don't want to normalize, another option would be to use FastMKS (fast max-kernel search)
14:35 < rcurtin> however, the bounds are tighter for pruning with nearest neighbor search, so unless there is a good reason to avoid normalization I think that may be the better strategy
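The normalize-then-Euclidean strategy can be sketched as below. This is a minimal illustration, assuming a brute-force scan as a stand-in for mlpack's kd-tree `NeighborSearch` (which would do the same ranking with pruning); `Normalize`, `SquaredEuclidean`, and `NearestByCosine` are hypothetical helper names, not mlpack API.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

using Vec = std::vector<double>;

// Scales v to unit L2 norm. A zero vector has no direction, so reject it.
Vec Normalize(const Vec& v)
{
  double norm = 0.0;
  for (double x : v)
    norm += x * x;
  norm = std::sqrt(norm);
  if (norm == 0.0)
    throw std::invalid_argument("cannot normalize a zero vector");
  Vec out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = v[i] / norm;
  return out;
}

double SquaredEuclidean(const Vec& a, const Vec& b)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    const double d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Brute-force stand-in for a tree-based search: returns the index of the
// reference closest to the query in Euclidean distance after normalizing
// both, which is also the reference with the smallest cosine distance.
std::size_t NearestByCosine(const std::vector<Vec>& references, const Vec& query)
{
  const Vec q = Normalize(query);
  std::size_t best = 0;
  double bestDist = std::numeric_limits<double>::infinity();
  for (std::size_t i = 0; i < references.size(); ++i)
  {
    const double d = SquaredEuclidean(Normalize(references[i]), q);
    if (d < bestDist)
    {
      bestDist = d;
      best = i;
    }
  }
  return best;
}
```

For example, with references `{1, 0}` and `{10, 10}` and query `{2, 1.8}`, plain Euclidean search picks the first point, but after normalization the second point wins because its direction is much closer to the query's. In a real pipeline the references would be normalized once up front rather than inside the loop.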
14:37 < rcurtin> if you're interested in reading more, the paper http://www.ratml.org/pub/pdf/2014fastmks.pdf has a description of the max-kernel search problem and when it reduces to nearest neighbor search
14:37 < rcurtin> but I am not sure how interesting the paper is. my perspective is biased :)
14:48 -!- Trion [~trion@122.173.213.239] has quit [Quit: Entering a wormhole]
14:48 < wenhao> rcurtin: Thanks! The mks problem sounds interesting. What's the time complexity if I use fastmks to search for k neighbors (k > 1) instead of only one neighbor with maximal kernel?
14:48 -!- mikeling [uid89706@gateway/web/irccloud.com/x-icelohqzydpdbxjq] has joined #mlpack
14:49 < wenhao> I am comparing which solution could be faster
15:04 < rcurtin> hm, so the asymptotic time complexity depends on all kinds of strange constants that we probably won't know
15:04 < rcurtin> but in reality the FastMKS algorithm requires building a cover tree in kernel space, and usually that takes a lot longer than building a kd-tree in the original space
15:05 < rcurtin> hang on, I have a meeting, back later...
15:32 < Atharva> rcurtin: Sorry, I just realized I misspelled your nick in my second-last message.
15:46 < rcurtin> Atharva: no worries, I didn't notice :)
17:08 -!- mikeling [uid89706@gateway/web/irccloud.com/x-icelohqzydpdbxjq] has quit [Quit: Connection closed for inactivity]
18:14 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Read error: Connection reset by peer]
18:17 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
18:38 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 244 seconds]
19:41 -!- travis-ci [~travis-ci@ec2-50-19-20-167.compute-1.amazonaws.com] has joined #mlpack
19:41 < travis-ci> ShikharJ/mlpack#168 (DCGAN - 1f6ee54 : Shikhar Jaiswal): The build has errored.
19:41 < travis-ci> Change view : https://github.com/ShikharJ/mlpack/compare/38e691ba9b9a^...1f6ee54a2196
19:41 < travis-ci> Build details : https://travis-ci.org/ShikharJ/mlpack/builds/386825163
19:41 -!- travis-ci [~travis-ci@ec2-50-19-20-167.compute-1.amazonaws.com] has left #mlpack []
21:25 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
--- Log closed Sat Jun 02 00:00:45 2018