mlpack IRC logs, 2018-06-29

Logs for the day 2018-06-29 (starts at 0:00 UTC) are shown below.

--- Log opened Fri Jun 29 00:00:24 2018
02:27 -!- yaswagner [4283a544@gateway/web/freenode/ip.66.131.165.68] has quit [Quit: Page closed]
04:33 < jenkins-mlpack> Project docker mlpack weekly build build #48: STILL UNSTABLE in 3 hr 46 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20weekly%20build/48/
04:33 < jenkins-mlpack> * akhandait45: removed reduntant data members, moved NegativeLogLikelihood to loss
04:33 < jenkins-mlpack> * akhandait45: moved NegativeLogLikelihood to loss folder
04:33 < jenkins-mlpack> * akhandait45: rectified mistake in last commit, added comment
04:33 < jenkins-mlpack> * akhandait45: changed path
04:34 < jenkins-mlpack> * akhandait45: added sampling layer
04:34 < jenkins-mlpack> * akhandait45: removed parameters, it just does the reparametrization now
04:34 < jenkins-mlpack> * akhandait45: sampling layer done, kl divergence forward implemented
04:34 < jenkins-mlpack> * akhandait45: kl backward implemented
04:34 < jenkins-mlpack> * akhandait45: fix style errors
04:34 < jenkins-mlpack> * akhandait45: suggested changes made
04:34 < jenkins-mlpack> * akhandait45: seed removed
04:34 < jenkins-mlpack> * akhandait45: changed names in cmakelists
04:34 < jenkins-mlpack> * akhandait45: removed build errors
04:34 < jenkins-mlpack> * akhandait45: corrected kl forward and backward
04:34 < jenkins-mlpack> * akhandait45: added numerical gradient test
04:34 < jenkins-mlpack> * akhandait45: gradient check passed, removed redundant lines
04:34 < jenkins-mlpack> * akhandait45: corrections made in kl, more tests added
06:26 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 248 seconds]
06:29 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
08:01 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 260 seconds]
08:11 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
09:24 < Atharva> zoq: Are you there?
09:59 < zoq> Atharva: I'm now.
10:42 < Atharva> In network_init.hpp, what is offset variable for?
10:58 < zoq> Atharva: So all network parameters/weights are stored in a single matrix (contiguous memory). The idea is that each layer uses a specific part of the parameter matrix, and offset marks the beginning for each layer.
10:58 < zoq> Atharva: So for the first layer offset is 0, and let's say the layer is of size 10, so the offset for the next layer is 10; if the second layer is of size 20, the offset for the layer after that would be 10 + 20. Hopefully, this was helpful?
10:59 < Atharva> zoq: Understood, thanks!
11:00 < zoq> Atharva: Here to help.
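
The offset bookkeeping zoq describes can be sketched as a running sum over layer sizes. This is a minimal illustration in plain Python, not the code in network_init.hpp; the names `layer_offsets`, `layer_slice`, `sizes` and `params` are hypothetical, and a Python slice is a copy, whereas the real layers are assumed to alias the shared memory.

```python
def layer_offsets(layer_sizes):
    """Return the start index of each layer inside one flat parameter buffer."""
    offsets = []
    offset = 0
    for size in layer_sizes:
        offsets.append(offset)  # this layer begins where the previous ones end
        offset += size
    return offsets


def layer_slice(parameters, layer_sizes, index):
    """Return the part of the flat buffer that belongs to layer `index`.

    Note: a Python list slice is a copy, so this only illustrates the indexing.
    """
    start = layer_offsets(layer_sizes)[index]
    return parameters[start:start + layer_sizes[index]]


sizes = [10, 20, 5]                # hypothetical per-layer parameter counts
params = list(range(sum(sizes)))   # stand-in for the single parameter matrix
print(layer_offsets(sizes))        # [0, 10, 30]
```

With sizes 10 and 20 for the first two layers, the third layer starts at 10 + 20 = 30, matching the example in the conversation.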
11:23 < jenkins-mlpack> Project docker mlpack nightly build build #364: SUCCESS in 4 hr 9 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20nightly%20build/364/
11:41 < Atharva> zoq: Sorry to disturb again, you there?
11:42 < Atharva> For the ann input/output PR, @rcurtin commented if it was possible to remove the inSize parameter from the first layer as well and not just the subsequent layers
11:44 < Atharva> I have found a way but it involves adding one argument to the ResetParameters() function.
11:44 < Atharva> which will be input.n_rows of course
11:49 < Atharva> Or there is another way but in that case we will need to stop using ResetParameters() function ourselves and only allow the forward and train functions to call it.
11:59 < zoq> Does this mean that each layer would have to make sure that ResetParameters() is called at least once, so if we just use the layer independently we have to make sure the function is called in the forward pass right?
12:02 < Atharva> Yes, ResetParameters() has to be called at least once, because that's where we go over the network and set the inSizes and weights for the layers. Till then inSize for all layers is zero.
12:03 < Atharva> But in some cases, ResetParameters() has been used before externally the Forward() or Train() functions. In that case the network has no way of knowing what the size of the input data is.
12:04 < Atharva> used externally before*
12:04 < sumedhghaisas> Atharva: I was just taking a look at the jacobian test for NormalDistribution
12:04 < sumedhghaisas> I am not sure I understand your jacobian test
12:05 < zoq> ResetParameters: I see, so do you think we could provide both versions, one version expects that inSize is already set and the other takes the inSize from the provided dataset?
12:07 < sumedhghaisas> In the jacobian test you perturb the input and check the approximate gradient with real one. But I see that you are perturbing mean and variance instead
12:08 < Atharva> I was confused a lot over that too, but I think the mean and variance in this case can be considered input, and the observation is just the target.
12:08 < Atharva> In the LogProbBackward function, the gradients are w.r.t. the mean and std
12:08 < sumedhghaisas> the check is delta_output / delta_input, and the weights, in this case mean and variance, will be kept constant
12:09 < sumedhghaisas> ahh... I see what you mean
12:09 < Atharva> Also, in the ReconstructionLoss, we receive the mean and std as input which we then forward to the NormalDistribution
12:12 < Atharva> zoq: Yes we can keep both the ways, but I think we will need to use multiple definitions of ResetParameters() then
12:13 < Atharva> and if someone calls ResetParameters() without any parameter and doesn't provide input size for the first layer, it will throw an error.
12:13 < zoq> Atharva: Agreed, but at least for now I think that way we can provide backward compatibility.
12:13 < sumedhghaisas> Atharva: ahh I understand the test now. Can you tell me how much is the difference between the two?
12:14 < Atharva> zoq: Sorry I don't understand what you mean by backward compatibility
12:14 < zoq> Atharva: We could check the inSize parameter and provide some reasonable output for the user.
12:14 < zoq> Atharva: I can use the current code without any changes.
12:15 < Atharva> sumedhghaisas: We get 5000 something when we should get 1e-5
12:16 < sumedhghaisas> 5000?
12:17 < sumedhghaisas> then something is terribly wrong
12:17 < Atharva> zoq: Yes that can be done, but another ResetParameters() definition is needed which will be used internally. People can still use the ann module the way it is noe.
12:17 < Atharva> sumedhghaisas: Yes, are we messing something big up, like conceptually
12:18 < Atharva> zoq: now*
12:18 < sumedhghaisas> hmm. thats scary.
12:19 < sumedhghaisas> I haven't taken a look at the derivatives yet, got a meeting for an hour, I will come back from it and take a look at the gradients :)
12:20 < zoq> Atharva: If a user can use it the way they use it now, I don't see any reason against the idea.
12:20 < Atharva> zoq: Okay, I will push a commit then
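
The two ways of resetting parameters discussed above can be sketched as follows. This is a hypothetical toy class, not mlpack's FFN: `TinyNetwork`, `reset_parameters`, `input_rows` and `first_in_size` are illustrative names, and Python's optional argument stands in for the two ResetParameters() definitions mentioned in the conversation.

```python
class TinyNetwork:
    """Toy network that only tracks layer sizes and a flat weight buffer."""

    def __init__(self, layer_out_sizes, first_in_size=0):
        self.out_sizes = list(layer_out_sizes)
        # Only the first layer may receive an explicit input size; the rest
        # stay at zero until reset_parameters() walks the network.
        self.in_sizes = [first_in_size] + [0] * (len(self.out_sizes) - 1)
        self.weights = None

    def reset_parameters(self, input_rows=None):
        # New path: take the first input size from the data.
        if input_rows is not None:
            self.in_sizes[0] = input_rows
        # Old path: the size must already have been set by the user.
        if self.in_sizes[0] == 0:
            raise ValueError("input size of the first layer is unknown")
        # Every later layer consumes the output of the previous one.
        for i in range(1, len(self.out_sizes)):
            self.in_sizes[i] = self.out_sizes[i - 1]
        total = sum(i * o for i, o in zip(self.in_sizes, self.out_sizes))
        self.weights = [0.0] * total

    def forward(self, data):
        # `data` is a list of rows, so len(data) plays the role of input.n_rows.
        if self.weights is None:
            self.reset_parameters(input_rows=len(data))
        return self.in_sizes


net = TinyNetwork([4, 2])                       # no input size given
print(net.forward([[0.0], [0.0], [0.0]]))       # [3, 4]

legacy = TinyNetwork([4, 2], first_in_size=3)   # current usage keeps working
legacy.reset_parameters()
```

Calling `reset_parameters()` with no argument on a network whose first input size was never given raises an error, which mirrors the failure case Atharva describes.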
12:21 < Atharva> sumedhghaisas: Sure!
14:22 < ShikharJ> zoq: Are you there?
15:08 < zoq> Atharva: I'm here now.
15:08 < zoq> ShikharJ: I'm here now.
15:08 < zoq> ... wrong name :)
15:11 < ShikharJ> zoq: I had a doubt regarding the WGAN algorithm. In the usual GAN, we label the real images as one and fake ones as 0, so that we can obtain log(D(x)) + log(1 - D(G(x))) using CrossEntropy loss function.
15:13 < ShikharJ> zoq: But according to the WGAN algorithm, we need to obtain a simple D(x) - D(G(x)), so I'm not sure if any labelling at all should be done or not.
15:13 < ShikharJ> zoq: If any labelling is to be done, I'm not sure if both the real and the fake ones should have the same label or different labels.
15:19 < zoq> ShikharJ: In case of Wasserstein, it would make sense to use -1 for the generated samples and 1 for the real one, since the output doesn't use an activation on top. But I don't think we will see any difference if we use something else.
15:24 < ShikharJ> zoq: Ah, I see, and anyways, we're trying to maximize the distance between the D(x) and D(G(x)), so it makes sense.
16:09 -!- Netsplit *.net <-> *.split quits: zoq
16:13 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Read error: Connection reset by peer]
16:14 -!- Netsplit over, joins: zoq
16:18 -!- prakhar_code[m] [prakharcod@gateway/shell/matrix.org/x-qvpztfxkelkcscdj] has quit [Ping timeout: 255 seconds]
16:18 -!- killer_bee[m] [killerbeem@gateway/shell/matrix.org/x-enyhokurzmhbrctx] has quit [Ping timeout: 247 seconds]
16:20 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
16:21 -!- manish7294 [8ba7cbd6@gateway/web/freenode/ip.139.167.203.214] has joined #mlpack
16:27 < manish7294> rcurtin: zoq: Can I take 4 days off starting tomorrow? I have to attend my cousin's wedding ceremony :)
16:27 < manish7294> I think I can be available on IRC for discussions during first two days at least.
16:57 < Atharva> sumedhghaisas: Did you get a chance to look at the code?
17:04 -!- prakhar_code[m] [prakharcod@gateway/shell/matrix.org/x-luwvhyxgptnavtun] has joined #mlpack
17:06 < sumedhghaisas> Atharva: I am checking right now :)
17:20 < sumedhghaisas> Atharva: Just a quick question, are you storing the variance or the standard deviation? Cause the Softplus must be applied to variance
17:20 < Atharva> I am soring the standard deviation, why is that?
17:20 < Atharva> storing*
17:22 < sumedhghaisas> ahh sorry i meant standard deviation or log standard deviation. But I see that you are using standard deviation later on as well. never mind :)
17:23 < Atharva> Okayy :)
17:32 < sumedhghaisas> Atharva: could you run the test without Softplus?
17:32 < sumedhghaisas> just want to know where the exact error is
17:34 < Atharva> Okay, give me a minute
17:34 < sumedhghaisas> and also we should have with and without Softplus :)
17:36 < Atharva> Do you mean the tests with and without Softplus?
17:37 -!- killer_bee[m] [killerbeem@gateway/shell/matrix.org/x-wvdjelpgxdrufsgw] has joined #mlpack
17:38 < sumedhghaisas> yup
17:38 < Atharva> Okay, I will add another
17:38 < sumedhghaisas> okay the math seems correct on the first glance
17:38 < sumedhghaisas> and the test also looks correct
17:38 < sumedhghaisas> hmmm
17:39 < sumedhghaisas> interesting
17:39 -!- manish7294 [8ba7cbd6@gateway/web/freenode/ip.139.167.203.214] has quit [Ping timeout: 260 seconds]
17:39 < Atharva> Sorry I was on another branch, so it's taking time to build the tests
17:39 < sumedhghaisas> no problem
17:40 < sumedhghaisas> okay couple of other pointers
17:40 < sumedhghaisas> first is to see if without softplus passes
17:40 < sumedhghaisas> could you also run the test with 1 target element and try to analyze the results?
17:40 < Atharva> Okay, will do that
17:41 < sumedhghaisas> there should be just one entry in the matrix so easy to follow
17:43 < Atharva> It just passed without softplus
17:43 < sumedhghaisas> Atharva: aha...
17:47 < sumedhghaisas> okay need to run to another meeting. But now that we know the problem lies in Softplus, it's easy to find.
17:47 < sumedhghaisas> I will take a look later again if you haven't solved it by then.
17:48 < Atharva> Hopefully I will have :)
18:11 < ShikharJ> zoq: Do you think I could implement an identity loss layer for which the forward routine just returns -arma::accu(target % (input + eps))?
18:21 < ShikharJ> zoq: Saying this because the FFN by default sets the loss layer to negative log-likelihood, and the WGAN paper is against using any kind of log or exponent based loss function.
18:26 < ShikharJ> zoq: Or if it's fine, I can try to implement the same in the Evaluate function itself, no worries there.
18:37 < zoq> ShikharJ: Implementing an identity loss layer might be the cleanest solution, do you prefer to implement it inside the evaluation function?
18:38 < ShikharJ> zoq: I can implement it inside the loss_functions directory. Similar for the loss routine for WGANGradientPenalty.
18:39 < zoq> ShikharJ: Yeah, I like the idea.
18:40 < ShikharJ> zoq: Ideally we're computing an approximation of the Kantorovich-Rubinstein duality form of the Wasserstein-1 (or Earth Mover) distance. Do you want it to be named as such?
18:43 < zoq> ShikharJ: Don't have a preference here, each one sounds reasonable to me.
18:44 < ShikharJ> zoq: Then I'll name it EarthMover distance loss or something similar, so that a user has an idea of where the code is intended to be used.
18:45 < zoq> ShikharJ: Sounds perfect to me, don't think there are any confusions with another part of the codebase.
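
The forward pass ShikharJ proposes, -arma::accu(target % (input + eps)), can be sketched in plain Python. It is combined here with the +1 / -1 labelling zoq suggested earlier for real and generated samples. The function name `earth_mover_forward`, the default `eps` and the sample scores are assumptions for illustration, not the eventual mlpack layer.

```python
def earth_mover_forward(scores, targets, eps=1e-8):
    """Return -sum(target * (score + eps)) over a batch of critic outputs."""
    return -sum(t * (s + eps) for s, t in zip(scores, targets))


# Hypothetical critic outputs: two real samples, then two generated ones.
scores = [0.9, 0.7, 0.2, -0.1]
targets = [1.0, 1.0, -1.0, -1.0]   # +1 for real, -1 for generated

loss = earth_mover_forward(scores, targets)
print(loss)   # about -1.5, i.e. -((0.9 + 0.7) - (0.2 - 0.1))
```

With these labels the loss equals the negated difference between the summed scores on real and generated samples, so minimizing it pushes D(x) and D(G(x)) apart without any log or exponent.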
18:48 < zoq> manish7294: Thanks for the heads up, as from my side this is absolutely fine, have fun :)
18:50 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 256 seconds]
19:26 < Atharva> sumedhghaisas: It was due to the fact that the approximate jacobian was calculated w.r.t. the standard deviation and logProbBackward was w.r.t. pre standard deviation
19:26 < Atharva> I tried perturbing pre standard deviation and the test passed
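
The mismatch Atharva found can be reproduced with a small numerical check. The sketch assumes a univariate normal log-probability and std = softplus(pre_std); the function names and test values are hypothetical and are not taken from the NormalDistribution code under review.

```python
import math


def softplus(p):
    return math.log1p(math.exp(p))


def sigmoid(p):
    return 1.0 / (1.0 + math.exp(-p))


def log_prob(x, mean, std):
    """Log density of a univariate normal distribution."""
    return (-math.log(std) - 0.5 * math.log(2.0 * math.pi)
            - (x - mean) ** 2 / (2.0 * std ** 2))


def grad_wrt_std(x, mean, std):
    """Analytic derivative of log_prob with respect to std."""
    return -1.0 / std + (x - mean) ** 2 / std ** 3


def grad_wrt_pre_std(x, mean, pre_std):
    """Chain rule through softplus: d(softplus)/dp = sigmoid(p)."""
    return grad_wrt_std(x, mean, softplus(pre_std)) * sigmoid(pre_std)


def central_difference(f, v, h=1e-5):
    """Approximate df/dv by perturbing v in both directions."""
    return (f(v + h) - f(v - h)) / (2.0 * h)


x, mean, pre_std = 1.5, 0.2, 0.3
analytic = grad_wrt_pre_std(x, mean, pre_std)

# Perturbing the pre-softplus value agrees with the analytic gradient.
numeric_pre = central_difference(
    lambda p: log_prob(x, mean, softplus(p)), pre_std)

# Perturbing the standard deviation itself measures a different derivative.
numeric_std = central_difference(
    lambda s: log_prob(x, mean, s), softplus(pre_std))

print(abs(analytic - numeric_pre), abs(analytic - numeric_std))
```

The first difference is tiny while the second is not, because the two derivatives differ by the factor sigmoid(pre_std).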
21:23 < ShikharJ> zoq: rcurtin : Are you guys online?
21:26 < zoq> ShikharJ: About to step out, but I'm still here.
21:27 < ShikharJ> zoq: I was just thinking what would be the correct way of finding the column wise L2 norm of a matrix using armadillo's functions?
21:30 < ShikharJ> zoq: Considering an MxN matrix, we need to find the individual norms of the n columns?
21:38 < zoq> ShikharJ: Actually, I'm not sure there is an armadillo function that returns the norm for each col, so probably a loop is necessary. What you could do is to search the codebase for "L2 distance" or "euclidean norm".
21:39 < ShikharJ> zoq: There isn't, but the issue is that WGAN Gradient Penalty algorithm calculates the norm for each single input in the mini-batch, so I guess I'll go with the loop :)
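
The loop ShikharJ settles on can be sketched as below, with a list of rows standing in for the M x N matrix and each column treated as one input of the mini-batch. The name `column_l2_norms` and the sample batch are illustrative only.

```python
import math


def column_l2_norms(matrix):
    """Return the L2 norm of every column of a matrix given as a list of rows."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    norms = []
    for col in range(n_cols):
        # Sum of squares down one column, then the square root.
        squared = sum(matrix[row][col] ** 2 for row in range(n_rows))
        norms.append(math.sqrt(squared))
    return norms


batch = [[3.0, 0.0, 1.0],
         [4.0, 2.0, 2.0],
         [0.0, 0.0, 2.0]]
print(column_l2_norms(batch))   # [5.0, 2.0, 3.0]
```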
21:40 < ShikharJ> zoq: The WGAN PR is now complete, I'll push the code in sometime, and tmux the builds tomorrow to see how we fare!
21:58 < rcurtin> manish7294: of course, enjoy the wedding ceremony!
21:59 < rcurtin> I will have some more theory for you by the time you get back
22:02 < rcurtin> things have been slow for me here, a lot to do before leaving the job next week
22:02 < ShikharJ> rcurtin: Till when can we expect the benchmark systems to remain functional?
22:05 < rcurtin> ShikharJ: at least Friday; I will bring up replacements, but they will not be as powerful
22:06 < ShikharJ> rcurtin: It's all good, at least we would be finished with our planned goals and experiments by then :)
22:10 < rcurtin> right, well in any case I will be sure that we have something available to run experiments on
22:11 < ShikharJ> rcurtin: If I may ask, what was your reason for leaving Symantec?
22:25 < rcurtin> lack of work aligned with my research interests
22:25 < rcurtin> malware is an interesting problem, but I'm really interested in accelerating algorithms
22:25 < rcurtin> and there was not so much space for that inside symantex
22:25 < rcurtin> symantec*
22:25 < rcurtin> it wasn't a huge company need
22:26 < ShikharJ> Ah, I can understand, accelerating stuff is a highly rewarding feeling :)
22:28 < ShikharJ> Even I hate being assigned work I have no interest in :P Hope you have a good time in your new position.
23:08 < rcurtin> I hope so too, thanks :)
--- Log closed Sat Jun 30 00:00:25 2018