mlpack IRC logs, 2018-02-08

Logs for the day 2018-02-08 (starts at 0:00 UTC) are shown below.

--- Log opened Thu Feb 08 00:00:03 2018
01:39 -!- govg [~govg@unaffiliated/govg] has quit [Ping timeout: 276 seconds]
02:02 -!- travis-ci [] has joined #mlpack
02:02 < travis-ci> mlpack/mlpack#3885 (master - 38d2155 : Marcus Edel): The build has errored.
02:02 < travis-ci> Change view :
02:02 < travis-ci> Build details :
02:02 -!- travis-ci [] has left #mlpack []
03:31 -!- govg [~govg@unaffiliated/govg] has joined #mlpack
05:28 -!- daivik_ [7d10a738@gateway/web/cgi-irc/] has joined #mlpack
06:13 -!- shenghac [df8c681e@gateway/web/freenode/ip.] has joined #mlpack
06:17 -!- kaushik_ [uid193796@gateway/web/] has joined #mlpack
06:26 -!- shenghac [df8c681e@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]
10:07 -!- kaushik_ [uid193796@gateway/web/] has quit [Quit: Connection closed for inactivity]
11:07 -!- HeikoS [] has joined #mlpack
11:23 -!- ironstark [uid221607@gateway/web/] has quit [Ping timeout: 256 seconds]
11:47 -!- daivik_ [7d10a738@gateway/web/cgi-irc/] has quit [Quit: - A hand crafted IRC client]
11:47 -!- daivik [7d10a738@gateway/web/cgi-irc/] has joined #mlpack
12:19 -!- daivik [7d10a738@gateway/web/cgi-irc/] has quit [Quit: - A hand crafted IRC client]
12:34 -!- rcurtin [] has quit [Ping timeout: 260 seconds]
12:34 -!- rcurtin [] has joined #mlpack
12:34 -!- Topic for #mlpack: -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs:
12:34 -!- Topic set by naywhayare [] [Wed May 21 16:24:10 2014]
12:34 [Users #mlpack]
12:34 [ anirudhm ] [ gtank ] [ K4k ] [ lozhnikov ] [ rcurtin] [ zoq]
12:34 [ djhoulihan] [ HeikoS ] [ keonkim ] [ nikhilweee] [ vivekp ]
12:34 [ govg ] [ jenkins-mlpack] [ killer_bee[m]] [ petris ] [ wiking ]
12:34 -!- Irssi: #mlpack: Total of 16 nicks [0 ops, 0 halfops, 0 voices, 16 normal]
12:34 -!- Home page for #mlpack:
12:34 -!- Channel #mlpack created Tue Oct 11 18:35:40 2011
12:35 -!- Irssi: Join to #mlpack was synced in 34 secs
12:58 -!- HeikoS [] has quit [Ping timeout: 255 seconds]
13:19 -!- ironstark [uid221607@gateway/web/] has joined #mlpack
13:31 -!- govg [~govg@unaffiliated/govg] has quit [Ping timeout: 256 seconds]
13:43 -!- govg [~govg@unaffiliated/govg] has joined #mlpack
13:47 -!- travis-ci [] has joined #mlpack
13:47 < travis-ci> mlpack/mlpack#3888 (master - 4bd01bb : Marcus Edel): The build has errored.
13:47 < travis-ci> Change view :
13:47 < travis-ci> Build details :
13:47 -!- travis-ci [] has left #mlpack []
14:55 < zoq> rcurtin: Maybe you already tested this, do we have to use sudo on travis for the pip step?
14:56 < rcurtin> ah, yeah, sorry, I think that is true
14:56 < rcurtin> since I did my testing as root, I did not think about that
14:57 < rcurtin> it looks like the test run timed out:
14:58 < rcurtin> we could see if the python fix worked by just adding a test step (to be run from the build directory):
14:58 < rcurtin> LD_LIBRARY_PATH=lib/ PYTHONPATH=src/mlpack/bindings/python/ python3 src/mlpack/bindings/python/ test
14:58 < rcurtin> and that should show the 'setuptools too old' error if there is one
15:06 < zoq> I think the cache would kick in once we install the python packages without sudo, let's see.
15:09 < rcurtin> maybe sudo is not necessary, I am not sure; you could try without it
15:10 < zoq> I guess the issue is that we cache '$HOME/.cache/pip' which is not correct if we use sudo.
15:10 < rcurtin> yeah; as long as python will correctly use the locally-installed up-to-date version of setuptools it should be no problem
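One way to confirm which copy of setuptools the interpreter would actually pick up, sketched here with the Python standard library only. The helper name `describe_package` is hypothetical and is not part of mlpack or of the Travis configuration discussed above; it only reports what the running interpreter sees.

```python
import importlib.metadata
import importlib.util
import site


def describe_package(name):
    """Report the version and location of the copy of `name` that this
    interpreter would import, or None if no such package is found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        # Importable, but no installed distribution metadata under this name.
        version = "unknown"
    # spec.origin is the file backing the module (may be None for
    # namespace packages, hence the fallback to an empty string).
    origin = spec.origin or ""
    # site.getusersitepackages() is where `pip install --user` puts packages.
    in_user_site = origin.startswith(site.getusersitepackages())
    return {"version": version, "origin": origin, "in_user_site": in_user_site}


if __name__ == "__main__":
    print(describe_package("setuptools"))
```

Running this once with and once without sudo would show whether the two invocations resolve to the same installation, which is the condition rcurtin describes.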
15:37 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
16:16 -!- daivik [9d31f634@gateway/web/cgi-irc/] has joined #mlpack
16:16 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 256 seconds]
16:38 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
16:47 -!- daivik [9d31f634@gateway/web/cgi-irc/] has quit [Quit: - A hand crafted IRC client]
16:55 -!- daivik [9d31f634@gateway/web/cgi-irc/] has joined #mlpack
17:09 < daivik> rcurtin: In continuation to the discussion yesterday: I think I may have found a fix for the problem we were facing, but I don't know whether it should be a permanent solution - please let me know what you think.
17:09 < daivik> So, when we left off - we knew Emission distribution had dimensionality 0 for some reason. I printed the emission distribution like you suggested and found that:
17:09 < daivik> 1. emission.size() was two times the number of states. I later found that the reason for this was a stray emission.resize() right before the loading step (in hmm_model.hpp). Commenting this out corrected the size of the emission vector.
17:09 < daivik> 2. Each emission[i] had dimensionality one more than expected. For some reason (which I'm yet to figure out), when a DiscreteDistribution is saved (using something like ar & BOOST_SERIALIZATION_NVP(dd)) and then reloaded (again using BOOST_SERIALIZATION_NVP(dd)), the dimensionality increases by one.
17:09 < daivik> Please run this simplified program I've written as a POC ( In order to fix this, I added a check in DiscreteDistribution::serialize() to erase the first element of the probabilities vector whenever a load occurs (because the first element was the extra/useless thing that was being added).
17:09 < daivik> I have a few more points to add to this - each of which I'm fairly certain is an issue:
17:09 < daivik> 1. None of the serialization tests report the error with DiscreteDistribution::serialize() when I run mlpack_test. IMHO, something should have failed and alerted about this.
17:09 < daivik> 2. My "fix" is not really a fix. I still do not know why the extra dimension is being added on its own. And while mlpack_test shows no errors (related to this), how do I know that I haven't broken something?
17:09 < daivik> In summation, what do I do now? Do I submit a pull request with my "fix"? Do I open an issue? Do I proceed with trying to find out why the extra dimension is being mysteriously added? Do I write tests/more tests for serialization/hmm CLI bindings?
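The kind of test daivik is asking for is a save-and-reload round trip that compares dimensionality before and after. A minimal sketch follows, assuming a toy class and JSON in place of mlpack's DiscreteDistribution and boost::serialization; `ToyDiscreteDistribution` and `check_round_trip` are hypothetical names for illustration, not mlpack's test suite.

```python
import json


class ToyDiscreteDistribution:
    """Toy stand-in: holds one list of probabilities per dimension."""

    def __init__(self, probabilities):
        self.probabilities = [list(p) for p in probabilities]

    def dimensionality(self):
        return len(self.probabilities)

    def save(self):
        return json.dumps({"probabilities": self.probabilities})

    @classmethod
    def load(cls, text):
        return cls(json.loads(text)["probabilities"])


def check_round_trip(dist):
    """Raise AssertionError if saving and reloading changes the object."""
    reloaded = type(dist).load(dist.save())
    assert reloaded.dimensionality() == dist.dimensionality()
    assert reloaded.probabilities == dist.probabilities
    return reloaded


check_round_trip(ToyDiscreteDistribution([[0.25, 0.75]]))
```

A loader that adds an extra leading element, as described in point 2 above, would fail the first assertion in `check_round_trip`.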
17:14 < rcurtin> daivik: thanks for digging so deep. let me look into this and get back to you in a little while (I am about to go into a meeting)
18:29 < rcurtin> daivik: I tried running the POC code you gave, but it looks like on my system, DiscreteDistribution::serialize() seems to behave properly:
18:29 < rcurtin>
18:30 < rcurtin> I do think you are correct that emission.resize() is not necessary; I know why it is that way...
18:31 < rcurtin> it used to be that the code was structured differently and we couldn't use BOOST_SERIALIZATION_NVP() to serialize a vector type so we had to serialize the elements individually (long story)
18:31 < rcurtin> when we switched it to use boost serialization as it was intended, we refactored that code, but I guess the emissions.resize() stayed around by accident
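One way a leftover resize can produce the "twice the number of states" symptom is when the container is pre-sized and the loader then appends to it. The toy model below illustrates only that mechanism; it assumes an appending loader, and `load_emissions` is a hypothetical function, not the code in hmm_model.hpp or in Boost.

```python
def load_emissions(saved, emissions):
    """Hypothetical loader: appends every saved entry to `emissions`.
    It implicitly assumes the caller passes in an empty list."""
    for entry in saved:
        emissions.append(entry)
    return emissions


saved = ["dist0", "dist1"]  # a saved model with two states

# Stray pre-sizing, analogous to a leftover resize() before loading.
emissions = [None] * len(saved)
load_emissions(saved, emissions)
print(len(emissions))  # 4: two placeholders plus two loaded entries

# Without the pre-sizing, the size matches the number of states.
emissions = []
load_emissions(saved, emissions)
print(len(emissions))  # 2
```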
18:38 < daivik> can you please also verify whether you're having problem 1. on your system. that is, try saving a model file using hmm_train and then reusing that in hmm_loglik
18:39 < daivik> the resize() should cause problems on any system
18:40 < daivik> also, what might be the reason for the added dimension i get on my system (I'll paste the output I get in a minute when the build finishes)
18:40 < rcurtin> sure, let's see what that does
18:45 < rcurtin> daivik: first I copied the exact model XML from the pastebin you gave, and that appears to work correctly:
18:45 < rcurtin> [INFO ] Loading 'test.csv' as raw ASCII formatted data. Size is 1 x 3.
18:45 < rcurtin> log_likelihood: -7.73293
18:47 < daivik> thats very strange -- i had to dig through a lot of code to get that
18:48 < rcurtin> and if I try to build the model the same way I think you did (mlpack_hmm_train -i train.csv -n 2 -t discrete -M model2.xml), then run with that model, it successfully completes with a similar log likelihood
18:48 < rcurtin> hmmm, what version of boost are you using?
18:50 < daivik> 1.58
18:51 < rcurtin> I am using 1.62 here; I wonder if this is some boost serialization versioning issue
18:51 < rcurtin> I say this because I am currently debugging some serialization failure on 1.66...
18:51 < daivik> right, i should really upgrade
18:52 < rcurtin> still, we do support boost 1.58 so if the issue is in boost we'll need to come up with some kind of workaround...
18:53 < daivik> geez, all this for just a version issue. I've learnt my lesson now
18:53 < rcurtin> well, if upgrading does fix it, we should still at least open an issue somewhere, because I am sure you will not be the last boost 1.58 user :)
18:55 < daivik> i guess i'll upgrade -- probably lose the issue -- and then carry on writing the tests for hmm CLI bindings :)
18:55 < daivik> I'll let you know if upgrading resolves the issue
18:56 < daivik> thanks a lot for your help
18:57 < daivik> And I'm very sorry for wasting your time -- I should've known to check the release notes at least
18:59 < rcurtin> no, you are not wasting my time, figuring stuff out like this is important :)
18:59 < rcurtin> keep in mind that sooner or later someone else will come along with the same problem, so it's important to figure out what's going on on your system and try and diagnose it
18:59 < rcurtin> and it can be a time consuming process sometimes
19:00 < rcurtin> for instance, this other serialization issue I mentioned with 1.66, I've probably spent 2 or 3 hours digging into it today
19:00 < rcurtin> what I discovered in the end is that some change from 1.65.1 to 1.66 caused a problem in the mlpack tests, so I was about to submit the bug report to the serialization github repository
19:01 < rcurtin> and then I discovered that actually, there is not a bug in boost::serialization, just unclear behavior that could be improved, and that this behavior has existed since long before 1.66 came out
19:01 < rcurtin> so, I guess I'll fix the mlpack code, not submit the bug to upstream serialization, and then try and figure out what the actual difference was that causes it to fail for 1.66 but not 1.65...
19:05 < rcurtin> so yeah... this is a large part of the important maintenance that has to happen for a project like this, so don't feel bad at all pointing out issues that need to be investigated like this :)
19:13 < daivik> I did get to learn about the boost serialization modules - so I'm not completely disheartened. And yes, you're absolutely right - things like this do take up a lot of time sometimes (and often don't lead to things of much consequence), but I guess they're still important - like you said.
19:19 -!- kaushik_ [uid193796@gateway/web/] has joined #mlpack
21:40 -!- daivik [9d31f634@gateway/web/cgi-irc/] has quit [Quit: - A hand crafted IRC client]
21:47 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
21:55 -!- travis-ci [] has joined #mlpack
21:55 < travis-ci> ShikharJ/mlpack#65 (RBM - 9963311 : Shikhar Jaiswal): The build has errored.
21:55 < travis-ci> Change view :
21:55 < travis-ci> Build details :
21:55 -!- travis-ci [] has left #mlpack []
22:25 < rcurtin> zoq: I don't think the trusty cython3 package will work; it is only version 0.20 but we need 0.24 minimum
22:25 < zoq> rcurtin: ah
22:25 < zoq> thanks, I should download the docker image, and see if I can reproduce the caching issue
22:30 < rcurtin>
22:30 < rcurtin> that looks like it could be helpful but it seems to be cython2 not cython3
22:31 < rcurtin> ah that seems that it does have cython3 in it
22:32 < zoq> okay, let's test that
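The 0.20 versus 0.24 comparison above has to be done component by component, not as text or as a decimal number. A small sketch, assuming purely numeric dotted version strings; `version_tuple` and `meets_minimum` are hypothetical helpers, not part of mlpack's build scripts.

```python
def version_tuple(text):
    """Turn '0.24' or '0.24.1' into (0, 24) or (0, 24, 1)."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum`, compared numerically."""
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("0.20", "0.24"))  # False: the trusty package is too old
print(meets_minimum("0.24", "0.24"))  # True

# Plain string comparison gets this case wrong, since "9" sorts after "2".
print("0.9" >= "0.24")                # True
print(meets_minimum("0.9", "0.24"))   # False
```

Version strings with suffixes such as "rc1" would need a real version parser; this sketch does not handle them.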
22:39 -!- kaushik_ [uid193796@gateway/web/] has quit [Quit: Connection closed for inactivity]
23:36 -!- travis-ci [] has joined #mlpack
23:36 < travis-ci> ShikharJ/mlpack#66 (master - 4bd01bb : Marcus Edel): The build has errored.
23:36 < travis-ci> Change view :
23:36 < travis-ci> Build details :
23:36 -!- travis-ci [] has left #mlpack []
--- Log closed Fri Feb 09 00:00:04 2018