[mlpack] Degenerate cases and other GMM problems

Jaelon joery_vn at hotmail.com
Tue Apr 9 13:58:09 EDT 2013


Hey,

I'm a final-year student and ran into the exact same problem yesterday.
I explained it to my mentor, and he said the cause is probably the
excessive number of zero values in my training set.
His suggestion was to replace the zero values with a large, extremely
unlikely number plus some random noise. The idea sounds interesting and
could work, but I haven't had time to try it, and I think whether it
works depends on what you are trying to do.
In my case I have 15-dimensional data, but not every dimension has a
value available all the time, so I leave the missing ones at zero. For
data like that, giving them a high value instead might just work.
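
A rough sketch of the mentor's suggestion, in plain Python rather than
mlpack, and untested on real data. The function name replace_zeros, the
sentinel value 10.0 and the noise level 0.01 are placeholders made up
for illustration; sensible values depend on the range of the data.

```python
import random


def replace_zeros(rows, sentinel=10.0, noise_std=0.01, seed=0):
    """Return a copy of rows in which every exact zero is replaced by
    sentinel plus a small Gaussian perturbation.

    sentinel and noise_std are illustrative assumptions, not values
    taken from mlpack or from the original data.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    return [
        [x if x != 0.0 else sentinel + rng.gauss(0.0, noise_std) for x in row]
        for row in rows
    ]


# Toy observations: zeros stand for "no value available".
data = [
    [0.2, 0.0, 0.7],
    [0.4, 0.0, 0.1],
    [0.3, 0.5, 0.0],
]
cleaned = replace_zeros(data)
```

The noise matters as much as the sentinel: if every zero were mapped to
the same constant, those points would still have zero spread in that
dimension.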

But perhaps someone with more experience has a better, more definitive
solution for the problem at hand.
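
For reference, the singular-matrix error John describes below can be
reproduced by hand on a toy example. The sketch is plain Python with
made-up 2-D points, not mlpack code. It shows that when one feature is
constant (here always zero) the covariance matrix has determinant zero
and so cannot be inverted. Adding a small constant to the diagonal is a
common general remedy; the value 1e-6 is an arbitrary assumption, and
whether the gmm tool does anything like this is not something this
sketch claims.

```python
def covariance_2d(points):
    """Population covariance matrix of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return [[sxx, sxy], [sxy, syy]]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# The second feature is always zero, so it has no variance at all.
points = [(0.2, 0.0), (0.5, 0.0), (0.9, 0.0)]
cov = covariance_2d(points)
det_raw = det2(cov)  # zero: the matrix is singular

# Hypothetical fix: add a small ridge to the diagonal (1e-6 is arbitrary).
ridge = 1e-6
cov_reg = [
    [cov[0][0] + ridge, cov[0][1]],
    [cov[1][0], cov[1][1] + ridge],
]
det_reg = det2(cov_reg)  # strictly positive, so the matrix is invertible
```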

Good luck,
Joery

On 04/09/2013 07:34 PM, John Demme wrote:
> Hi All-
>
> I'm trying to use mlpack's GMM with some data I've got. I'm not so 
> familiar with the statistical tools used here as I should be, so I've 
> run into some problems that I'm having trouble debugging on my own:
>
> - First, I often get "error: inv(): matrix appears to be singular" 
> during estimation. It appears that during estimation, one (or more) 
> rows and columns of a covariance matrix become 0, and I think this 
> causes it to become non-invertible.
>
> - Second, in cases when estimation completes, I often end up with 
> means, weights and covariances which are all -nan.
>
> I'm not sure whether I'm mis-using the tool or I've got funny data 
> which need to be conditioned. It's six-dimensional, values less than 
> 1.0 and one of the features is very often zero. (I'm wondering if that 
> last bit means that one good gaussian would be zero mean and zero 
> stdev, resulting in a degenerate covariance matrix -- though I don't 
> know enough stat and linear algebra to work this out.) Can someone 
> give some advice?
>
> I've posted a small sub-set of my data which can trigger these problems:
> www.cs.columbia.edu/~jdd/gmm_obs0.csv 
> <http://www.cs.columbia.edu/%7Ejdd/gmm_obs0.csv>
>
> If I run "./gmm -i gmm_obs0.csv -g 5" I can get the first problem. 
> Changing the number of gaussians to 8 results in the second problem.
>
> Thanks in advance,
> John
>
>
> _______________________________________________
> mlpack mailing list
> mlpack at cc.gatech.edu
> https://mailman.cc.gatech.edu/mailman/listinfo/mlpack
