[mlpack] Fwd: Degenerate cases and other GMM problems

Ryan Curtin gth671b at mail.gatech.edu
Wed Apr 10 15:43:46 EDT 2013


On Tue, Apr 09, 2013 at 04:56:01PM -0400, John Demme wrote:
> Thanks for your help, Ryan!
> 
> I'm testing out the latest now. Seems to fix at least one problem. It's
> looking like fixing the other may just be a matter of adding trials until
> it finds something reasonable, though I'm not sure. I'll be looking at it
> closer tomorrow.

After a lot of digging, here is what I discovered.  Your data file
gmm_obs0.csv contains mostly very small values in the fifth dimension (I
think I noted this earlier).  However, one value (point 49230) is an
order of magnitude larger than the rest.

This means that the point's conditional probability with respect to
every cluster evaluates to 0, and at some point the code was normalizing
the conditional probabilities for each point by dividing by their sum.
But if all of those probabilities are 0 for this outlier point, then the
sum is 0, we end up dividing 0 by 0, and then everything turns into
NaNs.
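
To make the failure concrete, here is a minimal standalone sketch of
how it can arise.  It is not mlpack code: GaussianDensity() and
NormalizeNaive() are hypothetical helpers, the 1-D Gaussians are a
simplification, and the numbers are made up rather than taken from
gmm_obs0.csv.

```cpp
#include <cmath>
#include <vector>

// 1-D Gaussian density.  Hypothetical helper for illustration only.
double GaussianDensity(double x, double mean, double stddev)
{
  const double pi = 3.141592653589793;
  const double z = (x - mean) / stddev;
  // For a point far enough from the mean, exp() underflows to exactly 0.
  return std::exp(-0.5 * z * z) / (stddev * std::sqrt(2.0 * pi));
}

// Naive normalization: divides every entry by the sum with no check.
// If every entry is 0, this computes 0 / 0 and fills the result with NaNs.
std::vector<double> NormalizeNaive(std::vector<double> probs)
{
  double sum = 0.0;
  for (double p : probs)
    sum += p;
  for (double& p : probs)
    p /= sum;
  return probs;
}
```

With an outlier such as x = 1000 and two clusters with means 0 and 5
and unit standard deviation, both densities come out as 0 in double
precision, and NormalizeNaive() returns NaN for both entries.  Once a
NaN is in the responsibilities, it propagates into the means,
covariances, and weights on the next update.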

A simple if statement (committed in r14887) fixes this issue and gives
much better results -- or at least, results not filled with NaNs -- on
the same data.
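
For illustration, a guard of this kind might look like the sketch
below.  This is not the r14887 diff: NormalizeGuarded() is a
hypothetical name, and leaving an all-zero row untouched is just one
possible choice for what to do with such a point.

```cpp
#include <vector>

// Hypothetical helper, not the actual mlpack change.
// Normalizes probs in place so that it sums to 1.  If the sum is not
// strictly positive (all zeros, or already NaN), the entries are left
// untouched and false is returned, so no division by zero happens.
bool NormalizeGuarded(std::vector<double>& probs)
{
  double sum = 0.0;
  for (double p : probs)
    sum += p;

  // Written as !(sum > 0.0) so that a NaN sum is also rejected.
  if (!(sum > 0.0))
    return false;

  for (double& p : probs)
    p /= sum;
  return true;
}
```

A caller can use the return value to decide how to treat the point,
for example by skipping it in the parameter update for that iteration.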

So, I think I've fixed the issues, but if you come across more, let me
know.  Thanks for pointing these things out.  :)

Ryan

-- 
Ryan Curtin       | "If it's something that can be stopped, then just try to stop it!"
ryan at igglybob.com |   - Skull Kid


