[mlpack] K-means with HMM

Thuener Silva thuener at gmail.com
Fri May 16 16:31:05 EDT 2014


Great! That is what I'm looking for.
In my test case I'm using a multivariate Gaussian distribution, so I will
also have to estimate the covariances, but that is easy.
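
A minimal sketch of that covariance step, written in plain C++ rather than
with mlpack or Armadillo calls. ClusterStats and EstimateClusterStats are
hypothetical names, not part of any library, and the sketch assumes the
k-means assignments are available as one cluster index per observation.
The covariance of each cluster would then be copied into the emission
distribution of the matching state, next to the centroid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One observation is a point with `dim` coordinates.
using Point = std::vector<double>;
using Matrix = std::vector<std::vector<double>>;

// Hypothetical container for the per-cluster estimates.
struct ClusterStats
{
  Point mean;        // Sample mean of the points in the cluster.
  Matrix covariance; // Sample covariance of the cluster (dim x dim).
  std::size_t count; // Number of points assigned to the cluster.
};

// Hypothetical helper: estimates the mean and covariance of every cluster
// from hard assignments (one cluster index per point).
std::vector<ClusterStats> EstimateClusterStats(
    const std::vector<Point>& points,
    const std::vector<std::size_t>& assignments,
    const std::size_t numClusters,
    const std::size_t dim)
{
  assert(points.size() == assignments.size());

  std::vector<ClusterStats> stats(numClusters,
      ClusterStats{Point(dim, 0.0), Matrix(dim, Point(dim, 0.0)), 0});

  // First pass: sum the points of each cluster, then divide to get means.
  for (std::size_t n = 0; n < points.size(); ++n)
  {
    assert(assignments[n] < numClusters && points[n].size() == dim);
    ClusterStats& s = stats[assignments[n]];
    for (std::size_t d = 0; d < dim; ++d)
      s.mean[d] += points[n][d];
    ++s.count;
  }
  for (ClusterStats& s : stats)
    if (s.count > 0)
      for (double& m : s.mean)
        m /= static_cast<double>(s.count);

  // Second pass: accumulate outer products of the centered points.
  for (std::size_t n = 0; n < points.size(); ++n)
  {
    ClusterStats& s = stats[assignments[n]];
    for (std::size_t r = 0; r < dim; ++r)
      for (std::size_t c = 0; c < dim; ++c)
        s.covariance[r][c] +=
            (points[n][r] - s.mean[r]) * (points[n][c] - s.mean[c]);
  }

  // Normalize by (count - 1). Clusters with fewer than two points keep a
  // zero covariance, which is singular and must be handled by the caller.
  for (ClusterStats& s : stats)
    if (s.count > 1)
      for (Point& row : s.covariance)
        for (double& v : row)
          v /= static_cast<double>(s.count - 1);

  return stats;
}
```

A cluster with very few points gives a singular or poorly conditioned
covariance, so some regularization (for example, adding a small value to
the diagonal) is probably needed before using it in a Gaussian.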

Thanks!


Thuener Silva


On Fri, May 16, 2014 at 11:32 AM, Ryan Curtin <gth671b at mail.gatech.edu> wrote:

> On Fri, May 16, 2014 at 10:56:03AM -0300, Thuener Silva wrote:
> > Hello, I'm using mlpack for a research project at PUC-Rio. I'm trying to use
> > K-Means to get the initial parameters for an HMM for unsupervised
> > learning. I noticed the phrase "I won't use K-Means because we can't afford
> > to add the instability of that to our test." in one of the test cases in
> > hmm_test.cpp, so you have probably already done something like that.
> > Can someone send me a code example of that (K-Means with an HMM) or
> > tell me where I can find one? It would help a lot.
>
> Hi Thuener,
>
> The reason we avoided using k-means in the test was that k-means can
> sometimes produce very bad results (depending on the starting
> positions), so taking that factor out of the test helps test HMMs
> better.
>
> I am assuming that what you want to do is use k-means to initialize
> the means of the emission distributions of your HMM; and I'll assume
> that like in the test you referenced, the emission distribution type is
> the Gaussian distribution.  If that's not the case, please write back
> and clarify so I can give better advice.
>
> ----
> // This is our training dataset; we don't have any labels, so we are
> // doing unsupervised training of the HMM.
> extern std::vector<arma::mat> observations;
>
> // Create the HMM object.
> HMM<GaussianDistribution> hmm(numStates,
>     GaussianDistribution(dimensionality));
>
> // Reshape the vector of observations into one big matrix.
> arma::mat data;
> size_t totalCols = 0;
> for (size_t i = 0; i < observations.size(); ++i)
>   totalCols += observations[i].n_cols;
>
> data.set_size(dimensionality, totalCols);
> size_t startCol = 0;
> for (size_t i = 0; i < observations.size(); ++i)
> {
>   data.submat(0, startCol, dimensionality - 1,
>       startCol + observations[i].n_cols - 1) = observations[i];
>   startCol += observations[i].n_cols; // Advance to the next free column.
> }
>
> // Now run k-means on the observations to get a set of initial means for
> // the emission distributions.
> arma::mat centroids;
> arma::Col<size_t> assignments; // We won't use this.
> KMeans<> k; // All default options.
> k.Cluster(data, numStates, assignments, centroids);
>
> // Set the emission distribution's mean to the centroid of each cluster.
> for (size_t i = 0; i < numStates; ++i)
>   hmm.Emission()[i].Mean() = centroids.col(i);
>
> // Ok, finally we are ready for training.
> hmm.Train(observations);
> ----
>
> I haven't compiled this and tested it, but I think it should work.  At
> the very least, it should point you in the right direction about how to
> use k-means to initialize your emission distributions.
>
> If you're trying to use GMMs as your emission type instead of Gaussian
> distributions, this process becomes a little more difficult.
> Personally, I think that training GMM HMMs on unlabeled data is
> unrealistic because there are so many free parameters.
>
> If I can clarify anything I've written, please let me know.
>
> Thanks!
>
> Ryan
>
> --
> Ryan Curtin    | "Happy premise #2: There is no giant foot trying
> ryan at ratml.org | to squash me." - Kit Ramsey
>