[mlpack] Query Regarding Conditional Random Field implementation for mlpack

Ryan Curtin ryan at ratml.org
Tue Jan 30 09:20:17 EST 2018


On Fri, Jan 26, 2018 at 08:45:59PM +0530, NEMA DAIVIK RAKESH wrote:
> Hello
> 
> I am interested in contributing to mlpack for GSoC 2018. I spoke to zoq on
> the IRC channel about a week ago regarding a few ideas I had. I discussed
> (among other things) the possibility of adding a CRF module to mlpack -
> seeing that an HMM module already exists. He suggested that I talk to Ryan
> Curtin regarding this, as Ryan has already worked with CRFs in the past.
> After going through some of the literature on CRFs (this paper
> <http://homepages.inf.ed.ac.uk/csutton/publications/crftut-fnt.pdf>, and Hugo
> Larochelle's videos
> <https://www.youtube.com/watch?v=SGZ6BttHMPw&list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH>
> on YouTube), and the HMM code in mlpack source - I cannot seem to figure
> out where to begin. I have tried to list some concerns I have below - and
> would greatly appreciate it if someone could help me with these:

Hi Daivik,

Thanks for getting in touch.  I'll do my best to answer the questions
you asked.  I have worked with CRFs briefly in the past but I would not
call myself an expert at this time.

> 1. I see that several functions in the HMM code are also required for CRFs
> - for instance, Forward-Backward procedures and the Viterbi Algorithm.
> However, I do not see a very direct way to reuse them. Would it be better
> to rewrite them in a new CRF module - or should I try to make the existing
> functions work for both HMMs as well as CRFs?

My opinion is that it's always good to reuse code if possible, but I
think that HMMs and CRFs may be sufficiently different that it may not
be possible to reuse the code here.
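
To make the comparison concrete, here is a minimal sketch (plain Python, not mlpack code) of Viterbi written against generic log-scores. The names `viterbi`, `unary`, and `pairwise` are made up for illustration, and I am assuming a linear-chain CRF whose transition scores do not depend on the position. For an HMM, `unary[t][j]` would be the log emission probability of observation t under state j (plus the log initial probability at t = 0) and `pairwise` the log transition matrix; for a CRF, both would come from weighted feature functions of the whole input. The sketch does not settle whether the existing HMM code could be refactored this way.

```python
def viterbi(unary, pairwise):
    """Return (best label path, its total log-score) for a linear chain.

    unary[t][j]    : log-score of label j at position t
    pairwise[i][j] : log-score of moving from label i to label j
    """
    n_labels = len(unary[0])
    best = list(unary[0])   # best[j]: best score of any path ending in label j
    back = []               # back[t-1][j]: best predecessor of label j at position t
    for t in range(1, len(unary)):
        new_best, ptr = [], []
        for j in range(n_labels):
            # Pick the predecessor label that maximizes the path score into j.
            i_star = max(range(n_labels), key=lambda i: best[i] + pairwise[i][j])
            new_best.append(best[i_star] + pairwise[i_star][j] + unary[t][j])
            ptr.append(i_star)
        best = new_best
        back.append(ptr)

    # Trace the back-pointers from the best final label.
    last = max(range(n_labels), key=lambda j: best[j])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]


# Toy example with 2 labels and 3 positions.
unary = [[0.0, -1.0], [-2.0, 0.0], [0.0, -3.0]]
pairwise = [[0.0, -0.5], [-0.5, 0.0]]
path, score = viterbi(unary, pairwise)
```

The recursion itself never asks where the scores came from, which is the part the two models have in common; the training code and the way scores are computed from the input are where they differ.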

> 2. I'm still not completely sure about what sort of API to expose. I know I
> need to provide the following:
> - A function to evaluate p(y|X) (that ginormous softmax-like function)
> using Forward-Backward tables.
> - A function to infer a sequence of labels given a sequence of input
> vectors (using the Viterbi algorithm/argmax of per-position marginals)
> - A function to calculate per-position marginals (i.e. to evaluate p(y[k]|X)
> - also using the forward-backward tables)
> - A function to calculate the marginal p(y[k],y[k+1] | X)
> - A function to train the CRF model (using SGD+L2 regularization - also
> provide the option of other training methods??)
> Am I missing something? Any suggestions for what the function prototypes
> should look like, or should I go with my better judgement? I fear that my
> better judgement may not be very good though.

This looks good to me; I think that is about the same API as the
generative models that we have (which I guess would include GMMs and
HMMs).  The only thing I might add is the ability to sample label
sequences randomly from the CRF, like the Random() functions for the
distributions.
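
As a rough illustration of how the functions you listed fit together, here is a sketch in plain Python (not mlpack code, and not a proposed signature). All names are hypothetical, and I am assuming a linear-chain CRF whose log-potentials have already been computed into `unary[t][j]` and a position-independent `pairwise[i][j]`. Everything is done in log-space to avoid underflow.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_backward(unary, pairwise):
    """Return log forward table, log backward table, and log Z(X)."""
    T, K = len(unary), len(unary[0])
    alpha = [list(unary[0])]
    for t in range(1, T):
        alpha.append([
            unary[t][j]
            + logsumexp([alpha[t - 1][i] + pairwise[i][j] for i in range(K)])
            for j in range(K)
        ])
    beta = [[0.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            logsumexp([pairwise[i][j] + unary[t + 1][j] + beta[t + 1][j]
                       for j in range(K)])
            for i in range(K)
        ]
    return alpha, beta, logsumexp(alpha[-1])


def log_prob(labels, unary, pairwise, log_z):
    """log p(y | X) for one label sequence."""
    score = sum(unary[t][y] for t, y in enumerate(labels))
    score += sum(pairwise[a][b] for a, b in zip(labels, labels[1:]))
    return score - log_z


def node_marginals(alpha, beta, log_z):
    """p(y[t] = j | X) for every position t and label j."""
    return [[math.exp(a + b - log_z) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(alpha, beta)]


def edge_marginals(alpha, beta, unary, pairwise, log_z, t):
    """p(y[t] = i, y[t+1] = j | X) as a K x K table."""
    K = len(pairwise)
    return [[math.exp(alpha[t][i] + pairwise[i][j] + unary[t + 1][j]
                      + beta[t + 1][j] - log_z)
             for j in range(K)]
            for i in range(K)]


# Toy check: probabilities over all 2^3 label sequences should sum to 1.
unary = [[0.0, -1.0], [-2.0, 0.0], [0.0, -3.0]]
pairwise = [[0.0, -0.5], [-0.5, 0.0]]
alpha, beta, log_z = forward_backward(unary, pairwise)
total = sum(math.exp(log_prob(y, unary, pairwise, log_z))
            for y in itertools.product(range(2), repeat=3))
```

Training would then use these marginals to compute the gradient of the log-likelihood; that part is left out here.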

> 3. In most literature I saw on CRFs, p(y|X) is modeled as exp( sum of
> (parameters*feature functions) )/Z(X). However, in Hugo Larochelle's
> videos, he uses neural networks (with hidden units) for feature extraction
> - hence his expression for p(y|X) cannot be represented as exponent of a
> linear function of parameters of the model. (this video explains it better
> than I did just now). Ideally, the choice of feature functions should lie
> with the user. IMHO, Larochelle's model is more general and flexible than
> the ones presented in the paper. My question is, how do I allow for users
> to specify their feature functions? Should feature extraction be done via
> neural nets? - in which case, I presume that the choice of hyperparameters
> such as number of hidden layers and hidden units will lie with the user?

My thinking here is that you could do something similar to the HMM
class, which allows any type of emission distribution to be used.
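
A rough sketch of what I mean, in plain Python rather than mlpack C++: the HMM class takes its emission distribution as a template parameter, and the analogue below is a scorer that takes a user-supplied feature function. The class name `LinearChainScorer`, the `feature_fn(x, t, prev_label, label)` signature, and the toy features are all assumptions made up for this example. A neural-network feature extractor would be just another callable with the same signature.

```python
class LinearChainScorer:
    """Scores label sequences using a pluggable feature function.

    feature_fn(x, t, prev_label, label) must return a list of floats with
    the same length as `weights`; prev_label is None at t == 0.
    """

    def __init__(self, feature_fn, weights):
        self.feature_fn = feature_fn
        self.weights = list(weights)

    def local_score(self, x, t, prev_label, label):
        """Weighted sum of the features at one position."""
        feats = self.feature_fn(x, t, prev_label, label)
        if len(feats) != len(self.weights):
            raise ValueError("feature vector and weight vector differ in length")
        return sum(w * f for w, f in zip(self.weights, feats))

    def sequence_score(self, x, labels):
        """Unnormalized log-score of a whole label sequence."""
        total = 0.0
        prev = None
        for t, label in enumerate(labels):
            total += self.local_score(x, t, prev, label)
            prev = label
        return total


def toy_features(x, t, prev_label, label):
    """Two hand-written indicator features for a toy tagging task."""
    return [
        # Capitalized token tagged with label 1.
        1.0 if x[t].istitle() and label == 1 else 0.0,
        # Label repeats from the previous position.
        1.0 if prev_label is not None and prev_label == label else 0.0,
    ]


scorer = LinearChainScorer(toy_features, weights=[2.0, 0.5])
score = scorer.sequence_score(["Ryan", "likes", "mlpack"], [1, 0, 0])
```

The inference and training code would only ever call `local_score`, so the choice of features (and any hyperparameters behind them) stays with the user.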

> 4. Does this (that is, a CRF module) sound like a good feature to have in
> the library given that CRFs are not as popular as some other methods for
> sequence modeling <cough> LSTMs <cough>? Should I direct my efforts
> elsewhere, like one of the ideas from the ideas page?

CRFs are popular in NLP and related fields, so I think a useful CRF
implementation in mlpack would have to come with examples of how to use
it for common NLP tasks, and it may mean that some of the string
processing utilities suggested in the GSoC ideas page might need to be
implemented for the CRFs to be fully useful to a wider audience.

I hope these pointers are helpful; please let me know if I can clarify
anything.

Thanks,

Ryan

-- 
Ryan Curtin    | "A present for my friends... at Thanksgiving."
ryan at ratml.org |   - Bruce

