[mlpack] Query Regarding Conditional Random Field implementation for mlpack

NEMA DAIVIK RAKESH f2014023 at pilani.bits-pilani.ac.in
Fri Jan 26 10:15:59 EST 2018


Hello

I am interested in contributing to mlpack for GSoC 2018. I spoke to zoq on
the IRC channel about a week ago regarding a few ideas I had. Among other
things, I discussed the possibility of adding a CRF module to mlpack, seeing
that an HMM module already exists. He suggested that I talk to Ryan Curtin
about this, as Ryan has worked with CRFs in the past. After going through
some of the literature on CRFs (this paper
<http://homepages.inf.ed.ac.uk/csutton/publications/crftut-fnt.pdf>, and Hugo
Larochelle's videos
<https://www.youtube.com/watch?v=SGZ6BttHMPw&list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH>
on YouTube) and the HMM code in the mlpack source, I cannot figure out where
to begin. I have listed some of my concerns below and would greatly
appreciate help with them:

1. I see that several functions in the HMM code are also required for CRFs,
for instance the forward-backward procedure and the Viterbi algorithm.
However, I do not see a very direct way to reuse them. Would it be better to
rewrite them in a new CRF module, or should I try to factor the existing
functions so that they work for both HMMs and CRFs? (A rough sketch of the
kind of shared routine I mean is below.)
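For example, something like the following could compute the forward table for
both models, since the only real difference is where the per-step scores come
from. This is just a sketch of the refactoring I'm imagining; none of it is
existing mlpack code and the names are made up:

#include <armadillo>
#include <cmath>
#include <functional>

// logEmission: (states) x (sequence length) matrix of per-position scores;
// logTransition(t) returns the (states x states) matrix of scores for moving
// from step t to step t + 1.  For an HMM these would be log probabilities
// (ignoring the initial-state term for brevity); for a CRF they would be
// unnormalized log potentials computed from the feature weights.
arma::mat Forward(const arma::mat& logEmission,
                  const std::function<arma::mat(size_t)>& logTransition)
{
  const size_t states = logEmission.n_rows;
  const size_t T = logEmission.n_cols;
  arma::mat alpha(states, T);
  alpha.col(0) = logEmission.col(0);
  for (size_t t = 1; t < T; ++t)
  {
    const arma::mat trans = logTransition(t - 1);
    for (size_t j = 0; j < states; ++j)
    {
      // Log-sum-exp over the previous states, for numerical stability.
      arma::vec scores = alpha.col(t - 1) + trans.col(j);
      const double m = scores.max();
      alpha(j, t) = m + std::log(arma::accu(arma::exp(scores - m))) +
          logEmission(j, t);
    }
  }
  return alpha;
}

The HMM could call this with its fixed transition matrix and the CRF with its
per-step potentials; the backward pass and Viterbi could be factored the same
way. I am not sure whether this is actually better than writing CRF-specific
versions, though.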

2. I'm still not completely sure about what sort of API to expose. I know I
need to provide the following:
- A function to evaluate p(y|X) (that ginormous softmax-like expression)
using the forward-backward tables.
- A function to infer a sequence of labels given a sequence of input vectors
(using the Viterbi algorithm, or the argmax of the per-position marginals).
- A function to calculate the per-position marginals, i.e. to evaluate
p(y[k]|X), also using the forward-backward tables.
- A function to calculate the pairwise marginal p(y[k], y[k+1] | X).
- A function to train the CRF model (using SGD with L2 regularization;
should I also provide other training methods?).
Am I missing something? Any suggestions for what the function prototypes
should look like, or should I go with my better judgement? I fear that my
better judgement may not be very good though. A rough sketch of the
prototypes I currently have in mind follows this list.
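Here is what I am picturing so far. All names and signatures below are just
my guesses, loosely imitating the style of the HMM class; nothing here is
existing mlpack code:

#include <armadillo>
#include <vector>

class CRF
{
 public:
  // Train on labeled sequences using SGD with L2 regularization.
  void Train(const std::vector<arma::mat>& observations,
             const std::vector<arma::Row<size_t>>& labels);

  // log p(y | X) for a complete label sequence, computed from the forward
  // table.
  double LogLikelihood(const arma::mat& observations,
                       const arma::Row<size_t>& labels) const;

  // Most likely label sequence for the given observations (Viterbi decoding).
  void Predict(const arma::mat& observations,
               arma::Row<size_t>& labels) const;

  // Per-position marginals p(y[k] | X); column k of 'marginals' holds the
  // distribution over labels at position k.
  void Marginals(const arma::mat& observations, arma::mat& marginals) const;

  // Pairwise marginals p(y[k], y[k+1] | X); slice k of 'marginals' holds the
  // joint distribution over labels at positions k and k + 1.
  void PairwiseMarginals(const arma::mat& observations,
                         arma::cube& marginals) const;
};

Does splitting inference across LogLikelihood()/Predict()/Marginals() like
this seem reasonable, or would you structure it differently?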

3. In most of the literature I have seen on CRFs, p(y|X) is modeled as
exp(sum over k of theta[k] * f[k](y, X)) / Z(X). However, in Hugo
Larochelle's videos he uses neural networks (with hidden units) for feature
extraction, so his expression for p(y|X) can no longer be written as the
exponential of a linear function of the model parameters (his video explains
it better than I did just now). Ideally, the choice of feature functions
should lie with the user. IMHO, Larochelle's model is more general and
flexible than the ones presented in the paper. My question is: how do I allow
users to specify their own feature functions? Should feature extraction be
done via neural networks? In that case, I presume the choice of
hyperparameters such as the number of hidden layers and hidden units would
also lie with the user. One possible approach is sketched below.
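For instance, the CRF class from the sketch above could take the feature
functions as a template policy, in the same spirit as
HMM<EmissionDistribution>. Again, this is purely hypothetical; the class and
method names are made up:

#include <armadillo>
#include <utility>

// Example user-supplied feature extractor: just use the raw input vector at
// each position as the features.
class IdentityFeatures
{
 public:
  arma::vec Extract(const arma::mat& observations, const size_t t) const
  {
    return observations.col(t);
  }
};

template<typename FeatureExtractorType = IdentityFeatures>
class CRF
{
 public:
  explicit CRF(FeatureExtractorType extractor = FeatureExtractorType()) :
      extractor(std::move(extractor)) { }

  // Training and inference would be as in the sketch above, except that all
  // potentials are computed from extractor.Extract(X, t) rather than directly
  // from the raw observations.

 private:
  FeatureExtractorType extractor;
};

A neural-network feature extractor would then just be another
FeatureExtractorType (wrapping, say, an mlpack FFN whose layer sizes the user
chooses), while the linear-chain case from the tutorial paper would keep
using simple hand-written features.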

4. Does this (that is, a CRF module) sound like a good feature to have in the
library, given that CRFs are not as popular as some other methods for
sequence modeling <cough> LSTMs <cough>? Or should I direct my efforts
elsewhere, for example toward one of the projects on the ideas page?

Any pointers on how to begin would be very helpful.

Thanks and regards

Daivik