[mlpack] Implementation of matlab ksdensity function

Tue Feb 13 15:54:26 EST 2018

On Tue, Feb 13, 2018 at 02:44:03PM +0000, Angelo DI SENA wrote:
> Hi Ryan
> 
> Thanks for your answer.
> I partially understood your suggestion.
> This is due to my poor knowledge of the math behind it.
> 
> In the MATLAB script I'm trying to convert,
> 'vector' has 3000 elements (1x3000),
> with values between -1 and 1, and
> 'pts' is a 200-element vector (200x1).
> 
> From the MATLAB documentation, the result should be 200 pairs of values (one for each element in pts).
> So what is not clear to me is how I should use 'vector'.
> For each value in pts, which values from vector must be considered?

Hi Angelo,

No problem, I am happy to try to help out.  I can explain basic kernel
density estimation; however, you should double-check the MATLAB
implementation and make sure you change my description below to fit what
they are actually doing.  For instance, I think that ksdensity() does
auto-tune the bandwidth of the kernel, but my discussion below will
assume a hand-chosen bandwidth.
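If you do want an automatic bandwidth, a common choice is a
normal-reference "rule of thumb".  Below is a minimal sketch, assuming
bw = sigma * (4 / (3n))^(1/5) with sigma estimated robustly as
MAD / 0.6745 (MAD = median absolute deviation).  I believe ksdensity's
default is something along these lines, but that is an assumption, so
please check its documentation.  The names Median() and
RuleOfThumbBandwidth() are hypothetical, not mlpack functions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of the data; takes a copy so sorting leaves the caller's data alone.
double Median(std::vector<double> v)
{
  std::sort(v.begin(), v.end());
  const std::size_t n = v.size();
  return (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Normal-reference bandwidth for a 1-D Gaussian kernel.
// ASSUMPTION: sigma = MAD / 0.6745 and bw = sigma * (4 / (3 n))^(1/5);
// verify against what MATLAB's ksdensity actually does.
// Returns 0 if more than half the points are identical (MAD = 0); a real
// implementation would need a fallback for that case.
double RuleOfThumbBandwidth(const std::vector<double>& data)
{
  const double med = Median(data);
  std::vector<double> absDev(data.size());
  for (std::size_t i = 0; i < data.size(); ++i)
    absDev[i] = std::abs(data[i] - med);
  const double sigma = Median(absDev) / 0.6745;
  const double n = static_cast<double>(data.size());
  return sigma * std::pow(4.0 / (3.0 * n), 0.2);
}
```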

When you do kernel density estimation, you are assuming that your
density f(x) can be modeled as an average of kernel functions, one
centered on each reference point:

f(x) = (1 / n) * sum_{i = 1}^{n} K(x, p_i, bw)

where { p_1, ..., p_n } are the reference points (called 'vector' in
your code, so here n = 3000 one-dimensional points), 'x' is the query
point (one element of 'pts' in your code), and 'bw' is a bandwidth for
the density estimation.
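To make the structure of that sum concrete, here is a minimal sketch of
f(x) for a single query point.  The kernel is passed in as a callable,
so any K(x, p_i, bw) can be plugged in; DensityAt() and the Kernel alias
are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// A one-dimensional kernel K(x, p_i, bw).
using Kernel = std::function<double(double x, double p, double bw)>;

// f(x) = (1 / n) * sum_{i = 1}^{n} K(x, p_i, bw).
// 'reference' holds the n reference points ('vector' in the MATLAB script)
// and x is a single query point (one element of 'pts').
double DensityAt(double x, const std::vector<double>& reference, double bw,
                 const Kernel& kernel)
{
  double sum = 0.0;
  for (double p : reference)  // every reference point contributes
    sum += kernel(x, p, bw);
  return sum / static_cast<double>(reference.size());
}
```

Note that every reference point enters the sum for every query point;
points far from x simply contribute very little.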

The kernel function, if you choose a Gaussian function, is just

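K(x, p_i, bw) = exp(-| x - p_i |^2 / (2 * bw^2)) / (sqrt(2 * pi) * bw),

which is the density of a normal distribution with mean p_i and
variance bw^2 (the 1 / (sqrt(2 * pi) * bw) factor makes f(x) integrate
to 1), so you can use mlpack's GaussianDistribution for that part.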
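For reference, here is the same normalized kernel written out by hand,
so it does not depend on any particular mlpack version's API, plus a
crude trapezoid-rule check that it integrates to about 1.  Both function
names are hypothetical; GaussianDistribution with mean p_i and
covariance bw^2 should give the same values as GaussianKernel().

```cpp
#include <cmath>

// Normalized 1-D Gaussian kernel: the N(p, bw^2) density evaluated at x.
double GaussianKernel(double x, double p, double bw)
{
  const double pi = 3.14159265358979323846;
  const double z = (x - p) / bw;
  return std::exp(-0.5 * z * z) / (std::sqrt(2.0 * pi) * bw);
}

// Trapezoid-rule integral of the kernel over [p - 10 bw, p + 10 bw];
// the tails outside that range are negligible, so this should be ~1.
double IntegrateKernel(double p, double bw, int steps = 20000)
{
  const double a = p - 10.0 * bw;
  const double b = p + 10.0 * bw;
  const double h = (b - a) / steps;
  double sum = 0.5 * (GaussianKernel(a, p, bw) + GaussianKernel(b, p, bw));
  for (int k = 1; k < steps; ++k)
    sum += GaussianKernel(a + k * h, p, bw);
  return sum * h;
}
```

Without the normalization factor, the same integral would come out to
sqrt(2 * pi) * bw instead of 1.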

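Since you want a result for every element of 'pts', just repeat the
f(x) calculation once per query point.  To answer your question
directly: every query point uses all 3000 values in 'vector'.  Pairing
each query point with its estimated density gives you 200 (x, f(x))
pairs, which I believe is what the MATLAB documentation means.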
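Putting it together, here is a minimal sketch of the whole computation
with a hand-chosen bandwidth: for each query point it averages the
normalized Gaussian kernel over all reference points and returns
(query point, density) pairs.  KernelDensity() is a hypothetical name,
and this is not claimed to reproduce ksdensity's exact output, since
ksdensity also chooses the bandwidth and may handle boundaries
differently.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// For every query point x in 'pts', compute
//   f(x) = (1 / n) * sum_i N(x; p_i, bw^2)
// over the n points in 'reference', and return (x, f(x)) pairs.
std::vector<std::pair<double, double>> KernelDensity(
    const std::vector<double>& reference,
    const std::vector<double>& pts,
    double bw)
{
  const double pi = 3.14159265358979323846;
  // Kernel normalization and the 1/n average, folded into one constant.
  const double norm = 1.0 / (std::sqrt(2.0 * pi) * bw * reference.size());

  std::vector<std::pair<double, double>> result;
  result.reserve(pts.size());
  for (double x : pts)
  {
    double sum = 0.0;
    for (double p : reference)  // all reference points, for every query
    {
      const double z = (x - p) / bw;
      sum += std::exp(-0.5 * z * z);
    }
    result.emplace_back(x, norm * sum);
  }
  return result;
}
```

With 3000 reference points and 200 query points this is only 600,000
kernel evaluations, so the simple double loop is plenty fast.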

I hope this is helpful... let me know if I can clarify anything.

Thanks!

Ryan

-- 
Ryan Curtin    | "For more enjoyment and greater efficiency,
ryan at ratml.org | consumption is being standardized."