[mlpack] A better way to do DBSCAN on a dataset with two different units of measure?

Ryan Curtin ryan at ratml.org
Tue Jul 17 04:25:26 EDT 2018


On Tue, Jul 17, 2018 at 04:07:15AM +0000, Yew Khong See wrote:
> Hi all,
> I am using DBSCAN to cluster a dataset consisting of an individual's
> weight (in kg) and height (in cm). 
> What I am doing now is to cluster the weights first and then do
> another clustering on the heights from each weight cluster. 
> This method is not efficient and will not scale with larger datasets.
> 
> Is there a better way to perform clustering one time on both the
> weights and heights, but with different epsilon and minpoints?

Hi there,

Can you clarify what you mean by 'different minpoints'?  I can picture
what you mean when you say 'different epsilon'---I think that you mean
that you want a different epsilon value for weight and height, and that
you want to cluster simultaneously using both weight and height values.

In this case you could just normalize your data accordingly: if, e.g.,
you want epsilon 1 for weight and 2 for height, simply divide all the
height values by 2, and then use epsilon = 1.
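
As a rough sketch of that rescaling (plain Python rather than mlpack's API;
the rows and the helper names `rescale` and `neighbors` are made up for
illustration), dividing each column by the epsilon wanted for it lets a
single epsilon = 1 query cover both units. Note that with Euclidean
distance the resulting neighborhood is an ellipse in the original units,
not a box:

```python
import math

def rescale(points, eps_per_dim):
    """Divide each coordinate by the epsilon wanted for that dimension."""
    return [tuple(x / e for x, e in zip(p, eps_per_dim)) for p in points]

def neighbors(points, i, eps=1.0):
    """Indices of all points within Euclidean distance eps of points[i]."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

# Hypothetical (weight in kg, height in cm) rows, not real data.
data = [(70.0, 170.0), (70.5, 171.0), (70.0, 173.0), (73.0, 170.0)]

# Epsilon 1 for weight and 2 for height: heights are divided by 2.
scaled = rescale(data, eps_per_dim=(1.0, 2.0))

# Neighborhood of the first row under the single epsilon = 1.
print(neighbors(scaled, 0))  # [0, 1]
```

The rescaled matrix can then be handed to any DBSCAN implementation with
epsilon = 1 and a single minimum-points value.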

Hope this helps; let me know if I can clarify further.

Thanks,

Ryan

-- 
Ryan Curtin    | "Indeed!"
ryan at ratml.org |   - David Lo Pan

