[mlpack] Improve mlpack's tree ensemble support - GSoC 2021

RISHABH GARG rishabhgarg108 at gmail.com
Tue Mar 30 13:40:46 EDT 2021


Hello Ryan,

Thanks for sharing the approach. I like it and we could probably use it, but
while writing the proposal I realised it might be too much to take on
alongside everything else in the limited GSoC period.

That said, what we implement can only really be called XGBoost once it
scales to larger datasets, so we will have to implement this at some point.
Would it be okay to move it to after GSoC?

Thanks,

Rishabh



On Mon, 29 Mar 2021, 17:33 Ryan Curtin, <ryan at ratml.org> wrote:

> On Mon, Mar 29, 2021 at 10:59:32AM +0530, RISHABH GARG wrote:
> > Hey Ryan, thanks for the feedback.
> >
> >
> > I also agree with you. XGBoost is one of the most widely used ML
> > algorithms. It would be really great for MLPACK to have it and this will
> > undoubtedly attract more and more users to MLPACK. This discussion with
> > you has changed my perspective and I think we can prioritise XGBoost
> > over others.
> >
> >
> > As you mentioned in the previous mail, it will be straightforward to
> > implement the core XGBoost algorithm given the flexible implementation
> > of trees in MLPACK. But how can we implement optimisations like
> > cache-aware access and out-of-core computation with Armadillo matrices? I
> > remember I had a chat with you related to this and you briefly mentioned
> > that it can be done with a simple tweak. Can you please elaborate on it a
> > bit?
>
> I wouldn't worry about out-of-core learning for your proposal---ideally
> we should just be able to demonstrate that the performance of what we
> implement is comparable to XGBoost's performance.
>
> That said, if you are interested in doing out-of-core learning, the way
> I know to do it is to create a file of the right size on disk (e.g.
> n_rows * n_cols * sizeof(double) bytes).  Then, in your program, use
> mmap() to memory map the file.  This will give you a pointer to some
> memory, which you can cast to a double*.  You can then use the Armadillo
> advanced constructor that takes a memory pointer to create the Armadillo
> matrix that is wrapped around the mmap()-ed file.  Now, ta-da, you have
> an out-of-core matrix. :)  (But some restrictions are that you can't
> resize it, and operations on that matrix that result in a new matrix
> will not be mmap()-ed.)
>
> Anyway, hope that is helpful!
>
> Thanks,
>
> Ryan
>
> --
> Ryan Curtin    | "If it's something that can be stopped, then just try to stop it!"
> ryan at ratml.org |   - Skull Kid
>
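
The mmap() steps Ryan describes above can be sketched roughly as follows.
This is only a minimal illustration under some assumptions: a POSIX system,
a hypothetical helper struct named MappedBuffer (it is not part of mlpack or
Armadillo), and column-major indexing to match Armadillo's storage order.
The Armadillo wrapping step is left out of the code so that the sketch has no
dependency beyond POSIX.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <unistd.h>     // ftruncate, close

// Hypothetical helper (not an mlpack or Armadillo type): owns a file-backed
// buffer large enough for an n_rows x n_cols matrix of doubles.
struct MappedBuffer
{
  double* data = nullptr;
  std::size_t n_rows;
  std::size_t n_cols;
  std::size_t bytes;

  MappedBuffer(const std::string& path, std::size_t rows, std::size_t cols)
      : n_rows(rows), n_cols(cols), bytes(rows * cols * sizeof(double))
  {
    // Create (or open) the backing file.
    const int fd = open(path.c_str(), O_RDWR | O_CREAT, 0600);
    if (fd == -1)
      throw std::runtime_error("open() failed");

    // Give the file the right size: n_rows * n_cols * sizeof(double) bytes.
    if (ftruncate(fd, static_cast<off_t>(bytes)) == -1)
    {
      close(fd);
      throw std::runtime_error("ftruncate() failed");
    }

    // Map the file; MAP_SHARED means writes go back to the file.
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // The mapping remains valid after the descriptor is closed.
    if (p == MAP_FAILED)
      throw std::runtime_error("mmap() failed");

    data = static_cast<double*>(p);
  }

  ~MappedBuffer()
  {
    if (data != nullptr)
      munmap(data, bytes);
  }

  // The mapping is owned by exactly one object, so copying is disabled.
  MappedBuffer(const MappedBuffer&) = delete;
  MappedBuffer& operator=(const MappedBuffer&) = delete;

  // Column-major element access (the layout Armadillo uses).
  double& at(std::size_t r, std::size_t c)
  {
    assert(r < n_rows && c < n_cols);
    return data[c * n_rows + r];
  }
};
```

With Armadillo available, the pointer would then be handed to the advanced
constructor, e.g. arma::mat m(buf.data, buf.n_rows, buf.n_cols, false, true);
where copy_aux_mem = false makes the matrix use the mapped memory directly
and strict = true keeps it bound to that memory, which matches the "can't
resize it" restriction mentioned above. The MappedBuffer object has to
outlive the matrix.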