[mlpack] Improve mlpack's tree ensemble support - GSoC 2021

Ryan Curtin ryan at ratml.org
Mon Mar 29 08:03:11 EDT 2021


On Mon, Mar 29, 2021 at 10:59:32AM +0530, RISHABH GARG wrote:
> Hey Ryan, thanks for the feedback.
> 
> 
> I agree with you. XGBoost is one of the most widely used ML
> algorithms. It would be really great for mlpack to have it, and it will
> undoubtedly attract more users to mlpack. This discussion with you has
> changed my perspective, and I think we can prioritise XGBoost over the
> others.
> 
> 
> As you mentioned in the previous mail, it will be straightforward to
> implement the core XGBoost algorithm given the flexible implementation
> of trees in mlpack. But how can we implement optimisations like
> cache-aware access and out-of-core computation with Armadillo matrices?
> I remember chatting with you about this, and you briefly mentioned that
> it can be done with a simple tweak. Could you please elaborate on that?

I wouldn't worry about out-of-core learning for your proposal---ideally
we should just be able to demonstrate that the performance of what we
implement is comparable to XGBoost's performance.

That said, if you are interested in doing out-of-core learning, the way
I know to do it is to create a file of the right size on disk (e.g.
n_rows * n_cols * sizeof(double) bytes).  Then, in your program, use
mmap() to memory map the file.  This will give you a pointer to some
memory, which you can cast to a double*.  You can then use the Armadillo
advanced constructor that takes a memory pointer to create the Armadillo
matrix that is wrapped around the mmap()-ed file.  Now, ta-da, you have
an out-of-core matrix. :)  (Some restrictions: you can't resize it, and
any operation on that matrix that produces a new matrix will put the
result in regular memory, not in the mmap()-ed file.)
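
Here is a rough sketch of those steps, assuming a POSIX system.  The
MappedMatrixFile class and SumMapped() function are hypothetical names
made up for illustration; they are not part of mlpack or Armadillo.  The
only library calls used are open(), ftruncate(), mmap(), munmap(), and
the Armadillo constructor that takes (ptr, n_rows, n_cols, copy_aux_mem,
strict).  The Armadillo part is guarded with __has_include (C++17) so
that the mmap() part also compiles by itself.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>     // open()
#include <sys/mman.h>  // mmap(), munmap()
#include <unistd.h>    // ftruncate(), close()

// Armadillo is optional here so that the mmap() part compiles on its own.
#if __has_include(<armadillo>)
  #include <armadillo>
  #define SKETCH_HAS_ARMADILLO 1
#endif

// Hypothetical helper (not part of mlpack or Armadillo): maps a file that
// holds n_rows * n_cols doubles and unmaps it on destruction.
class MappedMatrixFile
{
 public:
  MappedMatrixFile(const std::string& path,
                   const std::size_t n_rows,
                   const std::size_t n_cols) :
      rows(n_rows), cols(n_cols), bytes(n_rows * n_cols * sizeof(double))
  {
    // Create the file if needed and make it exactly the right size.
    const int fd = open(path.c_str(), O_RDWR | O_CREAT, 0600);
    if (fd == -1)
      throw std::runtime_error("open() failed for " + path);

    if (ftruncate(fd, static_cast<off_t>(bytes)) == -1)
    {
      close(fd);
      throw std::runtime_error("ftruncate() failed for " + path);
    }

    // MAP_SHARED makes writes to the mapping go back to the file.
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // The mapping stays valid after the descriptor is closed.
    if (p == MAP_FAILED)
      throw std::runtime_error("mmap() failed for " + path);

    mem = static_cast<double*>(p);
  }

  ~MappedMatrixFile() { munmap(mem, bytes); }

  // The mapping has a single owner, so copying is disabled.
  MappedMatrixFile(const MappedMatrixFile&) = delete;
  MappedMatrixFile& operator=(const MappedMatrixFile&) = delete;

  double* Memory() const { return mem; }
  std::size_t Rows() const { return rows; }
  std::size_t Cols() const { return cols; }

 private:
  std::size_t rows, cols, bytes;
  double* mem = nullptr;
};

#ifdef SKETCH_HAS_ARMADILLO
// Sums the mapped data through an Armadillo matrix that wraps it in place.
inline double SumMapped(const MappedMatrixFile& f)
{
  // copy_aux_mem = false: use the mapping directly instead of copying it.
  // strict = true: the matrix stays bound to this memory (no resizing).
  arma::mat X(f.Memory(), f.Rows(), f.Cols(), false, true);
  assert(X.memptr() == f.Memory());  // No copy was made.

  // Armadillo is column-major: X(i, j) lives at Memory()[i + j * Rows()].
  return arma::accu(X);
}
#endif
```

With that, something like `MappedMatrixFile f("data.bin", n_rows, n_cols);`
followed by `SumMapped(f)` reads the data through the mapping instead of
loading the whole file up front.  The MappedMatrixFile object has to
outlive any matrix that wraps its memory.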

Anyway, hope that is helpful!

Thanks,

Ryan

-- 
Ryan Curtin    | "If it's something that can be stopped, then just try to stop it!"
ryan at ratml.org |   - Skull Kid

