[mlpack] GSoC ’22 Idea Discussion: Add CUDA Support For mlpack

Zhuojin Liu zhuojinliu.cs at gmail.com
Mon Apr 11 11:55:00 EDT 2022


Hi Marcus, and thanks for your valuable response!

I think template specialization is a better solution than a new wrapper class: it
requires fewer modifications to the current code and is more consistent with the
existing style, so I'll go with that approach. Thanks for your advice!

As for the specific implementation, the situation varies from file to file. As you
said, some classes/functions are already templatized, so we can easily substitute
coot::mat for arma::mat. However, some of the templated classes still use arma::mat
directly somewhere in their implementation instead of the MatType template parameter,
and others use arma::mat throughout. So I think this is also something I can improve.

As for the conv2 question: the situation I described in my last email is the current
implementation; I used NaiveConvolution just as an example to demonstrate how much
we can benefit from using GPUs. But I'd also like to optimize the CPU version and/or
implement a more generic version of those functions.

My one remaining worry is whether mlpack is ready to add GPU support now.
Although it's on the mlpack vision list, it's not on the GSoC idea list, and the
bandicoot library is not yet release-ready, so I'm not sure this is the right year
to add GPU support to mlpack.

Thanks
Liu, Zhuojin
On Apr 11, 2022, 04:14 +0800, Marcus Edel <marcus.edel at fu-berlin.de>, wrote:
> Hello Liu,
>
> > On Apr 6, 2022, at 6:14 AM, Zhuojin Liu <zhuojinliu.cs at gmail.com> wrote:
> >
> > Hello, everyone.
> >
> > I'm an undergraduate student who's interested in taking part in the GSoC 2022 program, and I'd like to discuss an idea that was not listed on the Summer of Code Ideas list.
> >
> > GPUs are used in a lot of machine learning packages to accelerate training and inference; however, mlpack does not support GPU acceleration so far. I found the plan to add CUDA support to mlpack in <https://www.mlpack.org/papers/vision.pdf>, and I'm interested in it. Therefore, I want to spend the summer adding GPU support to mlpack. This plan includes:
> >
> > 1. Add an mlpack::mat class. Currently, we are using arma::mat in mlpack. However, if we want to compute on GPUs, we have to use coot::mat. The conversion between arma::mat and coot::mat is easy, but doing this conversion manually every time users want to convert from one to the other is a bother, so a wrapper class that manages the device-related information would be useful. The lower-level implementation can use bandicoot and armadillo; we only need to provide APIs that have the same semantics and names as arma::mat, plus device-related functions (such as torch.Tensor.to(device)). This looks like a big change, and I don't think I am capable of designing a perfect class, so I want to be careful and only implement this class after a proper discussion.
>
> I think the better approach is to make sure every mlpack method uses a template
> type that specifies the matrix type to use. Some of the mlpack implementations
> already do that:
>
> https://github.com/mlpack/mlpack/blob/de94eee6881e33f1145d9d4a8b5380a5be8af36a/src/mlpack/methods/det/dtree.hpp#L44-L46
>
> which allows me to construct my DTree with:
>
> DTree<arma::mat>, DTree<arma::fmat> or DTree<coot::mat> without having another
> wrapper class around the armadillo and bandicoot matrix types. However other
> implementations don't support the same interface:
>
> https://github.com/mlpack/mlpack/blob/de94eee6881e33f1145d9d4a8b5380a5be8af36a/src/mlpack/methods/pca/pca.hpp#L55-L58
>
> is one example which needs to be adapted.
>
> >
> > 2. Implement ANN layers with bandicoot. Whether or not the mlpack::mat class is implemented, we can still implement ANN layers with bandicoot and benefit from GPU acceleration. E.g., currently the naive_convolution iteratively multiplies the corresponding elements of the filter and the input, then adds the result to the output. This can be parallelized easily on GPUs. I want to start with some of the most-used layers and implement their GPU versions.
>
> Do you think a better approach would be to use arma::conv2 instead, which would
> allow us to implement a fast coot::conv2 version and use the same code in mlpack
> for CPU and GPU, without falling back to a specific implementation?
>
> >
> > 3. Contribute to bandicoot. Bandicoot is still an unstable library, so we may encounter unexpected situations, such as necessary functions not being implemented, or bugs. For example, some CUDA kernels in bandicoot only work correctly if the shape of the input matrix is a factor of 2. Therefore, I want to implement functions and kernels, and fix the bugs I find while implementing the layers, to help make bandicoot release-ready.
> >
> > Thank you for taking the time to read this email; I'm looking forward to hearing valuable feedback from you. Have a good day!
> >
> > Regards
> > Liu, Zhuojin
> > _______________________________________________
> > mlpack mailing list
> > mlpack at lists.mlpack.org
> > http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
>
> Thanks
> Marcus
>
>