[mlpack] GSoC ’22 Idea Discussion: Add CUDA Support For mlpack

Marcus Edel marcus.edel at fu-berlin.de
Sun Apr 10 16:14:20 EDT 2022


Hello Liu,

> On Apr 6, 2022, at 6:14 AM, Zhuojin Liu <zhuojinliu.cs at gmail.com> wrote:
> 
> Hello, everyone.
> 
> I'm an undergraduate student who's interested in taking part in the GSoC 2022 program, and I'd like to discuss an idea that was not listed on the Summer of Code Ideas list.
> 
> GPUs are used in many machine learning packages to accelerate training and inference; however, mlpack does not yet support GPU acceleration. I found the plan for adding CUDA support to mlpack in <https://www.mlpack.org/papers/vision.pdf>, and I'm interested in it. Therefore, I want to spend the summer adding GPU support to mlpack. This plan includes:
> 
> 1. Add an mlpack::mat class. Currently, we are using arma::mat in mlpack. However, if we want to compute on GPUs, we have to use coot::mat. Converting between arma::mat and coot::mat is easy, but doing this conversion manually every time a user wants to move between them is bothersome, so a wrapper class that manages the device-related information would be useful. The lower-level implementation can use bandicoot and armadillo; we only need to provide APIs with the same semantics and names as arma::mat, plus device-related functions (such as torch.Tensor.to(device)). This looks like a big change, and I don't think I am capable of designing a perfect class on my own, so I want to be cautious and only implement this class after a proper discussion.

I think the better approach is to make sure every mlpack method uses a template
type that specifies the matrix type to use. Some of the mlpack implementations
already do that:

https://github.com/mlpack/mlpack/blob/de94eee6881e33f1145d9d4a8b5380a5be8af36a/src/mlpack/methods/det/dtree.hpp#L44-L46

which allows me to construct my DTree with:

DTree<arma::mat>, DTree<arma::fmat>, or DTree<coot::mat>, without needing another
wrapper class around the armadillo and bandicoot matrix types. However, other
implementations don't support the same interface:

https://github.com/mlpack/mlpack/blob/de94eee6881e33f1145d9d4a8b5380a5be8af36a/src/mlpack/methods/pca/pca.hpp#L55-L58

is one example which needs to be adapted.

> 
> 2. Implement ANN layers with bandicoot. Whether or not the mlpack::mat class is implemented, we can still implement ANN layers with bandicoot and benefit from GPU acceleration. E.g., currently naive_convolution iteratively multiplies the corresponding elements of the filter and the input, then adds the result to the output. This can be parallelized easily on GPUs. I want to start with the most commonly used layers and implement their GPU versions.

Do you think a better approach would be to use arma::conv2 instead, which would
allow us to implement a fast coot::conv2 version and use the same code in mlpack
for CPU and GPU, without falling back to a specific implementation?

> 
> 3. Contribute to bandicoot. Bandicoot is still an unstable library, so we may encounter unexpected situations such as necessary functions not being implemented yet, or bugs. For example, some CUDA kernels in bandicoot work correctly only if the shape of the input matrix is a factor of 2. Therefore, I want to implement missing functions and kernels, and fix the bugs I find while implementing the layers, to help make bandicoot release-ready.
> 
> Thank you for taking the time to read this email; I'm looking forward to hearing your feedback. Have a good day!
> 
> Regards
> Liu, Zhuojin
> _______________________________________________
> mlpack mailing list
> mlpack at lists.mlpack.org
> http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack

Thanks
Marcus

