[mlpack] GSoC ’22 Idea Discussion: Add CUDA Support For mlpack

Zhuojin Liu zhuojinliu.cs at gmail.com
Wed Apr 6 06:14:41 EDT 2022


Hello, everyone.

I'm an undergraduate student who's interested in taking part in the GSoC 2022 program, and I'd like to discuss an idea that was not listed on the Summer of Code Ideas list.

GPUs are used by many machine learning packages to accelerate training and inference; however, mlpack does not support GPU acceleration so far. I found the plan to add CUDA support to mlpack in <https://www.mlpack.org/papers/vision.pdf>, and I'm interested in it. Therefore, I want to spend the summer adding GPU support to mlpack. The plan includes:

1. Add an mlpack::mat class. Currently, we use arma::mat in mlpack. However, if we want to compute on GPUs, we have to use coot::mat. Converting between arma::mat and coot::mat is easy, but doing the conversion manually every time a user wants to switch between them is tedious, so a wrapper class that manages the device-related information would be useful. The lower-level implementation can use bandicoot and armadillo; we only need to provide APIs with the same semantics and names as arma::mat, plus device-related functions (such as torch.Tensor.to(device)). This is a big change, and I don't think I am capable of designing a perfect class on my own, so I want to be cautious and only implement this class after a proper discussion.
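To make the idea concrete, here is a minimal sketch of what such a device-dispatching wrapper could look like. Everything here is hypothetical: CpuMat and GpuMat are stand-ins for arma::mat and coot::mat, and the class/method names are placeholders, not a proposed mlpack API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder backends: CpuMat stands in for arma::mat, GpuMat for coot::mat.
enum class Device { CPU, GPU };

struct CpuMat
{
  std::vector<double> data;
  std::size_t n_rows = 0, n_cols = 0;
};

struct GpuMat
{
  // In a real design this would own a device-side allocation.
  std::vector<double> data;
  std::size_t n_rows = 0, n_cols = 0;
};

// Hypothetical wrapper that tracks which backend currently holds the data.
class Mat
{
 public:
  Mat(std::size_t rows, std::size_t cols)
      : device(Device::CPU),
        cpu{std::vector<double>(rows * cols, 0.0), rows, cols} {}

  // Analogous to torch.Tensor.to(device): move storage between backends.
  void To(Device target)
  {
    if (target == device)
      return;
    if (target == Device::GPU)
      gpu = GpuMat{cpu.data, cpu.n_rows, cpu.n_cols}; // "upload" to device
    else
      cpu = CpuMat{gpu.data, gpu.n_rows, gpu.n_cols}; // "download" to host
    device = target;
  }

  Device CurrentDevice() const { return device; }

 private:
  Device device;
  CpuMat cpu;
  GpuMat gpu;
};
```

The main design question is whether the transfer happens explicitly (as above) or lazily when an operation requires a particular backend; that is exactly the kind of thing I'd want to settle in discussion first.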

2. Implement ANN layers with bandicoot. Whether or not the mlpack::mat class is implemented, we can still implement ANN layers with bandicoot and benefit from GPU acceleration. For example, the current naive_convolution implementation iteratively multiplies the corresponding elements of the filter and the input, then adds the result to the output; this can be parallelized easily on GPUs. I want to start with the most commonly used layers and implement GPU versions of them.
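To illustrate why this parallelizes well, here is a CPU reference of a naive valid-mode 2D convolution (an illustrative sketch, not mlpack's actual NaiveConvolution code): each output element is an independent dot product of the filter with an input patch, so on a GPU each output element can be assigned to its own thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive valid-mode 2D convolution over row-major buffers.
// The two outer loops are fully independent -- on a GPU they become the
// thread grid, with one thread computing one output element.
std::vector<double> NaiveConv2D(const std::vector<double>& input,
                                std::size_t inRows, std::size_t inCols,
                                const std::vector<double>& filter,
                                std::size_t fRows, std::size_t fCols)
{
  const std::size_t outRows = inRows - fRows + 1;
  const std::size_t outCols = inCols - fCols + 1;
  std::vector<double> output(outRows * outCols, 0.0);

  for (std::size_t i = 0; i < outRows; ++i)     // parallel over rows
    for (std::size_t j = 0; j < outCols; ++j)   // parallel over columns
      for (std::size_t fi = 0; fi < fRows; ++fi)
        for (std::size_t fj = 0; fj < fCols; ++fj)
          output[i * outCols + j] +=
              input[(i + fi) * inCols + (j + fj)] * filter[fi * fCols + fj];

  return output;
}
```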

3. Contribute to bandicoot. Bandicoot is still an unstable library, so we may run into unexpected situations such as missing functions or bugs. For example, some CUDA kernels in bandicoot only work correctly when the input matrix's dimensions are a power of two. Therefore, I want to implement the missing functions and kernels and fix the bugs I find while implementing the layers, to help make bandicoot release-ready.
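Restrictions like this typically come from tree-reduction kernels that halve the active range each step. Below is a host-side simulation of that pattern with the guard that makes it correct for any size, not just powers of two. This is purely illustrative of the failure mode and fix; it is not bandicoot's actual kernel code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulates the tree-reduction pattern used by many CUDA sum kernels:
// repeatedly add the upper half of the buffer onto the lower half.
// Without the odd-element guard, this only sums correctly when the
// buffer size is a power of two. Assumes a non-empty buffer.
double TreeReduceGuarded(std::vector<double> buf)
{
  std::size_t n = buf.size();
  while (n > 1)
  {
    const std::size_t half = n / 2;
    for (std::size_t i = 0; i < half; ++i)
      buf[i] += buf[i + half];  // pairwise adds, like threads in a block
    if (n % 2 != 0)
      buf[0] += buf[n - 1];     // guard: fold in the leftover element
    n = half;
  }
  return buf[0];
}
```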

Thank you for taking the time to read this email; I'm looking forward to hearing your valuable feedback. Have a good day!

Regards
Liu, Zhuojin

