mlpack and Google Summer of Code
mlpack is a proud participant in Google Summer of Code. We took part in the program in 2013, 2014, 2016, and 2017, with a total of 24 students accepted for mlpack. In past years we have received a large number of applications, and selection is competitive, so this page exists to help you determine whether you could be a strong candidate.
The two most important qualities that an mlpack GSoC candidate can possess are self-sufficiency and a willingness to learn.
Self-sufficiency is key because mentors have limited time and cannot spend as many hours helping the student as the student spends on the project. Of course, this does not mean that a student (or a prospective student) can never ask questions! mlpack is a complex library, and understanding it sometimes takes help and explanation.
A willingness to learn is also important because virtually every potential project will require the student to become familiar with a new algorithm or C++ technique. mlpack has many components, and during your project you will likely have to use or interact with some other part of the codebase.
Of course, these are not the only important ingredients for a successful GSoC project. A student should ideally be familiar with:
- open source software development: opening pull requests, using git, opening issues. mlpack uses GitHub, which has great documentation. If you are not already familiar with this workflow, that documentation is a good place to learn it.
- using the development toolchain on your computer: you should be able to download and compile mlpack, make changes to the code, and recompile with the new changes. There is a tutorial on how to build mlpack, which is a great place to get started. If you're on Windows, there is also a wiki page that could be very useful.
- at least intermediate C++ knowledge: to make the code fast, mlpack uses many different C++ paradigms, including template metaprogramming, C++11 features like rvalue references, and different parts of the Boost libraries. You should be at least familiar with some of these language features and with what templates are, even if you have not used them in depth, so that you can understand the mlpack codebase. Some examples of patterns that are often used inside mlpack are SFINAE (example in mlpack), policy-based design, and compile-time class traits. Here are some other useful resources for learning template metaprogramming, and some useful reference books. If some of this sounds new to you, don't feel overwhelmed: deep experience with these techniques is not a necessity, but it is helpful. You should at least be prepared to learn about them!
- project-specific knowledge: you'll need to have an in-depth understanding of the specific project that you choose.
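To make the three patterns named in the list above more concrete, here is a minimal C++11 sketch that combines policy-based design, a compile-time class trait, and SFINAE. It is not taken from mlpack: every name in it (`ManhattanMetric`, `SquaredEuclideanMetric`, `Distance`, `MetricTraits`, `LowerBound`) is hypothetical, and it works on plain `double` values only to keep the example short.

```cpp
#include <type_traits>

// NOTE: every name below is hypothetical and chosen for illustration only;
// none of this is mlpack's actual API.

// Policy-based design: each metric is a small class exposing a static
// Evaluate(), and algorithms take the metric as a template parameter.
struct ManhattanMetric
{
  static constexpr double Evaluate(double a, double b)
  {
    return (a > b) ? (a - b) : (b - a);
  }
};

struct SquaredEuclideanMetric
{
  static constexpr double Evaluate(double a, double b)
  {
    return (a - b) * (a - b);
  }
};

// The policy is resolved at compile time, so there is no virtual call.
template<typename MetricPolicy>
constexpr double Distance(double a, double b)
{
  return MetricPolicy::Evaluate(a, b);
}

// Compile-time class trait: records whether a policy satisfies the triangle
// inequality. The primary template is the conservative default.
template<typename MetricPolicy>
struct MetricTraits
{
  static const bool IsTrueMetric = false;
};

// Specialization: the absolute difference does satisfy it.
template<>
struct MetricTraits<ManhattanMetric>
{
  static const bool IsTrueMetric = true;
};

// SFINAE: std::enable_if removes one of the two overloads from overload
// resolution, depending on the trait. For a true metric, the reverse
// triangle inequality gives d(a, c) >= |d(a, b) - d(b, c)|.
template<typename MetricPolicy>
constexpr typename std::enable_if<
    MetricTraits<MetricPolicy>::IsTrueMetric, double>::type
LowerBound(double dAB, double dBC)
{
  return (dAB > dBC) ? (dAB - dBC) : (dBC - dAB);
}

// Without the triangle inequality, the only safe lower bound is zero.
template<typename MetricPolicy>
constexpr typename std::enable_if<
    !MetricTraits<MetricPolicy>::IsTrueMetric, double>::type
LowerBound(double /* dAB */, double /* dBC */)
{
  return 0.0;
}
```

With these definitions, `Distance<ManhattanMetric>(1.0, 4.0)` evaluates to `3.0` and `Distance<SquaredEuclideanMetric>(1.0, 4.0)` to `9.0`, while `LowerBound<ManhattanMetric>(5.0, 2.0)` yields `3.0` and `LowerBound<SquaredEuclideanMetric>(5.0, 2.0)` yields `0.0`. If reading this sketch feels comfortable, the templated parts of the mlpack codebase should be approachable; if not, it shows the kind of material worth studying first.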
To build project-specific knowledge, you first need to choose a project. There are many projects to select from on the SummerOfCodeIdeas page. Find one that interests you, or propose your own and see if a mentor is interested in supervising it! If you have a question about a project, it may already have been answered on the mailing list, so look through or search the archives before asking.
There are several ways to start contributing to mlpack. The first is to look through the list of open issues and see if there is one you think you can solve. The issues are generally tagged with difficulty, so you can search for, e.g., only "easy" issues. Another approach is to find a bug in the mlpack code and fix it. Yet another is to contribute new functionality. As always, be patient: it may be some time before a community member is able to fully review your PR. Make sure your PR follows the style guidelines to help keep things moving quickly.
Remember that open source is about community---so be sure to participate!