[mlpack] Looking for heavyweight use cases of NB Classifier

Yannis Mentekidis mentekid at gmail.com
Mon Jun 5 12:50:08 EDT 2017


Hi guys,

Shikhar is working on his project to profile different mlpack algorithms
and identify potential bottlenecks he could then parallelize. He's found a
paper
(https://papers.nips.cc/paper/3150-map-reduce-for-machine-learning-on-multicore.pdf)
which adapts the MapReduce paradigm for certain algorithms, including Naive
Bayes, so he started by profiling that algorithm.

However, he and I have been struggling to find a dataset that makes the
algorithm take a significant amount of time. The time spent in the
mlpack::data::Load() functions is 2-3 orders of magnitude larger than the
time spent in the Train() and Classify() functions.

We were wondering:

   - Has anybody come across any use cases where NBC is slow enough to be
   worth parallelizing?
   - Does anyone have any tips on profiling the algorithm so that data
   loading is ignored, so we can focus on the things we can actually improve?
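
To make the second question concrete, here is a rough sketch of one way the
phases could be timed separately, so the setup cost is reported on its own
instead of swamping the rest. It is only an illustration under assumptions:
it does not call mlpack at all. TimeMs, MakeSyntheticData, TrainStandIn and
ProfilePhases are hypothetical names made up for this example, the data is
generated in memory rather than loaded from a file, and TrainStandIn (which
just computes per-class feature means) is a placeholder for wherever the real
Train() call would go.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Run a callable once and return its wall-clock duration in milliseconds.
template <typename F>
double TimeMs(F&& f)
{
  const auto start = std::chrono::steady_clock::now();
  f();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical in-memory dataset, stored one point after another.
struct Dataset
{
  std::size_t dims;
  std::vector<double> values;       // points * dims entries
  std::vector<std::size_t> labels;  // one class label per point
};

// Hypothetical generator: builds the data in memory, so no file I/O is
// involved and the dataset size can be scaled up freely.
Dataset MakeSyntheticData(std::size_t points, std::size_t dims,
                          std::size_t classes)
{
  std::mt19937 gen(42);
  std::normal_distribution<double> noise(0.0, 1.0);
  Dataset d{dims, std::vector<double>(points * dims),
            std::vector<std::size_t>(points)};
  for (std::size_t i = 0; i < points; ++i)
  {
    d.labels[i] = i % classes;
    for (std::size_t j = 0; j < dims; ++j)
      d.values[i * dims + j] = noise(gen) + static_cast<double>(d.labels[i]);
  }
  return d;
}

// Placeholder for the training step (NOT mlpack's Train()): computes the
// per-class mean of every feature.
std::vector<double> TrainStandIn(const Dataset& d, std::size_t classes)
{
  std::vector<double> means(classes * d.dims, 0.0);
  std::vector<std::size_t> counts(classes, 0);
  for (std::size_t i = 0; i < d.labels.size(); ++i)
  {
    const std::size_t c = d.labels[i];
    ++counts[c];
    for (std::size_t j = 0; j < d.dims; ++j)
      means[c * d.dims + j] += d.values[i * d.dims + j];
  }
  for (std::size_t c = 0; c < classes; ++c)
    if (counts[c] > 0)
      for (std::size_t j = 0; j < d.dims; ++j)
        means[c * d.dims + j] /= static_cast<double>(counts[c]);
  return means;
}

struct PhaseTimes
{
  double setupMs;  // time to get the data into memory
  double trainMs;  // time spent in the training placeholder only
};

// Time each phase on its own so they can be compared directly.
PhaseTimes ProfilePhases(std::size_t points, std::size_t dims,
                         std::size_t classes)
{
  Dataset d{};
  std::vector<double> means;
  PhaseTimes t{};
  t.setupMs = TimeMs([&] { d = MakeSyntheticData(points, dims, classes); });
  t.trainMs = TimeMs([&] { means = TrainStandIn(d, classes); });
  return t;
}
```

Calling something like ProfilePhases(1000000, 100, 10) and printing the two
fields would show how the training placeholder scales with the number of
points and dimensions, independently of any file parsing. In the real setting
the placeholder would be swapped for the actual NBC calls.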

Thanks a lot in advance :)