[mlpack] Issue using FastCSV

Sat Apr 3 01:55:38 EDT 2021

Hello,

Thank you, Ryan, for the help. I have at least figured out how to use it for
benchmarking, although I am not sure it is the most efficient way.

I really wanted to attend the video meet-up to discuss the project, but I am
quite busy with college exams and submissions.

I agree that Armadillo's parser is the fastest in any case. I benchmarked all
four parsers and posted the results in the issue
<https://github.com/mlpack/mlpack/issues/2646>. Maybe this will help more.

Code: https://github.com/heisenbuug/Benchmark-CSV-Parsers
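The message does not say how the four parsers were timed. One common approach is to wrap each parser in a callable, run it several times, and keep the fastest wall-clock time measured with `std::chrono::steady_clock`. The sketch below assumes exactly that. `best_time_ms` is a hypothetical helper, not code from the linked repository.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>

// Hypothetical helper: run `parse` `reps` times and return the fastest
// wall-clock time in milliseconds. Taking the minimum reduces noise from
// other processes and from a cold first run. Returns infinity if reps == 0.
double best_time_ms(const std::function<void()>& parse, std::size_t reps = 5)
{
  double best = std::numeric_limits<double>::infinity();
  for (std::size_t i = 0; i < reps; ++i)
  {
    const auto start = std::chrono::steady_clock::now();
    parse();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    best = std::min(best, elapsed.count());
  }
  return best;
}
```

For example, `best_time_ms([] { arma::mat m; m.load("data.csv", arma::csv_ascii); })` would time Armadillo's CSV loader on a placeholder file `data.csv`. Each parser under comparison would be wrapped the same way, using the same input file.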

Let me know your thoughts so we can proceed further.
Very excited to work on this project.

Thanks,
Gopi

On Sat, Apr 3, 2021 at 7:44 AM Ryan Curtin <ryan at ratml.org> wrote:

> On Fri, Apr 02, 2021 at 03:35:23PM +0530, Gopi Manohar Tatiraju wrote:
> > Hello,
> >
> > I am exploring some csv parsers. Link
> > <https://github.com/ben-strasser/fast-cpp-csv-parser>
> > I went through the basic example:
> >
> > #include "csv.h"
> > int main(){
> >   io::CSVReader<3> in("ram.csv");
> >   in.read_header(io::ignore_extra_column, "vendor", "size", "speed");
> >   std::string vendor; int size; double speed;
> >   while(in.read_row(vendor, size, speed)){
> >     // do stuff with the data
> >   }
> > }
> >
> >
> > Here you can see that the variables we pass to read_row get assigned the
> > values of the corresponding columns. But what if I have 20 or 25 columns?
> > Declaring that many variables doesn't make sense. There should be some C++
> > syntax to handle cases like this. What is the concept called?
> >
> > What if I don't know how many columns my CSV file has, or what if there
> > are 100 columns? Then we should use a vector or an array, agreed. But I
> > don't want to pass three variables (vendor, size, speed) in the while
> > loop; I want to pass a single vector or array and then unpack it to get
> > those values.
> >
> > Any help would be appreciated.
>
> Hey Gopi,
>
> I think maybe you've progressed past this point, but if you are still
> struggling with it you might try checking on that repository.
>
> (Wait, I see that you opened
> https://github.com/ben-strasser/fast-cpp-csv-parser/issues/119.)
>
> In your example, the template parameter must be set at compile time, so
> to me it seems strange that the fast-csv designer would have chosen to
> make the number of columns a template parameter.
>
> In my view, if there is no workaround, that rules this out: we can have
> arbitrarily sized CSVs (even thousands of columns!) and we don't want to
> force the compiler to instantiate `CSVReader<>` for every single number
> of columns we might encounter.
>
> Hope that helps!
>
> Ryan
>
> --
> Ryan Curtin    | "I was misinformed."
> ryan at ratml.org |   - Rick Blaine
>
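The C++ feature behind a call like `read_row(vendor, size, speed)` with any number of arguments is the variadic template (a "parameter pack"). The minimal C++17 sketch below contrasts the two options discussed in this thread. It assumes a simple CSV with no quoted fields. `parse_fields` takes a parameter pack, so its column count is fixed at compile time, the same situation as `CSVReader<3>`. `parse_into_tuple` lets you pass a single `std::tuple` and uses `std::apply` to unpack it. `split_row` returns a `std::vector<std::string>` whose size is only known at run time. All four function names are hypothetical helpers, not part of fast-cpp-csv-parser or mlpack.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

// Runtime column count: split one line on `sep` into a vector.
// Assumes simple CSV: no quoted fields and no embedded separators.
std::vector<std::string> split_row(const std::string& line, char sep = ',')
{
  std::vector<std::string> fields;
  std::istringstream in(line);
  std::string field;
  while (std::getline(in, field, sep))
    fields.push_back(field);
  // getline() does not report an empty last field ("a,b," gives 2); add it.
  if (!line.empty() && line.back() == sep)
    fields.push_back("");
  return fields;
}

// Convert one field with operator>> (e.g. to int or double). This only
// checks that a value could be read, not that the whole field was used.
template<typename T>
bool parse_one(const std::string& s, T& out)
{
  std::istringstream in(s);
  in >> out;
  return !in.fail();
}

// Strings are copied verbatim (operator>> would stop at the first space).
inline bool parse_one(const std::string& s, std::string& out)
{
  out = s;
  return true;
}

// Compile-time column count: `Ts&... out` is a parameter pack, so any
// number of output variables can be passed, but that number is fixed at
// each call site when the program is compiled.
template<typename... Ts>
bool parse_fields(const std::vector<std::string>& fields, Ts&... out)
{
  if (fields.size() != sizeof...(Ts))
    return false;
  std::size_t i = 0;
  bool ok = true;
  // Fold over the comma operator: fields are consumed left to right.
  ((ok = ok && parse_one(fields[i++], out)), ...);
  return ok;
}

// Pass one std::tuple instead of separate variables; std::apply unpacks
// the tuple's elements into the parameter pack of parse_fields().
template<typename... Ts>
bool parse_into_tuple(const std::vector<std::string>& fields,
                      std::tuple<Ts...>& row)
{
  return std::apply(
      [&fields](Ts&... elems) { return parse_fields(fields, elems...); },
      row);
}
```

The tuple version still fixes the number of columns in the type `std::tuple<Ts...>`, which is the limitation Ryan points out. Only the vector version copes with a file whose width is unknown until it is read. Assuming `io::CSVReader::read_row` accepts its arguments as a parameter pack, as its call syntax suggests, the same `std::apply` pattern could be used to call it with a tuple.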