[mlpack] GSoC-2021

Gopi Manohar Tatiraju deathcoderx at gmail.com
Wed Mar 31 19:15:48 EDT 2021


Hey,

So, I went through both the libraries we considered as `csv parsers`.
I implemented code to load the data from a small example `csv` file
into arma::mat; here is the sample code, let me know what you think.
Am I loading into arma::mat the wrong way? Is there a more efficient
way?

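Fast CSV Parser <https://github.com/ben-strasser/fast-cpp-csv-parser>

    io::CSVReader<4> in("llog.csv");
    // arma::mat stores doubles, so read the fields as double, not float.
    double a, b, c, d;
    arma::uword row = 0;
    // data is preallocated at 20 x 4; the loop stops if the file has more rows.
    arma::mat data(20, 4);

    while (row < data.n_rows && in.read_row(a, b, c, d))
    {
      data(row, 0) = a;
      data(row, 1) = b;
      data(row, 2) = c;
      data(row, 3) = d;
      ++row;
    }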

rapidcsv <https://github.com/d99kris/rapidcsv>

    // For headerless csv files
    rapidcsv::Document doc("llog.csv", rapidcsv::LabelParams(-1, -1));
    arma::mat data(doc.GetRowCount(), doc.GetColumnCount(), arma::fill::ones);

    for (size_t i = 0; i < doc.GetRowCount(); ++i)
    {
      // Read row i as doubles, matching arma::mat's element type.
      const std::vector<double> row = doc.GetRow<double>(i);
      for (size_t j = 0; j < doc.GetColumnCount(); ++j)
      {
        data(i, j) = row[j];
      }
    }

After using both, I feel that `rapidcsv` is easier to grasp and work with,
and it seems more structured.
Let me know your thoughts. Also, if loading like the above example is fine,
this can be converted into a function that acts as basic csv file loading
into arma::mat, right?
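
A minimal sketch of what such a loader function could look like. To stay
self-contained it does not use Armadillo or either parser library:
`ColMajorMatrix` is a hypothetical stand-in for arma::mat (which also stores
its elements in column-major order), `LoadCsv` is a hypothetical name, and the
std::getline-based splitting is only a placeholder for the step where
rapidcsv or the fast parser would do the real work. It assumes a headerless,
purely numeric file with no quoting.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for arma::mat: dense storage in column-major order.
struct ColMajorMatrix
{
  std::size_t n_rows = 0;
  std::size_t n_cols = 0;
  std::vector<double> mem;  // element (r, c) lives at mem[r + c * n_rows]

  double& operator()(std::size_t r, std::size_t c)
  {
    return mem[r + c * n_rows];
  }
};

// Hypothetical helper: load headerless, comma-separated numeric text.
// Quoted or escaped fields are not handled; a parser library would do that.
ColMajorMatrix LoadCsv(std::istream& in)
{
  std::vector<std::vector<double>> rows;
  std::string line;
  while (std::getline(in, line))
  {
    if (line.empty())
      continue;

    std::vector<double> row;
    std::stringstream fields(line);
    std::string field;
    while (std::getline(fields, field, ','))
      row.push_back(std::stod(field));

    // Every row must have as many fields as the first one.
    if (!rows.empty() && row.size() != rows.front().size())
      throw std::runtime_error("inconsistent column count");
    rows.push_back(std::move(row));
  }

  // Size the matrix from what was actually read, not from a fixed guess.
  ColMajorMatrix m;
  m.n_rows = rows.size();
  m.n_cols = rows.empty() ? 0 : rows.front().size();
  m.mem.resize(m.n_rows * m.n_cols);

  // Fill column by column so that writes walk the buffer sequentially.
  for (std::size_t c = 0; c < m.n_cols; ++c)
    for (std::size_t r = 0; r < m.n_rows; ++r)
      m(r, c) = rows[r][c];

  return m;
}
```

The point of the sketch is the shape of the function: the matrix dimensions
come from the parsed data instead of being hard-coded, and the copy loop runs
in the order that matches column-major storage.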

Thank You,
Gopi

On Mon, Mar 29, 2021 at 8:28 PM Omar Shrit <omar at shrit.me> wrote:

> Hey Gopi
>
> On 03/29, Gopi Manohar Tatiraju wrote:
> > Hey,
> >
> > I agree, after going a bit through both the candidates I can see we can
> > unload a lot of work by using a well-implemented existing parser.
> > I think I should start by comparing both the mentioned libraries to
> > decide which one to use. I will use the same benchmark strategy that
> > was discussed in the issue. Does that sound good?
>
> Sounds good to me.
>
> > And also I think I can work on replacing boost spirits in GSoC then. This
> > will be a start to the data frame idea. Even if we are left with time
> > after this, I can start the work on the data frame as well. Is it
> > considerable?
>
> Yes of course.
>
> > Thanks,
> > Gopi
> >
> >
> > On Mon, Mar 29, 2021 at 7:33 PM Omar Shrit <omar at shrit.me> wrote:
> >
> > > Hey Gopi,
> > >
> > > I totally agree with Ryan, using existing parser will accelerate the
> > > project and allow to move forward with the dataframe class. Also, I
> > > do believe that replacing boost Spirit with an existing parser will
> > > take a considerable amount of the summer.
> > >
> > > Thanks,
> > >
> > > Omar
> > >
> > > On 03/29, Ryan Curtin wrote:
> > > > On Mon, Mar 29, 2021 at 04:17:35PM +0530, Gopi Manohar Tatiraju wrote:
> > > > > Would love to hear your thoughts on whether to go with an already
> > > > > implemented parser or build a new one. Also if we are planning to
> > > > > build a data frame here then maybe going with an in-house parser
> > > > > would be better as we will have the ability to design it in such a
> > > > > way that it can extend maximum support to the new data frame
> > > > > which we are planning to build ahead.
> > > >
> > > > Hey Gopi,
> > > >
> > > > Honestly I think it's best to use another package.  Not only will
> > > > this free up time to actually work on the dataframe class, but also
> > > > it means we are not responsible for maintenance of the CSV parser.
> > > > There are lots of little complexities and edge cases in parsing (not
> > > > to mention efficiency!) and so we can probably get a lot more bang
> > > > for our buck here by using an implementation from someone who has
> > > > already put down the time to consider all those details.
> > > >
> > > > Hope this is helpful. :)
> > > >
> > > > Thanks,
> > > >
> > > > Ryan
> > > >
> > > > --
> > > > Ryan Curtin    | "Kill them, Machine... kill them all."
> > > > ryan at ratml.org |   - Dino Velvet
> > >
>