[mlpack] Query regarding the project "String Processing Utilities"

Rahul Barnwal rahulbarnwal70 at gmail.com
Fri Feb 2 05:27:50 EST 2018


Hi Ryan,
Thank you for the reply, and sorry for my late response; I was
preoccupied with the sports fest at my college. Now I can dedicate my time
to this.

As you suggested, I would like to start with the task of performing
TF-IDF on the Yelp reviews dataset and then applying logistic regression. I
believe it shouldn't be very difficult, but if I run into any difficulty, can I
ask here?

Thanks
Rahul


On Thu, Jan 25, 2018 at 6:37 PM, Ryan Curtin <ryan at ratml.org> wrote:

> On Wed, Jan 24, 2018 at 11:06:01PM +0530, Rahul Barnwal wrote:
> > Hi,
> > I would like to contribute to the project "String Processing Utilities". I
> > have built the mlpack library and acquainted myself with it to some extent
> > earlier.
> >
> > I am somewhat familiar with various string encoding methods in Python,
> > and I am willing to learn things in the process. Could you tell me how to
> > get started on this project and how to proceed?
>
> Hi Rahul,
>
> Thanks for getting in touch.  I think that the string processing
> utilities project will be an interesting one.  I would say that the
> right way to get started on this project is to understand the problem.
>
> One way to do that (and there are many others) might be to find a data
> science problem that involves strings... perhaps, predicting the next
> character in a sentence with an RNN, or, perhaps, performing TF-IDF (or
> similar) on the Yelp reviews dataset and trying to predict the outcome
> with logistic regression.  There are many tasks of this sort but the key
> would be to choose one, then use mlpack to solve the problem.
>
> When you do that, you can then identify the difficulties inherent in
> using string-based datasets with mlpack, and this can help inform your
> proposal on how these types of issues could be improved or resolved.
>
> Of course, feel free to discuss ideas here on the mailing list.  Let me
> know if I can clarify anything that I've written.
>
> Thanks,
>
> Ryan
>
> --
> Ryan Curtin    | "Avoid the planet Earth at all costs."
> ryan at ratml.org |   - The President
>



-- 
Regards,
Rahul Barnwal
4th year Undergraduate
Mathematics Department
IIT KHARAGPUR