[mlpack] GSoC 2020: Visualization Tool

Thu Mar 19 09:23:28 EDT 2020

Hello, and apologies for my absence. In case it's not clear: if I were to
mentor the visualization project, I would certainly need a co-mentor. GSoC,
if done right, is a real time commitment, and students deserve good
support, which I don't think I can offer entirely on my own right now.

That being said...

I think that these visualizations are a reasonable start. How do you think
users will want to create or interact with them? Will they want a log scale
in certain cases? Are they interested in the size of an ANN layer's
input/output? The project proposals are generally stated openly enough that
you can decide exactly what form of the project is most interesting or
useful. I'm curious what you think!

Thank you,

-Ryan Birmingham

On Tue, Mar 17, 2020 at 9:41 AM Gopi Manohar Tatiraju <deathcoderx at gmail.com>
wrote:

> Hey Mentors,
>
> Regarding the visualization project, I am looking for some feedback and
> help to prepare a proof of concept.
> My previous mail describes my doubts about whether we should integrate
> this tool with the existing library, as this will affect how we build the
> tool. I have also already started work on the tool, using an existing
> mlpack model (Digit Recognizer
> <https://github.com/mlpack/models/tree/master/Kaggle/DigitRecognizerBatchNorm/src>
> )
>
>    - Using OpenCV I visualised the MNIST dataset. OpenCV doesn't have any
>    built-in function to load .csv images, so I wrote a custom function for
>    that. The output looks something like this; the label is displayed in
>    the terminal, or if required we can add it to the image itself:
>    [image: 1.png]
>
>
>
>    - Using matplotlibcpp I also plotted accuracy while training the
>    model. This can be displayed in two ways:
>       1. During training: the graph is updated after each epoch (better
>       on faster machines).
>       2. After training: the whole graph is shown once training is done.
>
> [image: 0.png]
>
>
>    - I also made a graph which depicts the order of the layers added to
>    the model. I used OpenCV for this. I have read that text rendering is
>    not very efficient in OpenCV, so maybe we can discuss how to tackle
>    that after some testing.
>
> [image: 2.png]
>
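
On the custom .csv loader: a minimal sketch of what such a function might
look like, assuming each row is a label followed by width * height pixel
values in 0..255 (for the Kaggle digit data, 28 * 28 = 784). DigitSample,
ParseCsvRow, and ToPgm are hypothetical names for illustration, not the code
from the proof of concept, and the sketch uses only the standard library so
that it does not depend on OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for one decoded sample; not an mlpack or OpenCV type.
struct DigitSample
{
  int label;                         // Class label from the first CSV column.
  std::vector<std::uint8_t> pixels;  // Row-major grayscale values, 0..255.
};

// Parse one CSV row of the assumed form "label,p0,p1,...,p(N-1)".
// Throws std::runtime_error if the row is empty, a pixel is outside 0..255,
// or the pixel count differs from width * height.
DigitSample ParseCsvRow(const std::string& row,
                        std::size_t width,
                        std::size_t height)
{
  std::istringstream stream(row);
  std::string field;
  DigitSample sample{};

  if (!std::getline(stream, field, ','))
    throw std::runtime_error("empty row");
  sample.label = std::stoi(field);

  sample.pixels.reserve(width * height);
  while (std::getline(stream, field, ','))
  {
    const int value = std::stoi(field);
    if (value < 0 || value > 255)
      throw std::runtime_error("pixel value out of range");
    sample.pixels.push_back(static_cast<std::uint8_t>(value));
  }

  if (sample.pixels.size() != width * height)
    throw std::runtime_error("unexpected pixel count");
  return sample;
}

// Serialize the sample as a plain-text PGM (P2) image, which most image
// viewers can open, so the result can be checked without any image library.
std::string ToPgm(const DigitSample& sample,
                  std::size_t width,
                  std::size_t height)
{
  assert(sample.pixels.size() == width * height);
  std::ostringstream out;
  out << "P2\n" << width << ' ' << height << "\n255\n";
  for (std::size_t r = 0; r < height; ++r)
  {
    for (std::size_t c = 0; c < width; ++c)
    {
      if (c > 0)
        out << ' ';
      out << static_cast<int>(sample.pixels[r * width + c]);
    }
    out << '\n';
  }
  return out.str();
}
```

If OpenCV is available, the same pixels buffer could instead be wrapped
without copying through the cv::Mat(rows, cols, CV_8UC1, data) constructor
and shown with cv::imshow. The label stays in the struct, so it can go to
the terminal or onto the image, whichever turns out to be more useful.
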
> This much has been done so far. I am thinking about adding more model
> metrics, such as loss and many other ML metrics. Using articles like these
> <https://towardsdatascience.com/20-popular-machine-learning-metrics-part-1-classification-regression-evaluation-metrics-1ca3e282a2ce>
> and referring to research papers, we can discuss what more to add. Also,
> the main open point is still how the user will use the tool.
>
> Can I get some feedback regarding this? Proposal submission is already
> open and I want to submit a detailed proposal.
>
>
> Project:
> https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas#visualization-tool
> Mentor: Ryan Birmingham
> Mail-List: mlpack at lists.mlpack.org
>
> Regards.
> Gopi M Tatiraju
>
>
>
> On Fri, Mar 13, 2020 at 1:54 AM Gopi Manohar Tatiraju <
> deathcoderx at gmail.com> wrote:
>
>> Hey Rahul,
>>
>> If you don't mind me asking, are you mentoring this project? I ask because
>> it was not listed on the ideas page, and there are many things which I
>> would like to discuss about this project from a mentor's perspective.
>> About the serialized model, I need to go through the saved .h5 file to see
>> how exactly we can use it. Also, I am just trying to determine what can
>> be included in this project; I am yet to decide how to implement these
>> things because there are many options available. As the ideas page
>> mentions that a proof of concept is required, I am just working on
>> determining the outline of the project first.
>>
>> Regards.
>> Gopi M Tatiraju
>>
>>
>> On Fri, Mar 13, 2020 at 1:35 AM Rahul Prabhu <cupertinorp at gmail.com>
>> wrote:
>>
>>> Hey Gopi,
>>> Thanks for the interest in this project. I was wondering, to visualize
>>> the neural network, could we not just parse the serialized model returned
>>> by data::Save()?
>>>
>>> On Thu, Mar 12, 2020 at 11:59 PM Gopi Manohar Tatiraju <
>>> deathcoderx at gmail.com> wrote:
>>>
>>>> Hey,
>>>>
>>>> Regarding Visualization Tool, I think we may need to use one or more
>>>> different libraries to build it, so a discussion regarding the dependencies
>>>> is needed to proceed further.
>>>>
>>>> I took the example of Digit Recogniser
>>>> <https://github.com/mlpack/models/tree/master/Kaggle> and started
>>>> working on it.
>>>>
>>>> I started by visualising the dataset itself. Using OpenCV I wrote code
>>>> to read images from a CSV file and display them (OpenCV doesn't have any
>>>> function to read CSV files as images).
>>>>
>>>> Now I think another good visual would be a list of all the layers and
>>>> activation functions which are used and the connections between them. We
>>>> have some options to do this:
>>>>
>>>>    1. *Total Naive Approach: *We can use file handling. Our tool will
>>>>    take a code file as input. All layers are added like this:
>>>>    Add<Parameter>. We can detect the parameters and, using OpenCV,
>>>>    arrange them in a graph fashion.
>>>>    2. *A better approach: *A better approach would be to add a variable
>>>>    or function (e.g. to the FFN class) which keeps track of the layers
>>>>    being added and other required parameters. Then we can create an
>>>>    object of a visual class, and the FFN class object can be passed to
>>>>    this visual class, which can then produce the required visualization.
>>>>
>>>> *Method 1 *may not be that efficient and is prone to many errors, as
>>>> here we also have to ensure that the code file given by the user
>>>> contains valid code and that all the connections are properly made. But
>>>> here we don't need to touch any of the base code of the library, so the
>>>> required testing will be limited to the visual tool class.
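
To make Method 1 concrete, here is a minimal sketch of the text-scanning
step, assuming layers appear in the user's source as Add<LayerType>(...)
calls. ExtractLayerNames is a hypothetical helper, not an mlpack function,
and the layer names in the comments are only illustrative. It is plain
pattern matching rather than a C++ parser, which is exactly where the
fragility described above comes from: commented-out calls, macros, or
layers added through a loop would all be misread.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collect the template argument of every `Add<...>(` call found in `source`,
// in order of appearance. Hypothetical helper for illustration only.
std::vector<std::string> ExtractLayerNames(const std::string& source)
{
  // Matches e.g. `model.Add<Linear<> >(784, 100);` and captures `Linear<>`.
  // The lazy `.+?` stops at the first `>` that is directly followed by `(`.
  static const std::regex addCall(R"(\bAdd\s*<\s*(.+?)\s*>\s*\()");

  std::vector<std::string> names;
  for (std::sregex_iterator it(source.begin(), source.end(), addCall), end;
       it != end; ++it)
  {
    // One capture group, so each match holds the full match plus one submatch.
    assert(it->size() == 2);
    names.push_back((*it)[1].str());
  }
  return names;
}
```

The returned list preserves the order of the calls, so it could feed the
layer-order graph directly. Constructor arguments such as layer sizes are
not captured here; extracting them reliably would need more than a regular
expression.
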
>>>>
>>>> *Method 2 *is efficient, but changing the base code of the library will
>>>> require extensive testing before we can merge it. Testing will take more
>>>> time here, but using objects can be more beneficial.
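
For comparison, a minimal sketch of the bookkeeping Method 2 implies,
assuming a plain feed-forward chain where each layer's input size equals the
previous layer's output size. LayerRecord, LayerTracker, and RenderLayerChain
are hypothetical names; nothing like them exists in mlpack today, and in a
real change the Record() call would live inside the network class wherever a
layer is added. The renderer emits plain text only, as a stand-in for the
visual class.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one layer; not an mlpack type.
struct LayerRecord
{
  std::string name;
  std::size_t inputSize;
  std::size_t outputSize;
};

// Hypothetical stand-in for the tracking that would be added to the
// network class: it remembers each layer in the order it was added.
class LayerTracker
{
 public:
  void Record(std::string name, std::size_t inputSize, std::size_t outputSize)
  {
    // Assumption of this sketch: a simple chain, so sizes must line up.
    assert(layers.empty() || layers.back().outputSize == inputSize);
    layers.push_back({std::move(name), inputSize, outputSize});
  }

  const std::vector<LayerRecord>& Layers() const { return layers; }

 private:
  std::vector<LayerRecord> layers;
};

// Hypothetical "visual class" entry point: one text line per layer,
// including the input/output sizes.
std::string RenderLayerChain(const LayerTracker& tracker)
{
  std::ostringstream out;
  std::size_t index = 0;
  for (const LayerRecord& layer : tracker.Layers())
  {
    out << index++ << ": " << layer.name << " (" << layer.inputSize
        << " -> " << layer.outputSize << ")\n";
  }
  return out.str();
}
```

Because the sizes come from the model object rather than from source text,
this route can report each layer's input/output size without guessing, at
the cost of touching (and re-testing) library code.
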
>>>>
>>>> I need some views regarding which method should be chosen and how to
>>>> proceed from here. Once the flow is established, other parameters like
>>>> accuracy and bias can be visualised using graphs. I have some
>>>> parameters in mind for now; we can also take some inspiration from
>>>> TensorBoard <https://www.tensorflow.org/tensorboard> for that.
>>>>
>>>> I am waiting for suggestions, as I am planning to implement a proof of
>>>> concept so that we can understand the project better.
>>>>
>>>> Regards
>>>> Gopi M Tatiraju
>>>>
>>>> _______________________________________________
>>>> mlpack mailing list
>>>> mlpack at lists.mlpack.org
>>>> http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1.png
Type: image/png
Size: 7590 bytes
Desc: not available
URL: <http://knife.lugatgt.org/pipermail/mlpack/attachments/20200319/db6cbf81/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0.png
Type: image/png
Size: 32098 bytes
Desc: not available
URL: <http://knife.lugatgt.org/pipermail/mlpack/attachments/20200319/db6cbf81/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2.png
Type: image/png
Size: 52678 bytes
Desc: not available
URL: <http://knife.lugatgt.org/pipermail/mlpack/attachments/20200319/db6cbf81/attachment-0005.png>