[mlpack] Is scanning business transactions for fraud an appropriate use of MLpack?

Wed Nov 7 13:27:59 EST 2018

Hi, Ryan.

Hmm.  Interesting.  Always numeric, eh?

I could convert our data into numeric values, as you suggest, but I have
some misgivings.  Maybe I should say some more about the data records that
stream in.

1.  Some of our fields are encoded values.  For example, SERVICE_TYPE =
PSYCHIC_READING, with other possible values being PALM_READING,
CASTING_STICKS, and TAROT_CARDS.  (We don't really offer psychic readings.)

We should be able to assign a number to each of the possible values without
any problem.
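
A minimal sketch of that assignment, in Python only because the Python
bindings came up.  The value list is the one from the example above; the
helper names are made up for illustration.  It also shows the one-hot
variant, since a plain integer code may suggest an ordering among service
types that isn't really there.

```python
# Hypothetical helpers; only the SERVICE_TYPE values come from the example.
SERVICE_TYPES = ["PSYCHIC_READING", "PALM_READING", "CASTING_STICKS", "TAROT_CARDS"]

# Integer code: one number per possible value.
SERVICE_CODE = {name: i for i, name in enumerate(SERVICE_TYPES)}


def one_hot(value, categories=SERVICE_TYPES):
    """Return a list with 1.0 in the slot for `value` and 0.0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]
```

Either form gives a purely numeric column (or group of columns) per record.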

2.  Some of our fields are numeric values.  For example, AMOUNT_CHARGED =
9.95.

I bet MLpack could handle these directly.

3.  Some of our fields are free text fields.  For example COMMENT =
"Customer seemed agitated.  I couldn't get a clear reading."

We could create a big dictionary and map words to numbers, leaving out
stemming and phrases.  But that would truly be a very big dictionary.  And
the information in the comment is quite likely to be useful for
determining whether the transaction is fraudulent.
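
To make the dictionary idea concrete, here is a rough sketch in Python.
It assumes a very simple tokenizer (lowercase, split on anything that
isn't a letter or apostrophe) and does no stemming or phrase handling.
All function names are made up for illustration; nothing here is an
mlpack API.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes; no stemming, no phrases."""
    return re.findall(r"[a-z']+", text.lower())


def build_dictionary(comments):
    """Map each distinct word to an integer index, in order of first appearance."""
    dictionary = {}
    for comment in comments:
        for word in tokenize(comment):
            dictionary.setdefault(word, len(dictionary))
    return dictionary


def count_vector(comment, dictionary):
    """Word-count vector for one comment; words not in the dictionary are ignored."""
    vec = [0.0] * len(dictionary)
    for word, n in Counter(tokenize(comment)).items():
        if word in dictionary:
            vec[dictionary[word]] = float(n)
    return vec


# The COMMENT value from the example above.
dictionary = build_dictionary(
    ["Customer seemed agitated.  I couldn't get a clear reading."]
)
vector = count_vector("A clear reading, a clear customer", dictionary)
```

Each comment becomes a vector as long as the dictionary, which is exactly
where the size worry comes from.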

The tool from my previous experience, CRM114, handles text swimmingly.  But
it doesn't handle numeric fields at all.  (Other than as a very
peculiar-looking number.)  Perhaps neither engine is really appropriate.

Does this information about our fields jog loose any additional ideas?

On Tue, Nov 6, 2018 at 8:46 PM Ryan Curtin <ryan at ratml.org> wrote:

> On Tue, Nov 06, 2018 at 02:33:56PM -0500, Rick Hedin wrote:
> > Hi.  Could you give me a reading on whether MLpack is an appropriate tool
> > for what I want to do?  Too often, you start down a path, and after a few
> > weeks you realize "Oh.  I shouldn't be doing this."
>
> Hey there Rick,
>
> Always good to check first. :)  I'll do my best to provide useful
> answers...
>
> > I would like to put an AI process on the message stream, transparent
> > to other uses of the message stream.  When one of our operators marks
> > a transaction as "possibly fraudulent," that would be a data item for
> > the AI process.  When they later mark it "definitely fraudulent" or
> > "definitely not fraudulent," those are also data items for the AI.
> > Eventually, the AI would be able to add additional tags in the record
> > "AI suspects this transaction is fraudulent" or "AI suspects this
> > transaction is not fraudulent," along with "AI confidence is xxx%."
> >
> > The nice thing about this setup is nobody has to spend hours training it.
> > The data stream provides both data, and judgement on the data.
> >
> > So, is this a good application for MLpack?  Or is it more intended for
> > other purposes, and a different software suite is more appropriate?
>
> So, I think mlpack could work for this but keep in mind a lot of the
> system development here will be preparing the input to give to mlpack so
> that mlpack can make the predictions.
>
> mlpack does all its predictions on numeric data; so, for instance, if
> you have a dataset full of words, you'll need to convert these words to
> numeric values as one-hot encoding, or perhaps by an embedding or TF-IDF
> or something like this.
>
> Note that mlpack does have Python bindings, so if you're working from
> Python it might fit really nicely into a Python workflow.
>
> Hope that this is helpful!
>
> Thanks,
>
> Ryan
>
> --
> Ryan Curtin    | "Avoid the planet Earth at all costs."
> ryan at ratml.org |   - The President
>