Git Product home page Git Product logo

digitalwallet's Introduction

An Efficient Data Pipeline for Fraud Detection

Methodology

To design an efficient fraud detection feature for PayMo users; the following are the two steps of solution:

  1. Building model: Using the batch_payment data to construct efficient graph data structure
  2. Streaming Values: For each incoming payments searching/parsing through the model to find out the degree.

Evaluated data structures

  1. Representing the transactions graph: To represent the transaction the following data structures have been evaluated:

    1. Bloom Filters: False positive is bad
    2. Trees(AVL,RB): Maintaining N trees is costly with no real advantage.
    3. Hashmaps: O(1) access for membership queries
    4. Array: duh!

From the above mentioned data structures; Hashmaps provide us with a good performance while performing membership queries and to represent a graph, a hash map array is used for each node in graph.

For example consider the following graph: example

The tree is now represented as

graph g =
{
	'userA':{
		'userC':'userC'
	}
	'userB':{
		'userC':'userC'
	}
	'userC':{
		'userA':'userA',
		'userB':'userB'
	}
}

where the graph g is a hash map array with indexes for each node and for each node as key in parent hashmap has a value of inner hashmap array containing nodes that share a edge.

Data pipeline

batch_payment        -----> Data Model
                                |
                                |
Stream at time $t_0$ -----> Fraud Detection Algorithm ----> unverified,trusted

Fairly, a simple pipeline that shall build the Data Model; which is a graph represented as hashmap. Once the data model is created, for each streaming value, the sender and receiver are identified and degree between them is determined to generate the warnings.

Implementation

Setup

Effort has been made to write as many custom implementations as possible with little use of external libraries:

  1. Libraries:

    1. Networkx:
    pip install networkx
    
    1. Python 2.7
  2. Environment:

    1. Ubuntu 16.04
    2. git

Algorithms

  1. Generate data model:
For each batch_payment:
	if not sender/receiver in model:
		initialize node
	add transaction edge between sender and receiver
	update the transaction
  1. Fraud detection algorithm:
For each stream_payment:
	findDegree(sender,receiver)
	update corressponding output

Optimizations

Several optimizations have been made:

  1. A global nodes dictionary is maintained to detect unknown transactions with O(1)
  2. For the first degree neighborhood, a different scheme is used. The receiver is checked directly from the hashmap entry instead of checking entore graph. Guarantees O(edges in sender node) or bounded by O(max(Edges))

Analysis

  1. O(1) insertion into the hashmap
  2. Building the grows linearly with the build transactions
  3. A Breadth first search is deployed to find the degree between the sender and receiver. Which performs: O(|V|+|E|) ~ O(max(V,E)) in entire graph

Testing

images/test.gif

Following are the various test cases written:

  1. test-1-paymo-trans: Basic test provided
  2. test-2-unknown-transactions: Test case for unknown transactions to be identified by optimization rule
  3. test-3-feature2: test for degree 2 check
  4. test-4-feature3: extended check for degree 4

Performance

  1. Build step: This typically takes about 17.308 s. Measured as execution time and averaged over five runs.
  2. Streaming: For one second of streaming data (915 transactions) are processed within 0.0649 s. Measured as execution time over five runs.

Remarks: Additionally, the build model can be cached for the user network locally.

Conclusion

The algorithm employed is guaranteed to bound by the linear time. The data structures are scalable and no bottlenecks have been observed during initial testing.

digitalwallet's People

Contributors

pseudoaj avatar

Stargazers

sai krishna kanneti avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.