Expectations the solution fulfills :-

The code performs better on multi-core machines because it uses Python's multiprocessing.
The code scales to more than 2 profiles.
The fields are extensible: the Profile class supports any number of fields.
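
As a rough illustration of the extensible-fields point, a class can accept arbitrary keyword arguments and keep them in a dictionary. The sketch below is only an assumption about how such a class might look, not the repository's actual Profile implementation: the name `SketchProfile`, the `REQUIRED` tuple and the example email address are hypothetical, and only the `get_profile` method name is taken from this README.

```python
class SketchProfile:
    """Hypothetical stand-in for the repository's Profile class."""

    # Fields this README describes as mandatory on every profile.
    REQUIRED = ("first_name", "last_name", "email_field")

    def __init__(self, **fields):
        missing = [name for name in self.REQUIRED if name not in fields]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        # Any extra keyword argument becomes a field; there is no fixed schema.
        self._fields = dict(fields)

    def get_profile(self):
        # Return a copy so callers cannot mutate the stored fields.
        return dict(self._fields)


# Placeholder values for illustration only.
p = SketchProfile(first_name="Kanhai", last_name="Shah",
                  email_field="kanhai@example.com", random_field=1)
print(p.get_profile())
```

Storing the fields in a plain dictionary means a new field needs no change to the class, which is one simple way to get the extensibility described above.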

How to run the code

Clone the repo and install the packages listed in requirements.txt in your container/virtualenv.
Then run :-

python3 Duplicate_Finder.py

To try custom inputs, modify the main function accordingly

def main():

  o1 = Profile(id=1,first_name = "Kanhai", last_name = "Shah",email_field = "[email protected]", random_field=1)
  o2 = Profile(first_name = "Kanhai", last_name = "Shah",email_field = "[email protected]")
  df = Duplication([o1.get_profile(),o2.get_profile()])
  df.findDuplicates(['email_field','first_name','last_name','random_field'])
  print(df.get_result())

Small logic I tweaked :-

The question states :-
if first_name + last_name + email match between two profiles is greater than 80% (you can try using a library like https://pypi.org/project/fuzzywuzzy/), increase the match score to 1
However, findDuplicates is sometimes called without all 3 of these fields.

To resolve this ambiguity, this is what I have done :-
- If any of first_name, last_name or email_field is passed, the fuzz logic is performed, since these fields are mandatory and therefore present in every Profile.
- If fuzz_logic gives a >80% match, total_match_score is incremented by 1.
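
The rule described in the two points above could be sketched roughly as follows. This is not the repository's code. To stay dependency-free it swaps in `difflib.SequenceMatcher` from the standard library instead of fuzzywuzzy, so the percentages will not be identical to fuzzywuzzy's. The function names, the choice to concatenate the three fields into one string, and the sample profiles are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

FUZZY_FIELDS = ("first_name", "last_name", "email_field")
THRESHOLD = 80  # percent, taken from the task statement


def similarity_percent(a: str, b: str) -> float:
    # SequenceMatcher.ratio() returns a float in [0, 1]; scale to percent.
    return SequenceMatcher(None, a, b).ratio() * 100


def fuzzy_increment(p1: dict, p2: dict, requested: list) -> int:
    """Return 1 if the fuzzy rule fires for the two profiles, else 0."""
    # The rule only applies when at least one fuzzy field was requested.
    if not any(field in FUZZY_FIELDS for field in requested):
        return 0
    # Assumption: compare the concatenation of all three mandatory fields.
    s1 = "".join(str(p1.get(f, "")) for f in FUZZY_FIELDS).lower()
    s2 = "".join(str(p2.get(f, "")) for f in FUZZY_FIELDS).lower()
    return 1 if similarity_percent(s1, s2) > THRESHOLD else 0


# Placeholder profiles for illustration only.
a = {"first_name": "Kanhai", "last_name": "Shah", "email_field": "kanhai@example.com"}
b = {"first_name": "Zed", "last_name": "Quux", "email_field": "zq@other.org"}

total_match_score = 0
total_match_score += fuzzy_increment(a, dict(a), ["email_field"])
print(total_match_score)
```

With fuzzywuzzy installed, `similarity_percent` could be replaced by a call such as `fuzz.ratio(a, b)`, which already returns a value between 0 and 100.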

The code is currently extensible to more than 2 profiles, but the fuzzywuzzy comparison works on only 2 strings at a time.

To resolve this, this is what I have done :-
- The first profile (index 0) is used as an anchor profile: it is compared with every other profile on the fuzzy fields (first_name, last_name, email_field), and the minimum match % decides whether total_match_score is incremented.
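
The anchor idea can be sketched as below, under the same assumptions as any illustration here: this is not the repository's implementation, `difflib.SequenceMatcher` stands in for fuzzywuzzy, and the names `fuzzy_key` and `min_match_against_anchor` as well as the sample profiles are hypothetical.

```python
from difflib import SequenceMatcher

FUZZY_FIELDS = ("first_name", "last_name", "email_field")
THRESHOLD = 80  # percent


def fuzzy_key(profile: dict) -> str:
    # Assumption: the three mandatory fields are concatenated into one string.
    return "".join(str(profile.get(f, "")) for f in FUZZY_FIELDS).lower()


def min_match_against_anchor(profiles: list) -> float:
    """Compare profile 0 with every other profile; return the lowest match %."""
    if len(profiles) < 2:
        raise ValueError("need at least two profiles to compare")
    anchor = fuzzy_key(profiles[0])
    return min(
        SequenceMatcher(None, anchor, fuzzy_key(other)).ratio() * 100
        for other in profiles[1:]
    )


# Placeholder profiles for illustration only.
base = {"first_name": "Kanhai", "last_name": "Shah", "email_field": "kanhai@example.com"}
other = {"first_name": "Zed", "last_name": "Quux", "email_field": "zq@other.org"}

total_match_score = 0
# Increment only if even the weakest match against the anchor clears the threshold.
if min_match_against_anchor([base, dict(base), other]) > THRESHOLD:
    total_match_score += 1
print(total_match_score)
```

Taking the minimum makes the check conservative: a single dissimilar profile is enough to prevent the increment. It also means the result depends on which profile happens to be at index 0.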
