backend-task's Introduction

Expectations the solution fulfills:

  • The code performs better on multi-core machines because it uses Python's multiprocessing.
  • The code scales to more than 2 profiles.
  • The fields are extensible: the Profile class supports any number of fields.
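As an illustration of the extensible-fields point, here is a minimal sketch of how a Profile-like class could accept arbitrary extra fields. This is not the repository's actual class: the constructor signature, the internal dictionary, and the placeholder email address are assumptions. Only the names Profile, get_profile, first_name, last_name and email_field come from this README.

```python
class Profile:
    """Hypothetical sketch: three mandatory fields plus any number of extras."""

    def __init__(self, first_name, last_name, email_field, **extra_fields):
        # The three mandatory fields are always present.
        self._fields = {
            "first_name": first_name,
            "last_name": last_name,
            "email_field": email_field,
        }
        # Any other keyword argument (id, random_field, ...) becomes a field.
        self._fields.update(extra_fields)

    def get_profile(self):
        # Return a copy so callers cannot mutate the profile's internal state.
        return dict(self._fields)


# Placeholder email address, used only for this example.
p = Profile(first_name="Kanhai", last_name="Shah",
            email_field="kanhai@example.com", id=1, random_field=1)
print(p.get_profile())
```

With this shape, adding a new field needs no change to the class itself; it is simply passed as another keyword argument.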



How to run the code

  • Clone the repo and install the packages listed in requirements.txt in your container or virtualenv.
  • Run:
python3 Duplicate_Finder.py



To try custom inputs, modify the main function accordingly:

def main():
  # id and random_field are optional extra fields
  o1 = Profile(id=1, first_name="Kanhai", last_name="Shah", email_field="[email protected]", random_field=1)
  o2 = Profile(first_name="Kanhai", last_name="Shah", email_field="[email protected]")
  # Compare the profiles on the listed fields and print the result
  df = Duplication([o1.get_profile(), o2.get_profile()])
  df.findDuplicates(['email_field', 'first_name', 'last_name', 'random_field'])
  print(df.get_result())



Small logic tweaks I made:

  1. The problem statement says:
    if first_name + last_name + email match between two profiles is greater than 80% (you can try using a library like https://pypi.org/project/fuzzywuzzy/), increase the match score to 1
  2. However, find_duplicates is sometimes called without all 3 of these fields.
  • To resolve this ambiguity, I did the following:
    • If any of first_name, last_name, or email_field is passed, the fuzzy comparison is performed, because these fields are mandatory in every Profile and will always be present.
    • If the fuzzy comparison gives a match above 80%, total_match_score is incremented by 1.
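The rule above can be sketched roughly as follows. To keep the example free of third-party dependencies, it uses the standard library's difflib.SequenceMatcher as a stand-in for fuzzywuzzy, so the exact percentages may differ from what fuzzywuzzy reports. The function names, the plain concatenation of the three fields, and the placeholder email addresses are assumptions, not the repository's code.

```python
from difflib import SequenceMatcher

FUZZY_FIELDS = ("first_name", "last_name", "email_field")
THRESHOLD = 80  # percent, from the problem statement


def fuzzy_key(profile):
    # Concatenate the three mandatory fields into one comparison string.
    return "".join(str(profile[f]) for f in FUZZY_FIELDS)


def similarity_percent(a, b):
    # SequenceMatcher.ratio() is in [0, 1]; scale it to a percentage.
    return SequenceMatcher(None, a, b).ratio() * 100


def fuzzy_increment(p1, p2, requested_fields):
    """Return 1 if the fuzzy rule applies and the match exceeds 80%, else 0."""
    # The rule only applies when at least one fuzzy field was requested.
    if not any(f in FUZZY_FIELDS for f in requested_fields):
        return 0
    match = similarity_percent(fuzzy_key(p1), fuzzy_key(p2))
    return 1 if match > THRESHOLD else 0


# Placeholder data for illustration only.
a = {"first_name": "Kanhai", "last_name": "Shah", "email_field": "kanhai@example.com"}
b = dict(a)
total_match_score = 0
total_match_score += fuzzy_increment(a, b, ["email_field", "random_field"])
print(total_match_score)
```

Because all three fields are mandatory on every profile, the comparison string can always be built, even when only one of the fields is named in the request.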

  1. The code is extensible to more than 2 profiles, but the fuzzywuzzy comparison works on only 2 strings at a time.
  • To resolve this ambiguity, I did the following:
    • The first profile (the profile at index 0) is used as an anchor profile. It is compared with every other profile on the fuzzy fields (first_name, last_name, email_field), and the minimum match % decides whether the total match score is incremented.
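A minimal sketch of the anchor-profile idea, assuming each profile has already been reduced to one comparison string. As a dependency-free stand-in for fuzzywuzzy it uses difflib.SequenceMatcher from the standard library, and all function names here are hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 80  # percent, from the problem statement


def similarity_percent(a, b):
    # SequenceMatcher.ratio() is in [0, 1]; scale it to a percentage.
    return SequenceMatcher(None, a, b).ratio() * 100


def min_match_against_anchor(keys):
    """Compare the first string with every other one and return the lowest match %."""
    if len(keys) < 2:
        raise ValueError("need at least two profiles to compare")
    anchor = keys[0]
    return min(similarity_percent(anchor, other) for other in keys[1:])


def anchor_rule_fires(keys):
    # The score is incremented only if even the worst pair exceeds the threshold.
    return min_match_against_anchor(keys) > THRESHOLD


keys = ["KanhaiShah", "KanhaiShah", "KanhaiShaw"]
print(anchor_rule_fires(keys))
```

Taking the minimum means a single dissimilar profile is enough to prevent the increment, which is the conservative reading of the rule when more than 2 profiles are compared.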
