Company Name Normalizer

This script normalizes company names in a CSV file, attributing patents to canonical company names. The normalization process includes handling whitespace, punctuation, legal structure variations, and fuzzy matching.

Requirements

  • Python 3.x
  • pip (the Python package installer)

Installation

  1. Clone the repository:

    git clone https://github.com/xvimnt/MTKLabsInterview.git

  2. Navigate to the project directory:

    cd MTKLabsInterview

  3. Install the required Python packages:

    pip install -r requirements.txt

Usage

Run the script using the following command:

    python company_name_normalizer.py input_file.csv output_file.csv

Replace input_file.csv with the path to your input CSV file and output_file.csv with the desired output CSV file.

Arguments

  • input_file.csv: Path to the input CSV file.
  • output_file.csv: Path to the output CSV file.

Algorithm Details

  • The algorithm first normalizes whitespace, punctuation, and legal structure variations in company names, and only then addresses misspellings.
  • It uses fuzzy matching to score the similarity of two company names, and relies on other attributes in the file to rule out false positives.
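
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the suffix list, the function names `normalize_name` and `similarity`, and the use of the standard library's `difflib.SequenceMatcher` as the fuzzy matcher are all assumptions made for the example.

```python
import string
from difflib import SequenceMatcher

# Hypothetical suffix list; the script's real list is not documented here.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation",
    "co", "company", "ltd", "limited", "llc", "gmbh",
}

# Map every punctuation character to a space so "Acme,Inc." still splits.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace, strip trailing legal suffixes."""
    tokens = name.lower().translate(_PUNCT_TO_SPACE).split()
    # Keep at least one token so a name like "Limited" does not become empty.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1], computed on the normalized names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()
```

Normalizing first means that spelling variants of the legal suffix never reach the fuzzy matcher, so the similarity score reflects only differences in the distinctive part of the name.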

Notes

  • The script assumes that the country field contains no misspellings, but that the city field may.
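
One way this assumption could be used to rule out false positives is to compare the country exactly and the city fuzzily. The sketch below is illustrative only: the record keys `city` and `country`, the function names, and the `0.8` threshold are assumptions, and `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the script uses.

```python
from difflib import SequenceMatcher


def fuzzy_ratio(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] between two strings."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def same_location(rec_a: dict, rec_b: dict, city_threshold: float = 0.8) -> bool:
    """Return True if two records plausibly refer to the same place."""
    # Country is assumed to be spelled correctly, so require an exact match.
    if rec_a["country"].strip().lower() != rec_b["country"].strip().lower():
        return False
    # City may be misspelled, so tolerate small differences.
    return fuzzy_ratio(rec_a["city"], rec_b["city"]) >= city_threshold
```

Two similar company names would then be merged only when `same_location` also holds, which keeps unrelated companies with near-identical names in different places apart.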

Contributing

If you find issues or have suggestions for improvements, please open an issue or submit a pull request.
