Phonorm

Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

phonorm is an exploratory project in which we apply a machine translation approach to the problem of phonetic normalization. The need for such a model arose from the conversations we observed in ChitChat, a chatbot developed at the Leiden University Center for Innovation, where much of the text is written the way it is spoken. Existing phonetic algorithms, such as Soundex, are too aggressive and do not work well for our use case.
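To illustrate what "too aggressive" can mean in practice, here is a minimal sketch of the standard American Soundex algorithm. It is not part of phonorm, and the function name `soundex` and the example words are chosen only for illustration. It shows how unrelated words collapse onto the same four-character code.

```python
import string

# Standard American Soundex digit groups (illustrative, not phonorm code).
_GROUPS = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
_CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(word):
    """Return the 4-character American Soundex code for `word`."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            # 'h' and 'w' are skipped and do not separate equal codes.
            continue
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        # Vowels have no code, so they reset `prev` and separate duplicates.
        prev = code
    return (result + "000")[:4]


# Distinct words that Soundex cannot tell apart:
for w in ("what", "wait", "with"):
    print(w, soundex(w))  # each prints W300
for w in ("there", "three"):
    print(w, soundex(w))  # each prints T600
```

Because the code keeps only the first letter and up to three consonant classes, it cannot be mapped back to a single intended word, which is the limitation that motivated a translation-style model here.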

You can find our writeup of the project here. Comments are welcome and can either be left in the issues section or sent to jasperginn[at]gmail.com.

This repository contains the following files:

+-- data
  | +-- extra
      - Contains a Wikipedia dataset of commonly misspelled words
  | +-- preprocessed
      - Contains preprocessed datasets
  | +-- raw
      - Contains raw data (not preprocessed)
+-- docs
  - Contains presentation and writeup
+-- modeling
  - Contains Jupyter notebooks used for modeling
+-- models
  - Contains pre-trained models
+-- phonorm
  - Contains utilities and code for modeling
+-- preprocessing
  - Contains utilities and code for preprocessing data
+-- .gitignore
+-- README.md
+-- requirements.txt

A note on training the model

If you want to retrain the model using the data in this repository, be aware that training will be slow on CPUs. You should consider using a GPU.
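Before starting a long training run, it can help to check whether a GPU is visible at all. The sketch below is not part of this repository. It assumes an NVIDIA card and uses the `nvidia-smi` tool that ships with the NVIDIA driver; the helper name `nvidia_gpu_names` is hypothetical. An empty result only means no NVIDIA GPU was detected this way, and a non-empty result does not by itself guarantee that TensorFlow can use the device.

```python
import shutil
import subprocess


def nvidia_gpu_names():
    """Return the GPU names reported by nvidia-smi, or [] if none are found."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # Driver tools are not on PATH: assume no usable NVIDIA GPU.
        return []
    try:
        completed = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return []
    return [line.strip() for line in completed.stdout.splitlines() if line.strip()]


gpus = nvidia_gpu_names()
if gpus:
    print("Detected GPU(s):", ", ".join(gpus))
else:
    print("No NVIDIA GPU detected; training will run on the CPU and may be slow.")
```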

Setting up

At a minimum, you need a Python 3 installation. However, we recommend using Anaconda. The steps below assume that you are using Anaconda for this project.

  1. Create a new environment called 'phonorm'
conda create -n phonorm python=3.6 anaconda
  2. Activate the environment
source activate phonorm

On Windows:

conda activate phonorm
  3. Install dependencies
conda install --yes --file requirements.txt
  4. (optional) Install 'pywiktionary' from git
pip install git+https://github.com/abuccts/wikt2pron.git
  5. (optional) Install tensorflow-gpu if you are using a GPU
conda install tensorflow-gpu

At this point, your environment is ready to be used.
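As a quick sanity check after the steps above, you can confirm that the interpreter version and the key packages are visible from inside the activated environment. This is a minimal sketch, not a script shipped with the repository. The helper name `check_environment` is hypothetical, and the module list is an assumption (only `tensorflow` is named in the steps above), so adjust it to match requirements.txt.

```python
import importlib.util
import sys


def check_environment(modules, min_python=(3, 6)):
    """Report whether Python is recent enough and each top-level module is importable."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


# 'tensorflow' is an assumed dependency; replace with the packages you need.
for key, ok in check_environment(["tensorflow"]).items():
    print("{:<12} {}".format(key, "ok" if ok else "MISSING"))
```

Run it with the 'phonorm' environment activated; any line marked MISSING points to a package that still needs to be installed.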

Using phonorm

If you want to train your own models, you should check out the modeling folder for examples.

If you want to use the pre-trained models, please see the examples folder.
