Phonorm

phonorm is an exploratory project that applies a machine translation approach to the problem of phonetic normalization. The need for such a model arose in our chatbot ChitChat, developed at the Leiden University Center for Innovation, where we observed a lot of text that is written much like it is spoken. Existing phonetic algorithms, such as Soundex, are too aggressive and do not work well for our use case.
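To illustrate what "too aggressive" can look like, here is a minimal sketch of the classic American Soundex algorithm in plain Python. It is not part of phonorm; the function name `soundex` and the example words are chosen purely for illustration. Soundex keeps the first letter and reduces everything after it to at most three digits, so words with quite different meanings can end up with the same code.

```python
import string

# Standard American Soundex consonant groups (vowels, y, h and w have no code).
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code of `word` (illustrative helper)."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            # h and w are skipped and do not separate equal codes
            continue
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        # vowels reset `prev`, so a repeated code after a vowel is kept
        prev = code
    # pad with zeros and truncate to one letter plus three digits
    return (encoded + "000")[:4]


if __name__ == "__main__":
    for a, b in (("there", "three"), ("were", "wire"), ("Robert", "Rupert")):
        print(a, soundex(a), b, soundex(b))
```

In this sketch "there" and "three" both map to T600, and "were" and "wire" both map to W600, which suggests why a coarse code of this kind is a poor fit for normalizing chat messages.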

Our writeup of the project is in the docs folder. Comments are welcome: leave them in the issues section or send them to jasperginn[at]gmail.com.

This repository contains the following files and folders:

+-- data
  | +-- extra
      - Contains a Wikipedia dataset of commonly misspelled words
  | +-- preprocessed
      - Contains preprocessed datasets
  | +-- raw
      - Contains raw data (not preprocessed)
+-- docs
  - Contains presentation and writeup
+-- modeling
  - Contains Jupyter notebooks used for modeling
+-- models
  - Contains pre-trained models
+-- phonorm
  - Contains utilities and code for modeling
+-- preprocessing
  - Contains utilities and code for preprocessing data
+-- .gitignore
+-- README.md
+-- requirements.txt

A note on training the model

If you want to retrain the model using the data in this repository, be aware that training will be slow on CPUs. You should consider using a GPU.

Setting up

At a minimum, you need a Python 3 installation, but we recommend Anaconda. The steps below assume that you are using Anaconda for this project.

Create a new environment called 'phonorm'

conda create -n phonorm python=3.6 anaconda

Activate the environment (Linux or macOS)

source activate phonorm

On Windows:

conda activate phonorm

Install dependencies

conda install --yes --file requirements.txt

(optional) Install 'pywiktionary' from git

pip install git+https://github.com/abuccts/wikt2pron.git

(optional) Install tensorflow-gpu if you are using a GPU

conda install tensorflow-gpu

At this point, your environment is ready to use.
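As a quick sanity check after the steps above, a small script along these lines can report which interpreter is active and whether the optional packages are importable. This is a sketch and not part of the repository: the helper name `check_environment` is hypothetical, and the import names `tensorflow` and `pywiktionary` are assumptions based on the package names mentioned above.

```python
import importlib.util
import sys


def check_environment(optional_modules=("tensorflow", "pywiktionary")):
    """Return a small report about the active Python environment.

    The module names are assumptions; adjust them to match the packages
    you actually installed.
    """
    report = {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "is_python3": sys.version_info[0] >= 3,
    }
    for name in optional_modules:
        # find_spec returns None when a top-level module cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print("{}: {}".format(key, value))
```

Run it from inside the activated 'phonorm' environment. A value of False for an optional module only means that the optional step was skipped or that the import name differs from the one assumed here.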

Using phonorm

If you want to train your own models, you should check out the modeling folder for examples.

If you want to use the pre-trained models, please see the examples folder.

