
convert-british-to-american-spellings's Introduction

US/UK Spelling Converter

You provide the text, in either US or UK spelling.

We return the same text, converted to the spelling system you choose.

We have you covered -- for about 20,000 words.

TOC

  1. TOC
  2. Online Demos
  3. Features
  4. Functionality
  5. Example Usage
  6. Code Structure and Design

Online Demos

Check out the code in an online demo...

Simple Demo Hosted by Us

Editable, Online Sandbox Demo (at IDEone.com)

Note: Since there are text limits to online compilers, we reduced the actual list of words covered to make this demo run.

Features

Regularly updated! Please submit corrections, additions, fixes, anything!

How many words are covered?

  • Total of 20,000 words covered, with multiple sources.
    • Source: VarCon/ISpell (18,000 words).
    • Source: WordsWorldWide (8,000 words).
    • Source: Our own personal list.
      • BtA List: Literary and archaic British variants (1500s to 1900s): (~500 words).
      • BtA List: Alternative Latinized spellings of Russian and French names: (~1,500 words).
      • BtA List: Alternative dashed-form words ("hundredfold" versus "hundred-fold"): (~2,000 words).
    • These lists were used to cross-check each other, correct errors, and remove duplicates.
    • Letter-sorted lists for easily updating and checking on words: A (1,314 words), B (687 words), C (1,807 words), D (1,427 words), E (948 words), F (678 words), G (654 words), H (1,066 words), I (590 words), J (149 words), K (264 words), L (641 words), M (1,312 words), N (716 words), O (532 words), P (2,273 words), Q (57 words), R (1,071 words), S (2,024 words), T (800 words), U (1,259 words), V (450 words), W (177 words), X (0 words), Y (75 words), Z (63 words).
  • Variants for British words.
    • For example, "unrealisable" and "unrealiseable".
  • Words are defined with a simple associative array, making for a quick transfer to Perl, C++, Java, etc.
    • For example, the syntax of somekey=>"somevalue" is widely used throughout many languages, or easily converted to their versions of this syntax.
  • Permissively-licensed
    • Do whatever you want with the code!
    • For example, see what others are doing with their personal, commercial, and legal rights as endowed by BSD-3-clause-licensed software.

Functionality

General Behavior

How in general does it work?

  • Exact / Error-Resistant
    • British/American Spelling Converter uses regular expression checking with /\b$word\b/, so a replacement can only match a whole word and never a fragment inside a longer word.
    • For example, "Ax" becomes "Axe", but "Axiomatic" will remain as "Axiomatic", and cannot become "Axeiomatic", which would be incorrect.
  • Fast / Efficient
    • Every mass-replace is done within a single preg_replace() call, using arrays as arguments.
    • This avoids a separate call per word, so the script finishes much sooner.
  • Reliable / Atomic / Deterministic
    • American-ize/British-ify will not corrupt meaning.
    • For example, 'discus' and 'diskus' have reverse meanings in US/UK, so swapping them in or out would cause the text to change each time you "Americanize" or "Britishify" it. We don't do these types of swaps.
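
The whole-word matching and single-call replacement described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the three-entry word list and the variable names are assumptions made for the example.

```php
<?php
// Hypothetical three-entry word list (American => British); the real lists are far larger.
$american_to_british = [
    'ax'       => 'axe',
    'neighbor' => 'neighbour',
    'color'    => 'colour',
];

// Wrap each search term in \b...\b so that it only matches as a whole word.
$patterns = [];
foreach (array_keys($american_to_british) as $word) {
    $patterns[] = '/\b' . preg_quote($word, '/') . '\b/';
}

// One preg_replace() call handles every pattern/replacement pair.
$text = 'Use the ax, neighbor; the color is axiomatic.';
$converted = preg_replace($patterns, array_values($american_to_british), $text);

print($converted . "\n");   // output: Use the axe, neighbour; the colour is axiomatic.
```

Note that "axiomatic" is left alone: there is no word boundary between "ax" and "iomatic", so the pattern for "ax" does not match there.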

Precise Behavior - Use Cases

How exactly does it work?

  • Only all lower case, all upper case, or first letter capitalized versions are converted.
    • Example: American=>English, "ax"=>"axe": "AXE" would be converted to "AX" or vice versa, but "AxE" would not be converted to "Ax".
  • Apostrophes are treated as word boundaries.
    • Example: American=>English, "ax"=>"axe": "the ax's handle" would be converted to "the axe's handle".
  • Only precisely whole, known words are converted.
    • Example: American=>English, "ax"=>"axe": converting to American will not turn "axed" into "axd", because the concluding "-d" indicates that it is an entirely different word.
  • Dashes are treated as word boundaries only when not preceded and followed by a dash.
    • Example: American=>English, "affecteffect"=>"affect-effect". This will convert "the affect-effect of it" to "the affecteffect of it", but it will not convert "these every-night-affect-effect-happenings are" to "these every-night-affecteffect-happenings are", as the surrounding dashes give the term a different meaning than it has standing alone.
  • British alternates are handled.
    • Example: American=>English, "amoebas"=>["amoebae", "amebas", "amebae",]. When converting to English, "amoebas" is replaced with "amoebae", the most contemporary term. When converting to American, "amoebae", "amebas", etc., are all converted to the single American equivalent.
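
The handling of British alternates can be sketched like this. It is only an illustration under assumptions: the `$words` array is a made-up excerpt in the american=>[alternates] shape from the example above, and the two derived arrays are hypothetical names, not the project's own.

```php
<?php
// Hypothetical excerpt in the american => [british alternates] shape described above.
$words = [
    'amoebas' => ['amoebae', 'amebas', 'amebae'],
    'ax'      => ['axe'],
];

$american_to_british = [];
$british_to_american = [];
foreach ($words as $american => $british_alternates) {
    // Converting to British: use the first (most contemporary) alternate.
    $american_to_british[$american] = $british_alternates[0];

    // Converting to American: every alternate collapses to the one American form.
    foreach ($british_alternates as $alternate) {
        $british_to_american[$alternate] = $american;
    }
}

print($american_to_british['amoebas'] . "\n");  // output: amoebae
print($british_to_american['amebae'] . "\n");   // output: amoebas
```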

Some test sentences...

The neighbour walked to the theatre's centre, manoeuvred about the sabre, and proceeded to reconnoitre the sepulchre in ochre.

The rumour spread that splendour and flavour were affected by our behaviour, so walk a metre in my mitre while carrying a litre of nitre.

The connexion with industrialisation remains with the municipalisation of the calibre of the fibre of the spectre, not with the meagre and sombre saltpetre with all its colour and honour.

Example Usage

How do I use the British/American Spelling Converter?

Americanize Text Example

How do I convert British-spelling text to American-spelling text?

require('AmericanBritishSpellings.php');
$american_british_spellings = new AmericanBritishSpellings([]);

$text = "Axiomatically axe that door, would you, my neighbour?";     // British input text source

$americanized = $american_british_spellings->SwapBritishSpellingsForAmericanSpellings(['text'=>$text]);

print($americanized);   // output: Axiomatically ax that door, would you, my neighbor?

Britishize Text Example

How do I convert American-spelling text to British-spelling text?

require('AmericanBritishSpellings.php');
$american_british_spellings = new AmericanBritishSpellings([]);

$text = "Axiomatically ax that door, would you, my neighbor?";     // American input text source

$britishized = $american_british_spellings->SwapAmericanSpellingsForBritishSpellings(['text'=>$text]);

print($britishized);   // output: Axiomatically axe that door, would you, my neighbour?

Code Structure and Design

Coding Languages

What coding languages are used in the British/American Spelling Converter?

The entire project is coded in the following...

  • PHP - For processing the text and storing the US/UK words.

Exclude List

How do you avoid adding words that would break the deterministic / atomic model of functionality?

We do this with an exclude list, which also details the conflict in the words themselves.

Check it out: Exclude List.

AmericanBritishSpellings.php - Technical Overview

What are the functions in the sourcecode files for?

AmericanBritishSpellings.php

Class for converting text from US/UK spellings to US/UK spellings.

  • __construct($args)
    • Constructor.
    • Load the words into the converter class for ready use.
  • SwapBritishSpellingsForAmericanSpellings($args)
    • Convert text with British spellings to text with American spellings.
  • SwapAmericanSpellingsForBritishSpellings($args)
    • Convert text with American spellings to text with British spellings.
  • GetSpellingsAndReplacements($args)
    • Get spellings and replacements based on the desired end language.
  • BuildSpellingAlternates($args)
    • Build spelling alternates for British and American dialects.
  • BuildSpellingAlternatesForLanguage($args)
    • Build spelling alternates for a single particular dialect of a language (either British or American, in our case).
  • BuildSearchRegex($args)
    • Build an array of search regexes when given an array of search terms.
  • BuildSearchRegex($args)
    • Build a single search regex for a single search term.
  • BuildSpellingReplacements()
    • Build the replacements to be used for the search terms.

AmericanBritishSpellings_Words.php

Class for building word lists for converting UK/US English dialects.

  • __construct($args)
    • Constructor.
    • Nothing to do here.
  • GetBritishToAmericanSpellings()
    • Build a mapping of British to American spellings.
  • GetAmericanToBritishSpellings()
    • Build a mapping of American to British spellings from the /Language/Words/AmericanBritish/ classes.

AmericanBritishWords_A.php ... AmericanBritishWords_Z.php

  • __construct($args)
    • Constructor.
    • Load the words into the converter class for ready use.
  • AmericanBritishWords()
    • List of US/UK spellings for words starting with : A...Z.

convert-british-to-american-spellings's People

Contributors

holdoffhunger, nameless-ross


convert-british-to-american-spellings's Issues

Make Acronym/StrToUpper()'d Words Optional and Controlled by Config

Can we make it so that acronym/strtoupper'd words are optional and/or controlled by a config?

If you look at...

BuildSpellingAlternatesForLanguage()

here, https://github.com/HoldOffHunger/convert-british-to-american-spellings/blob/master/classes/Language/AmericanBritishSpellings.php .

You will see: $spellings_uppercase = array_map('strtoupper', $spellings);

This upper-cased word is then used in replacements, i.e., "NEIGHBOUR"=>"NEIGHBOR". Since people won't necessarily always want to convert acronyms, make this controllable by an argument, both by means of the constructor and by means of the main converting function, i.e....

This....

$abs = new AmericanBritishSpellings([]);

Can become....

$abs = new AmericanBritishSpellings(['acronyms'=>1]);

And, this...

 $abs->SwapAmericanSpellingsForBritishSpellings(['text'=>$americanized]);

Can become...

 $abs->SwapAmericanSpellingsForBritishSpellings(['text'=>$americanized, 'acronyms'=>1]);

This argument then controls whether anything is done with $spellings_uppercase.
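
As a rough sketch of how such a switch could work: the helper `buildCaseVariants()` and its `$include_uppercase` parameter below are hypothetical names invented for this illustration, not part of the existing class. Only the `array_map('strtoupper', ...)` call mirrors the line quoted above.

```php
<?php
// Hypothetical helper, not the project's code: returns the spellings to search for,
// adding the upper-cased forms only when $include_uppercase is true.
function buildCaseVariants(array $spellings, bool $include_uppercase): array
{
    // Lower-case and first-letter-capitalized forms are always included.
    $variants = array_merge($spellings, array_map('ucfirst', $spellings));

    if ($include_uppercase) {
        $variants = array_merge($variants, array_map('strtoupper', $spellings));
    }

    return $variants;
}

$with_acronyms    = buildCaseVariants(['neighbour'], true);
$without_acronyms = buildCaseVariants(['neighbour'], false);

print(implode(', ', $with_acronyms) . "\n");     // output: neighbour, Neighbour, NEIGHBOUR
print(implode(', ', $without_acronyms) . "\n");  // output: neighbour, Neighbour
```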

Please make sure to test and submit an IDEone demo, much like the IDEone demos listed in the README.md (in fact, it is recommended that you fork one of these demos).

Relative paths are not correct

Hi,

I have tried out the tool but after requiring AmericanBritishSpellings.php in my code I'm getting fatal errors that AmericanBritishSpellings_Words.php failed to open.

On closer inspection it looks as if the relative paths are incorrect for AmericanBritishSpellings_Words.php and AmericanBritishWords_[LETTER].php.

Everything works fine if I replace the relative path ../classes/Language with __DIR__

Many thanks in advance!
Ross

Move Required Files Out of Class Constructor Definitions

In #4, we discovered that there was some weird stuff going on with object classes actually doing the require() calls. We fixed it with a simple require_once() hack.

Please update the code so that when a user requires AmericanBritishSpellings.php, the requires for all other subsidiary classes are done immediately in that PHP file, just before the class is actually defined, so that the class constructor and related methods don't have to do any require()ing at all, i.e., something like...

require('x.php'); class AmericanBritishSpellings { ... }

Instead of...

class AmericanBritishSpellings { ... function __construct() { require('x.php'); } }

Make Archaic Spellings-Mode an Optional Config Choice

Our style of American=>British spellings is typically, 'american-spelling'=>['modern-british-spelling', 'rare-british-spelling', 'archaic-british-spelling'], and when converting back and forth, we automatically always choose the most modernized word, because that makes sense.

But maybe someone wants to have the archaic spellings specifically? In this case, we would select the last element of this array (which is the most archaic), and not the first element (which is the most contemporary and/or modern).

Create an optional config choice (established by means of constructor or convert args) that causes the converter to use the last British optional alternate as opposed to the most modern one.
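
A minimal sketch of the selection logic, assuming the alternates are ordered from most modern to most archaic as described above. The function name `chooseBritishSpelling()` and its `$archaic` flag are hypothetical, chosen only for this example.

```php
<?php
// Hypothetical helper: pick one British spelling from a list ordered modern-to-archaic.
function chooseBritishSpelling(array $alternates, bool $archaic = false): string
{
    // Default: first element (most modern). Archaic mode: last element (most archaic).
    return $archaic ? $alternates[count($alternates) - 1] : $alternates[0];
}

// Placeholder values taken from the array shape described in this issue.
$alternates = ['modern-british-spelling', 'rare-british-spelling', 'archaic-british-spelling'];

print(chooseBritishSpelling($alternates) . "\n");        // output: modern-british-spelling
print(chooseBritishSpelling($alternates, true) . "\n");  // output: archaic-british-spelling
```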

If you are curious about coding details, check out a similar task, which is to make Acronyms-optional in constructor/args: #5

Detection vs. Conversion Modes

Currently, the code can only convert between British and American. Can we make a new mode (function?) so that the code has a mode for detecting British/American spellings, without doing any actual conversion on the text?
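
One possible shape for a detection mode is sketched below. The function `detectSpellings()` and the short British word list are assumptions for illustration only; nothing like this exists in the class yet.

```php
<?php
// Hypothetical detection helper: returns the listed spellings found in $text,
// leaving $text itself untouched.
function detectSpellings(string $text, array $spellings): array
{
    $found = [];
    foreach ($spellings as $spelling) {
        // Same whole-word \b...\b check that the converter uses for replacing.
        if (preg_match('/\b' . preg_quote($spelling, '/') . '\b/', $text) === 1) {
            $found[] = $spelling;
        }
    }
    return $found;
}

$british = ['neighbour', 'theatre', 'colour'];
$found = detectSpellings("The neighbour walked to the theatre's centre.", $british);

print(implode(', ', $found) . "\n");   // output: neighbour, theatre
```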

American to British not working

Hi there,

I've tried out your tool and everything is working fine for translations from British to American but not vice-versa.

The following error is thrown out:

Warning: preg_replace(): Parameter mismatch, pattern is a string while replacement is an array in /.../multilingual/AmericanBritishSpellings.php on line 45

Warning: ucwords() expects parameter 1 to be string, array given in /.../multilingual/AmericanBritishSpellings.php on line 47

I want to use the tool for filtering wordpress content:

function replace_content($content){

$args = array(
'text'=>$content
);

$american_british_spellings = new AmericanBritishSpellings($args);
$content = $american_british_spellings->SwapAmericanSpellingsForBritishSpellings($args);

return $content;
}
add_filter('the_content','replace_content');

Thanks in advance for your help!

Best regards
Robin

Add Early 20th-century, British-Latinized, Russian, Possessive Forms

In version 1.02, we added thousands of Russian names that had spelling alternatives in British and French. We did not add their possessive forms, though, i.e., person versus person's. Add the possessive forms for this task.

Relevant code will be attached.

NOTE: Do not do this the hard way. Add the A's to the A...php file, check that there are no duplicates at https://www.revoltlib.com/datautilities.php?action=findDuplicateArrayKeys , and then save and push.

new-british-russian-2.txt

Feature request: custom word replacements

Hi,

You're probably fed up hearing from me! The great work you have done has led me to an idea for a new feature. It would be great to customise and add additional words that may be more appropriate for different use-cases.

The best example I can think of is exchanging 'holiday' for 'vacation'. Both spellings are correct but are used differently. The ability to add custom words such as this in a class method would be really useful.

What I would propose is a new method in AmericanBritishSpellings which accepts an array of word replacements that would be appended to the $word_hash in AmericanBritishSpellings_Words.
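
The proposal could look roughly like the following sketch. The contents of `$word_hash` here are a stand-in (the real structure in AmericanBritishSpellings_Words is assumed to follow the american=>[british alternates] shape used elsewhere in the README), and `$custom_words` is a hypothetical argument.

```php
<?php
// Stand-in for the existing word hash (assumed shape: american => [british alternates]).
$word_hash = [
    'neighbor' => ['neighbour'],
    'color'    => ['colour'],
];

// Hypothetical custom replacements supplied by the caller.
$custom_words = [
    'vacation' => ['holiday'],
];

// With string keys, array_merge() appends new entries and lets custom ones override built-ins.
$merged = array_merge($word_hash, $custom_words);

print(implode(', ', array_keys($merged)) . "\n");   // output: neighbor, color, vacation
```

Using array_merge() means a custom entry with an existing key replaces the built-in one; the + operator would instead keep the built-in entry.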

I'm happy to work on a solution but I wanted to get your thoughts on this sort of functionality.

Many thanks!
Ross

Duplicates in Spellings

There are multiple instances of ambiguous duplicates in the spellings. E.g. when calling GetAmericanToBritishSpellings(), you are returned a dictionary with both the (thiamine, thiamin) and reverse (thiamin, thiamine) pairs. Same for sirup/syrup, partizan/partisan...
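
A quick way to list such mirrored pairs is sketched below. The three-entry mapping is a made-up excerpt for illustration, assuming a flat american=>british array like the one this issue describes.

```php
<?php
// Hypothetical excerpt of an American => British mapping containing a reversed pair.
$american_to_british = [
    'thiamine' => 'thiamin',
    'thiamin'  => 'thiamine',
    'neighbor' => 'neighbour',
];

$conflicts = [];
foreach ($american_to_british as $from => $to) {
    // A pair is ambiguous when the mapping also contains its mirror image.
    // The $from < $to check reports each mirrored pair only once.
    if (isset($american_to_british[$to]) && $american_to_british[$to] === $from && $from < $to) {
        $conflicts[] = [$from, $to];
    }
}

foreach ($conflicts as $pair) {
    print($pair[0] . ' <=> ' . $pair[1] . "\n");   // output: thiamin <=> thiamine
}
```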
