
hyphen's Introduction

Franklin M. Liang's hyphenation algorithm


hyphen

Demo page

This is a text hyphenation library based on Franklin M. Liang's hyphenation algorithm. At the core of the algorithm lies a set of hyphenation patterns, which are extracted from hand-hyphenated dictionaries. The patterns for this library were taken from ctan.org and ported to JavaScript.
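To make the pattern idea concrete, here is a minimal sketch of how Liang-style patterns can be applied to a single word. It is an illustration, not this library's code. Patterns use the TeX notation (digits between letters, "." for a word boundary). At each gap between letters the highest digit from all matching patterns wins, and an odd value allows a break. The leftMin and rightMin limits of 2 are illustrative values, not necessarily the library's settings, and the sample patterns are only a handful chosen for the example.

```javascript
// Split a Liang pattern such as "r1ti" into its letters ("rti") and the
// inter-letter values ([0, 1, 0, 0]): values[k] is the digit in front of
// letters[k], and the last entry is the digit after the final letter.
function parsePattern(pattern) {
  let letters = "";
  const values = [0];
  for (const ch of pattern) {
    if (ch >= "0" && ch <= "9") {
      values[values.length - 1] = Number(ch);
    } else {
      letters += ch;
      values.push(0);
    }
  }
  return { letters, values };
}

// Hyphenate one word: every occurrence of every pattern contributes its
// values, the maximum wins at each gap, and odd maxima become break points.
function hyphenateWord(word, patterns, hyphenChar = "-", leftMin = 2, rightMin = 2) {
  const padded = `.${word.toLowerCase()}.`; // "." marks the word boundaries
  const scores = new Array(padded.length + 1).fill(0); // scores[j]: gap before padded[j]
  for (const pattern of patterns) {
    const { letters, values } = parsePattern(pattern);
    if (!letters) continue; // a digits-only pattern would match everywhere
    for (let start = padded.indexOf(letters); start !== -1; start = padded.indexOf(letters, start + 1)) {
      values.forEach((v, k) => {
        scores[start + k] = Math.max(scores[start + k], v);
      });
    }
  }
  let result = "";
  for (let i = 0; i < word.length; i++) {
    // The gap before word[i] is the gap before padded[i + 1].
    const allowed = i >= leftMin && word.length - i >= rightMin;
    if (allowed && scores[i + 1] % 2 === 1) result += hyphenChar;
    result += word[i];
  }
  return result;
}

console.log(hyphenateWord("abortion", ["2io", "o2n", "r1ti", "1tio"])); // → abor-tion
```

Real pattern sets contain thousands of entries, so production code typically stores them in a trie rather than scanning each pattern with indexOf as this sketch does.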

import { hyphenate } from "hyphen/en";

(async () => {
  const text = "A certain king had a beautiful garden";

  const result = await hyphenate(text);
  // result is "A cer\u00ADtain king had a beau\u00ADti\u00ADful garden"
})();
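The \u00AD characters in the result are soft hyphens. Browsers render them only when a line actually breaks at that point, so hyphenated output looks unchanged when printed. Two small helpers (hypothetical names, not part of hyphen) make the break points visible or remove them again:

```javascript
// U+00AD SOFT HYPHEN: invisible unless the line actually breaks there.
const SOFT_HYPHEN = "\u00AD";

// Make break opportunities visible, e.g. for logging or snapshot tests.
function showSoftHyphens(text, marker = "-") {
  return text.split(SOFT_HYPHEN).join(marker);
}

// Remove break opportunities again, e.g. before comparing or searching text.
function stripSoftHyphens(text) {
  return text.split(SOFT_HYPHEN).join("");
}

// The result string from the example above.
const result = "A cer\u00ADtain king had a beau\u00ADti\u00ADful garden";
console.log(showSoftHyphens(result)); // → A cer-tain king had a beau-ti-ful garden
console.log(stripSoftHyphens(result) === "A certain king had a beautiful garden"); // → true
```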

Hyphenate HTML

The processor automatically skips HTML tags, so tags are not hyphenated.

import { hyphenate } from "hyphen/en";

(async () => {
  const text = "<blockquote>A certain king had a beautiful garden</blockquote>";

  const result = await hyphenate(text);
  // result is "<blockquote>A cer\u00ADtain king had a beau\u00ADti\u00ADful garden</blockquote>"
})();

Multilingual hyphenation

To hyphenate text in any other supported language, just change the import source. For example, for German, import the hyphenation function from "hyphen/de".

import { hyphenate } from "hyphen/de";

(async () => {
  const text = "Ein gewisser König hatte einen wunderschönen Garten";

  const result = await hyphenate(text);
  // result is "Ein ge\u00ADwis\u00ADser Kö\u00ADnig hat\u00ADte einen wun\u00ADder\u00ADschö\u00ADnen Gar\u00ADten"
})();

It is possible to use several languages on the same page.

import { hyphenate as hyphenateEn } from "hyphen/en";
import { hyphenate as hyphenateDe } from "hyphen/de";

(async () => {
  const english = "A certain king had a beautiful garden";

  const englishResult = await hyphenateEn(english);
  // result is "A cer\u00ADtain king had a beau\u00ADti\u00ADful garden"

  const german = "Ein gewisser König hatte einen wunderschönen Garten";

  const germanResult = await hyphenateDe(german);
  // result is "Ein ge\u00ADwis\u00ADser Kö\u00ADnig hat\u00ADte einen wun\u00ADder\u00ADschö\u00ADnen Gar\u00ADten"
})();

Sync version

The hyphenate function returns a Promise, whereas its sync version, hyphenateSync, returns a string.

import { hyphenateSync as hyphenate } from "hyphen/en";

const text = "A certain king had a beautiful garden";

const result = hyphenate(text);
// result is "A cer\u00ADtain king had a beau\u00ADti\u00ADful garden"

Install

npm install hyphen

Install type definitions for TypeScript usage.

npm install --save-dev @types/hyphen

Type definitions are created and maintained by Krisztián Balla.

Options

  • exceptions

    An Array of hyphenation exceptions. Use the hard hyphen symbol - to mark the positions where the configured hyphenation character should be inserted. Default value is [].

  • hyphenChar

    A String that sets the soft hyphen character. Default value is \u00AD.

  • minWordLength

    A Number that sets the minimum length of a word to be hyphenated. Default value is 5.
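One way to read the exceptions format: drop the hard hyphens to get the word, and keep their positions as break points. A minimal sketch of that interpretation follows. The helper names are hypothetical, and matching whole words case-insensitively is an assumption, not documented library behavior. An entry without hyphens, such as "present", then presumably means "do not hyphenate this word".

```javascript
// Build a lookup table from exception entries such as "ta-ble":
// key "table" -> pieces ["ta", "ble"]. A Map avoids clashes with
// built-in object properties such as "constructor".
function buildExceptionTable(exceptions) {
  const table = new Map();
  for (const entry of exceptions) {
    table.set(entry.replace(/-/g, "").toLowerCase(), entry.split("-"));
  }
  return table;
}

// Return the word with hyphenChar at the marked positions, or null when the
// word has no exception entry and the patterns should be used instead.
function applyException(word, table, hyphenChar = "\u00AD") {
  const parts = table.get(word.toLowerCase());
  if (!parts) return null;
  // Re-slice the original word so that its capitalization is preserved.
  let offset = 0;
  return parts
    .map((part) => word.slice(offset, (offset += part.length)))
    .join(hyphenChar);
}

const table = buildExceptionTable(["present", "ta-ble"]);
console.log(applyException("Table", table, "-"));   // → Ta-ble
console.log(applyException("present", table, "-")); // → present
console.log(applyException("garden", table, "-"));  // → null
```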

Example of using options

import { hyphenate } from "hyphen/en";

(async () => {
  const text = "A certain king had a beautiful garden";

  const result = await hyphenate(text, {
    hyphenChar: "-"
  });
  // result is "A cer-tain king had a beau-ti-ful garden"
})();

List of available languages

  • Afrikaans language
import { hyphenate } from "hyphen/af";
  • Assamese language
import { hyphenate } from "hyphen/as";
  • Belarusian language
import { hyphenate } from "hyphen/be";
  • Bulgarian language
import { hyphenate } from "hyphen/bg";
  • Bengali language
import { hyphenate } from "hyphen/bn";
  • Catalan language
import { hyphenate } from "hyphen/ca";
  • Coptic language
import { hyphenate } from "hyphen/cop";
  • Czech language
import { hyphenate } from "hyphen/cs";
  • Welsh language
import { hyphenate } from "hyphen/cy";
  • Church Slavonic language
import { hyphenate } from "hyphen/cu";
  • Danish language
import { hyphenate } from "hyphen/da";
  • German, traditional spelling
import { hyphenate } from "hyphen/de-1901";
  • German, reformed spelling
import { hyphenate } from "hyphen/de-1996";
  • German, traditional Swiss spelling
import { hyphenate } from "hyphen/de-CH-1901";
  • Modern Greek, monotonic spelling
import { hyphenate } from "hyphen/el-monoton";
  • Modern Greek, polytonic spelling
import { hyphenate } from "hyphen/el-polyton";
  • English, British spelling language
import { hyphenate } from "hyphen/en-gb";
  • English, American spelling language
import { hyphenate } from "hyphen/en-us";
  • Spanish language
import { hyphenate } from "hyphen/es";
  • Estonian language
import { hyphenate } from "hyphen/et";
  • Basque language
import { hyphenate } from "hyphen/eu";
  • Finnish language
import { hyphenate } from "hyphen/fi";
  • French language
import { hyphenate } from "hyphen/fr";
  • Friulan language
import { hyphenate } from "hyphen/fur";
  • Irish language
import { hyphenate } from "hyphen/ga";
  • Galician language
import { hyphenate } from "hyphen/gl";
  • Ancient Greek language
import { hyphenate } from "hyphen/grc";
  • Gujarati language
import { hyphenate } from "hyphen/gu";
  • Hindi language
import { hyphenate } from "hyphen/hi";
  • Croatian language
import { hyphenate } from "hyphen/hr";
  • Upper Sorbian language
import { hyphenate } from "hyphen/hsb";
  • Hungarian language
import { hyphenate } from "hyphen/hu";
  • Armenian language
import { hyphenate } from "hyphen/hy";
  • Interlingua language
import { hyphenate } from "hyphen/ia";
  • Bahasa Indonesia, Indonesian language
import { hyphenate } from "hyphen/id";
  • Icelandic language
import { hyphenate } from "hyphen/is";
  • Italian language
import { hyphenate } from "hyphen/it";
  • Georgian language
import { hyphenate } from "hyphen/ka";
  • Kurmanji, Northern Kurdish language
import { hyphenate } from "hyphen/kmr";
  • Kannada language
import { hyphenate } from "hyphen/kn";
  • Classical Latin language
import { hyphenate } from "hyphen/la-x-classic";
  • Liturgical Latin language
import { hyphenate } from "hyphen/la-x-liturgic";
  • Latin language
import { hyphenate } from "hyphen/la";
  • Lithuanian language
import { hyphenate } from "hyphen/lt";
  • Latvian language
import { hyphenate } from "hyphen/lv";
  • Malayalam language
import { hyphenate } from "hyphen/ml";
  • Mongolian, Cyrillic script, alternative patterns
import { hyphenate } from "hyphen/mn-cyrl-x-lmc";
  • Mongolian, Cyrillic script
import { hyphenate } from "hyphen/mn-cyrl";
  • Marathi language
import { hyphenate } from "hyphen/mr";
  • Multiple languages using the Ethiopic scripts
import { hyphenate } from "hyphen/mul-ethi";
  • Norwegian Bokmål, bokmål, norsk bokmål language
import { hyphenate } from "hyphen/nb";
  • Dutch language
import { hyphenate } from "hyphen/nl";
  • Norwegian Nynorsk, nynorsk language
import { hyphenate } from "hyphen/nn";
  • Norwegian, norsk language
import { hyphenate } from "hyphen/no";
  • Occitan language
import { hyphenate } from "hyphen/oc";
  • Odia, Oriya language
import { hyphenate } from "hyphen/or";
  • Panjabi, Punjabi language
import { hyphenate } from "hyphen/pa";
  • Pāli language
import { hyphenate } from "hyphen/pi";
  • Polish language
import { hyphenate } from "hyphen/pl";
  • Piedmontese language
import { hyphenate } from "hyphen/pms";
  • Portuguese language
import { hyphenate } from "hyphen/pt";
  • Romansh language
import { hyphenate } from "hyphen/rm";
  • Romanian language
import { hyphenate } from "hyphen/ro";
  • Russian language
import { hyphenate } from "hyphen/ru";
  • Sanskrit language
import { hyphenate } from "hyphen/sa";
  • Serbocroatian, Cyrillic script
import { hyphenate } from "hyphen/sh-cyrl";
  • Serbocroatian, Latin script
import { hyphenate } from "hyphen/sh-latn";
  • Slovak language
import { hyphenate } from "hyphen/sk";
  • Slovenian language
import { hyphenate } from "hyphen/sl";
  • Serbian, Cyrillic script
import { hyphenate } from "hyphen/sr-cyrl";
  • Swedish language
import { hyphenate } from "hyphen/sv";
  • Tamil language
import { hyphenate } from "hyphen/ta";
  • Telugu language
import { hyphenate } from "hyphen/te";
  • Thai language
import { hyphenate } from "hyphen/th";
  • Turkmen language
import { hyphenate } from "hyphen/tk";
  • Turkish language
import { hyphenate } from "hyphen/tr";
  • Ukrainian language
import { hyphenate } from "hyphen/uk";
  • Mandarin Chinese, pinyin transliteration
import { hyphenate } from "hyphen/zh-latn-pinyin";

Aliases for specific languages

  • Alias for hyphen/de-1996
import { hyphenate } from "hyphen/de";
  • Alias for hyphen/el-monoton
import { hyphenate } from "hyphen/el";
  • Alias for hyphen/en-us
import { hyphenate } from "hyphen/en";
  • Alias for hyphen/mul-ethi
import { hyphenate } from "hyphen/ethi";
  • Alias for hyphen/mn-cyrl
import { hyphenate } from "hyphen/mn";
  • Alias for hyphen/sh-cyrl
import { hyphenate } from "hyphen/sh";
  • Alias for hyphen/sr-cyrl
import { hyphenate } from "hyphen/sr";
  • Alias for hyphen/zh-latn-pinyin
import { hyphenate } from "hyphen/zh";

Factory function

A factory function can be used to create a hyphenate function with different default options.

Create a hyphenation function with a predefined exception list

import createHyphenator from "hyphen";
import patterns from "hyphen/patterns/en-us";

const hyphenate = createHyphenator(patterns, {
  // return results in a Promise
  async: true,
  // exceptions of hyphenation
  exceptions: ["present", "ta-ble"]
});
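Conceptually, such a factory merges the options passed to it (the defaults) with the options passed on each call. A minimal sketch of that pattern follows, using a hypothetical makeHyphenator and a toy core function in place of real pattern matching. It is not hyphen's implementation.

```javascript
// Wrap a synchronous core function so that options given to the factory act
// as defaults, per-call options override them, and `async: true` makes the
// returned function resolve a Promise instead of returning a string.
function makeHyphenator(hyphenateText, { async: isAsync = false, ...defaults } = {}) {
  const run = (text, options = {}) => hyphenateText(text, { ...defaults, ...options });
  // Promise.resolve().then(...) also turns a thrown error into a rejection.
  return isAsync ? (text, options) => Promise.resolve().then(() => run(text, options)) : run;
}

// Toy core function standing in for real pattern-based hyphenation.
const toyCore = (text, { hyphenChar = "\u00AD" }) => text.replace("table", `ta${hyphenChar}ble`);

const hyphenateSync = makeHyphenator(toyCore, { hyphenChar: "-" });
console.log(hyphenateSync("a table"));                      // → a ta-ble
console.log(hyphenateSync("a table", { hyphenChar: "|" })); // → a ta|ble

const hyphenate = makeHyphenator(toyCore, { async: true, hyphenChar: "-" });
hyphenate("a table").then((result) => console.log(result)); // → a ta-ble
```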

Predefined functions

The predefined hyphenate functions are created as follows.

import createHyphenator from "hyphen";
import patterns from "hyphen/patterns/en-us";

const hyphenate = createHyphenator(patterns, {
  async: true
});

const hyphenateSync = createHyphenator(patterns);

These predefined hyphenate functions are included in every language pack.

jsDelivr CDN for older websites

On older websites, hyphen can be loaded from the jsDelivr CDN. Check the package page on the jsDelivr website.

<script src="https://cdn.jsdelivr.net/npm/[email protected]/patterns/en-us.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/hyphen.min.js"></script>

After the scripts are added to your page, use the createHyphenator function to create a hyphenate function.

var hyphenate = createHyphenator(hyphenationPatternsEnUs, {
  async: true
});

Alternatives

Check other great hyphenation libraries:

  • Hyphenopoly does client-side hyphenation of HTML-Documents.
  • Hypher is a fast and small hyphenation engine.

Text hyphenation in CSS

The CSS hyphens property is intended to add hyphenation support to modern browsers without JavaScript:

p {
  hyphens: auto;
}

It is part of the CSS Text Level 3 specification. The browser compatibility list can be found on the related MDN page.

DEPRECATED

  • The debug option will be deprecated in future versions.

Migration

from 1.9.1 to 1.10.0

The default value of the html option changed from false to true.

If the text parser should not skip HTML tags, apply the following code changes.

Default exported hyphenate function

// Code before 1.10.0
hyphenate(text);
// Code after 1.10.0
hyphenate(text, { html: false });

Create hyphenate function with pre 1.10.0 behavior using a factory function:

// Code after 1.10.0
import createHyphenator from "hyphen";
import patterns from "hyphen/patterns/en-us";

const hyphenate = createHyphenator(patterns, {
  async: true,
  html: false
});

hyphenate(text);

Contributors ✨

Thanks go to these wonderful people (emoji key):

  • Eugene Tiurin: 🤔 💻 🚧
  • Krisztián Balla: 🐛 🧑‍🏫 📣
  • Robin Millette: 💻 🐛
  • Asko Soukka: 💻 🐛
  • Nicolas Sierra: 💻 🐛
  • Jaume Ortolà: 💻 🐛
  • Simon Osterlehner: 💻
  • Jason Wohlgemuth: 📖
  • Kamil Mielnik: 💻 🐛
  • Oskar Köök: 💻 🐛

This project follows the all-contributors specification. Contributions of any kind are welcome!

hyphen's People

Contributors

datakurre, jaumeortola, jhwohlgemuth, kamilmielnik, krisztianb, millette, simolation, ytiurin


hyphen's Issues

Unexpected output when the input contains the “constructor” word

There seems to be a problem with hyphen 1.6.2 when the input contains the word “constructor.” Here's a short example which reproduces the issue:

const hyphen = require('hyphen');
const patterns = require('hyphen/patterns/en-us');
const hyphenate = hyphen(patterns);
const result = hyphenate('Example of a constructor.');
console.log(result);

The expected output would be similar to:

Ex-am-ple of a cons-truc-tor.

but the actual output is:

Ex-am-ple of a function Object() { [native code] }.

Unit test

Here's the unit test which would reproduce the issue:

const createHyphenator = require("../hyphen.js");
const patterns = require("../patterns/en-us.js");

let hyphenate;

beforeAll(() => {
  hyphenate = createHyphenator(patterns, { hyphenChar: "-" });
});

describe("Constructor word hyphenation", () => {
  test("Should hyphenate constructor word correctly", () => {
    expect(hyphenate("constructor")).toBe("con-struc-tor");
  });
});
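The reported output is exactly what String(Object) returns. This suggests (an inference, not confirmed against the library's source) that a plain object literal was used as a word lookup table: looking up "constructor" then finds Object.prototype.constructor instead of nothing. A minimal sketch of the pitfall and three common remedies:

```javascript
// A plain object literal inherits from Object.prototype, so the key
// "constructor" is found even though it was never stored.
const cacheObject = {};
console.log(String(cacheObject["constructor"])); // → function Object() { [native code] }

// Remedy 1: a Map only returns entries that were explicitly set.
const cacheMap = new Map();
console.log(cacheMap.get("constructor")); // → undefined

// Remedy 2: an object without a prototype has no inherited keys.
const cacheBare = Object.create(null);
console.log(cacheBare["constructor"]); // → undefined

// Remedy 3: keep the plain object but accept own properties only.
function lookup(cache, word) {
  return Object.prototype.hasOwnProperty.call(cache, word) ? cache[word] : undefined;
}
console.log(lookup(cacheObject, "constructor")); // → undefined
```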

Problem with minified code in Firefox

I have a problem with the minified code in Firefox – SyntaxError: invalid range in character class

Minified code (compiled by CRA) has the code return { next: function() { for (var r, i = ""; r = e.charAt(t++);) { var o = /\s|[\!-\@\[-\`\{-\\xbf]/.test(r), a = o ? a === n.readWord ? n.returnWord : n.returnChar : n.readWord; switch (a) { case n.readWord: i += r; break; case n.returnWord: return t--, i; case n.returnChar: return r } } if ("" !== i) return i } }
where
var o = /\s|[\!-\@\[-\`\{-\\xbf]/.test(r),
is a wrong expression because of \\xbf.

Why do we have to escape the ¿ sign in the following line?

var charIsSpaceOrSpecial = /\s|[\!-\@\[-\`\{-\¿]/.test(nextChar);
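A plausible explanation, given here as a check rather than a confirmed diagnosis: inside a character class, \\ is a literal backslash (U+005C). The minified \{-\\xbf therefore no longer means "from { to ¿". It means "from { (U+007B) to \ (U+005C)", followed by the letters x, b and f. A range whose start is above its end is a SyntaxError under the ECMAScript specification, which Firefox reports as "invalid range in character class". The hypothetical helper compiles simply tries to build each pattern:

```javascript
// Return true if the source compiles as a regular expression, false if the
// engine rejects it with a SyntaxError.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// As in the source: \xbf is one character, "¿" (U+00BF).
console.log(compiles("\\s|[\\!-\\@\\[-\\`\\{-\\xbf]"));   // → true
// As minified: "\\" is a literal backslash, so "\{-\\" is the range
// U+007B..U+005C, which runs backwards.
console.log(compiles("\\s|[\\!-\\@\\[-\\`\\{-\\\\xbf]")); // → false
```

Writing the character as \xbf or as a literal ¿ yields the same class. Either way, a single backslash must survive minification.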

charIsSpaceOrSpecial fails to catch special characters

Dear all,

The regular expression on line 103 of hyphen.js

103 var charIsSpaceOrSpecial = /\s|[\!-\@\[-\`\{-\xbf]/.test(nextChar);

does not really work with all NON-WORD characters. For instance, in the provided Fiddle, by playing a little with the width of the Italian version of the result pane, you will notice that it breaks the line after an open quote, as follows:

giu­di­zî s’
­hanno a ri­ferire

which is clearly wrong, but unfortunately the open quote character is not captured by the regular expression provided.

FIX

Following the suggestion in https://mathiasbynens.be/notes/es6-unicode-regex, I replaced the above-mentioned line 103 with the following:

103 var regExp = /[^\p{L}]/u ;
104 var charIsSpaceOrSpecial = regExp.test(nextChar); 

and everything seems to work correctly.

I did not verify thoroughly, though.
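The effect of the proposed change can be checked on a few characters. In this sketch, asciiSpecial is the class quoted above and notALetter is the proposed one. notLetterOrMark is an extra variant (an assumption, not part of the issue). It exists because \p{L} alone treats combining marks as non-letters, such as accents written as separate code points or vowel signs in Indic scripts. The \p{...} escapes require the u flag.

```javascript
// Class from hyphen.js as quoted above: whitespace, ASCII punctuation and
// digits, and the range U+007B..U+00BF.
const asciiSpecial = /\s|[\!-\@\[-\`\{-\xbf]/;

// Proposed in the issue: any character that is not a Unicode letter.
const notALetter = /[^\p{L}]/u;

// Variant (assumption): letters and combining marks both count as word characters.
const notLetterOrMark = /[^\p{L}\p{M}]/u;

// Right single quote, left guillemet, two letters, and a combining acute accent.
for (const ch of ["\u2019", "\u00AB", "a", "\u00FC", "\u0301"]) {
  const code = "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(code, asciiSpecial.test(ch), notALetter.test(ch), notLetterOrMark.test(ch));
}
```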

hyphenationPatternsEnUS is not defined

I'm trying to get this working in a WordPress admin page. It looks like exactly what I need, but I get this error when I try to run the script:

ReferenceError: hyphenationPatternsEnUS is not defined

I noticed that it looked like a previous fix was implemented for this but I seem to be having the same problem.

Incorrect Output for abortion

abortion, matching patterns and their values:
  • 2io -> 2 0
  • o2n -> 0 2
  • r1ti -> 0 1 0
  • 1tio -> 1 0 0
Combined: 0 0 0 0 1 2 0 0 0 -> abor1t2ion -> abor-tion

However, the word "abortion" has three syllables: "a-bor-tion".

Classnames are hyphenated, too.

I like the simplicity of your system, but it hyphenates class names, too. I'd suggest excluding all attributes, class names, IDs, etc.

Example uses wrong language pattern

The example says (and IS written in) German, but the language pattern collection used is for LOWER German = NL = The Netherlands = DUTCH. Yes, those languages are related, but so are English and Dutch. Or Russian and Croatian. Or Spanish and Italian. BUT: They are NOT identical.

cu, w0lf

Imports seem to be incorrect

The docs say that all exports are named, like this:

import {
  hyphenate,
  hyphenateHTML,
  hyphenateHTMLSync,
  hyphenateSync
} from "hyphen/en";

but that doesn't work for me. I have to do:

import Hyphen from "hyphen/en";
const {
  hyphenate,
  hyphenateHTML,
  hyphenateHTMLSync,
  hyphenateSync
} = Hyphen;

It looks like all the named exports are exported inside the default export.
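When a package ships CommonJS, Node's ES module loader and some bundlers may expose module.exports only as the default export, which would produce exactly this shape. A defensive helper (hypothetical, not part of hyphen) that accepts both shapes:

```javascript
// Return a function export whether the module provides it as a real named
// export or only as a property of its default export (CommonJS interop).
function getExport(mod, name) {
  if (mod && typeof mod[name] === "function") return mod[name];
  const fallback = mod && mod.default;
  if (fallback && typeof fallback[name] === "function") return fallback[name];
  throw new TypeError(`Export "${name}" not found`);
}

// Usage inside an async function or an ES module with top-level await:
//   const mod = await import("hyphen/en");
//   const hyphenateSync = getExport(mod, "hyphenateSync");
```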

Hyphenation error since version 1.7.0

Hi Yevhen. We are using your great library in our software and recently found a regression after we updated our version.

You can see the error here: https://ytiurin.github.io/hyphen/?path=/story/hyphen-languages--german

  • Ge•si•cht should be Ge•sicht
  • Kö•ni•gs should be Kö•nigs

There should always be one vowel in every syllable.

  • Example from our document with an error: Grund•schuld•teile should be Grund•schuld•tei•le

This is working in version 1.6.6
No longer working since version 1.7.0

I didn't look at the code specifically but I guess the error was introduced by this commit: 38ac577

Please keep the HTML option

I see in the latest release that the html option is deprecated.

Please do not remove the html option. HTML and text are different data types, and being able to specify the data types used can make for more robust and explicit code. I only use the hyphenate library with text.

Thank you.

repeated patterns inside a word

There is an obvious problem when a word contains a repeated pattern (e.g. blablabla, nanana, papapa). The pattern is applied only once, but it should be applied at every occurrence.
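In Liang's method a pattern contributes its values at every position where it matches, so the matching step has to find all occurrences, overlapping ones included, not just the first. A minimal sketch of that search step (a hypothetical helper, not the library's code):

```javascript
// Return every start index of `needle` in `haystack`, including overlapping
// matches, by restarting the search one character after the previous hit.
function allOccurrences(haystack, needle) {
  const starts = [];
  if (!needle) return starts; // an empty needle would match at every index
  for (let i = haystack.indexOf(needle); i !== -1; i = haystack.indexOf(needle, i + 1)) {
    starts.push(i);
  }
  return starts;
}

console.log(JSON.stringify(allOccurrences("blablabla", "bla"))); // → [0,3,6]
console.log(JSON.stringify(allOccurrences("nanana", "ana")));    // → [1,3]

// A global regex match skips overlapping occurrences:
console.log("nanana".match(/ana/g).length); // → 1
```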

Import from .mjs in node 19 is not working

I tried it as shown in the docs:

import { hyphenate, hyphenateHTML, hyphenateHTMLSync, hyphenateSync } from "hyphen/en";

but I get this error:

Uncaught Error Error [ERR_UNSUPPORTED_DIR_IMPORT]: Directory import '/home/user/bigtext/node_modules/hyphen/en' is not supported resolving ES modules imported from /home/user/bigtext/src/lib/test.mjs
Did you mean to import hyphen/en/index.js?
    at __node_internal_captureLargerStackTrace (internal/errors:490:5)
    at NodeError (internal/errors:399:5)
    at finalizeResolution (internal/modules/esm/resolve:224:17)
    at moduleResolve (internal/modules/esm/resolve:850:10)
    at defaultResolve (internal/modules/esm/resolve:1058:11)
    at nextResolve (internal/modules/esm/loader:163:28)
    at resolve (internal/modules/esm/loader:835:30)
    at getModuleJob (internal/modules/esm/loader:416:18)
    at <anonymous> (internal/modules/esm/module_job:76:40)
    at link (internal/modules/esm/module_job:75:36)
    --- await ---
    at processTicksAndRejections (internal/process/task_queues:95:5)
    --- await ---
    at runMainESM (internal/modules/run_main:53:21)
    at executeUserEntryPoint (internal/modules/run_main:79:5)
    at <anonymous> (internal/main/run_main_module:23:47)

Set minimum word length

Is there a way to set the minimum word length to hyphenate? I'm trying to make my app as accessible as possible and to support large text mode. I have some very short words that are supposed to fit in small spaces. I was hoping to still hyphenate small words so there wouldn't be any awkward trailing characters.

hyphenateHTML will add hyphens to html parameters

It seems that when given a string containing HTML, the tag name is not hyphenated, but all of its attributes are.

For example hyphenateHTMLSync('<span class="user-input" style="text-transform: capitalize;">HTML tags are NOT hyphenated</span>',{hyphenChar: "[-]"}) will return <span class="user-in[-]put" style="text-trans[-]form: cap[-]i[-]tal[-]ize;">HTML tags are NOT hy[-]phen[-]at[-]ed</span> leaving the class and other properties broken.

I think it shouldn't. I came across this while trying to take the entire innerHTML content of a <p>, including some spans with classes, and hyphenate it.
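The expected behavior can be sketched as a tokenizer that passes only the text between tags to the hyphenator and copies each tag, attributes included, verbatim. This is a simplified illustration with hypothetical names, not how hyphen parses HTML. It ignores comments, script and style contents, and > characters inside quoted attribute values.

```javascript
// Split HTML into alternating text runs and tags. With one capture group,
// String.prototype.split puts text at even indices and tags at odd ones,
// so only even chunks are passed to the hyphenator.
function hyphenateHtmlText(html, hyphenateText) {
  return html
    .split(/(<[^>]*>)/)
    .map((chunk, i) => (i % 2 === 1 ? chunk : hyphenateText(chunk)))
    .join("");
}

// Toy hyphenator standing in for a real one.
const toy = (text) => text.replace(/input/g, "in[-]put");

console.log(hyphenateHtmlText('<span class="user-input">user input</span>', toy));
// → <span class="user-input">user in[-]put</span>
```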

The tag id hyphenates only the first text and I changed to class

Before, I added an id to many elements such as div, span and p, but only the first text was hyphenated.

I changed to class and it gave an error.

    <!-- Hyphen, by Eugene Tiurin -->
    <script type="text/javascript">
        function provideHyphenation(hyphenate, class) {
            var el = document.getElementByClassName(".deutsch");
            el.innerHTML = hyphenate(el.innerHTML);
        }
        provideHyphenation(createHyphenator(hyphenationPatternsDe), 'de');
    </script>

As you are an expert in JavaScript: why does function provideHyphenation(hyphenate, class) give me an error?
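The SyntaxError comes from the parameter name: class is a reserved word in JavaScript and cannot name a parameter. Further problems would follow. The DOM method is getElementsByClassName (plural), it expects the class name without a leading dot, and it returns a collection, not one element. An id must be unique, so getElementById only ever finds one element, which matches the "only the first text" symptom. The sketch below renames the parameter to className and uses querySelectorAll to visit every match. The class name passed in must match the one in the HTML (the snippet above passes "de" but looks for "deutsch").

```javascript
// Hyphenate the inner HTML of every element carrying the given class.
// `root` defaults to the page's document; passing it explicitly keeps the
// function usable outside a browser (e.g. in tests).
function provideHyphenation(hyphenate, className, root = document) {
  // querySelectorAll returns every match, unlike getElementById,
  // which returns at most one element.
  for (const el of root.querySelectorAll(`.${className}`)) {
    el.innerHTML = hyphenate(el.innerHTML);
  }
}

// In the page, with the German patterns loaded as in the snippet above:
// provideHyphenation(createHyphenator(hyphenationPatternsDe), "deutsch");
```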

Zip file in node package

I've just installed hyphen locally and noticed that there is a file named hyphen-1.3.1.tgz within the directory of the node package. I guess it was added by accident.

Failed to resolve '../patterns/en-us.cjs.js' since 1.10.2

Since release 1.10.2 we are facing the following error:

Module not found: Error: Can't resolve '../patterns/en-us.cjs.js' in '/data/node_modules/hyphen/en-us.cjs'

It looks like the patterns/en-us.cjs file is exported to NPM but is not present in the patterns folder:
https://www.npmjs.com/package/hyphen?activeTab=code

The code in which we are using hyphen is as follows:

const { hyphenate } = await import(`hyphen/${language}`);
const hyphenatedText: string = await hyphenate(text, { hyphenChar: '&shy', minWordLength: 10 })
    .then(result => {
        return result;
    })
    .catch(() => {
        return text;
    });

Maybe the way we dynamically import hyphen is causing an issue?

How to hyphenate a document with words in multiple languages?

I have a document with Russian words. I implemented hyphenation as follows, using the react-pdf library:

const hyphenator = hyphen(pattern);
const hyphenationCallback = (word) => {
  return hyphenator(word).split('\u00AD');
};
// <Text hyphenationCallback={hyphenationCallback}></Text>

But what if I also have English words? How do I hyphenate a document with words in several languages, for example English and Russian?
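When the languages use different scripts, as Cyrillic and Latin do, one approach is to pick a hyphenator per word based on its script. In the sketch below, makeHyphenationCallback is a hypothetical helper, and the functions it receives are assumed to work like the hyphenator in the callback above (one built from Russian patterns, one from English patterns). Languages that share a script, such as English and German, cannot be told apart this way.

```javascript
// Choose a hyphenator per word by script: words containing Cyrillic letters
// go to `ru`, everything else to `en`. Both return the word with U+00AD
// soft hyphens inserted; the callback returns the syllable pieces.
function makeHyphenationCallback({ ru, en }) {
  const cyrillic = /\p{Script=Cyrillic}/u;
  return (word) => {
    const hyphenate = cyrillic.test(word) ? ru : en;
    return hyphenate(word).split("\u00AD");
  };
}

// Toy hyphenators standing in for real pattern-based ones.
const toyRu = (w) => (w === "привет" ? "при\u00ADвет" : w);
const toyEn = (w) => (w === "garden" ? "gar\u00ADden" : w);

const callback = makeHyphenationCallback({ ru: toyRu, en: toyEn });
console.log(JSON.stringify(callback("привет"))); // → ["при","вет"]
console.log(JSON.stringify(callback("garden"))); // → ["gar","den"]
```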

New JS Fiddle example, please

This project used to contain a link to a JS Fiddle example. But I don't see it anymore. Can it come back? I don't remember how to install the system on a website and there's no (clear) instruction on how to do that.
