
beygla's Introduction

Beygla

Tiny (5kB gzipped) declension helper for Icelandic names

applyCase("ef", "Jóhannes");
//=> "Jóhannesar"

applyCase("þgf", "Helga Fríða Smáradóttir");
//=> "Helgu Fríðu Smáradóttur"

Overview


Why does beygla exist?

Icelandic names have four cases:

Guðmundur   →  Nominative case (nefnifall)
Guðmund     →  Accusative case (þolfall)
Guðmundi    →  Dative case (þágufall)
Guðmundar   →  Genitive case (eignarfall)

The different cases are used depending on the context in which the name is used.

  • „Hann Guðmundur hefur bætt sig mikið.“
  • „Illa er farið með góðan Guðmund.“
  • „Hvað finnst Guðmundi um breytingarnar?“
  • „Ég kem þessu áleiðis til Guðmundar.“

Icelandic usernames are stored in the nominative case (nefnifall). This can pose a challenge when using the name in a sentence.

The document has been sent to Guðmundur

Translated to Icelandic, this reads:

Skjalið hefur verið sent á Guðmundur

To an Icelander, this is jarring. The name appears in the nominative case „Guðmundur“, but it should be in the accusative case „Guðmund“.

Rewritten to use the nominative case, we get:

Guðmundur hefur fengið skjalið sent

But we've now changed the message entirely!

Before: The document has been sent to Guðmundur
After:  Guðmundur has received the document

This forces an Icelandic content writer to degrade the user experience by either

  • using language that is not as natural, or
  • reducing specificity by omitting the name entirely.

By being able to decline (transform) names to the correct case, we would remove this problem entirely.

Unfortunately, Icelandic name declension has lots of rules, with lots of exceptions.

# Left is nominative case, right is accusative case

Gauti → Gauta
Jóhanna → Jóhönnu
Snæfríður → Snæfríði
Alex → Alex
Bjarnfreður → Bjarnfreð

Encoding these rules, and their exceptions, is hard and can take up a lot of space. Developers don't want to add hundreds of kilobytes to the bundle size, just to apply cases to names.

Well, beygla encodes these rules in just 5 kilobytes gzipped [1].

Usage

Install beygla as an npm package:

npm i -S beygla

Beygla exports a single function named applyCase.

import { applyCase } from "beygla";

applyCase("ef", "Jóhannes");
//=> "Jóhannesar"

applyCase("þgf", "Helga Dís Smáradóttir");
//=> "Helgu Dís Smáradóttur"

applyCase accepts two parameters: a case and a name (in the nominative case [2]).

The return value is a string with the name declined to the desired case.
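
As a rough illustration of how the two parameters fit into a message template, the sketch below builds the Icelandic notification from the motivation section. The ApplyCaseFn type and the sentToMessage helper are hypothetical names introduced here, and fakeApplyCase is a stand-in that only knows one name so the snippet runs without the package. In real code you would pass the applyCase function imported from beygla instead.

```typescript
// Hypothetical type describing the documented call shape: (case, name) => declined name.
type ApplyCaseFn = (caseValue: string, name: string) => string;

// Hypothetical helper: the sentence needs the accusative (þolfall), so it requests "þf".
function sentToMessage(applyCase: ApplyCaseFn, nominativeName: string): string {
  return `Skjalið hefur verið sent á ${applyCase("þf", nominativeName)}`;
}

// Stand-in for beygla's applyCase, for illustration only. It knows a single name
// and returns every other input unchanged.
const fakeApplyCase: ApplyCaseFn = (caseValue, name) =>
  caseValue === "þf" && name === "Guðmundur" ? "Guðmund" : name;

console.log(sentToMessage(fakeApplyCase, "Guðmundur"));
// prints "Skjalið hefur verið sent á Guðmund"
```

Passing the declension function in as an argument keeps the template code easy to test and makes it simple to swap between beygla and beygla/strict.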

Cases

The following cases may be provided as the first argument to applyCase:

Case (English)   Case (Icelandic)   Value (English)   Value (Icelandic)
Nominative       Nefnifall          "nom"             "nf"
Accusative       Þolfall            "acc"             "þf"
Dative           Þágufall           "dat"             "þgf"
Genitive         Eignarfall         "gen"             "ef"

If a case not in the table above is provided, "nf" is used as a fallback (i.e. nothing is done).
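
If you want to validate a case value in your own code before calling applyCase, the table and the fallback rule can be mirrored in a small lookup. This is only a sketch: normalizeCase and IcelandicCase are hypothetical names, not part of beygla's API, and nothing here is taken from beygla's internals.

```typescript
type IcelandicCase = "nf" | "þf" | "þgf" | "ef";

// Every accepted value from the table above, mapped to its Icelandic form.
// A Map is used so that strings like "constructor" cannot match inherited keys.
const CASE_VALUES = new Map<string, IcelandicCase>([
  ["nom", "nf"], ["nf", "nf"],
  ["acc", "þf"], ["þf", "þf"],
  ["dat", "þgf"], ["þgf", "þgf"],
  ["gen", "ef"], ["ef", "ef"],
]);

// Hypothetical helper: returns the Icelandic case value, or "nf" for anything
// not in the table, matching the documented fallback.
function normalizeCase(value: string): IcelandicCase {
  return CASE_VALUES.get(value) ?? "nf";
}

console.log(normalizeCase("dat")); // prints "þgf"
console.log(normalizeCase("vocative")); // prints "nf"
```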

Whitespace

If the name includes superfluous whitespace, applyCase removes it.

applyCase("þgf", "  \n  Helga  Dís\tSmáradóttir  \n\n");
//=> "Helgu Dís Smáradóttur"
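
The clean-up described above can be approximated with a trim followed by a split on runs of whitespace. This is a minimal sketch of the observable behavior, not beygla's actual implementation, and normalizeWhitespace is a hypothetical name.

```typescript
// Hypothetical helper: strip leading and trailing whitespace, then collapse
// every internal run of spaces, tabs, or newlines into a single space.
function normalizeWhitespace(name: string): string {
  return name.trim().split(/\s+/).join(" ");
}

console.log(normalizeWhitespace("  \n  Helga  Dís\tSmáradóttir  \n\n"));
// prints "Helga Dís Smáradóttir"
```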

Addresses

The beygla/addresses module allows you to apply declension to Icelandic addresses and place names:

import { applyCase } from "beygla/addresses";

applyCase("þf", "Rauðalækur 63");
//=> "Rauðalæk 63"

applyCase("ef", "Reykjavík");
//=> "Reykjavíkur"

applyCase("þgf", "Þjórsárdalur");
//=> "Þjórsárdal"

Its behavior is the same as the regular beygla module, except it contains data that allows it to apply cases to Icelandic addresses and place names instead of person names. All of the same pattern matching behaviors and limitations apply.

The beygla/addresses module is around 4.9kB gzipped.

Correctness

Beygla will correctly apply the desired case to the input name in most cases.

Most Icelandic names (81%), especially common ones, are present on bin.arnastofnun.is. Beygla is guaranteed to produce a correct result for those names.

This does not mean that Beygla produces an incorrect result for the other 19% of names. Beygla finds patterns in name endings based on the data on bin.arnastofnun.is and applies those patterns to any input name. This means that beygla will produce a correct result for most names, even if the name is not in the dataset from bin.arnastofnun.is.
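
To make the idea of ending-based patterns concrete, here is a toy sketch that picks the longest matching ending from a tiny hand-written table. The three rules are assumptions chosen only to reproduce the nominative-to-accusative examples listed earlier in this README. They are not beygla's real data or encoding, and toAccusativeSketch is a hypothetical name.

```typescript
// A rule says: if the name ends with `suffix`, replace that ending with `replacement`.
type SuffixRule = { suffix: string; replacement: string };

// Toy table for the accusative case, for illustration only.
const ACCUSATIVE_RULES: SuffixRule[] = [
  { suffix: "fríður", replacement: "fríði" }, // exception to the general "ur" rule
  { suffix: "ur", replacement: "" },
  { suffix: "i", replacement: "a" },
];

function toAccusativeSketch(name: string): string {
  let best: SuffixRule | undefined = undefined;
  for (const rule of ACCUSATIVE_RULES) {
    // Prefer the longest matching ending so specific exceptions beat general rules.
    if (name.endsWith(rule.suffix) && (!best || rule.suffix.length > best.suffix.length)) {
      best = rule;
    }
  }
  // No recognized ending: leave the name untouched.
  if (!best) return name;
  return name.slice(0, name.length - best.suffix.length) + best.replacement;
}

console.log(toAccusativeSketch("Snæfríður")); // prints "Snæfríði"
console.log(toAccusativeSketch("Alex")); // prints "Alex"
```

A name absent from the source data can still match one of the endings, which is why pattern-based declension generalizes beyond the names it was built from. A name with an unrecognized ending is returned as-is.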

I tried randomly sampling 20 names from the list of legal Icelandic names not present in bin.arnastofnun.is:

  • 14 names matched a pattern with the correct result
  • 6 names matched no pattern
  • 0 names matched a pattern with an incorrect result

Even though I happened to get no incorrect results, this is a very small sample. I'm absolutely certain that there are a handful of names that will produce incorrect results.

See beygla.spec.ts.

Strict mode

Beygla provides a "strict version", accessible under beygla/strict, which guarantees that declensions are only applied to legal Icelandic names.

import { applyCase } from "beygla/strict";

The interface for beygla/strict is the exact same as for beygla.

Declining only Icelandic names may not be desirable, since a foreign name that the standard module would decline correctly is then left unchanged. The beygla/strict module is also 15kB gzipped, three times the size of the standard beygla module.
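
If you only need this kind of restriction for a known set of names, a similar effect can be sketched in user code by wrapping a declension function with an allow-list. Everything below is an assumption for illustration: Decliner and makeAllowListedDecliner are hypothetical names, the per-part check is a design choice of this sketch, and nothing here describes how beygla/strict works internally.

```typescript
// Hypothetical type for any function with applyCase's documented shape.
type Decliner = (caseValue: string, name: string) => string;

// Hypothetical wrapper: each part of a full name is declined only if it is on
// the allow-list. Other parts are returned unchanged.
function makeAllowListedDecliner(decline: Decliner, allowed: ReadonlySet<string>): Decliner {
  return (caseValue, fullName) =>
    fullName
      .trim()
      .split(/\s+/)
      .map((part) => (allowed.has(part) ? decline(caseValue, part) : part))
      .join(" ");
}

// Stand-in decliner for illustration only. It knows two dative forms.
const toyDecline: Decliner = (caseValue, name) => {
  if (caseValue !== "þgf") return name;
  if (name === "Helga") return "Helgu";
  if (name === "Carlos") return "Carlosi";
  return name;
};

const strictish = makeAllowListedDecliner(toyDecline, new Set(["Helga"]));
console.log(strictish("þgf", "Helga")); // prints "Helgu"
console.log(strictish("þgf", "Carlos")); // prints "Carlos"
```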

Passing a name in the wrong case

Beygla operates on the assumption that names provided to it are in the nominative case (nefnifall). If a name is provided in any other case, an incorrect result is extremely likely.

What happens if beygla does not find a pattern?

Given a name that has an ending that beygla does not recognize, it will not apply the case to the name.

Do note that beygla attempts to apply the case to every name (first, last, and middle name) in a full name individually. This means that some names in a full name might have a case applied, and some not.

Footnotes

  1. Declension rules are encoded using cases for 3647 out of 4505 Icelandic names (81%). The data for the cases is from bin.arnastofnun.is.

  2. If the provided name is not in the nominative case, applyCase is likely to yield an unexpected value.

beygla's People

Contributors

alexharri, valgeirb

Forkers

valgeirb oddsson

beygla's Issues

For names that can also be identified as a noun the declension is wrong if they are not the same for the name and noun

Some names also fall into a category of a noun.
For example the names Eldey, Hafrún and Sóley.
The declension for the name and noun are different and the beygla package seems to return the declension for the noun, not the name in these cases.

Example:
The declension of the name Sóley is: Sóley - Sóleyju - Sóleyju - Sóleyjar
The declension of the noun Sóley is: Sóley - Sóley - Sóley - Sóleyjar

beygla returns the latter one, which is not correct.

Readme: Cases -> Inflections

As I understand it, the English word "inflections" is used for the declension of words. I think it would be more correct to say, for example, "Icelandic names have four inflections", and perhaps most correct to say "Icelandic names generally have four inflections", since some names even have more than one generally accepted declension 😄, for example the name Björn (til Bjarnar, but also til Björns).

See e.g. the Cambridge dictionary: inflection.

Option to not try to apply case to non icelandic names

Hi! 👋

Would it be possible to add an option to applyCase to control whether or not beygla tries to apply case to foreign names?

Example

Current functionality: applyCase('þgf', "Carlos") => Carlosi
Wanted functionality: applyCase('þgf', "Carlos", {applyToForeignNames: false}) => Carlos

Maybe we could do a lookup in /data/icelandic-names.csv if applyToForeignNames is set to false 🤷

Beygla for street addresses

Hi!

Is there any way you could use Beygla's way of declining names to decline Icelandic street addresses? It would be awesome to be able to do something like

import { applyCaseToAddress } from 'beygla'

applyCaseToAddress('ef', 'Dúfnahólar 10') // RESULT: Dúfnahóla 10

There is a list available of all Icelandic street addresses in Staðfangaskrá 👀

Thanks for your time ☺️

Handle known problem addresses

There are some addresses which have conflicting cases:

Efri-Hreppur
Efra-Hrepp
Efra-Hreppi
Efra-Hrepps

vs

Efri-Hvoll
Efri-Hvol
Efri-Hvol
Efri-Hvols

Beygla can currently only pick one, which means that one of those will be incorrectly declined.

I list these as "known problem addresses" in addresses.spec.ts (see source).

Suggestion for an additional dataset that could be used

Hi there.

It's always fun to see great programming contributions to the Icelandic language like this pop up, big kudos.

For about a year and a half I have been slowly but steadily building my own word database, which is available under the LGPL-v3 license here: github.com/Loknar/loka-ord.

At one point I focused specifically on Icelandic personal names, and I believe the database contains more or less all Icelandic personal names and their declensions, as well as known family names and surnames. It also contains some nicknames.

It is worth noting that the database has one small deviation from conventional written Icelandic, the introduction of the letter Ł, but this is easy to handle with a simple replace "łl" -> "ll".

Regards, Sveinn
