old-swedish-dictionary-builder's Introduction

Old Swedish Dictionary Builder

Build "Dictionary of Old Swedish" into easier-to-use data formats.

Available formats:

  • JSON
  • DSL

The source data can be built either as separate volumes (vol. I-III and vol. IV-V) or as one combined dictionary. The builder also outputs a related dictionary of medieval Swedish law.

Usage

The main package exposes a ToJson function, which generates output files in the /build/ directory. Running the main function generates all outputs.

The "dictionary" package also makes the dictionary available as in-memory structures.

About "Dictionary of Old Swedish"

The dictionary "Ordbok Öfver svenska medeltids-språket" by K.F. Söderwall was published in 1884–1918. A supplement to it was published in 1953–1973.

Old Swedish developed from Old East Norse, the eastern dialect of Old Norse, at the end of the Viking Age. Early Old Swedish was spoken from about 1225 until about 1375, and Late Old Swedish was spoken from about 1375 until about 1526.

The original material is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0) and is made available by the University of Gothenburg. The source code for this library is under the MIT licence.

old-swedish-dictionary-builder's People

Contributors

dependabot[bot], stscoundrel

old-swedish-dictionary-builder's Issues

Add minifier script

The language-specific libraries that use this dataset rely on a minified version. This builder could output it alongside the main dataset.

  • Minify repeated keys: use a, b, c, d, etc. instead. Client libraries will map them back to readable names.
  • Write the JSON output on a single line.

Should shave off many megabytes.
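
A minimal sketch of the two steps above, assuming a hypothetical entry shape with three fields. The Entry and MinifiedEntry types, their field names, and the Minify and Expand functions are illustrative only and are not taken from this repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entry is a hypothetical dictionary entry; the real field names may differ.
type Entry struct {
	Headword     string `json:"headword"`
	PartOfSpeech string `json:"partOfSpeech"`
	Definition   string `json:"definition"`
}

// MinifiedEntry holds the same data under single-letter JSON keys.
type MinifiedEntry struct {
	Headword     string `json:"a"`
	PartOfSpeech string `json:"b"`
	Definition   string `json:"c"`
}

// Minify converts entries to the short-key form and marshals them.
// json.Marshal adds no indentation or newlines, so the output is one line.
func Minify(entries []Entry) ([]byte, error) {
	minified := make([]MinifiedEntry, 0, len(entries))
	for _, e := range entries {
		// The two structs differ only in tags, so a direct conversion works.
		minified = append(minified, MinifiedEntry(e))
	}
	return json.Marshal(minified)
}

// Expand is what a client library could do to restore the readable form.
func Expand(data []byte) ([]Entry, error) {
	var minified []MinifiedEntry
	if err := json.Unmarshal(data, &minified); err != nil {
		return nil, err
	}
	entries := make([]Entry, 0, len(minified))
	for _, m := range minified {
		entries = append(entries, Entry(m))
	}
	return entries, nil
}

func main() {
	// Placeholder values, not real dictionary data.
	out, err := Minify([]Entry{{Headword: "x", PartOfSpeech: "y", Definition: "z"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping the key mapping in struct tags means the builder and a Go client could share one definition. Ports in other languages would need the same key table.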

Parse "information" tag

"Information" seems to exist for some entries and contains data such as <feat att="information" val="p. adj. " />. Parse it into entries.

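One way this could be parsed, sketched with Go's encoding/xml. The Feat type and ParseInformation function are hypothetical names, and the sketch assumes the feat element is handed over as a standalone fragment. In the real source it is presumably nested inside a larger entry element.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Feat mirrors a <feat att="..." val="..." /> element.
type Feat struct {
	Att string `xml:"att,attr"`
	Val string `xml:"val,attr"`
}

// ParseInformation decodes a single feat fragment. It returns the trimmed
// val and true when att is "information", and false for any other att.
func ParseInformation(fragment string) (string, bool, error) {
	var f Feat
	if err := xml.Unmarshal([]byte(fragment), &f); err != nil {
		return "", false, err
	}
	if f.Att != "information" {
		return "", false, nil
	}
	// The sample value ends with a space, so trim it before storing.
	return strings.TrimSpace(f.Val), true, nil
}

func main() {
	info, ok, err := ParseInformation(`<feat att="information" val="p. adj. " />`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q %v\n", info, ok)
}
```
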
Support multiple parts of speech

Entries can apparently have multiple part-of-speech blocks. Change the field from string to []string. This is essentially a breaking change for libraries consuming this dataset, but happily they're all at version 0.1.0 or so, so bumping major is a minor thing (badum-ts)
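
As a rough illustration of the consuming side, a Go client could accept both the old string shape and the new []string shape during the transition. The Entry type, its JSON keys, and the PartsOfSpeech and DecodeEntry names are assumptions made for this sketch, not the dataset's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PartsOfSpeech decodes from either a single JSON string (old shape)
// or a JSON array of strings (new shape).
type PartsOfSpeech []string

// UnmarshalJSON tries the old string shape first, then the array shape.
func (p *PartsOfSpeech) UnmarshalJSON(data []byte) error {
	var single string
	if err := json.Unmarshal(data, &single); err == nil {
		*p = PartsOfSpeech{single}
		return nil
	}
	var many []string
	if err := json.Unmarshal(data, &many); err != nil {
		return err
	}
	*p = many
	return nil
}

// Entry is a hypothetical entry with only the fields needed here.
type Entry struct {
	Headword     string        `json:"headword"`
	PartOfSpeech PartsOfSpeech `json:"partOfSpeech"`
}

// DecodeEntry parses one entry from JSON.
func DecodeEntry(data []byte) (Entry, error) {
	var e Entry
	err := json.Unmarshal(data, &e)
	return e, err
}

func main() {
	// Placeholder values, not real dictionary data.
	e, err := DecodeEntry([]byte(`{"headword":"x","partOfSpeech":["a","b"]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(e.PartOfSpeech)
}
```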

Additional formatting for entries

For entries, alternative forms and other info fields, the following formatting fixes could be applied:

  • . " -> . " (removes extra space at the end)
  • " -> " (removes extra space at start, should be applied only after the previous fix)
  • ) . -> ). (removes extra space in the middle)

There are probably many more, but these would fix many common oddities & make definitions more readable.
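
A minimal sketch of how such fixes might be applied as an ordered list of replacements. The Rule type and ApplyRules function are hypothetical, and the rule list in main contains only the ") ." to ")." fix as an example.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is one literal replacement: every occurrence of Old becomes New.
type Rule struct {
	Old string
	New string
}

// ApplyRules runs the rules one after another, so a later rule sees the
// text as already changed by the earlier ones.
func ApplyRules(text string, rules []Rule) string {
	for _, r := range rules {
		text = strings.ReplaceAll(text, r.Old, r.New)
	}
	return text
}

func main() {
	rules := []Rule{
		{Old: ") .", New: ")."}, // removes the extra space in the middle
	}
	// Placeholder text, not a real dictionary definition.
	fmt.Println(ApplyRules("(abc ) . def", rules))
}
```

Sequential strings.ReplaceAll calls are used here instead of a single strings.NewReplacer because a replacer works in one pass, while some of these fixes should only run after an earlier one has been applied.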

Add gzipped JSON output

Based on results from the similar Old Danish builder, simply gzipping the JSON output reduces the file size considerably. At least for the Node.js library, the file became small enough that performance stayed similar even with the added overhead of unzipping it when needed.

Two versions could be provided:

  • Plain gzipped JSON output (for Node.js)
  • Gzipped JSON output with minified keys (for language ports that map JSON keys to internal keys anyway). This saves even more disk space.
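
The gzip step itself could look roughly like this with Go's standard compress/gzip package. The Gzip and Gunzip helper names are made up for the sketch, and the input in main is repeated placeholder JSON rather than real dictionary output.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// Gzip compresses data with the default compression level.
func Gzip(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes the remaining data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// Gunzip restores the original bytes, as a consuming library would.
func Gunzip(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	original := bytes.Repeat([]byte(`{"headword":"x","definition":"y"},`), 1000)
	compressed, err := Gzip(original)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes -> %d bytes\n", len(original), len(compressed))
}
```

The same Gzip helper would serve both variants in the list, since the only difference between them is whether the JSON passed in uses full or minified keys.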
