kirum's Introduction

Kirum

Kirum (from Standard Babylonian Kirûm meaning garden or orchard) is a conlang CLI utility and library. Unlike many conlang tools, which allow you to generate lexicons based on phonetic rules, Kirum generates entire languages and language families based on specified etymology. Kirum started as a way to enable easy iteration on a conlang; instead of a blind find-and-replace operation or carefully scrolling through documentation, phonology and conjugation rules can be changed in a single place and then "trickled down" via etymology rules. The bureaucracy example demonstrates this by generating the modern English word bureaucracy using only the Latin word burra and the Greek root kratia.

Kirum is a work in progress, and should be considered alpha software.

Installing

If you haven't already, install Rust and Git. Once you've cloned the repo, install Kirum with:

cd kirum/
cargo install --path=./kirum

Getting Started

To create your first project, simply run kirum new [NAME]:

$ kirum new my_first_project
[2023-05-27T19:57:10Z INFO  kirum] created new project my_first_project

This creates a basic project under a my_first_project directory. From there, you can render your project to a lexicon:

$ kirum render -d my_first_project/ line
    essemple (Old French) model, example
    exemplum (Latin): (Noun) an instance, model, example
    emere (Latin): (Verb) To buy, remove

Examples

The examples directory has a number of projects:

  • bureaucracy - A basic example that demonstrates how to use etymology graphs to make changes to the history of a word.
  • generate_daugher - An example of how to use the generate subcommand to create a daughter language from a parent language.
  • templates - Using a handlebars template to output an asciidoc dictionary.
  • conditionals - Using conditional statements in transforms.
  • phonetic_rules - Using Kirum's phonetic rulesets to generate words.
  • ingest_from_json - Ingesting words into a language project from a JSON or newline-delimited text file.
  • rhai - Using the Rhai scripting language to transform words as part of an etymological history.

The structure of a Kirum project

Kirum generates languages from a number of files, contained in separate tree and etymology directories. Tree files contain a lexicon of words, stems, roots, etc., and etymology files contain data on the transforms between words. The etymology files can also contain conditional statements that determine whether a transform should be applied to a word. An optional phonetics directory also allows for generating words from phonetic, as opposed to etymological, rules.

Lexis objects

A Tree file is a JSON object of Lexis objects, a maximal example of which is presented below:

    "latin_example": {
      "type": "word", // A user-supplied tag. Can be any value.
      "word": "exemplum", // The actual lexical word. If not supplied, kirum will attempt to derive it based on etymology
      "language": "Latin", // Can be any user-supplied value
      "generate": "word_rules", // An optional tag that will generate the word from phonetic rules, see examples/phonetic_rules
      "definition": "an instance, model, example",
      "part_of_speech": "noun", // Optional. Must be one of noun, verb, or adjective.
      "etymology": {
        "etymons": [
          {
            "etymon": "latin_verb", // The key name of another lexis in the Kirum project
            "transforms": [
              "latin-from-verb" // the key name of a transform
            ]
          }
        ]
      },
      "archaic": true, // Optional. Used only for sorting and filtering.
      "historical_metadata": {"metadata_value":"value"}, // Optional historical metadata. Unlike tags, historical metadata is inherited from any etymons. Can also be used for sorting and templates.
      "tags": [ // optional, user-supplied tags.
        "example",
        "default"
      ],
      "derivatives": [ // The optional derivatives field works as syntactic sugar, allowing users to specify derivative words within the object of the etymon, as opposed to as a separate JSON object.
        {
          "lexis": { // Identical to the `lexis` structure of the parent lexis.
            "language": "Old French",
            "definition": "model, example",
            "part_of_speech": "noun",
            "archaic": true
          },
          "transforms": [
            "of-from-latin"
          ]
        }
      ]
    },

Transform objects

A transform object specifies the relationship between words. Transform files are a JSON object of Transform objects, an example of which is below:

        "vowel-o-change":{
            "transforms":[ // a list of individual transform functions. See below for available transforms
                {
                    "letter_replace":{
                        "letter": {"old": "e", "new":"ai"},
                        "replace": "all"
                    }
                }
            ],
            "conditional":{// Optional. The transform will only be applied if the conditional evaluates to true
                "pos": { // will match against the `part_of_speech` field of the Lexis object
                    "match":{
                        "equals": "noun" // The `part_of_speech` field must be equal to `noun`. 
                    }
                }
            }
        }

A complete list of available transform types can be found in the transforms.rs file.
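
As a rough illustration of the transform object above, the sketch below models a single letter_replace transform guarded by a pos conditional. It is not Kirum's implementation: the Lexis and LetterReplace types and the apply method are hypothetical names invented for this example, and only the "replace all" behavior is modeled.

```rust
/// Hypothetical stand-in for a lexis entry; only the fields used here.
#[derive(Debug, Clone, PartialEq)]
struct Lexis {
    word: String,
    part_of_speech: Option<String>,
}

/// Hypothetical model of a `letter_replace` transform with an optional
/// `pos` conditional (an assumption, not Kirum's real type).
struct LetterReplace {
    old: String,
    new: String,
    /// If set, the transform only applies when `part_of_speech` equals this.
    required_pos: Option<String>,
}

impl LetterReplace {
    /// Returns a transformed copy, or an unchanged copy if the
    /// conditional does not match.
    fn apply(&self, lexis: &Lexis) -> Lexis {
        if let Some(required) = &self.required_pos {
            if lexis.part_of_speech.as_deref() != Some(required.as_str()) {
                return lexis.clone();
            }
        }
        Lexis {
            // Replace every occurrence, mirroring `"replace": "all"`.
            word: lexis.word.replace(self.old.as_str(), self.new.as_str()),
            part_of_speech: lexis.part_of_speech.clone(),
        }
    }
}
```

With the values from the example, applying the transform to a noun whose word is exemplum replaces every e with ai, while a verb such as emere passes through unchanged because the conditional does not match.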

kirum's People

Contributors

fearful-symmetry

kirum's Issues

Apply transforms based on language relationships

For example, if an etymon has language Old Lang and the derivative has language Middle Lang, the Kirum compute methods should look for a transform rule that matches those two languages, and apply it automatically.
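
One possible shape for this lookup is a table keyed by language pair. This is only a sketch of the idea: LanguageRules and transform_for are hypothetical names, and how Kirum would actually store or match such rules is not decided here.

```rust
use std::collections::HashMap;

/// Hypothetical registry mapping (etymon language, derivative language)
/// to the key name of a transform.
type LanguageRules = HashMap<(String, String), String>;

/// Looks up the transform that should be applied automatically when a
/// word passes from `from` to `to`. Returns `None` if no rule matches.
fn transform_for<'a>(rules: &'a LanguageRules, from: &str, to: &str) -> Option<&'a str> {
    rules
        .get(&(from.to_string(), to.to_string()))
        .map(|name| name.as_str())
}
```

The lookup is directional, so a rule registered for Old Lang to Middle Lang would not fire for a word moving the other way.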

Add rustdoc

We need a proper rustdoc page, particularly for documenting the structure of lexis and transform objects.

Create templated languages

Kirum should have some option to generate a mostly complete (as in 500 or so words) conlang at startup with a flag passed to new. This first language should be based on English word relationships, at least for the first pass. We can add other languages later.

Customize `new` behavior

new should be a little more customizable, perhaps with a flag to make the default template language optional, etc.

Add `stat` command

Add a stat command that provides metrics about the specified language.

Validate keynames on ingest

Because Kirum can read multiple files, a user can supply different words with the same keyname. Kirum should check for this on ingest.
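
A check along these lines could track which file first defined each keyname. The sketch below assumes the keys have already been read out of each file; DuplicateKey and find_duplicate_keys are hypothetical names, not part of Kirum.

```rust
use std::collections::HashMap;

/// One collision: the key, the file that defined it first, and the
/// file that repeated it. (Hypothetical type for this sketch.)
#[derive(Debug, PartialEq)]
struct DuplicateKey {
    key: String,
    first_file: String,
    second_file: String,
}

/// Takes (file name, keynames in that file) pairs in read order and
/// reports every keyname that was already defined by an earlier entry.
fn find_duplicate_keys(files: &[(&str, Vec<&str>)]) -> Vec<DuplicateKey> {
    let mut seen: HashMap<&str, &str> = HashMap::new();
    let mut duplicates = Vec::new();
    for (file, keys) in files {
        for key in keys {
            // Copy the stored file name out so the map is free to be mutated.
            let first = seen.get(*key).copied();
            match first {
                Some(first_file) => duplicates.push(DuplicateKey {
                    key: (*key).to_string(),
                    first_file: first_file.to_string(),
                    second_file: (*file).to_string(),
                }),
                None => {
                    seen.insert(*key, *file);
                }
            }
        }
    }
    duplicates
}
```

Reporting both file names, rather than just the key, lets the user find the conflicting definitions without searching the whole project.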

`transform` field should be optional.

Right now, a lexis etymology field is required to specify a transform, even if no actual transform is needed. Kirum probably needs some kind of built-in "loanword" transform that operates as a default if no transform is specified in an etymon object.

Redo `Word` type

Right now, Word is an enum with two possible variants, String or Vector. This has its limits: certain operations, like regex transforms, are lossy because they require a string instead of a vector. Redo the Word type, perhaps as a struct that contains a string field and a metadata field that maps to individual characters.

Add CI

Add GitHub Actions CI, since we have more than a few tests.

Fix CSV encoding

Right now, CSV encoding is broken (and disabled), because of issues with serializing the tags field. The tags field probably needs a custom serializer.

CLI needs `new` call

Right now, the structure of a kirum project is fairly complex, and can contain multiple directories, files, etc. Create a new CLI call that creates the basic directory structure, with the JSON pre-filled with a single word to operate as a template for users.

Daughter languages may reference etymons that don't exist

If the generate daughter command grabs a word that was part of a derivatives field in the original definition, the resulting etymon will not exist when the daughter language is next read in. This can probably be fixed by "waiting" to connect any words where the etymon isn't found at first.
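
The "waiting" idea can be sketched as a two-pass link: register every key first, then connect etymons once all words are known. Entry and link_etymons are hypothetical names for illustration and do not reflect how Kirum builds its graph.

```rust
use std::collections::HashSet;

/// Hypothetical minimal word record: a key and an optional etymon key.
struct Entry {
    key: String,
    etymon: Option<String>,
}

/// Returns (etymon -> derivative edges, keys whose etymon was never defined).
/// Because all keys are collected before any edge is made, an etymon that
/// appears later in the input than its derivative still resolves.
fn link_etymons(entries: &[Entry]) -> (Vec<(String, String)>, Vec<String>) {
    // Pass 1: collect every key that exists in the project.
    let known: HashSet<&str> = entries.iter().map(|e| e.key.as_str()).collect();

    // Pass 2: connect each word to its etymon, or record it as unresolved.
    let mut edges = Vec::new();
    let mut unresolved = Vec::new();
    for entry in entries {
        if let Some(parent) = &entry.etymon {
            if known.contains(parent.as_str()) {
                edges.push((parent.clone(), entry.key.clone()));
            } else {
                unresolved.push(entry.key.clone());
            }
        }
    }
    (edges, unresolved)
}
```

Anything left in the unresolved list after the second pass points at an etymon that is missing from the project, which is the case this issue describes.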

Word & Language generation from scratch.

Using this issue to track word and language generation from scratch, which Kirum currently can't do. Most other conlang tools, like vulgarlang, rely on a combination of consonant/vowel lists, and templates that determine the structure of syllables in a word. We may want something similar.
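
To make the consonant/vowel-list approach concrete, here is a small sketch that expands a syllable template such as CVC into every matching combination. It is an assumption about how such a generator could work, not a description of any existing tool; expand_template is a hypothetical name, and a real generator would likely sample randomly rather than enumerate.

```rust
/// Expands a template made of `C` (consonant) and `V` (vowel) slots into
/// every combination drawn from the supplied lists. Characters other than
/// `C` and `V` are ignored in this sketch.
fn expand_template(template: &str, consonants: &[&str], vowels: &[&str]) -> Vec<String> {
    let mut words = vec![String::new()];
    for slot in template.chars() {
        let choices: &[&str] = match slot {
            'C' => consonants,
            'V' => vowels,
            _ => continue,
        };
        // Extend every partial word with every choice for this slot.
        let mut next = Vec::with_capacity(words.len() * choices.len());
        for prefix in &words {
            for choice in choices {
                next.push(format!("{}{}", prefix, choice));
            }
        }
        words = next;
    }
    words
}
```

For the template CV with consonants k and r and vowels i and u, this yields ki, ku, ri, and ru, in that order.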

IPA logic

It would be cool, along with #2, to add support for IPA in transforms, word generation, and conditionals. In transforms, it would work something like this:

Consonant::from("s").make_voiced()
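
A minimal version of that call could look like the sketch below. The Consonant type and make_voiced method are taken from the line above but do not exist in Kirum yet, so everything here is an assumption; the table covers only a handful of common voiceless/voiced pairs.

```rust
/// Hypothetical IPA consonant wrapper, as proposed in this issue.
#[derive(Debug, Clone, PartialEq)]
struct Consonant(String);

impl From<&str> for Consonant {
    fn from(symbol: &str) -> Self {
        Consonant(symbol.to_string())
    }
}

impl Consonant {
    /// Maps a voiceless consonant to its voiced counterpart.
    /// Symbols without an entry in the table are returned unchanged.
    fn make_voiced(self) -> Consonant {
        let voiced = match self.0.as_str() {
            "p" => "b",
            "t" => "d",
            "k" => "g",
            "f" => "v",
            "s" => "z",
            "ʃ" => "ʒ",
            "θ" => "ð",
            other => other,
        };
        Consonant(voiced.to_string())
    }
}
```

A fuller design would probably store place, manner, and voicing as separate features instead of a lookup table, so that other operations could be derived the same way.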
