fearful-symmetry / kirum

An "etymology-first" conlang tool that uses a directed graph to manage and generate languages.

License: GNU Affero General Public License v3.0

Rust 100.00%

kirum's Introduction

Kirum

Kirum (from Standard Babylonian Kirûm meaning garden or orchard) is a conlang CLI utility and library. Unlike many conlang tools, which allow you to generate lexicons based on phonetic rules, Kirum generates entire languages and language families based on specified etymology. Kirum started as a way to enable easy iteration on a conlang; instead of a blind find-and-replace operation or carefully scrolling through documentation, phonology and conjugation rules can be changed in a single place and then "trickled down" via etymology rules. The bureaucracy example demonstrates this by generating the modern English word bureaucracy using only the Latin word burra and the Greek root kratia.

Kirum is a work in progress, and should be considered alpha software.

Installing

If you haven't already, install Rust and Git. Once you've cloned the repo, install Kirum with:

cd kirum/
cargo install --path=./kirum

Getting Started

To create your first project, simply run kirum new [NAME]:

$ kirum new my_first_project
[2023-05-27T19:57:10Z INFO  kirum] created new project my_first_project

This will create a basic project file under a my_first_project directory. From there on, you can render your project to a lexicon:

$ kirum render -d my_first_project/ line
    essemple (Old French) model, example
    exemplum (Latin): (Noun) an instance, model, example
    emere (Latin): (Verb) To buy, remove

Examples

The examples directory has a number of projects:

bureaucracy - A basic example that demonstrates how to use etymology graphs to make changes to the history of a word.
generate_daugher - An example of how to use the generate subcommand to create a daughter language from a parent language.
templates - Using a handlebars template to output an asciidoc dictionary.
conditionals - Using conditional statements in transforms.
phonetic_rules - Using Kirum's phonetic rulesets to generate words.
ingest_from_json - Ingesting words into a language project from a JSON or newline-delimited text file.
rhai - Using the Rhai scripting language to transform words as part of an etymological history.

The structure of a Kirum project

Kirum generates languages from a number of files, contained in separate tree and etymology directories. Tree files contain a lexicon of words, stems, roots, etc., while etymology files contain data on the transforms between words. These transform files can also contain conditional statements that determine whether a transform should be applied to a word. An optional phonetics directory also allows for generating words from phonetic, as opposed to etymological, rules.

Lexis objects

A Tree file is a JSON object of Lexis objects, a maximal example of which is presented below:

    "latin_example": {
      "type": "word", // A user-supplied tag. Can be any value.
      "word": "exemplum", // The actual lexical word. If not supplied, kirum will attempt to derive it based on etymology
      "language": "Latin", // Can be any user-supplied value
      "generate": "word_rules", // An optional tag that will generate the word from phonetic rules, see examples/phonetic_rules
      "definition": "an instance, model, example",
      "part_of_speech": "noun", // Optional. Must be one of noun, verb, or adjective.
      "etymology": {
        "etymons": [
          {
            "etymon": "latin_verb", // The key name of another lexis in the Kirum project
            "transforms": [
              "latin-from-verb" // the key name of a transform
            ]
          }
        ]
      },
      "archaic": true, // Optional. Used only for sorting and filtering.
      "historical_metadata": {"metadata_value":"value"}, // Optional historical metadata. Unlike tags, historical metadata is inherited from any etymons. Can also be used for sorting and templates.
      "tags": [ // optional, user-supplied tags.
        "example",
        "default"
      ],
      "derivatives": [ // The optional derivatives field works as syntactic sugar, allowing users to specify derivative words within the object of the etymon, as opposed to as a separate JSON object.
        {
          "lexis": { // Identical to the `lexis` structure of the parent lexis.
            "language": "Old French",
            "definition": "model, example",
            "part_of_speech": "noun",
            "archaic": true
          },
          "transforms": [
            "of-from-latin"
          ]
        }
      ]
    },
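The comment on the `word` field above says Kirum will try to derive a missing word from its etymology. The Rust sketch below illustrates that idea with a simplified model. The `Lexis` struct, the `resolve` function, and the `um` to `e` replacement are all hypothetical and are not taken from Kirum's source; real transforms are richer than plain string replacement.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified lexis: only the fields needed to show derivation.
struct Lexis {
    word: Option<String>,
    /// Key of the etymon, plus (old, new) replacements to apply to its word.
    etymon: Option<(String, Vec<(String, String)>)>,
}

/// Resolve the word for `key`, deriving it from its etymon when no word is given.
/// Returns None if the key is unknown or the chain ends without a word.
/// Note: this sketch does not guard against cycles in the graph.
fn resolve(tree: &HashMap<String, Lexis>, key: &str) -> Option<String> {
    let lexis = tree.get(key)?;
    if let Some(word) = &lexis.word {
        return Some(word.clone());
    }
    let (parent_key, rules) = lexis.etymon.as_ref()?;
    let parent_word = resolve(tree, parent_key)?;
    Some(rules.iter().fold(parent_word, |acc, (old, new)| {
        acc.replace(old.as_str(), new.as_str())
    }))
}

/// Two-entry tree: a Latin word and a derivative with no `word` of its own.
fn example_tree() -> HashMap<String, Lexis> {
    let mut tree = HashMap::new();
    tree.insert(
        "latin_example".to_string(),
        Lexis { word: Some("exemplum".to_string()), etymon: None },
    );
    tree.insert(
        "derived_example".to_string(),
        Lexis {
            word: None,
            // Invented rule for illustration only: replace "um" with "e".
            etymon: Some((
                "latin_example".to_string(),
                vec![("um".to_string(), "e".to_string())],
            )),
        },
    );
    tree
}
```

Calling `resolve(&example_tree(), "derived_example")` walks back to `latin_example` and applies the replacement, so changing the Latin word changes every word derived from it.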

Transform objects

A transform object specifies the relationship between words. Transform files are a JSON object of Transform objects, an example of which is below:

        "vowel-o-change":{
            "transforms":[ // a list of individual transform functions. See below for available transforms
                {
                    "letter_replace":{
                        "letter": {"old": "e", "new":"ai"},
                        "replace": "all"
                    }
                }
            ],
            "conditional":{// Optional. The transform will only be applied if the conditional evaluates to true
                "pos": { // will match against the `part_of_speech` field of the Lexis object
                    "match":{
                        "equals": "noun" // The `part_of_speech` field must be equal to `noun`. 
                    }
                }
            }
        }

A complete list of available transform types can be found in the transforms.rs file.
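As a rough illustration of how a transform like `vowel-o-change` could behave, the following Rust sketch applies a `letter_replace` with `"replace": "all"` only when the part-of-speech conditional matches. The names `LetterReplace`, `Transform`, `apply`, and `vowel_o_change` are hypothetical; they are not the types defined in transforms.rs.

```rust
/// Hypothetical stand-in for a `letter_replace` transform with `"replace": "all"`.
struct LetterReplace {
    old: String,
    new: String,
}

/// Hypothetical transform: a list of replacements plus an optional
/// part-of-speech conditional (mirrors `"pos": {"match": {"equals": ...}}`).
struct Transform {
    replacements: Vec<LetterReplace>,
    pos_equals: Option<String>,
}

impl Transform {
    /// Returns the transformed word, or the word unchanged when the conditional fails.
    fn apply(&self, word: &str, part_of_speech: &str) -> String {
        if let Some(required) = &self.pos_equals {
            if required.as_str() != part_of_speech {
                return word.to_string();
            }
        }
        self.replacements.iter().fold(word.to_string(), |acc, r| {
            acc.replace(r.old.as_str(), r.new.as_str())
        })
    }
}

/// Builds the equivalent of the `vowel-o-change` JSON object shown above.
fn vowel_o_change() -> Transform {
    Transform {
        replacements: vec![LetterReplace {
            old: "e".to_string(),
            new: "ai".to_string(),
        }],
        pos_equals: Some("noun".to_string()),
    }
}
```

With this sketch, `vowel_o_change().apply("exemplum", "noun")` replaces every `e`, while a verb such as `emere` is returned unchanged because the conditional does not match.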

kirum's Issues

Apply transforms based on language relationships

For example, if an etymon has language Old Lang and the derivative has language Middle Lang, the Kirum compute methods should look for a transform rule that matches those two languages, and apply it automatically.
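One possible shape for this, sketched in Rust under the assumption that rules are keyed by an (etymon language, derivative language) pair. `LanguageRules` and its methods are hypothetical names, not part of Kirum.

```rust
use std::collections::HashMap;

/// Hypothetical registry mapping (etymon language, derivative language)
/// to the key name of a transform.
struct LanguageRules {
    rules: HashMap<(String, String), String>,
}

impl LanguageRules {
    fn new() -> Self {
        LanguageRules { rules: HashMap::new() }
    }

    /// Register a transform to apply when a word passes from `from` to `to`.
    fn add(&mut self, from: &str, to: &str, transform: &str) {
        self.rules
            .insert((from.to_string(), to.to_string()), transform.to_string());
    }

    /// Returns the transform key to apply automatically, if one is registered.
    /// The lookup is directional: (A, B) does not imply (B, A).
    fn lookup(&self, from: &str, to: &str) -> Option<&str> {
        self.rules
            .get(&(from.to_string(), to.to_string()))
            .map(|s| s.as_str())
    }
}
```

The compute step could call `lookup` with the two `language` values and fall back to the explicitly listed transforms when it returns `None`.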

Add rustdoc

We need a proper rustdoc page, particularly for documenting the structure of lexis and transform objects.

Add rhai script support in transforms

In addition to canned transformation types (letter_replace, etc), transforms should support calling user-supplied rhai scripts: https://github.com/rhaiscript/rhai

Allow for templating and variables in definition fields

The definition field in a lexis should be able to render template fields, so you can have something like this:

This word, which comes from proto-{langname}...
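A minimal sketch of the substitution step in Rust, assuming simple `{name}` placeholders and a caller-supplied variable map. `render_definition` is a hypothetical helper, and this naive replacement is not the handlebars engine used for template output.

```rust
use std::collections::HashMap;

/// Replace each `{name}` placeholder in `template` with its value from `vars`.
/// Placeholders with no matching variable are left untouched.
fn render_definition(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = template.to_string();
    for (name, value) in vars {
        // format!("{{{}}}", name) yields the literal text "{name}".
        let placeholder = format!("{{{}}}", name);
        out = out.replace(placeholder.as_str(), *value);
    }
    out
}
```

For example, with `langname` mapped to `Kirum`, the definition above would render as "This word, which comes from proto-Kirum...".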

CI should run examples, `new` output

The CI tests should try to render all of the examples, as well as the default output of new.

Create templated languages

Kirum should have some option to generate a mostly complete (as in 500 or so words) conlang at startup with a flag passed to new. This first language should be based on English word relationships, at least for the first pass. We can add other languages later.

Add doc section: conditionals

Add a section in the readme for conditional expressions.

customize `new` behavior

new should be a little more customizable, perhaps with a flag to make the default template language optional, etc.

Make cool spinner for background tasks

For certain tasks, like running render on a large lexicon, kirum can take a second or two to run. Make a cool CLI spinner/bar thing.

Add `stat` command

Add a stat command that provides metrics about the specified language

Validate keynames on ingest

Because Kirum can read multiple files, it's possible for a user to supply different words with the same keyname. Check for this on ingest.
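The check could look something like the Rust sketch below, which assumes the keys of each file have already been collected. `find_duplicate_keys` is a hypothetical helper, not an existing Kirum function.

```rust
use std::collections::HashMap;

/// Given (file name, keys in that file) pairs, return every key that appears
/// more than once, together with the files that define it.
fn find_duplicate_keys(files: &[(&str, Vec<&str>)]) -> Vec<(String, Vec<String>)> {
    let mut seen: HashMap<&str, Vec<String>> = HashMap::new();
    for (file, keys) in files {
        for key in keys {
            seen.entry(*key).or_default().push((*file).to_string());
        }
    }
    let mut duplicates: Vec<(String, Vec<String>)> = seen
        .into_iter()
        .filter(|(_, sources)| sources.len() > 1)
        .map(|(key, sources)| (key.to_string(), sources))
        .collect();
    // HashMap iteration order is unspecified, so sort for stable reporting.
    duplicates.sort();
    duplicates
}
```

An empty result means every keyname is unique; otherwise each entry names the offending key and the files it came from, which is enough for a useful error message.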

Add doc section: templating

Add a doc section on using template output and rhai scripts.

`transform` field should be optional.

Right now, a lexis etymology field is required to specify a transform, even if no actual transform is needed. Kirum probably needs some kind of built-in "loanword" transform that operates as a default if no transform is specified in an etymon object.

Redo `Word` type

Right now, Word is an enum, with the possible values of String or Vector. This has its limits, and certain operations, like regex transforms, are lossy, and require a string instead of a vector. Redo the word type, perhaps as a struct that contains a string field, and a metadata field that maps to individual characters.
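A rough Rust sketch of the struct idea, assuming one optional metadata label per character. The field and method names are hypothetical and only illustrate the proposed shape.

```rust
/// Hypothetical replacement for the `Word` enum: the text itself plus one
/// metadata slot per character.
#[derive(Debug, Clone, PartialEq)]
struct Word {
    text: String,
    /// One entry per `char` in `text`; None means no metadata for that character.
    char_meta: Vec<Option<String>>,
}

impl Word {
    fn new(text: &str) -> Self {
        Word {
            text: text.to_string(),
            char_meta: vec![None; text.chars().count()],
        }
    }

    /// Attach a label to the character at `index` (counted in chars, not bytes).
    /// Returns false when the index is out of range.
    fn annotate(&mut self, index: usize, label: &str) -> bool {
        match self.char_meta.get_mut(index) {
            Some(slot) => {
                *slot = Some(label.to_string());
                true
            }
            None => false,
        }
    }

    /// String-level operations, such as regex transforms, can read the text directly.
    fn as_str(&self) -> &str {
        &self.text
    }
}
```

One open design question this sketch does not answer is how a transform that changes the length of `text` should keep `char_meta` aligned.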

Add CI

Add GitHub CI actions, since we have more than a few tests.

Fix CSV encoding

Right now, CSV encoding is broken (and disabled), because of issues with serializing the tags field. The tags field probably needs a custom serializer.

CLI needs `new` call

Right now, the structure of a kirum project is fairly complex, and can contain multiple directories, files, etc. Create a new CLI call that creates the basic directory structure, with the JSON pre-filled with a single word to operate as a template for users.

daughter languages may reference etymons that don't exist

If the generate daughter command grabs a word that was part of a derivatives field in the original definition, the resulting etymon will not exist when the daughter language is next read in. This can probably be fixed by "waiting" to connect any words where the etymon isn't found at first.
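The "waiting" approach could be sketched in Rust as a two-pass link, assuming entries arrive as (word key, optional etymon key) pairs in read order. `link_deferred` is a hypothetical function and does not reflect how Kirum currently builds its graph.

```rust
use std::collections::HashSet;

/// Connect each word to its etymon. Edges whose etymon has not been read yet
/// are held back and retried once every word is known.
/// Returns (connected edges, edges whose etymon never appeared).
fn link_deferred(
    entries: &[(&str, Option<&str>)],
) -> (Vec<(String, String)>, Vec<(String, String)>) {
    let mut known: HashSet<&str> = HashSet::new();
    let mut connected = Vec::new();
    let mut pending: Vec<(&str, &str)> = Vec::new();

    // First pass: connect immediately when the etymon is already known.
    for (key, etymon) in entries {
        known.insert(*key);
        if let Some(parent) = etymon {
            if known.contains(*parent) {
                connected.push((key.to_string(), parent.to_string()));
            } else {
                pending.push((*key, *parent));
            }
        }
    }

    // Second pass: retry the deferred edges now that all words have been read.
    let mut missing = Vec::new();
    for (key, parent) in pending {
        if known.contains(parent) {
            connected.push((key.to_string(), parent.to_string()));
        } else {
            missing.push((key.to_string(), parent.to_string()));
        }
    }
    (connected, missing)
}
```

Anything left in the second list after the retry is a genuinely missing etymon and can be reported as an error.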

Generate functions should be able to write to an arbitrary number of files

Right now, commands like generate daughter dump everything to one file. There should be some kind of --lexis-per-word flag that can at least break out all the different lexis entries in a generation to different files.

Add Doc section: Getting Started

Depends on #9. Add a Getting Started section to the start of the main readme.

Word & Language generation from scratch.

Using this issue to track word and language generation from scratch, which Kirum currently can't do. Most other conlang tools, like vulgarlang, rely on a combination of consonant/vowel lists, and templates that determine the structure of syllables in a word. We may want something similar.
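To make the consonant/vowel-list idea concrete, here is a small Rust sketch that fills a syllable template such as `CV` from two inventories. Everything in it is an assumption for illustration: the `Lcg` generator is a tiny stand-in for a real random number crate, and `generate_word` is a hypothetical function, not Kirum's phonetic ruleset format.

```rust
/// Tiny linear congruential generator so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    /// Returns a pseudo-random index in 0..len. `len` must be non-zero.
    fn next_index(&mut self, len: usize) -> usize {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 33) as usize) % len
    }
}

/// Build a word by repeating a syllable template `syllables` times.
/// 'C' picks a consonant, 'V' picks a vowel, any other character is copied as-is.
/// Both inventories are assumed to be non-empty.
fn generate_word(
    template: &str,
    syllables: usize,
    consonants: &[char],
    vowels: &[char],
    rng: &mut Lcg,
) -> String {
    let mut word = String::new();
    for _ in 0..syllables {
        for slot in template.chars() {
            match slot {
                'C' => word.push(consonants[rng.next_index(consonants.len())]),
                'V' => word.push(vowels[rng.next_index(vowels.len())]),
                other => word.push(other),
            }
        }
    }
    word
}
```

A call such as `generate_word("CV", 3, &['k', 'r', 'm'], &['i', 'u'], &mut Lcg(42))` yields a six-letter word alternating consonants and vowels; the same seed always gives the same word.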

CLI shortcuts for adding words to dictionaries

There should be some kind of subcommand, such as kirum add lexis, for adding words to the existing JSON.

IPA logic

It would be cool, along with #2, to add support for IPA in transforms, word generation, and conditionals. In transforms, it would work something like this:

Consonant::from("s").make_voiced()
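A minimal Rust sketch of what could sit behind that call, assuming a small hard-coded table of voiceless/voiced IPA pairs. The `Consonant` type here is hypothetical and covers only a handful of obstruents; a real implementation would more likely model features such as place and manner.

```rust
/// Hypothetical consonant type wrapping a single IPA symbol.
#[derive(Debug, Clone, PartialEq)]
struct Consonant(String);

impl From<&str> for Consonant {
    fn from(symbol: &str) -> Self {
        Consonant(symbol.to_string())
    }
}

impl Consonant {
    /// Returns the voiced counterpart of a voiceless consonant.
    /// Symbols not in the table (including already voiced ones) are returned unchanged.
    fn make_voiced(&self) -> Consonant {
        let voiced = match self.0.as_str() {
            "p" => "b",
            "t" => "d",
            "k" => "g",
            "f" => "v",
            "s" => "z",
            "θ" => "ð",
            "ʃ" => "ʒ",
            other => other,
        };
        Consonant(voiced.to_string())
    }
}
```

With this table, `Consonant::from("s").make_voiced()` equals `Consonant::from("z")`, and a sonorant such as `m` passes through untouched.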