Enochian

This project provides tools for exploratory phonological comparisons between texts in unknown languages and entries in one or more lexicons.

You may see the results of a recent test run of the software for the Voynich Manuscript here.

Introduction

The initial goal is to investigate whether a particular theory of a possible phonological interpretation of the script in the Voynich manuscript can be used to find possible lexical matches in various machine-readable lexicons.

Stephen Bax in 2014 proposed some phonological values for various Voynich characters, based on identifications of plant and star names in some of the illustrated pages. Derek Vogt has elaborated on this work and proposed a more extensive phonological scheme. In addition, he has analyzed the phonological inventory of the scheme and proposed that the language of the Voynich manuscript is based on some variety of Romani.

At present, the Enochian software tool can take arbitrary lines from the Reed-Landini-Stolfi Interlinear transcription of the Voynich manuscript, encode each word as a sequence of vectors in phonological feature space, and then search the RomLex lexicon of Romani and the Shabda-Sagara Sanskrit dictionary, using dynamic time warping to look for the closest phonological sequence matches.

You can see a sample of this kind of flow in the voynich.json flow configuration. This flow reads the RomLex lexicon and the specified lines of the Voynich transcription and produces an HTML file containing a report on the possible phonological matches.
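The dynamic time warping comparison mentioned above can be sketched roughly as follows. This is a minimal Python illustration of the textbook algorithm, not the Enochian implementation: the function names, the Euclidean per-segment distance, and the three-feature toy vectors are all assumptions made for the example.

```python
import math

def segment_distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b):
    """Textbook dynamic time warping cost between two sequences of vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = segment_distance(seq_a[i - 1], seq_b[j - 1])
            # A segment may be matched, or either sequence may "stretch"
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical three-feature segments, for illustration only
p = (-1, -1, 1)
a = (1, 0, 0)
t = (-1, 1, -1)

word = [p, a, t]
stretched = [p, a, a, t]   # same segments, one repeated
different = [t, a, p]

print(dtw_distance(word, stretched))
print(dtw_distance(word, different))
```

Because warping lets one segment align with several, the stretched sequence costs nothing extra, while the reordered one does. That tolerance for sequences of different lengths is what makes the technique attractive for noisy or uncertain transcriptions.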

Status

Current results are inconclusive. Possible matches for words meaning "sun", "moon", "house", and "sky" appear on the first page of the Voynich manuscript. These are suggestive of astrological content, but much more work needs to be done.

You may see the results of a recent test run of the software for the Voynich Manuscript here.

Roadmap

The RomLex lexicon has fewer than 30,000 entries, many of which are duplicates, due to the lexicon containing data from multiple Romani dialects. This means it does not provide very conclusive results on its own.

The Shabda-Sagara dictionary also has fewer than 30,000 entries.

General Functionality

At the most general level, the Enochian library provides a system for configuring and running "flows" of arbitrary data transformations. This is implemented by the Flow class, which contains a FlowContainer holding any number of FlowStep objects (which can themselves be containers).

When you iterate over the enumerable returned by FlowStep.GetOutputs(), each step takes an output from its previous sibling, calls its own Process() method on it, and returns the result. If you implement only FlowStep.Process(), or if you implement FlowStep.GetOutputs() using yield return, the flow is evaluated lazily: it processes only as many items as are needed to return one output.
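The pull-based behavior described above can be illustrated with Python generators, which play the same role as yield return. This is only an analogy under stated assumptions: the Step class, run_flow function, and the log list are hypothetical names invented for the sketch and do not mirror the actual Flow, FlowContainer, or FlowStep classes.

```python
from typing import Callable, Iterable, Iterator, List

class Step:
    """Hypothetical stand-in for a flow step that transforms one item at a time."""

    def __init__(self, name: str, func: Callable[[str], str], log: List[str]) -> None:
        self.name = name
        self.func = func
        self.log = log

    def process(self, item: str) -> str:
        # Record the call so the order of evaluation is visible
        self.log.append(f"{self.name}:{item}")
        return self.func(item)

    def get_outputs(self, inputs: Iterable[str]) -> Iterator[str]:
        # Generator: nothing runs until a consumer asks for the next output
        for item in inputs:
            yield self.process(item)

def run_flow(steps: List[Step], source: Iterable[str]) -> Iterator[str]:
    """Chain the steps so each one pulls from its previous sibling."""
    stream: Iterator[str] = iter(source)
    for step in steps:
        stream = step.get_outputs(stream)
    return stream

log: List[str] = []
steps = [Step("upper", str.upper, log), Step("exclaim", lambda s: s + "!", log)]
outputs = run_flow(steps, ["a", "b", "c"])

first = next(outputs)
# At this point only "a" has passed through both steps;
# "b" and "c" have not been touched yet.
```

Requesting a single output drives exactly one item through the whole chain, so a long transcription or lexicon never has to be fully materialized between steps.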

Linguistic Resources

To support phonological analysis, the Enochian library provides a way to specify a phonological feature set (see features.json for an example using a fairly standard set of phonological features). The FeatureSet class is used to load and use these feature sets.

You can also define text "encodings". These take input strings in Unicode and produce sequences of vectors in the multi-dimensional space defined by the phonological feature set. A single phonological segment consists of an N-dimensional vector, where N is the number of features in your feature set. If a particular feature has a + value for that segment, its corresponding vector element will be 1; if it has a - value, its vector element will be -1. If the feature is unspecified, its vector element will be 0.
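As a small illustration of the mapping just described, the sketch below turns a string into a list of feature vectors. The three feature names, the four-segment table, and the function names are toy assumptions for the example. They are not taken from features.json or from the FeatureSet class.

```python
# Hypothetical feature order; a real feature set would have many more features
FEATURES = ["syllabic", "voice", "nasal"]

# Toy segment table: "+" and "-" values per feature, missing keys are unspecified
SEGMENTS = {
    "a": {"syllabic": "+", "voice": "+", "nasal": "-"},
    "m": {"syllabic": "-", "voice": "+", "nasal": "+"},
    "t": {"syllabic": "-", "voice": "-", "nasal": "-"},
    "h": {"syllabic": "-", "voice": "-"},  # nasal deliberately left unspecified
}

VALUE = {"+": 1, "-": -1}

def encode_segment(symbol):
    """Map one segment to a vector: + is 1, - is -1, unspecified is 0."""
    spec = SEGMENTS[symbol]
    return tuple(VALUE.get(spec.get(feature), 0) for feature in FEATURES)

def encode(text):
    """Encode a string as a sequence of feature vectors, one per character."""
    return [encode_segment(ch) for ch in text]

print(encode("mat"))        # [(-1, 1, 1), (1, 1, -1), (-1, -1, -1)]
print(encode_segment("h"))  # (-1, -1, 0)
```

Using 0 for unspecified features places such a segment halfway between the + and - values on that dimension, so it is equally distant from both when vectors are compared.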

Lexicons

The system includes several lexicons:

CMU Pronouncing Dictionary

This is used for testing the underlying assumption behind the project: that we can find slightly dissimilar phonological sequences in a lexicon by means of dynamic time warping. The english_test.json file contains a sample flow that compares a defective encoding of English text with the CMU dictionary to produce matches for English words. Running this flow demonstrates that the process is capable of finding many such valid matches.
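The idea behind this test can be sketched on a toy scale: rank lexicon entries by their warping distance to an incomplete query. Everything here is an assumption made for illustration, including the four-feature letter vectors, the three-word lexicon standing in for the CMU dictionary, and the choice to simulate a "defective" encoding by dropping the vowel. The actual english_test.json flow may work quite differently.

```python
import math
from functools import lru_cache

# Hypothetical vectors over (consonantal, nasal, voice, labial)
VECTORS = {
    "m": (1, 1, 1, 1),
    "n": (1, 1, 1, -1),
    "t": (1, -1, -1, -1),
    "a": (-1, -1, 1, 0),
    "o": (-1, -1, 1, 1),
}

def encode(word):
    return tuple(VECTORS[ch] for ch in word)

def dtw(a, b):
    """Memoized recursive form of the standard DTW recurrence."""
    @lru_cache(maxsize=None)
    def cost(i, j):
        if i == 0 and j == 0:
            return 0.0
        if i == 0 or j == 0:
            return math.inf
        step = math.dist(a[i - 1], b[j - 1])  # Euclidean distance (Python 3.8+)
        return step + min(cost(i - 1, j), cost(i, j - 1), cost(i - 1, j - 1))
    return cost(len(a), len(b))

def best_matches(query, lexicon):
    """Return (word, distance) pairs sorted from closest to farthest."""
    encoded_query = encode(query)
    scored = [(word, dtw(encoded_query, encode(word))) for word in lexicon]
    return sorted(scored, key=lambda pair: pair[1])

LEXICON = ["tot", "moon", "man"]   # toy stand-in for a real lexicon
ranking = best_matches("mn", LEXICON)  # "mn": a query with its vowel missing
for word, distance in ranking:
    print(f"{word}\t{distance:.2f}")
```

In this toy setup the closest entry to the vowelless query is "man", followed by "moon" and then "tot", which shows how a degraded sequence can still be ranked near plausible source words.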

RomLex

This is a dictionary of words in various Romani dialects. The database is only available via the web, so the RomlexScraper project scrapes the web interface to assemble a complete version of the lexicon.

Shabda-Sagara

This is a 19th-century dictionary of classical Sanskrit.
