Enochian

This project provides tools for exploratory phonological comparisons between texts in unknown languages and entries in one or more lexicons.

You may see the results of a recent test run of the software for the Voynich Manuscript here.

Introduction

The initial goal is to investigate whether a particular proposed phonological interpretation of the script in the Voynich manuscript can be used to find possible lexical matches in various machine-readable lexicons.

Stephen Bax in 2014 proposed some phonological values for various Voynich characters, based on identifications of plant and star names in some of the illustrated pages. Derek Vogt has elaborated on this work and proposed a more extensive phonological scheme. In addition, he has analyzed the phonological inventory of the scheme and proposed that the language of the Voynich manuscript is based on some variety of Romani.

At present, the Enochian software tool can take arbitrary lines from the Reed-Landini-Stolfi Interlinear transcription of the Voynich manuscript, encode each word as a sequence of vectors in phonological feature space, and then search the RomLex lexicon of Romani and the Shabda-Sagara Sanskrit dictionary, using dynamic time warping to look for the closest phonological sequence matches.
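A minimal sketch of the matching step, assuming each segment has already been encoded as a numeric feature vector. The README does not specify which local distance or step pattern Enochian uses, so this uses the textbook dynamic time warping recurrence with Euclidean distance between segments. `dtw_distance`, `best_matches`, and the toy vectors are illustrative names, not the project's API.

```python
import math
from typing import Dict, List, Sequence, Tuple

Segment = Sequence[float]  # one phonological segment as a feature vector
Word = Sequence[Segment]   # a word as a sequence of segments


def dtw_distance(s: Word, t: Word) -> float:
    """Textbook dynamic time warping cost between two segment sequences."""
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of s[:i] with t[:j].
    cost = [[inf] * (len(t) + 1) for _ in range(len(s) + 1)]
    cost[0][0] = 0.0
    for i, u in enumerate(s, 1):
        for j, v in enumerate(t, 1):
            local = math.dist(u, v)  # Euclidean distance between segments
            cost[i][j] = local + min(
                cost[i - 1][j],      # s advances, t's segment is reused
                cost[i][j - 1],      # t advances, s's segment is reused
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[len(s)][len(t)]


def best_matches(word: Word, lexicon: Dict[str, Word], k: int = 3) -> List[Tuple[float, str]]:
    """Rank lexicon entries by DTW distance to word; ties break alphabetically."""
    scored = sorted((dtw_distance(word, enc), entry) for entry, enc in lexicon.items())
    return scored[:k]


# Toy 3-feature vectors [consonantal, voice, nasal]; values are illustrative.
P, B, M, A = (1, -1, -1), (1, 1, -1), (1, 1, 1), (-1, 1, -1)
LEXICON = {"pa": [P, A], "ba": [B, A], "ma": [M, A]}

print(best_matches([B, A, A], LEXICON, k=2))  # [(0.0, 'ba'), (2.0, 'ma')]
```

Because DTW lets one segment align with several, the lengthened query "baa" still matches "ba" at zero cost, which is what makes the method tolerant of slightly dissimilar sequences.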

You can see a sample of this kind of flow in the voynich.json flow configuration. This flow reads the RomLex lexicon and the specified lines of the Voynich transcription and produces an HTML file containing a report on the possible phonological matches.

Status

Current results are inconclusive. The first page of the Voynich manuscript yields possible matches for words meaning "sun", "moon", "house", and "sky", which may suggest astrological content, but much more work needs to be done.

Roadmap

The RomLex lexicon has fewer than 30,000 entries, many of which are duplicates because the lexicon contains data from multiple Romani dialects. This means it is not sufficient on its own to produce conclusive results.

The Shabda-Sagara dictionary also has fewer than 30,000 entries.

General Functionality

At the most general level, the Enochian library provides a system for configuring and running "flows" of arbitrary data transformations. This is implemented by the Flow class, which contains a FlowContainer that can hold any number of FlowStep objects (which can themselves be containers).

When you iterate over the enumerable returned by FlowStep.GetOutputs(), each step pulls an output from its previous sibling, calls its Process() method on it, and returns the result. If you implement only FlowStep.Process(), or if you implement FlowStep.GetOutputs() using yield return, the flow is evaluated lazily: it processes only as many items as are needed to return one output.
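The `yield return` mentioned above is C# syntax. As a rough analogue, here is a Python sketch using generators: a container wires each child to its previous sibling, and each step pulls an item only when its consumer asks for one. The class names mirror the README, but `get_outputs`, the `process` callable, and `SourceStep` are hypothetical illustrations, not Enochian's API.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class FlowStep:
    """Hypothetical step: pulls items from its previous sibling and processes them."""

    def __init__(self, process: Optional[Callable[[object], object]] = None):
        self.process = process or (lambda item: item)
        self.previous: Optional["FlowStep"] = None

    def get_outputs(self) -> Iterator[object]:
        if self.previous is None:
            return  # nothing upstream: yields no items
        for item in self.previous.get_outputs():
            yield self.process(item)  # one item is processed per request


class SourceStep(FlowStep):
    """Hypothetical first step that emits items from an iterable."""

    def __init__(self, items: Iterable[object]):
        super().__init__()
        self.items = items

    def get_outputs(self) -> Iterator[object]:
        yield from self.items


class FlowContainer(FlowStep):
    """Hypothetical container: chains its children and may itself be a child."""

    def __init__(self, steps: List[FlowStep]):
        super().__init__()
        self.steps = steps

    def get_outputs(self) -> Iterator[object]:
        upstream = self.previous
        for step in self.steps:
            step.previous = upstream  # each child reads from its previous sibling
            upstream = step
        if upstream is not None:
            yield from upstream.get_outputs()


processed: List[str] = []


def shout(word: str) -> str:
    processed.append(word)  # record which inputs were actually processed
    return word.upper()


flow = FlowContainer([SourceStep(["sun", "moon", "house", "sky"]), FlowStep(shout)])
outputs = flow.get_outputs()
print(next(outputs), processed)  # SUN ['sun']
```

After one `next()` call only "sun" has been processed; the remaining words are processed as the iterator is drained.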

Linguistic Resources

In order to do phonological analysis, the Enochian library provides a way to specify a phonological feature set (see features.json for an example using a fairly standard set of phonological features). The FeatureSet class is used to load and use these feature sets.

You can also define text "encodings". These take input strings in Unicode and produce sequences of vectors in the multi-dimensional space defined by the phonological feature set. A single phonological segment consists of an N-dimensional vector, where N is the number of features in your feature set. If a particular feature has a + value for that segment, its corresponding vector element will be 1; if it has a - value, its vector element will be -1. If the feature is unspecified, its vector element will be 0.
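A minimal sketch of the +/−/unspecified rule just described. The feature names and segment specifications are a hypothetical toy set, not the contents of features.json, and the encoder assumes one character per segment; a real encoding may need to handle multi-character graphemes.

```python
from typing import Dict, List, Sequence

# Hypothetical toy feature set; a real feature set has many more features.
FEATURES = ["consonantal", "voice", "nasal", "labial"]

# Hypothetical segment specifications; omitted features are unspecified.
SEGMENTS: Dict[str, Dict[str, str]] = {
    "p": {"consonantal": "+", "voice": "-", "nasal": "-", "labial": "+"},
    "b": {"consonantal": "+", "voice": "+", "nasal": "-", "labial": "+"},
    "m": {"consonantal": "+", "voice": "+", "nasal": "+", "labial": "+"},
    "a": {"consonantal": "-", "voice": "+"},
}

VALUES = {"+": 1, "-": -1}  # anything else, including a missing feature, maps to 0


def encode_segment(spec: Dict[str, str], features: Sequence[str] = FEATURES) -> List[int]:
    """Build one N-dimensional vector, where N = len(features)."""
    return [VALUES.get(spec.get(name), 0) for name in features]


def encode(text: str, segments: Dict[str, Dict[str, str]] = SEGMENTS) -> List[List[int]]:
    """Encode a string character by character (a simplifying assumption)."""
    return [encode_segment(segments[ch]) for ch in text]


print(encode("ma"))  # [[1, 1, 1, 1], [-1, 1, 0, 0]]
```

Characters missing from the segment table raise `KeyError` here; a fuller encoder would decide explicitly how to treat them.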

Lexicons

The system includes several lexicons:

CMU Pronouncing Dictionary

This is used for testing the project's underlying assumption: that dynamic time warping can find slightly dissimilar phonological sequences in a lexicon. The english_test.json file contains a sample flow that compares a defective encoding of English text with the CMU dictionary to produce matches for English words. Running this flow demonstrates that the process is capable of finding many such valid matches.
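One way to picture this kind of test, as a sketch under stated assumptions: take correctly encoded lexicon entries, blank out one feature to mimic a defective encoding, and count how often each word's own entry still ranks in the top k under DTW. The toy lexicon, the `degrade` and `top_k_hit_rate` helpers, and the Euclidean DTW are illustrative assumptions, not the contents of english_test.json.

```python
import math
from typing import Dict, List, Optional, Sequence

Word = List[Sequence[float]]


def dtw(s: Word, t: Word) -> float:
    """Textbook DTW with Euclidean distance between segments."""
    inf = float("inf")
    cost = [[inf] * (len(t) + 1) for _ in range(len(s) + 1)]
    cost[0][0] = 0.0
    for i, u in enumerate(s, 1):
        for j, v in enumerate(t, 1):
            cost[i][j] = math.dist(u, v) + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(s)][len(t)]


def degrade(word: Word, feature_index: Optional[int]) -> Word:
    """Mimic a defective encoding by marking one feature as unspecified (0)."""
    return [[0 if i == feature_index else x for i, x in enumerate(seg)] for seg in word]


def top_k_hit_rate(words: List[str], lexicon: Dict[str, Word],
                   feature_index: Optional[int], k: int = 1) -> float:
    """Fraction of words whose own entry ranks in the top k for its degraded form."""
    hits = 0
    for word in words:
        query = degrade(lexicon[word], feature_index)
        # Rank entries by distance, breaking ties alphabetically.
        ranked = sorted(lexicon, key=lambda entry: (dtw(query, lexicon[entry]), entry))
        if word in ranked[:k]:
            hits += 1
    return hits / len(words)


# Toy 3-feature vectors [consonantal, voice, nasal]; values are illustrative.
P, B, M, A = (1, -1, -1), (1, 1, -1), (1, 1, 1), (-1, 1, -1)
LEXICON: Dict[str, Word] = {"pa": [P, A], "ba": [B, A], "ma": [M, A]}
WORDS = list(LEXICON)

print(top_k_hit_rate(WORDS, LEXICON, feature_index=1, k=1))  # 0.6666666666666666
print(top_k_hit_rate(WORDS, LEXICON, feature_index=1, k=2))  # 1.0
```

With voicing removed, "pa" and "ba" become indistinguishable and tie (broken alphabetically here), so one of the three words misses at k=1, but every word is found at k=2.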

RomLex

This is a dictionary of words in various Romani dialects. The database is only available via the web, so the RomlexScraper project scrapes the web interface to assemble a complete version of the lexicon.

Shabda-Sagara

This is a 19th-century dictionary of classical Sanskrit.

Contributors

chalcolith