Git Product home page Git Product logo

rrpl's Introduction

Recursive Radical Packing Language

Recursive Radical Packing Language (RRPL) is a proposal for a method of describing arbitrary Chinese characters concisely while retaining their structural information. Potential fields for usage include font design and machine learning. In RRPL, each Chinese character is described as a short string of numbers, symbols, and references to other characters. Its syntax is inspired by markup languages such as LaTeX, as well as the traditional "米" grids used for calligraphy practice.

5000+ Traditional Chinese Characters and radicals are currently described using this lanugae. You can download a .json file containing all of them (and unicode mapping) here: dist/min-trad.json

Syntax

Each Chinese character is described as a combination of components. These components can be other characters or radicals, as well as building blocks, which defines the simplest shapes that make up every component. Combination can be applied recursively to describe ever more complex glyphs.

Below is an overview of this syntax; You can also check out the Interactive Demo to play with it yourself.

Building Blocks

A building block is a string of the alphabet {0, 1, 2, 3, 4, 5, 6, 7, 8}, in which the presence of a number indicates a corresponding stroke to be drawn on a "米" grid:

 1 2 3
  \|/
8 -+- 4
  /|\
 7 6 5

0 indicates that no stroke should be drawn in this block.

Example:

ResultCodeResultCode
48 24578

Packing

Building blocks can be packed horizontally or vertically using the - and | symbols respectively to compose more complex glyphs. These symbols can be chained to pack more than two symbols with equal room.

Example:

ResultCodeResultCode
27-26-26 2468|24578

Grouping

( and ) symbols can be used to group components together so mixed horizontal and vertical packing can happen in the correct order.

Example:

ResultCodeResultCode
(48|37)-(25678|27)-(37|15) (46-68)|(246-268)|(24-28)

Referencing

Other characters and radicals can be referenced directly to build a new character. The parser will dump the contents of the reference glyph directly into the string, similar to C/C++ #include feature. This makes it especially easy to describe the more complicated Chinese characters, as most of them consist of radicals.

Example:

ResultCodeResultCode
廿|468|由|(八) ((車|(山))-(殳))|(手)
((口)-(口))|(甲)|十 (((木)-(缶)-(木))|(冖))|((鬯)-(彡))

Parser

An baseline parser is included in rrpl_parser.js, which powers this Interactive Demo. It can be used with browser-side JavaScript as well as Node.js:

//require the module: (or in html, <script src="./rrpl_parser.js"></script>)
var parser = require('./rrpl_parser.js');

//obtain an abstract syntax tree
var ast = parser.parse("(48|37)-(25678|27)-(37|15)");

//returns line segments (normalized 0.0-1.0) that can be used to render the character
var lines = parser.toLines(parser.toRects(ast));

File Type

RRPL data can be stored in a JSON file, whith the root object mapping unicode characters to their respective description, e.g.

{
  "一":"48",
  "丁":"468|26|27",
  "上":"246|248",
  "不":"(48-45678-48)|(3-26-1)",
  "丕":"不|一",
  "中":"(46-2468-68)|(24-2468-28)",
  "串":"中|中"
}

The references in these files are usually first expanded before rendering is attempted. This can be done in two ways. The first is using parser.preprocess(json_object) in rrpl_parser.js, while the second is using compile.js. More documentation can be found in the header comments of these files.

The JSON files can be further compressed into (and uncompressed from) a binary file around half of the size of the original using compress.js, by using a half byte to encode each symbol in the RRPL alphabet.

Downloads

  • dist/min-trad.json contains RRPL description of 5000+ traditional Chinese characters stored in JSON format.
  • dist/RRPL.ttf contains a True Type Font (ttf) containing 5000+ traditional Chinese characters with glyphs generated by the default parser. Below is a screenshot of the font in macOS TextEdit.app:

Tools

Rendering

  • Generate a preview.html web page containing a rendering of all characters in a RRPL json file:
$node render.js preview path/to/input.json
  • Generate a realtime.html web page where user inputs can be parsed and rendered interactively: (Characters defined in the input file will be available for referencing)
$node render.js realtime path/to/input.json

Exporting

  • Export a folder of SVG (Scalable Vector Graphics) rendering of each character in a RRPL json file:
$node export_glyphs.js path/to/input.json path/to/output/folder 0

Contrary to what render.js generates, these SVG's contains "outlines" of the glyphs instead of simple strokes. More settings such as thickness can be tweaked in the source code of export_glyphs.js; Command-line API will come later.

  • To generate a TTF (True Type) font from the aforementioned SVG's, FontForge's python library can be used for this purpose. (pip install fontforge) An example can be found in tools/forge_font.py.

Applications

Since RRPL reduces all Chinese characters to a short string of numbers, their structure can be learned by sequential models such as Markov chains, RNN's and LSTM's without much difficulty. I've applied RNN (Recurrent Neural Networks) to the language to hallucinate non-existent Chinese characters. Below are some characters generated by training overnight on ~1000 RRPL character descriptions, with the visuals rendered using a pix2pix model. A separate repo for that project will be created soon.

Contributing

rrpl.json contains the latest, work-in-progress version. There're some 5,000 characters in there, but there're over 50,000 Chinese characters in existence! So help is very much appreciated. If you'd like to help with this project, please append new characters to the file and submit a pull request. For more info, contact me by sending an email to lingdonh[at]andrew[dot]cmu[dot]edu.

Below is a rendering of all 5000+ Chinese characters denoted using RRPL so far. Click on the image to enlarge.

rrpl's People

Contributors

lingdong- avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.