
glyphnameformatter's Introduction

GlyphNameFormatter


A system for generating glyph name lists from official Unicode data.

  • Unicode has long and descriptive names for each character.
  • Font editors need glyph names that make glyphs easy to identify: short, unique, and free of spaces and non-ASCII characters.
  • Raw Unicode names are unsuitable for this purpose in font editors.
  • Font editors use their own lists that map names to Unicode values.
  • So this system parses the Unicode data, generates names and tweaks them.

Glyph Name Formatted Unicode List Release 0.6 (ɣNUFL)

download

  • Release 0.1 offers (almost) the same coverage as the Adobe Glyph Dictionary, AGD.txt
  • Release 0.2 offers more ranges from Unicode 10.0.0.
  • Release 0.3 offers more ranges from Unicode 11.0.0.
  • Release 0.4 fixes a bug in the conflict analysis. Better names for all.
  • Release 0.5 Improved support for Georgian names.
  • Release 0.6 Adds Mathematical Alphanumeric Symbols, housekeeping

Contributions

This release is not meant to be final. Many ranges have basic coverage but could be improved. Some Unicode names are wrong and then get translated wrong. The list does not claim authority or completeness.

If you find things wrong and would like to share this insight, we're accepting comments: please open an issue. If you see how the system works, we will also gladly consider pull requests. If you would like to see certain ranges supported, let us know.

This version acknowledges the help by Frederik, Just, Adam, Daniel, Bahman and Ilya.

Using conversion functions

After all the processing is done, the lists can be used with a couple of convenient functions.

  • u2n(value) Unicode value to glyphname
  • n2u(name) Glyphname to Unicode value
  • u2c(value) Unicode value to Unicode category
  • n2c(name) Glyphname to Unicode category
  • u2r(value) Unicode value to range name
  • n2N(name) name to uppercase
  • N2n(name) name to lowercase
  • u2U(uni) unicode to uppercase unicode
  • U2u(uni) unicode to lowercase unicode
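
A minimal sketch of how the first two lookups behave. It uses a tiny in-memory list instead of the generated name lists, so the sample data, the dictionary names, and the choice to return None for unknown entries are assumptions of this sketch rather than the package's actual code.

```python
# Illustrative data only: (glyph name, Unicode value) pairs.
SAMPLE_PAIRS = [
    ("A", 0x0041),
    ("a", 0x0061),
    ("germandbls", 0x00DF),
]

# Build both lookup directions once.
_NAME_TO_UNI = {name: uni for name, uni in SAMPLE_PAIRS}
_UNI_TO_NAME = {uni: name for name, uni in SAMPLE_PAIRS}


def u2n(value):
    """Unicode value to glyph name, or None if the value is not listed."""
    return _UNI_TO_NAME.get(value)


def n2u(name):
    """Glyph name to Unicode value, or None if the name is not listed."""
    return _NAME_TO_UNI.get(name)


print(u2n(0x0041))
print(n2u("germandbls"))
```

The remaining functions follow the same pattern with a different table behind them.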

Naming guidelines

  • Glyph names should, as much as possible, only have script tags to disambiguate.
  • Detect when script specific prefix or suffix is necessary
  • Keep script prefix or suffix short
  • Some scripts already have a preference for pre- or suffix.
  • Some names look better with camelCase.

Generating lists

Run data/buildFlatUnicodeList.py to download the current (or previous) data from Unicode.org. This is a large file, so the script processes the data to a more practical size and stores it in data/flatUnicode.txt.

Run data/buildJoiningTypesList.py to download the current data from Unicode.org. This is stored in a separate file, data/joiningTypes.txt.

Run exporters/exportFlatLists.py to generate a text file with name and Unicode value pairs, limited to the ranges that have a range processor. The results are in names/glyphNamesToUnicode.txt

Run test/buildRanges.py to make all the name lists. They will be deposited in names/ranges. There will be other methods and other lists, but for now this is the place to make things. You can also run each of the range scripts in rangeProcessors/ and they will print a nice readable table with the processed name, unicode value, original name. There is also a column for names from the Adobe AGL if it has a different name for that entry.

Run exporters/analyseConflicts.py to get an overview of all name clashes and how they are addressed. The results are in a text file in data/conflict.txt
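
As a rough idea of what a conflict check involves, here is a minimal sketch that groups generated names and reports those used for more than one Unicode value. It is not the code in exporters/analyseConflicts.py; the function name and the sample pairs are illustrative.

```python
from collections import defaultdict


def find_conflicts(pairs):
    """Return {name: [unicode values]} for names used more than once."""
    by_name = defaultdict(list)
    for name, uni in pairs:
        by_name[name].append(uni)
    return {name: unis for name, unis in by_name.items() if len(unis) > 1}


# Illustrative pairs: one clashing name and one unique name.
pairs = [
    ("Enhookcyr", 0x04C7),  # CYRILLIC CAPITAL LETTER EN WITH HOOK
    ("Enhookcyr", 0x0522),  # CYRILLIC CAPITAL LETTER EN WITH MIDDLE HOOK
    ("Zecyr", 0x0417),
]
print(find_conflicts(pairs))
```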

Range Processors

Given the rather large task of handling thousands of exceptions and tweaks, the package has modules that each take care of a single Unicode range. This makes it easier to work in different places at once, and easier to test.

The GlyphName class is initialised with a single unicode number. It then finds the unicode name. Based on the range name it tries to find a module with a corresponding name in rangeProcessors/. If it finds such a module it will run the process() function and apply it. The process() function will try to transform the unicode name by editing or replacing parts of the name.

Each range processor has a handy debugging print function that shows an overview of the Unicode value, the generated name, a comparison with the AGD name and the Unicode name.
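
The lookup described above (Unicode number, then Unicode name, then range name, then a matching processor) can be sketched as follows. The range table, the PROCESSORS dictionary and the toy process_greek function are hypothetical stand-ins for the modules in rangeProcessors/, and the real GlyphName class does considerably more.

```python
import unicodedata

# Hypothetical range table: (start, end, range name).
RANGES = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
]


def process_greek(name):
    """Toy processor: strip the script and case words, lowercase the rest."""
    name = name.replace("GREEK ", "")
    name = name.replace("SMALL LETTER ", "")
    return name.lower()


# Stand-in for the modules in rangeProcessors/, keyed by range name.
PROCESSORS = {"Greek and Coptic": process_greek}


def range_name(uni):
    """Return the name of the range containing uni, or None."""
    for start, end, name in RANGES:
        if start <= uni <= end:
            return name
    return None


def glyph_name(uni):
    """Look up the Unicode name, then apply the processor for its range."""
    uni_name = unicodedata.name(chr(uni))
    process = PROCESSORS.get(range_name(uni))
    if process is None:
        return uni_name  # no processor found: the name stays unprocessed
    return process(uni_name)


print(glyph_name(0x03B1))
```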

On the internals

  • GlyphName.uniNumber integer, the unicode number we're working on.
  • GlyphName.uniName string, the original unicode character name
  • GlyphName.uniNameProcessed string, the edited name.
  • GlyphName.suffixParts list of name parts that are added at the end. Please use:
  • GlyphName.suffix(namePart) use this method to add name parts to the suffix list.
  • GlyphName.replace(oldPattern, [newPattern]) If no newPattern is given it will assume it is "" and delete oldPattern
  • GlyphName.edit(oldPattern, [*suffixes]) This is more elaborate: it will remove oldPattern from the name, and then append any number of suffix strings to GlyphName.suffixParts. When the processing is done all strings in suffixParts are appended to the end of the glyph name.
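
To make the interplay of replace(), edit() and suffixParts concrete, here is a toy class that mimics the attributes listed above. ToyGlyphName and its getName() method are hypothetical, and applying the suffixes only when oldPattern is actually found is an assumption of this sketch.

```python
class ToyGlyphName:
    """Hypothetical stand-in that mimics the attributes listed above."""

    def __init__(self, uniNumber, uniName):
        self.uniNumber = uniNumber
        self.uniName = uniName
        self.uniNameProcessed = uniName
        self.suffixParts = []

    def suffix(self, namePart):
        # Collect a name part that will be appended at the end.
        self.suffixParts.append(namePart)

    def replace(self, oldPattern, newPattern=""):
        # With no newPattern, oldPattern is simply deleted.
        self.uniNameProcessed = self.uniNameProcessed.replace(oldPattern, newPattern)

    def edit(self, oldPattern, *suffixes):
        # Remove oldPattern, then queue the suffix strings (assumed: only on a match).
        if oldPattern in self.uniNameProcessed:
            self.replace(oldPattern)
            for part in suffixes:
                self.suffix(part)

    def getName(self):
        # Final name: processed name without spaces, followed by all suffix parts.
        base = self.uniNameProcessed.replace(" ", "").lower()
        return base + "".join(self.suffixParts)


g = ToyGlyphName(0x00E1, "LATIN SMALL LETTER A WITH ACUTE")
g.replace("LATIN SMALL LETTER ")
g.edit("WITH ACUTE", "acute")
print(g.getName())
```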

glyphnameformatter's People

Contributors

arialcrime, benkiel, danielgrumer, letterror, moyogo, roberto-arista, twardoch, typemytype


glyphnameformatter's Issues

Arabic naming: shaddadammatan

FC5E shadda.init_dammatan.fina ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM
FC5E shaddadammatan ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM

As I said, feature tag names should not be used for diacritics. This is a composition of two diacritics, so calling it a ligature is absolutely wrong and will cause confusion. I’m sorry but I’ve no idea why they used the term ligature here. (There are many more examples like this.)

Dash, prefixes, suffixes, and a few other questions

Hi,
I wonder, could you perhaps clarify: Is there a particular reason (other than just a "decision" :) ) that you’ve decided to put the "script" components as prefixes (while Glyphs uses suffixes). I'm myself rather undecided on that. :)

A.

Great! :)

Hi Erik,

So you were doing something rather useful when publishing your sarcastic tweets about Unicode character names. Now I understand! :)

As it happens, I've been trying to do some work in that field recently as well. For a number of mathematical characters, there is a good source of sensible names which we could lift from the unicode.xml file in the https://github.com/w3c/xml-entities repo, which is an official W3C-maintained list of short names for character entities.

This is the human-readable version of that document:
https://www.w3.org/2003/entities/2007doc/bycodes.html
but the unicode.xml file also contains many names from other namespaces, especially from the TeX unicode-math documents which catalogs names as used by various LaTeX packages.

The XML entity list and the unicode-math list aren’t completely identical, but potentially one of those is a good source for standardized, well-thought-out shorter list for many characters (mostly symbols). Some characters are the same in both lists, for example U+2AAC (⪬) is called smte in both lists. But for example U+1D49C (𝒜) is called mscrA in unicode-math and Ascr in the XML Entity list.

Would you say that it might make sense to adopt at least some of these? They seem pretty systematic.

Publish the GNUFL list separately

For those who would like to use the GNUFL list this whole package is a substantial download. It would be useful to publish the list all by itself, perhaps with a simple reader. Maybe also consider additional output formats, json maybe?
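
A JSON export along these lines could be quite small. The sketch below assumes a flat text layout of one glyph name and one hexadecimal Unicode value per line, with # for comments; that layout and the function name are assumptions, not a description of the actual file.

```python
import json


def flat_list_to_json(text):
    """Convert 'name hexvalue' lines to a JSON object string."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, hex_value = line.split()[:2]
        mapping[name] = int(hex_value, 16)
    return json.dumps(mapping, indent=2, sort_keys=True)


sample = """
# glyph name, hex Unicode value (assumed layout)
cmb-macron 0304
Ksicyrcyr 046E
"""
print(flat_list_to_json(sample))
```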

Arabic naming: fathamedial

FE77 fatha.medi ARABIC FATHA MEDIAL FORM
FE77 fathamedial ARABIC FATHA MEDIAL FORM

These are deprecated forms of diacritics. I cannot imagine the medial forms of diacritics ever being used in the future, even for legacy purposes. Adding the .medi feature tag would cause confusion.

Better Sinhala names

For some reason, the Unicode Sinhala names don't follow the same convention as the rest of the Indic scripts, but interestingly the PDF chart has the Indic-like names. The dictionary below provides the correlation between the two sets of names.

# This data was obtained from the Sinhala PDF code chart
# It got minor changes to the sign names (not the vowel sign)
sinhalaNameEquivalentsDict = {
'SINHALA SIGN ANUSVARAYA':'sinhala sign anusvara',
'SINHALA SIGN VISARGAYA':'sinhala sign visarga',
'SINHALA LETTER AYANNA':'sinhala letter a',
'SINHALA LETTER AAYANNA':'sinhala letter aa',
'SINHALA LETTER AEYANNA':'sinhala letter ae',
'SINHALA LETTER AEEYANNA':'sinhala letter aae',
'SINHALA LETTER IYANNA':'sinhala letter i',
'SINHALA LETTER IIYANNA':'sinhala letter ii',
'SINHALA LETTER UYANNA':'sinhala letter u',
'SINHALA LETTER UUYANNA':'sinhala letter uu',
'SINHALA LETTER IRUYANNA':'sinhala letter vocalic r',
'SINHALA LETTER IRUUYANNA':'sinhala letter vocalic rr',
'SINHALA LETTER ILUYANNA':'sinhala letter vocalic l',
'SINHALA LETTER ILUUYANNA':'sinhala letter vocalic ll',
'SINHALA LETTER EYANNA':'sinhala letter e',
'SINHALA LETTER EEYANNA':'sinhala letter ee',
'SINHALA LETTER AIYANNA':'sinhala letter ai',
'SINHALA LETTER OYANNA':'sinhala letter o',
'SINHALA LETTER OOYANNA':'sinhala letter oo',
'SINHALA LETTER AUYANNA':'sinhala letter au',
'SINHALA LETTER ALPAPRAANA KAYANNA':'sinhala letter ka',
'SINHALA LETTER MAHAAPRAANA KAYANNA':'sinhala letter kha',
'SINHALA LETTER ALPAPRAANA GAYANNA':'sinhala letter ga',
'SINHALA LETTER MAHAAPRAANA GAYANNA':'sinhala letter gha',
'SINHALA LETTER KANTAJA NAASIKYAYA':'sinhala letter nga',
'SINHALA LETTER SANYAKA GAYANNA':'sinhala letter nnga',
'SINHALA LETTER ALPAPRAANA CAYANNA':'sinhala letter ca',
'SINHALA LETTER MAHAAPRAANA CAYANNA':'sinhala letter cha',
'SINHALA LETTER ALPAPRAANA JAYANNA':'sinhala letter ja',
'SINHALA LETTER MAHAAPRAANA JAYANNA':'sinhala letter jha',
'SINHALA LETTER TAALUJA NAASIKYAYA':'sinhala letter nya',
'SINHALA LETTER TAALUJA SANYOOGA NAAKSIKYAYA':'sinhala letter jnya',
'SINHALA LETTER SANYAKA JAYANNA':'sinhala letter nyja',
'SINHALA LETTER ALPAPRAANA TTAYANNA':'sinhala letter tta',
'SINHALA LETTER MAHAAPRAANA TTAYANNA':'sinhala letter ttha',
'SINHALA LETTER ALPAPRAANA DDAYANNA':'sinhala letter dda',
'SINHALA LETTER MAHAAPRAANA DDAYANNA':'sinhala letter ddha',
'SINHALA LETTER MUURDHAJA NAYANNA':'sinhala letter nna',
'SINHALA LETTER SANYAKA DDAYANNA':'sinhala letter nndda',
'SINHALA LETTER ALPAPRAANA TAYANNA':'sinhala letter ta',
'SINHALA LETTER MAHAAPRAANA TAYANNA':'sinhala letter tha',
'SINHALA LETTER ALPAPRAANA DAYANNA':'sinhala letter da',
'SINHALA LETTER MAHAAPRAANA DAYANNA':'sinhala letter dha',
'SINHALA LETTER DANTAJA NAYANNA':'sinhala letter na',
'SINHALA LETTER SANYAKA DAYANNA':'sinhala letter nda',
'SINHALA LETTER ALPAPRAANA PAYANNA':'sinhala letter pa',
'SINHALA LETTER MAHAAPRAANA PAYANNA':'sinhala letter pha',
'SINHALA LETTER ALPAPRAANA BAYANNA':'sinhala letter ba',
'SINHALA LETTER MAHAAPRAANA BAYANNA':'sinhala letter bha',
'SINHALA LETTER MAYANNA':'sinhala letter ma',
'SINHALA LETTER AMBA BAYANNA':'sinhala letter mba',
'SINHALA LETTER YAYANNA':'sinhala letter ya',
'SINHALA LETTER RAYANNA':'sinhala letter ra',
'SINHALA LETTER DANTAJA LAYANNA':'sinhala letter la',
'SINHALA LETTER VAYANNA':'sinhala letter va',
'SINHALA LETTER TAALUJA SAYANNA':'sinhala letter sha',
'SINHALA LETTER MUURDHAJA SAYANNA':'sinhala letter ssa',
'SINHALA LETTER DANTAJA SAYANNA':'sinhala letter sa',
'SINHALA LETTER HAYANNA':'sinhala letter ha',
'SINHALA LETTER MUURDHAJA LAYANNA':'sinhala letter lla',
'SINHALA LETTER FAYANNA':'sinhala letter fa',
'SINHALA SIGN AL-LAKUNA':'sinhala sign virama',
'SINHALA VOWEL SIGN AELA-PILLA':'sinhala vowel sign aa',
'SINHALA VOWEL SIGN KETTI AEDA-PILLA':'sinhala vowel sign ae',
'SINHALA VOWEL SIGN DIGA AEDA-PILLA':'sinhala vowel sign aae',
'SINHALA VOWEL SIGN KETTI IS-PILLA':'sinhala vowel sign i',
'SINHALA VOWEL SIGN DIGA IS-PILLA':'sinhala vowel sign ii',
'SINHALA VOWEL SIGN KETTI PAA-PILLA':'sinhala vowel sign u',
'SINHALA VOWEL SIGN DIGA PAA-PILLA':'sinhala vowel sign uu',
'SINHALA VOWEL SIGN GAETTA-PILLA':'sinhala vowel sign vocalic r',
'SINHALA VOWEL SIGN KOMBUVA':'sinhala vowel sign e',
'SINHALA VOWEL SIGN DIGA KOMBUVA':'sinhala vowel sign ee',
'SINHALA VOWEL SIGN KOMBU DEKA':'sinhala vowel sign ai',
'SINHALA VOWEL SIGN KOMBUVA HAA AELA-PILLA':'sinhala vowel sign o',
'SINHALA VOWEL SIGN KOMBUVA HAA DIGA AELA-PILLA':'sinhala vowel sign oo',
'SINHALA VOWEL SIGN KOMBUVA HAA GAYANUKITTA':'sinhala vowel sign au',
'SINHALA VOWEL SIGN GAYANUKITTA':'sinhala vowel sign vocalic l',
'SINHALA VOWEL SIGN DIGA GAETTA-PILLA':'sinhala vowel sign vocalic rr',
'SINHALA VOWEL SIGN DIGA GAYANUKITTA':'sinhala vowel sign vocalic ll',
}
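
One way such a table might be applied, sketched with three entries copied from the dictionary above. The helper name indicStyleName and the choice to return unmapped names unchanged are assumptions.

```python
# Three entries copied from the dictionary above; the full table works the same way.
sinhalaSubset = {
    'SINHALA SIGN ANUSVARAYA': 'sinhala sign anusvara',
    'SINHALA LETTER AYANNA': 'sinhala letter a',
    'SINHALA VOWEL SIGN KOMBU DEKA': 'sinhala vowel sign ai',
}


def indicStyleName(uniName, table=sinhalaSubset):
    """Return the Indic-style name in upper case, or the input if unmapped."""
    return table.get(uniName, uniName).upper()


print(indicStyleName('SINHALA VOWEL SIGN KOMBU DEKA'))
```

Returning the name in upper case lets the result pass through the same processing as any other Unicode name.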

Changes for Arabic names

Comments from Bahman:

These are my notes about the names. First line for every entry is your output, second line is my preferred versions and third line is my explanation.

0616 alef_lam_yehabove ARABIC SMALL HIGH LIGATURE ALEF WITH LAM WITH YEH
0616 alefLamYehabove ARABIC SMALL HIGH LIGATURE ALEF WITH LAM WITH YEH

this is a diacritic and not a ligature. So it’s precomposed and the components are not letters.

066C ar-thousands ARABIC THOUSANDS SEPARATOR
066C thousandSeperator ARABIC THOUSANDS SEPARATOR
The first one is vague!


066E behdotless ARABIC LETTER DOTLESS BEH
066E dotlessbeh ARABIC LETTER DOTLESS BEH
it sounds more clear this way.

066F qafdotless ARABIC LETTER DOTLESS QAF
066F dotlessqaf ARABIC LETTER DOTLESS QAF
it sounds more clear this way.


0697 rehtwoabove ARABIC LETTER REH WITH TWO DOTS ABOVE
0697 rehtwodotsabove ARABIC LETTER REH WITH TWO DOTS ABOVE

069B seenthreebelow ARABIC LETTER SEEN WITH THREE DOTS BELOW
069B seenthreedotsbelow ARABIC LETTER SEEN WITH THREE DOTS BELOW

069D sadtwobelow ARABIC LETTER SAD WITH TWO DOTS BELOW
069D sadtwodotsbelow ARABIC LETTER SAD WITH TWO DOTS BELOW
In the above examples the word dot is dropped; would it be good to keep it? Maybe a capital A for above and B for below would make them shorter? There are more similar examples.


06D4 ar-period ARABIC FULL STOP
06D4 fullstop ARABIC FULL STOP
this is not a generic arabic stop. It’s only used in Urdu. Calling it arabic period would be confusing.


06E1 khahdotlessabove ARABIC SMALL HIGH DOTLESS HEAD OF KHAH
06E1 dotlesskhahabove ARABIC SMALL HIGH DOTLESS HEAD OF KHAH


06E2 meemabove.isol ARABIC SMALL HIGH MEEM ISOLATED FORM
06E2 meemabove ARABIC SMALL HIGH MEEM ISOLATED FORM

FE74 kasratan.isol ARABIC KASRATAN ISOLATED FORM
FE74 kasratan ARABIC KASRATAN ISOLATED FORM

For diacritics it’s better to ignore the isol and related feature tags; in automated feature generation they could cause unexpected results. There are more examples like this, so treating them parametrically would be faster. If the diacritic's name contains an initial, medial or final term, adding the full term without using the feature tag would be a good idea. For isolated forms of diacritics you could drop the isolated term entirely, which makes the name cleaner.


FE77 fatha.medi ARABIC FATHA MEDIAL FORM
FE77 fathamedial ARABIC FATHA MEDIAL FORM

These are deprecated forms of diacritics. I cannot imagine the medial forms of diacritics ever being used in the future, even for legacy purposes. Adding the .medi feature tag would cause confusion.


FE83 alefhamza.isol ARABIC LETTER ALEF WITH HAMZA ABOVE ISOLATED FORM
FE83 alefhamzaabove.isol ARABIC LETTER ALEF WITH HAMZA ABOVE ISOLATED FORM

There are two letters that contain alef and hamza: in one the hamza is above, in the other it is below.


FBEA yeh.init_hamzaabove.medi_alef.fina ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ALEF ISOLATED FORM
FBEA yehhamzaabove.medi_alef.fina ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ALEF ISOLATED FORM

hamzaabove is a diacritic, so it’s part of the letter “U+0626”. I guess before making the ligature you could check whether any two components combined form a letter, and if that letter's name exists in the database, use the letter instead of treating them as two components. (There are many more examples like this in ligatures.)


FC5E shadda.init_dammatan.fina ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM
FC5E shaddadammatan ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM

As I said, feature tag names should not be used for diacritics. This is a composition of two diacritics, so calling it a ligature is absolutely wrong and will cause confusion. I’m sorry but I’ve no idea why they used the term ligature here. (There are many more examples like this.)

`legacy:` prefix for Arabic presentation form names?

The GNUFL names of the Arabic presentation form names cause confusion. For instance, a variant of the glyph seen with unicode 0633, seen.init should not have any unicode value. But GNUFL suggests FEB3.

So, if the presentation form unicodes (all of them? some of them?) are only for legacy support, then what about adding a legacy: prefix to these names for disambiguation. This should happen in the presentation form rangeprocessors (https://github.com/LettError/glyphNameFormatter/blob/master/Lib/glyphNameFormatter/rangeProcessors/arabic_presentation_forms_a.py)

Better unicodedata

Erik, Frederik,

the UCD (Unicode Character Database) contains a number of useful properties that we could use to improve the autonaming scheme. For example, there is the Math property which could be used to generate a math: prefix consistently. Also, for building certain glyph names, using the canonical decomposition might turn out to be easier than using the charname replacements.

  • unicodedata2: Fast C++ Python 2.7-compatible unicodedata replacement with Unicode 8.0 support, expected to be updated to 9.0 when it’s officially released (Apache2-licensed)
  • unicode_data: pure-Python “bleeding-edge” unicodedata replacement (Apache2-licensed), which exposes additional properties via parsing more UCD files.
    (Those two above could be combined).

There are also those two tools I found:

  • unipropgen: Unicode property table generator (MIT-licensed)
  • pyucd: quite complete parser for UnicodeData (LGPL/GPL-licensed)
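
The canonical decomposition idea can already be tried with the standard library's unicodedata module, before adopting any of the packages above. This is only a sketch; the function name is illustrative, and compatibility decompositions (those starting with a tag such as <font>) are deliberately skipped.

```python
import unicodedata


def decomposition_names(char):
    """Return the Unicode names of a character's canonical decomposition."""
    decomp = unicodedata.decomposition(char)
    if not decomp or decomp.startswith("<"):
        return []  # no decomposition, or a compatibility one such as <font>
    return [unicodedata.name(chr(int(code, 16))) for code in decomp.split()]


print(decomposition_names("\u00C1"))
```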

Cyrillic + supplement

some conflicts in Cyrillic supplement range 0500..052F

both

CYRILLIC CAPITAL LETTER EL WITH MIDDLE HOOK
CYRILLIC CAPITAL LETTER EL WITH HOOK

will process to Elhookcyr

also

Enhookcyr                               Cyrillic                                CYRILLIC CAPITAL LETTER EN WITH HOOK
Enhookcyr                               Cyrillic Supplement                     CYRILLIC CAPITAL LETTER EN WITH MIDDLE HOOK

Changing MIDDLE HOOK will affect (in Cyrillic range 0400..04FF)

cy-Ghemiddlehook    Ghemiddlehook   Gehook  0494    Ҕ  CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK
cy-ghemiddlehook    ghemiddlehook   gehook  0495    ҕ  CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK

cy-Pemiddlehook Pemiddlehook    Pehook  04A6    Ҧ  CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK
cy-pemiddlehook pemiddlehook    pehook  04A7    ҧ  CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK

PROSGEGRAMMENI and YPOGEGRAMMENI

The issue with PROSGEGRAMMENI and YPOGEGRAMMENI is a bit complicated, and I won’t go into detail why but the recommended solution is:
The character U+1FBE (GREEK PROSGEGRAMMENI) should be named iotaadscript as per AGD (alternatively iotasmall) while in all other contexts, the strings YPOGEGRAMMENI and PROSGEGRAMMENI should be converted to iotasub. Not sure how to do this in the converter.

Currently, some become iotasub and others iotasubscript (and there's the "and" left):

alphalenisandiotasubscript 1F80
alphaasperandiotasubscript 1F81
Alphalenisiotasub 1F88
Alphaasperiotasub 1F89
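
The recommended rule might be sketched like this. The function operates on an intermediate, space-free name such as alphalenisandypogegrammeni; that intermediate form and the function name are assumptions, since the converter's actual processing steps are not shown here.

```python
def greekIotaName(uniNumber, processedName):
    """Apply the iota rule proposed above to a space-free name."""
    if uniNumber == 0x1FBE:
        return "iotaadscript"  # the one exception, named as per AGD
    for term in ("ypogegrammeni", "prosgegrammeni"):
        # Drop the leftover "and" together with the term, then any bare term.
        processedName = processedName.replace("and" + term, "iotasub")
        processedName = processedName.replace(term, "iotasub")
    return processedName


print(greekIotaName(0x1F80, "alphalenisandypogegrammeni"))
print(greekIotaName(0x1F88, "Alphalenisandprosgegrammeni"))
```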

Clean up Enclosed CJK Letters and Months

Looks like there are a few names in Enclosed CJK Letters and Months that could be treated better.

hpafullwidth 3371
dafullwidth 3372
aufullwidth 3373
barfullwidth 3374
ovfullwidth 3375
pcfullwidth 3376
dmfullwidth 3377
dm2fullwidth 3378
dm3fullwidth 3379
iufullwidth 337A

conflict in halfforms

hlfw-ahalfwidth                         Halfwidth and Fullwidth Forms           HALFWIDTH KATAKANA LETTER A
hlfw-ahalfwidth                         Halfwidth and Fullwidth Forms           HALFWIDTH HANGUL LETTER A

add script tag for each sub range

there are more doubles:

    hlfw-ahalfwidth
    hlfw-ehalfwidth
    hlfw-ihalfwidth
    hlfw-ohalfwidth
    hlfw-uhalfwidth
    hlfw-wahalfwidth
    hlfw-yahalfwidth
    hlfw-yohalfwidth
    hlfw-yuhalfwidth

"Radical" thought: "+"

Here's a more radical idea. FDK now also allows + in glyph names. So I recently thought — wouldn’t it be simplest to ditch the AGL names like Aacute and consistently use A+acute? The + could be the replacement of the Unicode character name term WITH. And the non-spacing marks could be renamed +acute, +grave etc. (instead of cmb-*).

Ok, it is a bit radical but... who knows.

Arabic naming: alefLamYehabove

0616 alef_lam_yehabove ARABIC SMALL HIGH LIGATURE ALEF WITH LAM WITH YEH
0616 alefLamYehabove ARABIC SMALL HIGH LIGATURE ALEF WITH LAM WITH YEH

this is a diacritic and not a ligature. So it’s precomposed and the components are not letters.

double cmb

For combining marks, I myself was thinking about using something like grave.mark or grave:mark. I see you’ve used the cmb- prefix. If so, would you say that perhaps the comb suffixes could be dropped from some? Right now it's:

cmb-gravecomb 0300
cmb-tildecomb 0303
cmb-macron 0304

but perhaps it'd make sense to just make it:

cmb-grave 0300
cmb-tilde 0303
cmb-macron 0304
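
As a sketch, the proposed cleanup is a small string rule; the function name is illustrative.

```python
def dropCombSuffix(name, prefix="cmb-", suffix="comb"):
    """Remove a trailing 'comb' from names that already carry the cmb- prefix."""
    if name.startswith(prefix) and name.endswith(suffix):
        stem = name[len(prefix):-len(suffix)]
        if stem:  # keep the original name if nothing would be left
            return prefix + stem
    return name


for name in ("cmb-gravecomb", "cmb-tildecomb", "cmb-macron"):
    print(name, "->", dropCombSuffix(name))
```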

037A YPOGEGRAMMENI

should this one follow the AGD list like: iotasubscript (see commit 72cf2ae)

or be set as iotalenissubscript or just ypogegrammeni

this is creating a conflict within the same script with iotalenis 1F30 GREEK SMALL LETTER IOTA WITH PSILI

Arabic naming: yehhamzaabove.medi_alef.fina

FBEA yeh.init_hamzaabove.medi_alef.fina ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ALEF ISOLATED FORM
FBEA yehhamzaabove.medi_alef.fina ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ALEF ISOLATED FORM

hamzaabove is a diacritic, so it’s part of the letter “U+0626”. I guess before making the ligature you could check whether any two components combined form a letter, and if that letter's name exists in the database, use the letter instead of treating them as two components. (There are many more examples like this in ligatures.)
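
The suggested check can be approximated with Unicode normalization from the standard library: if NFC composes two components into a single character, a precomposed letter exists. This is a sketch of the idea only, with an illustrative function name; it is not how the converter currently works.

```python
import unicodedata


def composedLetter(base, mark):
    """Return the single precomposed character for base + mark, or None."""
    composed = unicodedata.normalize("NFC", base + mark)
    return composed if len(composed) == 1 else None


yeh = "\u064A"         # ARABIC LETTER YEH
hamzaAbove = "\u0654"  # ARABIC HAMZA ABOVE
letter = composedLetter(yeh, hamzaAbove)
print("%04X" % ord(letter), unicodedata.name(letter))
```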

makeotf supported char

I see that you’re using a dash as a separator between the "script" component and the rest of the glyph names. As of November 2015, MakeOTF allows the use of several additional characters in glyph names (+ * : ~ ^ !) but not -. Would you consider perhaps changing the separator to one of these allowed characters instead of the dash (I'd say :), e.g. U+0C8C could become kn:lvocal instead of kn-lvocal?
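
Switching separators would be a one-line transformation on the generated names. The sketch below uses the list of additional characters quoted in the comment above; the constant and function names are illustrative, and only the first dash is replaced on the assumption that it is the script separator.

```python
MAKEOTF_EXTRA_CHARS = "+*:~^!"  # as listed in the comment above


def changeSeparator(name, old="-", new=":"):
    """Replace the first script separator in a glyph name."""
    if new not in MAKEOTF_EXTRA_CHARS:
        raise ValueError("separator %r is not in the allowed list" % new)
    return name.replace(old, new, 1)


print(changeSeparator("kn-lvocal"))
```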

Include unicode category in exports?

It might be useful to export a list of unicode categories, or include the category in the name lists. The regular python unicodedata.category( unicodeChar) does not reach all the new ranges.

"Generating" script and variation abbreviations

I think it would make sense to utilize the OpenType script tags and feature tags as much as possible when generating script and variation names. The most common scripts should indeed have supershort (2-char) abbreviations; for others we could use the 4-letter ones.

For example when I look at the halfwidth and fullwidth names, I think: there are two OT features, hwid for "Half Widths" and fwid for "Full Widths". So I imagine FF21 Afullwidth FULLWIDTH LATIN CAPITAL LETTER A becoming A:fwid and FF8C hlfw-huhalfwidth HALFWIDTH KATAKANA LETTER HU becoming hu:hwid or hwid:hu (if we choose to adopt :).

Also, Gurmukhi in OT uses guru while our list uses grmk. So I can imagine 0A22 grmk-ddha GURMUKHI LETTER DDHA becoming ddha:guru or guru:ddha.

I'm off for the weekend teaching, but I'll be back next week and will try to do some more systematic work on this. After more deliberation, I think prefixes are better than suffixes for scripts. :D

gravecomb-cmb

In glyphNamesToUnicode.txt the AGD names take precedence, which results in some odd combinations with the *comb-cmb.

greek not complete

there are duplicates in the latin-extended-d that don't get a nice prefix/suffix

   392 : Beta                                              Beta                     Beta                                    Greek and Coptic                        GREEK CAPITAL LETTER BETA
  A7B4 : Beta                                              -                        Beta                                    Latin Extended-D                        LATIN CAPITAL LETTER BETA

it seems like no latin will get a prefix...

see https://github.com/LettError/glyphNameFormatter/blob/master/Lib/glyphNameFormatter/__init__.py#L190

see

https://github.com/LettError/glyphNameFormatter/blob/master/Lib/glyphNameFormatter/names/glyphNamesToUnicode.txt#L888

https://github.com/LettError/glyphNameFormatter/blob/master/Lib/glyphNameFormatter/names/glyphNamesToUnicode.txt#L7669

Arabic naming: alefhamzaabove.isol

FE83 alefhamza.isol ARABIC LETTER ALEF WITH HAMZA ABOVE ISOLATED FORM
FE83 alefhamzaabove.isol ARABIC LETTER ALEF WITH HAMZA ABOVE ISOLATED FORM

There are two letters that contain alef and hamza: in one the hamza is above, in the other it is below.

Arabic naming: rehtwodotsabove, seenthreedotsbelow, sadtwodotsbelow

0697 rehtwoabove ARABIC LETTER REH WITH TWO DOTS ABOVE
0697 rehtwodotsabove ARABIC LETTER REH WITH TWO DOTS ABOVE

069B seenthreebelow ARABIC LETTER SEEN WITH THREE DOTS BELOW
069B seenthreedotsbelow ARABIC LETTER SEEN WITH THREE DOTS BELOW

069D sadtwobelow ARABIC LETTER SAD WITH TWO DOTS BELOW
069D sadtwodotsbelow ARABIC LETTER SAD WITH TWO DOTS BELOW
In the above examples the word dot is dropped; would it be good to keep it? Maybe a capital A for above and B for below would make them shorter? There are more similar examples.

Zecyr / Reversed Zecyr

Report:
Seems that /Zecyr (/З U+0417) and /zecyr (/з U+0437) are absent from the Cyrillic section, but somehow appear in Cyrillic Supplement (although they seem not to be supplement but basic Cyrillic), but with the wrong Unicode.
https://en.wikipedia.org/wiki/Ze_(Cyrillic)

Indeed in Cyr Sup there are 'reversed equivalents' of these shapes, as the unicode description says.
/Ԑ /ԑ are the guys, respective unicodes U+0510 U+0511. Described as /Zereversedcyr & /zereversedcyr.
https://en.wikipedia.org/wiki/Reversed_Ze

Arabic naming: fullstop

06D4 ar-period ARABIC FULL STOP
06D4 fullstop ARABIC FULL STOP
this is not a generic arabic stop. It’s only used in Urdu. Calling it arabic period would be confusing.

Double "cyr"

I see these names with cyrcyr in glyphNamesToUnicode.txt — is that right?

Ksicyrcyr 046E
ksicyrcyr 046F
Psicyrcyr 0470
psicyrcyr 0471

name: cedilla vs commaaccent

for glyphs that don't have both forms (as S and T do: Scedilla, Scommaaccent and Tcedilla, Tcommaaccent)

should the cedilla be renamed to commaaccent?
All existing lists (AGL, AGD and Georg's Glyphs list) change the cedilla to a commaaccent.

Historically this is maybe a nice...

Kcedilla                                          Kcommaaccent                  Kcommaaccent                  0136   Ķ    LATIN CAPITAL LETTER K WITH CEDILLA
kcedilla                                          kcommaaccent                  kcommaaccent                  0137   ķ    LATIN SMALL LETTER K WITH CEDILLA
Lcedilla                                          Lcommaaccent                  Lcommaaccent                  013B   Ļ    LATIN CAPITAL LETTER L WITH CEDILLA
lcedilla                                          lcommaaccent                  lcommaaccent                  013C   ļ    LATIN SMALL LETTER L WITH CEDILLA
Ncedilla                                          Ncommaaccent                  Ncommaaccent                  0145   Ņ    LATIN CAPITAL LETTER N WITH CEDILLA
ncedilla                                          ncommaaccent                  ncommaaccent                  0146   ņ    LATIN SMALL LETTER N WITH CEDILLA
Rcedilla                                          Rcommaaccent                  Rcommaaccent                  0156   Ŗ    LATIN CAPITAL LETTER R WITH CEDILLA
rcedilla                                          rcommaaccent                  rcommaaccent                  0157   ŗ    LATIN SMALL LETTER R WITH CEDILLA

Arabic ligature names with multiple suffixen

The names currently generated for Arabic ligatures are not following the OpenType Feature File Specification rules, pointed out by @justvanrossum.

The motivation for the current structure was to have valid glyphnames for each of the parts combined with the underscore. This makes it easier to find the parts without external knowledge. (@typoman)

Just dropping the suffixes for the parts is not a good idea. The part names are no longer unique and cause 158 collisions. For instance:

tah_yeh
	fcf6	tah.init_yeh.fina
	fd12	tah.medi_yeh.fina

alefmaksura_superscriptalef
	fc5d	alefmaksura.init_superscriptalef.fina
	fc90	alefmaksura.medi_superscriptalef.fina

jeem_hah
	fc15	jeem.init_hah.fina
	fca7	jeem.init_hah.medi
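
The collisions are easy to reproduce. The sketch below strips the positional suffixes from a handful of the names listed here and groups the results; the helper name and the small sample are illustrative, and the full list is what yields the 158 collisions mentioned above.

```python
from collections import defaultdict

POSITIONAL = (".init", ".medi", ".fina", ".isol")


def stripPositional(name):
    """Drop the positional suffix from every underscore-separated part."""
    parts = []
    for part in name.split("_"):
        for suffix in POSITIONAL:
            if part.endswith(suffix):
                part = part[:-len(suffix)]
                break
        parts.append(part)
    return "_".join(parts)


# A small sample of the names in question.
names = {
    0xFCF6: "tah.init_yeh.fina",
    0xFD12: "tah.medi_yeh.fina",
    0xFC15: "jeem.init_hah.fina",
    0xFCA7: "jeem.init_hah.medi",
    0xFC05: "beh.init_jeem.fina",
}

grouped = defaultdict(list)
for uni, name in sorted(names.items()):
    grouped[stripPositional(name)].append(uni)

# Keep only the stripped names shared by more than one Unicode value.
collisions = {name: unis for name, unis in grouped.items() if len(unis) > 1}
for name, unis in sorted(collisions.items()):
    print(name, ["%04x" % u for u in unis])
```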

I think we should use the unicode patterns (also described in the Adobe reference) for these ligatures.

These are the names in question:

0xfbea yeh.init_hamzaabove.medi_alef.fina
0xfbeb yeh.medi_hamzaabove.medi_alef.fina
0xfbec yeh.init_hamzaabove.medi_ae.fina
0xfbed yeh.medi_hamzaabove.medi_ae.fina
0xfbee yeh.init_hamzaabove.medi_waw.fina
0xfbef yeh.medi_hamzaabove.medi_waw.fina
0xfbf0 yeh.init_hamzaabove.medi_u.fina
0xfbf1 yeh.medi_hamzaabove.medi_u.fina
0xfbf2 yeh.init_hamzaabove.medi_oe.fina
0xfbf3 yeh.medi_hamzaabove.medi_oe.fina
0xfbf4 yeh.init_hamzaabove.medi_yu.fina
0xfbf5 yeh.medi_hamzaabove.medi_yu.fina
0xfbf6 yeh.init_hamzaabove.medi_e.fina
0xfbf7 yeh.medi_hamzaabove.medi_e.fina
0xfbf8 yeh.init_hamzaabove.medi_e.medi
0xfbf9 uighurkirghizyeh.init_hamzaabove.medi_alefmaksura.fina
0xfbfa uighurkirghizyeh.medi_hamzaabove.medi_alefmaksura.fina
0xfbfb uighurkirghizyeh.init_hamzaabove.medi_alefmaksura.medi
0xfc00 yeh.init_hamzaabove.medi_jeem.fina
0xfc01 yeh.init_hamzaabove.medi_hah.fina
0xfc02 yeh.init_hamzaabove.medi_meem.fina
0xfc03 yeh.init_hamzaabove.medi_alefmaksura.fina
0xfc04 yeh.init_hamzaabove.medi_yeh.fina
0xfc05 beh.init_jeem.fina
0xfc06 beh.init_hah.fina
0xfc07 beh.init_khah.fina
0xfc08 beh.init_meem.fina
0xfc09 beh.init_alefmaksura.fina
0xfc0a beh.init_yeh.fina
0xfc0b teh.init_jeem.fina
0xfc0c teh.init_hah.fina
0xfc0d teh.init_khah.fina
0xfc0e teh.init_meem.fina
0xfc0f teh.init_alefmaksura.fina
0xfc10 teh.init_yeh.fina
0xfc11 theh.init_jeem.fina
0xfc12 theh.init_meem.fina
0xfc13 theh.init_alefmaksura.fina
0xfc14 theh.init_yeh.fina
0xfc15 jeem.init_hah.fina
0xfc16 jeem.init_meem.fina
0xfc17 hah.init_jeem.fina
0xfc18 hah.init_meem.fina
0xfc19 khah.init_jeem.fina
0xfc1a khah.init_hah.fina
0xfc1b khah.init_meem.fina
0xfc1c seen.init_jeem.fina
0xfc1d seen.init_hah.fina
0xfc1e seen.init_khah.fina
0xfc1f seen.init_meem.fina
0xfc20 sad.init_hah.fina
0xfc21 sad.init_meem.fina
0xfc22 dad.init_jeem.fina
0xfc23 dad.init_hah.fina
0xfc24 dad.init_khah.fina
0xfc25 dad.init_meem.fina
0xfc26 tah.init_hah.fina
0xfc27 tah.init_meem.fina
0xfc28 zah.init_meem.fina
0xfc29 ain.init_jeem.fina
0xfc2a ain.init_meem.fina
0xfc2b ghain.init_jeem.fina
0xfc2c ghain.init_meem.fina
0xfc2d feh.init_jeem.fina
0xfc2e feh.init_hah.fina
0xfc2f feh.init_khah.fina
0xfc30 feh.init_meem.fina
0xfc31 feh.init_alefmaksura.fina
0xfc32 feh.init_yeh.fina
0xfc33 qaf.init_hah.fina
0xfc34 qaf.init_meem.fina
0xfc35 qaf.init_alefmaksura.fina
0xfc36 qaf.init_yeh.fina
0xfc37 kaf.init_alef.fina
0xfc38 kaf.init_jeem.fina
0xfc39 kaf.init_hah.fina
0xfc3a kaf.init_khah.fina
0xfc3b kaf.init_lam.fina
0xfc3c kaf.init_meem.fina
0xfc3d kaf.init_alefmaksura.fina
0xfc3e kaf.init_yeh.fina
0xfc3f lam.init_jeem.fina
0xfc40 lam.init_hah.fina
0xfc41 lam.init_khah.fina
0xfc42 lam.init_meem.fina
0xfc43 lam.init_alefmaksura.fina
0xfc44 lam.init_yeh.fina
0xfc45 meem.init_jeem.fina
0xfc46 meem.init_hah.fina
0xfc47 meem.init_khah.fina
0xfc48 meem.init_meem.fina
0xfc49 meem.init_alefmaksura.fina
0xfc4a meem.init_yeh.fina
0xfc4b noon.init_jeem.fina
0xfc4c noon.init_hah.fina
0xfc4d noon.init_khah.fina
0xfc4e noon.init_meem.fina
0xfc4f noon.init_alefmaksura.fina
0xfc50 noon.init_yeh.fina
0xfc51 heh.init_jeem.fina
0xfc52 heh.init_meem.fina
0xfc53 heh.init_alefmaksura.fina
0xfc54 heh.init_yeh.fina
0xfc55 yeh.init_jeem.fina
0xfc56 yeh.init_hah.fina
0xfc57 yeh.init_khah.fina
0xfc58 yeh.init_meem.fina
0xfc59 yeh.init_alefmaksura.fina
0xfc5a yeh.init_yeh.fina
0xfc5b thal.init_superscriptalef.fina
0xfc5c reh.init_superscriptalef.fina
0xfc5d alefmaksura.init_superscriptalef.fina
0xfc64 yeh.medi_hamzaabove.medi_reh.fina
0xfc65 yeh.medi_hamzaabove.medi_zain.fina
0xfc66 yeh.medi_hamzaabove.medi_meem.fina
0xfc67 yeh.medi_hamzaabove.medi_noon.fina
0xfc68 yeh.medi_hamzaabove.medi_alefmaksura.fina
0xfc69 yeh.medi_hamzaabove.medi_yeh.fina
0xfc6a beh.medi_reh.fina
0xfc6b beh.medi_zain.fina
0xfc6c beh.medi_meem.fina
0xfc6d beh.medi_noon.fina
0xfc6e beh.medi_alefmaksura.fina
0xfc6f beh.medi_yeh.fina
0xfc70 teh.medi_reh.fina
0xfc71 teh.medi_zain.fina
0xfc72 teh.medi_meem.fina
0xfc73 teh.medi_noon.fina
0xfc74 teh.medi_alefmaksura.fina
0xfc75 teh.medi_yeh.fina
0xfc76 theh.medi_reh.fina
0xfc77 theh.medi_zain.fina
0xfc78 theh.medi_meem.fina
0xfc79 theh.medi_noon.fina
0xfc7a theh.medi_alefmaksura.fina
0xfc7b theh.medi_yeh.fina
0xfc7c feh.medi_alefmaksura.fina
0xfc7d feh.medi_yeh.fina
0xfc7e qaf.medi_alefmaksura.fina
0xfc7f qaf.medi_yeh.fina
0xfc80 kaf.medi_alef.fina
0xfc81 kaf.medi_lam.fina
0xfc82 kaf.medi_meem.fina
0xfc83 kaf.medi_alefmaksura.fina
0xfc84 kaf.medi_yeh.fina
0xfc85 lam.medi_meem.fina
0xfc86 lam.medi_alefmaksura.fina
0xfc87 lam.medi_yeh.fina
0xfc88 meem.medi_alef.fina
0xfc89 meem.medi_meem.fina
0xfc8a noon.medi_reh.fina
0xfc8b noon.medi_zain.fina
0xfc8c noon.medi_meem.fina
0xfc8d noon.medi_noon.fina
0xfc8e noon.medi_alefmaksura.fina
0xfc8f noon.medi_yeh.fina
0xfc90 alefmaksura.medi_superscriptalef.fina
0xfc91 yeh.medi_reh.fina
0xfc92 yeh.medi_zain.fina
0xfc93 yeh.medi_meem.fina
0xfc94 yeh.medi_noon.fina
0xfc95 yeh.medi_alefmaksura.fina
0xfc96 yeh.medi_yeh.fina
0xfc97 yeh.init_hamzaabove.medi_jeem.medi
0xfc98 yeh.init_hamzaabove.medi_hah.medi
0xfc99 yeh.init_hamzaabove.medi_khah.medi
0xfc9a yeh.init_hamzaabove.medi_meem.medi
0xfc9b yeh.init_hamzaabove.medi_heh.medi
0xfc9c beh.init_jeem.medi
0xfc9d beh.init_hah.medi
0xfc9e beh.init_khah.medi
0xfc9f beh.init_meem.medi
0xfca0 beh.init_heh.medi
0xfca1 teh.init_jeem.medi
0xfca2 teh.init_hah.medi
0xfca3 teh.init_khah.medi
0xfca4 teh.init_meem.medi
0xfca5 teh.init_heh.medi
0xfca6 theh.init_meem.medi
0xfca7 jeem.init_hah.medi
0xfca8 jeem.init_meem.medi
0xfca9 hah.init_jeem.medi
0xfcaa hah.init_meem.medi
0xfcab khah.init_jeem.medi
0xfcac khah.init_meem.medi
0xfcad seen.init_jeem.medi
0xfcae seen.init_hah.medi
0xfcaf seen.init_khah.medi
0xfcb0 seen.init_meem.medi
0xfcb1 sad.init_hah.medi
0xfcb2 sad.init_khah.medi
0xfcb3 sad.init_meem.medi
0xfcb4 dad.init_jeem.medi
0xfcb5 dad.init_hah.medi
0xfcb6 dad.init_khah.medi
0xfcb7 dad.init_meem.medi
0xfcb8 tah.init_hah.medi
0xfcb9 zah.init_meem.medi
0xfcba ain.init_jeem.medi
0xfcbb ain.init_meem.medi
0xfcbc ghain.init_jeem.medi
0xfcbd ghain.init_meem.medi
0xfcbe feh.init_jeem.medi
0xfcbf feh.init_hah.medi
0xfcc0 feh.init_khah.medi
0xfcc1 feh.init_meem.medi
0xfcc2 qaf.init_hah.medi
0xfcc3 qaf.init_meem.medi
0xfcc4 kaf.init_jeem.medi
0xfcc5 kaf.init_hah.medi
0xfcc6 kaf.init_khah.medi
0xfcc7 kaf.init_lam.medi
0xfcc8 kaf.init_meem.medi
0xfcc9 lam.init_jeem.medi
0xfcca lam.init_hah.medi
0xfccb lam.init_khah.medi
0xfccc lam.init_meem.medi
0xfccd lam.init_heh.medi
0xfcce meem.init_jeem.medi
0xfccf meem.init_hah.medi
0xfcd0 meem.init_khah.medi
0xfcd1 meem.init_meem.medi
0xfcd2 noon.init_jeem.medi
0xfcd3 noon.init_hah.medi
0xfcd4 noon.init_khah.medi
0xfcd5 noon.init_meem.medi
0xfcd6 noon.init_heh.medi
0xfcd7 heh.init_jeem.medi
0xfcd8 heh.init_meem.medi
0xfcd9 heh.init_superscriptalef.medi
0xfcda yeh.init_jeem.medi
0xfcdb yeh.init_hah.medi
0xfcdc yeh.init_khah.medi
0xfcdd yeh.init_meem.medi
0xfcde yeh.init_heh.medi
0xfcdf yeh.medi_hamzaabove.medi_meem.medi
0xfce0 yeh.medi_hamzaabove.medi_heh.medi
0xfce1 beh.medi_meem.medi
0xfce2 beh.medi_heh.medi
0xfce3 teh.medi_meem.medi
0xfce4 teh.medi_heh.medi
0xfce5 theh.medi_meem.medi
0xfce6 theh.medi_heh.medi
0xfce7 seen.medi_meem.medi
0xfce8 seen.medi_heh.medi
0xfce9 sheen.medi_meem.medi
0xfcea sheen.medi_heh.medi
0xfceb kaf.medi_lam.medi
0xfcec kaf.medi_meem.medi
0xfced lam.medi_meem.medi
0xfcee noon.medi_meem.medi
0xfcef noon.medi_heh.medi
0xfcf0 yeh.medi_meem.medi
0xfcf1 yeh.medi_heh.medi
0xfcf5 tah.init_alefmaksura.fina
0xfcf6 tah.init_yeh.fina
0xfcf7 ain.init_alefmaksura.fina
0xfcf8 ain.init_yeh.fina
0xfcf9 ghain.init_alefmaksura.fina
0xfcfa ghain.init_yeh.fina
0xfcfb seen.init_alefmaksura.fina
0xfcfc seen.init_yeh.fina
0xfcfd sheen.init_alefmaksura.fina
0xfcfe sheen.init_yeh.fina
0xfcff hah.init_alefmaksura.fina
0xfd00 hah.init_yeh.fina
0xfd01 jeem.init_alefmaksura.fina
0xfd02 jeem.init_yeh.fina
0xfd03 khah.init_alefmaksura.fina
0xfd04 khah.init_yeh.fina
0xfd05 sad.init_alefmaksura.fina
0xfd06 sad.init_yeh.fina
0xfd07 dad.init_alefmaksura.fina
0xfd08 dad.init_yeh.fina
0xfd09 sheen.init_jeem.fina
0xfd0a sheen.init_hah.fina
0xfd0b sheen.init_khah.fina
0xfd0c sheen.init_meem.fina
0xfd0d sheen.init_reh.fina
0xfd0e seen.init_reh.fina
0xfd0f sad.init_reh.fina
0xfd10 dad.init_reh.fina
0xfd11 tah.medi_alefmaksura.fina
0xfd12 tah.medi_yeh.fina
0xfd13 ain.medi_alefmaksura.fina
0xfd14 ain.medi_yeh.fina
0xfd15 ghain.medi_alefmaksura.fina
0xfd16 ghain.medi_yeh.fina
0xfd17 seen.medi_alefmaksura.fina
0xfd18 seen.medi_yeh.fina
0xfd19 sheen.medi_alefmaksura.fina
0xfd1a sheen.medi_yeh.fina
0xfd1b hah.medi_alefmaksura.fina
0xfd1c hah.medi_yeh.fina
0xfd1d jeem.medi_alefmaksura.fina
0xfd1e jeem.medi_yeh.fina
0xfd1f khah.medi_alefmaksura.fina
0xfd20 khah.medi_yeh.fina
0xfd21 sad.medi_alefmaksura.fina
0xfd22 sad.medi_yeh.fina
0xfd23 dad.medi_alefmaksura.fina
0xfd24 dad.medi_yeh.fina
0xfd25 sheen.medi_jeem.fina
0xfd26 sheen.medi_hah.fina
0xfd27 sheen.medi_khah.fina
0xfd28 sheen.medi_meem.fina
0xfd29 sheen.medi_reh.fina
0xfd2a seen.medi_reh.fina
0xfd2b sad.medi_reh.fina
0xfd2c dad.medi_reh.fina
0xfd2d sheen.init_jeem.medi
0xfd2e sheen.init_hah.medi
0xfd2f sheen.init_khah.medi
0xfd30 sheen.init_meem.medi
0xfd31 seen.init_heh.medi
0xfd32 sheen.init_heh.medi
0xfd33 tah.init_meem.medi
0xfd34 seen.medi_jeem.medi
0xfd35 seen.medi_hah.medi
0xfd36 seen.medi_khah.medi
0xfd37 sheen.medi_jeem.medi
0xfd38 sheen.medi_hah.medi
0xfd39 sheen.medi_khah.medi
0xfd3a tah.medi_meem.medi
0xfd3b zah.medi_meem.medi
0xfd3c alef.medi_fathatan.fina
0xfd3d alef.init_fathatan.fina
0xfd50 teh.init_jeem.medi_meem.medi
0xfd51 teh.medi_hah.medi_jeem.fina
0xfd52 teh.init_hah.medi_jeem.medi
0xfd53 teh.init_hah.medi_meem.medi
0xfd54 teh.init_khah.medi_meem.medi
0xfd55 teh.init_meem.medi_jeem.medi
0xfd56 teh.init_meem.medi_hah.medi
0xfd57 teh.init_meem.medi_khah.medi
0xfd58 jeem.medi_meem.medi_hah.fina
0xfd59 jeem.init_meem.medi_hah.medi
0xfd5a hah.medi_meem.medi_yeh.fina
0xfd5b hah.medi_meem.medi_alefmaksura.fina
0xfd5c seen.init_hah.medi_jeem.medi
0xfd5d seen.init_jeem.medi_hah.medi
0xfd5e seen.medi_jeem.medi_alefmaksura.fina
0xfd5f seen.medi_meem.medi_hah.fina
0xfd60 seen.init_meem.medi_hah.medi
0xfd61 seen.init_meem.medi_jeem.medi
0xfd62 seen.medi_meem.medi_meem.fina
0xfd63 seen.init_meem.medi_meem.medi
0xfd64 sad.medi_hah.medi_hah.fina
0xfd65 sad.init_hah.medi_hah.medi
0xfd66 sad.medi_meem.medi_meem.fina
0xfd67 sheen.medi_hah.medi_meem.fina
0xfd68 sheen.init_hah.medi_meem.medi
0xfd69 sheen.medi_jeem.medi_yeh.fina
0xfd6a sheen.medi_meem.medi_khah.fina
0xfd6b sheen.init_meem.medi_khah.medi
0xfd6c sheen.medi_meem.medi_meem.fina
0xfd6d sheen.init_meem.medi_meem.medi
0xfd6e dad.medi_hah.medi_alefmaksura.fina
0xfd6f dad.medi_khah.medi_meem.fina
0xfd70 dad.init_khah.medi_meem.medi
0xfd71 tah.medi_meem.medi_hah.fina
0xfd72 tah.init_meem.medi_hah.medi
0xfd73 tah.init_meem.medi_meem.medi
0xfd74 tah.medi_meem.medi_yeh.fina
0xfd75 ain.medi_jeem.medi_meem.fina
0xfd76 ain.medi_meem.medi_meem.fina
0xfd77 ain.init_meem.medi_meem.medi
0xfd78 ain.medi_meem.medi_alefmaksura.fina
0xfd79 ghain.medi_meem.medi_meem.fina
0xfd7a ghain.medi_meem.medi_yeh.fina
0xfd7b ghain.medi_meem.medi_alefmaksura.fina
0xfd7c feh.medi_khah.medi_meem.fina
0xfd7d feh.init_khah.medi_meem.medi
0xfd7e qaf.medi_meem.medi_hah.fina
0xfd7f qaf.medi_meem.medi_meem.fina
0xfd80 lam.medi_hah.medi_meem.fina
0xfd81 lam.medi_hah.medi_yeh.fina
0xfd82 lam.medi_hah.medi_alefmaksura.fina
0xfd83 lam.init_jeem.medi_jeem.medi
0xfd84 lam.medi_jeem.medi_jeem.fina
0xfd85 lam.medi_khah.medi_meem.fina
0xfd86 lam.init_khah.medi_meem.medi
0xfd87 lam.medi_meem.medi_hah.fina
0xfd88 lam.init_meem.medi_hah.medi
0xfd89 meem.init_hah.medi_jeem.medi
0xfd8a meem.init_hah.medi_meem.medi
0xfd8b meem.medi_hah.medi_yeh.fina
0xfd8c meem.init_jeem.medi_hah.medi
0xfd8d meem.init_jeem.medi_meem.medi
0xfd8e meem.init_khah.medi_jeem.medi
0xfd8f meem.init_khah.medi_meem.medi
0xfd92 meem.init_jeem.medi_khah.medi
0xfd93 heh.init_meem.medi_jeem.medi
0xfd94 heh.init_meem.medi_meem.medi
0xfd95 noon.init_hah.medi_meem.medi
0xfd96 noon.medi_hah.medi_alefmaksura.fina
0xfd97 noon.medi_jeem.medi_meem.fina
0xfd98 noon.init_jeem.medi_meem.medi
0xfd99 noon.medi_jeem.medi_alefmaksura.fina
0xfd9a noon.medi_meem.medi_yeh.fina
0xfd9b noon.medi_meem.medi_alefmaksura.fina
0xfd9c yeh.medi_meem.medi_meem.fina
0xfd9d yeh.init_meem.medi_meem.medi
0xfd9e beh.medi_khah.medi_yeh.fina
0xfd9f teh.medi_jeem.medi_yeh.fina
0xfda0 teh.medi_jeem.medi_alefmaksura.fina
0xfda1 teh.medi_khah.medi_yeh.fina
0xfda2 teh.medi_khah.medi_alefmaksura.fina
0xfda3 teh.medi_meem.medi_yeh.fina
0xfda4 teh.medi_meem.medi_alefmaksura.fina
0xfda5 jeem.medi_meem.medi_yeh.fina
0xfda6 jeem.medi_hah.medi_alefmaksura.fina
0xfda7 jeem.medi_meem.medi_alefmaksura.fina
0xfda8 seen.medi_khah.medi_alefmaksura.fina
0xfda9 sad.medi_hah.medi_yeh.fina
0xfdaa sheen.medi_hah.medi_yeh.fina
0xfdab dad.medi_hah.medi_yeh.fina
0xfdac lam.medi_jeem.medi_yeh.fina
0xfdad lam.medi_meem.medi_yeh.fina
0xfdae yeh.medi_hah.medi_yeh.fina
0xfdaf yeh.medi_jeem.medi_yeh.fina
0xfdb0 yeh.medi_meem.medi_yeh.fina
0xfdb1 meem.medi_meem.medi_yeh.fina
0xfdb2 qaf.medi_meem.medi_yeh.fina
0xfdb3 noon.medi_hah.medi_yeh.fina
0xfdb4 qaf.init_meem.medi_hah.medi
0xfdb5 lam.init_hah.medi_meem.medi
0xfdb6 ain.medi_meem.medi_yeh.fina
0xfdb7 kaf.medi_meem.medi_yeh.fina
0xfdb8 noon.init_jeem.medi_hah.medi
0xfdb9 meem.medi_khah.medi_yeh.fina
0xfdba lam.init_jeem.medi_meem.medi
0xfdbb kaf.medi_meem.medi_meem.fina
0xfdbc lam.medi_jeem.medi_meem.fina
0xfdbd noon.medi_jeem.medi_hah.fina
0xfdbe jeem.medi_hah.medi_yeh.fina
0xfdbf hah.medi_jeem.medi_yeh.fina
0xfdc0 meem.medi_jeem.medi_yeh.fina
0xfdc1 feh.medi_meem.medi_yeh.fina
0xfdc2 beh.medi_hah.medi_yeh.fina
0xfdc3 kaf.init_meem.medi_meem.medi
0xfdc4 ain.init_jeem.medi_meem.medi
0xfdc5 sad.init_meem.medi_meem.medi
0xfdc6 seen.medi_khah.medi_yeh.fina
0xfdc7 noon.medi_jeem.medi_yeh.fina
0xfef5 lam.init_alef.medi_maddaabove.fina
0xfef6 lam.medi_alef.medi_maddaabove.fina
0xfef7 lam.init_alef.medi_hamzaabove.fina
0xfef8 lam.medi_alef.medi_hamzaabove.fina
0xfef9 lam.init_alef.medi_hamzabelow.fina
0xfefa lam.medi_alef.medi_hamzabelow.fina
0xfefb lam.init_alef.fina
0xfefc lam.medi_alef.fina
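
The entries above follow a regular pattern: a hexadecimal value, then a name whose components are joined with `_`, each component carrying a positional tag after a `.`. A minimal sketch of reading one such line, assuming every entry has exactly this shape (`parse_entry` is a hypothetical helper, not part of the project's API):

```python
def parse_entry(line):
    """Split one list entry into its code point and (base, form) components."""
    hex_value, name = line.split()
    parts = []
    for component in name.split("_"):
        # 'yeh.medi' -> base 'yeh', form 'medi'; form is '' if there is no tag.
        base, _, form = component.partition(".")
        parts.append((base, form))
    return int(hex_value, 16), parts


value, parts = parse_entry("0xfbeb yeh.medi_hamzaabove.medi_alef.fina")
print(hex(value), parts)
```

Names that use other separator patterns would need a different split.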

Arabic naming: meemabove, kasratan

06E2 meemabove.isol ARABIC SMALL HIGH MEEM ISOLATED FORM
06E2 meemabove ARABIC SMALL HIGH MEEM ISOLATED FORM

FE74 kasratan.isol ARABIC KASRATAN ISOLATED FORM
FE74 kasratan ARABIC KASRATAN ISOLATED FORM

For diacritics it's better to ignore the isol and related feature tags, because they could cause unexpected results in automated feature generation. There are more examples like this, so treating them parametrically would be faster. If the diacritic's name contains an initial, medial or final term, it would be a good idea to add the full term to the glyph name instead of using the feature tag. For isolated forms of diacritics the isolated term can simply be ignored; leaving it out makes the name cleaner.
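
One way to read this suggestion as a parametric rule is sketched below. The mapping, the function name and the spelled-out terms such as `medial` are assumptions made for illustration; the project may handle this differently.

```python
# Hypothetical mapping from the positional phrase at the end of a Unicode
# name to the term appended to a diacritic's glyph name.
POSITION_TERMS = {
    " ISOLATED FORM": "",       # isolated: add nothing
    " INITIAL FORM": "initial",
    " MEDIAL FORM": "medial",
    " FINAL FORM": "final",
}


def diacritic_glyph_name(base, unicode_name):
    """Name a diacritic without .isol/.init/.medi/.fina feature tags."""
    for phrase, term in POSITION_TERMS.items():
        if unicode_name.endswith(phrase):
            return base + term
    return base


print(diacritic_glyph_name("kasratan", "ARABIC KASRATAN ISOLATED FORM"))
print(diacritic_glyph_name("fatha", "ARABIC FATHA MEDIAL FORM"))
```

With this rule FE74 comes out as `kasratan`, and a medial diacritic gets the full term appended rather than a `.medi` tag.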

Doubling separators to resolve ambiguities

I'd like to suggest a simple algorithmic approach to resolve ambiguities in compound glyph names.

Let's for now agree that our extended conventions for glyph name separators will be, in decreasing order of strength (unlike the AGL rules where . is stronger than _):

  • use _ to indicate constituents of a glyph that corresponds in some way to multiple glyphs or characters
  • use . to indicate a split between “character” portion and “feature” portion
  • use : as a categorizing separator within the character or feature portion

With this simple system, there will be a few edge cases where the hierarchy is ambiguous. I’d like to propose a simple solution — by repeating the separator, we increase its splitting priority.

So, the name f_f.fina is a bit ambiguous, as we’re not 100% sure whether it represents something like (f)_(f.fina) or something like (f_f).fina. A simple way to resolve this ambiguity would be to use the name f__f.fina in the first case and f_f..fina in the second.

Another such case: lam.medi_alif.init.ss01 is a bit ambiguous, but lam.medi_alif.init..ss01 = (lam.medi_alif.init).ss01 while lam.medi__alif.init.ss01 = (lam.medi)_(alif.init.ss01).
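
The doubling rule can be sketched as a top-level split: find the longest run of a separator and split the name there first. This is only an illustration of the proposal. `top_level_split` is a hypothetical name, and breaking ties between runs of equal length by the strength order `_`, `.`, `:` is an assumption, since the proposal leaves that case open.

```python
import re

SEPARATORS = "_.:"  # decreasing strength, as listed in the proposal


def top_level_split(name):
    """Split a glyph name at its highest-priority separator run."""
    runs = [(m.group(0), m.start()) for m in re.finditer(r"_+|\.+|:+", name)]
    if not runs:
        return None, [name]
    # Longest run wins; equal lengths fall back to the strength order (assumption).
    best = min(runs, key=lambda r: (-len(r[0]), SEPARATORS.index(r[0][0])))[0]
    parts, start = [], 0
    for run, pos in runs:
        if run == best:
            parts.append(name[start:pos])
            start = pos + len(run)
    parts.append(name[start:])
    return best, parts


print(top_level_split("f__f.fina"))
print(top_level_split("f_f..fina"))
print(top_level_split("lam.medi__alif.init.ss01"))
```

Applying the same function to each returned part gives the rest of the hierarchy.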

This suggestion is a bit out of scope for this project, but I'd like to hear your thoughts about it as a guiding principle.

Arabic naming: dotlessqaf, dotlessbeh

066E behdotless ARABIC LETTER DOTLESS BEH
066E dotlessbeh ARABIC LETTER DOTLESS BEH
It sounds clearer this way.

066F qafdotless ARABIC LETTER DOTLESS QAF
066F dotlessqaf ARABIC LETTER DOTLESS QAF
It sounds clearer this way.

Should GREEK have a prefix?

gr-iotasubscript
gr-mu 
gr-nu 
gr-pi  
gr-question 
gr-sho
gr-xi  

Only GREEK QUESTION MARK (037E) should have a prefix.

Vowel removal in scriptPrefixes.py

scriptPrefixes has a curious bit:

        # remove all vowels
        key = [c for c in key if c not in "aeiou -"]

Is this really necessary? I’d really love it if the OT script tags were used as they are (except where we override them). This vowel removal feels a bit over the top and may introduce inconsistencies.
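
To make the difference concrete, the sketch below applies the same character filter to a few script names and prints the result next to the registered OpenType script tag. It assumes the filter runs on lowercase script names and that the result is joined back into a string. `remove_vowels` and the small tag table are illustrative only, not the contents of scriptPrefixes.py.

```python
def remove_vowels(key):
    """Apply the quoted filter and join the surviving characters."""
    return "".join(c for c in key if c not in "aeiou -")


# A few registered OpenType script tags, for comparison.
OT_SCRIPT_TAGS = {
    "arabic": "arab",
    "cyrillic": "cyrl",
    "greek": "grek",
    "hebrew": "hebr",
}

for script, tag in OT_SCRIPT_TAGS.items():
    print(script, remove_vowels(script), tag)
```

None of the four filtered names matches its OpenType tag, which is the kind of inconsistency the question is about.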

Arabic naming: dotlesskhahabove

06E1 khahdotlessabove ARABIC SMALL HIGH DOTLESS HEAD OF KHAH
06E1 dotlesskhahabove ARABIC SMALL HIGH DOTLESS HEAD OF KHAH

Hyphen separator for accents?

Looking through the glyph list, I wonder if it's worth adding a hyphen to separate the accent from the base glyph, e.g. Schwa-dieresis rather than Schwadieresis?

aacute and friends are relatively easy to discern (and I think that we're used to that format), but the non-Latin letters that have their names spelled out are much harder to parse.
