ldml-keyboards-dev's People

Contributors

davidlrowe, mhosken

ldml-keyboards-dev's Issues

Reorder elements can induce enormous tries

If I understand correctly, the class Trie will generate a Node for each string matching the concatenation of a 'before' and a 'from' attribute of a reorder element. However the number of such strings can be enormous. For example, for the seriously proposed reorder element

<reorder from="\u1A60[\u1A20-\u1A49]" order="75"
             before="[\u1A69\u1A6A\u1A6C][\u1A55-\u1A5E\u1A62\u1A65-\u1A68\u1A6D-\u1A7F][\u1A55-\u1A5E\u1A62\u1A65-\u1A68\u1A6D-\u1A7F][\u1A55-\u1A5E\u1A62\u1A65-\u1A68\u1A6D-\u1A7F][\u1A55-\u1A5E\u1A62\u1A65-\u1A68\u1A6D-\u1A7F]"/>

there are about 150 million matching strings. This overwhelms a 32-bit implementation. For example, loading a keyboard definition containing this element failed with 'MemoryError' after one hour of wall time. This was observed with both Python 2.7.12 and Python 3.5.2 on 32-bit Ubuntu Xenial. Output of uname -a:
Linux JRWUBU2 4.4.0-163-generic #191-Ubuntu SMP Wed Sep 11 17:09:46 UTC 2019 i686 athlon i686 GNU/Linux
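
As a rough check of the scale, the sketch below multiplies the sizes of the character classes in the element above. The names MEDIAL, BEFORE and FROM are illustrative labels, not identifiers from the reference program, and each range is counted by raw code points without regard to whether they are assigned. The result is of the same order of magnitude as the figure quoted above.

```python
from math import prod  # Python 3.8+

def class_size(ranges):
    """Number of code points in a character class given as inclusive ranges."""
    return sum(hi - lo + 1 for lo, hi in ranges)

# Hypothetical labels for the classes in the reorder element above.
MEDIAL = [(0x1A55, 0x1A5E), (0x1A62, 0x1A62), (0x1A65, 0x1A68), (0x1A6D, 0x1A7F)]
BEFORE = [[(0x1A69, 0x1A6A), (0x1A6C, 0x1A6C)]] + [MEDIAL] * 4
FROM = [[(0x1A60, 0x1A60)], [(0x1A20, 0x1A49)]]

def count_strings(positions):
    """Number of distinct strings matched by a sequence of character classes."""
    return prod(class_size(p) for p in positions)

total = count_strings(BEFORE + FROM)
print(total)  # 168378336
```

A trie that stores every such string needs at least one node per string, so expanding the classes eagerly cannot fit in a 32-bit address space.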

Need introduction

I am curious about this project.

I understand that LDML layouts can be installed out of the box on macOS.

Do other operating systems support LDML layouts yet?

Can we submit new layouts to the Unicode Consortium for distribution?

Can't handle US international keyboards

The 'reference programme' can't handle the US International Keyboard en-t-k0-windows-extended.xml from CLDR Version 35.1. So far, three problems have been identified, all when running with Python 2.7.

  1. Loading gets into an infinite loop with the transforms for the apostrophe dead key. Commenting them out enables the rest of the keyboard to load.
  2. If the 'settings' element is left in, the program reports an error because it tries to normalise a string of type 'str' rather than 'unicode'. Prior to this, strings are output as blank.
  3. If the mapping of [shift C11] to quote is left in, the processing of a line containing it gets stuck in an infinite loop. Changing the mapping to the sequence QUOTE avoids the problem.
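
For the second problem, one possible workaround is to decode byte strings before normalising them. The sketch below is an assumption about the shape of such a fix, not code from the reference program; the helper names to_text and normalise are hypothetical, and where the conversion belongs in the program is not known from this report.

```python
import unicodedata

def to_text(value, encoding="utf-8"):
    """Return value as text; on Python 2, bytes is the same type as str."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

def normalise(value, form="NFC"):
    """Normalise after making sure the argument is a text string."""
    return unicodedata.normalize(form, to_text(value))

# 'a' followed by U+0303 COMBINING TILDE, given as UTF-8 bytes.
print(normalise(b"a\xcc\x83") == u"\u00e3")  # True
```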

The test input sequence was:

[C11][D03][BKSP]
[shift E01][E06]
[shift B05][C12][C12][C12][C12]
[shift C01][C12][shift E11][B09][shift E11][shift C04][BKSP]
[shift D05][shift E11][shift C04][shift E11][B09][BKSP]
[B02][shift E11][B09][shift E11][B09][C12][BKSP]
[B02][shift E11][shift C11][shift E11][B09][BKSP]
[C01]

Competing Simple Transforms

There's a fault in the handling of 'simple' transforms in ldml_keyboards.py. It doesn't handle competing transforms when one is a prefix of the other. The LDML specification (Version 35) defines the behaviour in Section 5.18 with the examples 'ab', 'abc' and 'abd'.

I found a real-life example with my XSAMPA keyboard. Starting from an ASCII keyboard, I added the transforms

<transforms type="simple">
    <!--transform from="B" to="β"/--> <!-- U+03b2 GREEK SMALL LETTER BETA -->
    <transform from="B\\" to="ʙ"/> <!-- U+0299 LATIN LETTER SMALL CAPITAL B -->
    <transform from="B\\\\" to = "B"/>
    <!--transform from="a~" to = "a\u0303"/--><!--to="ã"/--> <!-- U+00E3 LATIN SMALL LETTER A WITH TILDE -->
     <transform from="A\\_._F" to="Ậ"/> <!-- U+1EAC LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND DOT BELOW -->
</transforms>

and tried the input

[shift B05][C12][C12][C12][C12]
[shift C01][C12][shift E11][B09][shift E11][shift C04][BKSP]

I defined no settings element.

I got

B, ʙ, ʙ\, ʙ\\, ʙ\\\
A, A\, A\_, A\_., A\_._, Ậ, 

The transform of the 2-character sequence "B\" is applied without waiting to see if the longer 3-character sequence "B\\" is input. The much longer 6-character input sequence "A\_._F" was handled correctly. Incidentally, the input handler chokes on redundant backslashes, as can be seen by deleting one of the backslashes from the last of the 'simple' transforms.

The problem is that the transforms seem to be applied in the KMfL fashion, whereby a match is taken as soon as it is input, rather than by the scheme used in the related Emacs MULE and M17n keyboards, which look for the longest match before committing the conversion.
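
The longest-match behaviour described above can be sketched as follows. This is a minimal illustration and not the code in ldml_keyboards.py: the function name apply_simple_transforms and the RULES table are hypothetical, the rules are given after XML unescaping, and only committed output is modelled (no timeouts and no display of pending input).

```python
def apply_simple_transforms(keys, rules):
    """Apply prefix-competing transforms, committing only the longest match."""
    out = []
    pending = ""

    def resolve(final):
        nonlocal pending
        while pending:
            # Wait while a longer rule could still match the pending input.
            if not final and any(
                src != pending and src.startswith(pending) for src in rules
            ):
                return
            # Otherwise commit the longest rule that begins the pending input.
            match = max(
                (src for src in rules if pending.startswith(src)),
                key=len,
                default=None,
            )
            if match is None:
                out.append(pending[0])      # no rule applies: pass one character
                pending = pending[1:]
            else:
                out.append(rules[match])
                pending = pending[len(match):]

    for ch in keys:
        pending += ch
        resolve(final=False)
    resolve(final=True)                     # end of input: flush what is left
    return "".join(out)

# "B\\" is B plus one backslash; "B\\\\" is B plus two backslashes.
RULES = {"B\\": "ʙ", "B\\\\": "B"}
print(apply_simple_transforms("B\\", RULES))    # ʙ
print(apply_simple_transforms("B\\\\", RULES))  # B
```

With this scheme the shorter rule is only committed once the next character rules out the longer one, or when the input ends.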
