
Comments (6)

hhzl commented on July 17, 2024

Juan, Fri, Dec 4, 2015 at 2:32 PM, to mailing list

Current status of Cuis Unicode support:

  1. Limited Unicode support (as in Cuis). Can handle Unicode characters in Strings. Comfortable display and editing of text is restricted to the Latin alphabet. Non-ISO-8859-15 characters are represented as NCRs, but are not instances of Character themselves. For example, an NCR such as '&#945;' (made of six 8-bit Characters) represents the Greek letter alpha, and is properly handled if such a String is converted to a UTF-8 ByteArray (for example, for copying into the Clipboard or for serving web pages). In short, you cannot directly edit or display general Unicode, but you can embed it in code, include it in Strings, copy and paste it, and serve web pages with it.

  2. Ken's Cuis-Smalltalk-Unicode. Can display and edit. Includes the great Ropes representation for Strings. Limited font support.
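
To make the NCR scheme from item 1 concrete, here is a minimal sketch in Python (not Cuis code, and not how Cuis implements it). It expands decimal and hexadecimal NCRs embedded in an 8-bit string and then encodes the result as UTF-8. The names `expand_ncrs` and `ncr_string_to_utf8` are made up for this illustration.

```python
import re

# Matches decimal (&#945;) and hexadecimal (&#x3B1;) numeric character references.
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def expand_ncrs(text: str) -> str:
    """Replace every NCR in text with the code point it names."""
    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # chr() raises ValueError for values above 0x10FFFF; a real
        # converter would have to decide how to treat such input.
        return chr(code_point)
    return NCR_PATTERN.sub(replace, text)


def ncr_string_to_utf8(text: str) -> bytes:
    """Expand embedded NCRs, then encode the whole string as UTF-8."""
    return expand_ncrs(text).encode("utf-8")


print(len("&#945;"))                        # 6 characters in the NCR itself
print(ncr_string_to_utf8("alpha: &#945;"))  # b'alpha: \xce\xb1'
```

The six-character NCR becomes the two-byte UTF-8 sequence 0xCE 0xB1 once converted, which is what a clipboard or a web client would receive.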

References:
NCR: http://en.wikipedia.org/wiki/Numeric_character_reference
Ken's: https://github.com/KenDickey/Cuis-Smalltalk-Unicode

from cuis-smalltalk-dev.

hhzl commented on July 17, 2024

Explain the function of #initializeUnicodeCodePoints


hhzl commented on July 17, 2024

http://utf8everywhere.org/
(7)
By design of this encoding, UTF-8 guarantees that an ASCII character value or a substring will never match a part of a multi-byte encoded character.
....
Also, you can search for a non-ASCII, UTF-8 encoded substring in a UTF-8 string as if it was a plain byte array—there is no need to mind code point boundaries. This is thanks to another design feature of UTF-8—a leading byte of an encoded code point can never hold value corresponding to one of trailing bytes of any other code point.
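
A small Python sketch of the byte-level search property quoted above. It assumes both the text and the search term are valid UTF-8 and ignores Unicode normalization. The helper names `is_continuation_byte` and `find_all` are invented for this example.

```python
def is_continuation_byte(b: int) -> bool:
    """True for UTF-8 trailing bytes, which always look like 10xxxxxx."""
    return (b & 0xC0) == 0x80


def find_all(haystack: bytes, needle: bytes) -> list:
    """Return the byte offsets of every occurrence of needle in haystack."""
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


text = "Les \u00e9l\u00e8ves Fran\u00e7ais"  # 'Les élèves Français'
data = text.encode("utf-8")

# Search for a non-ASCII substring as if data were a plain byte array.
cedilla_offsets = find_all(data, "\u00e7".encode("utf-8"))
# Search for an ASCII character; it cannot match inside a multi-byte sequence.
e_offsets = find_all(data, b"e")

print(cedilla_offsets)  # [17]
print(e_offsets)        # [1, 10]

# Every match starts on a code point boundary, never on a trailing byte.
print(all(not is_continuation_byte(data[i]) for i in cedilla_offsets + e_offsets))  # True
```

The plain `e` is found at offsets 1 and 10 only. The bytes of `é` and `è` are all 0x80 or above, so they can never be mistaken for the ASCII letter.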

(10)
Always produce text output files in UTF-8.

(11)

Q: What do you think about line endings?

A: Always use \n (0x0a) line endings, even on Windows. Files should be read and written in binary mode, which guarantees interoperability—a program will always give the same output on any system. Since the C and C++ standards use \n as in-memory line endings, this will cause all files to be written in the POSIX convention.
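
One way to follow recommendations (10) and (11) together, sketched in Python: normalize line endings to \n in memory and write the UTF-8 bytes in binary mode, so the platform never translates them. The function names and the temporary file are placeholders for this illustration only.

```python
import os
import tempfile


def write_utf8_lf(path: str, text: str) -> None:
    """Write text as UTF-8 with LF line endings, regardless of platform."""
    # Normalize CRLF and lone CR to LF before encoding.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # Binary mode: the bytes reach the file exactly as given.
    with open(path, "wb") as f:
        f.write(normalized.encode("utf-8"))


def read_raw(path: str) -> bytes:
    """Read the file in binary mode, without newline translation."""
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as directory:
    demo_path = os.path.join(directory, "demo.txt")
    write_utf8_lf(demo_path, "\u00e9\r\n\u03b1\n")
    raw = read_raw(demo_path)

print(raw)  # b'\xc3\xa9\n\xce\xb1\n'
```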


hhzl commented on July 17, 2024

Code conversion

Squeak

 (UTF8TextConverter new encodeString: 'Les élèves Français') asByteArray 

 #[76 101 115 32 195 169 108 195 168 118 101 115 32 70 114 97 110 195 167 97 105 115]

http://wiki.squeak.org/squeak/6224

Pharo

'Les élèves Français' utf8Encoded
#[76 101 115 32 195 169 108 195 168 118 101 115 32 70 114 97 110 195 167 97 105 115]

http://files.pharo.org/books/enterprisepharo/book/Zinc-Encoding-Meta/Zinc-Encoding-Meta.html

$é is encoded in ISO-8859-1 as #[233], but as #[195 169] in UTF-8.
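
For cross-checking outside a Smalltalk image, the same byte values can be reproduced with a few lines of Python. This is only a verification sketch; the variable names are arbitrary.

```python
E_ACUTE = "\u00e9"  # é

latin1_bytes = list(E_ACUTE.encode("iso-8859-1"))
utf8_bytes = list(E_ACUTE.encode("utf-8"))
print(latin1_bytes)  # [233]
print(utf8_bytes)    # [195, 169]

phrase = "Les \u00e9l\u00e8ves Fran\u00e7ais"  # 'Les élèves Français'
phrase_bytes = list(phrase.encode("utf-8"))
print(phrase_bytes)
# [76, 101, 115, 32, 195, 169, 108, 195, 168, 118, 101, 115, 32,
#  70, 114, 97, 110, 195, 167, 97, 105, 115]

# Decoding the bytes gives back the original string.
print(bytes(phrase_bytes).decode("utf-8") == phrase)  # True
```

The 19-character phrase takes 22 bytes in UTF-8 because each of the three accented letters needs two bytes.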


hhzl commented on July 17, 2024

Class Unicode in Squeak and Pharo
http://wiki.squeak.org/squeak/6225


jvuletich commented on July 17, 2024

This issue has been stagnant for five years. If you think further action is in order, please discuss at https://lists.cuis.st/mailman/listinfo/cuis-dev

