google / emoji4unicode

Automatically exported from code.google.com/p/emoji4unicode

License: Apache License 2.0

Python 97.93% Shell 2.07%

emoji4unicode's Introduction

Project name: emoji4unicode
Project location: http://code.google.com/p/emoji4unicode/

Google uses Private Use mappings to represent Emoji ("picture character")
symbols in Unicode text.
These characters are commonly used by Japanese cell phone carriers.
This project makes these mappings available.

Google and other members of the Unicode consortium are also developing
a proposal for the addition of standardized Emoji symbol characters to Unicode.
This project also provides data and tools that can be used in the development
of the proposal.
The tools are Python scripts that provide for consistency checks,
reports on the data, and chart generation.

The project documentation is available at
http://sites.google.com/site/unicodesymbols/Home/emoji-symbols
and its subpages.

emoji4unicode's Issues

Place circled ideographs in a common block

MarkD commenting on issue 19:

I'd suggest all the circled ones together, either before or after all the
squared ones.

e-B2B
e-B3D
e-B43
e-B50
next to the other circled ones.

Original issue reported on code.google.com by katmomoi on 5 Dec 2008 at 5:45

review again ARIB vs. Emoji

Compare again the sets of ARIB and Emoji symbols. Review currently proposed 
unifications, and review whether we should unify more symbols.

Original issue reported on code.google.com by markus.icu on 26 Nov 2008 at 10:27

test that Unicode character names in <ann>otations match UnicodeData.txt

For example, in cross-references.
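
A minimal sketch of such a test, under stated assumptions: the cross-reference shape `x (name - XXXX)` is borrowed from NamesList.txt and may not match the project's actual <ann> content, and Python's `unicodedata` module stands in for a parsed UnicodeData.txt (it reflects whatever Unicode version the Python build ships). The function name `check_cross_reference` is hypothetical.

```python
import re
import unicodedata

# Assumed cross-reference shape, borrowed from NamesList.txt:
#   x (white smiling face - 263A)
CROSS_REF = re.compile(r"^x \((?P<name>.+) - (?P<cp>[0-9A-F]{4,6})\)$")


def check_cross_reference(annotation):
    """Return an error string, or None if the line is fine or not a cross-reference."""
    match = CROSS_REF.match(annotation.strip())
    if match is None:
        return None
    code_point = int(match.group("cp"), 16)
    given = match.group("name").upper()
    # unicodedata stands in for UnicodeData.txt here (an assumption).
    expected = unicodedata.name(chr(code_point), None)
    if expected is None:
        return "U+%04X has no name in this Unicode version" % code_point
    if given != expected:
        return "U+%04X is %s, not %s" % (code_point, expected, given)
    return None


print(check_cross_reference("x (white smiling face - 263A)"))
print(check_cross_reference("x (white smiley face - 263A)"))
```

Lines that are not cross-references are passed through without complaint, so the check can run over every annotation.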

Original issue reported on code.google.com by markus.icu on 15 Dec 2008 at 8:41

e-1B5 YEOMAN OF THE GUARD misnamed?

William Overington pointed me to a thread at
http://forum.high-logic.com/viewtopic.php?p=10544#p10544 where he made the following comment:

In the http://www.unicode.org/~scherer/emoji4u ... t/utc.html document as 
it stands as I write this post, I did notice one item straightaway.

About a fifth of the way down is an item labelled as YEOMAN OF THE GUARD 
with the note • Beefeater, British below it. Yet the illustration is of a 
Guardsman of a Foot Guard regiment, not of a Yeoman of the Guard, which is 
a different group.

The Yeomen of the Guard wear Tudor-style uniforms, with a flat hat. They 
are typically shown on the television at the State Opening of Parliament in 
England. They are not the same group as the Yeoman Warders of the Tower of 
London. Readers who know of the Gilbert and Sullivan Opera named Yeoman of 
the Guard may also know that that name is in that Opera being wrongly 
applied to the Yeoman Warders of the Tower of London.

There are pictures on the following page.

http://en.wikipedia.org/wiki/Yeomen_of_the_Guard

http://en.wikipedia.org/wiki/Yeomen_Warders

http://en.wikipedia.org/wiki/Foot_Guards

Original issue reported on code.google.com by markus.icu on 22 Dec 2008 at 6:44

change some ARIB/AMD6 character names

ARIB-9003=U+26CE TRAFFIC WARNING just looks like another heavy exclamation 
point. We should propose changing its name to something descriptive of the 
shape rather than its Japanese TV Symbols semantics.

Please add comments for further proposed changes of names of characters in 
the encoding pipeline.

Original issue reported on code.google.com by markus.icu on 13 Dec 2008 at 6:54

add new SoftBank symbol IDs

SoftBank symbols have old and new IDs. The current data contains the old 
IDs. Add the new IDs and show them in the generated charts.

Original issue reported on code.google.com by markus.icu on 24 Nov 2008 at 10:16

More unifications of shapes and arrows?

See http://www.unicode.org/charts/symbols.html
especially
Geometrical Shapes http://www.unicode.org/charts/PDF/U25A0.pdf
Arrows http://www.unicode.org/charts/PDF/U2190.pdf
and http://www.unicode.org/charts/PDF/U27F0.pdf
and http://www.unicode.org/charts/PDF/U2900.pdf
Miscellaneous Symbols and Arrows 
http://www.unicode.org/charts/PDF/U2B00.pdf
(e.g. squares like e-B6B, e-B71, etc.)

See Open Issues: https://docs.google.com/Doc?docid=ddsrrpj5_44cnr2fj64

Original issue reported on code.google.com by markus.icu on 2 Dec 2008 at 6:11

count symbols that are proposed to be added

... and report to people involved in starting the font design

Count neither unified symbols nor in_proposal="no" symbols.

Maybe print at the bottom of the HTML charts?
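
A rough sketch of the count, assuming a simplified data shape: <e> elements and the in_proposal="no" attribute come from this tracker, but treating a `unicode` attribute as the marker for "unified" is an assumption, and the sample entries and the function name `count_proposed` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample data; the real emoji4unicode.xml may use different attributes.
SAMPLE = """<emoji4unicode>
  <e id="001" name="SAMPLE UNIFIED SYMBOL" unicode="2600"/>
  <e id="002" name="SAMPLE NEW SYMBOL"/>
  <e id="003" name="SAMPLE CARRIER LOGO" in_proposal="no"/>
</emoji4unicode>"""


def count_proposed(root):
    """Count <e> elements that are neither unified nor in_proposal="no"."""
    count = 0
    for e in root.iter("e"):
        if e.get("in_proposal") == "no":
            continue  # explicitly excluded from the proposal
        if e.get("unicode") is not None:
            continue  # assumed to mean "unified with a Unicode character"
        count += 1
    return count


print(count_proposed(ET.fromstring(SAMPLE)))
```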

Original issue reported on code.google.com by markus.icu on 27 Nov 2008 at 12:04

e-4C5 proposal

Per comment by KameOyaji:

http://groups.google.com/group/emoji4unicode/browse_thread/thread/b2e536ad9835946

e-4C5: BOUTIQUE 109 should not be in the UTC proposal since it refers to a
fashion chain boutique store in Tokyo -- the one in Shibuya being most famous. 

SoftBank's name for this Emoji is Shibuya, not Shibuya 109. But it
remains true that the image has the 109 logo in it.

I propose to 

1) change the name to SHIBUYA
2) move it to the Softbank specific section and move it out of the UTC proposal

Original issue reported on code.google.com by katmomoi on 6 Dec 2008 at 6:36

Move "OPENWAVE" to KDDI specific category

OPENWAVE:
http://www.unicode.org/~scherer/emoji4unicode/snapshot/full.html#e-B89

is a business logo and should be moved to KDDI specific section and be
removed from the UTC version of the table.

Original issue reported on code.google.com by katmomoi on 2 Dec 2008 at 11:40

check character names

- if unified with an existing character,
  verify that the name is the same as that character's
- otherwise, verify that the name is not already used in Unicode
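
The two checks above can be sketched as follows. This is an illustration only: Python's `unicodedata` module stands in for the project's own Unicode name data, it reflects a much later Unicode version than the one this proposal targets, and the function name `check_name` is hypothetical.

```python
import unicodedata


def check_name(name, unified_code_point=None):
    """Return an error string, or None if the name passes the check."""
    if unified_code_point is not None:
        # Unified: the name must equal the existing character's name.
        existing = unicodedata.name(chr(unified_code_point), None)
        if existing != name:
            return "U+%04X is named %s, not %s" % (unified_code_point, existing, name)
        return None
    # Not unified: the name must not already be taken.
    try:
        clash = unicodedata.lookup(name)
    except KeyError:
        return None  # the name is unused
    code_points = " ".join("U+%04X" % ord(c) for c in clash)
    return "name %s is already used by %s" % (name, code_points)


print(check_name("SNOWMAN", 0x2603))
print(check_name("SNOWMAN"))
```

Because the bundled Unicode version already contains the Emoji that were eventually encoded, a real check would need name data for the Unicode version the proposal is written against.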

Original issue reported on code.google.com by markus.icu on 24 Nov 2008 at 10:32

short version of the chart without carriers' Unicode/Shift-JIS/ISO-2022-JP codes

Optionally generate a shorter version of the chart, with fewer details,
less scrolling, fewer printed pages.

We don't seem to need the carrier codes except for examining cross-mappings.

Original issue reported on code.google.com by markus.icu on 4 Dec 2008 at 12:36

generate font designer chart and data

Generate an HTML chart suitable for a font designer.

Add support for <design> sub-elements of <e> in emoji4unicode.xml.

Show Unicode code points for the new symbols font, instead of symbol IDs. 
Probably something like base+ID where base is a Unicode Private Use code 
point (Maybe U+E000 if we want to use BMP code points, or U+FF000 if we 
want to avoid collisions with other PUA uses.) The "ID" is our symbol ID. 
(A font designer will need a code point, and we can easily read our ID from 
that if we know the base.)

Hide irrelevant data, like carrier Unicode/Shift-JIS/JIS codes.
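
The base+ID arithmetic described above is small enough to sketch. It assumes that symbol IDs such as "4B0" are hexadecimal and three digits wide, and it uses U+FF000, one of the two bases floated above; the function names are hypothetical.

```python
PUA_BASE = 0xFF000  # one candidate base from the issue text; 0xE000 is the other


def symbol_id_to_code_point(symbol_id, base=PUA_BASE):
    """Map a symbol ID like "4B0" (assumed hexadecimal) to a Private Use code point."""
    return base + int(symbol_id, 16)


def code_point_to_symbol_id(code_point, base=PUA_BASE):
    """Recover the three-digit symbol ID from a font code point."""
    return "%03X" % (code_point - base)


cp = symbol_id_to_code_point("4B0")
print("e-4B0 -> U+%04X" % cp)                  # e-4B0 -> U+FF4B0
print("back  -> e-" + code_point_to_symbol_id(cp))  # back  -> e-4B0
```

With either base, the ID remains readable in the low digits of the code point, which is the property the issue asks for.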

Original issue reported on code.google.com by markus.icu on 26 Nov 2008 at 11:17

add AMD6 character data

Add data (at least code points & names) for ISO 10646 AMD6 characters and 
check for character name problems.

Original issue reported on code.google.com by markus.icu on 15 Dec 2008 at 5:51

Report which carrier's images we use for the symbol representation

For the font design, we want to know how many symbols use representative
images from which carrier. Also, given some existing or assumed resources
for some of the symbol sets, we want to count remaining symbols where we
may need to start from scratch.

Original issue reported on code.google.com by markus.icu on 5 Dec 2008 at 8:29

per-symbol chart anchors?

Consider adding per-symbol anchors into the generated HTML charts to make 
it easy to jump to a symbol row via #e-4B0 or similar.
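
One possible way to emit such anchors, sketched with a hypothetical `symbol_row` helper; the actual chart generator's row layout is not shown in this tracker, so the two-column row here is an assumption.

```python
from html import escape


def symbol_row(symbol_id, name):
    """Build a table row whose id attribute doubles as the #e-XXX anchor."""
    anchor = "e-" + symbol_id
    return '<tr id="%s"><td><a href="#%s">%s</a></td><td>%s</td></tr>' % (
        anchor, anchor, anchor, escape(name))


print(symbol_row("4B0", "SAMPLE NAME"))
```

Putting the id on the row itself means no extra empty <a name> element is needed, and the self-link gives readers something to copy.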

Original issue reported on code.google.com by markus.icu on 24 Nov 2008 at 10:25

change SoftBank Japanese names to use official SoftBank symbol names

We received data with the official names.

Original issue reported on code.google.com by markus.icu on 2 Dec 2008 at 1:04

check that all symbols for each carrier are covered by emoji4unicode.xml

That is, covered by round-trip mappings with symbols.

Original issue reported on code.google.com by markus.icu on 27 Nov 2008 at 12:07

e-7D5: SKIER

The ARIB-9138 (U+26F7) character has a ski with a person on it.
All 3 carriers have just a ski and a boot on it. If the ARIB name were
something like "SKIING", this would be acceptable, but the proposed name is
"SKIER". Unless the ARIB name can be changed, I suggest that we keep it as
SKIING or something like "A SKI AND A BOOT".

Original issue reported on code.google.com by katmomoi on 16 Dec 2008 at 1:58

Reconsider unification of e-4B4 (hospital)

e-4B4: BLACK CROSS ON SHIELD

This unification was made in r65: 

http://code.google.com/p/emoji4unicode/source/detail?r=65

There is a new Unicode character ARIB-9109 (U+26E8), but this is a black
cross within a shield. If you look at the Emoji for the 3 carriers, you see that
all of them show a house with a cross on it.

I feel that we should not unify this unless the ARIB proposal name can be
changed to something like "HOSPITAL CROSS".

Original issue reported on code.google.com by katmomoi on 16 Dec 2008 at 1:49

should softbank #239 #old45 be generic clock e-02A instead of 10 oclock e-027?

While DoCoMo and KDDI have just a single "clock symbol", SoftBank has 12 
symbols, one per full hour. We currently round-trip e-027 "10 oclock" with 
SoftBank #239=#old45 which makes sense given the continuity in the old 
numbers. However, the current SoftBank #239 image is quite different from 
their other clocks, and both the (new?) image and the new numbers (they now 
jump from #369 "9 oclock" to #370 "11 oclock") suggest that they have 
changed their preference.

It seems like we should have a round-trip between e-02A "clock symbol" and 
SoftBank #239=#old45, and a fallback from e-027 "10 oclock" to that same 
SoftBank symbol.

Original issue reported on code.google.com by markus.icu on 5 Dec 2008 at 5:00

Unify e-B82 with ARIB-9071 squared key?

Unify e-B82 key/password with ARIB-9071=U+26BF, which is a key inside a 
square?

Original issue reported on code.google.com by markus.icu on 25 Nov 2008 at 5:28

correct spelling error: e-037: OPHIUCUS

A suggestion from: Werner LEMBERG <[email protected]> to the [email protected].

"Maybe a typo: Ophiucus is normally written as Ophiuchus (at least in
the Astronomical world)."

Sometimes we see the spelling OPHIUCUS but the majority of web references use:

OPHIUCHUS

I propose to rename this entry to OPHIUCHUS.

Original issue reported on code.google.com by katmomoi on 18 Dec 2008 at 7:30

Unify e-4BC FOUNTAIN with ARIB-9125?

I think e-4BC FOUNTAIN can be unified with ARIB 9125 = new U+26F2

Original issue reported on code.google.com by [email protected] on 12 Dec 2008 at 1:19

misspelled character names

James Kass reports the following misspellings:
e-355 SPEAK NO EVIL MONEY
e-502 BOOK WITH VERICAL FILL

Original issue reported on code.google.com by markus.icu on 19 Dec 2008 at 6:56

Cross-map with WingDings, starting points for glyphs?

From Asmus Freytag 2008-feb-01:

Clock faces, computer/document icons, as well as a rather significant
number of other symbols are present in the suite of wingdings fonts
distributed by Microsoft. A cross mapping to these would be a useful
exercise - not the least because these fonts represent existing black
and white interpretations of the glyph shape(s) for such symbols. These
glyphs might represent possible starting points for representative
glyphs, should these characters be encoded.

See Open Issues: https://docs.google.com/Doc?docid=ddsrrpj5_44cnr2fj64

Original issue reported on code.google.com by markus.icu on 2 Dec 2008 at 6:11

Add a revision author name in emoji4unicode.xml

For correctness' sake, add the name of "Satoru Takabayashi (revisions)"
under "Main authors:" in trunk/data/emoji4unicode.xml

Original issue reported on code.google.com by katmomoi on 27 Nov 2008 at 9:13

change index finger unifications

In discussing issue #16, Mark/Kat/Markus looked at the adjacent e-B98..e-B9C
symbols with hands and index fingers pointing in various directions.

We noticed that e-B98 shows the palm of the hand while the other four show
the back of the hand.

We agreed to unify e-B98 -- rather than e-B99 -- with U+261D WHITE UP
POINTING INDEX because the existing characters all show the palm of the hand.

We agreed to disunify e-B99..e-B9C from existing characters and give the
new symbols names like the existing ones but with "BACKHAND" inserted.

Original issue reported on code.google.com by markus.icu on 4 Dec 2008 at 9:12

Is e-1E3 part of the proposal?

e-1E3 does not round-trip to any carrier and does not have any
representation. I assume it's a Google-created symbol. If so, then it
should be in_proposal="no".

Original issue reported on code.google.com by markus.icu on 5 Dec 2008 at 1:09

do e-B7C..e-B80 belong in the "School (7. Abstract Concepts)" subcategory?

They look to me like they are phone IME status indicators and belong in
"Communication (4. Artifacts)" or "Phone Specific (5.
Activities/Work/Entertainment)".

Original issue reported on code.google.com by markus.icu on 5 Dec 2008 at 1:16

chart: show whether unification is with existing character or upcoming one

The charts should show whether we are unifying with an existing (Unicode 
5.1) character or an upcoming one (Unicode 5.2/AMD6). For upcoming ones we 
should not show the actual characters because there won't be fonts for 
them.

Original issue reported on code.google.com by markus.icu on 16 Dec 2008 at 10:45

Should e-83C PDC be part of the proposal?

Its description says "Personal Digital Cellular Symbol/Logo. ?"

Original issue reported on code.google.com by markus.icu on 2 Dec 2008 at 6:16

More unifications with enclosed letters?

Notes from feedback from UTC meeting 2008-feb-05:
KDDI 279 Top Secret Sign =U+3299?
Softbank 201 Existence Sign =U+3292
Softbank 203 Monthly Sign =U+328A (there may be additional characters in 
the vicinity of U+328A that can be unified with other symbols in question 
here)
KDDI 384 Service Sign =U+32DA?
KDDI 402 Celebration Sign =U+3297?

TODO: Need to verify unifications and apply them in the table.

Discuss: Any further feedback on the proposed unifications? Characters like 
U+3299 are much less styled (they look like the Han characters with 
enclosing circle) than the symbols in the Emoji context.

Peter Edberg: Docs use KDDI #279 & U+3299 together
Mark: Consider whether plain text distinction
Peter: Maybe VS?
Markus: JIS source separation but otherwise unify

See Open Issues: https://docs.google.com/Doc?docid=ddsrrpj5_44cnr2fj64

Original issue reported on code.google.com by markus.icu on 2 Dec 2008 at 6:14

pick a Shift-JIS mapping table and verify source separation

There should be no intersection of Emoji unifications with Shift-JIS round-trips.

It is not clear whether this is the right thing to do if the Shift-JIS round-trip is
outside the JIS X 0208 part.

Original issue reported on code.google.com by markus.icu on 24 Nov 2008 at 10:33

scissor in hand vs. victory hand

e-B94 SCISSOR IN HAND GAME is currently disunified from U+270C VICTORY HAND

According to my notes from the 2008-aug UTC meeting:
Ken agreed with the disunification because they are distinct gestures in 
the real world.
Mark recommended to unify now, and if the distinction is really needed 
later, then add a specific character at that time.
The meeting did not end conclusively on this topic.

Original issue reported on code.google.com by markus.icu on 2 Dec 2008 at 6:10

propose code points for new symbol characters

For the encoding proposal, new symbol characters will need proposed Unicode 
code points. We should try to fill existing blocks with similar symbols, 
and otherwise add symbols on plane 1.

Original issue reported on code.google.com by markus.icu on 2 Dec 2008 at 5:11

show geta mark when there is no carrier symbol and no text_fallback

In the HTML chart, in a per-carrier-per-symbol cell, if there is no 
corresponding carrier symbol and there is also no text_fallback for the 
symbol, show the geta mark (U+3013 〓) in text fallback style, rather than 
"-".

This is how charts used to show this information with previous tools. 
(Before this project was created.)
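
The cell rule above amounts to a three-way fallback. A minimal sketch, with a hypothetical `carrier_cell` helper that returns a (style, text) pair for the chart generator to render:

```python
GETA_MARK = "\u3013"  # U+3013 GETA MARK


def carrier_cell(carrier_symbol, text_fallback):
    """Pick what a per-carrier-per-symbol chart cell should show."""
    if carrier_symbol:
        return ("symbol", carrier_symbol)
    if text_fallback:
        return ("fallback", text_fallback)
    # Neither a carrier symbol nor a text_fallback: show the geta mark
    # in text fallback style instead of "-".
    return ("fallback", GETA_MARK)


print(carrier_cell(None, None))
```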

Original issue reported on code.google.com by markus.icu on 27 Nov 2008 at 7:25

decorative length marks

According to my notes from the 2008-aug UTC meeting, the discussion ended 
with Ken's recommendation to unify e-B07 wavy length mark with U+3030 wavy 
dash (without change in properties), and to encode e-B08 looped length mark 
as a new compatibility character.

Original issue reported on code.google.com by markus.icu on 2 Dec 2008 at 6:09

review comments in the chart about possible unifications

Look for "May unify with..." or "Possibly unify with..." and similar 
comments.

For example, one of the happy faces should be unified with U+263A.

Original issue reported on code.google.com by markus.icu on 5 Dec 2008 at 5:40

source separation errors: arrows U+2190..3

emoji4unicode/src$ ./emoji4unicode_test.py
source separation error: e-AF8 = U+2191 = Shift-JIS-81AA
source separation error: e-AF9 = U+2193 = Shift-JIS-81AB
source separation error: e-AFA = U+2192 = Shift-JIS-81A8
source separation error: e-AFB = U+2190 = Shift-JIS-81A9

They are e-AF8 UPWARDS ARROW .. e-AFB LEFTWARDS ARROW.

Disunify and rename with prefix "HEAVY"?
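
For illustration, a check of this kind can be approximated with Python's built-in `shift_jis` codec. This is not the code in emoji4unicode_test.py, and the built-in codec is only a stand-in for whichever Shift-JIS mapping table the project picks; the function names are hypothetical.

```python
def shift_jis_round_trip(code_point):
    """Return the Shift-JIS bytes as hex if the character round-trips, else None."""
    char = chr(code_point)
    try:
        encoded = char.encode("shift_jis")
    except UnicodeEncodeError:
        return None
    if encoded.decode("shift_jis") != char:
        return None  # one-way mapping only, not a round-trip
    return encoded.hex().upper()


def source_separation_errors(unifications):
    """List unifications whose Unicode character also round-trips with Shift-JIS."""
    errors = []
    for symbol_id, code_point in sorted(unifications.items()):
        sjis = shift_jis_round_trip(code_point)
        if sjis is not None:
            errors.append("source separation error: e-%s = U+%04X = Shift-JIS-%s"
                          % (symbol_id, code_point, sjis))
    return errors


# The four arrow unifications reported above.
arrows = {"AF8": 0x2191, "AF9": 0x2193, "AFA": 0x2192, "AFB": 0x2190}
for line in source_separation_errors(arrows):
    print(line)
```

A unification with a character outside the codec's repertoire, such as U+261D, produces no error in this sketch.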

Original issue reported on code.google.com by markus.icu on 6 Dec 2008 at 12:56

format <ann>otations like in the Unicode book; check their syntax

Check that our <ann>otations follow the syntax of a small subset of the 
Unicode NamesList.txt file format. (Except that we don't use the TAB which 
starts most CHAR_ENTRY lines; in an <ann> element that's implied.)

Format the recognized types of annotations as described in NamesList.html. 
For example, it says there:

COMMENT_LINE: TAB "*" SP EXPAND_LINE
    // * is replaced by BULLET, output line as comment
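
As a sketch, the COMMENT_LINE rule quoted above could be applied to an <ann> line like this. Only that one rule is handled, the leading TAB is taken as implied, and the function name `format_annotation` is hypothetical.

```python
BULLET = "\u2022"  # U+2022 BULLET


def format_annotation(line):
    """Format one annotation line; only the COMMENT_LINE rule is implemented."""
    if line.startswith("* "):
        # COMMENT_LINE: "*" SP text, shown with a bullet in place of the asterisk
        return BULLET + " " + line[2:]
    return line  # other annotation types are passed through unchanged


print(format_annotation("* Beefeater, British"))
```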

Original issue reported on code.google.com by markus.icu on 15 Dec 2008 at 8:45

circles and mappings with softbank #205..207

Should the SoftBank symbols #205, #206, #207 (見出しボタン 1..3) map to
e-B63=KDDI #40, e-B64=KDDI #41 and e-B67 as the data currently does?

These symbols are used as menu selectors.

Also review the name of e-B67 with regard to being in a series with the 
other two SoftBank symbols.

Original issue reported on code.google.com by markus.icu on 25 Nov 2008 at 5:25

rename the 4 operators in "Arithmetic Operators (7. Abstract Concepts)"?

Should we rename the 4 operators to "HEAVY" followed by exactly the names
of the corresponding normal characters?

That would mean
e-B51 HEAVY PLUS -> HEAVY PLUS SIGN
e-B52 HEAVY MINUS -> HEAVY HYPHEN-MINUS
e-B53 HEAVY TIMES -> HEAVY MULTIPLICATION SIGN
e-B54 HEAVY DIVISION -> HEAVY DIVISION SIGN
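
The proposed names can be derived mechanically from the names of the normal characters. A small sketch using Python's `unicodedata`; the pairing of symbol IDs with ASCII and Latin-1 operators is inferred from the names listed above, and `heavy_name` is a hypothetical helper.

```python
import unicodedata

# Symbol ID -> corresponding normal character (inferred from the proposed names).
OPERATORS = {"B51": "+", "B52": "-", "B53": "\u00D7", "B54": "\u00F7"}


def heavy_name(char):
    """Prefix the exact Unicode name of a character with "HEAVY"."""
    return "HEAVY " + unicodedata.name(char)


for symbol_id, char in sorted(OPERATORS.items()):
    print("e-%s %s" % (symbol_id, heavy_name(char)))
```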

Original issue reported on code.google.com by markus.icu on 5 Dec 2008 at 1:22

rename symbols that are currently "xyz 2"

For example, we have HOURGLASS and HOURGLASS 2. It would be good to have 
somewhat more interesting name variations than appending "2".

Original issue reported on code.google.com by markus.icu on 5 Dec 2008 at 6:18

Correct an error in Softbank Emoji name

e-7E5: The original Softbank data says the name is 「RV者」.

This is clearly a typo. I propose to correct this to 「RV車」.

Original issue reported on code.google.com by katmomoi on 13 Dec 2008 at 7:46

The name for Four Leafed Clover seems odd

... isn't it always called a Four Leaf Clover?

asks David Starner

Original issue reported on code.google.com by markus.icu on 22 Dec 2008 at 7:04

Rename flags to FLAG SYMBOL XX

According to my notes from the 2008-aug UTC meeting, there was consensus to 
rename the flags from
FLAG OF JAPAN
to
FLAG SYMBOL JP
etc.
(using 2-letter country codes)

There was an open issue about what to use for representative glyphs -- real 
flags, or flags with the 2-letter codes in them, or something else.

Original issue reported on code.google.com by markus.icu on 2 Dec 2008 at 6:15

public review 2008-dec

We should post a public review of the chart by mid-December 2008 and invite 
feedback. I will start a cover page on the unicodesymbols site.

Original issue reported on code.google.com by markus.icu on 8 Dec 2008 at 9:36

some additions to font/glyph design instructions

This issue is intended for the collection of various additions of 
font/glyph design instructions via <design> sub-elements of appropriate <e> 
elements in emoji4unicode.xml. We can check in several of these together.

e-83B KEYPAD 10: Should look like digits 10 enclosed by a keycap like 
U+20E3.
(We should move KEYPAD 10 to immediately after KEYPAD 0.)

Original issue reported on code.google.com by markus.icu on 5 Dec 2008 at 5:12

propose additional annotations for existing/upcoming Unicode characters

We have annotations on symbols that are unified with existing characters. 
We should propose those annotations as additional annotations on their 
characters.

We can also collect additional annotations for other existing/upcoming 
characters in this issue.

Original issue reported on code.google.com by markus.icu on 15 Dec 2008 at 8:26