Unicode::Emoji
Provides Unicode Emoji data and regexes, incorporating the latest Unicode and Emoji standards.
Also includes a categorized list of recommended Emoji.
Emoji version: 15.0 (September 2022)
CLDR version (used for sub-region flags): 43 (April 2023)
Supported Rubies: 3.2, 3.1, 3.0
No longer supported Rubies, but might still work: 2.7, 2.6, 2.5, 2.4, 2.3
If you are stuck on an older Ruby version, check out the latest 0.9 version of this gem.
Gemfile
gem "unicode-emoji"
Usage
Regex
The gem includes a bunch of Emoji regexes, which are compiled out of various Emoji Unicode data sources.
require "unicode/emoji"
string = "String which contains all kinds of emoji:
- Singleton Emoji: 😴
- Textual singleton Emoji with Emoji variation: ▶️
- Emoji with skin tone modifier: 🛌🏽
- Region flag: 🇵🇹
- Sub-Region flag: 🏴󠁧󠁢󠁳󠁣󠁴󠁿
- Keycap sequence: 2️⃣
- Sequence using ZWJ (zero width joiner): 🤾🏽‍♀️
"
string.scan(Unicode::Emoji::REGEX) # => ["😴", "▶️", "🛌🏽", "🇵🇹", "🏴󠁧󠁢󠁳󠁣󠁴󠁿", "2️⃣", "🤾🏽‍♀️"]
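Because the regexes are ordinary Ruby Regexp objects, they also work with gsub and other String methods. The following is a minimal sketch, not part of the gem: extract_emoji is a hypothetical helper name, and it assumes the regex passed in has no capture groups (as the flat scan result above suggests for Unicode::Emoji::REGEX).

```ruby
begin
  require "unicode/emoji" # supplies Unicode::Emoji::REGEX when the gem is installed
rescue LoadError
  # Gem not installed: pass your own regex as the second argument instead.
end

# Hypothetical helper (not part of the gem): returns the emoji found in the
# text together with the text that remains once those emoji are removed.
# The default regex is only looked up when no second argument is given.
def extract_emoji(text, regex = Unicode::Emoji::REGEX)
  emoji = text.scan(regex)      # all matches, in order of appearance
  rest  = text.gsub(regex, "")  # the same text with every match removed
  [emoji, rest]
end
```

With the gem loaded, extract_emoji(string) on the example string above would be expected to return the same seven emoji as the scan call, plus the surrounding text without them.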
Main Regexes
Matches (non-textual) Emoji of all kinds:
Regex | Description | Example Matches | Example Non-Matches |
---|---|---|---|
Unicode::Emoji::REGEX | Use this if unsure! Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of recommended Emoji sequences | 😴, ▶️, 🛌🏽, 🇵🇹, 2️⃣, 🏴󠁧󠁢󠁳󠁣󠁴󠁿, 🤾🏽‍♀️ | 😴︎, ▶, 🏻, 🇵🇵, 🏴󠁧󠁢󠁡󠁧󠁢󠁿, 🤠‍🤢 |
Unicode::Emoji::REGEX_VALID | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of valid Emoji sequences | 😴, ▶️, 🛌🏽, 🇵🇹, 2️⃣, 🏴󠁧󠁢󠁳󠁣󠁴󠁿, 🏴󠁧󠁢󠁡󠁧󠁢󠁿, 🤾🏽‍♀️, 🤠‍🤢 | 😴︎, ▶, 🏻, 🇵🇵 |
Unicode::Emoji::REGEX_WELL_FORMED | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of well-formed Emoji sequences | 😴, ▶️, 🛌🏽, 🇵🇹, 2️⃣, 🏴󠁧󠁢󠁳󠁣󠁴󠁿, 🏴󠁧󠁢󠁡󠁧󠁢󠁿, 🤾🏽‍♀️, 🤠‍🤢, 🇵🇵 | 😴︎, ▶, 🏻 |
Picking the Right Emoji Regex
- Usually you just want REGEX (RGI set)
- If you want broader matching (e.g. more sub-regions), choose REGEX_VALID
- If you also want to match invalid sequences, use REGEX_WELL_FORMED
Please see the standard for details.
Property | REGEX (RGI / Recommended) | REGEX_VALID (Valid) | REGEX_WELL_FORMED (Well-formed) |
---|---|---|---|
Region "🇵🇹" | Yes | Yes | Yes |
Region "🇵🇵" | No | No | Yes |
Tag Sequence "🏴󠁧󠁢󠁳󠁣󠁴󠁿" | Yes | Yes | Yes |
Tag Sequence "🏴󠁧󠁢󠁡󠁧󠁢󠁿" | No | Yes | Yes |
Tag Sequence "🏴󠁧󠁢󠁡󠁡󠁡󠁿" | No | No | Yes |
ZWJ Sequence "🤾🏽‍♀️" | Yes | Yes | Yes |
ZWJ Sequence "🤠‍🤢" | No | Yes | Yes |
More info about valid vs. recommended Emoji in this blog article on Emojipedia.
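To see which of the three levels a given string falls into, the regexes can be tried from strictest to loosest. This is a sketch under assumptions rather than a feature of the gem: emoji_level and the level names :rgi, :valid and :well_formed are made up for illustration, and the helper anchors each regex so that the whole string has to be one match.

```ruby
begin
  require "unicode/emoji"
rescue LoadError
  # Gem not installed: emoji_level still works with any regexes you pass in.
end

# Hypothetical helper (not part of the gem): returns the name of the first
# level whose regex matches the *entire* string, or nil if none does.
# `levels` is a Hash of name => Regexp, ordered from strictest to loosest.
def emoji_level(string, levels)
  levels.each do |name, regex|
    return name if string.match?(/\A(?:#{regex})\z/)
  end
  nil
end

# Only runs when the gem could be loaded.
if defined?(Unicode::Emoji::REGEX)
  levels = {
    rgi:         Unicode::Emoji::REGEX,
    valid:       Unicode::Emoji::REGEX_VALID,
    well_formed: Unicode::Emoji::REGEX_WELL_FORMED,
  }
  # U+1F1F5 U+1F1F5 is the "PP" region pair from the table above.
  p emoji_level("\u{1F1F5 1F1F5}", levels)
end
```

Going by the table above, the "PP" region pair should be reported as :well_formed, since neither REGEX nor REGEX_VALID accepts it.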
Singleton Regexes
Matches only simple one-codepoint (+ optional variation selector) Emoji:
Regex | Description | Example Matches | Example Non-Matches |
---|---|---|---|
Unicode::Emoji::REGEX_BASIC | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all | 😴, ▶️ | 😴︎, ▶, 🏻, 🛌🏽, 🇵🇹, 🇵🇵, 2️⃣, 🏴󠁧󠁢󠁳󠁣󠁴󠁿, 🏴󠁧󠁢󠁡󠁧󠁢󠁿, 🤾🏽‍♀️, 🤠‍🤢 |
Unicode::Emoji::REGEX_TEXT | Matches only textual singleton Emoji (except for singleton components, like digit 1) | 😴︎, ▶ | 😴, ▶️, 🏻, 🛌🏽, 🇵🇹, 🇵🇵, 2️⃣, 🏴󠁧󠁢󠁳󠁣󠁴󠁿, 🏴󠁧󠁢󠁡󠁧󠁢󠁿, 🤾🏽‍♀️, 🤠‍🤢 |
Include Textual Emoji
By default, textual Emoji (emoji characters with a text variation selector or those that have a default text presentation) are not matched by the default regexes. To match them as well, append the _INCLUDE_TEXT suffix to the regex name:
Regex | Description | Example Matches | Example Non-Matches |
---|---|---|---|
Unicode::Emoji::REGEX_INCLUDE_TEXT | REGEX + REGEX_TEXT | 😴, ▶️, 🛌🏽, 🇵🇹, 2️⃣, 🏴󠁧󠁢󠁳󠁣󠁴󠁿, 🤾🏽‍♀️, 😴︎, ▶ | 🏻, 🇵🇵, 🏴󠁧󠁢󠁡󠁧󠁢󠁿, 🤠‍🤢 |
Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT | REGEX_VALID + REGEX_TEXT | 😴, ▶️, 🛌🏽, 🇵🇹, 2️⃣, 🏴󠁧󠁢󠁳󠁣󠁴󠁿, 🏴󠁧󠁢󠁡󠁧󠁢󠁿, 🤾🏽‍♀️, 🤠‍🤢, 😴︎, ▶ | 🏻, 🇵🇵 |
Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT | REGEX_WELL_FORMED + REGEX_TEXT | 😴, ▶️, 🛌🏽, 🇵🇹, 2️⃣, 🏴󠁧󠁢󠁳󠁣󠁴󠁿, 🏴󠁧󠁢󠁡󠁧󠁢󠁿, 🤾🏽‍♀️, 🤠‍🤢, 🇵🇵, 😴︎, ▶ | 🏻 |
Extended Pictographic Regex
Unicode::Emoji::REGEX_PICTO matches single codepoints with the Extended_Pictographic property. For example, it will match ✀ BLACK SAFETY SCISSORS.
Unicode::Emoji::REGEX_PICTO_NO_EMOJI matches single codepoints with the Extended_Pictographic property, but excludes Emoji characters.
See character.construction/picto for a list of all non-Emoji pictographic characters.
Partial Regexes
Matches potential Emoji parts (often, this is not what you want):
Regex | Description | Example Matches | Example Non-Matches |
---|---|---|---|
Unicode::Emoji::REGEX_ANY | Matches any Emoji-related codepoint (but no variation selectors, tags, or zero-width joiners). Please note that this will match Emoji parts rather than complete Emoji, for example, single digits! | 😴, ▶, 🏻, 🛌, 🏽, 🇵, 🇹, 2, 🏴, 🤾, ♀, 🤠, 🤢 | - |
List
Use Unicode::Emoji::LIST or the list method to get a grouped (and ordered) list of Emoji:
Unicode::Emoji.list.keys
# => ["Smileys & Emotion", "People & Body", "Component", "Animals & Nature", "Food & Drink", "Travel & Places", "Activities", "Objects", "Symbols", "Flags"]
Unicode::Emoji.list("Food & Drink").keys
# => ["food-fruit", "food-vegetable", "food-prepared", "food-asian", "food-marine", "food-sweet", "drink", "dishware"]
Unicode::Emoji.list("Food & Drink", "food-asian")
# => ["🍱", "🍘", "🍙", "🍚", "🍛", "🍜", "🍝", "🍠", "🍢", "🍣", "🍤", "🍥", "🥮", "🍡", "🥟", "🥠", "🥡"]
Please note that categories might change with future versions of the Emoji standard. This gem will issue warnings when attempting to retrieve old categories using the #list method.
A list of all Emoji can be found at character.construction.
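The outputs above suggest that the list is a nested Hash of category, then sub-category, then an Array of Emoji. Assuming that shape, a reverse lookup from an Emoji to its group could be sketched as follows. emoji_group is a hypothetical helper, not part of the gem, and the sample data is a small hand-written stand-in rather than the real list.

```ruby
# Hypothetical helper (not part of the gem): finds the [category, subcategory]
# pair under which an emoji is filed in a nested Hash shaped like
# category => { subcategory => [emoji, ...] }. Returns nil if it is not listed.
def emoji_group(list, emoji)
  list.each do |category, subcategories|
    subcategories.each do |subcategory, emojis|
      return [category, subcategory] if emojis.include?(emoji)
    end
  end
  nil
end

# Hand-written stand-in with the assumed shape (not the gem's real data).
sample = {
  "Food & Drink" => {
    "food-asian" => ["\u{1F371}", "\u{1F363}"], # bento box, sushi
    "drink"      => ["\u2615"],                 # hot beverage
  },
}

p emoji_group(sample, "\u{1F363}") # => ["Food & Drink", "food-asian"]
```

With the gem loaded, the same helper could be called with Unicode::Emoji::LIST as the first argument, provided the nesting assumption holds.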
Properties
Allows you to access the codepoint data from Unicode's emoji-data.txt file:
require "unicode/emoji"
Unicode::Emoji.properties "☝" # => ["Emoji", "Emoji_Modifier_Base"]
Also See
- Unicode® Technical Standard #51
- Emoji categories
- Ruby gem which displays Emoji sequence names
- Part of unicode-x
MIT
- Copyright (C) 2017-2023 Jan Lelis https://janlelis.com. Released under the MIT license.
- Unicode data: https://www.unicode.org/copyright.html#Exhibit1