
emoji-detector-php's Introduction

Emoji Detection

This library will find all emoji in an input string and return information about each emoji character. It supports emoji with skin tone modifiers, as well as the composite emoji that are made up of multiple people.

The current version supports Emoji version 15.1 (Sept 2023).

You can see a catalog of the emoji data here:

Installation

composer require p3k/emoji-detector

Or include src/Emoji.php in your project, and make sure the map.json and regexp.json files are available in the same folder as Emoji.php. You don't need any of the other files for use in your own projects.

Usage

Detect Emoji

$input = "Hello 👍🏼 World 👨‍👩‍👦‍👦";
$emoji = Emoji\detect_emoji($input);

print_r($emoji);

The function returns an array with details about each emoji found in the string.

Array
(
    [0] => Array
        (
            [emoji] => 👍🏼
            [short_name] => +1
            [num_points] => 2
            [points_hex] => Array
                (
                    [0] => 1F44D
                    [1] => 1F3FC
                )
            [hex_str] => 1F44D-1F3FC
            [skin_tone] => skin-tone-3
            [byte_offset] => 6
            [grapheme_offset] => 6
        )
    [1] => Array
        (
            [emoji] => 👨‍👩‍👦‍👦
            [short_name] => man-woman-boy-boy
            [num_points] => 7
            [points_hex] => Array
                (
                    [0] => 1F468
                    [1] => 200D
                    [2] => 1F469
                    [3] => 200D
                    [4] => 1F466
                    [5] => 200D
                    [6] => 1F466
                )
            [hex_str] => 1F468-200D-1F469-200D-1F466-200D-1F466
            [skin_tone] =>
            [byte_offset] => 21
            [grapheme_offset] => 14
        )
)
  • emoji - The emoji sequence found, as the original byte sequence. You can output this to show the original emoji.
  • short_name - The short name of the emoji, as defined by Slack's emoji data.
  • num_points - The number of Unicode code points that this emoji is composed of.
  • points_hex - An array of the Unicode code points that make up this emoji, returned as hex strings. This also includes "invisible" characters such as the ZWJ character and skin tone modifiers.
  • hex_str - All of the emoji's Unicode code points in hex form, separated by hyphens. This string is present in the Slack emoji data array.
  • skin_tone - If a skin tone modifier was used in the emoji, this field indicates which skin tone, since the short_name will not include the skin tone.
  • byte_offset - The position of the emoji in the string in bytes, for use with the plain str* functions.
  • grapheme_offset - The position of the emoji in the string, counting each emoji as 1 char, for use with the grapheme_* functions.
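The relationship between num_points, points_hex and hex_str can be illustrated with a small sketch that uses only PHP's mbstring functions. The helper name code_points_hex() is an assumption made for this example and is not part of the library; the library's own implementation may differ.

```php
<?php
// Split a UTF-8 string into code points and format each one as an
// uppercase hex string, similar to the documented points_hex field.
// code_points_hex() is a hypothetical helper, not a library function.
function code_points_hex(string $text): array
{
    $points = [];
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        // Pad to at least four hex digits, e.g. 200D or 1F44D.
        $points[] = sprintf('%04X', mb_ord($char, 'UTF-8'));
    }
    return $points;
}

// Thumbs up (U+1F44D) followed by a skin tone modifier (U+1F3FC).
$thumbs = "\u{1F44D}\u{1F3FC}";
$points = code_points_hex($thumbs);

echo count($points), "\n";        // corresponds to num_points
echo implode('-', $points), "\n"; // corresponds to hex_str
```

Joining the array with hyphens gives a string in the same shape as hex_str, which is why the two fields always agree.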

You can use the grapheme_* functions to extract parts of the string using the grapheme_offset position returned. For example:

$string = "Trešnja 🍒";
$emoji = Emoji\detect_emoji($string);
echo '.'.grapheme_substr($string, 0, $emoji[0]['grapheme_offset']).".\n";
echo '.'.substr($string, 0, $emoji[0]['byte_offset']).".\n";
// Both output ".Trešnja ."
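To see why byte_offset and grapheme_offset differ for the same position, the sketch below measures the text in front of the family emoji from the first example in bytes and in code points, and in graphemes when the intl extension is present. It does not call the library; the position is found with strpos() purely for illustration.

```php
<?php
// "Hello <thumbs up + skin tone> World <family>", written with escapes so
// the individual code points are explicit.
$family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}\u{200D}\u{1F466}";
$input  = "Hello \u{1F44D}\u{1F3FC} World " . $family;

// Byte position of the family emoji, as the plain str* functions see it.
$byteOffset = strpos($input, $family);

// The same prefix measured in Unicode code points.
$prefix = substr($input, 0, $byteOffset);
$codePointOffset = mb_strlen($prefix, 'UTF-8');

echo "bytes: $byteOffset, code points: $codePointOffset\n";

// grapheme_strlen() needs the intl extension, so guard the call.
if (function_exists('grapheme_strlen')) {
    echo "graphemes: ", grapheme_strlen($prefix), "\n";
}
```

The byte count matches the byte_offset of 21 shown in the first example. The code point count is one higher than the grapheme_offset of 14 because the skin tone modifier is a separate code point that belongs to the same visible character.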

Replace emoji with string representations

$string = Emoji\replace_emoji('I like 🌮 and 🌯', ':', ':');
echo $string;
// I like :taco: and :burrito:

Test if a string is a single emoji

Since simply counting the number of unicode characters in a string does not tell you how many visible emoji are in the string, determining whether a single character is an emoji is more involved. This function will return the emoji data only if the string contains a single emoji character, and false otherwise.

$emoji = Emoji\is_single_emoji('👨‍👩‍👦‍👦');
print_r($emoji);
Array
(
    [emoji] => 👨‍👩‍👦‍👦
    [short_name] => man-woman-boy-boy
    [num_points] => 7
    [points_hex] => Array
        (
            [0] => 1F468
            [1] => 200D
            [2] => 1F469
            [3] => 200D
            [4] => 1F466
            [5] => 200D
            [6] => 1F466
        )

    [hex_str] => 1F468-200D-1F469-200D-1F466-200D-1F466
    [skin_tone] =>
    [byte_offset] => 0
    [grapheme_offset] => 0
)
$emoji = Emoji\is_single_emoji('😻🐈');
// false

Remove emoji from a string

You can remove all emoji from a string with this function, optionally removing trailing spaces.

$string = "I like 🌮 and 🌯";
echo Emoji\remove_emoji($string);
// "I like  and "
echo Emoji\remove_emoji($string, ['collapse' => true]);
// "I like and";

Updates

When a new emoji set is released, this library will need to be updated with the new Unicode code points and names. The source of the emoji data is iamcal/emoji-data, so first check there for the latest updates. You can build the new source files this library uses with the following command:

composer build

Tests

A comprehensive set of tests is available to ensure things are working as expected, including tests for the new emoji added in new emoji versions. You can run the tests with the following command:

composer test

License

Made with ❤️ by Aaron Parecki.

Copyright 2017-2024 by Aaron Parecki. Available under the MIT license.

Emoji data sourced from iamcal/emoji-data under the MIT license.

Emoji parsing regex sourced from EmojiOne under the MIT license.

emoji-detector-php's People

Contributors

aaronpk, aksafan, dmongeau, intoeetive, peter279k, sebsel, twofed1, zegnat

emoji-detector-php's Issues

add `strlen()`/`mb_strlen()` to `is_single_emoji()`

Before throwing in my whole blogpost to see if it's a single emoji, I first checked if strlen($txt) < 10. It felt like a good thing to do with performance in mind. It turned out that the strlen of 🏳️‍🌈 is actually 14, so I changed it into mb_strlen(), which has 4 for rainbow_flag.

It might be nice to add that check to is_single_emoji() itself.

It now starts with a detect_emoji(), which uses the new monster regex on the full string. If a blogpost is 5000+ chars, it's surely not a single emoji.

I don't know if adding if(mb_strlen($string) > 10) return false; is the right way to go here. What is the longest emoji there is now? I found the following very long ones in the map.json:

  • "1F469-200D-2764-FE0F-200D-1F48B-200D-1F469":"woman-kiss-woman"
  • "1F468-200D-2764-FE0F-200D-1F48B-200D-1F468":"man-kiss-man"

Both are 8 in mb_strlen. So >= 8 should be fine already.
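The numbers quoted in this issue can be checked with a short sketch, together with one possible shape for the proposed pre-check. The function name may_be_single_emoji() and its default limit of 8 are assumptions taken from the discussion above; they are not part of the library, and sequences longer than 8 code points may exist in other emoji versions.

```php
<?php
// Rainbow flag: U+1F3F3 U+FE0F U+200D U+1F308.
$flag = "\u{1F3F3}\u{FE0F}\u{200D}\u{1F308}";

echo strlen($flag), "\n";             // length in bytes
echo mb_strlen($flag, 'UTF-8'), "\n"; // length in code points

// Hypothetical pre-check: text longer than $maxPoints code points is
// assumed not to be a single emoji, so the expensive regex can be skipped.
// The default of 8 comes from the two map.json entries quoted above.
function may_be_single_emoji(string $text, int $maxPoints = 8): bool
{
    $length = mb_strlen($text, 'UTF-8');
    return $length > 0 && $length <= $maxPoints;
}

var_dump(may_be_single_emoji($flag));
var_dump(may_be_single_emoji(str_repeat('a', 5000)));
```

Note that the comparison rejects only strings strictly longer than the limit, so an 8 code point sequence such as woman-kiss-woman still reaches the full detection step.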

Not all emojis.

Great library!

I have only one issue: some "older" emojis such as ❤ are not detected. I know this is a general problem of the definition "emoji", but what is the difference between ❤ and 🍰🍒💑? I think none, they're all emojis to me. However:

  • 🍰🍒💑 are detected
  • ❤ is not detected

Non-qualified skin variations are not recognized

I saw that support for non-qualified emojis was added recently, but it appears that non-qualified skin variations are still unrecognized.

For example, 1F937 1F3FD 200D 2642 is defined in https://unicode.org/Public/emoji/14.0/emoji-test.txt, but 1F937-1F3FD-200D-2642 is not in map.json.


I tried replacing the following code in the generator, which did regenerate the map properly.

  if(isset($emoji['skin_variations'])) {
    foreach($emoji['skin_variations'] as $key=>$var) {
      $map[$var['unified']] = $short_name;
    }
  }

to

  if(isset($emoji['skin_variations'])) {
    foreach($emoji['skin_variations'] as $key=>$var) {
      $map[$var['unified']] = $short_name;

      if(isset($var['non_qualified'])) {
        $map[$var['non_qualified']] = $short_name;
      }
    }
  }

However, I started getting errors about the regex being too large.
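The effect of the proposed change on the generated map can be tried on a single entry without running the full generator. In the sketch below the sample entry is hand-written for illustration and only mirrors the fields the snippet above reads (short_name, unified, skin_variations, non_qualified); it is not copied from emoji-data, and it does not address the regex size error.

```php
<?php
// Illustrative entry shaped like the fields the generator code reads.
// The values are hand-written for this sketch.
$emoji = [
    'short_name' => 'man-shrugging',
    'unified' => '1F937-200D-2642-FE0F',
    'skin_variations' => [
        '1F3FD' => [
            'unified' => '1F937-1F3FD-200D-2642-FE0F',
            'non_qualified' => '1F937-1F3FD-200D-2642',
        ],
    ],
];

$map = [];
$short_name = $emoji['short_name'];
$map[$emoji['unified']] = $short_name;

if (isset($emoji['skin_variations'])) {
    foreach ($emoji['skin_variations'] as $var) {
        $map[$var['unified']] = $short_name;
        // Also index the form without U+FE0F when the entry provides one.
        if (isset($var['non_qualified'])) {
            $map[$var['non_qualified']] = $short_name;
        }
    }
}

print_r($map);
```

With the extra branch, the sequence 1F937-1F3FD-200D-2642 mentioned in the issue resolves to the same short name as its fully qualified form.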

New Version with current Emojis

@aaronpk the last version from this lib is quite old and there are some new commits with new emojis already merged.

A new version of this library would be much appreciated.

Error when passing in unrecognizable emojis

\Emoji\detect_emoji($string) seems to return an error if there are some number of unrecognizable emojis (or so that's my assumption).

Screen Shot 2021-09-02 at 5 00 13 PM

Unfortunately, I do not know which emojis are causing this, but our logs show which strings are failing, if that gives you any clues.
Screen Shot 2021-09-02 at 5 02 48 PM

Add a function to strip all emojis

I'm using this library to strip all emojis from a string.

Here's the code, if you want to create a function for it in this package.

$emojis = \Emoji\detect_emoji($string);

foreach (array_reverse($emojis) as $emoji) {
  $length = strlen($emoji['emoji']);
  $start = substr($string, 0, $emoji['byte_offset']);
  $end = substr($string, $emoji['byte_offset'] + $length, strlen($string) - ($emoji['byte_offset'] + $length));
  $string = $start . $end;
}
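The loop above could be wrapped in a reusable function along these lines. This is a sketch only: strip_by_offsets() is a hypothetical name, and the detections are built by hand in the documented shape (emoji and byte_offset keys) so that the example runs without the library installed.

```php
<?php
// Remove each detected emoji from $string using its byte offset.
// $detections is expected to look like the detect_emoji() result described
// in the README: each entry has 'emoji' and 'byte_offset' keys.
// strip_by_offsets() is a hypothetical name, not a library function.
function strip_by_offsets(string $string, array $detections): string
{
    // Work from the end so that earlier byte offsets stay valid.
    foreach (array_reverse($detections) as $found) {
        $string = substr_replace(
            $string,
            '',
            $found['byte_offset'],
            strlen($found['emoji'])
        );
    }
    return $string;
}

// Hand-built detections for a taco (U+1F32E) and a burrito (U+1F32F).
$taco = "\u{1F32E}";
$burrito = "\u{1F32F}";
$text = "I like $taco and $burrito";
$detections = [
    ['emoji' => $taco, 'byte_offset' => strpos($text, $taco)],
    ['emoji' => $burrito, 'byte_offset' => strpos($text, $burrito)],
];

echo '"', strip_by_offsets($text, $detections), '"', "\n";
```

Using substr_replace() with a length avoids the manual start and end slicing, and processing the matches in reverse order means no offset has to be recalculated after a removal.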

Some emojis are not detected properly

โ˜ - this is not recognised as emoji

© - but this one is recognised as emoji, is it mistaken with ©️?
® - but this one is recognised as emoji, is it mistaken with ®️?
™ - but this one is recognised as emoji, is it mistaken with ™️?

PHP 5.3 Compatibility

We're using this now in Semantic Linkbacks. This library uses short array notation. Being as this is the only issue for PHP 5.3 compatibility, would you consent to not using short array notation in order to encourage compatibility as it is not a breaking change?

7 | ERROR | [ ] Short array syntax (open) is available since 5.4
7 | ERROR | [ ] Short array syntax (close) is available since 5.4
19 | ERROR | [ ] Short array syntax (open) is available since 5.4
19 | ERROR | [ ] Short array syntax (close) is available since 5.4
32 | ERROR | [ ] Short array syntax (open) is available since 5.4
38 | ERROR | [ ] Short array syntax (close) is available since 5.4
44 | ERROR | [ ] Short array syntax (open) is available since 5.4
51 | ERROR | [ ] Short array syntax (close) is available since 5.4
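For reference, the change being requested amounts to swapping the bracket form for the older array() form, which produces identical values. The option and code point values below are placeholders chosen for illustration.

```php
<?php
// Short array syntax, available since PHP 5.4.
$short = ['collapse' => true, 'points' => ['1F44D', '1F3FC']];

// Equivalent long form, which also parses on PHP 5.3.
$long = array('collapse' => true, 'points' => array('1F44D', '1F3FC'));

// Both notations build the same array.
var_dump($short === $long);
```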

Update map.json?

Hi,
It's not really an issue, but I wonder if you frequently update map.json with new Unicode emoji?
I use your lib in my science project and I need to parse every emoji I can find on Twitter.
BTW, thanks, this script is truly awesome for me :)

php-intl must be a package dependency

I've upgraded from v0.2.1 to v1.0 and I got some errors because grapheme_ functions were not available.

To avoid this, composer.json should list php-intl:* as a dependency (require).
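Composer names platform extensions with an ext- prefix, so a requirement of this kind would be written as ext-intl in the require section. Independently of that, calling code can check for the extension at runtime before relying on grapheme_offset. The helper below is a hypothetical sketch and not part of the library.

```php
<?php
// Report whether the grapheme_* functions needed for grapheme_offset are
// available. has_grapheme_support() is a hypothetical helper.
function has_grapheme_support(): bool
{
    return extension_loaded('intl') && function_exists('grapheme_strlen');
}

if (!has_grapheme_support()) {
    // Log instead of failing later with an undefined function error.
    error_log('The intl extension is missing; grapheme_* calls will fail.');
}
```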

Keep up the good work
