php-pinyin

A PHP extension converting Chinese characters to Pinyin.

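A PHP extension from Baidu for converting Chinese characters to Pinyin. Other Chinese-to-Pinyin solutions have two problems:

  1. They can convert only a limited number of characters, a few thousand or so
  2. They cannot handle polyphonic characters (characters with more than one reading)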

Installation

There are currently two ways to use php-pinyin. One depends on PHP-CPP; the other is a plain PHP extension that works with PHP 7.x. (For PHP 5.x support, check out the legacy branch.)

Method with PHP-CPP

Main improvements:

  • Depends on PHP-CPP, an awesome library that wraps the Zend Engine in a friendly API
  • Supports PHP 7
  • Supports both UTF-8 and GBK encodings
  • Adds ini settings (pinyin.dict_path and pinyin.dict_tone), so you do not need to call loadDict yourself

Install

  1. Install PHP-CPP or its LEGACY version. Before building it, change its Makefile: PHP-CPP is written in C++11 but libpinyin is written in C++98, so PHP-CPP must be built with the -D_GLIBCXX_USE_CXX11_ABI=0 option, which means "do not use the C++11 Application Binary Interface"
  2. cd /path/to/php-pinyin/cpp-ext
  3. make
  4. make install
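
After installing, a quick way to confirm that PHP can see the extension is to check for the Pinyin class and read the two ini settings named above. This is a minimal sketch: pinyinInstallStatus is a hypothetical helper name, and the values the settings hold depend on your own php.ini.

```php
<?php
// Sketch: report whether the Pinyin class and its ini settings are visible.
// pinyinInstallStatus() is a hypothetical helper, not part of php-pinyin.
function pinyinInstallStatus(): array
{
    return [
        // True only if the extension has registered the Pinyin class.
        'class_loaded' => class_exists('Pinyin', false),
        // ini_get() returns false when a setting is not registered at all.
        'dict_path'    => ini_get('pinyin.dict_path'),
        'dict_tone'    => ini_get('pinyin.dict_tone'),
    ];
}

var_export(pinyinInstallStatus());
echo PHP_EOL;
```

If class_loaded is false, the extension is probably not enabled in the php.ini used by this PHP binary.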

Method without PHP-CPP

This is an upgrade of the old php-pinyin extension for PHP 5.x.

Install

  1. cd /path/to/php-pinyin/ext
  2. /path/to/php/bin/phpize
  3. ./configure --with-php-config=/path/to/php/bin/php-config --with-baidu-pinyin=/path/to/pinyin
  4. make
  5. make install

Here /path/to/pinyin is the directory where you copied libpinyin to.

Usage

$obj  = new Pinyin();

// UTF-8
var_dump($obj->convert("重庆重量"));
var_dump($obj->multiConvert(array("重庆南京市长江大桥财务会议会计")));

// GBK
var_dump($obj->multiConvert(array(iconv("UTF-8", "GBK", "重庆"), iconv("UTF-8", "GBK", "重量"))));

Results will be:

string(22) "chong'qing'zhong'liang"
array(1) {
  [0] =>
  string(65) "chong'qing'nan'jing'shi'chang'jiang'da'qiao'cai'wu'hui'yi'kuai'ji"
}
array(2) {
  [0] =>
  string(10) "chong'qing"
  [1] =>
  string(11) "zhong'liang"
}
array(1) {
  [0] =>
  string(29) "zhong'hua'ren'min'gong'he'guo"
}

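To get the abbreviation of a whole pinyin string (the capitalized first letter of each syllable), you can do this:

// The /e modifier was removed in PHP 7, so use preg_replace_callback instead.
echo preg_replace_callback("/'([a-zA-Z])[0-9a-zA-Z]*/", function ($m) { return strtoupper($m[1]); }, "'".$py_string);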
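
The same abbreviation can also be built without a regular expression, by splitting on the "'" separator and taking the first character of each syllable. This is a sketch only: pinyinAbbr is a hypothetical helper name, not part of the extension, and it assumes every syllable starts with a letter.

```php
<?php
// Hypothetical helper: abbreviate a "'"-separated pinyin string.
function pinyinAbbr(string $pinyin): string
{
    $abbr = '';
    foreach (explode("'", $pinyin) as $syllable) {
        // Skip empty pieces produced by leading or doubled separators.
        if ($syllable !== '') {
            $abbr .= strtoupper($syllable[0]);
        }
    }
    return $abbr;
}

echo pinyinAbbr("chong'qing'zhong'liang"), PHP_EOL; // prints CQZL
```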

This library supports only Chinese characters and English letters; for any other input it returns false. You can write a safeConvert function to work around this.

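$p = new Pinyin();

// Converts only the runs of Chinese characters in $word,
// so unsupported characters never reach the extension.
function safeConvert($word, $pyOnly = true) {
    global $p;
    // UTF-8 regex for Chinese characters (U+4E00 to U+9FA5)
    $result = preg_match_all("/([\x{4e00}-\x{9fa5}]+)/u", $word, $matches);
    if (!$result) {
        throw new \Exception("No Chinese characters in word");
    }

    // One pinyin string per matched run of Chinese characters
    $pys = $p->multiConvert($matches[1]);
    if ($pyOnly) {
        // Return only the pinyin, joined with the usual "'" separator
        return implode("'", $pys);
    }
    // Otherwise replace each Chinese run in place and keep the rest of $word
    return str_replace($matches[1], $pys, $word);
}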
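
A variant of safeConvert that takes the converter as an argument instead of reading a global makes the function usable without the extension loaded, for example in tests. This is a sketch under assumptions: safeConvertWith is a hypothetical name, and the failure check assumes multiConvert returns a non-array value (such as false) when conversion fails.

```php
<?php
// Hypothetical variant of safeConvert with the converter passed in.
// $multiConvert receives an array of strings and returns an array of pinyin.
function safeConvertWith(callable $multiConvert, string $word, bool $pyOnly = true): string
{
    // Collect each run of Chinese characters (U+4E00 to U+9FA5).
    if (!preg_match_all('/[\x{4e00}-\x{9fa5}]+/u', $word, $matches)) {
        throw new InvalidArgumentException('No Chinese characters in word');
    }

    $pys = $multiConvert($matches[0]);
    if (!is_array($pys)) {
        throw new RuntimeException('Conversion failed');
    }

    return $pyOnly
        ? implode("'", $pys)
        : str_replace($matches[0], $pys, $word);
}

// With the extension loaded, pass the real method as the callable.
if (class_exists('Pinyin', false)) {
    $p = new Pinyin();
    echo safeConvertWith([$p, 'multiConvert'], "PHP重庆2024重量"), PHP_EOL;
}
```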

If you want to customize the dictionary files yourself and then convert them to binary format again, do it like this:

$result = $obj->generateDict("/home/work/local/pinyin/dict/dict.txt", "/home/work/tmp/dict.dat");

if($result) echo "Generate complete";
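
Because generateDict reads one file and writes another, it can help to check both paths before calling it. The wrapper below is a sketch: regenerateDict is a hypothetical helper name, and it assumes only what the call above shows, namely that generateDict takes a source path and a target path and returns a truthy value on success.

```php
<?php
// Hypothetical wrapper: validate paths, then call generateDict().
// $pinyin is any object exposing generateDict($source, $target).
function regenerateDict($pinyin, string $source, string $target): bool
{
    if (!is_readable($source)) {
        throw new InvalidArgumentException("Cannot read dictionary source: $source");
    }
    $dir = dirname($target);
    if (!is_dir($dir) || !is_writable($dir)) {
        throw new InvalidArgumentException("Cannot write to directory: $dir");
    }
    return (bool) $pinyin->generateDict($source, $target);
}

// With the extension loaded, use it like the direct call above.
if (class_exists('Pinyin', false)) {
    $obj = new Pinyin();
    if (regenerateDict($obj, "/home/work/local/pinyin/dict/dict.txt", "/home/work/tmp/dict.dat")) {
        echo "Generate complete", PHP_EOL;
    }
}
```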

Feedback

Issues and contributions are welcome.

Thank you!

Contributors

guweigang, ideal

php-pinyin's Issues

Installation fails on Mac 10.9.5; what other dependencies are needed?

checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for a sed that does not truncate output... /usr/bin/sed
checking for cc... cc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether cc accepts -g... yes
checking for cc option to accept ISO C89... none needed
checking how to run the C preprocessor... cc -E
checking for icc... no
checking for suncc... no
checking whether cc understands -c and -o together... yes
checking for system library directory... lib
checking if compiler supports -R... no
checking if compiler supports -Wl,-rpath,... yes
checking build system type... i386-apple-darwin13.4.0
checking host system type... i386-apple-darwin13.4.0
checking target system type... i386-apple-darwin13.4.0
checking for PHP prefix... /usr
checking for PHP includes... -I/usr/include/php -I/usr/include/php/main -I/usr/include/php/TSRM -I/usr/include/php/Zend -I/usr/include/php/ext -I/usr/include/php/ext/date/lib
checking for PHP extension directory... /usr/lib/php/extensions/no-debug-non-zts-20100525
checking for PHP installed headers prefix... /usr/include/php
checking if debug is enabled... no
checking if zts is enabled... no
checking for re2c... re2c
checking for re2c version... 0.13.6 (ok)
checking for gawk... no
checking for nawk... no
checking for awk... awk
checking if awk is broken... no
checking for pinyin support... yes, shared
checking for baidu-pinyin library support... yes, shared
checking for pinyin files in default path... not found
configure: error: Please reinstall the baidu pinyin distribution
