
pinyin-searcher's Introduction

PinyinSearcher

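A jar for keyword search that supports Chinese characters, pinyin initials, pinyin prefixes, prefixes and suffixes of non-Chinese strings, and any mix of these.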

Configuration

  1. Add the dependency jar: pinyin4j-2.5.0.jar
<dependency>
    <groupId>com.belerweb</groupId>
    <artifactId>pinyin4j</artifactId>
    <version>2.5.0</version>
</dependency>
  2. Add pinyin_searcher.jar
# Change to the directory containing pinyin_searcher.jar
# Install this third-party jar into the local Maven repository
mvn install:install-file -Dfile=pinyin_searcher.jar -DgroupId=org.ken -DartifactId=searcher -Dversion=1.0.0 -Dpackaging=jar
<dependency>
    <groupId>org.ken</groupId>
    <artifactId>searcher</artifactId>
    <version>1.0.0</version>
</dependency>

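Usage

  1. Construct a searcher by calling the PinyinSearcher constructor
  2. Call match(keyword, list of entities to search, name of the entity field to search) to run the search
  3. match returns the list of entities that matched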
List<Object> res = new PinyinSearcher().match("逍yao", beans, "name");

Example

// Build your entity list (it must be a List<Object>)
List<Object> beans = new ArrayList<Object>();
beans.add(new YourBean(1, "李逍遥", "四川省成都市锦江区"));
beans.add(new YourBean(2, "李晓姚", "四川省成都市"));
beans.add(new YourBean(3, "李xiaoyao", "四川省自贡市"));
beans.add(new YourBean(4, "xiaoyao", "四川省南充市阆中"));
beans.add(new YourBean(5, "lixiao遥", "北京市海淀区"));
beans.add(new YourBean(6, "阳sunny光", "北京市朝阳区"));
beans.add(new YourBean(7, "阳sunny光好", "北京市朝阳区"));
beans.add(new YourBean(8, "阳sunnyguang", "北京"));
beans.add(new YourBean(9, "阳sunguang", "北京"));
beans.add(new YourBean(10, "阳光", "河北省保定市"));
beans.add(new YourBean(11, "", "河北省邢台市"));
beans.add(new YourBean(12, null, "河北省安新县"));

// Call the pinyin searcher
PinyinSearcher searcher = new PinyinSearcher();
List<Object> res_name = searcher.match("xy", beans, "name");
List<Object> res_address = searcher.match("sichuans", beans, "address");

// Read the search results (your own business logic starts here)
for(Object object : res_name) {
    YourBean bean = (YourBean) object; // a bean whose name matched
    System.out.println(bean.getName());
}
for(Object object : res_address) {
    YourBean bean = (YourBean) object; // a bean whose address matched
    System.out.println(bean.getAddress());
}
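
The example above uses YourBean without showing it. The following is a minimal sketch of what such a bean could look like, not the author's class. It assumes that match locates the value through a field named exactly like its third argument ("name", "address"), which is consistent with the reflection mentioned in the issue further down but is not confirmed by the README. The field names and the id field are illustrative.

```java
// Hypothetical entity class for the example; only the constructor shape and
// the getName()/getAddress() getters are taken from the README's usage.
class YourBean {
    private final int id;
    private final String name;    // assumed to be looked up via match(..., "name")
    private final String address; // assumed to be looked up via match(..., "address")

    YourBean(int id, String name, String address) {
        this.id = id;
        this.name = name;       // may be null or empty, as in beans 11 and 12
        this.address = address;
    }

    int getId() {
        return id;
    }

    String getName() {
        return name;
    }

    String getAddress() {
        return address;
    }
}
```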

Example results

name:

李逍遥
李晓姚

address:

四川省成都市锦江区
四川省成都市
四川省自贡市
四川省南充市阆中

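Supported keyword forms

  • All Chinese characters: 李 -> 李逍遥, 李晓姚, 李xiaoyao
  • All pinyin initials: lxy -> 李逍遥, 李晓姚
  • All pinyin prefix: lixia -> 李逍遥, 李晓姚, 李xiaoyao
  • Chinese characters + pinyin initials: 李xy -> 李逍遥, 李晓姚
  • Chinese characters + pinyin prefix: 李xi -> 李逍遥, 李晓姚, 李xiaoyao
  • Prefix of a non-Chinese string: sun -> 阳sunny光, 阳sunnyguang, 阳sun光, 阳sunny光好
  • Suffix of a non-Chinese string: nny -> 阳sunny光, 阳sunnyguang, 阳sunny光好
  • Chinese characters + prefix of a non-Chinese string: 阳sun -> 阳sunny光, 阳sunnyguang, 阳sun光, 阳sunny光好
  • Complete non-Chinese string + pinyin prefix: sunnygu -> 阳sunny光, 阳sunnyguang, 阳sunny光好
  • Complete non-Chinese string + pinyin initials: sunnygh -> 阳sunny光好
  • Suffix of a non-Chinese string + Chinese characters: nny光 -> 阳sunny光, 阳sunny光好
  • Many other combinations are possible; they are not all listed here.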

Note

  • Because of how the algorithm works, searching for the characters *.?+$^(){}|/ and square brackets is currently not supported
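
Since keywords containing these characters are not supported, a caller may want to screen user input before passing it to match. The helper below is a hypothetical sketch and is not part of pinyin_searcher.jar; the class name KeywordGuard and the choice to reject null are assumptions made for illustration.

```java
// Hypothetical pre-check for keywords; not part of the library.
final class KeywordGuard {
    // The characters the note lists as unsupported, plus both square brackets.
    private static final String UNSUPPORTED = "*.?+$^(){}|/[]";

    private KeywordGuard() {
    }

    // Returns true when the keyword is non-null and contains none of the
    // unsupported characters.
    static boolean isSupported(String keyword) {
        if (keyword == null) {
            return false;
        }
        for (int i = 0; i < keyword.length(); i++) {
            if (UNSUPPORTED.indexOf(keyword.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }
}
```

A caller could then skip the search, or show a message, whenever KeywordGuard.isSupported(keyword) returns false.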

pinyin-searcher's People

Contributors

yaochenkun


pinyin-searcher's Issues

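Paying my respects

While working on fuzzy search I looked at many projects. Most of them implement the Chinese-to-pinyin preprocessing step with a pile of Lists, which forced me to store those preprocessed Lists in every searchable object. With a few thousand records, each carrying a long run of extra Lists, that was unpleasant to even think about, and the keyword matching that followed was complicated and bloated as well. Frankly, I found all of it tiresome. You turned the whole thing into a regular expression instead, which is a remarkable algorithm, almost like magic. Now I only need to store one short string in each searchable object, and keyword matching takes a single String.matches call. Thank you very much!
I also noticed that PinyinSearcher.java uses a lot of reflection. I assume the goal is compatibility with any entity class, so the fields are read through reflection. I think an interface would work better: objects that need to be searched implement a search interface, and the field to search is supplied through the interface's
public String setSearchWord(). The search method exposed to callers can then use a generic bound, <T extends the search interface>, which keeps the flexibility while performing better and being easier to understand.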
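
The interface-based design suggested in this issue could look roughly like the sketch below. It is not code from the project. The names Searchable and InterfaceSearcher are hypothetical, the method is called getSearchWord here (an assumption, since it returns a value rather than setting one), and a plain substring check stands in for the library's pinyin matching.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical search interface: each entity exposes the text to search on.
interface Searchable {
    String getSearchWord();
}

// Hypothetical searcher that needs no reflection because of the generic bound.
final class InterfaceSearcher {
    private InterfaceSearcher() {
    }

    // Returns the items whose search word matches the keyword.
    // String.contains is only a placeholder for real pinyin matching.
    static <T extends Searchable> List<T> match(String keyword, List<T> items) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            String word = item.getSearchWord();
            if (word != null && word.contains(keyword)) {
                result.add(item);
            }
        }
        return result;
    }
}
```

With this shape the caller gets back a List<T> of its own entity type, so the casts from Object in the README example would no longer be needed.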
