
Comments (8)

hankcs commented on May 23, 2024

Please post the version of the plugin jar.

from hanlp.

hankcs commented on May 23, 2024

It looks like a plugin built for Lucene 4.x is being used with Lucene 5.x.


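lsq88334753 commented on May 23, 2024

With Lucene 5.2.1 (hanlp-portable-1.2.4, hanlp-solr-plugin-1.0), the index is created successfully, but searches return no hits. The search raises no error, and the term is definitely in the indexed text.
Building the index: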

 // Use the same analyzer for indexing and for searching.
 Analyzer analyzer = new HanLPAnalyzer();
 IndexWriterConfig config = new IndexWriterConfig(analyzer);
 // Append to the index if it exists, otherwise create it.
 config.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
 Directory directory = FSDirectory.open(Paths.get(INDEX_DIR));
 IndexWriter indexWriter = new IndexWriter(directory, config);

 // res and content are strings prepared elsewhere (not shown).
 Document document = new Document();
 document.add(new TextField("time", res, Store.YES));
 document.add(new TextField("content", content, Store.YES));
 indexWriter.addDocument(document);

 indexWriter.commit();
 // closeWriter is the poster's own helper (not shown).
 closeWriter(indexWriter);

Searching:

 Directory directory = FSDirectory.open(Paths.get(INDEX_DIR));
 Analyzer analyzer = new HanLPAnalyzer();
 IndexReader ireader = DirectoryReader.open(directory);
 IndexSearcher isearcher = new IndexSearcher(ireader);
 // "content" is the default field for terms without an explicit field prefix.
 QueryParser parser = new QueryParser("content", analyzer);
 Query query = parser.parse(text);
 ScoreDoc[] hits = isearcher.search(query, 300000).scoreDocs;

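Searching for "time:[20140101 TO 20150101]" returns hits, but searching for the term "被告人" (defendant) returns 0 hits, even though the term is definitely present. Do you know what might cause this?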


hankcs commented on May 23, 2024

It works correctly in my test:

        // Index three short documents, then search for "被告人" with the same analyzer.
        Analyzer analyzer = new HanLPAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
        String INDEX_DIR = System.getProperty("java.io.tmpdir") + File.separator + "index";
        Directory directory = FSDirectory.open(Paths.get(INDEX_DIR));
        IndexWriter indexWriter = new IndexWriter(directory, config);

        // This is the only document that contains the search term.
        Document document = new Document();
        document.add(new TextField("content", "被公诉机关指控涉嫌犯罪的当事人称作被告人。", Field.Store.YES));
        indexWriter.addDocument(document);

        document = new Document();
        document.add(new TextField("content", "商品和服务", Field.Store.YES));
        indexWriter.addDocument(document);

        document = new Document();
        document.add(new TextField("content", "和服的价格是每镑15便士", Field.Store.YES));
        indexWriter.addDocument(document);

        indexWriter.commit();
        indexWriter.close();

        IndexReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);
        QueryParser parser = new QueryParser("content", analyzer);
        Query query = parser.parse("被告人");
        ScoreDoc[] hits = isearcher.search(query, 300000).scoreDocs;
        // Print the stored content of every matching document.
        for (ScoreDoc scoreDoc : hits)
        {
            Document targetDoc = isearcher.doc(scoreDoc.doc);
            System.out.println(targetDoc.getField("content").stringValue());
        }

If this test also passes on your side, then the problem probably lies somewhere else.


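lsq88334753 commented on May 23, 2024

After a day of digging I found the cause: when the string being added (content) contains two or more newline characters, the text after them is not recognized. For example, after indexing "\n\n" + "被公诉机关指控涉嫌犯罪的当事人称作被告人。", a search for 被告人 returns 0 hits.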
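
For anyone still on a plugin build that predates the fix, one possible stopgap is to collapse repeated line breaks before the text reaches the analyzer. This is only a sketch and is not something proposed in this thread: the class name `NewlineNormalizer` and the method `collapseLineBreaks` are hypothetical, and whether this avoids the bug on a given version has not been verified here.

```java
import java.util.regex.Pattern;

public class NewlineNormalizer {
    // Matches a run of two or more line breaks. The atomic group (?>...)
    // keeps a single "\r\n" from being counted as two separate breaks.
    private static final Pattern REPEATED_BREAKS =
            Pattern.compile("(?>\\r\\n|\\r|\\n){2,}");

    // Hypothetical helper: replaces every run of 2+ line breaks with one "\n".
    // Single line breaks are left untouched.
    public static String collapseLineBreaks(String text) {
        if (text == null) {
            return null;
        }
        return REPEATED_BREAKS.matcher(text).replaceAll("\n");
    }

    public static void main(String[] args) {
        // The failing input reported above: two leading newlines.
        String content = "\n\n" + "被公诉机关指控涉嫌犯罪的当事人称作被告人。";
        String cleaned = collapseLineBreaks(content);
        // The cleaned string would then be passed to TextField instead of content.
        System.out.println(cleaned.startsWith("\n\n")); // prints "false"
    }
}
```

Note that this changes the stored field value as well, so it only suits cases where blank lines in the stored text do not matter.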


hankcs commented on May 23, 2024

Thanks for tracking it down. The problem is confirmed, and I will fix this bug right away.


hankcs commented on May 23, 2024

This should be fixed now. If you still run into problems, feel free to open another issue.


lsq88334753 commented on May 23, 2024

Thank you very much!
