Comments (6)

yanyiwu avatar yanyiwu commented on August 22, 2024

from gojieba.

qiukeren avatar qiukeren commented on August 22, 2024

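This may come across as a bit impolite, but since this behavior is clearly not normal, could you take a look and tell me whether I am calling the gojieba library incorrectly:

I switched to a different corpus, novels this time, taken from: http://www.iplaysoft.com/1326-txt-science-fiction.html
(The source files are cp936-encoded; I converted them to UTF-8 by hand.)
About 1,300 files, 43 MB in total.

After indexing, the index directory is about 900 MB:

  899.3 MiB [##########] /gojieba.bleve
   43.3 MiB [          ] /source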
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"os"

	"github.com/blevesearch/bleve"
	. "github.com/qiukeren/go-utils/common"
	"github.com/yanyiwu/gojieba"
	_ "github.com/yanyiwu/gojieba/bleve"
)

func Example() {
	INDEX_DIR := "gojieba.bleve"
	dirEntries, err := ioutil.ReadDir("/Users/XXX/gopath/src/search/bleve/source")
	if err != nil {
		log.Panicln(err)
	}
	type Message struct {
		Id      string
		Content string
	}

	indexMapping := bleve.NewIndexMapping()
	os.RemoveAll(INDEX_DIR)
	// clean index when example finished
	// defer os.RemoveAll(INDEX_DIR)

	err = indexMapping.AddCustomTokenizer("gojieba",
		map[string]interface{}{
			"dictpath":     gojieba.DICT_PATH,
			"hmmpath":      gojieba.HMM_PATH,
			"userdictpath": gojieba.USER_DICT_PATH,
			"type":         "gojieba",
			"idf":          gojieba.IDF_PATH, // idf and stop_words are required, otherwise an error is raised; the bundled idf is used here
			"stop_words":   gojieba.STOP_WORDS_PATH,
		},
	)
	if err != nil {
		panic(err)
	}
	err = indexMapping.AddCustomAnalyzer("gojieba",
		map[string]interface{}{
			"type":      "gojieba",
			"tokenizer": "gojieba",
		},
	)
	if err != nil {
		panic(err)
	}
	indexMapping.DefaultAnalyzer = "gojieba"

	index, err := bleve.New(INDEX_DIR, indexMapping)
	if err != nil {
		panic(err)
	}
	log.Println(len(dirEntries))
	for k, v := range dirEntries {
		log.Println(k, "of", len(dirEntries))

		data, _ := ReadToString("/Users/XXX/gopath/src/search/bleve/source/" + v.Name())
		message := Message{Id: v.Name(), Content: string(data)}

		// go func(name string, content Message) {
		index.Index(v.Name(), message)
		// }(v.Name(), message)

	}

	querys := []string{
		"你好世界",
		"亲口交代",
	}

	for _, q := range querys {
		req := bleve.NewSearchRequest(bleve.NewQueryStringQuery(q))
		req.Highlight = bleve.NewHighlight()
		res, err := index.Search(req)
		if err != nil {
			panic(err)
		}
		fmt.Println(res)
	}
}

func main() {
	Example()
}


ixdog avatar ixdog commented on August 22, 2024

Sphinx is pretty good.


qiukeren avatar qiukeren commented on August 22, 2024

@wenduniang

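My own order of preference when choosing a technology is Go > Java > PHP > Python > C/C++.

For search specifically, it is Go > Java > Python > C/C++/PHP.

Java at least has a whole range of frameworks to choose from, such as ELK and Lucene, so there is no need for me to pass over what is close at hand and look further afield.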


ixdog avatar ixdog commented on August 22, 2024

Has this been resolved?


qiukeren avatar qiukeren commented on August 22, 2024

No, it has not.
