sea-team / gofound
GoFound: a full-text search engine written in Go with millisecond-level queries. It is called through an HTTP API and includes an Admin management UI, so any system can use it.
License: Apache License 2.0
It would be helpful to have an API that returns the total number of documents in a database.
After building from source (go get & go build), put the binary into a Docker image.
FROM ubuntu:20.04
COPY ./gofound /app/gofound
RUN mkdir /app/data &&\
chmod 555 /app/gofound
WORKDIR /app
EXPOSE 5678
CMD ["./gofound", "--addr=:5678", "--data=./data"]
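After a database is deleted, its records are removed from memory, but the files generated for it are not deleted.
The problem appears to be in the code of the Engine::Drop method.
It is also unclear why line 551 runs os.Remove(e.IndexPath) on every iteration; this could probably be optimized.
Expected: after a database is deleted, the corresponding database files are deleted as well as the in-memory records.
I am not sure whether this is the intended behavior, but it looks like a bug to me.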
package main
import (
"gofound/core"
"gofound/global"
"gofound/searcher/model"
service2 "gofound/web/service"
"log"
"runtime"
)
type Services struct {
Base *service2.Base
Index *service2.Index
Database *service2.Database
Word *service2.Word
}
func NewServices() *Services {
return &Services{
Base: service2.NewBase(),
Index: service2.NewIndex(),
Database: service2.NewDatabase(),
Word: service2.NewWord(),
}
}
func main() {
// Initialize
//global.CONFIG = core.Parser() // if you need config.yaml
global.CONFIG = &global.Config{
//Addr: *addr,
Data: "./data",
Debug: true,
Dictionary: "./data/dictionary.txt",
//EnableAdmin: false,
Gomaxprocs: runtime.NumCPU(),
//Auth: "",
//EnableGzip: false,
Timeout: 600,
BufferNum: 1000,
}
// Initialize the tokenizer
tokenizer := core.NewTokenizer(global.CONFIG.Dictionary)
global.Container = core.NewContainer(tokenizer)
srv := NewServices()
log.Println(srv.Base.Status())
request := &model.IndexDoc{}
request.Id = 1
request.Text = "下列关于静态代码块的描述中,正确的是( )"
t := `
a. 使用静态代码块可以实现类的初始化
b. 静态代码块随着类的加载而加载
c. 每次创建对象时,类中的静态代码块都会被执行一次
d. 静态代码块指的是被static关键字修饰的代码块
`
request.Document = map[string]interface{}{
"content": t,
"answer": "静态代码块指的是被static关键字修饰的代码块, 静态代码块随着类的加载而加载, 使用静态代码块可以实现类的初始化",
}
log.Println(srv.Index.AddIndex("default", request))
}
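Does the database have to be exported and written to a file for this to work?
After bulk-importing a batch of data, search results contain duplicates: the same id appears multiple times.
For example, I only want to see the most recent data. Usually the first page is enough, and even when paging, it is only a few pages.
It would be perfect if single-field search were supported as well.
With the data directory unchanged, search results became duplicated after downloading and running 1.17; reverting to the old version fixed it.
Is deploying multiple instances supported?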
{"document":{"sys_org_code":null,"owner_id":"10008","name":"测试测试1","background_zosid":"20220714100592881081549866861280","create_by":"","image_zosid":"20220818101555966537523750432670","create_time":"2022-08-18 09:15:15","update_time":"2022-08-18 10:35:31"},"text":"10013^10013^测试测试2","id":10013}
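When I update data in this format, the update occasionally fails: the API reports success and returns no error, but the data content does not change.
Ignoring the full-text match score and sorting only by id in descending order instead of by score: could this mode be supported?
Adding documents with add_documents in gofound-python is very slow, roughly one second per doc by my estimate.
Parameters and environment:
(1) the indexed text field is limited to 200 characters;
(2) add_documents is called once per 100 docs;
(3) all other parameters are left at their defaults.
Other applications run fine under Docker, but gofound still has problems, which is frustrating. Without Docker, running ./gofound directly just keeps printing wait:0.
As the title says. I looked into the source code and it is somewhat complex; I will open a PR if I make progress.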
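Hello, author!
Could you add a tokenization option for CSV data? CSV uses commas as separators, and importing whole CSV rows into GoFound is a strong requirement. Adding this would benefit the project.
Deployments are containerized these days. Command-line flags are cumbersome to configure, file-based configuration is not flexible enough, and many more custom parameters are likely to appear.
Giving every parameter a default value that can be overridden through an environment variable is a more flexible approach and would make containerized deployment much easier. Please consider it. Best wishes for the project!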
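I added the content "the Hypertext Transfer Protocol HTTP is an application layer protocol in the Internet protocol". Searching for layer returns no data, and searching for protocol returns none either. Debug is on by default, and the console shows no errors.
The binary search in the find method in fast.go is written incorrectly.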
func (f *FastSort) find(target *uint32) (bool, int) {
low := 0
high := f.count - 1
for low <= high {
mid := (low + high) / 2
if f.data[mid].Id == *target {
return true, mid
} else if f.data[mid].Id < *target { // the less-than sign here should be a greater-than sign
high = mid - 1
} else {
low = mid + 1
}
}
return false, -1
//for index, item := range f.data {
// if item.Id == *target {
// return true, index
// }
//}
//return false, -1
}
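Whether the comparison above is wrong depends on how f.data is ordered, which the snippet does not show. With ascending ids, a middle element smaller than the target means the search must continue in the upper half, so the branch as written would move the wrong bound; with descending ids it would be correct. The sketch below is a standalone ascending-order version built on sort.Search from the standard library. The item type and the free function find are stand-ins, not the FastSort code.

```go
package main

import (
	"fmt"
	"sort"
)

// item is a stand-in for the element type stored in FastSort.data.
type item struct {
	Id uint32
}

// find looks up target in data, which must be sorted by Id in ascending order.
// It reports whether the id was found and its index, or -1 when absent.
func find(data []item, target uint32) (bool, int) {
	// sort.Search returns the smallest index i for which data[i].Id >= target,
	// or len(data) if there is no such index.
	i := sort.Search(len(data), func(i int) bool { return data[i].Id >= target })
	if i < len(data) && data[i].Id == target {
		return true, i
	}
	return false, -1
}

func main() {
	data := []item{{Id: 1}, {Id: 3}, {Id: 5}, {Id: 8}, {Id: 13}}
	fmt.Println(find(data, 8)) // present
	fmt.Println(find(data, 4)) // absent
}
```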
gofound revision: 59d4e00
gofound-python revision: e170d832a486c3588ddb7e61a0c84eea9e99829b
Use python script in https://github.com/newpanjing/gofound-python/blob/master/README.md, got:
$ python test.py
{'state': True, 'message': 'success'}
{'state': True, 'message': 'success', 'data': {'time': 33.977121, 'total': 1, 'pageCount': 1, 'page': 1, 'limit': 10, 'documents': [{'id': 1000, 'text': '探访海南自贸港“样板间”', 'document': {'content': '洋浦经济开发区地处海南西北部洋浦半岛,是21世纪海上丝绸之路与西部陆海新通道的交汇节点。是国务院1992年批准设立的。我国第一个由外商成片开发、享受保税区政策的国家级开发区'}, 'score': 3, 'keys': ['海南', '自贸港', '样板', '样板间', '探访']}], 'words': ['探访', '海南', '自贸港']}}
{'id': 1000, 'text': '探访海南自贸港“样板间”', 'document': {'content': '洋浦经济开发区地处海南西北部洋浦半岛,是21世纪海上丝绸之路与西部陆海新通道的交汇节点。是国务院1992年批准设立的。我国第一个由外商成片开发、享受保税区政策的国家级开发区'}, 'score': 3, 'keys': ['海南', '自贸港', '样板', '样板间', '探访']}
Traceback (most recent call last):
File "test.py", line 46, in <module>
remove()
File "test.py", line 39, in remove
res = client.remove_document(1000)
File "/home/zhangclb/sandbox/gofound/gofound-python/gofound/client.py", line 78, in remove_document
res = self._post("remove", json={
File "/home/zhangclb/sandbox/gofound/gofound-python/gofound/client.py", line 43, in _post
raise DBException("Error:", res.status_code)
gofound.exceptions.DBException: ('Error:', 404)
Then access http://localhost:8080/admin/#/ , query with "海南",
sent:
{
"query": "海南",
"page": 1,
"limit": 10,
"highlight": {
"preTag": "<em style='color:red'>",
"postTag": "</em>"
},
"order": "DESC"
}
got:
{"state":true,"message":"success","data":{"time":0.38622900000000004,"total":1,"pageCount":1,"page":1,"limit":10,"documents":[{"id":1000,"text":"探访\u003cem style='color:red'\u003e海南\u003c/em\u003e自贸港“样板间”","document":{"content":"洋浦经济开发区地处海南西北部洋浦半岛,是21世纪海上丝绸之路与西部陆海新通道的交汇节点。是国务院1992年批准设立的。我国第一个由外商成片开发、享受保税区政策的国家级开发区"},"originalText":"探访海南自贸港“样板间”","score":1,"keys":["海南","自贸港","样板","样板间","探访"]}],"words":["海南"]}}
But remove failed, sent:
{id: 1000}
But got 404:
Request URL: http://localhost:8080/api/remove?database=default
Request Method: POST
Status Code: 404 Not Found
Remote Address: 127.0.0.1:8080
Referrer Policy: strict-origin-when-cross-origin
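I switched to a CentOS 7 machine (2 cores, 4 GB RAM, 5 Mbps bandwidth). Installation reports no errors, but: curl: (56) Recv failure: Connection reset by peer
I suggest making id a string, which would fit business use cases better.
Why are punctuation marks and spaces all removed during tokenization? Doesn't that make it impossible to tokenize English? Or am I using it wrong?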
text = utils.RemovePunctuation(text)
// remove all spaces
text = utils.RemoveSpace(text)
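As the title says: what is the design or recommended approach for scaling out?
I stored 张三, which should be tokenized, but searching for 张 finds nothing.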
{"name":"张三","age":18}
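This flat, "one-dimensional" structure is too simple. Real-world scenarios are more complex, and the lack of support for nested child nodes limits practical use.
Adding {"name":"张三","age":18,"node":{"test":"testnode"}} through the admin page appears to succeed, but the actual query results are wrong, and it is unclear whether the document was really added.
The built-in tokenizer will never cover every case. Could the indexing API accept a parameter for custom tokens when a document is created?
For example, add a custom attribute "token" that accepts user-defined, space-separated tokens such as "token"="π 3.131415926", so that the document can be found when searching for "π" or "3.131415926".
That way the built-in default tokenization stays, and business-specific extensions are easy to add.
Which languages are supported for tokenization?
CPU usage is high when importing millions of documents with multiple threads. Is there a way to import quickly while keeping resource usage relatively low?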
As the title says.
After running for a while the program exits. With debug mode enabled, the last log output is as follows:
panic: resource temporarily unavailable
goroutine 3364545 [running]:
gofound/searcher/storage.(*LeveldbStorage).ReOpen(0xc00007a3c0)
/home/runner/work/gofound/gofound/searcher/storage/leveldb_storage.go:81 +0x115
gofound/searcher/storage.(*LeveldbStorage).autoOpenDB(0xc00007a3c0)
/home/runner/work/gofound/gofound/searcher/storage/leveldb_storage.go:26 +0x2e
gofound/searcher/storage.(*LeveldbStorage).Get(0xc00007a3c0, {0xc001a29e8c, 0x4, 0x4})
/home/runner/work/gofound/gofound/searcher/storage/leveldb_storage.go:91 +0x2d
gofound/searcher.(*Engine).GetDocById(0xc0029da4e0?, 0x29da360?)
/home/runner/work/gofound/gofound/searcher/engine.go:550 +0x65
gofound/searcher.(*Engine).getDocument(0x0?, {0x43fb65?, 0xc0052f8768?}, 0xc000000230, 0xc002b6a600, 0xc002694280, 0xc000394a40)
/home/runner/work/gofound/gofound/searcher/engine.go:466 +0x68
created by gofound/searcher.(*Engine).MultiSearch.func2
/home/runner/work/gofound/gofound/searcher/engine.go:419 +0x9cd
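In many cases the id really does not matter, as long as it increments automatically.
It would be even better if the service offered this as a configuration option.
Zero configuration gets people started, and convenient configuration options meet real needs better.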
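Question: I put the gofound project on CentOS 7, compiled and ran it, and it keeps printing waiting:0 without ever running successfully.
Specifically, I uploaded the project to a remote CentOS 7 machine and compiled and ran it there, without Docker. It is a plain build and run: ./gofound --addr=:8081 --data=./data
Could you help me find out what is causing this?
Request: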
curl -H "Content-Type:application/json" -X GET http://127.0.0.1:5678/api/status
{"state":false,"message":"runtime error: index out of range [0] with length 0"}
goroutine 39 [running]:
runtime/debug.Stack()
/opt/hostedtoolcache/go/1.18.5/x64/src/runtime/debug/stack.go:24 +0x68
runtime/debug.PrintStack()
/opt/hostedtoolcache/go/1.18.5/x64/src/runtime/debug/stack.go:16 +0x20
gofound/web/middleware.Exception.func1.1()
/home/runner/work/gofound/gofound/web/middleware/exception.go:15 +0x40
panic({0x10178b720, 0x14002e19ec0})
/opt/hostedtoolcache/go/1.18.5/x64/src/runtime/panic.go:838 +0x204
gofound/searcher/system.GetCPUStatus()
/home/runner/work/gofound/gofound/searcher/system/cpu.go:19 +0xd0
gofound/web/service.(*Base).Status(0x14000067100)
/home/runner/work/gofound/gofound/web/service/base.go:44 +0x10c
gofound/web/controller.Status(0x14000071630?)
/home/runner/work/gofound/gofound/web/controller/base.go:39 +0x38
github.com/gin-gonic/gin.(*Context).Next(...)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/context.go:168
gofound/web/middleware.Exception.func1(0x1400017c300)
/home/runner/work/gofound/gofound/web/middleware/exception.go:20 +0x6c
github.com/gin-gonic/gin.(*Context).Next(...)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/context.go:168
gofound/web/middleware.Cors.func1(0x1400017c300)
/home/runner/work/gofound/gofound/web/middleware/cors.go:25 +0x140
github.com/gin-gonic/gin.(*Context).Next(...)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/context.go:168
github.com/gin-gonic/gin.CustomRecoveryWithWriter.func1(0x1400017c300)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/recovery.go:99 +0x80
github.com/gin-gonic/gin.(*Context).Next(...)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/context.go:168
github.com/gin-gonic/gin.LoggerWithConfig.func1(0x1400017c300)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/logger.go:241 +0xb0
github.com/gin-gonic/gin.(*Context).Next(...)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/context.go:168
github.com/gin-gonic/gin.(*Engine).handleHTTPRequest(0x140052b1040, 0x1400017c300)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/gin.go:555 +0x568
github.com/gin-gonic/gin.(*Engine).ServeHTTP(0x140052b1040, {0x1017ccf80?, 0x140002781c0}, 0x1400017c000)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/gin.go:511 +0x1d4
net/http.serverHandler.ServeHTTP({0x140002c81e0?}, {0x1017ccf80, 0x140002781c0}, 0x1400017c000)
/opt/hostedtoolcache/go/1.18.5/x64/src/net/http/server.go:2916 +0x3fc
net/http.(*conn).serve(0x1400032a000, {0x1017cd6e8, 0x140051c5f50})
/opt/hostedtoolcache/go/1.18.5/x64/src/net/http/server.go:1966 +0x56c
created by net/http.(*Server).Serve
/opt/hostedtoolcache/go/1.18.5/x64/src/net/http/server.go:3071 +0x450
[GIN] 2022/11/02 - 16:31:23 | 200 | 9.7025ms | 127.0.0.1 | GET "/api/status"
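The panic message, index out of range [0] with length 0, means that element 0 of an empty slice was read inside GetCPUStatus. What cpu.go:19 actually indexes is not shown in the trace, so the following is only a generic sketch of guarding such a read. firstOrDefault is a hypothetical helper and the float64 sample type is an assumption.

```go
package main

import "fmt"

// firstOrDefault returns the first sample, or def when the slice is empty,
// instead of panicking on samples[0].
func firstOrDefault(samples []float64, def float64) float64 {
	if len(samples) == 0 {
		return def
	}
	return samples[0]
}

func main() {
	// An empty result, as a platform that reports no CPU samples might produce.
	fmt.Println(firstOrDefault(nil, 0))
	// A normal result with one sample.
	fmt.Println(firstOrDefault([]float64{12.5}, 0))
}
```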
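I have installed and started gofound through Docker on CentOS 7 and opened the port, but I cannot reach it from the remote CentOS 7 server itself, and I cannot reach the admin UI through the domain name from my local machine either. Is there something else I need to configure?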
[GIN-debug] GET /api/db/list --> github.com/sea-team/gofound/web/controller.DBS (6 handlers)
[GIN-debug] GET /api/db/drop --> github.com/sea-team/gofound/web/controller.DatabaseDrop (6 handlers)
[GIN-debug] GET /api/db/create --> github.com/sea-team/gofound/web/controller.DatabaseCreate (6 handlers)
[GIN-debug] GET /api/word/cut --> github.com/sea-team/gofound/web/controller.WordCut (6 handlers)
2023/05/31 14:19:34 API Url: http://:8080/api
2023/05/31 14:19:44 waiting: 0
2023/05/31 14:19:44 waiting: 0
2023/05/31 14:19:54 waiting: 0
2023/05/31 14:19:54 waiting: 0
^C2023/05/31 14:19:59 Shutdown Server ...
2023/05/31 14:19:59 Server exiting
Detailed error message:
index.3e4347a8.js:1 Failed to load module script: Expected a JavaScript module script but the server responded with a MIME type of "text/plain". Strict MIME type checking is enforced for module scripts per HTML spec.
Add the following code to gofound/web/admin/admin.go:
import (
"mime"
)
func init() {
mime.AddExtensionType(".html", "text/html")
mime.AddExtensionType(".css", "text/css")
mime.AddExtensionType(".js", "text/javascript")
}
Is this project still maintained?
Using the latest release and running test.py with the Python client:
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x188 pc=0xcc1df7]
goroutine 7 [running]:
github.com/syndtr/goleveldb/leveldb.(*DB).isClosed(...)
/home/runner/go/pkg/mod/github.com/syndtr/[email protected]/leveldb/db_state.go:230
github.com/syndtr/goleveldb/leveldb.(*DB).ok(...)
/home/runner/go/pkg/mod/github.com/syndtr/[email protected]/leveldb/db_state.go:235
github.com/syndtr/goleveldb/leveldb.(*DB).Get(0xc00040a680?, {0xc002ed87a8?, 0x8?, 0x6?}, 0xc002ed87a8?)
/home/runner/go/pkg/mod/github.com/syndtr/[email protected]/leveldb/db.go:838 +0x57
gofound/searcher/storage.(*LeveldbStorage).Get(0xc00040a680, {0xc002ed87a8, 0x6, 0x8})
/home/runner/work/gofound/gofound/searcher/storage/leveldb_storage.go:88 +0x4b
gofound/searcher.(*Engine).addInvertedIndex(0xc0004380c0, {0xc0051bb8b0, 0x6}, 0x3e8)
/home/runner/work/gofound/gofound/searcher/engine.go:208 +0x165
gofound/searcher.(*Engine).AddDocument(0xc0004380c0, 0xc002ee4080)
/home/runner/work/gofound/gofound/searcher/engine.go:187 +0xf7
gofound/searcher.(*Engine).DocumentWorkerExec(0x0?, 0x0?)
/home/runner/work/gofound/gofound/searcher/engine.go:125 +0x45
created by gofound/searcher.(*Engine).Init
/home/runner/work/gofound/gofound/searcher/engine.go:71 +0x24a
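Does a higher score mean that more keywords were matched?
I tested many times without finding a pattern, and I do not know how to make use of the Score field.
The search matches many results and is not precise, so I would like to filter the result set.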