sajari / fuzzy
Spell checking and fuzzy search suggestion written in Go
Home Page: https://www.sajari.com/
License: MIT License
Hi,
You say that your algorithm has an accuracy of 68%. Do you know what accuracy other libraries achieve?
Is it able to correct slang words?
Gerald
Line 593 in a913c98
For autocomplete to work in the front-page example, the model threshold should be 0.
model.SetThreshold(0)
Thanks for the awesome fuzzy library.
For my use case I want to be able to provide different suggestion ranking criteria than is implemented in fuzzy.best(). I can use Suggestions(), but I lose the information in *Potential.
What do you think of exposing SuggestPotential()?
SuggestPotential() would need to obtain the lock, and the fields in Potential would also need to be made public, but I don't see anything complex that would need to change.
Need to auto load this if switched from false to true.
See line: https://github.com/sajari/fuzzy/blob/master/fuzzy.go#L507
When testing autocomplete on a small dataset I found this restriction problematic, as model.Maxcount/50 is always 0. I don't understand the purpose of this rule, as it would seem to exclude the most popular matches. And if there is some reason for it that I'm not getting, isn't 50 arbitrary? Should it be configurable?
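The arithmetic behind this report can be checked in isolation. The sketch below assumes the cutoff is computed with Go integer division, as the expression model.Maxcount/50 suggests. The function name cutoff and its divisor parameter are hypothetical and are not part of the library.

```go
package main

import "fmt"

// cutoff mirrors the expression Maxcount/50 from the issue using integer
// division. The divisor parameter is a hypothetical knob, not a library option.
func cutoff(maxCount, divisor int) int {
	if divisor <= 0 {
		return 0 // treat a non-positive divisor as "no cutoff"
	}
	return maxCount / divisor
}

func main() {
	// On a small corpus the most frequent word may occur fewer than 50 times,
	// so integer division truncates the cutoff to 0.
	fmt.Println(cutoff(12, 50))   // 0
	fmt.Println(cutoff(5000, 50)) // 100
}
```

Any maximum count below 50 gives a cutoff of 0, which is why the rule behaves differently on small datasets than on large ones.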
Can we use this package with sentences like the one below?
Example: Input = "hi i am fro India" (the "m" is missing in this text)
Expected output = "hi i am from India" (the missing "m" is added)
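The SpellCheck calls quoted elsewhere on this page take a single word, so one possible approach is to split the sentence and correct each word in turn. This is only a sketch: correctSentence and the lookup-table corrector are hypothetical stand-ins, and a trained model would take the place of the fix function.

```go
package main

import (
	"fmt"
	"strings"
)

// correctSentence splits a sentence on whitespace and passes each word
// through fix. fix is a placeholder for a per-word corrector.
func correctSentence(sentence string, fix func(string) string) string {
	words := strings.Fields(sentence)
	for i, w := range words {
		words[i] = fix(w)
	}
	return strings.Join(words, " ")
}

func main() {
	// Toy corrector standing in for a trained model: a fixed lookup table.
	known := map[string]string{"fro": "from"}
	fix := func(w string) string {
		if c, ok := known[w]; ok {
			return c
		}
		return w
	}
	fmt.Println(correctSentence("hi i am fro India", fix)) // hi i am from India
}
```

A word-by-word pass has no notion of context, so a typo that happens to be a valid word would be left unchanged.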
If you need speed, replace your Levenshtein distance (LD) code with this one:
func LevenshteinDistance(a, b *string) int {
	la := len(*a)
	lb := len(*b)
	// d holds a single row of the edit-distance table.
	d := make([]int, la+1)
	var lastdiag, olddiag, temp int
	for i := 1; i <= la; i++ {
		d[i] = i
	}
	for i := 1; i <= lb; i++ {
		d[0] = i
		lastdiag = i - 1
		for j := 1; j <= la; j++ {
			olddiag = d[j]
			min := d[j] + 1
			if d[j-1]+1 < min {
				min = d[j-1] + 1
			}
			// Substitution cost: 0 when the bytes match, 1 otherwise.
			if (*a)[j-1] == (*b)[i-1] {
				temp = 0
			} else {
				temp = 1
			}
			if lastdiag+temp < min {
				min = lastdiag + temp
			}
			d[j] = min
			lastdiag = olddiag
		}
	}
	return d[la]
}
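The function above indexes the strings byte by byte, so a multi-byte UTF-8 character counts as several positions. For non-ASCII input, a variant that compares runes may be closer to what is wanted. The following is a sketch of that variant, not code from the library; the name levenshteinRunes is made up for illustration.

```go
package main

import "fmt"

// levenshteinRunes computes the edit distance between a and b, counting
// Unicode code points rather than bytes. It keeps a single row of the
// dynamic-programming table, like the byte-based version.
func levenshteinRunes(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	row := make([]int, len(ra)+1)
	for j := range row {
		row[j] = j
	}
	for i := 1; i <= len(rb); i++ {
		diag := row[0] // previous row's value at column j-1
		row[0] = i
		for j := 1; j <= len(ra); j++ {
			up := row[j] // previous row's value at column j
			cost := 1
			if ra[j-1] == rb[i-1] {
				cost = 0
			}
			best := up + 1
			if row[j-1]+1 < best {
				best = row[j-1] + 1
			}
			if diag+cost < best {
				best = diag + cost
			}
			row[j] = best
			diag = up
		}
	}
	return row[len(ra)]
}

func main() {
	fmt.Println(levenshteinRunes("kitten", "sitting")) // 3
}
```

Converting to []rune allocates, so this trades some of the speed of the byte version for correct handling of multi-byte characters.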
Currently we can't load models from io.Reader implementations (only from the local filesystem), and the same goes for persisting models to an io.Writer.
Proposed additions:
func FromReader(r io.Reader) (*Model, error)
func (m *Model) WriteTo(w io.Writer) (n int64, err error)
Update the Load and Save functions to use the new code.
Hi everybody,
I am getting this error when I run the following commands while building a Docker image:
FROM golang:latest
RUN go get -t github.com/sajari/fuzzy
RUN cd ${GOPATH}/src/github.com/sajari/fuzzy && go test
I get the following error in the "Double char delete 2nd closest" test for the word bigge.
--- FAIL: TestSpellingSuggestions (0.00s)
fuzzy_test.go:78: Spell check suggestions, Double char delete 2nd closest
Spell test1 count: 270, Correct: 193, Incorrect: 77, Ratio: 0.714815, Total time: 6.1401ms
Spell test2 count: 400, Correct: 270, Incorrect: 130, Ratio: 0.675000, Total time: 11.0152ms
FAIL
exit status 1
FAIL github.com/sajari/fuzzy 4.069s
The command '/bin/sh -c cd ${GOPATH}/src/github.com/sajari/fuzzy && go test' returned a non-zero code: 1
Could you help me out?
Thanks in advance!
Panic occurs at https://github.com/sajari/fuzzy/blob/master/fuzzy.go#L592
This function should return an error for blank inputs
The output from SpellCheck and SpellCheckSuggestions differs.
model.SpellCheck("lisence") => "liens"
model.SpellCheckSuggestions("lisence", 1) => ["licence"]
model.CheckKnown("lisense", "license") => true
purpel => pure || [parcel]
natior => nor || [nation]
It is not completely consistent, as it works in some cases.
The model has been trained from a pre-collected corpus with a count for each word, using model.SetCount(word, count, true), but the behaviour seems to be the same when training from SampleEnglish().
Test:
func TestSuggestionsVsSpelling(t *testing.T) {
	model := NewModel()
	model.Train(SampleEnglish())
	cases := []string{
		"lisence",
		"purpel",
		"blidn",
		"teh",
	}
	for _, word := range cases {
		checked := model.SpellCheck(word)
		suggestions := model.SpellCheckSuggestions(word, 1)
		if len(suggestions) == 1 && suggestions[0] != checked {
			t.Errorf("first suggestion '%s', does not equal SpellCheck '%s'", suggestions[0], checked)
		}
	}
}
See lines: https://github.com/sajari/fuzzy/blob/master/fuzzy.go#L430-L432
A slice of length 10 is allocated, but suggestions are then appended after those ten empty elements.
A simple fix is to change it to:
output := make([]string, 0, 10)
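The difference between the two make calls is standard Go slice behaviour and can be seen in a small standalone program. The helper names withLength and withCapacity are made up for this illustration and do not appear in the library.

```go
package main

import "fmt"

// withLength allocates a slice of length n and then appends, so the result
// starts with n empty strings followed by the items.
func withLength(n int, items []string) []string {
	out := make([]string, n)
	return append(out, items...)
}

// withCapacity allocates length 0 and capacity n, so append fills the slice
// starting at index 0.
func withCapacity(n int, items []string) []string {
	out := make([]string, 0, n)
	return append(out, items...)
}

func main() {
	items := []string{"licence"}

	padded := withLength(10, items)
	fmt.Println(len(padded), padded[0] == "", padded[10]) // 11 true licence

	output := withCapacity(10, items)
	fmt.Println(len(output), output[0]) // 1 licence
}
```

In the first case a caller reading the first result sees an empty string, which matches the symptom described above.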
I tried training the model with some Hindi data, but the model silently fails.
$ go build
# github.com/sajari/fuzzy
../../../github.com/sajari/fuzzy/fuzzy.go:128: undefined: UseAutocomplete
Did you try compiling this before merging the latest PR?