go-lang-detector's People

Contributors

bioothod, chrisport, pfedak

go-lang-detector's Issues

Enhance Logging

Currently the library prints to stdout using fmt. A suitable logging library should be integrated and made configurable. It should be as standard as possible.

Add Chinese in language detection

Would it be possible to point me to how to detect text written in Chinese? I searched everywhere but couldn't find a Chinese text sample to add the language to the detector. Thanks.
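
Independently of a trained text sample, Chinese text can often be recognised by its script. The sketch below is not part of go-lang-detector and does not use the library's detection method; it is a swapped-in technique based only on the standard `unicode` package, and `hanRatio` is a hypothetical helper name. Han characters also occur in Japanese text, so this is only a rough pre-check.

```go
package main

import (
	"fmt"
	"unicode"
)

// hanRatio returns the fraction of letters in text that belong to the Han
// script. It returns 0 when text contains no letters at all.
func hanRatio(text string) float64 {
	letters, han := 0, 0
	for _, r := range text {
		if !unicode.IsLetter(r) {
			continue // skip spaces, digits and punctuation
		}
		letters++
		if unicode.Is(unicode.Han, r) {
			han++
		}
	}
	if letters == 0 {
		return 0
	}
	return float64(han) / float64(letters)
}

func main() {
	samples := []string{"do not care about quantity", "English 简体中文", "简体中文"}
	for _, s := range samples {
		fmt.Printf("%q: %.2f\n", s, hanRatio(s))
	}
}
```

A caller could treat text above some ratio as Chinese before falling back to the detector; the threshold to use is a judgment call and is not suggested by the library.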

English detector fails when checking Czech text

The English detector reports Czech and Polish text as English:

package main

import (
	"fmt"

	"github.com/chrisport/go-lang-detector/langdet"
	"github.com/chrisport/go-lang-detector/langdet/langdetdef"
)

// isEnglishDetector is initialized lazily on the first call to isEnglish.
var isEnglishDetector langdet.Detector

// isEnglish reports whether the closest detected language is English.
func isEnglish(text string) bool {
	if len(isEnglishDetector.Languages) == 0 {
		fmt.Println("* Init English detector ...")
		isEnglishDetector = langdetdef.NewWithDefaultLanguages()
	}
	return isEnglishDetector.GetClosestLanguage(text) == "english"
}

func main() {
	// English input: true is expected.
	fmt.Println(isEnglish("do not care about quantity"))
	// Czech input: false is expected.
	fmt.Println(isEnglish("V jeho jednomyslném schválení však brání dlouhodobý nesouhlas dvojice zmíněných států. „Slyším tak často z Polska a Maďarska, že nemají problém s právním státem, až bych skoro čekala, že to dokážou tím, že pro to zvednou ruku,“ prohlásila. (ČTK)*"))
	// Polish input: false is expected.
	fmt.Println(isEnglish("Jesteśmy przekonani, że właśnie taki rodzaj dziennikarstwa najlepiej pomaga rozumieć to, co dzieje się dookoła nas i stanowi najbardziej wartościowy wkład w rozwój demokracji oraz wartości obywatelskich"))
}

OUTPUT:

* Init English detector ...
true
true
true

Update to Go modules

If anyone is interested in migrating this repository to the newest Go practices around modules, I would appreciate some help.

Restructure package?

This repository seems to be structured incorrectly. Rather than living in an examples/ folder, the examples seem to be at the root level, and the actual package is github.com/chrisport/go-lang-detector/langdet. In addition, the package has an init() call that always fails because the default_languages.json file isn't in the package folder.

package main

import (
	"fmt"
	"github.com/chrisport/go-lang-detector/langdet"
)

func main() {

	strs := []string{
		"do not care about quantity",
		"ont permis d'identifier",
		"English 简体中文",
	}

	filepath := "...    ...go/src/github.com/chrisport/go-lang-detector/default_languages.json"
	langdet.InitWithDefault(filepath)
	detector := langdet.NewDefaultLanguages()

	for _, s := range strs {
		fmt.Println(s)
		fmt.Println(detector.GetLanguages(s))
	}

}

Running this code gives the following output:

$ go run lang.go
go-lang-detector/langdet: No default languages loaded. default_languages.json not present
do not care about quantity
[{english 90} {french 75} {german 50} {turkish 44} {hebrew 19} {arabic 1} {russian 0}]
ont permis d'identifier
[{french 86} {english 80} {german 71} {turkish 54} {hebrew 33} {arabic 2} {russian 0}]
English 简体中文
[{english 48} {german 37} {french 29} {turkish 22} {hebrew 16} {arabic 2} {russian 0}]

New detector from reader

It would be good to be able to create a detector from a reader.
The init method that looks for a file path is really difficult to work with in tests etc.

Explanation on the readme

Hi,
one thing is not clear to me. Let's say I want to add a language that is not present in the library. Once I have extracted the text, I should call code like the following snippet.

    textSample := GetTextFromFile("samples/polish.txt")
    polish := langdet.Analyze(textSample, "polish")

My question is: do I need to do this every time I start the program? Isn't there a way to persist the detector?

NewDefaultLanguages -> NewWithDefaultLanguages

It would be good to rename the method "NewDefaultLanguages" in README.md.
The README has "NewDefaultLanguages" instead of "NewWithDefaultLanguages", which causes this error:

.\lang.go:9: undefined: langdet.NewDefaultLanguages

Guesses English with higher probability for Estonian phrase

Hello,

I loaded the default lang detector with an Estonian dictionary, then tested a list of keywords from an Estonian web page. The lang detector gives English a higher probability than Estonian, even though only one of the keywords is also English (as far as I know). The keywords are "domeen registreeritud see kuid saab".

Anything I can do differently to help the detector guess the correct language here?

Minimal example code to reproduce the issue on dropbox: https://www.dropbox.com/s/z05ct7eowp3yq9m/langtest.zip?dl=0

Analyse language from URL

It could be useful (?) to add a new language from text fetched from a website:

text_sample := GetTextFromHTML("https://en.wikipedia.org/wiki/Special:Random")
detector.AddLanguageFrom(text_sample, "english")

HTML tags need to be removed, and probably only the body should be used.
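
The tag-stripping step could be sketched as follows, using only the standard library. `textFromHTML` is a hypothetical helper, not an existing function of the library, and downloading the page is left out. The regular expressions are deliberately crude and will mishandle unusual markup; a real implementation would more likely use an HTML parser.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

var (
	bodyRe   = regexp.MustCompile(`(?is)<body[^>]*>(.*)</body>`)
	scriptRe = regexp.MustCompile(`(?is)<script.*?</script>`)
	styleRe  = regexp.MustCompile(`(?is)<style.*?</style>`)
	tagRe    = regexp.MustCompile(`<[^>]*>`)
)

// textFromHTML returns the visible text of page: it keeps only the body
// (when one is present), drops script and style blocks, removes the
// remaining tags, decodes entities, and collapses whitespace.
func textFromHTML(page string) string {
	if m := bodyRe.FindStringSubmatch(page); m != nil {
		page = m[1]
	}
	page = scriptRe.ReplaceAllString(page, " ")
	page = styleRe.ReplaceAllString(page, " ")
	page = tagRe.ReplaceAllString(page, " ")
	page = html.UnescapeString(page)
	return strings.Join(strings.Fields(page), " ")
}

func main() {
	page := `<html><head><title>Ignored</title></head>
<body><h1>Hello</h1><script>var x = 1;</script><p>plain &amp; simple</p></body></html>`
	fmt.Println(textFromHTML(page)) // prints "Hello plain & simple"
}
```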

A few questions

Hi,

Just had a few questions I was hoping you could answer:

  1. What languages are supported by default?
  2. Have you run comparisons against other open source language detectors to see how much different the results are?
  3. Have you benchmarked the performance of this against other open source language detectors in terms of speed?

Thanks

Error

I ran your test after running the command "go get github.com/chrisport/go-lang-detector/langdet":

package main

import (
	"fmt"

	"github.com/chrisport/go-lang-detector/langdet"
)

func main() {
	detector := langdet.NewDefaultDetector()
	testString := "do not care about my car"
	result := detector.GetClosestLanguage(testString)
	fmt.Println(result)
}

and this is the result:
undefined: langdet.NewDefaultDetector

Do I need to install more packages?

Wrong language detected

No longer seems to work

package main

import (
	"fmt"

	"github.com/chrisport/go-lang-detector/langdet"
	"github.com/chrisport/go-lang-detector/langdet/langdetdef"
)

var detector langdet.Detector

func init() {
	detector = langdetdef.NewWithDefaultLanguages()
}

func main() {
	testString := "do not care about quantity"
	result := detector.GetClosestLanguage(testString)
	fmt.Println(result)
}

Prints the following results:

> go run ./
hebrew

Low confidence on Russian language

Code:

package main

import (
	"fmt"
	"os"
	"strings"

	"github.com/chrisport/go-lang-detector/langdet/langdetdef"
)

func main() {
	// Join all command-line arguments into the text to classify.
	arg := strings.Join(os.Args[1:], " ")

	detector := langdetdef.NewWithDefaultLanguages()
	result := detector.GetClosestLanguage(arg)
	fmt.Println(arg, "is", result)
}

This prints "привет мир is undefined", although "привет мир" is Russian.

English example but output is French

package main

import (
	"fmt"

	// "github.com/chrisport/go-lang-detector"
	"github.com/chrisport/go-lang-detector/langdet/langdetdef"
)

func main() {
	detector := langdetdef.NewWithDefaultLanguages()
	testString := "do not care about quantity"
	result := detector.GetClosestLanguage(testString)
	fmt.Println(result)
}
