
go-license-detector's Introduction

go-license-detector

Project license detector: a command-line application and a library, written in Go. It scans the given directory for license files, normalizes and hashes them, and outputs all fuzzy matches against the list of reference texts. The returned names follow the SPDX standard. Read the blog post.

Why? There are no similar projects which can be compiled into a native binary without dependencies and also support the whole SPDX license database (≈400 items). This implementation is also fast, requires little memory, and the API is easy to use.

The license texts are taken directly from the license-list-data repository. The detection algorithm is not template matching; this directly implies that go-license-detector does not provide any legal guarantees. Its intended area of use is data mining.

Installation

go get github.com/go-enry/go-license-detector/v4/licensedb

The CLI is available for download at the release page.

Algorithm

  1. Find files in the root directory which may represent a license. E.g. LICENSE or license.md.
  2. If the file is Markdown or reStructuredText, render to HTML and then convert to plain text. Original HTML files are also converted.
  3. Normalize the text according to SPDX recommendations.
  4. Split the text into unigrams and build the weighted bag of words.
  5. Calculate Weighted MinHash.
  6. Apply Locality Sensitive Hashing and pick the reference licenses which are close.
  7. For each candidate, calculate the Levenshtein distance, D. The corresponding text is a single line in which each unigram is represented by a single rune (character).
  8. Set the similarity to 1 - D / L, where L is the number of unigrams in the queried license.
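
Steps 7 and 8 can be illustrated with a minimal sketch. It assumes whitespace tokenization and a textbook two-row Levenshtein implementation; the helper names (encode, levenshtein, similarity) are hypothetical and are not taken from the library.

```go
package main

import (
	"fmt"
	"strings"
)

// encode maps every distinct unigram to one rune, so a text becomes a rune slice.
// The runes come from the Unicode private-use area; the sketch assumes a small vocabulary.
func encode(words []string, vocab map[string]rune) []rune {
	out := make([]rune, 0, len(words))
	for _, w := range words {
		r, ok := vocab[w]
		if !ok {
			r = rune(0xE000 + len(vocab))
			vocab[w] = r
		}
		out = append(out, r)
	}
	return out
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein computes the edit distance between two rune slices with two DP rows.
func levenshtein(a, b []rune) int {
	prev := make([]int, len(b)+1)
	cur := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev
	}
	return prev[len(b)]
}

// similarity returns 1 - D / L, where L is the number of unigrams in the query.
func similarity(query, reference string) float64 {
	vocab := map[string]rune{}
	q := encode(strings.Fields(query), vocab)
	r := encode(strings.Fields(reference), vocab)
	if len(q) == 0 {
		return 0
	}
	return 1 - float64(levenshtein(q, r))/float64(len(q))
}

func main() {
	fmt.Println(similarity("permission is hereby granted", "permission is hereby given"))
}
```

Encoding each unigram as one rune lets a character-level edit distance operate on whole words, so a single changed word costs exactly one edit.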

This pipeline guarantees constant-time queries, though it requires some initialization to preprocess the reference licenses.
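
To give a feel for the hashing stage (steps 4 and 5), here is a rough sketch of plain, unweighted MinHash over a bag of words. This is a simplification: the project uses Weighted MinHash, and the function names signature and estimate, the FNV-based hashing and the signature size are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// signature computes a plain MinHash signature of the words in text.
// Each slot keeps the minimum hash seen under a differently salted hash function.
func signature(text string, size int) []uint64 {
	sig := make([]uint64, size)
	for i := range sig {
		sig[i] = ^uint64(0)
	}
	for _, word := range strings.Fields(strings.ToLower(text)) {
		for i := range sig {
			h := fnv.New64a()
			// Salting with the slot index emulates a family of hash functions.
			fmt.Fprintf(h, "%d:%s", i, word)
			if v := h.Sum64(); v < sig[i] {
				sig[i] = v
			}
		}
	}
	return sig
}

// estimate returns the fraction of equal slots, which approximates the
// Jaccard similarity of the two word sets.
func estimate(a, b []uint64) float64 {
	same := 0
	for i := range a {
		if a[i] == b[i] {
			same++
		}
	}
	return float64(same) / float64(len(a))
}

func main() {
	a := signature("permission is hereby granted free of charge", 64)
	b := signature("permission is hereby granted without charge", 64)
	fmt.Printf("estimated similarity: %.2f\n", estimate(a, b))
}
```

Because signatures have a fixed size, comparing a query against precomputed reference signatures does not depend on the length of the license texts, which is what makes the preprocessing step worthwhile.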

If no license files are found:

  1. Look for README files.
  2. If the file is Markdown or reStructuredText, render to HTML and then convert to plain text. Original HTML files are also converted.
  3. Scan for words like "copyright", "license" and "released under". Take the neighborhood.
  4. Run Named Entity Recognition (NER) over that surrounding context and extract the possible license name.
  5. Match it against the list of license names from SPDX.
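
Step 3 of the README fallback can be sketched as follows. The keyword list comes from the description above, while the function name neighborhoods and the byte-based radius are assumptions, not the library's implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// licenseHint matches the trigger words named in the fallback description.
var licenseHint = regexp.MustCompile(`(?i)\b(copyright|license|released under)\b`)

// neighborhoods returns the text within radius bytes around every keyword hit.
// Slicing by bytes is a simplification and may split multi-byte characters.
func neighborhoods(text string, radius int) []string {
	var out []string
	for _, loc := range licenseHint.FindAllStringIndex(text, -1) {
		start, end := loc[0]-radius, loc[1]+radius
		if start < 0 {
			start = 0
		}
		if end > len(text) {
			end = len(text)
		}
		out = append(out, text[start:end])
	}
	return out
}

func main() {
	readme := "Fast and small. This project is released under the MIT license."
	for _, n := range neighborhoods(readme, 12) {
		fmt.Printf("%q\n", n)
	}
}
```

Each returned snippet would then be handed to the entity recognition step, which only has to look at a short context instead of the whole README.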

Usage

Command line:

license-detector /path/to/project
license-detector https://github.com/go-git/go-git

Library (for a single license detection):

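import (
	"log"

	"github.com/go-enry/go-license-detector/v4/licensedb"
	"github.com/go-enry/go-license-detector/v4/licensedb/filer"
)

func main() {
	// FromDirectory is assumed to return (Filer, error) in v4.
	fs, err := filer.FromDirectory("/path/to/project")
	if err != nil {
		log.Fatal(err)
	}
	licenses, err := licensedb.Detect(fs)
	log.Println(licenses, err) // detected matches, or the error
}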

Library (for a convenient data structure that can be formatted as JSON):

import (
	"encoding/json"
	"fmt"

	"github.com/go-enry/go-license-detector/v4/licensedb"
)

func main() {
	results := licensedb.Analyse("/path/to/project1", "/path/to/project2")
	bytes, err := json.MarshalIndent(results, "", "\t")
	if err != nil {
		fmt.Printf("could not encode result to JSON: %v\n", err)
		return
	}
	fmt.Println(string(bytes))
}

Quality

On the dataset of the ~1000 most starred repositories on GitHub as of early February 2018 (list), 99% of the licenses are detected. The analysis of detection failures is tracked in FAILURES.md.

Comparison to other projects on that dataset:

Detector                   Detection rate   Time to scan, sec
go-license-detector        99% (897/902)    13.5
benbalter/licensee         75% (673/902)    111
google/licenseclassifier   76% (682/902)    907
boyter/lc                  88% (797/902)    548
amzn/askalono              87% (785/902)    165
LiD                        94% (847/902)    3660

How this was measured

$ cd $(go env GOPATH)/src/github.com/go-enry/go-license-detector/v4/licensedb
$ mkdir dataset && cd dataset
$ unzip ../dataset.zip
$ # go-enry/go-license-detector
$ time license-detector * \
  | grep -Pzo '\n[-0-9a-zA-Z]+\n\tno license' | grep -Pa '\tno ' | wc -l
$ # benbalter/licensee
$ time ls -1 | xargs -n1 -P4 licensee \
  | grep -E "^License: Other" | wc -l
$ # google/licenseclassifier
$ time find -type f -print | xargs -n1 -P4 identify_license \
  | cut -d/ -f2 | sort | uniq | wc -l
$ # boyter/lc
$ time lc . \
  | grep -vE 'NOASSERTION|----|Directory' | cut -d" " -f1 | sort | uniq | wc -l
$ # amzn/askalono
$ echo '#!/bin/sh
result=$(askalono id "$1")
echo "$1
$result"' > ../askalono.wrapper
$ time find -type f -print | xargs -n1 -P4 sh ../askalono.wrapper | grep -Pzo '.*\nLicense: .*\n' askalono.txt | grep -av "License: " | cut -d/ -f 2 | sort | uniq | wc -l
$ # LiD
$ time license-identifier -I dataset -F csv -O lid
$ cat lid_*.csv | cut -d, -f1 | cut -d"'" -f 2 | grep / | cut -d/ -f2 | sort | uniq | wc -l

Regenerate binary data

The SPDX licenses are embedded in the binary. To update them, run

# go install github.com/go-bindata/go-bindata/...
make licensedb/internal/assets/bindata.go

Contributions

...are welcome, see CONTRIBUTING.md and code of conduct.

License

Apache 2.0, see LICENSE.md.

go-license-detector's People

Contributors

abursavich, adracus, aperezg, bzz, campoy, dsymonds, erizocosmico, g4s8, johanbrandhorst, karlmutch, karthiknayak, lafriks, marclop, mcuadros, mrdos, mx-psi, sebbonnet, smola, stiletto, tevino, tnir, vmarkovtsev, zurk


go-license-detector's Issues

Failure to execute on Arch Linux

On Arch Linux (current rolling version), I'm seeing license-detector try to call get_robust_list, which is apparently not implemented:

arch_prctl(ARCH_SET_FS, 0x6bbc7919d740) = 0
set_tid_address(0x6bbc7919da10)         = 2689494
set_robust_list(0x6bbc7919da20, 24)     = 0
rseq(0x6bbc7919e060, 0x20, 0, 0x53053053) = -1 ENOSYS (Function not implemented)
mprotect(0x6bbc79374000, 16384, PROT_READ) = 0
mprotect(0x14c2580c4000, 3244032, PROT_READ) = 0
mprotect(0x6bbc79407000, 8192, PROT_READ) = 0

The build process for Arch is all sh, and the specific PKGBUILD for license-detector is here

Feature: filer.FromGoModule

The filer package already supports FromGitURL which fetches the contents of a remote Git repo, but this doesn't cover the case of fetching and analyzing a Go module's source, wherever it may be. Go modules can be fetched from Mercurial, Subversion, etc., and might have import path aliases in place that make FromGitURL tricky.

I'd like to suggest adding filer.FromGoModule that would address this by fetching the module's source from the Go module proxy (proxy.golang.org by default) as a zip file, and analyzing it as FromZip does today -- possibly using a shared internal method that takes a zip.Reader instead of having to write to disk.

For example, filer.FromGoModule("knative.dev/serving@v0.26.0") would fetch and process https://proxy.golang.org/knative.dev/serving/@v/v0.26.0.zip. Note that this import path has an alias configured, and the user didn't have to perform a lookup to find that knative.dev/serving aliases https://github.com/knative/serving, then detect that it's GitHub and use Git, or GitHub's zip archive, etc. If knative.dev/serving moves to Mercurial or Subversion or non-GitHub, the caller wouldn't have to be aware.

If the module isn't found in the proxy, return an error. Users can set the GOPROXY env var to point to a different (e.g., internal corporate) module proxy, and filer.FromGoModule should support that. I'd suggest that filer.FromGoModule shouldn't support unversioned import paths (e.g., knative.dev/serving) since looking up the latest correct version adds complexity, and I imagine in most cases users will be processing a go.mod file where versions are available anyway.
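
As a rough sketch of the URL construction such a function would need: the helper names moduleZipURL and escapePath are hypothetical, and the handling of GOPROXY is deliberately simplified to its first entry. The case-encoding of uppercase letters follows the module proxy protocol.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"unicode"
)

// escapePath applies the proxy case-encoding: an uppercase letter becomes
// '!' followed by its lowercase form.
func escapePath(s string) string {
	var b strings.Builder
	for _, r := range s {
		if unicode.IsUpper(r) {
			b.WriteByte('!')
			r = unicode.ToLower(r)
		}
		b.WriteRune(r)
	}
	return b.String()
}

// moduleZipURL builds the zip download URL for a "module@version" spec.
// Unversioned specs are rejected, as suggested in the proposal.
func moduleZipURL(proxy, spec string) (string, error) {
	i := strings.LastIndex(spec, "@")
	if i <= 0 || i == len(spec)-1 {
		return "", fmt.Errorf("expected module@version, got %q", spec)
	}
	path, version := spec[:i], spec[i+1:]
	base := strings.TrimSuffix(proxy, "/")
	return base + "/" + escapePath(path) + "/@v/" + escapePath(version) + ".zip", nil
}

func main() {
	proxy := "https://proxy.golang.org"
	// GOPROXY may hold a list; this sketch only honors the first entry and
	// does not treat the special values "direct" and "off".
	if env := os.Getenv("GOPROXY"); env != "" {
		parts := strings.FieldsFunc(env, func(r rune) bool { return r == ',' || r == '|' })
		if len(parts) > 0 {
			proxy = parts[0]
		}
	}
	url, err := moduleZipURL(proxy, "knative.dev/serving@v0.26.0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(url)
}
```

The resulting archive could then be read with archive/zip and handed to the same code path that processes local zip files.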

More info about the Go module proxy protocol:

edit: a quick sketch of the code is here: master...imjasonh:go-module -- it buffers the zip in memory on first read; it could also just write to a temp directory and filer.FromZIP it. It also doesn't take into account GOPROXY yet, still WIP. Let me know if I should keep working on this and send a PR.

Huge performance degradation with large license candidate files due to a bug

The library builds a regex here from the normalized first lines of the license files. It later splits files using that regex here.

The problem here is that the App-s2p.txt license's first line normalizes into an empty string. This then causes the regex to match every line beginning and ending as we can see for example in this regex tester. You can see the bug in the regex by searching for || which is where the license's first line would go.

This causes huge performance degradation in repositories with large files that match the license filename pattern. One example of such a repository is https://gitlab.com/tikiwiki/tiki, which contains a large file called copyright.txt. Detecting a license for the repository took 22s. Detecting the license takes 260ms with the patch below:

diff --git a/licensedb/internal/db.go b/licensedb/internal/db.go
index a7254fd..d69118e 100644
--- a/licensedb/internal/db.go
+++ b/licensedb/internal/db.go
@@ -176,6 +176,11 @@ func loadLicenses() *database {
 		if len(header.Name) <= 6 {
 			continue
 		}
+
+		if header.Name == "./App-s2p.txt" {
+			continue
+		}
+
 		key := header.Name[2 : len(header.Name)-4]
 		text := make([]byte, header.Size)
 		readSize, readErr := archive.Read(text)

What would be the appropriate fix here?
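
One possible direction, shown here as a self-contained sketch rather than a patch to the library, is to skip any first line that normalizes to an empty string when building the alternation. The function buildSplitter and its skipEmpty flag are hypothetical names used only to contrast the two behaviours.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// buildSplitter joins the quoted first lines into one alternation.
// With skipEmpty set, blank entries are dropped so the pattern can never
// match the empty string.
func buildSplitter(firstLines []string, skipEmpty bool) *regexp.Regexp {
	var alts []string
	for _, line := range firstLines {
		if skipEmpty && strings.TrimSpace(line) == "" {
			continue
		}
		alts = append(alts, regexp.QuoteMeta(line))
	}
	return regexp.MustCompile("(?:" + strings.Join(alts, "|") + ")")
}

func main() {
	firstLines := []string{"mit license", "", "apache license"}
	text := "foo\nmit license\nbar\n"

	buggy := buildSplitter(firstLines, false) // contains "||"
	fixed := buildSplitter(firstLines, true)

	fmt.Println("matches with empty alternative:", len(buggy.FindAllStringIndex(text, -1)))
	fmt.Println("matches without it:", len(fixed.FindAllStringIndex(text, -1)))
}
```

Filtering on the normalized value rather than on a file name would also cover any future license whose first line normalizes to nothing.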

Data race on licensedb.Detect()

Hi there. I tried to integrate this library into the Gitaly component of GitLab to replace the Ruby licensee version with a Go implementation: https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3313

But the CI failed in the race detection phase (with go test -race). So I tried to reproduce it on the go-enry/go-license-detector tests with:

go test -race -v ./licensedb

And it fails with error:

=== RUN   TestDataset
==================
WARNING: DATA RACE
Write at 0x000001378b30 by goroutine 944:
  github.com/hhatto/gorst.initParser()
      /home/g4s8/go/pkg/mod/github.com/hhatto/[email protected]/parser.leg.go:12772 +0x3b8
  github.com/hhatto/gorst.NewParser()
      /home/g4s8/go/pkg/mod/github.com/hhatto/[email protected]/rst.go:63 +0x3a8
  github.com/go-enry/go-license-detector/v4/licensedb/internal/processors.RestructuredText()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/processors/markup.go:20 +0x3b
  github.com/go-enry/go-license-detector/v4/licensedb/internal.ExtractReadmeFiles()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/investigation.go:137 +0x3b4
  github.com/go-enry/go-license-detector/v4/licensedb.Detect()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/licensedb.go:46 +0x2a7
  github.com/go-enry/go-license-detector/v4/licensedb.TestDataset.func1()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/dataset_test.go:28 +0x128

Previous write at 0x000001378b30 by goroutine 928:
  [failed to restore the stack]

Goroutine 944 (running) created at:
  github.com/go-enry/go-license-detector/v4/licensedb.TestDataset()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/dataset_test.go:26 +0x2fc
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1123 +0x202

Goroutine 928 (running) created at:
  github.com/go-enry/go-license-detector/v4/licensedb.TestDataset()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/dataset_test.go:26 +0x2fc
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1123 +0x202
==================
==================
WARNING: DATA RACE
Write at 0x0000013444a0 by goroutine 944:
  github.com/hhatto/gorst.initParser()
      /home/g4s8/go/pkg/mod/github.com/hhatto/[email protected]/parser.leg.go:12773 +0x3dc
  github.com/hhatto/gorst.NewParser()
      /home/g4s8/go/pkg/mod/github.com/hhatto/[email protected]/rst.go:63 +0x3a8
  github.com/go-enry/go-license-detector/v4/licensedb/internal/processors.RestructuredText()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/processors/markup.go:20 +0x3b
  github.com/go-enry/go-license-detector/v4/licensedb/internal.ExtractReadmeFiles()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/investigation.go:137 +0x3b4
  github.com/go-enry/go-license-detector/v4/licensedb.Detect()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/licensedb.go:46 +0x2a7
  github.com/go-enry/go-license-detector/v4/licensedb.TestDataset.func1()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/dataset_test.go:28 +0x128

Previous write at 0x0000013444a0 by goroutine 928:
  [failed to restore the stack]

Goroutine 944 (running) created at:
  github.com/go-enry/go-license-detector/v4/licensedb.TestDataset()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/dataset_test.go:26 +0x2fc
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1123 +0x202

Goroutine 928 (running) created at:
  github.com/go-enry/go-license-detector/v4/licensedb.TestDataset()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/dataset_test.go:26 +0x2fc
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1123 +0x202
==================
==================
WARNING: DATA RACE
Write at 0x00c000222098 by goroutine 125:
  regexp.(*Regexp).Longest()
      /usr/lib/go/src/regexp/regexp.go:166 +0x64
  github.com/jdkato/prose/chunk.Locate()
      /home/g4s8/go/pkg/mod/github.com/jdkato/[email protected]/chunk/chunk.go:65 +0x4f
  github.com/jdkato/prose/chunk.Chunk()
      /home/g4s8/go/pkg/mod/github.com/jdkato/[email protected]/chunk/chunk.go:50 +0x7c
  github.com/go-enry/go-license-detector/v4/licensedb/internal.investigateReadmeFile()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/nlp.go:67 +0x469
  github.com/go-enry/go-license-detector/v4/licensedb/internal.(*database).QueryReadmeText()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/db.go:443 +0x404
  github.com/go-enry/go-license-detector/v4/licensedb/internal.InvestigateReadmeText()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/investigation.go:157 +0xaa
  github.com/go-enry/go-license-detector/v4/licensedb/internal.InvestigateReadmeTexts.func1()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/investigation.go:150 +0x68
  github.com/go-enry/go-license-detector/v4/licensedb/internal.investigateCandidates()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/investigation.go:71 +0x3b8
  github.com/go-enry/go-license-detector/v4/licensedb/internal.InvestigateReadmeTexts()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/investigation.go:149 +0x77
  github.com/go-enry/go-license-detector/v4/licensedb.Detect()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/licensedb.go:50 +0x2fd
  github.com/go-enry/go-license-detector/v4/licensedb.TestDataset.func1()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/dataset_test.go:28 +0x128

Previous write at 0x00c000222098 by goroutine 777:
  regexp.(*Regexp).Longest()
      /usr/lib/go/src/regexp/regexp.go:166 +0x64
  github.com/jdkato/prose/chunk.Locate()
      /home/g4s8/go/pkg/mod/github.com/jdkato/[email protected]/chunk/chunk.go:65 +0x4f
  github.com/jdkato/prose/chunk.Chunk()
      /home/g4s8/go/pkg/mod/github.com/jdkato/[email protected]/chunk/chunk.go:50 +0x7c
  github.com/go-enry/go-license-detector/v4/licensedb/internal.investigateReadmeFile()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/nlp.go:67 +0x469
  github.com/go-enry/go-license-detector/v4/licensedb/internal.(*database).QueryReadmeText()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/db.go:443 +0x404
  github.com/go-enry/go-license-detector/v4/licensedb/internal.InvestigateReadmeText()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/investigation.go:157 +0xaa
  github.com/go-enry/go-license-detector/v4/licensedb/internal.InvestigateReadmeTexts.func1()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/investigation.go:150 +0x68
  github.com/go-enry/go-license-detector/v4/licensedb/internal.investigateCandidates()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/investigation.go:71 +0x3b8
  github.com/go-enry/go-license-detector/v4/licensedb/internal.InvestigateReadmeTexts()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/internal/investigation.go:149 +0x77
  github.com/go-enry/go-license-detector/v4/licensedb.Detect()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/licensedb.go:50 +0x2fd
  github.com/go-enry/go-license-detector/v4/licensedb.TestDataset.func1()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/dataset_test.go:28 +0x128

Goroutine 125 (running) created at:
  github.com/go-enry/go-license-detector/v4/licensedb.TestDataset()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/dataset_test.go:26 +0x2fc
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1123 +0x202

Goroutine 777 (running) created at:
  github.com/go-enry/go-license-detector/v4/licensedb.TestDataset()
      /home/g4s8/projects/github.com/g4s8/go-license-detector/licensedb/dataset_test.go:26 +0x2fc
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1123 +0x202
==================
895 902 99%
    testing.go:1038: race detected during execution of test
--- FAIL: TestDataset (135.48s)
=== CONT  
    testing.go:1038: race detected during execution of test
FAIL
FAIL	github.com/go-enry/go-license-detector/v4/licensedb	136.905s
FAIL

Checked with go version go1.15.8 linux/amd64.

Long initialization on first detect

When the Detect() function is called for the first time, it loads the license database from internal files via internal.globalLicenseDatabase(). This is not a problem when it is called once from the CLI, but it can be an issue when it is called from an HTTP endpoint on a service with limited resources, since the request may be dropped due to a connection timeout.

E.g. see this comment: https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3313#note_561680449

I propose adding a new method that loads the database before Detect() is called, e.g. from the main() of a web app before the web service starts.
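
The idea can be sketched with sync.Once. Everything in this example is a stand-in: the database type, loadDatabase, Preload and the simplified Detect are hypothetical and only illustrate the initialization pattern, not the library's internals.

```go
package main

import (
	"fmt"
	"sync"
)

// database stands in for the parsed reference licenses.
type database struct{ licenses int }

var (
	once  sync.Once
	db    *database
	loads int // counts how often the expensive load actually ran
)

// loadDatabase simulates the expensive one-time initialization.
func loadDatabase() *database {
	loads++
	return &database{licenses: 400}
}

// Preload forces the initialization; later calls are no-ops.
func Preload() {
	once.Do(func() { db = loadDatabase() })
}

// Detect still initializes lazily, so callers that never call Preload keep working.
func Detect() int {
	Preload()
	return db.licenses
}

func main() {
	Preload() // e.g. before the HTTP server starts accepting requests
	fmt.Println(Detect(), "reference licenses, loaded", loads, "time(s)")
}
```

With this shape the first request no longer pays the initialization cost, and existing callers are unaffected.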

command not found error when trying to run cli command

I am trying to download and run license-detector

❯ go clean --modcache
❯ go get github.com/go-enry/go-license-detector/v4/licensedb
go: downloading github.com/go-enry/go-license-detector v0.0.0-20200530180532-d686c4b71e84
go: downloading github.com/go-enry/go-license-detector/v4 v4.1.1
go: downloading github.com/go-git/go-git/v5 v5.1.0
go: downloading github.com/pkg/errors v0.9.1
go: downloading github.com/ekzhu/minhash-lsh v0.0.0-20171225071031-5c06ee8586a1
go: downloading github.com/jdkato/prose v1.1.0
go: downloading github.com/sergi/go-diff v1.1.0
go: downloading golang.org/x/text v0.3.2
go: downloading github.com/hhatto/gorst v0.0.0-20181029133204-ca9f730cac5b
go: downloading github.com/russross/blackfriday/v2 v2.0.1
go: downloading golang.org/x/net v0.0.0-20200301022130-244492dfa37a
go: downloading golang.org/x/exp v0.0.0-20190125153040-c74c464bbbf2
go: downloading gonum.org/v1/gonum v0.7.0
go: downloading github.com/dgryski/go-minhash v0.0.0-20170608043002-7fe510aff544
go: downloading github.com/emirpasic/gods v1.12.0
go: downloading golang.org/x/crypto v0.0.0-20200302210943-78000ba7a073
go: downloading github.com/go-git/go-billy/v5 v5.0.0
go: downloading github.com/imdario/mergo v0.3.9
go: downloading github.com/montanaflynn/stats v0.0.0-20151014174947-eeaced052adb
go: downloading github.com/shogo82148/go-shuffle v0.0.0-20170808115208-59829097ff3b
go: downloading gopkg.in/neurosnap/sentences.v1 v1.0.6
go: downloading github.com/shurcooL/sanitized_anchor_name v0.0.0-20170918181015-86672fcb3f95
go: downloading github.com/mitchellh/go-homedir v1.1.0
go: downloading github.com/jbenet/go-context v0.0.0-20150711004518-d14ea06fba99
go: downloading golang.org/x/sys v0.0.0-20200302150141-5c8b2ff67527
go: downloading github.com/go-git/gcfg v1.5.0
go: downloading github.com/kevinburke/ssh_config v0.0.0-20190725054713-01f96b0aa0cd
go: downloading github.com/xanzy/ssh-agent v0.2.1
go: downloading gopkg.in/warnings.v0 v0.1.2

However, when I cd into ~/go/src/github.com, I do not see go-enry. When I run license-detector, I get

zsh: command not found: license-detector

This is at the bottom of my .zshrc file:

export GOPATH=$HOME/go
export GOBIN=$GOPATH/bin
export PATH=$PATH:$GOBIN

echo $PATH gives

/Users/my.name/.serverless/bin:/Users/my.name/bin:/usr/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin:/Users/my.name/go/bin

go env gives:

GO111MODULE=""
GOARCH="amd64"
GOBIN="/Users/my.name/go/bin"
GOCACHE="/Users/my.name/Library/Caches/go-build"
GOENV="/Users/my.name/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/my.name/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/my.name/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.16"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/0k/vkq6ycws5sg73j7lpgcd2z680000gn/T/go-build702706474=/tmp/go-build -gno-record-gcc-switches -fno-common"

How do I get the CLI command to run?

Update gonum.org/v1/gonum v0.7.0 to 0.9.0

We are packaging this module for Debian, but we already have gonum.org/v1/gonum v0.9.0 and would like to use that instead of v0.7.0.

However, one test fails with the new version, so please update the test to work with it.

$ go test -v ./licensedb ./licensedb/api ./licensedb/filer ./licensedb/internal ./licensedb/internal/assets ./licensedb/internal/fastlog ./licensedb/internal/normalize ./licensedb/internal/processors ./licensedb/internal/wmh
=== RUN   TestDataset
895 902 99%
--- PASS: TestDataset (16.23s)
PASS
ok  	github.com/go-enry/go-license-detector/v4/licensedb	16.513s
?   	github.com/go-enry/go-license-detector/v4/licensedb/api	[no test files]
=== RUN   TestLocalFiler
--- PASS: TestLocalFiler (0.00s)
=== RUN   TestGitFiler
--- PASS: TestGitFiler (0.01s)
=== RUN   TestZipFiler
--- PASS: TestZipFiler (0.00s)
=== RUN   TestNestedFiler
--- PASS: TestNestedFiler (0.00s)
PASS
ok  	github.com/go-enry/go-license-detector/v4/licensedb/filer	0.019s
=== RUN   TestSplitLicenseName
--- PASS: TestSplitLicenseName (0.00s)
PASS
ok  	github.com/go-enry/go-license-detector/v4/licensedb/internal	0.300s
?   	github.com/go-enry/go-license-detector/v4/licensedb/internal/assets	[no test files]
=== RUN   TestFastlog
--- PASS: TestFastlog (0.00s)
PASS
ok  	github.com/go-enry/go-license-detector/v4/licensedb/internal/fastlog	0.037s
=== RUN   TestNormalizeLines
--- PASS: TestNormalizeLines (0.00s)
PASS
ok  	github.com/go-enry/go-license-detector/v4/licensedb/internal/normalize	0.006s
=== RUN   TestHTML
--- PASS: TestHTML (0.00s)
PASS
ok  	github.com/go-enry/go-license-detector/v4/licensedb/internal/processors	0.031s
=== RUN   TestWMHSerialize
--- PASS: TestWMHSerialize (0.01s)
=== RUN   TestWMHHash
    wmh_test.go:98: 
        	Error Trace:	wmh_test.go:98
        	Error:      	Not equal: 
        	            	expected: []uint64{0x10032, 0x0, 0x1005a, 0x10050, 0x1005a, 0x1e, 0x10050, 0x5a, 0x28, 0x10028, 0x1003c, 0x10032, 0x1005a, 0x1005a, 0x9003c, 0x14, 0x10050, 0x1005a, 0x1003c, 0x1005a, 0x4005a, 0x20050, 0x1003c, 0x1003c, 0x10014, 0x1005a, 0x10028, 0x10046, 0x1005a, 0x10046, 0xa, 0x5a, 0x1003c, 0x10032, 0x1005a, 0x10046, 0x1005a, 0x0, 0x2005a, 0x1005a, 0x10028, 0x1005a, 0x10050, 0x10046, 0x10046, 0x20050, 0x1001e, 0x1005a, 0x10032, 0x1005a}
        	            	actual  : []uint64{0x1003c, 0x40046, 0x10046, 0x46, 0x10028, 0x5003c, 0x3c, 0x1e, 0x1e, 0x10032, 0x1005a, 0x2005a, 0x46, 0x10046, 0x10046, 0x10014, 0x10032, 0x20032, 0x10050, 0x10050, 0x10032, 0x5a, 0x10028, 0x10046, 0x10046, 0x1003c, 0x20050, 0x2003c, 0x10046, 0x1000a, 0x10028, 0x1005a, 0x30046, 0x1005a, 0x1003c, 0x5a, 0x28, 0x1001e, 0x20050, 0x10046, 0x1005a, 0x10050, 0x1003c, 0x10032, 0x10032, 0x1000a, 0x1005a, 0x10050, 0x1003c, 0x10032}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,52 +1,52 @@
        	            	 ([]uint64) (len=50) {
        	            	+ (uint64) 65596,
        	            	+ (uint64) 262214,
        	            	+ (uint64) 65606,
        	            	+ (uint64) 70,
        	            	+ (uint64) 65576,
        	            	+ (uint64) 327740,
        	            	+ (uint64) 60,
        	            	+ (uint64) 30,
        	            	+ (uint64) 30,
        	            	  (uint64) 65586,
        	            	- (uint64) 0,
        	            	+ (uint64) 65626,
        	            	+ (uint64) 131162,
        	            	+ (uint64) 70,
        	            	+ (uint64) 65606,
        	            	+ (uint64) 65606,
        	            	+ (uint64) 65556,
        	            	+ (uint64) 65586,
        	            	+ (uint64) 131122,
        	            	+ (uint64) 65616,
        	            	+ (uint64) 65616,
        	            	+ (uint64) 65586,
        	            	+ (uint64) 90,
        	            	+ (uint64) 65576,
        	            	+ (uint64) 65606,
        	            	+ (uint64) 65606,
        	            	+ (uint64) 65596,
        	            	+ (uint64) 131152,
        	            	+ (uint64) 131132,
        	            	+ (uint64) 65606,
        	            	+ (uint64) 65546,
        	            	+ (uint64) 65576,
        	            	+ (uint64) 65626,
        	            	+ (uint64) 196678,
        	            	+ (uint64) 65626,
        	            	+ (uint64) 65596,
        	            	+ (uint64) 90,
        	            	+ (uint64) 40,
        	            	+ (uint64) 65566,
        	            	+ (uint64) 131152,
        	            	+ (uint64) 65606,
        	            	  (uint64) 65626,
        	            	  (uint64) 65616,
        	            	- (uint64) 65626,
        	            	- (uint64) 30,
        	            	- (uint64) 65616,
        	            	- (uint64) 90,
        	            	- (uint64) 40,
        	            	- (uint64) 65576,
        	            	  (uint64) 65596,
        	            	  (uint64) 65586,
        	            	- (uint64) 65626,
        	            	- (uint64) 65626,
        	            	- (uint64) 589884,
        	            	- (uint64) 20,
        	            	- (uint64) 65616,
        	            	- (uint64) 65626,
        	            	- (uint64) 65596,
        	            	- (uint64) 65626,
        	            	- (uint64) 262234,
        	            	- (uint64) 131152,
        	            	- (uint64) 65596,
        	            	- (uint64) 65596,
        	            	- (uint64) 65556,
        	            	- (uint64) 65626,
        	            	- (uint64) 65576,
        	            	- (uint64) 65606,
        	            	- (uint64) 65626,
        	            	- (uint64) 65606,
        	            	- (uint64) 10,
        	            	- (uint64) 90,
        	            	- (uint64) 65596,
        	            	  (uint64) 65586,
        	            	- (uint64) 65626,
        	            	- (uint64) 65606,
        	            	- (uint64) 65626,
        	            	- (uint64) 0,
        	            	- (uint64) 131162,
        	            	- (uint64) 65626,
        	            	- (uint64) 65576,
        	            	+ (uint64) 65546,
        	            	  (uint64) 65626,
        	            	  (uint64) 65616,
        	            	- (uint64) 65606,
        	            	- (uint64) 65606,
        	            	- (uint64) 131152,
        	            	- (uint64) 65566,
        	            	- (uint64) 65626,
        	            	- (uint64) 65586,
        	            	- (uint64) 65626
        	            	+ (uint64) 65596,
        	            	+ (uint64) 65586
        	            	 }
        	Test:       	TestWMHHash
--- FAIL: TestWMHHash (0.00s)
=== RUN   TestWMHTrash
2021/11/10 23:00:53 len(values)=9 is not equal to len(indices)=10
2021/11/10 23:00:53 len(values)=10 is not equal to len(indices)=9
2021/11/10 23:00:53 index is out of range: 100 @ 9
--- PASS: TestWMHTrash (0.00s)
FAIL
FAIL	github.com/go-enry/go-license-detector/v4/licensedb/internal/wmh	0.070s
FAIL

License differs from standard Apache-2.0 document

go-license-detector's own license is a slight modification of the Apache-2.0 document. GitHub cannot correctly detect the license and shows "View license" instead of "Apache-2.0 license" on the project's front page. Correcting this will help others quickly determine the license of the project.

I will make a PR to fix this.

0BSD detected over ISC

Hi! 👋🏻

I'm using this library as part of wwhrd, which is used to detect licenses in go-based projects.

One of the users of wwhrd found an interesting issue (frapposelli/wwhrd#40) where even when presented with a verbatim ISC license, the library detects a 0BSD license with 95% probability.

I was previously using the v3 version of the library, which presented a 93% probability of being 0BSD and 84% of being ISC, which is still wrong but slightly more accurate.

The two licenses are very similar; the 0BSD text is shorter and lacks a critical sentence in the first part.

Happy to help with the debug process 👐🏻

[Question] Can you scan arbitrary files for licenses in code comments?

Hello, I've been experimenting with the v4.3.0 release to scan files in my C++ repository for licenses. I've found many in the readme or license files, but sometimes there are licenses in "header-only" libraries that are at the top of the header file in comments, like so:

#ifndef DATE_H
#define DATE_H

// The MIT License (MIT)
//
// Copyright (c) 2015, 2016, 2017 Howard Hinnant
// Copyright (c) 2016 Adrian Colomitchi
// Copyright (c) 2017 Florian Dang
// Copyright (c) 2017 Paul Thompson
// Copyright (c) 2018, 2019 Tomasz Kamiński
// Copyright (c) 2019 Jiangang Zhuang
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.

Since this is in a file ...external/date/date.h in a separate subdirectory from the README, it doesn't show up in the results.

Is there anything I can do to detect these?

EDIT: I noticed there was one pull request to expose an interface for scanning arbitrary files, but that's a go interface. Is there anything that's already existing for someone using the precompiled executable from a bash script, for instance?
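
As a possible workaround, a small helper could extract the leading comment block of a source file so that the text can be saved as a license candidate and scanned separately. The sketch below assumes // line comments only; leadingComment is a hypothetical helper, not part of go-license-detector.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// leadingComment returns the first contiguous run of // comment lines
// with the comment markers stripped.
func leadingComment(src string) string {
	var lines []string
	started := false
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "//") {
			started = true
			lines = append(lines, strings.TrimSpace(strings.TrimPrefix(line, "//")))
			continue
		}
		if started {
			break // the first comment run has ended
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	header := "#ifndef DATE_H\n#define DATE_H\n\n" +
		"// The MIT License (MIT)\n//\n" +
		"// Copyright (c) 2015, 2016, 2017 Howard Hinnant\n" +
		"namespace date {}\n"
	fmt.Println(leadingComment(header))
}
```

The extracted text could be written to a temporary directory as a LICENSE file and passed to the detector like any other project directory.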

Thanks!

Make license database public

At GitLab we've been working to integrate this package in our codebase (with the kind help of @g4s8).

But to use this package to its full extent, it would be nice if it exposed more info about the detected license.

The info we'd need:

  • name
  • nickname (not sure how this differs from the name, is it the SPDX name?)
  • url to the license

I don't think this requires a change in the API, but we'd need outside access to licensedb/internal. I'm happy to hear your thoughts on that.
