
zoekt's Introduction

This is a fast text search engine, intended for use with source code. (Pronunciation: roughly as you would pronounce "zooked" in English)

NOTICE: github.com/sourcegraph/zoekt is the active main repository for Zoekt development.

INSTRUCTIONS

Downloading:

go get github.com/google/zoekt/

Indexing:

go install github.com/google/zoekt/cmd/zoekt-index
$GOPATH/bin/zoekt-index .

Searching:

go install github.com/google/zoekt/cmd/zoekt
$GOPATH/bin/zoekt 'ngram f:READ'

Indexing git repositories:

go install github.com/google/zoekt/cmd/zoekt-git-index
$GOPATH/bin/zoekt-git-index -branches master,stable-1.4 -prefix origin/ .

Indexing repo repositories:

go install github.com/google/zoekt/cmd/zoekt-{repo-index,mirror-gitiles}
zoekt-mirror-gitiles -dest ~/repos/ https://gfiber.googlesource.com
zoekt-repo-index \
   -name gfiber \
   -base_url https://gfiber.googlesource.com/ \
   -manifest_repo ~/repos/gfiber.googlesource.com/manifests.git \
   -repo_cache ~/repos \
   -manifest_rev_prefix=refs/heads/ --rev_prefix= \
   master:default_unrestricted.xml

Starting the web interface:

go install github.com/google/zoekt/cmd/zoekt-webserver
$GOPATH/bin/zoekt-webserver -listen :6070

A more organized installation on a Linux server should use a systemd unit file, e.g.:

[Unit]
Description=zoekt webserver

[Service]
ExecStart=/zoekt/bin/zoekt-webserver -index /zoekt/index -listen :443 --ssl_cert /zoekt/etc/cert.pem --ssl_key /zoekt/etc/key.pem
Restart=always

[Install]
WantedBy=default.target

SEARCH SERVICE

Zoekt comes with a small service management program:

go install github.com/google/zoekt/cmd/zoekt-indexserver

cat << EOF > config.json
[{"GithubUser": "username"},
 {"GithubOrg": "org"},
 {"GitilesURL": "https://gerrit.googlesource.com", "Name": "zoekt" }
]
EOF

$GOPATH/bin/zoekt-indexserver -mirror_config config.json

This will mirror all repos under 'github.com/username' and 'github.com/org', as well as the 'zoekt' repository, and index them.

It takes care of fetching and indexing new data and cleaning up logfiles.
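The following Go sketch decodes the config.json above and summarizes what each entry asks the indexserver to mirror. The `mirrorEntry` type and the `describe` function are hypothetical. zoekt-indexserver defines its own config type, which may accept more keys than the four used here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mirrorEntry has one field per key used in config.json above. The type is
// hypothetical; zoekt-indexserver defines its own config type.
type mirrorEntry struct {
	GithubUser string
	GithubOrg  string
	GitilesURL string
	Name       string
}

// parseConfig decodes the JSON array of entries.
func parseConfig(data []byte) ([]mirrorEntry, error) {
	var entries []mirrorEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, err
	}
	return entries, nil
}

// describe summarizes what one entry asks for, following the prose above.
func describe(e mirrorEntry) string {
	switch {
	case e.GithubUser != "":
		return "all repos under github.com/" + e.GithubUser
	case e.GithubOrg != "":
		return "all repos under github.com/" + e.GithubOrg
	case e.GitilesURL != "":
		return fmt.Sprintf("repository %q on %s", e.Name, e.GitilesURL)
	default:
		return "nothing"
	}
}

func main() {
	cfg := []byte(`[{"GithubUser": "username"},
 {"GithubOrg": "org"},
 {"GitilesURL": "https://gerrit.googlesource.com", "Name": "zoekt" }
]`)
	entries, err := parseConfig(cfg)
	if err != nil {
		fmt.Println("bad config:", err)
		return
	}
	for _, e := range entries {
		fmt.Println(describe(e))
	}
}
```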

The webserver can be started from a standard service management framework, such as systemd.

SYMBOL SEARCH

It is recommended to install Universal ctags to improve ranking. See here for more information.

ACKNOWLEDGEMENTS

Thanks to Alexander Neubeck for coming up with this idea, and helping me flesh it out.

DISCLAIMER

This is not an official Google product.


zoekt's Issues

cs.bazel.build keeps unloading the same shard

2018/01/12 06:09:47 got query " r:torvalds craz[yi]"
2018/01/12 06:10:10 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:10 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:10 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:10 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:10 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:10 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:10 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:10 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:10 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:10 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:10 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:10 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:10 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:11 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:11 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:11 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:11 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:10:11 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:11:48 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:11:48 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:11:48 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:11:48 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:11:48 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
2018/01/12 06:11:48 unloading: /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt

$ ls -l /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt
-rw-r--r-- 1 root root 94511883 Jan 12 10:22 /zoekt/index/android.googlesource.com%2Fplatform%2Fsuperproject_v14.00021.zoekt

builtin source browser should support line anchors.

Hi,

I'm using zoekt to index source RPMs after their patches have been applied but before they are built, and I'm wondering whether zoekt is the right tool. Indexing and searching work as expected. However, when I search for a term, the line-number links on the results page are wrong because they point back to the search itself. For example, if my search is http://127.0.0.1:6070/search?q=TIFFLinkDirectory&num=50, every single line on the results page links to http://127.0.0.1:6070/search?q=TIFFLinkDirectory&num=50#
I don't know whether this is a bug or is related to the way I'm using it. Should it link to a code browser like http://lxr.linux.no/? If not, can you recommend a code browser that would interact well with zoekt (without reimplementing the same features)?

startup is too slow

The webserver was restarted today at 20:39 UTC, and the restart took almost 7 minutes. We should parallelize shard loading on startup again.

Some notes on Installation

Hi,
Great to see an alternative to csearch!
I tried to install it on RHEL 7.2 and had to do some additional steps.

I had to do a go get first:
go get github.com/google/zoekt/cmd/zoekt-index

It would also be helpful to mention that git repository indexing depends on git2go, which in turn requires libgit2 (and libgit2-devel, I guess) to be installed. The main issue for me was that only version 0.21 of libgit2 was available in my OS's repository. I therefore had to compile libgit2 (0.24) from scratch, which worked flawlessly.

SSH URLs not recognized

[hanwen@localhost zoekt]$ zoekt-git-index .git
2017/06/26 23:57:16 setTemplatesFromConfig(/home/hanwen/go/src/github.com/google/zoekt/.git): unknown git hosting site "git@github.com:google/zoekt.git"

query the rest api

Hi,

I was reading the source code and couldn't figure out how to query the REST API.

My best try was:

curl -X POST -H "Content-type: application/json; charset=utf-8" -d '{"Query":"printf"}' http://zoekt-server:6070/api/search

However it doesn't work. Could you please provide me an example?

Bare git repositories need refs/heads/ included in the ref name (go-git.v4?)

With the latest Zoekt (@master), bare repositories only seem to get indexed if you give the full branch name (**refs/heads/**master).

In our previously deployed version of Zoekt (pre go-git.v4, so I'm guessing this is the change that brought on the issue), it was fine to run zoekt-git-index --branches master,stable-* on bare repositories and have Zoekt index the specified branches. This is no longer the case, which is a bit problematic since the full ref name is now stored in the index and presented in the UI.

When branches are presented as "refs/heads/master" in the UI, it hurts readability and wastes screen real estate. I'm guessing this also affects branch searches.

I can try making a patch for it, though I'm not sure whether we should change how zoekt-git-index interprets branch names, or just change how the UI presents the information.

error return ignored

func (b *Builder) buildShard(
..

for _, t := range todo {
	shardBuilder.Add(*t)
}

Add returns an error.
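A minimal fix is to return the first error out of the loop instead of dropping it. In the sketch below, `Document`, `shardBuilder`, and `addAll` are hypothetical stand-ins, not zoekt's actual types. Only the error-handling pattern matters here.

```go
package main

import (
	"errors"
	"fmt"
)

// Document and shardBuilder are hypothetical stand-ins for zoekt's types.
type Document struct {
	Name    string
	Content []byte
}

type shardBuilder struct {
	docs []Document
}

// Add fails for documents without a name, to give the example an error path.
func (s *shardBuilder) Add(d Document) error {
	if d.Name == "" {
		return errors.New("document has no name")
	}
	s.docs = append(s.docs, d)
	return nil
}

// addAll returns the first error from Add, annotated with the failing
// document, instead of silently discarding it.
func addAll(b *shardBuilder, todo []*Document) error {
	for i, t := range todo {
		if err := b.Add(*t); err != nil {
			return fmt.Errorf("adding document %d (%q): %w", i, t.Name, err)
		}
	}
	return nil
}

func main() {
	b := &shardBuilder{}
	if err := addAll(b, []*Document{{Name: "a.go"}, {Name: ""}}); err != nil {
		fmt.Println("build failed:", err)
	}
}
```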

provide -index_cache flag in webserver

The slowness of cs.bazel.build on a cold cache since November has been explained.

In November, I made a new VM image, with a Persistent Disk holding the repos and index shards, which are synced to a serving SSD. Unfortunately, the webserver flag still pointed to the old location on the Persistent Disk, so it has been serving from networked storage all this time.

To avoid having to set up a sync cronjob correctly, move the cronjob's sync-to-SSD functionality into the webserver.

crash

https://cs.bazel.build/search?q=register+syncCallback+case%3Ano&num=50

unexpected fault address 0x7fdb75ef1ef1
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x7fdb75ef1ef1 pc=0x571cf6]
goroutine 3568545 [running]:
runtime.throw(0x93589e, 0x5)
/usr/lib/google-golang/src/runtime/panic.go:622 +0x8a fp=0xc530c07228 sp=0xc530c07208 pc=0x54224a
runtime.sigpanic()
/usr/lib/google-golang/src/runtime/signal_unix.go:395 +0x211 fp=0xc530c07278 sp=0xc530c07228 pc=0x5579d1
runtime.memmove(0xc42083e270, 0x7fdb75ef1ef1, 0xc)
/usr/lib/google-golang/src/runtime/memmove_amd64.s:172 +0x136 fp=0xc530c07280 sp=0xc530c07278 pc=0x571cf6
runtime.slicebytetostring(0x0, 0x7fdb75ef1ef1, 0xc, 0x4d10f, 0xc42083e270, 0xc)
/usr/lib/google-golang/src/runtime/string.go:98 +0x6f fp=0xc530c072b0 sp=0xc530c07280 pc=0x55c1ef
github.com/google/zoekt/web.(*Server).formatResults(0xc8ca5a8000, 0xc8a4e8aab0, 0xc71bc56f70, 0xb, 0xc42276ff00, 0xc7d3747200, 0xc8a4e8aab0, 0x0, 0x0, 0xc707ebeb78)
/usr/local/google/home/hanwen/go/src/github.com/google/zoekt/web/snippets.go:127 +0x29c fp=0xc530c07a70 sp=0xc530c072b0 pc=0x8523dc
github.com/google/zoekt/web.(*Server).serveSearchErr(0xc8ca5a8000, 0x98a2c0, 0xc7d93be380, 0xc5fb950700, 0x0, 0x0)
/usr/local/google/home/hanwen/go/src/github.com/google/zoekt/web/server.go:244 +0x48e fp=0xc530c07c68 sp=0xc530c07a70 pc=0x84f8be
github.com/google/zoekt/web.(*Server).serveSearch(0xc8ca5a8000, 0x98a2c0, 0xc7d93be380, 0xc5fb950700)
/usr/local/google/home/hanwen/go/src/github.com/google/zoekt/web/server.go:157 +0x4d fp=0xc530c07cc0 sp=0xc530c07c68 pc=0x84f28d
github.com/google/zoekt/web.(*Server).(github.com/google/zoekt/web.serveSearch)-fm(0x98a2c0, 0xc7d93be380, 0xc5fb950700)
/usr/local/google/home/hanwen/go/src/github.com/google/zoekt/web/server.go:146 +0x48 fp=0xc530c07cf0 sp=0xc530c07cc0 pc=0x8549f8
net/http.HandlerFunc.ServeHTTP(0xc88e6c68e0, 0x98a2c0, 0xc7d93be380, 0xc5fb950700)
/usr/lib/google-golang/src/net/http/server.go:1946 +0x44 fp=0xc530c07d18 sp=0xc530c07cf0 pc=0x7c64d4
net/http.(*ServeMux).ServeHTTP(0xc88151bd40, 0x98a2c0, 0xc7d93be380, 0xc5fb950700)
/usr/lib/google-golang/src/net/http/server.go:2336 +0x130 fp=0xc530c07d58 sp=0xc530c07d18 pc=0x7c8140
net/http.serverHandler.ServeHTTP(0xc88a00d5f0, 0x98a2c0, 0xc7d93be380, 0xc5fb950700)
/usr/lib/google-golang/src/net/http/server.go:2693 +0xbc fp=0xc530c07d88 sp=0xc530c07d58 pc=0x7c917c
net/http.(*conn).serve(0xc5008774a0, 0x98a740, 0xc89b37d080)
/usr/lib/google-golang/src/net/http/server.go:1829 +0x651 fp=0xc530c07fc8 sp=0xc530c07d88 pc=0x7c54f1
runtime.goexit()
/usr/lib/google-golang/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc530c07fd0 sp=0xc530c07fc8 pc=0x571351
created by net/http.(*Server).Serve
/usr/lib/google-golang/src/net/http/server.go:2794 +0x27b
goroutine 1 [IO wait]:
internal/poll.runtime_pollWait(0x7fdf5b1d0e30, 0x72, 0x0)
/usr/lib/google-golang/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc88b4d1a98, 0x72, 0xc420570000, 0x0, 0x0)
/usr/lib/google-golang/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc88b4d1a98, 0xffffffffffffff00, 0x0, 0x0)
/usr/lib/google-golang/src/internal/poll/fd_poll_runtime.go:90 +0x3d

allow short pattern in conjunction

pattern "at" too short. Did you mean "r:gerrit Pushing requires being at least project owner" ?

We can search for this efficiently because we have other trigrams. Transform into a brute force substring if there are other trigrams in the conjunction.

surface more semantic information

support operators such as string:literal type:Blabla method:BlaBla etc.

Universal-ctags extracts some of this information (but not very well, see below), and this will need language specific mappings to deal with different notions of things (is an interface different from a type?)

{"_type": "tag", "name": "FileCount", "path": "api.go", "pattern": "/^\tFileCount int$/", "language": "Go", "line": 103, "kind": "member", "scope": "Stats", "scopeKind": "struct"}
{"_type": "tag", "name": "FileMatch", "path": "api.go", "pattern": "/^type FileMatch struct {$/", "language": "Go", "line": 27, "kind": "func"} ## ??
{"_type": "tag", "name": "FileMatch", "path": "api.go", "pattern": "/^type FileMatch struct {$/", "language": "Go", "line": 27, "kind": "struct"}

should this use a separate corpus, or be integrated with the normal string search?

lang:xyz filtering on shard level

sym:repo lang:Go

yields 0 results, because "Go" is not a recognized language. However, we do get trigrams for "repo", and the query takes a couple of seconds with cold caches.

We could compute a bitmask union of all the languages in a repo, and check it before starting iteration.

This is a win, because many repos have just a few languages.
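Below is a sketch of the per-shard check. It assumes a global table that maps each language to a bit position. `languages`, `langBit`, `langMask`, and `mayMatchLang` are hypothetical names, and so is the lowercase table. The mask is computed once at index time, so the query-time check is a single AND.

```go
package main

import "fmt"

// languages is a hypothetical global table: a language's index is its bit
// position. Lookups are case-sensitive, so "Go" is unknown here.
var languages = []string{"c", "go", "java", "python"}

// langBit returns the bit for a language name, or false if it is unknown.
func langBit(name string) (uint64, bool) {
	for i, l := range languages {
		if l == name {
			return 1 << uint(i), true
		}
	}
	return 0, false
}

// shard keeps the union of the languages of all its documents.
type shard struct {
	name  string
	langs uint64
}

// langMask ORs together the bits of every document's language at index time.
func langMask(docLangs []string) uint64 {
	var m uint64
	for _, l := range docLangs {
		if b, ok := langBit(l); ok {
			m |= b
		}
	}
	return m
}

// mayMatchLang is the cheap pre-check: if it returns false, a lang: query
// cannot match anything in the shard, so no posting lists need to be read.
func mayMatchLang(s shard, lang string) bool {
	b, ok := langBit(lang)
	return ok && s.langs&b != 0
}

func main() {
	s := shard{name: "github.com/bazelbuild/bazel", langs: langMask([]string{"java", "python", "c", "java"})}
	for _, q := range []string{"python", "go", "Go"} {
		fmt.Printf("lang:%s -> search shard: %v\n", q, mayMatchLang(s, q))
	}
}
```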

too many branches?

Hi,

I was trying to index some large repositories with a mass of release and feature branches (80 to be exact) and ran into the error

indexGitRepo(/PATH): too many branches

I can see from this check

if len(desc.Branches) > 64 {

that the maximum is 64 branches. Is this a technical limitation? Can I raise the limit without ill effect?
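The number 64 suggests that each document records the branches it belongs to as bits in a `uint64`. That is an assumption here and is not confirmed in the issue. Under that assumption the limit follows directly, as the sketch below shows with hypothetical names. Raising it would mean widening the mask, and presumably changing the index format.

```go
package main

import (
	"errors"
	"fmt"
)

// branchSet gives each indexed branch one bit of a uint64 mask. This models
// the assumed reason for the 64-branch limit; the names are hypothetical.
type branchSet struct {
	names []string
}

func newBranchSet(names []string) (*branchSet, error) {
	if len(names) > 64 {
		return nil, errors.New("too many branches") // no bit left for branch 65
	}
	return &branchSet{names: names}, nil
}

// mask returns the bits for the branches on which one file version appears.
func (b *branchSet) mask(onBranches ...string) uint64 {
	var m uint64
	for i, n := range b.names {
		for _, o := range onBranches {
			if n == o {
				m |= 1 << uint(i)
			}
		}
	}
	return m
}

// branchesOf decodes a mask back into branch names.
func (b *branchSet) branchesOf(m uint64) []string {
	var out []string
	for i, n := range b.names {
		if m&(1<<uint(i)) != 0 {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	bs, err := newBranchSet([]string{"master", "stable-1.4"})
	if err != nil {
		fmt.Println(err)
		return
	}
	m := bs.mask("master", "stable-1.4")
	fmt.Printf("mask=%b branches=%v\n", m, bs.branchesOf(m))
}
```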

Improve base document ranking within shards

Shorter file names are closer to the root, and are usually more important. More recently modified files are more important.

This has two aspects:

  • when building the index, the more important files should be at the start of the index, so we are more likely to see them.
  • when ordering matches between different shards, this ranking should also be taken into account.

Suggestion: add a Rank (uint32)

https://github.com/google/zoekt/blob/master/indexbuilder.go#L124

and, when building the shard, reorder the documents by rank. Since the document order is reflected in the search results,

https://github.com/google/zoekt/blob/master/eval.go#L601

that alone should already do something useful. However, to compare results between shards, we should also store the rank in the index and use it instead of nextDoc/len(docs).

make the parser pluggable and generic

Right now the ctags parser is integrated into the higher-level packages. It would be nice if ctag.Parse could be isolated in its own package, with a new Parser interface in the main package that accepts generic parsers.

I will be happy to work on that.

drop duplicate atoms

We create one matchtree per atom, so [ A A A ] is 3x slower to match than [ A ].

zoekt-git-index: "-submodules" flag not picked up

The help states:

-submodules
    	if set to false, do not recurse into submodules (default true)

No -submodules flag:

$GOPATH/bin/zoekt-git-index -allow_missing_branches -branches 'stable-*' -prefix "" .
2017/05/04 11:10:00 indexGitRepo(/path/gerrit/gerrit.git): submodule plugins/singleusergroup: Failed to resolve path '<server>/gerrit/plugins/singleusergroup.git': No such file or directory

-submodules=false yields the very same result:

$GOPATH/bin/zoekt-git-index -submodules=false -allow_missing_branches -branches 'stable-*' -prefix "" .
2017/05/04 11:10:00 indexGitRepo(/path/gerrit/gerrit.git): submodule plugins/singleusergroup: Failed to resolve path '<server>/gerrit/plugins/singleusergroup.git': No such file or directory

Bazel sandbox was refactored

In https://github.com/google/zoekt#symbol-search there is a mini-howto to build the bazel sandbox, but the code seems to have changed in bazel. I got a working binary using these steps:

  1. Get process-tools.h process-tools.cc linux-sandbox.cc linux-sandbox.h linux-sandbox-options.cc linux-sandbox-options.h linux-sandbox-pid1.cc linux-sandbox-pid1.h linux-sandbox-utils.h from the same path.
  2. Replace some includes: sed -i 's|src/main/tools/||' *.{cc,h}
  3. Compile with g++: g++ -Wall -std=c++11 -o linux-sandbox linux-sandbox.cc linux-sandbox-options.cc linux-sandbox-pid1.cc process-tools.cc

The resulting linux-sandbox tool seems to work in a similar way to the old namespace-sandbox tool...

Query parser does not apply case to Not tokens

I was playing around with the query parser and noticed this inconsistency. When you specify case:yes, it applies to the whole query.Q, but it doesn't seem to apply to a Not query. Concretely:

query.Parse(`content:foo case:yes f:\.go$ f:\.yaml$ -f:\bvendor\b`)
got  (and case_content_substr:"foo" case_file_regex:"\\.go(?m:$)" case_file_regex:"\\.yaml(?m:$)" (not file_regex:"\\bvendor\\b"))
want (and case_content_substr:"foo" case_file_regex:"\\.go(?m:$)" case_file_regex:"\\.yaml(?m:$)" (not case_file_regex:"\\bvendor\\b"))

Is this expected behaviour when being explicit about case?
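For comparison, here is a sketch of a case-applying pass that also descends into Not nodes. The types are simplified, hypothetical stand-ins for the query package. A substring atom replaces the file regexps from the example. The sketch only illustrates the recursion that would produce the "want" output.

```go
package main

import "fmt"

// Q, Substring, And and Not are simplified, hypothetical stand-ins for the
// query package's types; only the shape of the tree matters here.
type Q interface{ isQ() }

type Substring struct {
	Pattern       string
	FileName      bool
	CaseSensitive bool
}

type And struct{ Children []Q }

type Not struct{ Child Q }

func (*Substring) isQ() {}
func (*And) isQ() {}
func (*Not) isQ() {}

// setCase returns a copy of q with case sensitivity applied to every atom.
// The *Not branch is the point: without it, the negated atom keeps its old
// setting, as in the "got" output above.
func setCase(q Q, caseSensitive bool) Q {
	switch s := q.(type) {
	case *Substring:
		c := *s
		c.CaseSensitive = caseSensitive
		return &c
	case *And:
		kids := make([]Q, len(s.Children))
		for i, k := range s.Children {
			kids[i] = setCase(k, caseSensitive)
		}
		return &And{Children: kids}
	case *Not:
		return &Not{Child: setCase(s.Child, caseSensitive)}
	default:
		return q
	}
}

func main() {
	q := &And{Children: []Q{
		&Substring{Pattern: "foo"},
		&Not{Child: &Substring{Pattern: "vendor", FileName: true}},
	}}
	out := setCase(q, true).(*And)
	neg := out.Children[1].(*Not).Child.(*Substring)
	fmt.Printf("%+v\n", *neg)
}
```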

anchor to results is wrong

Hi,

If I search for a term while running the webserver with "-print", each result shows its line number as a link to the place where the search term was found. However, the anchor is wrong: clicking the line number does not take you to that line.

I was able to fix it with:

diff --git a/web/templates.go b/web/templates.go
index 7e95a28..419f8f2 100644
--- a/web/templates.go
+++ b/web/templates.go
@@ -151,7 +151,7 @@ To list repositories, try:
       {{else}}
         <div style="background: #eef;">
         {{range .Matches}}
-          <pre>{{if .URL}}<a href="{{.URL}}">{{end}}{{.LineNum}}{{if .URL}}</a>{{end}}: {{range .Fragments}}{{.Pre}}<b>{{.Match}}</b>{{.Post}}{{end}}</pre>
+          <pre>{{if .URL}}<a href="{{.URL}}l{{.LineNum}}">{{end}}{{.LineNum}}{{if .URL}}</a>{{end}}: {{range .Fragments}}{{.Pre}}<b>{{.Match}}</b>{{.Post}}{{end}}</pre>
         {{end}}
         </div>
       {{end}}

However, what would be the best way to fix it? Maybe in the method that generates the URL with fragments? Or with some defensive logic in the template to avoid issues when LineNum doesn't exist?

Missing files and results

I am unable to see all search results for a string I was searching for. If I just look at the index file and grep it, I get more results than I see in the web page.

I also found that certain files are never indexed. A file search for them shows zero results, and they do not show up in the index file either.

query without positive atom does not advance iterator

2017/07/25 23:15:12 got query "r:bazel lang:python"
2017/07/25 23:15:12 lastDoc 90, nextDoc 90
2017/07/25 23:15:12 lastDoc 809, nextDoc 809
2017/07/25 23:15:12 lastDoc 134, nextDoc 134
2017/07/25 23:15:12 lastDoc 36, nextDoc 36
2017/07/25 23:15:12 lastDoc 152, nextDoc 152
2017/07/25 23:15:12 crashed shard: shard(/zoekt/index/github.com%2Fbazelbuild%2Fbazel_v14.00000.zoekt): lastDoc 90, nextDoc 90, goroutine 7895054604 [running]:
runtime/debug.Stack(0xc5ea7796e0, 0xc598d4def0, 0x43)
/usr/lib/google-golang/src/runtime/debug/stack.go:24 +0x79
github.com/google/zoekt/shards.(*shardedSearcher).Search.func1.1(0xc47d23c580, 0x9d0fe0, 0xc5ea7796e0, 0xc6870ec120)
/usr/local/google/home/hanwen/go/src/github.com/google/zoekt/shards/shards.go:118 +0xc2
panic(0x78e2e0, 0xc44dfe8890)
/usr/lib/google-golang/src/runtime/panic.go:489 +0x2e1
log.Panicf(0x825dbb, 0x16, 0xc7296aec70, 0x2, 0x2)
/usr/lib/google-golang/src/log/log.go:329 +0xda
github.com/google/zoekt.(*indexData).Search(0xc4be9b5180, 0x7f70a27fb040, 0xc6870ec1e0, 0x9cb1e0, 0xc674805700, 0xc426027c20, 0xc60b8e7f30, 0xc60b8e7f28, 0xc7776df630)
/usr/local/google/home/hanwen/go/src/github.com/google/zoekt/eval.go:668 +0x1638
github.com/google/zoekt/shards.(*searchShard).Search(0xc5ea7796e0, 0x7f70a27fb040, 0xc6870ec1e0, 0x9cb1e0, 0xc674805700, 0xc426027c20, 0x9cb260, 0xc4241010fa, 0xc42cdb8e80)
<autogenerated>:5 +0x82
github.com/google/zoekt/shards.(*shardedSearcher).Search.func1(0xc47d23c580, 0xc6870ec120, 0xc490a4dee0, 0x9cb1e0, 0xc674805700, 0xc426027c20, 0x9d0fe0, 0xc5ea7796e0)
/usr/local/google/home/hanwen/go/src/github.com/google/zoekt/shards/shards.go:126 +0xf9
created by github.com/google/zoekt/shards.(*shardedSearcher).Search
/usr/local/google/home/hanwen/go/src/github.com/google/zoekt/shards/shards.go:128 +0x404

Ignore .svn directories?

Hi,
We still have some repositories in Subversion, which creates .svn directories for its metadata.
It seems that .git directories are skipped by the git indexer.
It would be useful to skip .svn directories too, or more generally to offer a way to skip directories, e.g. everything listed in .gitignore?

watchdog failure leaves no stack traces.

Today at 20:39, zoekt had a watchdog failure, but there is no way to debug it:

2018/01/11 20:36:55 got query " r:torvalds craz[yi]"
2018/01/11 20:38:02 got query "virtio "
2018/01/11 20:38:26 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00010.zoekt, err
2018/01/11 20:38:26 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00009.zoekt, err
2018/01/11 20:38:26 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00008.zoekt, err
2018/01/11 20:38:27 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00013.zoekt, err
2018/01/11 20:38:27 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00005.zoekt, err
2018/01/11 20:38:27 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00017.zoekt, err
2018/01/11 20:38:27 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00012.zoekt, err
2018/01/11 20:38:27 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00007.zoekt, err
2018/01/11 20:38:27 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00000.zoekt, err
2018/01/11 20:38:28 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00003.zoekt, err
2018/01/11 20:38:28 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00016.zoekt, err
2018/01/11 20:38:28 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00001.zoekt, err
2018/01/11 20:38:28 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00011.zoekt, err
2018/01/11 20:38:28 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00014.zoekt, err
2018/01/11 20:38:28 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00004.zoekt, err
2018/01/11 20:38:28 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00002.zoekt, err
2018/01/11 20:38:28 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00006.zoekt, err
2018/01/11 20:38:28 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fcodesearch_v14.00015.zoekt, err
2018/01/11 20:38:47 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fdeps%2Ficu_v14.00000.zoekt, err
2018/01/11 20:39:06 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fllvm-project%2Flibcxx_v14.00000.zoekt, err
2018/01/11 20:39:08 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fsrc%2Fandroid_webview%2Fglue_v14.00000.zoekt, err
2018/01/11 20:39:21 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fsrc%2Fbase_v14.00000.zoekt, err
2018/01/11 20:39:22 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fsrc%2Fbuild%2Fconfig_v14.00000.zoekt, err
2018/01/11 20:39:25 got query "packer helm"
2018/01/11 20:39:28 reloading: /zoekt/index/chromium.googlesource.com%2Fchromium%2Fsrc%2Fbuild_v14.00000.zoekt, err
2018/01/11 20:39:39 watchdog: Get http://:80: context deadline exceeded

This should print stack traces.

repo indexing assumes paths have the same subrepo across branches.

Makson: the file links in search results point to the wrong git repository.

The situation is that git repository R1 on branch B1 and repository R2 on branch B2 have the same path specified in the manifest.

HW: B1 has R1 on path "subrepo/" and B2 has R2 on the same path "subrepo/"?

Panic when indexing document with symbols that cross rune boundaries

Sometimes ctags (both exuberant and universal) return symbol boundaries that cross rune boundaries. This leads to a panic during indexing.

I have reduced the issue that happened during indexing my codebase to this test:

func TestBadRunesFromCtags(t *testing.T) {
	content := []byte("Hlášení: {0}")
	b := testIndexBuilder(t, nil,
		Document{
			Name:    "bad_rune.cs",
			Content: content,
			Symbols: []DocumentSection{{6, 9}}, // splits í rune (ctags does this)
		},
	)
	searchForTest(t, b, &query.Substring{Pattern: "Hlášení"})
}

which panics:

--- FAIL: TestBadRunesFromCtags (0.00s)
panic: runtime error: index out of range [recovered]
        panic: runtime error: index out of range

goroutine 340 [running]:
testing.tRunner.func1(0xc42012a2d0)
        /usr/local/go/src/testing/testing.go:711 +0x2d2
panic(0x799260, 0xa05be0)
        /usr/local/go/src/runtime/panic.go:491 +0x283
github.com/google/zoekt.(*postingsBuilder).newSearchableString(0xc42007c500, 0xc42012c34f, 0xf, 0x1, 0xc42012c350, 0x1, 0x1, 0x2, 0xc42023fb30, 0x0, ...)
        /home/mmm/go/src/github.com/google/zoekt/indexbuilder.go:138 +0xc07
github.com/google/zoekt.(*IndexBuilder).Add(0xc4200b8420, 0x803c18, 0x7, 0xc42012c340, 0xf, 0x10, 0x0, 0x0, 0x0, 0x0, ...)
        /home/mmm/go/src/github.com/google/zoekt/indexbuilder.go:376 +0x451
github.com/google/zoekt.testIndexBuilder(0xc42012a2d0, 0x0, 0xc420051f20, 0x1, 0x1, 0x10)
        /home/mmm/go/src/github.com/google/zoekt/index_test.go:48 +0x1b4
github.com/google/zoekt.TestBadRunesFromCtags(0xc42012a2d0)
        /home/mmm/go/src/github.com/google/zoekt/index_test.go:1743 +0x18b
testing.tRunner(0xc42012a2d0, 0x81e5c0)
        /usr/local/go/src/testing/testing.go:746 +0xd0
created by testing.(*T).Run
        /usr/local/go/src/testing/testing.go:789 +0x2de
exit status 2
FAIL    github.com/google/zoekt 0.042s

This happened on the latest HEAD, which was at 3f9ebd3 at the time of writing.

ctags failures for android.googlesource.com/platform/docs/source.android.com.git

Apr 13 17:02:03 zoekt-instance-local-ssd zoekt-indexserver[20306]: ERR: 2017/04/13 17:02:03 indexGitRepo(/zoekt/repos/android.googlesource.com/platform/docs/source.android.com.git): exec([/zoekt/bin/zoekt-2017-04-06T1709-19631b7/zoekt-sandbox -s /tmp/940425630 -b /tmp/ctags-input080203899=input -d /input -b /zoekt/bin/ctags-universal=ctags -b /lib=lib -b /lib64=lib64 -t tmp -- /ctags -n -f - --sort=no en/devices/tv/hdmi-cec.html zh-tw/security/bulletin/2016-01-01.html ko/security/bulletin/2016-07-01.html en/compatibility/cts/camera-hal.html ru/security/bulletin/2016-03-01.html en/devices/tech/datausage/tags-explained.html en/devices/audio/midi_test.html en/security/bulletin/2016-02-01.html en/compatibility/5.1/versions.html en/devices/tech/debug/asan.html en/devices/tech/admin/index.html en/devices/media/framework-hardening.html en/devices/tech/connect/call-notification.html en/devices/audio/latency_contrib.html en/reference/_toc.yaml en/devices/tech/ota/device_code.html en/devices/camera/camera3_requests_hal.html ko/security/bulletin/2015-10-01.html en/devices/graphics/arch-egl-opengl.html ko/security/bulletin/2015-12-01.html en/devices/tech/dalvik/dalvik-bytecode.html zh-cn/security/bulletin/2015-12-01.html en/devices/tech/ota/block.html en/compatibility/cts/downloads.html zh-cn/security/bulletin/2016-08-01.html en/devices/media/soc.html en/source/read-bug-reports.html en/_book.yaml zh-tw/security/bulletin/2016-04-02.html en/compatibility/4.4/versions.html ja/security/bulletin/2015-11-01.html en/devices/tech/connect/data-saver.html en/security/bulletin/2017.html en/devices/graphics/implement-vulkan.html en/devices/audio/implement.html en/security/bulletin/2015.html en/devices/graphics/automate-tests.html en/security/bulletin/2017-01-01.html en/source/brands.html README.txt ko/security/bulletin/2017-03-01.html en/security/enhancements/enhancements60.html en/devices/tech/test_infra/tradefed/fundamentals/index.html en/devices/graphics/arch-sf-hwc.html 
en/devices/sensors/report-modes.html ja/security/bulletin/2016-11-01.html en/source/community.html en/devices/audio/avoiding_pi.ht
Apr 13 17:02:03 zoekt-instance-local-ssd zoekt-indexserver[20306]: OUT:
Apr 13 17:02:03 zoekt-instance-local-ssd zoekt-indexserver[20306]: 2017/04/13 17:02:03 command [zoekt-git-index -require_ctags -parallelism=1 -repo_cache /zoekt/repos -index /zoekt/index -incremental /zoekt/repos/android.googlesource.com/platform/docs/source.android.com.git] failed: exit status 1
Apr 13 17:02:02 zoekt-instance-local-ssd zoekt-indexserver[20306]: ctags: cannot open temporary file : Value too large for defined data type

ctags timeout configurable?

Hi,

Can we make the ctags timeout configurable? I think the default is sometimes not enough for large codebases on less powerful machines. I have a patch that I could send as a pull request: zoekt-index gets a new argument that is passed to build/ctags.go, and if nothing is given, the default is assumed.

Issues found by https://github.com/dominikh/go-staticcheck

Two small issues flagged with static analysis:

<dgryski@kamek[zoekt] ʕ╯◔ϖ◔ʔ╯ > staticcheck ./...
/home/dgryski/work/src/cvs/gocode/src/github.com/google/zoekt/shards.go:180:5: ineffective break statement. Did you mean to break out of the outer loop?
/home/dgryski/work/src/cvs/gocode/src/github.com/google/zoekt/query/parse.go:108:4: ineffective break statement. Did you mean to break out of the outer loop?
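For reference, here is the pattern staticcheck complains about. Inside a `switch` or `select`, a bare `break` leaves only that statement, not the enclosing `for` loop. The illustration below uses hypothetical functions and shows the labeled-break fix.

```go
package main

import "fmt"

// firstNegativeBuggy intends to return the index of the first negative
// number, but the bare break only exits the switch, so the loop continues
// and the last negative index wins.
func firstNegativeBuggy(xs []int) int {
	idx := -1
	for i, x := range xs {
		switch {
		case x < 0:
			idx = i
			break // ineffective: leaves the switch, not the loop
		}
	}
	return idx
}

// firstNegative uses a labeled break, which exits the for loop itself.
func firstNegative(xs []int) int {
	idx := -1
loop:
	for i, x := range xs {
		switch {
		case x < 0:
			idx = i
			break loop
		}
	}
	return idx
}

func main() {
	xs := []int{3, -1, 4, -5}
	fmt.Println(firstNegativeBuggy(xs), firstNegative(xs))
}
```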
