google / codesearch

Fast, indexed regexp search over large file trees

Home Page: http://swtch.com/~rsc/regexp/regexp4.html

License: BSD 3-Clause "New" or "Revised" License

Go 98.68% Shell 1.32%

codesearch's Introduction

Code Search is a tool for indexing and then performing
regular expression searches over large bodies of source code.
It is a set of command-line programs written in Go.

For background and an overview of the commands,
see http://swtch.com/~rsc/regexp/regexp4.html.

To install:

	go get github.com/google/codesearch/cmd/...

Use "go get -u" to update an existing installation.

Russ Cox
[email protected]
June 2015

codesearch's Issues

Pull request: add support for per-file custom annotations

This change introduces the ability to attach annotation blobs to each entry in
the index. We index all our internal and external repositories at Twitter, and
the annotations are used to add data like commit count, number of import
references, etc. to each file. This is then used for scoring at search time to
improve ranking.

There are a couple of other minor changes useful for using the code as a
library, such as exposing a couple of private functions (or, and) and adding a
callback system to match.go.

Change is here:
http://code.google.com/r/alec-codesearch-annotations/source/detail?r=b965ca9c2a94d464ef541d05f4004368b9c3508b

Original issue reported on code.google.com by [email protected] on 29 Aug 2012 at 10:19

cindex needs vast amounts of temporary space

1. Run cindex on about 9 GiB of files (where about half of them are text files)
2. Observe space usage of cindex

What is the expected output?

cindex takes a reasonable amount of temporary space and provides an option to 
specify an alternative place to store temporary files.

What do you see instead?

cindex eats up to 5 GiB of space on /tmp while generating a 600 MiB 
.csearchindex file.

What version of the product are you using? On what operating system?

tip on Linux with go tip

Please provide any additional information below.

It would be nice if cindex could provide an option to change the temporary 
directory as /tmp might not be large enough. Additionally, it might be possible 
to lower the space usage of cindex, for instance, it might be possible to merge 
(and therefore compress) temporary indices before processing the next set of 
files.
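
A possible shape for the requested option, as a minimal sketch: the `-tmpdir` flag name and the `newTempIndex` helper are hypothetical and not part of cindex. The sketch relies only on `os.CreateTemp`, which falls back to `os.TempDir()` (and therefore `$TMPDIR` on Unix) when the directory argument is empty.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"os"
)

// newTempIndex creates a scratch file for a partial index.
// An empty dir means "use the system default", which os.CreateTemp
// resolves through os.TempDir (honouring $TMPDIR on Unix).
func newTempIndex(dir string) (*os.File, error) {
	return os.CreateTemp(dir, "csearch-index")
}

func main() {
	// Hypothetical flag; cindex does not define it.
	tmpdir := flag.String("tmpdir", "", "directory for temporary index files")
	flag.Parse()

	f, err := newTempIndex(*tmpdir)
	if err != nil {
		log.Fatalf("creating temporary index: %v", err)
	}
	defer os.Remove(f.Name())
	defer f.Close()
	fmt.Println("temporary index at", f.Name())
}
```

Because the default already goes through `os.TempDir()`, setting `TMPDIR` in the environment may be enough on Unix systems even without a dedicated flag.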

Original issue reported on code.google.com by [email protected] on 23 Sep 2013 at 1:23

index: merge of index fails

I tried to index the Go sources on Windows. The first run of cindex works as expected. The second run doesn't remove .csearchindex~ and creates a second file, .csearchindex~~. I don't know whether the merge of .csearchindex and .csearchindex~ was successful.

The expected behaviour is that .csearchindex and .csearchindex~ are merged and .csearchindex~ is removed.

OS: Windows 7
Go: go1.5rc1

First Run

v% cindex go
2015/08/06 09:36:34 index E:\home\visfj\go
2015/08/06 09:36:34 flush index
2015/08/06 09:36:34 merge 0 files + mem
2015/08/06 09:36:35 39563770 data bytes, 9212960 index bytes
2015/08/06 09:36:35 done
v% ls -al
total 9068
drwxr-xr-x 1 visfj Domänen-Benutzer       0 Aug  6 09:36 .
drwxr-xr-x 1 visfj Domänen-Benutzer       0 Jul 30 13:22 ..
-rw-r--r-- 1 visfj Domänen-Benutzer    6213 Aug  6 08:36 .bash_history
-rw-r--r-- 1 visfj Domänen-Benutzer     422 Aug  5 11:29 .bashrc
-rw-r--r-- 1 visfj Domänen-Benutzer 9212960 Aug  6 09:36 .csearchindex
-rw-r--r-- 1 visfj Domänen-Benutzer     105 Aug  5 14:26 .gitconfig
-rw-r--r-- 1 visfj Domänen-Benutzer     114 Jul 31 13:56 .gitcookies

Second Run

v% cindex
2015/08/06 09:36:45 index E:\home\visfj\go
2015/08/06 09:36:46 flush index
2015/08/06 09:36:46 merge 0 files + mem
2015/08/06 09:36:46 39563770 data bytes, 9212960 index bytes
2015/08/06 09:36:46 merge E:\home\visfj\.csearchindex E:\home\visfj\.csearchindex~
2015/08/06 09:36:46 done
v% ls -al
total 27068
drwxr-xr-x 1 visfj Domänen-Benutzer       0 Aug  6 09:36 .
drwxr-xr-x 1 visfj Domänen-Benutzer       0 Jul 30 13:22 ..
-rw-r--r-- 1 visfj Domänen-Benutzer    6213 Aug  6 08:36 .bash_history
-rw-r--r-- 1 visfj Domänen-Benutzer     422 Aug  5 11:29 .bashrc
-rw-r--r-- 1 visfj Domänen-Benutzer 9212960 Aug  6 09:36 .csearchindex
-rw-r--r-- 1 visfj Domänen-Benutzer 9212960 Aug  6 09:36 .csearchindex~
-rw-r--r-- 1 visfj Domänen-Benutzer 9212944 Aug  6 09:36 .csearchindex~~
-rw-r--r-- 1 visfj Domänen-Benutzer     105 Aug  5 14:26 .gitconfig
-rw-r--r-- 1 visfj Domänen-Benutzer     114 Jul 31 13:56 .gitcookies

Adding more trigrams makes post query worse

bash$ csearch -i -verbose '%%0.*sprintf'
2012/02/29 16:02:11 query: "%%0" 
("SPR"|"SPr"|"SpR"|"Spr"|"sPR"|"sPr"|"spR"|"spr")|("\xbfPR" "ſP")|("\xbfPr" 
"ſP")|("\xbfpR" "ſp")|("\xbfpr" "ſp") 
("PRI"|"PRi"|"PrI"|"Pri"|"pRI"|"pRi"|"prI"|"pri") 
("RIN"|"RIn"|"RiN"|"Rin"|"rIN"|"rIn"|"riN"|"rin") 
("INT"|"INt"|"InT"|"Int"|"iNT"|"iNt"|"inT"|"int") 
("NTF"|"NTf"|"NtF"|"Ntf"|"nTF"|"nTf"|"ntF"|"ntf")
2012/02/29 16:02:11 post query identified 3295 possible files
bash$ csearch -i -verbose '%%0' >/dev/null
2012/02/29 16:02:24 query: "%%0"
2012/02/29 16:02:24 post query identified 9 possible files

Since only 9 files match '%%0', I would expect no more than 9 possible files 
from the post query when adding '.*sprintf' to the query string.  It looks like 
something is ignoring the 'restrict' list somewhere.
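
The expectation can be stated as a property of list intersection. The sketch below is illustrative only (`intersect` is not the codesearch posting reader): when each additional trigram is ANDed against the set of files found so far, the candidate set can shrink but never grow.

```go
package main

import "fmt"

// intersect returns the IDs present in both sorted, duplicate-free lists.
func intersect(a, b []uint32) []uint32 {
	var out []uint32
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

func main() {
	restrict := []uint32{2, 5, 9}          // files already matched by earlier trigrams
	posting := []uint32{1, 2, 3, 5, 8, 13} // files containing the next trigram
	fmt.Println(intersect(restrict, posting)) // [2 5]
}
```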

I'm still working on a minimal test case.  I'll look into this more tonight or 
so.

Original issue reported on code.google.com by dgryski on 29 Feb 2012 at 3:10

Exclude directory from indexing

Please provide a way to exclude a directory from indexing.

csearch sometimes fails on character classes containing only the same letter in upper and lowercase

What steps will reproduce the problem?
laptop$ cat userids.txt 
dgryski
laptop$ cindex .
2012/01/24 14:48:22 index /[XXXXXX]/
2012/01/24 14:48:22 flush index
2012/01/24 14:48:22 merge 0 files + mem
2012/01/24 14:48:22 8 data bytes, 237 index bytes
2012/01/24 14:48:22 done
laptop$ csearch '[g]r'
/[XXXXX]/userids.txt:dgryski
laptop$ csearch '[Hg]r'
/[XXXXX]/userids.txt:dgryski
laptop$ csearch '[Gg]r'
laptop$ 
laptop$ csearch 'g[Rr]'
/[XXXXX]/userids.txt:dgryski
laptop$ csearch '[Dd]g'
laptop$ csearch '[ZDd]g'
/[XXXXX]/userids.txt:dgryski
laptop$

What is the expected output? What do you see instead?
I expect 'dgryski' to be printed, but instead depending on the regex no lines 
are found.

What version of the product are you using? On what operating system?
070ef10ab799 tip.  Darwin 10.8.0

Please provide any additional information below.

Original issue reported on code.google.com by dgryski on 24 Jan 2012 at 1:59

csearch: Add flag (-g) for grouping output by file (a'la ack --group, or git grep --heading) [PATCH]

The goal is to make the output more palatable to human eyes. Compare:

$ csearch Aqyadaya
/home/nazri/perl5/perlbrew/perls/perl-5.14.2/man/man3/Text::SimpleTable.3:\&    $t1\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);
/home/nazri/perl5/perlbrew/perls/perl-5.14.2/man/man3/Text::SimpleTable.3:\&    $t2\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);
/home/nazri/perl5/perlbrew/perls/perl-5.14.2/man/man3/Text::SimpleTable.3:\&    $t3\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);

$ csearch -g Aqyadaya
/home/nazri/perl5/perlbrew/perls/perl-5.14.2/man/man3/Text::SimpleTable.3
\&    $t1\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);
\&    $t2\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);
\&    $t3\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);

Original issue reported on code.google.com by [email protected] on 20 Apr 2012 at 1:46

Attachments:

csearch-group-by-file.patch

Patch for /index/write_test.go

Add TestHeap()

Original issue reported on code.google.com by [email protected] on 26 Nov 2013 at 4:42

Attachments:

write_test.go.patch

Windows missing conversion

In mmap_windows.go line 26:

h, err := syscall.CreateFileMapping(f.Fd(), nil, ...

The file descriptor needs to be converted to syscall.Handle:

h, err := syscall.CreateFileMapping(syscall.Handle(f.Fd()), nil,

Original issue reported on code.google.com by [email protected] on 6 Nov 2013 at 5:17

Support text files compressed with gzip, bzip2 and xz

It would be awesome to be able to search my IRC log files, but I compress them to save space. Would it be possible to add automatic decompression of gzip, bzip2 and xz files to the toolchain?

Second and newer indexing loses paths if one directory basename is a prefix

Easier to demonstrate.

% cat  Makefile
.PHONY: all test clean

BAD1 = a
BAD2 = a-b

GOOD1 = a-x
GOOD2 = a-y

.PHONY: template good bad

template:
	rm -rf test
	mkdir -p test/$(VAR1) test/$(VAR2)
	ls > test/$(VAR1)/a
	ls > test/$(VAR2)/a
	rm -f test/.csearchindex
	/usr/bin/env CSEARCHINDEX=test/.csearchindex $(GOPATH)/bin/cindex test/$(VAR1) test/$(VAR2)
	/usr/bin/env CSEARCHINDEX=test/.csearchindex $(GOPATH)/bin/cindex -list
	# Just index them again to reproduce the bug
	/usr/bin/env CSEARCHINDEX=test/.csearchindex $(GOPATH)/bin/cindex -verbose
	/usr/bin/env CSEARCHINDEX=test/.csearchindex $(GOPATH)/bin/cindex -list
	test `/usr/bin/env CSEARCHINDEX=test/.csearchindex $(GOPATH)/bin/cindex -list | wc -l` = 2

bad:
	VAR1=$(BAD1) VAR2=$(BAD2) $(MAKE) template

good:
	VAR1=$(GOOD1) VAR2=$(GOOD2) $(MAKE) template

No bug when the test is run with two directories where neither basename is a prefix of the other:

% make good
VAR1=a-x VAR2=a-y /Applications/Xcode.app/Contents/Developer/usr/bin/make template
rm -rf test
mkdir -p test/a-x test/a-y
ls > test/a-x/a
ls > test/a-y/a
rm -f test/.csearchindex
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex test/a-x test/a-y
2016/02/08 17:37:14 index /Users/xxxx/work/codesearch/test/a-x
2016/02/08 17:37:14 index /Users/xxxx/work/codesearch/test/a-y
2016/02/08 17:37:14 flush index
2016/02/08 17:37:14 merge 0 files + mem
2016/02/08 17:37:14 166 data bytes, 1596 index bytes
2016/02/08 17:37:14 done
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list
/Users/xxxx/work/codesearch/test/a-x
/Users/xxxx/work/codesearch/test/a-y
# Just index them again to reproduce the bug
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -verbose
2016/02/08 17:37:14 index /Users/xxxx/work/codesearch/test/a-x
2016/02/08 17:37:14 83 79 /Users/xxxx/work/codesearch/test/a-x/a
2016/02/08 17:37:14 index /Users/xxxx/work/codesearch/test/a-y
2016/02/08 17:37:14 83 79 /Users/xxxx/work/codesearch/test/a-y/a
2016/02/08 17:37:14 flush index
2016/02/08 17:37:14 merge 0 files + mem
2016/02/08 17:37:14 166 data bytes, 1596 index bytes
2016/02/08 17:37:14 merge test/.csearchindex test/.csearchindex~
2016/02/08 17:37:14 done
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list
/Users/xxxx/work/codesearch/test/a-x
/Users/xxxx/work/codesearch/test/a-y
test `/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list | wc -l` = 2

Bug seen when the test is run with two directories where one basename is a prefix of the other:

% make bad
VAR1=a VAR2=a-b /Applications/Xcode.app/Contents/Developer/usr/bin/make template
rm -rf test
mkdir -p test/a test/a-b
ls > test/a/a
ls > test/a-b/a
rm -f test/.csearchindex
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex test/a test/a-b
2016/02/08 17:37:11 index /Users/xxxx/work/codesearch/test/a
2016/02/08 17:37:11 index /Users/xxxx/work/codesearch/test/a-b
2016/02/08 17:37:11 flush index
2016/02/08 17:37:11 merge 0 files + mem
2016/02/08 17:37:11 166 data bytes, 1592 index bytes
2016/02/08 17:37:11 done
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list
/Users/xxxx/work/codesearch/test/a
/Users/xxxx/work/codesearch/test/a-b
# Just index them again to reproduce the bug
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -verbose
2016/02/08 17:37:11 index /Users/xxxx/work/codesearch/test/a
2016/02/08 17:37:11 83 79 /Users/xxxx/work/codesearch/test/a/a
2016/02/08 17:37:11 index /Users/xxxx/work/codesearch/test/a-b
2016/02/08 17:37:11 83 79 /Users/xxxx/work/codesearch/test/a-b/a
2016/02/08 17:37:11 flush index
2016/02/08 17:37:11 merge 0 files + mem
2016/02/08 17:37:11 166 data bytes, 1592 index bytes
2016/02/08 17:37:11 merge test/.csearchindex test/.csearchindex~
2016/02/08 17:37:11 done
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list
/Users/xxxx/work/codesearch/test/a
test `/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list | wc -l` = 2
make[1]: *** [template] Error 1
make: *** [bad] Error 2
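
The report does not identify the cause. One observation that may be relevant, offered as a hypothesis rather than a diagnosis: in byte-wise string order `-` (0x2D) sorts before `/` (0x2F), so the indexed roots and the files beneath them sort in different relative orders. Any merge step that assumes the two orders agree could drop entries. The sketch below only demonstrates the ordering; the `sorted` helper is illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// sorted returns a sorted copy of s using plain byte-wise string order.
func sorted(s []string) []string {
	out := append([]string(nil), s...)
	sort.Strings(out)
	return out
}

func main() {
	roots := sorted([]string{"test/a-b", "test/a"})
	files := sorted([]string{"test/a/a", "test/a-b/a"})
	fmt.Println(roots) // [test/a test/a-b]
	fmt.Println(files) // [test/a-b/a test/a/a]
}
```

With the "good" names (`a-x`, `a-y`) both lists sort the same way, which is consistent with the bug not appearing there.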

Option to remove a path from index

Currently there is no straightforward way to remove one directory from the 
index (cindex -list). One needs to provide the complete new list with cindex 
-reset to remove one entry. Please provide an option to remove one (or more) of 
the entries from the path that are currently indexed.

Original issue reported on code.google.com by [email protected] on 17 May 2012 at 9:09

Minor adjustment of Csearch page.

Not really an issue, but a request.

I see that "Emacs editor integration" is prominently mentioned on the homepage
of codesearch. In that case, I would recommend adding

"Vim editor integration"  too, link: 
https://github.com/junkblocker/unite-codesearch

Thanks in advance.

Original issue reported on code.google.com by [email protected] on 30 Oct 2014 at 6:07

Where are the binaries?

I've looked in here and although the top-level readme references binaries I can't find anything that isn't a .go file.

cgrep line numbers are 0-indexed

They should be 1-indexed.

This is from the mac 64 zip file e0e01b6d3c01a3ac8b8d0507aae7cb34ba24b1a7 Jan 
19 2012.

-rob

Original issue reported on code.google.com by [email protected] on 21 Jan 2012 at 4:31

"go get" fails to get

$ go get code.google.com/p/codesearch/cmd/...
package code.google.com/p/codesearch/cmd/...: unable to detect version control system for code.google.com/ path

Oh well, let's try github:

$ go get github.com/google/codesearch/...
package github.com/google/codesearch/...
imports github.com/google/codesearch/cmd/cgrep
imports code.google.com/p/codesearch/regexp: unable to detect version control system for code.google.com/ path
package github.com/google/codesearch/...
imports github.com/google/codesearch/cmd/cindex
imports code.google.com/p/codesearch/index: unable to detect version control system for code.google.com/ path
package github.com/google/codesearch/...
imports github.com/google/codesearch/index
imports code.google.com/p/codesearch/sparse: unable to detect version control system for code.google.com/ path

prefix/suffix lists and question-marks don't play nicely together


One of the areas I got stuck on when debugging the trigram-question-mark issue, 
but might actually be a fundamental design limitation / feature, is that moving 
to prefix/suffix lists can cause the list of trigrams to drop considerably.

bash$ ./csearch -verbose 'foo_(bar)?zot' >/dev/null
2012/03/07 22:18:48 query: "foo" "oo_" "zot" ("_zo" "o_z")|("arz" "rzo")
2012/03/07 22:18:48 post query identified 0 possible files
bash$ ./csearch -verbose 'foo_(bar_)?zot' >/dev/null
2012/03/07 22:18:53 query: "foo" "oo_" "zot"
2012/03/07 22:18:53 post query identified 0 possible files

In the first case, "bar" is only three characters and stays as an exact trigram 
and is used to construct the arz/rzo entries.  When it becomes a prefix/suffix 
list (when it hits 4 characters by adding the underscore),  it no longer 
provides us with any trigram info because the empty string empties out the 
prefix and suffix lists as being "redundant" with the empty string.  ("" is a 
prefix of "ba").

I'm not sure if this is a bug or not.  I.e, _should_ we be able to transform 
prefix/suffix lists into AND/OR sets of trigrams in this case?
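
For comparison, a naive sketch (not how codesearch builds its queries) that enumerates the trigrams of each literal the pattern `foo_(bar_)?zot` can match. Both expansions contain `_zo`, which the reported query omits, so some trigram information is in principle recoverable. `trigrams` is an illustrative helper.

```go
package main

import "fmt"

// trigrams returns every 3-byte substring of s, in order of appearance.
func trigrams(s string) []string {
	var out []string
	for i := 0; i+3 <= len(s); i++ {
		out = append(out, s[i:i+3])
	}
	return out
}

func main() {
	fmt.Println(trigrams("foo_zot"))     // [foo oo_ o_z _zo zot]
	fmt.Println(trigrams("foo_bar_zot")) // [foo oo_ o_b _ba bar ar_ r_z _zo zot]
}
```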

Original issue reported on code.google.com by dgryski on 7 Mar 2012 at 10:08

"csearch -n" output is 0-based, not 1-based

Maybe this works as designed, but it caught me off guard. The line numbers 
reported by "csearch -n" seem to start at 0 instead of 1. This is at odds with 
e.g. grep, to which csearch compares itself in its help output. This makes it 
somewhat awkward to use "csearch -n" in emacs (and presumably other editors) 
since it numbers its lines starting at 1.
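
A minimal sketch of the 1-based convention that grep and editors use; `lineNumber` is an illustrative helper, not code from csearch.

```go
package main

import (
	"bytes"
	"fmt"
)

// lineNumber returns the 1-based line number of the byte at offset in buf,
// the convention used by grep -n.
func lineNumber(buf []byte, offset int) int {
	return 1 + bytes.Count(buf[:offset], []byte("\n"))
}

func main() {
	buf := []byte("first\nsecond\nthird\n")
	fmt.Println(lineNumber(buf, 0))  // 1
	fmt.Println(lineNumber(buf, 6))  // 2
	fmt.Println(lineNumber(buf, 13)) // 3
}
```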

Original issue reported on code.google.com by austin.bingham on 11 Oct 2012 at 10:41

Merged into: #4

Files containing ascii8 are not indexed (feature/request)

What steps will reproduce the problem?
1. File containing: PAT and ä (0xE4, a umlaut), ö (0xF6,o umlaut), other high 
bit ascii chars
2. cindex -reset .
3. csearch -i PAT

What is the expected output? What do you see instead?
Expected to find PAT. Instead no match.

What version of the product are you using? On what operating system?
codesearch-0.01-windows-amd64.zip

Please provide any additional information below.
I looked at the source (write.go) and it seems to expect that files are in 
UTF-8 only (this is a Go specification?). However it would be nice if csearch 
could be used with any source files, including those with high bit ascii 
characters. Or that there would be a command line option for this.
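
One way such an option could behave, as a sketch under stated assumptions: input that is not valid UTF-8 is treated as ISO-8859-1 and re-encoded before indexing. `toUTF8` is a hypothetical helper. ISO-8859-1 differs from Windows codepage 1252 in the range 0x80 to 0x9F, so this is only an approximation for 1252 files, although 0xE4 and 0xF6 map to the same characters in both.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// toUTF8 returns b unchanged if it is already valid UTF-8. Otherwise it
// treats every byte as an ISO-8859-1 code point and re-encodes it.
func toUTF8(b []byte) []byte {
	if utf8.Valid(b) {
		return b
	}
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c) // Latin-1 bytes equal their Unicode code points
	}
	return []byte(string(runes))
}

func main() {
	in := []byte{'P', 'A', 'T', ' ', 0xE4, 0xF6}
	fmt.Printf("%q\n", toUTF8(in)) // "PAT äö"
}
```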

Original issue reported on code.google.com by [email protected] on 21 Nov 2012 at 7:24

Index update fails on Windows

What steps will reproduce the problem?
1. cindex any/path
2. cindex
3.

What is the expected output? 
The index is updated.

What do you see instead?
After merging the master index with the updated index, the rename of
.csearchindex~~ to .csearchindex fails. This is possibly due to open file
handles.

What version of the product are you using? On what operating system?
Windows 7, 64 bit

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 23 Jan 2012 at 10:55

Some log.Fatal() calls should be log.Fatalf() [PATCH]

I ran out of disk space on a server and the error message from cindex ("short 
write writing %s") indicated it should have been a log.Fatalf() call instead of 
log.Fatal().

I ran go vet over the source tree and found two more instances of Fatal() calls 
with format specifiers.

The patch is attached.

Original issue reported on code.google.com by dgryski on 20 Feb 2012 at 2:15

Attachments:

write-fatal.patch

Error in read.go comments

http://code.google.com/p/codesearch/source/browse/index/read.go#36

The comment reads : "For example, the delta list [2,5,1,1,0] encodes the file 
ID list 1, 6, 7, 8."

Am I misunderstanding something, or should the delta list read : [1,5,1,1,0] to 
match up to the given file ID list?
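
One reading under which the comment is consistent, given as a sketch and an assumption about the format rather than a statement of it: if decoding starts from a running ID of -1, a first delta of 2 yields file ID 1 and the trailing 0 terminates the list. Starting from 0 instead would give 2, 7, 8, 9. `decodeDeltas` is an illustrative helper, not the function in read.go.

```go
package main

import "fmt"

// decodeDeltas turns a zero-terminated delta list into file IDs.
// Assumption: the running ID starts at -1 (all bits set in a uint32),
// so the first delta is "file ID + 1" and a zero delta can end the list.
func decodeDeltas(deltas []uint32) []uint32 {
	var ids []uint32
	id := ^uint32(0) // -1 as an unsigned value
	for _, d := range deltas {
		if d == 0 {
			break
		}
		id += d // wraps around from -1 on the first step
		ids = append(ids, id)
	}
	return ids
}

func main() {
	fmt.Println(decodeDeltas([]uint32{2, 5, 1, 1, 0})) // [1 6 7 8]
}
```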

Original issue reported on code.google.com by [email protected] on 26 Apr 2012 at 7:10

cindex indexes vcs directories: .hg, .svn, .git, .bzr, ...

What steps will reproduce the problem?
1. Run cindex on a directory containing a checkout
2. Run csearch on a text string known to be in the source code

What is the expected output? What do you see instead?
Expected output is source code matches.  Instead, I see source code matches 
_and_ .svn/.../foo.svn-base matches, or .hg/blah matches, or .git/blah matches.

What version of the product are you using? On what operating system?
f9a003c603e3 tip
Darwin 10.8.0

Please provide any additional information below.
A simple patch to the filewalker function can skip directories if they are 
known to be "special" and not likely to contain items we want to index.  I've 
included a small patch to cindex.go which does just this.

This functionality could also be disabled unless -ignore-vcs or something is 
passed.

Ack (betterthangrep.com) does similar filtering on directories and files: 
#foo.c# from emacs, foo.c~ from vim, etc.  Ignoring those would also be easy to 
add.
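
The idea can be sketched with the standard library walker. This is not the attached patch; `isVCSDir` and `listFiles` are illustrative names, and the directory list is an example set.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// isVCSDir reports whether a directory name belongs to a version control system.
func isVCSDir(name string) bool {
	switch name {
	case ".git", ".hg", ".svn", ".bzr", "CVS":
		return true
	}
	return false
}

// listFiles walks root and returns regular files, skipping VCS directories.
func listFiles(root string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if isVCSDir(d.Name()) {
				return filepath.SkipDir // do not descend into metadata
			}
			return nil
		}
		if d.Type().IsRegular() {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

func main() {
	files, err := listFiles(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```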

Original issue reported on code.google.com by dgryski on 19 Jan 2012 at 9:00

Attachments:

skip-vcs-directories.diff

cgrep blows up when given no arguments

What steps will reproduce the problem?
1. run cgrep
2.
3.

What is the expected output? What do you see instead?
There is a custom usage message which is not displayed.  Instead we get the 
default usage message and a panic.

laptop$ cgrep
Usage of cgrep:
  -c=false: print match counts only
  -cpuprofile="": write cpu profile to this file
  -h=false: omit file names
  -i=false: case-insensitive match
  -l=false: list matching files only
  -n=false: show line numbers
panic: runtime error: index out of range

goroutine 1 [running]:
main.main()
    /[XXXXX]/src/code.google.com/p/codesearch/cmd/cgrep/cgrep.go:58 +0x292

What version of the product are you using? On what operating system?
070ef10ab799 tip. Darwin 10.8.0

Please provide any additional information below.
This is the default flag.Usage() message instead of the custom usage() one 
which exits and would prevent the index-out-of-range error.

The fix is a one-line patch in main():
   flag.Usage = usage

Alternately, we could call our usage() function instead of flag.Usage() in the 
if block.

Original issue reported on code.google.com by dgryski on 27 Jan 2012 at 9:20

Support grep's "-w" flag [PATCH]

It's nice to be able to search for 'whole words', which is to say making sure 
the pattern has a word boundary at the beginning and end.

Having to type '\bpattern\b' is annoying, and grep/ack already have a -w flag 
which does this for you.
Since we already have -i which adjusts the pattern, I think adding another one
isn't too terrible.

I've attached a patch that does this.
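
The pattern rewrite itself is small. The sketch below is not the attached patch and uses the standard library `regexp` package rather than codesearch's own; `wholeWord` is an illustrative name.

```go
package main

import (
	"fmt"
	"regexp"
)

// wholeWord wraps pat so that it only matches at word boundaries.
// The non-capturing group keeps alternations such as a|b intact.
func wholeWord(pat string) string {
	return `\b(?:` + pat + `)\b`
}

func main() {
	re := regexp.MustCompile(wholeWord("int|long"))
	fmt.Println(re.MatchString("int x"))    // true
	fmt.Println(re.MatchString("printf()")) // false
}
```

Without the group, `\bint|long\b` would anchor only one side of each alternative.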

Original issue reported on code.google.com by dgryski on 17 Feb 2012 at 1:13

Attachments:

csearch-w.patch

files with codepage 1252 are skipped

I have a lot of very old Windows source code encoded in codepage 1252. Codesearch skips all of these files without a warning.

How can I tell cindex to ignore certain folders? (such as logs/, *.log, tmp/ etc.)

See summary above.

Original issue reported on code.google.com by [email protected] on 21 Jan 2013 at 12:33

cindex fails with mmap errors on OpenBSD

Trying to run codesearch on OpenBSD -current amd64 with go 1.0.3 from packages, 
using current codesearch code fetched with 'go get', I'm seeing this:

$ go run src/code.google.com/p/codesearch/cmd/cindex/cindex.go /usr/src
2013/03/04 10:52:52 index /usr/src
2013/03/04 10:56:11 flush index
2013/03/04 10:56:12 merge 13 files + mem
2013/03/04 10:56:12 mmap /tmp/csearch-index060279584: errno 4967816
exit status 1

If I run tests I get this,

=== RUN TestMerge
2013/03/04 11:11:32 merge 0 files + mem
2013/03/04 11:11:32 99 data bytes, 1217 index bytes
2013/03/04 11:11:33 merge 0 files + mem
2013/03/04 11:11:33 87 data bytes, 1217 index bytes
2013/03/04 11:11:33 mmap /tmp/index-test259895843: errno 5172144
exit status 1
FAIL    code.google.com/p/codesearch/index      0.411s

The errno seems strange, any suggestions what might be happening? OpenBSD does 
not have UBC so I'm wondering if it may require msync (not sure if cindex is 
mixing access via standard file operations and mmap, but if so, it will need 
msyncs between the different types of access).

Original issue reported on code.google.com by [email protected] on 4 Mar 2013 at 11:19

Exclude subdirectories

I would like to add exclusion of subdirectories to cindex. For example, indexing chromium /src takes about 50 seconds, but only 20 seconds if I exclude /src/out and /src/third_party.
Right now it works like cindex --exclude /src/out --exclude /src/tmp /src, but I have to pass the excludes again whenever I want to re-index. As I understand it, the way to integrate this is to modify the index format to store the list of excluded directories. Is it OK to introduce such a backward-incompatible change?

Corrupt output for files without a final newline

$ echo -n aaa > a
$ echo bbb > b
$ cindex .
2015/07/14 18:49:45 index /tmp/t
2015/07/14 18:49:45 flush index
2015/07/14 18:49:45 merge 0 files + mem
2015/07/14 18:49:45 7 data bytes, 154 index bytes
2015/07/14 18:49:45 merge /Users/jacekm/.csearchindex /Users/jacekm/.csearchindex~
2015/07/14 18:49:45 done
$ csearch -n .
/tmp/t/a:1:aaa/tmp/t/b:1:bbb
$
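
A sketch of the kind of guard that avoids the run-together output, assuming the printer writes each matching line as a byte slice; `withNewline` is a hypothetical helper, not csearch code.

```go
package main

import (
	"bytes"
	"os"
)

// withNewline returns line with a trailing '\n', appending one when the
// last line of a file lacks it. The input slice is never modified.
func withNewline(line []byte) []byte {
	if bytes.HasSuffix(line, []byte("\n")) {
		return line
	}
	// The full slice expression forces append to copy.
	return append(line[:len(line):len(line)], '\n')
}

func main() {
	matches := []struct{ name, line string }{
		{"/tmp/t/a", "aaa"},   // file without a final newline
		{"/tmp/t/b", "bbb\n"}, // ordinary line
	}
	for _, m := range matches {
		os.Stdout.WriteString(m.name + ":1:")
		os.Stdout.Write(withNewline([]byte(m.line)))
	}
}
```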

cindex ignores IRC log files

https://code.google.com/p/codesearch/source/browse/cmd/cindex/cindex.go#132 has 
the relevant line of code. Effectively when I run cindex on my IRC log 
directory, I get an index of *only* the private conversations. Public 
conversations are in files named after the channel, so for example: 
`2014/#ubuntu.03.log`.

cindex decides to ignore dotfiles, tildefiles, and for some reason probably 
relating to EMACS, files beginning with an octothorpe ('#'). There does not 
seem to be any command-line switch to disable this file-skipping (-a would seem 
appropriate), and no way to provide custom include/exclude rules.
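
The skip rule described above, with an opt-out, could look like the sketch below. `skipName` and its `all` parameter (modelling a hypothetical -a switch) are assumptions for illustration, not cindex code.

```go
package main

import "fmt"

// skipName reports whether a path element should be left out of the index.
// It models the rule described above: names starting with '.', '#' or '~',
// and names ending in '~'. Passing all=true disables the rule.
func skipName(elem string, all bool) bool {
	if all || elem == "" {
		return false
	}
	return elem[0] == '.' || elem[0] == '#' || elem[0] == '~' || elem[len(elem)-1] == '~'
}

func main() {
	fmt.Println(skipName("#ubuntu.03.log", false)) // true
	fmt.Println(skipName("#ubuntu.03.log", true))  // false
	fmt.Println(skipName("notes.txt", false))      // false
}
```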

Since searching IRC logs is probably more common for me than searching code, it 
makes this amazing tool rather less useful for me!

Original issue reported on code.google.com by [email protected] on 27 Mar 2014 at 5:03

Cannot index / search one file

What steps will reproduce the problem?
1. Index the attached file with cindex
2. Search for a pattern inside it
3. No hits

What is the expected output? What do you see instead?

repro$
repro$ cindex -reset
repro$ cindex badfile
2013/03/13 18:51:52 index /tmp/repro/badfile
2013/03/13 18:51:52 flush index
2013/03/13 18:51:52 merge 0 files + mem
2013/03/13 18:51:52 0 data bytes, 92 index bytes
2013/03/13 18:51:52 done
repro$ cindex -list
/tmp/repro/badfile
repro$
repro$
repro$
repro$ grep main badfile
libc.so.6        __libc_start_main
torch             main
torch              realmain(int, char**)
libglib-2.0....          g_main_context_iteration
libglib-2.0....           g_main_context_prepare
libglib-2.0....            g_main_context_dispatch
#85 0x00000032f5c38f0e in g_main_context_dispatch () from 
/lib64/libglib-2.0.so.0
#87 0x00000032f5c3ca3a in g_main_context_iteration () from 
/lib64/libglib-2.0.so.0
#93 0x000000000040e74d in realmain(int, char**) ()
#94 0x000000000040e933 in main ()
repro$
repro$ csearch main  <= no results here !!
repro$ 
repro$ grep threads badfile 
============ All threads ==========
============ All threads ==========
repro$
repro$ csearch threads <= no results here !!
repro$ 

I cannot find (with csearch) text that is in a file I have indexed (cindex)

What version of the product are you using? On what operating system?

I'm using the Linux binaries that are available on the Download page.
I tried to compile go / codesearch but couldn't make it work (my go
install might be funky).

Please provide any additional information below.

It looks like the problem happens at indexing time.

Original issue reported on code.google.com by [email protected] on 14 Mar 2013 at 1:57

Attachments:

badfile

cindex fails on the first run: mmap() returns EINVAL

What steps will reproduce the problem?
1. build all utils (or download binaries from this site)
2. be sure that there is no .csearchindex (or it has zero size)
3. run `cindex /usr/include` or any other path

`cindex` fails, because mmap returns error with EINVAL code.
This happens inside mmap_linux.go, on line 24.
That function (mmapFile) calls mmap with zero size, which fails.
I had to manually copy ~/.csearchindex~ to ~/.csearchindex.
That fixed the issue.

I expected cindex to work right out of the box, even with no .csearchindex.

I'm using:
Gentoo Base System release 2.0.3, Linux 3.1.6, amd64.
Go repository is the newest (built a few minutes ago from head).

Original issue reported on code.google.com by [email protected] on 19 Jan 2012 at 8:10

cgrep not mentioned in README

And its usage message doesn't help distinguish it from csearch.

Original issue reported on code.google.com by [email protected] on 21 Jan 2012 at 4:31

Home path is missing the drive letter on Windows

What steps will reproduce the problem?
1. cindex any/path (without CSEARCHINDEX set)
2.
3.

What is the expected output? 
Using .csearchindex in home dir. 

What do you see instead?
Failing due to the home path missing the drive letter. In index/read.go, the 
line in File() getting the home dir should probably read 'home = 
os.Getenv("HOMEDRIVE") + os.Getenv("HOMEPATH")'
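
A sketch of the suggested lookup, with the environment access injected so it can be checked on any platform. `homeDir` is an illustrative helper, not the code in index/read.go. In Go 1.12 and later, `os.UserHomeDir` is another option.

```go
package main

import (
	"fmt"
	"os"
)

// homeDir picks a home directory from environment lookups.
// getenv is injected so the logic can be exercised without touching
// the real environment.
func homeDir(getenv func(string) string) string {
	if h := getenv("HOME"); h != "" {
		return h
	}
	// On Windows, HOMEPATH alone lacks the drive letter.
	return getenv("HOMEDRIVE") + getenv("HOMEPATH")
}

func main() {
	fmt.Println(homeDir(os.Getenv))
}
```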

What version of the product are you using? On what operating system?
Windows 7, 64 bit

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 23 Jan 2012 at 10:39

build failed on tip of golang

On tip 59fdb5d05e04 of golang, the build of pkg index failed because of an argument
type mismatch.
# code.google.com/p/codesearch/index
src/code.google.com/p/codesearch/index/mmap_bsd.go:34: cannot use f.Fd() (type 
uintptr) as type int in function argument

Original issue reported on code.google.com by [email protected] on 13 Feb 2012 at 8:30

cindex should skip revision control directories

When pointed to a source code tree containing revision control meta-data (e.g. 
.svn, CVS, or similar system with meta-data in each subdir), the metadata files 
get indexed as well, leading to undesired output from csearch.


What steps will reproduce the problem?
1. checkout some svn code
2. cindex that code
3. csearch some-token

What is the expected output? What do you see instead?
Only the actual code files are hits, instead I also see "xxx.svn-base" files.

What version of the product are you using? On what operating system?
hg 070ef10ab799

Please provide any additional information below.
Thanks for a great tool!

Original issue reported on code.google.com by [email protected] on 20 Jan 2012 at 2:04

Merged into: #2

Line missing in read.go

Queries of the type [Hh]ashTable fail to yield results, where hashTable and 
HashTable each gets results.
It seems like a line is clearly missing in read.go in method:
  func (r *postReader) init(ix, trigram, restrict)
so that the restrict parameter is ignored.

The attached patch adds a test that shows the bug and fixes the problem.

Original issue reported on code.google.com by [email protected] on 15 Apr 2012 at 3:59

Merged into: #8

Attachments:

query_bug_fix.diff

How to install

I want to search my Android source with something like https://cs.chromium.org/
So far I have:
installed golang-go
installed codesearch
run "go get github.com/google/codesearch/cmd/{cindex,csearch}"

cindex xxxSourcesxxx/

I don't know how to make a website like https://cs.chromium.org/ to search the Android source.
Is there any step-by-step guidance?

I indexed the Android sources with OpenGrok before.
Is there any guidance like this?
https://github.com/OpenGrok/OpenGrok/blob/master/README.txt

Couldn't install

I ran the command go get github.com/google/codesearch/cmd/... but nothing happens, and I cannot run cindex or csearch. Can anyone help?

I already have Go installed:

 ~ [10:26:26] $ go version
go version go1.9 darwin/amd64

csearch: print statistics of -f fregexp hits if -verbose [PATCH]

csearch already prints the trigrams and number of files matched from the index 
if -verbose mode is on.

This patch preprocesses the list of fileids so we can print how many were 
matched by the filename regex.

It does double the number of calls to ix.Name(fileid), but that call looks 
cheap and any slowdown would be dwarfed by the disk access during the grep 
stage anyway.

Original issue reported on code.google.com by dgryski on 17 Feb 2012 at 1:50

Attachments:

filename-regex-stats.patch

cindex does not follow and index symlinked paths

What steps will reproduce the problem?
1. Create some symlinks to some source code in a directory.
2. Index that directory.
3. Search for symbols which are only in symlinked source files.

What is the expected output? What do you see instead?

I expect to see symlinked files listed by csearch.
Instead, only regular files were indexed.

What version of the product are you using? On what operating system?

0.1 on Mac OS X 10.6

Please provide any additional information below.

Maybe additionally provide an option to restrict or follow symlinks to certain 
filesystems for better control.

Original issue reported on code.google.com by [email protected] on 7 Feb 2012 at 4:00

IndexWriter always writes logs

Would be great to wrap these in if ix.Verbose { ... } and allow for silent use.

In Flush:

  log.Printf("%d data bytes, %d index bytes", ix.totalBytes, ix.main.offset())


In mergePost:

  log.Printf("merge %d files + mem", len(ix.postFile))

Original issue reported on code.google.com by [email protected] on 7 Dec 2012 at 8:08

fix bug in siftUp()

siftUp is wrong if heap size > 3

Original issue reported on code.google.com by [email protected] on 16 Nov 2013 at 3:45

Attachments:

write.go.patch
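
The attached patch is not reproduced on this page. For comparison, a textbook siftUp for a binary min-heap stored in a slice is sketched below; it works on plain ints rather than the project's heap elements, and the push helper is added only for the demonstration.

```go
package main

import "fmt"

// siftUp restores the min-heap property after h[j] was appended: the
// element moves toward the root until its parent is no larger.
func siftUp(h []int, j int) {
	for j > 0 {
		parent := (j - 1) / 2
		if h[parent] <= h[j] {
			break
		}
		h[parent], h[j] = h[j], h[parent]
		j = parent
	}
}

// push appends v and sifts it into place.
func push(h []int, v int) []int {
	h = append(h, v)
	siftUp(h, len(h)-1)
	return h
}

func main() {
	var h []int
	for _, v := range []int{5, 4, 3, 2, 1} {
		h = push(h, v)
	}
	fmt.Println(h[0]) // the smallest element sits at the root
}
```

A loop that compares against the parent only once, or computes the parent index incorrectly, can look right for heaps of up to three elements and fail beyond that, which is why testing with more than three entries matters.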

.csearchindex is world-readable

The file .csearchindex is world-readable by default. This might be a security 
issue because it may allow an attacker to obtain information about files 
they cannot read, such as the names of files in directories that are not 
world-readable, or reconstructed contents of files that are not world-readable.

I could provide a patch, although I believe this is easy to fix.

Original issue reported on code.google.com by [email protected] on 26 Aug 2013 at 9:30

regex->trigram has wrong behaviour when trigrams are question-marked

bash$ csearch -verbose "foo_(bar_)?" >/dev/null
2012/03/05 10:44:37 query: "_ba" "foo" "o_b" "oo_"
2012/03/05 10:44:37 post query identified 11 possible files
bash$ csearch -verbose "foo_b(ar_)?" >/dev/null
2012/03/05 10:44:44 query: "foo" "o_b" "oo_"
2012/03/05 10:44:44 post query identified 12 possible files

In the first example, "_ba" and "o_b" should not be required trigrams for the 
file.  In the second example, "ar_" is correctly _not_ included in the list of 
required trigrams.

I'll take a look at this later tonight.  It looks like an issue with precedence 
in index/regexp.go (i.e., the question mark is applied only to the final 
trigram, and not to all trigrams included in the grouping).

Original issue reported on code.google.com by dgryski on 5 Mar 2012 at 10:07
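
To make the expected result concrete, the sketch below derives required trigrams from the standard library's regexp/syntax parse tree. It is a deliberate simplification and not the algorithm in index/regexp.go: it only looks at literals that every match must contain, ignores trigrams spanning the boundary between two subexpressions, and treats anything under ?, * or | as contributing nothing. The function names are made up for the example.

```go
package main

import (
	"fmt"
	"regexp/syntax"
	"sort"
)

// required adds to out the trigrams of literals that every match of re
// must contain. Optional or alternative subexpressions are skipped.
func required(re *syntax.Regexp, out map[string]bool) {
	switch re.Op {
	case syntax.OpLiteral:
		if re.Flags&syntax.FoldCase != 0 {
			return // case-folded text has no single exact trigram
		}
		for i := 0; i+3 <= len(re.Rune); i++ {
			out[string(re.Rune[i:i+3])] = true
		}
	case syntax.OpConcat, syntax.OpCapture, syntax.OpPlus:
		// Every operand appears at least once in any match.
		for _, sub := range re.Sub {
			required(sub, out)
		}
	}
	// OpQuest, OpStar, OpAlternate and the rest: nothing is required.
}

// requiredTrigrams parses pattern and returns its required trigrams,
// sorted for stable output.
func requiredTrigrams(pattern string) ([]string, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return nil, err
	}
	set := map[string]bool{}
	required(re, set)
	tris := make([]string, 0, len(set))
	for t := range set {
		tris = append(tris, t)
	}
	sort.Strings(tris)
	return tris, nil
}

func main() {
	for _, p := range []string{"foo_(bar_)?", "foo_b(ar_)?"} {
		tris, err := requiredTrigrams(p)
		fmt.Printf("%s -> %q %v\n", p, tris, err)
	}
}
```

Under these assumptions the first pattern requires only "foo" and "oo_", because the whole group (bar_) is optional, while the second requires "foo", "o_b" and "oo_", matching the query printed for it above.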

go install fails to find package

What steps will reproduce the problem?

1. Installed latest version of go binaries.
2. go install code.google.com/p/codesearch/cmd/cindex (or any other component 
library)

This produces:
can't load package: package code.google.com/p/codesearch/cmd/cindex: import 
"code.google.com/p/codesearch/cmd/cindex": cannot find package

After that failed, I simply cloned the repository using Hg, and tried to build 
locally. Not sure exactly how this is supposed to work (how to install all of 
the sources as the local versions of the packages), so whenever I try to build 
any sub-folder's contents (with go build), it produces the same error as above 
when attempting to locate a dependent package.

What version of the product are you using? On what operating system?

Windows, both 32-bit and 64-bit.

Original issue reported on code.google.com by [email protected] on 16 May 2012 at 1:13

Incremental indexing

It would be great if one could tell the indexer to (re-)index a single file and merge that efficiently into the existing index.

A strong plus would be to read the file names - 1 per line - from a (named or regular) pipe, and index those as soon as a new line becomes available.

mmap_windows.go not updated for latest go version

At mmap_windows.go:26, change f.Fd() to syscall.Handle(f.Fd()).

Original issue reported on code.google.com by [email protected] on 2 Apr 2013 at 11:24

csearch/cgrep exit with status 1 regardless of matches found or not

What steps will reproduce the problem?
1. csearch int
2. echo $?
3. csearch SomeStringNotAppearingInYourSourceTree
4. echo $?

What is the expected output? What do you see instead?
In both cases, a '1' is displayed indicating that csearch believes it found no 
matches.  This is because the Grep.Match field is not updated when a match is 
found.

What version of the product are you using? On what operating system?
tip, darwin

Please provide any additional information below.
A one line fix setting g.Match = true in match.go:Grep.Reader():430 should fix 
this.

Original issue reported on code.google.com by dgryski on 29 Jan 2012 at 8:27
