
codesearch's Introduction

Code Search is a tool for indexing and then performing
regular expression searches over large bodies of source code.
It is a set of command-line programs written in Go.

For background and an overview of the commands,
see http://swtch.com/~rsc/regexp/regexp4.html.

To install:

	go get github.com/google/codesearch/cmd/...

Use "go get -u" to update an existing installation.

Russ Cox
[email protected]
June 2015


codesearch's Issues

Pull request: add support for per-file custom annotations

This change introduces the ability to attach annotation blobs to each entry in
the index. We index all our internal and external repositories at Twitter, and
the annotations are used to add data like commit count, number of import
references, etc. to each file. This is then used for scoring at search time to
improve ranking.

There are a couple of other minor changes useful for using the code as a
library, such as exposing a couple of private functions and adding a
callback system to match.go.

Change is here:
http://code.google.com/r/alec-codesearch-annotations/source/detail?r=b965ca9c2a94d464ef541d05f4004368b9c3508b

Original issue reported on code.google.com by [email protected] on 29 Aug 2012 at 10:19

cindex needs vast amounts of temporary space

1. Run cindex on about 9 GiB of files (where about half of them are text files)
2. Observe space usage of cindex

What is the expected output?

cindex takes a reasonable amount of temporary space and provides an option to 
specify an alternative place to store temporary files.

What do you see instead?

cindex eats up to 5 GiB of space on /tmp while generating a 600 MiB 
.csearchindex file.

What version of the product are you using? On what operating system?

tip on Linux with go tip

Please provide any additional information below.

It would be nice if cindex could provide an option to change the temporary 
directory as /tmp might not be large enough. Additionally, it might be possible 
to lower the space usage of cindex, for instance, it might be possible to merge 
(and therefore compress) temporary indices before processing the next set of 
files.

Original issue reported on code.google.com by [email protected] on 23 Sep 2013 at 1:23

index: merge of index fails

I've tried to index the Go sources on Windows. The first run of cindex behaves as expected. The second run doesn't remove .csearchindex~ and creates a second file, .csearchindex~~. I don't know whether the merge of .csearchindex and .csearchindex~ was successful.

The expected behaviour is that .csearchindex and .csearchindex~ are merged and .csearchindex~ is removed.

OS: Windows 7
Go: go1.5rc1

First Run

v% cindex go
2015/08/06 09:36:34 index E:\home\visfj\go
2015/08/06 09:36:34 flush index
2015/08/06 09:36:34 merge 0 files + mem
2015/08/06 09:36:35 39563770 data bytes, 9212960 index bytes
2015/08/06 09:36:35 done
v% ls -al
total 9068
drwxr-xr-x 1 visfj Domänen-Benutzer       0 Aug  6 09:36 .
drwxr-xr-x 1 visfj Domänen-Benutzer       0 Jul 30 13:22 ..
-rw-r--r-- 1 visfj Domänen-Benutzer    6213 Aug  6 08:36 .bash_history
-rw-r--r-- 1 visfj Domänen-Benutzer     422 Aug  5 11:29 .bashrc
-rw-r--r-- 1 visfj Domänen-Benutzer 9212960 Aug  6 09:36 .csearchindex
-rw-r--r-- 1 visfj Domänen-Benutzer     105 Aug  5 14:26 .gitconfig
-rw-r--r-- 1 visfj Domänen-Benutzer     114 Jul 31 13:56 .gitcookies

Second Run

v% cindex
2015/08/06 09:36:45 index E:\home\visfj\go
2015/08/06 09:36:46 flush index
2015/08/06 09:36:46 merge 0 files + mem
2015/08/06 09:36:46 39563770 data bytes, 9212960 index bytes
2015/08/06 09:36:46 merge E:\home\visfj\.csearchindex E:\home\visfj\.csearchindex~
2015/08/06 09:36:46 done
v% ls -al
total 27068
drwxr-xr-x 1 visfj Domänen-Benutzer       0 Aug  6 09:36 .
drwxr-xr-x 1 visfj Domänen-Benutzer       0 Jul 30 13:22 ..
-rw-r--r-- 1 visfj Domänen-Benutzer    6213 Aug  6 08:36 .bash_history
-rw-r--r-- 1 visfj Domänen-Benutzer     422 Aug  5 11:29 .bashrc
-rw-r--r-- 1 visfj Domänen-Benutzer 9212960 Aug  6 09:36 .csearchindex
-rw-r--r-- 1 visfj Domänen-Benutzer 9212960 Aug  6 09:36 .csearchindex~
-rw-r--r-- 1 visfj Domänen-Benutzer 9212944 Aug  6 09:36 .csearchindex~~
-rw-r--r-- 1 visfj Domänen-Benutzer     105 Aug  5 14:26 .gitconfig
-rw-r--r-- 1 visfj Domänen-Benutzer     114 Jul 31 13:56 .gitcookies

Adding more trigrams makes post query worse

bash$ csearch -i -verbose '%%0.*sprintf'
2012/02/29 16:02:11 query: "%%0" 
("SPR"|"SPr"|"SpR"|"Spr"|"sPR"|"sPr"|"spR"|"spr")|("\xbfPR" "ſP")|("\xbfPr" 
"ſP")|("\xbfpR" "ſp")|("\xbfpr" "ſp") 
("PRI"|"PRi"|"PrI"|"Pri"|"pRI"|"pRi"|"prI"|"pri") 
("RIN"|"RIn"|"RiN"|"Rin"|"rIN"|"rIn"|"riN"|"rin") 
("INT"|"INt"|"InT"|"Int"|"iNT"|"iNt"|"inT"|"int") 
("NTF"|"NTf"|"NtF"|"Ntf"|"nTF"|"nTf"|"ntF"|"ntf")
2012/02/29 16:02:11 post query identified 3295 possible files
bash$ csearch -i -verbose '%%0' >/dev/null
2012/02/29 16:02:24 query: "%%0"
2012/02/29 16:02:24 post query identified 9 possible files

Since only 9 files match '%%0', I would expect no more than 9 possible files 
from the post query when adding '.*sprintf' to the query string.  It looks like 
something is ignoring the 'restrict' list somewhere.

I'm still working on a minimal test case.  I'll look into this more tonight or 
so.

Original issue reported on code.google.com by dgryski on 29 Feb 2012 at 3:10

csearch sometimes fails on character classes containing only the same letter in upper and lowercase

What steps will reproduce the problem?
laptop$ cat userids.txt 
dgryski
laptop$ cindex .
2012/01/24 14:48:22 index /[XXXXXX]/
2012/01/24 14:48:22 flush index
2012/01/24 14:48:22 merge 0 files + mem
2012/01/24 14:48:22 8 data bytes, 237 index bytes
2012/01/24 14:48:22 done
laptop$ csearch '[g]r'
/[XXXXX]/userids.txt:dgryski
laptop$ csearch '[Hg]r'
/[XXXXX]/userids.txt:dgryski
laptop$ csearch '[Gg]r'
laptop$ 
laptop$ csearch 'g[Rr]'
/[XXXXX]/userids.txt:dgryski
laptop$ csearch '[Dd]g'
laptop$ csearch '[ZDd]g'
/[XXXXX]/userids.txt:dgryski
laptop$

What is the expected output? What do you see instead?
I expect 'dgryski' to be printed, but instead depending on the regex no lines 
are found.

What version of the product are you using? On what operating system?
070ef10ab799 tip.  Darwin 10.8.0

Please provide any additional information below.

Original issue reported on code.google.com by dgryski on 24 Jan 2012 at 1:59

csearch: Add flag (-g) for grouping output by file (a'la ack --group, or git grep --heading) [PATCH]

The goal is to make the output easier on human eyes. Compare:

$ csearch Aqyadaya
/home/nazri/perl5/perlbrew/perls/perl-5.14.2/man/man3/Text::SimpleTable.3:\&    
$t1\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);
/home/nazri/perl5/perlbrew/perls/perl-5.14.2/man/man3/Text::SimpleTable.3:\&    
$t2\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);
/home/nazri/perl5/perlbrew/perls/perl-5.14.2/man/man3/Text::SimpleTable.3:\&    
$t3\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);

$ csearch -g Aqyadaya
/home/nazri/perl5/perlbrew/perls/perl-5.14.2/man/man3/Text::SimpleTable.3
\&    $t1\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);
\&    $t2\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);
\&    $t3\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);

Original issue reported on code.google.com by [email protected] on 20 Apr 2012 at 1:46

Windows missing conversion

In mmap_windows.go line 26:

h, err := syscall.CreateFileMapping(f.Fd(), nil, ...

This needs a conversion to syscall.Handle:

h, err := syscall.CreateFileMapping(syscall.Handle(f.Fd()), nil,

Original issue reported on code.google.com by [email protected] on 6 Nov 2013 at 5:17

Second and newer indexing loses paths if one directory basename is a prefix

Easier to demonstrate.

% cat  Makefile
.PHONY: all test clean

BAD1 = a
BAD2 = a-b

GOOD1 = a-x
GOOD2 = a-y

.PHONY: template good bad

template:
    rm -rf test
    mkdir -p test/$(VAR1) test/$(VAR2)
    ls > test/$(VAR1)/a
    ls > test/$(VAR2)/a
    rm -f test/.csearchindex
    /usr/bin/env CSEARCHINDEX=test/.csearchindex $(GOPATH)/bin/cindex test/$(VAR1) test/$(VAR2)
    /usr/bin/env CSEARCHINDEX=test/.csearchindex $(GOPATH)/bin/cindex -list
    # Just index them again to reproduce the bug
    /usr/bin/env CSEARCHINDEX=test/.csearchindex $(GOPATH)/bin/cindex -verbose
    /usr/bin/env CSEARCHINDEX=test/.csearchindex $(GOPATH)/bin/cindex -list
    test `/usr/bin/env CSEARCHINDEX=test/.csearchindex $(GOPATH)/bin/cindex -list | wc -l` = 2

bad:
    VAR1=$(BAD1) VAR2=$(BAD2) $(MAKE) template

good:
    VAR1=$(GOOD1) VAR2=$(GOOD2) $(MAKE) template

No bug when the test is run with two directories whose basenames are not prefixes of one another:

% make good
VAR1=a-x VAR2=a-y /Applications/Xcode.app/Contents/Developer/usr/bin/make template
rm -rf test
mkdir -p test/a-x test/a-y
ls > test/a-x/a
ls > test/a-y/a
rm -f test/.csearchindex
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex test/a-x test/a-y
2016/02/08 17:37:14 index /Users/xxxx/work/codesearch/test/a-x
2016/02/08 17:37:14 index /Users/xxxx/work/codesearch/test/a-y
2016/02/08 17:37:14 flush index
2016/02/08 17:37:14 merge 0 files + mem
2016/02/08 17:37:14 166 data bytes, 1596 index bytes
2016/02/08 17:37:14 done
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list
/Users/xxxx/work/codesearch/test/a-x
/Users/xxxx/work/codesearch/test/a-y
# Just index them again to reproduce the bug
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -verbose
2016/02/08 17:37:14 index /Users/xxxx/work/codesearch/test/a-x
2016/02/08 17:37:14 83 79 /Users/xxxx/work/codesearch/test/a-x/a
2016/02/08 17:37:14 index /Users/xxxx/work/codesearch/test/a-y
2016/02/08 17:37:14 83 79 /Users/xxxx/work/codesearch/test/a-y/a
2016/02/08 17:37:14 flush index
2016/02/08 17:37:14 merge 0 files + mem
2016/02/08 17:37:14 166 data bytes, 1596 index bytes
2016/02/08 17:37:14 merge test/.csearchindex test/.csearchindex~
2016/02/08 17:37:14 done
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list
/Users/xxxx/work/codesearch/test/a-x
/Users/xxxx/work/codesearch/test/a-y
test `/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list | wc -l` = 2

The bug appears when the test is run with two directories where one basename is a prefix of the other:

% make bad
VAR1=a VAR2=a-b /Applications/Xcode.app/Contents/Developer/usr/bin/make template
rm -rf test
mkdir -p test/a test/a-b
ls > test/a/a
ls > test/a-b/a
rm -f test/.csearchindex
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex test/a test/a-b
2016/02/08 17:37:11 index /Users/xxxx/work/codesearch/test/a
2016/02/08 17:37:11 index /Users/xxxx/work/codesearch/test/a-b
2016/02/08 17:37:11 flush index
2016/02/08 17:37:11 merge 0 files + mem
2016/02/08 17:37:11 166 data bytes, 1592 index bytes
2016/02/08 17:37:11 done
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list
/Users/xxxx/work/codesearch/test/a
/Users/xxxx/work/codesearch/test/a-b
# Just index them again to reproduce the bug
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -verbose
2016/02/08 17:37:11 index /Users/xxxx/work/codesearch/test/a
2016/02/08 17:37:11 83 79 /Users/xxxx/work/codesearch/test/a/a
2016/02/08 17:37:11 index /Users/xxxx/work/codesearch/test/a-b
2016/02/08 17:37:11 83 79 /Users/xxxx/work/codesearch/test/a-b/a
2016/02/08 17:37:11 flush index
2016/02/08 17:37:11 merge 0 files + mem
2016/02/08 17:37:11 166 data bytes, 1592 index bytes
2016/02/08 17:37:11 merge test/.csearchindex test/.csearchindex~
2016/02/08 17:37:11 done
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list
/Users/xxxx/work/codesearch/test/a
test `/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list | wc -l` = 2
make[1]: *** [template] Error 1
make: *** [bad] Error 2

Option to remove a path from index

Currently there is no straightforward way to remove one directory from the 
index (cindex -list). One needs to provide the complete new list with cindex 
-reset to remove one entry. Please provide an option to remove one (or more) of 
the entries from the path that are currently indexed.


Original issue reported on code.google.com by [email protected] on 17 May 2012 at 9:09

Minor adjustment of Csearch page.

Not really an issue, but a request.

I see that "Emacs editor integration" is prominently mentioned on the homepage
of codesearch. In that case, I would recommend adding

"Vim editor integration"  too, link: 
https://github.com/junkblocker/unite-codesearch

Thanks in advance.

Original issue reported on code.google.com by [email protected] on 30 Oct 2014 at 6:07

Where are the binaries?

I've looked in here and although the top-level readme references binaries I can't find anything that isn't a .go file.

"go get" fails to get

$ go get code.google.com/p/codesearch/cmd/...
package code.google.com/p/codesearch/cmd/...: unable to detect version control system for code.google.com/ path

Oh well, let's try github:

$ go get github.com/google/codesearch/...
package github.com/google/codesearch/...
imports github.com/google/codesearch/cmd/cgrep
imports code.google.com/p/codesearch/regexp: unable to detect version control system for code.google.com/ path
package github.com/google/codesearch/...
imports github.com/google/codesearch/cmd/cindex
imports code.google.com/p/codesearch/index: unable to detect version control system for code.google.com/ path
package github.com/google/codesearch/...
imports github.com/google/codesearch/index
imports code.google.com/p/codesearch/sparse: unable to detect version control system for code.google.com/ path

prefix/suffix lists and question-marks don't play nicely together


One of the areas I got stuck on when debugging the trigram-question-mark issue, 
but might actually be a fundamental design limitation / feature, is that moving 
to prefix/suffix lists can cause the list of trigrams to drop considerably.

bash$ ./csearch -verbose 'foo_(bar)?zot' >/dev/null
2012/03/07 22:18:48 query: "foo" "oo_" "zot" ("_zo" "o_z")|("arz" "rzo")
2012/03/07 22:18:48 post query identified 0 possible files
bash$ ./csearch -verbose 'foo_(bar_)?zot' >/dev/null
2012/03/07 22:18:53 query: "foo" "oo_" "zot"
2012/03/07 22:18:53 post query identified 0 possible files

In the first case, "bar" is only three characters and stays as an exact trigram 
and is used to construct the arz/rzo entries.  When it becomes a prefix/suffix 
list (when it hits 4 characters by adding the underscore),  it no longer 
provides us with any trigram info because the empty string empties out the 
prefix and suffix lists as being "redundant" with the empty string.  ("" is a 
prefix of "ba").

I'm not sure if this is a bug or not.  I.e, _should_ we be able to transform 
prefix/suffix lists into AND/OR sets of trigrams in this case?

Original issue reported on code.google.com by dgryski on 7 Mar 2012 at 10:08

"csearch -n" output is 0-based, not 1-based

Maybe this works as designed, but it caught me off guard. The line numbers
reported by "csearch -n" seem to start at 0 instead of 1. This is at odds with
e.g. grep, to which csearch compares itself in its help output. It makes
"csearch -n" somewhat awkward to use in emacs (and presumably other editors),
since emacs numbers lines starting at 1.

Original issue reported on code.google.com by austin.bingham on 11 Oct 2012 at 10:41

  • Merged into: #4

Files containing ascii8 are not indexed (feature/request)

What steps will reproduce the problem?
1. File containing: PAT and ä (0xE4, a umlaut), ö (0xF6,o umlaut), other high 
bit ascii chars
2. cindex -reset .
3. csearch -i PAT

What is the expected output? What do you see instead?
Expected to find PAT. Instead no match.

What version of the product are you using? On what operating system?
codesearch-0.01-windows-amd64.zip

Please provide any additional information below.
I looked at the source (write.go) and it seems to expect that files are in 
UTF-8 only (this is a Go specification?). However it would be nice if csearch 
could be used with any source files, including those with high bit ascii 
characters. Or that there would be a command line option for this.

Original issue reported on code.google.com by [email protected] on 21 Nov 2012 at 7:24

Index update fails on Windows

What steps will reproduce the problem?
1. cindex any/path
2. cindex

What is the expected output? 
The index is updated.

What do you see instead?
After merging the master index with the updated index, the rename of
.csearchindex~~ to .csearchindex fails. This is possibly due to open file
handles.

What version of the product are you using? On what operating system?
Windows 7, 64 bit

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 23 Jan 2012 at 10:55

Some log.Fatal() calls should be log.Fatalf() [PATCH]

I ran out of disk space on a server and the error message from cindex ("short 
write writing %s") indicated it should have been a log.Fatalf() call instead of 
log.Fatal().

I ran go vet over the source tree and found two more instances of Fatal() calls 
with format specifiers.

The patch is attached.

Original issue reported on code.google.com by dgryski on 20 Feb 2012 at 2:15

Error in read.go comments

http://code.google.com/p/codesearch/source/browse/index/read.go#36

The comment reads : "For example, the delta list [2,5,1,1,0] encodes the file 
ID list 1, 6, 7, 8."

Am I misunderstanding something, or should the delta list read : [1,5,1,1,0] to 
match up to the given file ID list?

Original issue reported on code.google.com by [email protected] on 26 Apr 2012 at 7:10

cindex indexes vcs directories: .hg, .svn, .git, .bzr, ...

What steps will reproduce the problem?
1. Run cindex on a directory containing a checkout
2. Run csearch on a text string known to be in the source code

What is the expected output? What do you see instead?
Expected output is source code matches.  Instead, I see source code matches 
_and_ .svn/.../foo.svn-base matches, or .hg/blah matches, or .git/blah matches.

What version of the product are you using? On what operating system?
f9a003c603e3 tip
Darwin 10.8.0

Please provide any additional information below.
A simple patch to the filewalker function can skip directories if they are 
known to be "special" and not likely to contain items we want to index.  I've 
included a small patch to cindex.go which does just this.

This functionality could also be disabled unless -ignore-vcs or something is 
passed.

Ack (betterthangrep.com) does similar filtering on directories and files: 
#foo.c# from emacs, foo.c~ from vim, etc.  Ignoring those would also be easy to 
add.

Original issue reported on code.google.com by dgryski on 19 Jan 2012 at 9:00

cgrep blows up when given no arguments

What steps will reproduce the problem?
1. run cgrep

What is the expected output? What do you see instead?
There is a custom usage message which is not displayed.  Instead we get the 
default usage message and a panic.

laptop$ cgrep
Usage of cgrep:
  -c=false: print match counts only
  -cpuprofile="": write cpu profile to this file
  -h=false: omit file names
  -i=false: case-insensitive match
  -l=false: list matching files only
  -n=false: show line numbers
panic: runtime error: index out of range

goroutine 1 [running]:
main.main()
    /[XXXXX]/src/code.google.com/p/codesearch/cmd/cgrep/cgrep.go:58 +0x292

What version of the product are you using? On what operating system?
070ef10ab799 tip. Darwin 10.8.0

Please provide any additional information below.
This is the default flag.Usage() message instead of the custom usage() one 
which exits and would prevent the index-out-of-range error.

The fix is a one-line patch in main():
   flag.Usage = usage

Alternately, we could call our usage() function instead of flag.Usage() in the 
if block.

Original issue reported on code.google.com by dgryski on 27 Jan 2012 at 9:20

Support grep's "-w" flag [PATCH]

It's nice to be able to search for 'whole words', which is to say making sure 
the pattern has a word boundary at the beginning and end.

Having to type '\bpattern\b' is annoying, and grep/ack already have a -w flag 
which does this for you.
Since we already have -i which adjusts the pattern, I think adding another one
isn't too terrible.

I've attached a patch that does this.

Original issue reported on code.google.com by dgryski on 17 Feb 2012 at 1:13

cindex fails with mmap errors on OpenBSD

Trying to run codesearch on OpenBSD -current amd64 with go 1.0.3 from packages, 
using current codesearch code fetched with 'go get', I'm seeing this:

$ go run src/code.google.com/p/codesearch/cmd/cindex/cindex.go /usr/src
2013/03/04 10:52:52 index /usr/src
2013/03/04 10:56:11 flush index
2013/03/04 10:56:12 merge 13 files + mem
2013/03/04 10:56:12 mmap /tmp/csearch-index060279584: errno 4967816
exit status 1

If I run tests I get this,

=== RUN TestMerge
2013/03/04 11:11:32 merge 0 files + mem
2013/03/04 11:11:32 99 data bytes, 1217 index bytes
2013/03/04 11:11:33 merge 0 files + mem
2013/03/04 11:11:33 87 data bytes, 1217 index bytes
2013/03/04 11:11:33 mmap /tmp/index-test259895843: errno 5172144
exit status 1
FAIL    code.google.com/p/codesearch/index      0.411s

The errno seems strange, any suggestions what might be happening? OpenBSD does 
not have UBC so I'm wondering if it may require msync (not sure if cindex is 
mixing access via standard file operations and mmap, but if so, it will need 
msyncs between the different types of access).

Original issue reported on code.google.com by [email protected] on 4 Mar 2013 at 11:19

Exclude subdirectories

I would like to add exclusion of subdirectories to cindex. For example, indexing chromium /src takes about 50 seconds, but only 20 seconds if I exclude /src/out and /src/third_party.
Right now it works like cindex --exclude /src/out --exclude /src/tmp /src, but I have to pass the excludes again whenever I want to re-index. As I understand it, the way to integrate this is to modify the index format to store the list of excluded directories. Is it OK to introduce such a backward-incompatible change?

Corrupt output for files without a final newline

$ echo -n aaa > a
$ echo bbb > b
$ cindex .
2015/07/14 18:49:45 index /tmp/t
2015/07/14 18:49:45 flush index
2015/07/14 18:49:45 merge 0 files + mem
2015/07/14 18:49:45 7 data bytes, 154 index bytes
2015/07/14 18:49:45 merge /Users/jacekm/.csearchindex /Users/jacekm/.csearchindex~
2015/07/14 18:49:45 done
$ csearch -n .
/tmp/t/a:1:aaa/tmp/t/b:1:bbb
$

cindex ignores IRC log files

https://code.google.com/p/codesearch/source/browse/cmd/cindex/cindex.go#132 has 
the relevant line of code. Effectively when I run cindex on my IRC log 
directory, I get an index of *only* the private conversations. Public 
conversations are in files named after the channel, so for example: 
`2014/#ubuntu.03.log`.

cindex decides to ignore dotfiles, tildefiles, and for some reason probably 
relating to EMACS, files beginning with an octothorpe ('#'). There does not 
seem to be any command-line switch to disable this file-skipping (-a would seem 
appropriate), and no way to provide custom include/exclude rules.

Since searching IRC logs is probably more common for me than searching code, it 
makes this amazing tool rather less useful for me!

Original issue reported on code.google.com by [email protected] on 27 Mar 2014 at 5:03

Cannot index / search one file

What steps will reproduce the problem?
1. Index the attached file with cindex
2. Search for a pattern inside it
3. No hits

What is the expected output? What do you see instead?

repro$
repro$ cindex -reset
repro$ cindex badfile
2013/03/13 18:51:52 index /tmp/repro/badfile
2013/03/13 18:51:52 flush index
2013/03/13 18:51:52 merge 0 files + mem
2013/03/13 18:51:52 0 data bytes, 92 index bytes
2013/03/13 18:51:52 done
repro$ cindex -list
/tmp/repro/badfile
repro$
repro$
repro$
repro$ grep main badfile
libc.so.6        __libc_start_main
torch             main
torch              realmain(int, char**)
libglib-2.0....          g_main_context_iteration
libglib-2.0....           g_main_context_prepare
libglib-2.0....            g_main_context_dispatch
#85 0x00000032f5c38f0e in g_main_context_dispatch () from 
/lib64/libglib-2.0.so.0
#87 0x00000032f5c3ca3a in g_main_context_iteration () from 
/lib64/libglib-2.0.so.0
#93 0x000000000040e74d in realmain(int, char**) ()
#94 0x000000000040e933 in main ()
repro$
repro$ csearch main  <= no results here !!
repro$ 
repro$ grep threads badfile 
============ All threads ==========
============ All threads ==========
repro$
repro$ csearch threads <= no results here !!
repro$ 

I cannot find (with csearch) text that is in a file I have indexed (cindex)

What version of the product are you using? On what operating system?

I'm using the Linux binaries that are available on the Download page.
I tried to compile go / codesearch but couldn't make it work (my go
install might be funky).

Please provide any additional information below.

It looks like the problem happens at indexing time.

Original issue reported on code.google.com by [email protected] on 14 Mar 2013 at 1:57

cindex fails on the first run: mmap() returns EINVAL

What steps will reproduce the problem?
1. build all utils (or download binaries from this site)
2. be sure that there is no .csearchindex (or it has zero size)
3. run `cindex /usr/include` or any other path

`cindex` fails, because mmap returns error with EINVAL code.
This happens inside mmap_linux.go, on line 24.
That function (mmapFile) calls mmap with zero size, which fails.
I had to manually copy ~/.csearchindex\~ to ~/.csearchindex.
That fixed the issue.

I expected cindex to work right out of the box, even with no .csearchindex.

I'm using:
Gentoo Base System release 2.0.3, Linux 3.1.6, amd64.
Go repository is the newest (built a few minutes ago from head).

Original issue reported on code.google.com by [email protected] on 19 Jan 2012 at 8:10

Home path is missing the drive letter on Windows

What steps will reproduce the problem?
1. cindex any/path (without CSEARCHINDEX set)

What is the expected output? 
Using .csearchindex in home dir. 

What do you see instead?
Failing due to the home path missing the drive letter. In index/read.go, the 
line in File() getting the home dir should probably read 'home = 
os.Getenv("HOMEDRIVE") + os.Getenv("HOMEPATH")'

What version of the product are you using? On what operating system?
Windows 7, 64 bit

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 23 Jan 2012 at 10:39
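
A small sketch of the suggested lookup. The homeDir helper and the injected getenv function are assumptions made so the logic can be tried on any platform; the real File() function may be structured differently.

```go
package main

import (
	"fmt"
	"os"
)

// homeDir prefers HOME and otherwise joins HOMEDRIVE and HOMEPATH, so the
// result keeps its drive letter on Windows.
// getenv is injected so the function can be exercised with fake values.
func homeDir(getenv func(string) string) string {
	if home := getenv("HOME"); home != "" {
		return home
	}
	return getenv("HOMEDRIVE") + getenv("HOMEPATH")
}

func main() {
	fmt.Println(homeDir(os.Getenv))
}
```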

build failed on tip of golang

On tip 59fdb5d05e04 of golang, the build of pkg index failed because of an argument
type mismatch.
# code.google.com/p/codesearch/index
src/code.google.com/p/codesearch/index/mmap_bsd.go:34: cannot use f.Fd() (type 
uintptr) as type int in function argument

Original issue reported on code.google.com by [email protected] on 13 Feb 2012 at 8:30

cindex should skip revision control directories

When pointed to a source code tree containing revision control meta-data (e.g. 
.svn, CVS, or similar system with meta-data in each subdir), the metadata files 
get indexed as well, leading to undesired output from csearch.


What steps will reproduce the problem?
1. checkout some svn code
2. cindex that code
3. csearch some-token

What is the expected output? What do you see instead?
Only the actual code files are hits, instead I also see "xxx.svn-base" files.

What version of the product are you using? On what operating system?
hg 070ef10ab799

Please provide any additional information below.
Thanks for a great tool!

Original issue reported on code.google.com by [email protected] on 20 Jan 2012 at 2:04

  • Merged into: #2

Line missing in read.go

Queries of the type [Hh]ashTable fail to yield results, where hashTable and 
HashTable each gets results.
It seems like a line is clearly missing in read.go in method:
  func (r *postReader) init(ix, trigram, restrict)
so that the restrict parameter is ignored.

The attached patch adds a test that shows the bug and fixes the problem.


Original issue reported on code.google.com by [email protected] on 15 Apr 2012 at 3:59

  • Merged into: #8

Couldn't install

I ran the command go get github.com/google/codesearch/cmd/... but nothing happens, and I cannot run cindex or csearch. Can anyone help?

I already have Go installed:

 ~ [10:26:26] $ go version
go version go1.9 darwin/amd64

csearch: print statistics of -f fregexp hits if -verbose [PATCH]

csearch already prints the trigrams and number of files matched from the index 
if -verbose mode is on.

This patch preprocesses the list of fileids so we can print how many were 
matched by the filename regex.

It does double the number of calls to ix.Name(fileid), but that call looks 
cheap and any slowdown would be dwarfed by the disk access during the grep 
stage anyway.

Original issue reported on code.google.com by dgryski on 17 Feb 2012 at 1:50

cindex does not follow and index symlinked paths

What steps will reproduce the problem?
1. Create some symlinks to some source code in a directory.
2. Index that directory.
3. Search for symbols which are only in symlinked source files.

What is the expected output? What do you see instead?

I expect to see symlinked files listed by csearch.
csearch only indexed regular files.

What version of the product are you using? On what operating system?

0.1 on Mac OS X 10.6

Please provide any additional information below.

Maybe additionally provide an option to restrict or follow symlinks to certain 
filesystems for better control.

Original issue reported on code.google.com by [email protected] on 7 Feb 2012 at 4:00

IndexWriter always writes logs

Would be great to wrap these in if ix.Verbose { ... } and allow for silent use.

In Flush:

  log.Printf("%d data bytes, %d index bytes", ix.totalBytes, ix.main.offset())


In mergePost:

  log.Printf("merge %d files + mem", len(ix.postFile))

Original issue reported on code.google.com by [email protected] on 7 Dec 2012 at 8:08

.csearchindex is world-readable

The file .csearchindex is world-readable by default. This might be a security 
issue because it may allow an attacker to learn about files they cannot read, 
such as the names of files in directories that are not world-readable, or 
reconstructed contents of files that are not world-readable.

I could provide a patch, although I believe this is easy to fix.

Original issue reported on code.google.com by [email protected] on 26 Aug 2013 at 9:30

regex->trigram has wrong behaviour when trigrams are question-marked

bash$ csearch -verbose "foo_(bar_)?" >/dev/null
2012/03/05 10:44:37 query: "_ba" "foo" "o_b" "oo_"
2012/03/05 10:44:37 post query identified 11 possible files
bash$ csearch -verbose "foo_b(ar_)?" >/dev/null
2012/03/05 10:44:44 query: "foo" "o_b" "oo_"
2012/03/05 10:44:44 post query identified 12 possible files

In the first example, "_ba" and "o_b" should not be required trigrams for the 
file.  In the second example, "ar_" is correctly _not_ included in the list of 
required trigrams.

I'll take a look at this later tonight.  It looks like an issue with precedence 
in index/regexp.go (i.e., the question mark is only applying to the final 
trigram, and not to all trigrams included in the grouping).

Original issue reported on code.google.com by dgryski on 5 Mar 2012 at 10:07

go install fails to find package

What steps will reproduce the problem?

1. Installed latest version of go binaries.
2. go install code.google.com/p/codesearch/cmd/cindex (or any other component 
library)

This produces:
can't load package: package code.google.com/p/codesearch/cmd/cindex: import 
"code.google.com/p/codesearch/cmd/cindex": cannot find package

After that failed, I cloned the repository using Hg and tried to build 
locally. I am not sure how this is supposed to work (how to install all of 
the sources as the local versions of the packages), so whenever I try to build 
the contents of any sub-folder (with go build), I get the same error as above 
when it attempts to locate a dependent package.

What version of the product are you using? On what operating system?

Windows, both 32-bit and 64-bit.

Original issue reported on code.google.com by [email protected] on 16 May 2012 at 1:13

Incremental indexing

It would be great if one could tell the indexer to (re-)index a single file and merge that efficiently into the existing index.

A strong plus would be to read the file names - 1 per line - from a (named or regular) pipe, and index those as soon as a new line becomes available.

csearch/cgrep exit with status 1 regardless of matches found or not

What steps will reproduce the problem?
1. csearch int
2. echo $?
3. csearch SomeStringNotAppearingInYourSourceTree
4. echo $?

What is the expected output? What do you see instead?
In both cases, a '1' is displayed indicating that csearch believes it found no 
matches.  This is because the Grep.Match field is not updated when a match is 
found.

What version of the product are you using? On what operating system?
tip, darwin

Please provide any additional information below.
A one line fix setting g.Match = true in match.go:Grep.Reader():430 should fix 
this.

Original issue reported on code.google.com by dgryski on 29 Jan 2012 at 8:27
