google / codesearch
Fast, indexed regexp search over large file trees
Home Page: http://swtch.com/~rsc/regexp/regexp4.html
License: BSD 3-Clause "New" or "Revised" License
Code Search is a tool for indexing and then performing regular expression searches over large bodies of source code. It is a set of command-line programs written in Go.
For background and an overview of the commands, see http://swtch.com/~rsc/regexp/regexp4.html.
To install:
go get github.com/google/codesearch/cmd/...
Use "go get -u" to update an existing installation.
Russ Cox [email protected] June 2015
This change introduces the ability to attach annotation blobs to each entry in
the index. We index all our internal and external repositories at Twitter, and
the annotations are used to add data like commit count, number of import
references, etc. to each file. This is then used for scoring at search time to
improve ranking.
There are a couple of other minor changes useful for using the code as a
library, such as exposing a couple of private functions and adding a
callback system to match.go.
Change is here:
http://code.google.com/r/alec-codesearch-annotations/source/detail?r=b965ca9c2a94d464ef541d05f4004368b9c3508b
Original issue reported on code.google.com by [email protected]
on 29 Aug 2012 at 10:19
1. Run cindex on about 9 GiB of files (where about half of them are text files)
2. Observe space usage of cindex
What is the expected output?
cindex takes a reasonable amount of temporary space and provides an option to
specify an alternative place to store temporary files.
What do you see instead?
cindex eats up to 5 GiB of space on /tmp while generating a 600 MiB
.csearchindex file.
What version of the product are you using? On what operating system?
tip on Linux with go tip
Please provide any additional information below.
It would be nice if cindex could provide an option to change the temporary
directory as /tmp might not be large enough. Additionally, it might be possible
to lower the space usage of cindex, for instance, it might be possible to merge
(and therefore compress) temporary indices before processing the next set of
files.
Original issue reported on code.google.com by [email protected]
on 23 Sep 2013 at 1:23
I've tried to index the Go sources on Windows. The first run of cindex behaves as expected. The second run doesn't remove .csearchindex~ and creates a second file, .csearchindex~~. I don't know whether the merge of .csearchindex and .csearchindex~ was successful.
The expected behaviour is that .csearchindex and .csearchindex~ are merged and .csearchindex~ is removed.
OS: Windows 7
Go: go1.5rc1
v% cindex go
2015/08/06 09:36:34 index E:\home\visfj\go
2015/08/06 09:36:34 flush index
2015/08/06 09:36:34 merge 0 files + mem
2015/08/06 09:36:35 39563770 data bytes, 9212960 index bytes
2015/08/06 09:36:35 done
v% ls -al
total 9068
drwxr-xr-x 1 visfj Domänen-Benutzer 0 Aug 6 09:36 .
drwxr-xr-x 1 visfj Domänen-Benutzer 0 Jul 30 13:22 ..
-rw-r--r-- 1 visfj Domänen-Benutzer 6213 Aug 6 08:36 .bash_history
-rw-r--r-- 1 visfj Domänen-Benutzer 422 Aug 5 11:29 .bashrc
-rw-r--r-- 1 visfj Domänen-Benutzer 9212960 Aug 6 09:36 .csearchindex
-rw-r--r-- 1 visfj Domänen-Benutzer 105 Aug 5 14:26 .gitconfig
-rw-r--r-- 1 visfj Domänen-Benutzer 114 Jul 31 13:56 .gitcookies
v% cindex
2015/08/06 09:36:45 index E:\home\visfj\go
2015/08/06 09:36:46 flush index
2015/08/06 09:36:46 merge 0 files + mem
2015/08/06 09:36:46 39563770 data bytes, 9212960 index bytes
2015/08/06 09:36:46 merge E:\home\visfj\.csearchindex E:\home\visfj\.csearchindex~
2015/08/06 09:36:46 done
v% ls -al
total 27068
drwxr-xr-x 1 visfj Domänen-Benutzer 0 Aug 6 09:36 .
drwxr-xr-x 1 visfj Domänen-Benutzer 0 Jul 30 13:22 ..
-rw-r--r-- 1 visfj Domänen-Benutzer 6213 Aug 6 08:36 .bash_history
-rw-r--r-- 1 visfj Domänen-Benutzer 422 Aug 5 11:29 .bashrc
-rw-r--r-- 1 visfj Domänen-Benutzer 9212960 Aug 6 09:36 .csearchindex
-rw-r--r-- 1 visfj Domänen-Benutzer 9212960 Aug 6 09:36 .csearchindex~
-rw-r--r-- 1 visfj Domänen-Benutzer 9212944 Aug 6 09:36 .csearchindex~~
-rw-r--r-- 1 visfj Domänen-Benutzer 105 Aug 5 14:26 .gitconfig
-rw-r--r-- 1 visfj Domänen-Benutzer 114 Jul 31 13:56 .gitcookies
bash$ csearch -i -verbose '%%0.*sprintf'
2012/02/29 16:02:11 query: "%%0"
("SPR"|"SPr"|"SpR"|"Spr"|"sPR"|"sPr"|"spR"|"spr")|("\xbfPR" "ſP")|("\xbfPr"
"ſP")|("\xbfpR" "ſp")|("\xbfpr" "ſp")
("PRI"|"PRi"|"PrI"|"Pri"|"pRI"|"pRi"|"prI"|"pri")
("RIN"|"RIn"|"RiN"|"Rin"|"rIN"|"rIn"|"riN"|"rin")
("INT"|"INt"|"InT"|"Int"|"iNT"|"iNt"|"inT"|"int")
("NTF"|"NTf"|"NtF"|"Ntf"|"nTF"|"nTf"|"ntF"|"ntf")
2012/02/29 16:02:11 post query identified 3295 possible files
bash$ csearch -i -verbose '%%0' >/dev/null
2012/02/29 16:02:24 query: "%%0"
2012/02/29 16:02:24 post query identified 9 possible files
Since only 9 files match '%%0', I would expect no more than 9 possible files
from the post query when adding '.*sprintf' to the query string. It looks like
something is ignoring the 'restrict' list somewhere.
I'm still working on a minimal test case. I'll look into this more tonight or
so.
Original issue reported on code.google.com by dgryski
on 29 Feb 2012 at 3:10
Please provide a way to exclude a directory from indexing.
What steps will reproduce the problem?
laptop$ cat userids.txt
dgryski
laptop$ cindex .
2012/01/24 14:48:22 index /[XXXXXX]/
2012/01/24 14:48:22 flush index
2012/01/24 14:48:22 merge 0 files + mem
2012/01/24 14:48:22 8 data bytes, 237 index bytes
2012/01/24 14:48:22 done
laptop$ csearch '[g]r'
/[XXXXX]/userids.txt:dgryski
laptop$ csearch '[Hg]r'
/[XXXXX]/userids.txt:dgryski
laptop$ csearch '[Gg]r'
laptop$
laptop$ csearch 'g[Rr]'
/[XXXXX]/userids.txt:dgryski
laptop$ csearch '[Dd]g'
laptop$ csearch '[ZDd]g'
/[XXXXX]/userids.txt:dgryski
laptop$
What is the expected output? What do you see instead?
I expect 'dgryski' to be printed, but instead depending on the regex no lines
are found.
What version of the product are you using? On what operating system?
070ef10ab799 tip. Darwin 10.8.0
Please provide any additional information below.
Original issue reported on code.google.com by dgryski
on 24 Jan 2012 at 1:59
The goal is to make the output more palatable to human eyes. Compare:
$ csearch Aqyadaya
/home/nazri/perl5/perlbrew/perls/perl-5.14.2/man/man3/Text::SimpleTable.3:\&
$t1\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);
/home/nazri/perl5/perlbrew/perls/perl-5.14.2/man/man3/Text::SimpleTable.3:\&
$t2\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);
/home/nazri/perl5/perlbrew/perls/perl-5.14.2/man/man3/Text::SimpleTable.3:\&
$t3\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);
$ csearch -g Aqyadaya
/home/nazri/perl5/perlbrew/perls/perl-5.14.2/man/man3/Text::SimpleTable.3
\& $t1\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);
\& $t2\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);
\& $t3\->row(\*(Aqfoobarbaz\*(Aq, \*(Aqyadayadayada\*(Aq);
Original issue reported on code.google.com by [email protected]
on 20 Apr 2012 at 1:46
Attachments:
Add TestHeap()
Original issue reported on code.google.com by [email protected]
on 26 Nov 2013 at 4:42
Attachments:
In mmap_windows.go line 26:
h, err := syscall.CreateFileMapping(f.Fd(), nil, ...
The first argument needs to be converted to syscall.Handle:
h, err := syscall.CreateFileMapping(syscall.Handle(f.Fd()), nil,
Original issue reported on code.google.com by [email protected]
on 6 Nov 2013 at 5:17
It would be awesome to be able to search my IRC log files, but I compress them to save space. Would it be possible to add automatic decompression of gzip, bzip2 and xz files to the toolchain?
Easier to demonstrate.
% cat Makefile
.PHONY: all test clean
BAD1 = a
BAD2 = a-b
GOOD1 = a-x
GOOD2 = a-y
.PHONY: template good bad
template:
rm -rf test
mkdir -p test/$(VAR1) test/$(VAR2)
ls > test/$(VAR1)/a
ls > test/$(VAR2)/a
rm -f test/.csearchindex
/usr/bin/env CSEARCHINDEX=test/.csearchindex $(GOPATH)/bin/cindex test/$(VAR1) test/$(VAR2)
/usr/bin/env CSEARCHINDEX=test/.csearchindex $(GOPATH)/bin/cindex -list
# Just index them again to reproduce the bug
/usr/bin/env CSEARCHINDEX=test/.csearchindex $(GOPATH)/bin/cindex -verbose
/usr/bin/env CSEARCHINDEX=test/.csearchindex $(GOPATH)/bin/cindex -list
test `/usr/bin/env CSEARCHINDEX=test/.csearchindex $(GOPATH)/bin/cindex -list | wc -l` = 2
bad:
VAR1=$(BAD1) VAR2=$(BAD2) $(MAKE) template
good:
VAR1=$(GOOD1) VAR2=$(GOOD2) $(MAKE) template
No bug when the test is run with two directories where neither basename is a prefix of the other:
% make good
VAR1=a-x VAR2=a-y /Applications/Xcode.app/Contents/Developer/usr/bin/make template
rm -rf test
mkdir -p test/a-x test/a-y
ls > test/a-x/a
ls > test/a-y/a
rm -f test/.csearchindex
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex test/a-x test/a-y
2016/02/08 17:37:14 index /Users/xxxx/work/codesearch/test/a-x
2016/02/08 17:37:14 index /Users/xxxx/work/codesearch/test/a-y
2016/02/08 17:37:14 flush index
2016/02/08 17:37:14 merge 0 files + mem
2016/02/08 17:37:14 166 data bytes, 1596 index bytes
2016/02/08 17:37:14 done
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list
/Users/xxxx/work/codesearch/test/a-x
/Users/xxxx/work/codesearch/test/a-y
# Just index them again to reproduce the bug
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -verbose
2016/02/08 17:37:14 index /Users/xxxx/work/codesearch/test/a-x
2016/02/08 17:37:14 83 79 /Users/xxxx/work/codesearch/test/a-x/a
2016/02/08 17:37:14 index /Users/xxxx/work/codesearch/test/a-y
2016/02/08 17:37:14 83 79 /Users/xxxx/work/codesearch/test/a-y/a
2016/02/08 17:37:14 flush index
2016/02/08 17:37:14 merge 0 files + mem
2016/02/08 17:37:14 166 data bytes, 1596 index bytes
2016/02/08 17:37:14 merge test/.csearchindex test/.csearchindex~
2016/02/08 17:37:14 done
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list
/Users/xxxx/work/codesearch/test/a-x
/Users/xxxx/work/codesearch/test/a-y
test `/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list | wc -l` = 2
Bug seen when the test is run with two directories where one basename is a prefix of the other:
% make bad
VAR1=a VAR2=a-b /Applications/Xcode.app/Contents/Developer/usr/bin/make template
rm -rf test
mkdir -p test/a test/a-b
ls > test/a/a
ls > test/a-b/a
rm -f test/.csearchindex
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex test/a test/a-b
2016/02/08 17:37:11 index /Users/xxxx/work/codesearch/test/a
2016/02/08 17:37:11 index /Users/xxxx/work/codesearch/test/a-b
2016/02/08 17:37:11 flush index
2016/02/08 17:37:11 merge 0 files + mem
2016/02/08 17:37:11 166 data bytes, 1592 index bytes
2016/02/08 17:37:11 done
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list
/Users/xxxx/work/codesearch/test/a
/Users/xxxx/work/codesearch/test/a-b
# Just index them again to reproduce the bug
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -verbose
2016/02/08 17:37:11 index /Users/xxxx/work/codesearch/test/a
2016/02/08 17:37:11 83 79 /Users/xxxx/work/codesearch/test/a/a
2016/02/08 17:37:11 index /Users/xxxx/work/codesearch/test/a-b
2016/02/08 17:37:11 83 79 /Users/xxxx/work/codesearch/test/a-b/a
2016/02/08 17:37:11 flush index
2016/02/08 17:37:11 merge 0 files + mem
2016/02/08 17:37:11 166 data bytes, 1592 index bytes
2016/02/08 17:37:11 merge test/.csearchindex test/.csearchindex~
2016/02/08 17:37:11 done
/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list
/Users/xxxx/work/codesearch/test/a
test `/usr/bin/env CSEARCHINDEX=test/.csearchindex /Users/xxxx/pkgs/go/bin/cindex -list | wc -l` = 2
make[1]: *** [template] Error 1
make: *** [bad] Error 2
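The report above does not identify the cause. One plausible explanation, offered purely as a guess, is a plain string-prefix comparison between indexed paths, which would treat test/a-b as lying under test/a. The Go sketch below contrasts such a comparison with a separator-aware one; both function names are hypothetical and neither is taken from the codesearch sources.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveUnder reports whether path starts with dir as a plain string.
// This is the suspected (unconfirmed) kind of check.
func naiveUnder(dir, path string) bool {
	return strings.HasPrefix(path, dir)
}

// under reports whether path is dir itself or lies inside dir.
// It assumes "/" as the path separator.
func under(dir, path string) bool {
	return path == dir || strings.HasPrefix(path, dir+"/")
}

func main() {
	fmt.Println(naiveUnder("test/a", "test/a-b")) // true: sibling mistaken for a child
	fmt.Println(under("test/a", "test/a-b"))      // false
	fmt.Println(under("test/a", "test/a/a"))      // true
}
```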
Currently there is no straightforward way to remove one directory from the
index (cindex -list). One needs to provide the complete new list with cindex
-reset to remove one entry. Please provide an option to remove one (or more) of
the entries from the path that are currently indexed.
Original issue reported on code.google.com by [email protected]
on 17 May 2012 at 9:09
Not really an issue, but a request.
I see that "Emacs editor integration" is prominently mentioned on the homepage
of codesearch. In that case, I would recommend to add
"Vim editor integration" too, link:
https://github.com/junkblocker/unite-codesearch
Thanks in advance.
Original issue reported on code.google.com by [email protected]
on 30 Oct 2014 at 6:07
I've looked in here and although the top-level readme references binaries I can't find anything that isn't a .go file.
They should be 1-indexed.
This is from the mac 64 zip file e0e01b6d3c01a3ac8b8d0507aae7cb34ba24b1a7 Jan
19 2012.
-rob
Original issue reported on code.google.com by [email protected]
on 21 Jan 2012 at 4:31
$ go get code.google.com/p/codesearch/cmd/...
package code.google.com/p/codesearch/cmd/...: unable to detect version control system for code.google.com/ path
Oh well, let's try github:
$ go get github.com/google/codesearch/...
package github.com/google/codesearch/...
imports github.com/google/codesearch/cmd/cgrep
imports code.google.com/p/codesearch/regexp: unable to detect version control system for code.google.com/ path
package github.com/google/codesearch/...
imports github.com/google/codesearch/cmd/cindex
imports code.google.com/p/codesearch/index: unable to detect version control system for code.google.com/ path
package github.com/google/codesearch/...
imports github.com/google/codesearch/index
imports code.google.com/p/codesearch/sparse: unable to detect version control system for code.google.com/ path
One of the areas I got stuck on when debugging the trigram-question-mark issue,
but might actually be a fundamental design limitation / feature, is that moving
to prefix/suffix lists can cause the list of trigrams to drop considerably.
bash$ ./csearch -verbose 'foo_(bar)?zot' >/dev/null
2012/03/07 22:18:48 query: "foo" "oo_" "zot" ("_zo" "o_z")|("arz" "rzo")
2012/03/07 22:18:48 post query identified 0 possible files
bash$ ./csearch -verbose 'foo_(bar_)?zot' >/dev/null
2012/03/07 22:18:53 query: "foo" "oo_" "zot"
2012/03/07 22:18:53 post query identified 0 possible files
In the first case, "bar" is only three characters and stays as an exact trigram
and is used to construct the arz/rzo entries. When it becomes a prefix/suffix
list (when it hits 4 characters by adding the underscore), it no longer
provides us with any trigram info because the empty string empties out the
prefix and suffix lists as being "redundant" with the empty string. ("" is a
prefix of "ba").
I'm not sure if this is a bug or not. I.e., _should_ we be able to transform
prefix/suffix lists into AND/OR sets of trigrams in this case?
Original issue reported on code.google.com by dgryski
on 7 Mar 2012 at 10:08
Maybe this works as designed, but it caught me off guard. The line numbers
reported by "csearch -n" seem to start at 0 instead of 1. This is at odds with
e.g. grep, to which csearch compares itself in its help output. This makes it
somewhat awkward to use "csearch -n" in emacs (and presumably other editors)
since it numbers its lines starting at 1.
Original issue reported on code.google.com by austin.bingham
on 11 Oct 2012 at 10:41
What steps will reproduce the problem?
1. File containing: PAT and ä (0xE4, a umlaut), ö (0xF6,o umlaut), other high
bit ascii chars
2. cindex -reset .
3. csearch -i PAT
What is the expected output? What do you see instead?
Expected to find PAT. Instead no match.
What version of the product are you using? On what operating system?
codesearch-0.01-windows-amd64.zip
Please provide any additional information below.
I looked at the source (write.go) and it seems to expect that files are in
UTF-8 only (this is a Go specification?). However it would be nice if csearch
could be used with any source files, including those with high bit ascii
characters. Or that there would be a command line option for this.
Original issue reported on code.google.com by [email protected]
on 21 Nov 2012 at 7:24
What steps will reproduce the problem?
1. cindex any/path
2. cindex
3.
What is the expected output?
The index is updated.
What do you see instead?
After merging the master index with the updated index, the rename of
.csearchindex~~ to .csearchindex fails. This is possibly due to open file
handles.
What version of the product are you using? On what operating system?
Windows 7, 64 bit
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 23 Jan 2012 at 10:55
I ran out of disk space on a server and the error message from cindex ("short
write writing %s") indicated it should have been a log.Fatalf() call instead of
log.Fatal().
I ran go vet over the source tree and found two more instances of Fatal() calls
with format specifiers.
The patch is attached.
Original issue reported on code.google.com by dgryski
on 20 Feb 2012 at 2:15
Attachments:
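The difference described in the report above is easy to see without exiting the process: log.Fatal and log.Fatalf format their arguments like log.Print and log.Printf before calling os.Exit(1). The sketch below uses the Print variants on a private logger; the file name is made up, and the format string lives in a variable only so that go vet does not reject the deliberately wrong call.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
)

// format is a variable rather than a constant so the intentionally
// incorrect Print call below still passes go vet.
var format = "short write writing %s"

// render returns what the Print-style and Printf-style calls emit for
// the same arguments.
func render(name string) (plain, formatted string) {
	var buf bytes.Buffer
	l := log.New(&buf, "", 0) // no timestamp, to keep the output stable

	l.Print(format, name) // %s is not interpreted; name is appended
	plain = buf.String()

	buf.Reset()
	l.Printf(format, name) // %s is replaced by name
	formatted = buf.String()
	return plain, formatted
}

func main() {
	plain, formatted := render("/tmp/index")
	fmt.Print(plain)
	fmt.Print(formatted)
}
```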
http://code.google.com/p/codesearch/source/browse/index/read.go#36
The comment reads : "For example, the delta list [2,5,1,1,0] encodes the file
ID list 1, 6, 7, 8."
Am I misunderstanding something, or should the delta list read : [1,5,1,1,0] to
match up to the given file ID list?
Original issue reported on code.google.com by [email protected]
on 26 Apr 2012 at 7:10
What steps will reproduce the problem?
1. Run cindex on a directory containing a checkout
2. Run csearch on a text string known to be in the source code
What is the expected output? What do you see instead?
Expected output is source code matches. Instead, I see source code matches
_and_ .svn/.../foo.svn-base matches, or .hg/blah matches, or .git/blah matches.
What version of the product are you using? On what operating system?
f9a003c603e3 tip
Darwin 10.8.0
Please provide any additional information below.
A simple patch to the filewalker function can skip directories if they are
known to be "special" and not likely to contain items we want to index. I've
included a small patch to cindex.go which does just this.
This functionality could also be disabled unless -ignore-vcs or something is
passed.
Ack (betterthangrep.com) does similar filtering on directories and files:
#foo.c# from emacs, foo.c~ from vim, etc. Ignoring those would also be easy to
add.
Original issue reported on code.google.com by dgryski
on 19 Jan 2012 at 9:00
Attachments:
What steps will reproduce the problem?
1. run cgrep
2.
3.
What is the expected output? What do you see instead?
There is a custom usage message which is not displayed. Instead we get the
default usage message and a panic.
laptop$ cgrep
Usage of cgrep:
-c=false: print match counts only
-cpuprofile="": write cpu profile to this file
-h=false: omit file names
-i=false: case-insensitive match
-l=false: list matching files only
-n=false: show line numbers
panic: runtime error: index out of range
goroutine 1 [running]:
main.main()
/[XXXXX]/src/code.google.com/p/codesearch/cmd/cgrep/cgrep.go:58 +0x292
What version of the product are you using? On what operating system?
070ef10ab799 tip. Darwin 10.8.0
Please provide any additional information below.
This is the default flag.Usage() message instead of the custom usage() one
which exits and would prevent the index-out-of-range error.
The fix is a one-line patch in main():
flag.Usage = usage
Alternately, we could call our usage() function instead of flag.Usage() in the
if block.
Original issue reported on code.google.com by dgryski
on 27 Jan 2012 at 9:20
It's nice to be able to search for 'whole words', which is to say making sure
the pattern has a word boundary at the beginning and end.
Having to type '\bpattern\b' is annoying, and grep/ack already have a -w flag
which does this for you.
Since we already have -i which adjusts the pattern, I think adding another one
isn't too terrible.
I've attached a patch that does this.
Original issue reported on code.google.com by dgryski
on 17 Feb 2012 at 1:13
Attachments:
I have a lot of very old Windows source code that is encoded in codepage 1252. Codesearch skips all of these files without a warning.
See summary above.
Original issue reported on code.google.com by [email protected]
on 21 Jan 2013 at 12:33
Trying to run codesearch on OpenBSD -current amd64 with go 1.0.3 from packages,
using current codesearch code fetched with 'go get', I'm seeing this:
$ go run src/code.google.com/p/codesearch/cmd/cindex/cindex.go /usr/src
2013/03/04 10:52:52 index /usr/src
2013/03/04 10:56:11 flush index
2013/03/04 10:56:12 merge 13 files + mem
2013/03/04 10:56:12 mmap /tmp/csearch-index060279584: errno 4967816
exit status 1
If I run tests I get this,
=== RUN TestMerge
2013/03/04 11:11:32 merge 0 files + mem
2013/03/04 11:11:32 99 data bytes, 1217 index bytes
2013/03/04 11:11:33 merge 0 files + mem
2013/03/04 11:11:33 87 data bytes, 1217 index bytes
2013/03/04 11:11:33 mmap /tmp/index-test259895843: errno 5172144
exit status 1
FAIL code.google.com/p/codesearch/index 0.411s
The errno seems strange, any suggestions what might be happening? OpenBSD does
not have UBC so I'm wondering if it may require msync (not sure if cindex is
mixing access via standard file operations and mmap, but if so, it will need
msyncs between the different types of access).
Original issue reported on code.google.com by [email protected]
on 4 Mar 2013 at 11:19
I would like to add exclusion of subdirectories to cindex. For example, indexing chromium /src takes about 50 seconds, but only 20 seconds if I exclude /src/out and /src/third_party.
Right now it works like cindex --exclude /src/out --exclude /src/tmp /src, plus I have to pass the excludes again if I want to re-index. As I understand it, the way to integrate that is to modify the index format to store the list of excluded directories. Is it OK to introduce such backward-incompatible changes?
$ echo -n aaa > a
$ echo bbb > b
$ cindex .
2015/07/14 18:49:45 index /tmp/t
2015/07/14 18:49:45 flush index
2015/07/14 18:49:45 merge 0 files + mem
2015/07/14 18:49:45 7 data bytes, 154 index bytes
2015/07/14 18:49:45 merge /Users/jacekm/.csearchindex /Users/jacekm/.csearchindex~
2015/07/14 18:49:45 done
$ csearch -n .
/tmp/t/a:1:aaa/tmp/t/b:1:bbb
$
https://code.google.com/p/codesearch/source/browse/cmd/cindex/cindex.go#132 has
the relevant line of code. Effectively when I run cindex on my IRC log
directory, I get an index of *only* the private conversations. Public
conversations are in files named after the channel, so for example:
`2014/#ubuntu.03.log`.
cindex decides to ignore dotfiles, tildefiles, and for some reason probably
relating to EMACS, files beginning with an octothorpe ('#'). There does not
seem to be any command-line switch to disable this file-skipping (-a would seem
appropriate), and no way to provide custom include/exclude rules.
Since searching IRC logs is probably more common for me than searching code, it
makes this amazing tool rather less useful for me!
Original issue reported on code.google.com by [email protected]
on 27 Mar 2014 at 5:03
What steps will reproduce the problem?
1. Index the attached file with cindex
2. Search for a pattern inside it
3. No hits
What is the expected output? What do you see instead?
repro$
repro$ cindex -reset
repro$ cindex badfile
2013/03/13 18:51:52 index /tmp/repro/badfile
2013/03/13 18:51:52 flush index
2013/03/13 18:51:52 merge 0 files + mem
2013/03/13 18:51:52 0 data bytes, 92 index bytes
2013/03/13 18:51:52 done
repro$ cindex -list
/tmp/repro/badfile
repro$
repro$
repro$
repro$ grep main badfile
libc.so.6 __libc_start_main
torch main
torch realmain(int, char**)
libglib-2.0.... g_main_context_iteration
libglib-2.0.... g_main_context_prepare
libglib-2.0.... g_main_context_dispatch
#85 0x00000032f5c38f0e in g_main_context_dispatch () from
/lib64/libglib-2.0.so.0
#87 0x00000032f5c3ca3a in g_main_context_iteration () from
/lib64/libglib-2.0.so.0
#93 0x000000000040e74d in realmain(int, char**) ()
#94 0x000000000040e933 in main ()
repro$
repro$ csearch main <= no results here !!
repro$
repro$ grep threads badfile
============ All threads ==========
============ All threads ==========
repro$
repro$ csearch threads <= no results here !!
repro$
I cannot find (with csearch) text that is in a file I have indexed (cindex)
What version of the product are you using? On what operating system?
I'm using the Linux binaries that are available on the Download page.
I tried to compile go / codesearch but couldn't make it work (my go
install might be funky).
Please provide any additional information below.
It looks like the problem happens at indexing time.
Original issue reported on code.google.com by [email protected]
on 14 Mar 2013 at 1:57
Attachments:
What steps will reproduce the problem?
1. build all utils (or download binaries from this site)
2. be sure that there is no .csearchindex (or it has zero size)
3. run `cindex /usr/include` or any other path
`cindex` fails, because mmap returns error with EINVAL code.
This happens inside mmap_linux.go, on line 24.
That function (mmapFile) calls mmap with zero size, which fails.
I had to manually copy ~/.csearchindex~ to ~/.csearchindex.
That fixed the issue.
I expected cindex to work right out of the box, even with no .csearchindex.
I'm using:
Gentoo Base System release 2.0.3, Linux 3.1.6, amd64.
Go repository is the newest (built a few minutes ago from head).
Original issue reported on code.google.com by [email protected]
on 19 Jan 2012 at 8:10
And its usage message doesn't help distinguish it from csearch.
Original issue reported on code.google.com by [email protected]
on 21 Jan 2012 at 4:31
What steps will reproduce the problem?
1. cindex any/path (without CSEARCHINDEX set)
2.
3.
What is the expected output?
Using .csearchindex in home dir.
What do you see instead?
Failing due to the home path missing the drive letter. In index/read.go, the
line in File() getting the home dir should probably read 'home =
os.Getenv("HOMEDRIVE") + os.Getenv("HOMEPATH")'
What version of the product are you using? On what operating system?
Windows 7, 64 bit
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 23 Jan 2012 at 10:39
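The home-directory lookup suggested in the report above can be sketched as follows. Checking HOME first and falling back to HOMEDRIVE plus HOMEPATH is an assumption of this sketch, not a description of index/read.go, and the function takes its environment lookup as a parameter (os.Getenv in real use) so it can be run on any platform.

```go
package main

import "fmt"

// homeDir picks a home directory from environment variables. getenv
// is injected so the logic can be exercised without changing the
// process environment; pass os.Getenv in real code.
func homeDir(getenv func(string) string) string {
	if h := getenv("HOME"); h != "" {
		return h
	}
	// Fallback proposed in the report: drive letter plus path.
	return getenv("HOMEDRIVE") + getenv("HOMEPATH")
}

func main() {
	env := map[string]string{"HOMEDRIVE": "C:", "HOMEPATH": `\Users\me`}
	lookup := func(k string) string { return env[k] }
	fmt.Println(homeDir(lookup))
}
```

Newer Go releases also provide os.UserHomeDir, which may be a simpler choice where it is available.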
On tip 59fdb5d05e04 of golang, the build of pkg index failed because of an argument
type mismatch.
# code.google.com/p/codesearch/index
src/code.google.com/p/codesearch/index/mmap_bsd.go:34: cannot use f.Fd() (type
uintptr) as type int in function argument
Original issue reported on code.google.com by [email protected]
on 13 Feb 2012 at 8:30
When pointed to a source code tree containing revision control meta-data (e.g.
.svn, CVS, or similar system with meta-data in each subdir), the metadata files
get indexed as well, leading to undesired output from csearch.
What steps will reproduce the problem?
1. checkout some svn code
2. cindex that code
3. csearch some-token
What is the expected output? What do you see instead?
Only the actual code files are hits, instead I also see "xxx.svn-base" files.
What version of the product are you using? On what operating system?
hg 070ef10ab799
Please provide any additional information below.
Thanks for a great tool!
Original issue reported on code.google.com by [email protected]
on 20 Jan 2012 at 2:04
Queries of the type [Hh]ashTable fail to yield results, where hashTable and
HashTable each gets results.
It seems like a line is clearly missing in read.go in method:
func (r *postReader) init(ix, trigram, restrict)
so that the restrict parameter is ignored.
The attached patch adds a test that shows the bug and fixes the problem.
Original issue reported on code.google.com by [email protected]
on 15 Apr 2012 at 3:59
Attachments:
I want to search my Android source like https://cs.chromium.org/
But after I
install golang-go
install codesearch
and run "go get github.com/google/codesearch/cmd/{cindex,csearch}"
cindex xxxSourcesxxx/
I don't know how to make a website like https://cs.chromium.org/ to search the Android source.
Is there any step-by-step guidance?
I indexed the Android sources with OpenGrok before.
Is there any guidance like this?
https://github.com/OpenGrok/OpenGrok/blob/master/README.txt
I ran the command go get github.com/google/codesearch/cmd/...
but nothing happens, and I cannot run cindex or csearch. Can anyone help?
I already have Go installed:
~ [10:26:26] $ go version
go version go1.9 darwin/amd64
csearch already prints the trigrams and number of files matched from the index
if -verbose mode is on.
This patch preprocesses the list of fileids so we can print how many were
matched by the filename regex.
It does double the number of calls to ix.Name(fileid), but that call looks
cheap and any slowdown would be dwarfed by the disk access during the grep
stage anyway.
Original issue reported on code.google.com by dgryski
on 17 Feb 2012 at 1:50
Attachments:
What steps will reproduce the problem?
1. Create some symlinks to some source code in a directory.
2. Index that directory.
3. Search for symbols which are only in symlinked source files.
What is the expected output? What do you see instead?
I expect to see symlinked files listed by csearch.
csearch only indexed regular files.
What version of the product are you using? On what operating system?
0.1 on Mac OS X 10.6
Please provide any additional information below.
Maybe additionally provide an option to restrict or follow symlinks to certain
filesystems for better control.
Original issue reported on code.google.com by [email protected]
on 7 Feb 2012 at 4:00
Would be great to wrap these in if ix.Verbose { ... } and allow for silent use.
In Flush:
log.Printf("%d data bytes, %d index bytes", ix.totalBytes, ix.main.offset())
In mergePost:
log.Printf("merge %d files + mem", len(ix.postFile))
Original issue reported on code.google.com by [email protected]
on 7 Dec 2012 at 8:08
siftUp is wrong if heap size > 3
Original issue reported on code.google.com by [email protected]
on 16 Nov 2013 at 3:45
Attachments:
The file .csearchindex is world-readable by default. This might be a security
issue because it may allow an attacker to get pieces of information about files
they cannot read, such as filenames of files in directories that are not world
readable or reconstructed contents of files that are not world readable.
I could provide a patch, although I believe this is easy to fix.
Original issue reported on code.google.com by [email protected]
on 26 Aug 2013 at 9:30
bash$ csearch -verbose "foo_(bar_)?" >/dev/null
2012/03/05 10:44:37 query: "_ba" "foo" "o_b" "oo_"
2012/03/05 10:44:37 post query identified 11 possible files
bash$ csearch -verbose "foo_b(ar_)?" >/dev/null
2012/03/05 10:44:44 query: "foo" "o_b" "oo_"
2012/03/05 10:44:44 post query identified 12 possible files
In the first example, "_ba" and "o_b" should not be required trigrams for the
file. In the second example, "ar_" is correctly _not_ included in the list of
required trigrams.
I'll take a look at this later tonight. It looks like an issue with precedence
in index/regexp.go. (I.e., the question mark is only applying to the final
trigram, and not all trigrams included in the grouping. ).
Original issue reported on code.google.com by dgryski
on 5 Mar 2012 at 10:07
What steps will reproduce the problem?
1. Installed latest version of go binaries.
2. go install code.google.com/p/codesearch/cmd/cindex (or any other component
library)
This produces:
can't load package: package code.google.com/p/codesearch/cmd/cindex: import
"code.google.com/p/codesearch/cmd/cindex": cannot find package
After that failed, I simply cloned the repository using Hg, and tried to build
locally. Not sure exactly how this is supposed to work (how to install all of
the sources as the local versions of the packages), so whenever I try to build
any sub-folder's contents (with go build), it produces the same error as above
when attempting to locate a dependent package.
What version of the product are you using? On what operating system?
Windows, both 32-bit and 64-bit.
Original issue reported on code.google.com by [email protected]
on 16 May 2012 at 1:13
It would be great if one could tell the indexer to (re-)index a single file and merge that efficiently into the existing index.
A strong plus would be to read the file names - 1 per line - from a (named or regular) pipe, and index those as soon as a new line becomes available.
change f.Fd()
to syscall.Handle(f.Fd())
mmap_windows.go:26
Original issue reported on code.google.com by [email protected]
on 2 Apr 2013 at 11:24
What steps will reproduce the problem?
1. csearch int
2. echo $?
3. csearch SomeStringNotAppearingInYourSourceTree
4. echo $?
What is the expected output? What do you see instead?
In both cases, a '1' is displayed indicating that csearch believes it found no
matches. This is because the Grep.Match field is not updated when a match is
found.
What version of the product are you using? On what operating system?
tip, darwin
Please provide any additional information below.
A one line fix setting g.Match = true in match.go:Grep.Reader():430 should fix
this.
Original issue reported on code.google.com by dgryski
on 29 Jan 2012 at 8:27