Comments (26)
can you try the latest version and see if it improved?
from zoekt.
it's hard to give specific answers without specific data.
files that are never indexed are usually too large or binary
see also
zoekt/cmd/zoekt-git-index/main.go
Line 31 in 2f0c630
from zoekt.
which version are you using?
do the stats (bottom of page) indicate that data was skipped?
can you reduce the example to something smaller?
from zoekt.
Stats
Used 10M mem for 16953 documents (58M) from 1 repositories.
The file is a .c file with size 259K, which would be greater than 128K. That may indicate the problem.
However, it's hard to pass options into git_index_flags; I can't find a way to pass more than one flag into it.
Thanks..
from zoekt.
I think you can do -git_index_flags="-flag1 flag2"
zoekt/cmd/zoekt-indexserver/main.go
Line 153 in 2f0c630
if you want to clarify the help string there, that would be great.
from zoekt.
I tried
"-branches=master,develop sizeMax=1048576",
"-branches=master,develop -sizeMax=1048576", and
"'-branches=master,develop sizeMax=1048576'"
from zoekt.
Maybe we could do strings.Split() and add - as a prefix, so that you could pass "flag1=data flag2=data"
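A minimal sketch of that idea, under stated assumptions: splitIndexFlags is a hypothetical helper name and is not taken from the zoekt source, and the flag names are just the ones tried earlier in this thread. It uses strings.Fields instead of strings.Split so that repeated spaces do not produce empty arguments.

```go
package main

import (
	"fmt"
	"strings"
)

// splitIndexFlags is a hypothetical helper (not part of zoekt).
// It splits a whitespace-separated flag string into separate arguments
// and adds a leading "-" to any token that does not already have one.
func splitIndexFlags(s string) []string {
	var out []string
	for _, tok := range strings.Fields(s) {
		if !strings.HasPrefix(tok, "-") {
			tok = "-" + tok
		}
		out = append(out, tok)
	}
	return out
}

func main() {
	// Flag names below are placeholders copied from this thread.
	args := splitIndexFlags("branches=master,develop sizeMax=1048576")
	fmt.Printf("%q\n", args)
	// prints ["-branches=master,develop" "-sizeMax=1048576"]
}
```

Note that a plain whitespace split like this would break flag values that themselves contain spaces; handling those would need quoting rules that are not covered here.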
from zoekt.
yeah, good idea. Send me a change.
from zoekt.
any further comments on "If I just look at the index file and grep it, I get more results then what I am seeing in the webpage." ? Is this a cutoff by number of matches, or does it really not show up (try restricting to the file you know it should be in.)
from zoekt.
I think increasing the size fixed it. Let me investigate more and see if I can get more info if it recurs.
from zoekt.
I still seem to get this problem: cat reponame.zoekt | grep stringval gives me 19 values, while the search gives me only 4. The stringval also shows up in files that are not as big as the one I mentioned in prior comments.
The files which have stringval are indexed properly, as I can get good results for other values from these files. Does the length (42 characters) of the searched string matter?
from zoekt.
if you do search for stringval, and restrict the search to a file that you know contains it (using "f:path/to/file"), does that return the data?
(I'd also be happy to debug the shard directly, if you are able to share it privately with me.)
from zoekt.
Yup, using f:path works..
Sorry :(, can't really share the data..
from zoekt.
can you check that for incomplete results, the following condition triggers?
Line 159 in 2f0c630
from zoekt.
I think this is related to the web server. If I run ./zoekt -index_dir /var/data/index/ "stringval" | wc -l, then I get 19 results.
from zoekt.
If that is true, the webserver should show a "Show more" link next to the results.
from zoekt.
oops.. crap.. It does... My mistake.. sorry about bothering you.. I did not expect that..
from zoekt.
did you get many matches for "stringval" that were symbol definitions?
How large is the corpus (number of files, number of bytes)? You can query "r:"
I got bitten by this today as well. I think we should make this more visible.
from zoekt.
Found 1 repositories (17517 files, 82Mb content)
I would say 11 of the 19 are valid code and the remaining 8 are comments. If I try sym:stringval, I get 4 results.
I counted every occurrence of stringval inside a piece of code as a symbol; maybe that is wrong?
The other problem seems to be numerous files that show up tagged Duplicate result, but with the same path.
from zoekt.
the sym: operator looks for symbol definitions, eg class Blabla { .. } in c++.
Looks like your files are tiny (~ 500 bytes each), which throws off some coarse heuristics for matchcount that I introduced.
Re: duplicate results, are you indexing multiple branches? Does your project use submodules?
Which branches do the duplicate results come from?
from zoekt.
I am trying to index three branches. The result I get is something like
MdEmbed.c [branch1]
stringval
MdEmbed.c [branch1] DuplicateResult
MdEmbed.c [branch1] DuplicateResult
MdEmbed.c [branch2]
stringval
MdEmbed.c [branch2] DuplicateResult
MdEmbed.c [branch2] DuplicateResult
MdEmbed.c [branch3]
stringval
MdEmbed.c [branch3] DuplicateResult
MdEmbed.c [branch3] DuplicateResult
from zoekt.
that is weird. Each (branch, filename) combo should be there just once. How many files does a single branch have, and how many distinct (filename, filecontent) pairs should you have roughly?
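One way to estimate that number is to hash each file's content and count unique (path, hash) pairs across the branches. The sketch below is only an illustration of that count, not how zoekt deduplicates internally; the doc type, the countDistinct name, and the sample data are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// doc is a hypothetical record for one file as seen on one branch.
type doc struct {
	Branch, Path string
	Content      []byte
}

// key identifies a (filename, filecontent) pair by path and content hash.
type key struct {
	path string
	sum  [sha256.Size]byte
}

// countDistinct returns the number of distinct (path, content) pairs,
// ignoring which branch each file came from.
func countDistinct(docs []doc) int {
	seen := make(map[key]struct{})
	for _, d := range docs {
		seen[key{d.Path, sha256.Sum256(d.Content)}] = struct{}{}
	}
	return len(seen)
}

func main() {
	// Made-up sample: the same file on three branches, changed on one.
	docs := []doc{
		{"branch1", "MdEmbed.c", []byte("stringval")},
		{"branch2", "MdEmbed.c", []byte("stringval")},
		{"branch3", "MdEmbed.c", []byte("stringval changed")},
	}
	fmt.Println(countDistinct(docs)) // prints 2
}
```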
from zoekt.
The 3 branches are almost the same, they are usually merged back and forth every 2-3 days.
from zoekt.
how many files does each branch have?
from zoekt.
see #55
from zoekt.
16977 files..
Awesome.. that was perfect
from zoekt.