Comments (26)

hanwen commented on May 9, 2024

can you try the latest version and see if it improved?

from zoekt.

hanwen commented on May 9, 2024

it's hard to give specific answers without specific data.

files that are never indexed are usually too large or binary
see also

var sizeMax = flag.Int("file_limit", 128*1024, "maximum file size")

hanwen commented on May 9, 2024

which version are you using?

do the stats (bottom of page) indicate that data was skipped?

can you reduce the example to something smaller?

nikhilkalige commented on May 9, 2024

Stats

 Used 10M mem for 16953 documents (58M) from 1 repositories.

The file is a .c file with size 259K, which is greater than 128K. That may be the problem.
However, it's hard to pass options into git_index_flags; I can't find a way to pass more than one flag into it.

Thanks..

hanwen commented on May 9, 2024

I think you can do -git_index_flags="-flag1 flag2"

args = append(args, strings.Split(indexFlags, " ")...)

if you want to clarify the help string there, that would be great.

nikhilkalige commented on May 9, 2024

I tried
"-branches=master,develop sizeMax=1048576",
"-branches=master,develop -sizeMax=1048576"
"'-branches=master,develop sizeMax=1048576'"

nikhilkalige commented on May 9, 2024

Maybe we could do strings.Split() and add - as a prefix, so that you could pass "flag1=data flag2=data"
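
A rough sketch of that idea, assuming a helper named `normalizeFlags` (hypothetical, not an existing zoekt function). It splits on whitespace and prepends "-" to any token that does not already have it.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeFlags is a hypothetical helper sketching the proposal: split the
// value on whitespace and prepend "-" to every token that lacks the prefix.
func normalizeFlags(indexFlags string) []string {
	var args []string
	for _, tok := range strings.Fields(indexFlags) {
		if !strings.HasPrefix(tok, "-") {
			tok = "-" + tok
		}
		args = append(args, tok)
	}
	return args
}

func main() {
	fmt.Printf("%q\n", normalizeFlags("flag1=data flag2=data"))
	// ["-flag1=data" "-flag2=data"]

	// Tokens that already carry the prefix are left alone.
	fmt.Printf("%q\n", normalizeFlags("-flag1=data flag2=data"))
	// ["-flag1=data" "-flag2=data"]
}
```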

hanwen commented on May 9, 2024

yeah, good idea. Send me a change.

hanwen commented on May 9, 2024

any further comments on "If I just look at the index file and grep it, I get more results than what I am seeing in the webpage."? Is this a cutoff by number of matches, or does it really not show up? (try restricting the search to the file you know it should be in.)

nikhilkalige commented on May 9, 2024

I think increasing the size fixed it. Let me investigate more and see if I can get more info if it recurs.

nikhilkalige commented on May 9, 2024

I still seem to get this problem, cat reponame.zoekt | grep stringval gives me 19 values, while the search gives me only 4. The stringval also shows up in files that are not as big as the one I mentioned in prior comments.

The files which have stringval are indexed properly, as I can get good results for other values from these files. Does the length (42 characters) of the searched string matter?

hanwen commented on May 9, 2024

if you do search for stringval, and restrict the search to a file that you know contains it (using "f:path/to/file"), does that return the data?

(I'd also be happy to debug the shard directly, if you are able to share it privately with me.)

nikhilkalige commented on May 9, 2024

Yup, using f:path works..

Sorry :(, can't really share the data..

hanwen commented on May 9, 2024

can you check that for incomplete results, the following condition triggers?

zoekt/eval.go

Line 159 in 2f0c630

break

nikhilkalige commented on May 9, 2024

I think this is related to the web server. If I run ./zoekt -index_dir /var/data/index/ "stringval" | wc -l, then I get 19 results.

hanwen commented on May 9, 2024

If that is true, the webserver should show a "Show more" link next to the results.
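
The behavior described here, a capped result list plus a hint that more exists, can be sketched as follows. The `truncate` helper and the display limit of 4 are assumptions for illustration (4 and 19 are the counts reported earlier in the thread); this is not the webserver's actual code.

```go
package main

import "fmt"

// truncate is a hypothetical helper: it returns at most limit matches and
// reports whether some were left out, which is when a "Show more" link
// would be worth rendering.
func truncate(matches []string, limit int) (shown []string, more bool) {
	if len(matches) > limit {
		return matches[:limit], true
	}
	return matches, false
}

func main() {
	// 19 matches in total, as reported by the command-line search.
	var matches []string
	for i := 1; i <= 19; i++ {
		matches = append(matches, fmt.Sprintf("match %d", i))
	}

	shown, more := truncate(matches, 4)
	fmt.Println(len(shown), more) // 4 true

	shown, more = truncate(matches, 50)
	fmt.Println(len(shown), more) // 19 false
}
```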

nikhilkalige commented on May 9, 2024

oops.. crap.. It does... My mistake.. sorry about bothering you.. I did not expect that..

hanwen commented on May 9, 2024

did you get many matches for "stringval" that were symbol definitions?

How large is the corpus (number of files, number of bytes)? You can query "r:"

I got bitten by this today as well. I think we should make this more visible.

nikhilkalige commented on May 9, 2024

Found 1 repositories (17517 files, 82Mb content)

I would say 11/19 are valid code and the remaining 8 are comments. If I try sym:stringval, I get 4 results.
I considered every occurrence of stringval inside a piece of code as a symbol; maybe that's wrong?

The other problem seems to be numerous files that show up tagged Duplicate result, but with the same path.

hanwen commented on May 9, 2024

the sym: operator looks for symbol definitions, eg

class Blabla { .. }

in c++.

Looks like your files are tiny (~ 500 bytes each), which throws off some coarse heuristics for matchcount that I introduced.

Re: duplicate results, are you indexing multiple branches? Does your project use submodules?
Which branches do the duplicate results come from?

nikhilkalige commented on May 9, 2024

I am trying to index three branches. The result I get is something like

MdEmbed.c [branch1]
stringval

MdEmbed.c [branch1] DuplicateResult
MdEmbed.c [branch1] DuplicateResult

MdEmbed.c [branch2]
stringval

MdEmbed.c [branch2] DuplicateResult
MdEmbed.c [branch2] DuplicateResult

MdEmbed.c [branch3]
stringval

MdEmbed.c [branch3] DuplicateResult
MdEmbed.c [branch3] DuplicateResult

hanwen commented on May 9, 2024

that is weird. Each (branch, filename) combo should be there just once. How many files does a single branch have, and how many distinct (filename, filecontent) pairs should you have roughly?
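
One way to check that invariant on a result list is sketched below. The `hit` type and `findDuplicates` helper are hypothetical names for this example and say nothing about how zoekt stores results internally.

```go
package main

import "fmt"

// hit is a hypothetical search result, identified by branch and file name.
type hit struct {
	Branch   string
	FileName string
}

// findDuplicates counts how often each (branch, filename) pair occurs and
// returns only the pairs seen more than once.
func findDuplicates(hits []hit) map[hit]int {
	counts := make(map[hit]int)
	for _, h := range hits {
		counts[h]++
	}
	dups := make(map[hit]int)
	for h, n := range counts {
		if n > 1 {
			dups[h] = n
		}
	}
	return dups
}

func main() {
	// Same file reported three times per branch, across three branches.
	var hits []hit
	for _, branch := range []string{"branch1", "branch2", "branch3"} {
		for i := 0; i < 3; i++ {
			hits = append(hits, hit{Branch: branch, FileName: "MdEmbed.c"})
		}
	}

	dups := findDuplicates(hits)
	fmt.Println(len(dups))                         // 3
	fmt.Println(dups[hit{"branch1", "MdEmbed.c"}]) // 3
}
```

An empty map from `findDuplicates` would mean every (branch, filename) pair appears exactly once.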

nikhilkalige commented on May 9, 2024

The 3 branches are almost the same, they are usually merged back and forth every 2-3 days.

hanwen commented on May 9, 2024

how many files does each branch have?

hanwen commented on May 9, 2024

see #55

nikhilkalige commented on May 9, 2024

16977 files..
Awesome.. that was perfect
