
Comments (6)

AmenRa commented on May 30, 2024

Hi @kaleko,

Thank you very much for the bug report and for providing a working example!
Numba was not raising a ZeroDivisionError, so I had not spotted this issue before.
I fixed it in v.0.3.4, and it now works as intended.

Please, consider giving ranx a star if you like it!

from ranx.

kaleko commented on May 30, 2024

It seems that if there is an empty query result in the run_dict, every query after it will always have a precision of 0.


kaleko commented on May 30, 2024

@AmenRa I now see that in the above example, the outputs are run 1 --> precision of 1.0, run 2 --> precision of 0.75, run 3 --> precision 0.75.

It's good to see runs 2 and 3 have the same precision, the result of fixing your ZeroDivisionError issue.

However, I question whether the actual precision calculation is correct.
According to this comment:

**Precision** is the proportion of the retrieved documents that are relevant.

Precision is the "proportion of retrieved documents that are relevant"

In all three runs above, every document which was retrieved was relevant. Shouldn't the precision be 1.0 for all runs?


AmenRa commented on May 30, 2024

Usually, a system does not return documents whose relevance score is zero.
That's why you could end up with empty result lists, as in your example.
However, this is probably "a convention" because 1) you cannot meaningfully order the documents if they all have the same relevance score (so the system's output would be kind of random), and 2) if the system returns the entire collection every time it is queried, it will have severe efficiency issues.

Moreover, if you cast Information Retrieval to a binary classification problem, the returned documents would be the data points judged as positives by the model and the non-returned ones as negatives.
If you have a query for which no document was returned, the model judged all the documents as negatives (non-relevant to the query).
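To spell out the binary-classification view in symbols (standard TP/FP notation, not taken from ranx's code): precision is the share of predicted positives that are true positives. With an empty result list there are no predicted positives at all, so the ratio is undefined, and any value assigned to it is a convention.

```latex
P = \frac{TP}{TP + FP},
\qquad
\text{empty result list: } TP = FP = 0
\;\Rightarrow\;
P = \frac{0}{0} \ \text{(undefined)}
```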

I think returning no documents for one or more queries is a corner case.
If we take this corner case to the extreme, a system that never returns documents should have Precision=1.0 on average following the last line of your comment, which does not sound right to me.
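A minimal sketch of that extreme case, in plain Python. It is not ranx's implementation: the helpers `precision` and `mean_precision`, the `empty_value` parameter, and the toy `qrels` and `silent_run` data are all hypothetical, introduced only to show how the choice of convention for empty result lists changes the average.

```python
def precision(retrieved, relevant, empty_value=0.0):
    """Fraction of retrieved documents that are relevant.

    `empty_value` is the (hypothetical) convention applied when nothing
    was retrieved, where the ratio would otherwise be 0/0.
    """
    if not retrieved:
        return empty_value
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)


def mean_precision(run, qrels, empty_value=0.0):
    """Average precision-per-query over all queries that have judgments."""
    scores = [
        precision(run.get(query, []), relevant, empty_value)
        for query, relevant in qrels.items()
    ]
    return sum(scores) / len(scores)


# Toy data (made up): two queries, and a system that never returns anything.
qrels = {"q1": {"d1", "d2"}, "q2": {"d3"}}
silent_run = {"q1": [], "q2": []}

print(mean_precision(silent_run, qrels, empty_value=0.0))  # 0.0
print(mean_precision(silent_run, qrels, empty_value=1.0))  # 1.0
```

Under the `empty_value=1.0` convention, the system that never answers gets a perfect score, which is the outcome described above as not sounding right.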

Makes sense / do we agree?


kaleko commented on May 30, 2024

I guess I agree. It sounds like a convention.

For example, if I google "awefoihawoefihawoefihw" and zero results come back, did my query have 100% precision or 0% precision? I would argue 100%, but I can see both sides.

Thanks for the clarification.


AmenRa commented on May 30, 2024

If you find a theoretically sound explanation, please post it here.

