
Comments (2)

joel-beck commented on June 21, 2024

Potential Issues to keep in mind:

All further steps are restricted to the documents with the top n document ids. This affects

  • the weighted linear model: the co-citation analysis and bibliographic coupling scores of all other documents are None and must be excluded from the weighted linear model
  • the hybrid model: the first recommender might select documents that are not available in the data of the second recommender, e.g. because the top-n documents of the citation model might not be among the top-n documents of the language model

from readnext.

joel-beck commented on June 21, 2024

Reopened because each row of the scores dataframe currently stores the scores for every other document.
This means the stored dataframe for 10,000 documents contains 10,000 rows with 10,000 DocumentScore objects each, which is too large and too slow to read in.

Idea: Again store only the top 100 document scores within each row.
For feature weighting, the ranks are not used directly. Instead, the inverse ranks serve as score points, i.e. rank 1 gets 100 points, rank 100 gets one point, and all lower ranks get zero points (score points = 101 - rank for ranks 1 to 100).
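
A minimal sketch of the idea, assuming each row can be treated as a plain mapping from document id to score. The names `keep_top_n` and `rank_to_points` are hypothetical and not taken from the readnext code base:

```python
import heapq


def keep_top_n(scores: dict[int, float], n: int = 100) -> dict[int, float]:
    """Keep only the n highest-scoring candidate documents of one row."""
    return dict(heapq.nlargest(n, scores.items(), key=lambda item: item[1]))


def rank_to_points(rank: int, n: int = 100) -> int:
    """Convert a 1-based rank into score points: n + 1 - rank, floored at zero."""
    return max(0, n + 1 - rank)


# Hypothetical row: document id -> score for a single feature.
row = {11: 0.9, 12: 0.1, 13: 0.5, 14: 0.7}
top_two = keep_top_n(row, n=2)
```

With n = 100 this reduces the stored size for 10,000 documents from 10,000 × 10,000 to 10,000 × 100 entries per feature.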

Task: When the ranks are computed and documents are looked up by their index, that index might no longer be contained in the row of the scores dataframe (since only the top 100 rather than all documents are stored for each feature).
Modify the lookup such that the score points are set to zero in case of a KeyError and the computation can continue.
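
The lookup could be sketched as follows, assuming the score points of each feature are available as a mapping from document id to points. `lookup_points`, `weighted_points`, the feature names used as keys, and the weights are all hypothetical and only illustrate the fallback:

```python
def lookup_points(points_by_document: dict[int, int], document_id: int) -> int:
    """Return the stored score points, or zero if the document is not in the top n."""
    try:
        return points_by_document[document_id]
    except KeyError:
        # The document was truncated from this row, so it contributes nothing.
        return 0


def weighted_points(
    feature_points: dict[str, dict[int, int]],
    weights: dict[str, float],
    document_id: int,
) -> float:
    """Weighted sum of score points across all features for one document."""
    return sum(
        weights[feature] * lookup_points(points, document_id)
        for feature, points in feature_points.items()
    )


# Hypothetical data: document 11 is missing from the second feature's row.
feature_points = {
    "co_citation_analysis": {11: 100, 12: 99},
    "bibliographic_coupling": {12: 100, 13: 99},
}
weights = {"co_citation_analysis": 2.0, "bibliographic_coupling": 1.0}
```

Here `weighted_points(feature_points, weights, 11)` uses zero points for the missing bibliographic coupling entry instead of raising, so the computation continues for the remaining documents.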

