Right now e.g. the precomputed pairwise cosine similarities are stored within a

Store only top 100 document matches instead of all scores in a dataframe (readnext issue, closed)

joel-beck commented on June 21, 2024

Store only top 100 document matches instead of all scores in a dataframe


Comments (2)

joel-beck commented on June 21, 2024

Potential Issues to keep in mind:

All further steps are restricted to the documents among the top n document ids. This affects:

- the weighted linear model: the co-citation analysis and bibliographic coupling scores of all other documents are None and must be excluded from the weighted linear model
- the hybrid model: the first recommender might select documents that are not available in the data of the second recommender, e.g. because the top-n documents for the citation model might not be within the top-n documents for the language model


joel-beck commented on June 21, 2024

Reopened since each row of the scores dataframe currently stores the scores for every other document.
This means the stored dataframe for 10,000 documents contains 10,000 rows with 10,000 DocumentScore objects each (100 million objects in total), which is too large and too slow to read in.

Idea: Again store only the top 100 document scores within each row.
For feature weighting, the ranks are not used directly; instead, the inverse ranks serve as score points, i.e. rank 1 gets 100 points, rank 100 gets 1 point, and all lower ranks get zero points (score points = 101 - rank for ranks 1 to 100).
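A minimal sketch of the idea above, not the actual readnext implementation: `ScoredDocument` is a hypothetical stand-in for `DocumentScore`, and `keep_top_n` and `score_points` are illustrative names that do not come from the codebase.

```python
from dataclasses import dataclass

TOP_N = 100  # number of matches kept per row, as proposed above


@dataclass(frozen=True)
class ScoredDocument:
    """Hypothetical stand-in for readnext's DocumentScore."""

    document_id: int
    score: float


def keep_top_n(scores: list[ScoredDocument], n: int = TOP_N) -> list[ScoredDocument]:
    """Return the n highest-scoring documents, best first."""
    return sorted(scores, key=lambda s: s.score, reverse=True)[:n]


def score_points(rank: int, n: int = TOP_N) -> int:
    """Convert a 1-based rank into points: n + 1 - rank, floored at zero."""
    return max(n + 1 - rank, 0)
```

With `n = 100` this reproduces the mapping described above: rank 1 yields 100 points, rank 100 yields 1 point, and any rank beyond 100 yields 0.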

Task: When computing the ranks, documents are looked up by their index. That index might no longer be contained in the row of the scores dataframe, since only the top 100 scores (not all) are stored for each feature.
Modify the lookup such that the score points are set to zero in case of a KeyError and the computation can continue.
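The lookup described in this task could be sketched as follows. This is an assumption-laden illustration rather than the project's code: the per-feature ranks are modelled as plain dictionaries mapping document id to rank, and `points_or_zero`, `weighted_points`, the feature names, and the weights are all hypothetical.

```python
from collections.abc import Mapping

TOP_N = 100  # only the top 100 ranks are stored per feature


def points_or_zero(ranks: Mapping[int, int], document_id: int, n: int = TOP_N) -> int:
    """Look up a document's rank; a missing document gets zero points."""
    try:
        rank = ranks[document_id]
    except KeyError:
        # The document is not among the stored top-n entries for this feature.
        return 0
    return max(n + 1 - rank, 0)


def weighted_points(
    feature_ranks: Mapping[str, Mapping[int, int]],
    weights: Mapping[str, float],
    document_id: int,
) -> float:
    """Sum the weighted score points of one document over all features."""
    return sum(
        weights[feature] * points_or_zero(ranks, document_id)
        for feature, ranks in feature_ranks.items()
    )


# Hypothetical example data: document 11 is missing from the second feature.
feature_ranks = {
    "co_citation": {11: 1, 12: 2},
    "bibliographic_coupling": {12: 1, 13: 2},
}
weights = {"co_citation": 2.0, "bibliographic_coupling": 1.0}
total_for_11 = weighted_points(feature_ranks, weights, 11)
```

Catching the `KeyError` at the single lookup site keeps the weighting code free of special cases: a document absent from one feature simply contributes nothing for that feature.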
