AcademicRank

2021SP FORWARD Lab Project

Introduction

The goal of this project is to rank academic papers given a set of keywords. The rank is calculated according to each paper's Fields of Study. The ranking algorithm is inspired by The PageRank Citation Ranking: Bringing Order to the Web, with the assumption that the similarity between a paper and the target keywords can only be distributed once. Currently, the program can only handle keywords consisting of multiple words, to ensure ranking accuracy.
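To make the PageRank idea concrete, here is a minimal sketch of personalized PageRank in which keyword similarity acts as the teleport (restart) distribution over a citation graph. This is an illustration under assumptions, not the project's exact rule: the README does not spell out how "distributed once" is implemented, and the function name, the damping factor of 0.85, and the dict-based inputs are all hypothetical.

```python
def keyword_pagerank(references, similarity, damping=0.85, iters=100, tol=1e-10):
    """Personalized PageRank seeded by keyword similarity (hypothetical sketch).

    references: dict mapping a paper id to the list of paper ids it cites.
    similarity: dict mapping a paper id to a non-negative keyword similarity.
    Returns (paper_id, score) pairs sorted by descending score.
    """
    papers = (set(references) | set(similarity)
              | {r for refs in references.values() for r in refs})
    total_sim = sum(similarity.get(p, 0.0) for p in papers)
    if total_sim <= 0:
        raise ValueError("at least one paper needs positive similarity")
    # Teleport vector: similarity normalized so it sums to 1.
    teleport = {p: similarity.get(p, 0.0) / total_sim for p in papers}
    rank = dict(teleport)
    for _ in range(iters):
        new = {p: (1 - damping) * teleport[p] for p in papers}
        dangling = 0.0
        for p in papers:
            refs = references.get(p, [])
            if refs:
                # A citing paper passes its score on to the papers it cites.
                share = damping * rank[p] / len(refs)
                for r in refs:
                    new[r] += share
            else:
                dangling += damping * rank[p]
        # Mass from papers with no references is spread along the teleport vector.
        for p in papers:
            new[p] += dangling * teleport[p]
        diff = sum(abs(new[p] - rank[p]) for p in papers)
        rank = new
        if diff < tol:
            break
    return sorted(rank.items(), key=lambda kv: kv[1], reverse=True)
```

For example, if papers a and c both cite b and only a and c match the keywords, b still comes out on top, because it collects the score of both citing papers.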

Installation

Install the required packages using requirements.txt:

pip3 install -r requirements.txt

Datasets

Microsoft Academic Graph

The Microsoft Academic Graph (MAG) is a heterogeneous graph containing scientific publication records, citation relationships between those publications, and fields of study. The schema of the dataset can be found here. Of these dataset files, we use:

  • FieldsOfStudy
  • PaperFieldsOfStudy
  • PaperReferences

The downloaded data can be found on the owl3 server, path.

Springer-83K CS Keywords

CS keywords collected from Springer by Yanghui Pang. The dataset can be found here.

word2vec Model

The word2vec model was trained by Edward Ma on the abstracts of papers in the arXiv dataset. The model can be found here.
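As one common way to compare a multi-word keyword with a Field of Study name using word vectors, the sketch below averages the vectors of the known words in each phrase and then takes the cosine similarity. Averaging is an assumption on our part; the project may combine word vectors differently. A plain dict of word to vector stands in for the trained model here, and the function names are hypothetical.

```python
import math

def phrase_vector(phrase, vectors):
    """Average the vectors of the words in a phrase; None if no word is known.

    Underscores are treated as spaces, matching the keyword input format.
    """
    words = [w for w in phrase.lower().replace("_", " ").split() if w in vectors]
    if not words:
        return None
    dim = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

A `None` return from `phrase_vector` marks a phrase with no in-vocabulary words. This is the situation the Limitations section describes.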

Usage

Build the Pruned MAG Dataset

To speed up the ranking algorithm, we first need to prune out the Fields of Study (FoS) that are not CS keywords.

python3 prune_fos.py

The resulting FoS list will be in pruned_FOS.txt.
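The FoS pruning step can be sketched as a case-insensitive match between FoS names and the CS keyword list. This is not the code in prune_fos.py. We assume the tab-separated layout of the MAG FieldsOfStudy file, with FieldOfStudyId in column 0 and NormalizedName in column 2; check this against the MAG schema before relying on it. `prune_fos` and `read_tsv` are hypothetical helpers.

```python
import csv

def read_tsv(path):
    """Yield rows of a tab-separated MAG dump file (assumed to use no quoting)."""
    with open(path, encoding="utf-8", newline="") as f:
        yield from csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)

def prune_fos(fos_rows, cs_keywords, id_col=0, name_col=2):
    """Keep (FieldOfStudyId, NormalizedName) pairs whose name is a CS keyword."""
    keywords = {k.strip().lower() for k in cs_keywords if k.strip()}
    kept = []
    for row in fos_rows:
        if len(row) > max(id_col, name_col) and row[name_col].lower() in keywords:
            kept.append((row[id_col], row[name_col]))
    return kept
```

For example, `prune_fos(read_tsv("FieldsOfStudy.txt"), open("keywords.txt"))` would apply this to real files. Both file names are placeholders.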

We further need to prune out the papers and references that do not relate to CS.

python3 prune_paper_edge.py

The resulting files are cspapers.txt and pruned_PR.txt.
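The paper and edge pruning step can be sketched as follows, under assumptions. We assume PaperFieldsOfStudy rows of the form (PaperId, FieldOfStudyId, Score) and PaperReferences rows of the form (PaperId, PaperReferenceId). We treat a paper as CS if any of its fields is a pruned FoS, and we keep a citation only when both endpoints are CS papers. This is a guess at the behavior of prune_paper_edge.py, not a copy of it, and the function name is hypothetical.

```python
def prune_papers_and_edges(paper_fos_rows, reference_rows, cs_fos_ids):
    """Return (set of CS paper ids, list of (citing, cited) edges among them)."""
    cs_fos_ids = set(cs_fos_ids)
    # A paper counts as CS if at least one of its fields of study is a CS FoS.
    cs_papers = {row[0] for row in paper_fos_rows if row[1] in cs_fos_ids}
    # Keep only citations whose citing and cited papers are both CS papers.
    edges = [(row[0], row[1]) for row in reference_rows
             if row[0] in cs_papers and row[1] in cs_papers]
    return cs_papers, edges
```

On the full MAG dump these tables are very large, so a real implementation would stream the rows (for example with a TSV reader) rather than load them into lists.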

If you run into problems running prune_fos.py or prune_paper_edge.py, please check the original code, which is more stable.

Perform AcademicRank

The preparation steps only need to be done once. To rank papers for a given set of keywords, run:

python3 academic_rank.py [keyword1,keyword2,...]

where keywords are separated by ',' and the words of a multi-word keyword are joined by '_'. For example:

python3 academic_rank.py computer_science,data_mining
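The argument format above can be decoded with a split on ',' followed by replacing '_' with spaces. The sketch below shows that parsing. The function name is hypothetical, and academic_rank.py may parse its argument differently.

```python
def parse_keywords(arg):
    """Turn 'computer_science,data_mining' into ['computer science', 'data mining'].

    Empty entries (e.g. from a trailing comma) are dropped.
    """
    return [kw.replace("_", " ").strip() for kw in arg.split(",") if kw.strip()]
```

In a script, this would typically be called as `parse_keywords(sys.argv[1])`.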

Visualization

Since academic_rank.py outputs a list of paper IDs, we can look up the paper titles from those IDs using the MAG API. See the methods and examples in visualization.ipynb for more information.
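As an offline alternative to the MAG API, which is not the approach used in visualization.ipynb, titles can also be looked up in the MAG Papers dump if it is available locally. We assume PaperId is in column 0 and PaperTitle is in column 4, as we recall the MAG Papers schema. Verify these indices against the schema. `titles_for_ids` is a hypothetical helper.

```python
def titles_for_ids(paper_rows, ranked_ids, id_col=0, title_col=4):
    """Map ranked paper ids to titles from Papers rows; None if an id is not found.

    The output preserves the order of ranked_ids (i.e. the ranking order).
    """
    wanted = set(ranked_ids)
    titles = {}
    for row in paper_rows:
        if len(row) > max(id_col, title_col) and row[id_col] in wanted:
            titles[row[id_col]] = row[title_col]
    return [(pid, titles.get(pid)) for pid in ranked_ids]
```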

Limitations

The accuracy of this program is not guaranteed. The vocabulary of the word2vec model is not large enough, so in most cases the keyword similarity cannot be calculated. Currently, the program assigns a dummy similarity to keywords that are not in the word2vec model.
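The dummy-similarity fallback can be sketched as a wrapper that calls the real similarity only when every word is in the model vocabulary, and otherwise returns a constant. The all-words-known rule, the dummy value of 0.1, and the function name are hypothetical. The README does not state which value or rule the program actually uses.

```python
def keyword_similarity(keyword, fos_name, similarity_fn, vocab, dummy=0.1):
    """Return similarity_fn(keyword, fos_name) if all words are in vocab, else dummy.

    dummy=0.1 is an illustrative placeholder, not the program's actual value.
    """
    words = (keyword + " " + fos_name).replace("_", " ").lower().split()
    if all(w in vocab for w in words):
        return similarity_fn(keyword, fos_name)
    return dummy
```

Because every out-of-vocabulary pair receives the same constant, such pairs cannot be told apart during ranking. This is why a small vocabulary limits accuracy.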

Author

Contributor: ehzoahis