cbenge509 / arxiv-ai-analysis

A visualization experience of AI/ML academic papers hosted on arXiv, built as project work for the University of California, Berkeley MIDS program (W209, Data Visualization).

Home Page: https://victorious-plant-0cdd0420f.azurestaticapps.net/

License: MIT License

Jupyter Notebook 19.51% Python 0.10% HTML 80.39%

arxiv-ai-analysis's Introduction

arXiv.org AI/ML Analysis

U.C. Berkeley, Masters in Information & Data Science program - datascience@berkeley
Summer 2020, W209 - Data Visualization - Andrew Reagan, PhD - Section 4


Description

This repo contains the draft work for the visualization of AI/ML research papers catalogued on arXiv.org for calendar years 1993 through 2019. Categories under consideration have been limited to:

  • Computer Science: Artificial Intelligence [cs: AI]
  • Computer Science: Machine Learning [cs: LG]
  • Statistics: Machine Learning [stat: ML]
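
The scope described above (three categories, calendar years 1993 through 2019) could be expressed as a simple filter. This is a minimal sketch, not the repository's actual preprocessing code: the record layout (`id`, `categories` as a space-separated string, `year`) and the function names are hypothetical, and the category codes assume the standard arXiv identifiers `cs.AI`, `cs.LG`, and `stat.ML`.

```python
from typing import Any, Dict, Iterable, List

# Standard arXiv identifiers assumed for the three categories listed above.
TARGET_CATEGORIES = {"cs.AI", "cs.LG", "stat.ML"}
FIRST_YEAR, LAST_YEAR = 1993, 2019


def in_scope(paper: Dict[str, Any]) -> bool:
    """Return True if the paper has a target category and falls in the year range.

    The 'categories' and 'year' keys are hypothetical field names.
    """
    codes = set(str(paper["categories"]).split())
    year = int(paper["year"])
    return bool(codes & TARGET_CATEGORIES) and FIRST_YEAR <= year <= LAST_YEAR


def filter_papers(papers: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the papers that are in scope, preserving input order."""
    return [paper for paper in papers if in_scope(paper)]


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"id": "a", "categories": "cs.LG stat.ML", "year": 2015},
        {"id": "b", "categories": "math.CO", "year": 2015},
        {"id": "c", "categories": "cs.AI", "year": 2020},
    ]
    print([paper["id"] for paper in filter_papers(sample)])
```

A paper cross-listed in several categories is kept as long as at least one of them is in the target set.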

There are two external visuals for this project:


This project leverages the following visualization frameworks:


Highlight of key files included in this repository:

  • ArXiV AI & ML Analytics - Midterm.pptx: Midterm presentation for arXiv AI & ML Analysis solution
  • w209_assignment_2__cris_benge.pdf: Assignment 2, covering thorough review and initial hypothesis testing of arXiv data.
  • load_base_data.py: Processes the base arXiv categories data, storing the output into a single Pandas.DataFrame (HDF5 file)
  • refine_data_for_analysis.py: Processes the consolidated (but raw) arXiv categories data, generating the final analysis output data (CSV file)
  • utils/preprocessing.py: Utility class; used for loading the raw arXiv data and generating the processed analysis dataset
  • plot_utils.py: Utility class; used for generating various plots in EDA
  • Exploratory Data Analysis.ipynb: Jupyter Notebook demonstrating the basic exploratory data analysis performed
  • Clustering and Topic Modeling.ipynb: Jupyter Notebook containing the walk-through for clustering and topic modeling of the arXiv dataset

Visualization Samples




References

The data comes from the arxiv_archive repo, and all due credit for that work goes to:

Geiger, R. Stuart (2020). ArXiV Archive: A Tidy and Complete Archive of Metadata for Papers on arxiv.org. doi | url.


License

Licensed under the MIT License. See LICENSE file for more details.

arxiv-ai-analysis's People

Contributors

cbenge509, anuyadavberkeley, dependabot[bot]

arxiv-ai-analysis's Issues

Import Error

I used Python 3.8.18 and successfully installed all the packages in requirements.txt with pip. However, importing the bokeh package raises an import error:

ImportError: cannot import name 'Markup' from 'jinja2' (c:\Users\29657\anaconda3\envs\datavis\lib\site-packages\jinja2\__init__.py)
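
This error typically appears when an older bokeh release, which still does `from jinja2 import Markup`, is installed next to Jinja2 3.1.0 or later. Jinja2 removed that re-export in 3.1.0, and `Markup` now has to be imported from `markupsafe`. The sketch below only diagnoses the installed Jinja2 version. The helper names are hypothetical, and whether pinning `jinja2<3.1` or upgrading bokeh is the better fix for this repository is an assumption that has not been verified against its code.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '3.1.2' into (3, 1, 2), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def jinja2_exports_markup(version: str) -> bool:
    """Jinja2 releases before 3.1.0 still re-export Markup."""
    return parse_version(version) < (3, 1)


def installed_jinja2_version() -> Optional[str]:
    """Return the installed Jinja2 version, or None if it is not installed."""
    try:
        return metadata.version("jinja2")
    except metadata.PackageNotFoundError:
        return None


def diagnose(version: Optional[str]) -> str:
    """Describe whether `from jinja2 import Markup` can work for this version."""
    if version is None:
        return "jinja2 is not installed"
    if jinja2_exports_markup(version):
        return f"jinja2 {version} still exports Markup; the import should work"
    return (
        f"jinja2 {version} no longer exports Markup; "
        "pin jinja2<3.1 or upgrade bokeh"
    )


if __name__ == "__main__":
    print(diagnose(installed_jinja2_version()))
```

If the diagnosis points at Jinja2, one possible workaround is `pip install "jinja2<3.1"` inside the same environment. Upgrading bokeh instead may require changes to the plotting code.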
