
ArXiv Paper Reader

Official implementation of the algorithm behind:

YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

The main idea of this work is to simplify and streamline ArXiv paper reading. If you're a visual learner, this code will convert a paper into an engaging video format. If you are on the go and prefer listening, it will also generate an audio version of the paper.

Overview

Teaser image

Here are the main steps of the algorithm:

  1. Download the paper's source code, given its ArXiv ID

  2. Use latex2html or latexmlc to convert the LaTeX code to an HTML page (a minimal sketch of steps 1-2 appears after this list)

  3. Parse the HTML page to extract text and equations, ignoring tables, figures, etc.

  4. If creating a video, also build a map that matches PDF pages to text and text chunks to page blocks.

  5. Split the text into sections and pass them through the OpenAI GPT API to paraphrase, simplify and explain.

  6. Split the GPT-generated text into chunks and convert them to audio using the Google Text-to-Speech API.

  7. Pack all the necessary pieces into a zip file for further video processing.

  8. Using the earlier computed text-block map, create the video with ffmpeg.
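Here is a minimal sketch of steps 1-2, for orientation only; this is not the repository's actual code, and the helper name, the naive main-file detection, and the latexmlc invocation are assumptions:

import subprocess
import tarfile
import urllib.request
from pathlib import Path

def download_and_convert(paper_id: str, workdir: str = ".") -> Path:
    """Fetch the LaTeX source for an ArXiv ID and convert it to an HTML page."""
    out = Path(workdir) / f"{paper_id}_files"
    out.mkdir(exist_ok=True)

    # Step 1: ArXiv serves the paper source at /e-print/<id>.
    tarball = out / "source.tar.gz"
    urllib.request.urlretrieve(f"https://arxiv.org/e-print/{paper_id}", tarball)
    with tarfile.open(tarball) as tf:  # assumes a tarball; some papers ship a single gzipped file
        tf.extractall(out / "source")

    # Step 2: convert LaTeX to HTML (latexmlc shown here; latex2html is the default in this repo).
    main_tex = next((out / "source").glob("*.tex"))  # naive choice of the main .tex file
    subprocess.run(
        ["latexmlc", str(main_tex), f"--destination={out / 'paper.html'}"],
        check=True,
    )
    return out / "paper.html"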

Note 1 The code can create both a long, more detailed version and a short, summarized version of the paper.

Note 2 The long video version will also contain summary blocks after each section.

Note 3 The short video version will contain automatically generated slides summarizing the paper.

Note 4 The code can also upload the generated audio files to your Google Drive, if provided with the proper credentials.
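Note 4 relies on the pydrive2 package from the setup list below; a minimal sketch of such an upload, assuming a client_secrets.json is already configured (not the repository's exact flow):

from pydrive2.auth import GoogleAuth
from pydrive2.drive import GoogleDrive

# Authenticate against Google Drive (opens a browser window for OAuth consent).
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)

# Upload one of the generated audio files (file name taken from the Output section below).
audio = drive.CreateFile({"title": "final_audio.mp3"})
audio.SetContentFile("final_audio.mp3")
audio.Upload()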

Setup

Python Packages

openai, PyPDF2, spacy, tiktoken, pyperclip, google-cloud-texttospeech, pydrive2, pdflatex
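The openai, tiktoken and google-cloud-texttospeech packages correspond to steps 5-6 of the algorithm. Below is a rough sketch of how one section might be paraphrased and voiced; the model name, prompt and voice settings are placeholders rather than the repository's actual settings, and the GPT call uses the pre-1.0 openai interface:

import openai
from google.cloud import texttospeech

openai.api_key = "<your_key>"

def paraphrase(section_text: str) -> str:
    # Step 5: ask GPT to paraphrase, simplify and explain one section.
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Paraphrase, simplify and explain this paper section."},
            {"role": "user", "content": section_text},
        ],
    )
    return resp["choices"][0]["message"]["content"]

def to_speech(text: str, out_path: str) -> None:
    # Step 6: synthesize one text chunk to MP3 with Google Cloud Text-to-Speech.
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(out_path, "wb") as f:
        f.write(response.audio_content)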

How to run

# to create audio, both short and long, and prepare for video creation

python main.py --verbose --include_summary --create_short --create_video --openai_key <your_key> --paperid <arxiv_paper_id> --l2h

The default LaTeX conversion tool, latex2html, sometimes fails; in that case, remove --l2h to fall back to latexmlc. Also, by default the code processes the whole paper up to the references. If you want to stop earlier, pass --stop_word "experiments" (e.g., to stop before the Experiments section).

Output

<arxiv_paper_id>_files/
├── final_audio.mp3
├── final_audio_short.mp3
├── abstract.txt
├── zipfile-<time_stamp>.zip
├── ...
├── extracted_orig_text_clean.txt
├── original_text_split_pages.txt
├── original_text_split_sections.txt
├── ...
├── gpt_text.txt
├── gpt_text_short.txt
├── gpt_verb_steps.txt
├── ...
└── slides/
    ├── slide1.pdf
    └── ...

The output directory contains, among other things, the generated audio files, slides, the extracted original text, and the GPT-generated output, split across pages or sections. It also contains zipfile-<time_stamp>.zip, which holds the data needed for video generation.

# to extract only the original text from ArXiv paper, without any GPT/audio/video processing

python main.py --verbose --extract_text_only --paperid <arxiv_paper_id>

Now, we are ready to generate the video:

# to generate video based on the results from above

python makevideo.py --paperid <arxiv_paper_id>

Output

output_<time_stamp>/
├── output.mp4
├── output_short.mp4
├── ...

The output directory now contains two video files, one for the long version and one for the short version of the video.
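For context, pairing one rendered page image with its audio chunk into a video segment can be done with an ffmpeg call along these lines (a sketch only; makevideo.py's actual block highlighting, concatenation and file names are not shown, and the file names below are hypothetical):

import subprocess

def page_to_segment(page_png: str, audio_mp3: str, out_mp4: str) -> None:
    # Loop a still page image for the duration of its narration, then stop.
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-loop", "1", "-i", page_png,   # repeat the still image
            "-i", audio_mp3,                # narration for this chunk
            "-shortest",                    # end when the audio ends
            "-c:v", "libx264", "-pix_fmt", "yuv420p",
            "-c:a", "aac",
            out_mp4,
        ],
        check=True,
    )

In the actual pipeline, segments like these end up combined into the output.mp4 and output_short.mp4 files listed above.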
