muse-dl

Download PDFs from Project MUSE and stitch them together into a single file using pdftk.

⚠️ WARNING ⚠️

Any downloads you perform with this tool are for your own use. I personally hate reading PDFs in a browser, and this tool lets me read them much more easily offline. It is intended for personal use only.

Installation

Linux / Build

git clone https://github.com/captn3m0/muse-dl.git
cd muse-dl
shards install
shards build
./bin/muse-dl --help

Linux / Download

A Linux x86_64 static build is available in the latest release: https://github.com/captn3m0/muse-dl/releases/latest. Save the file as muse-dl and remember to mark it as executable (chmod +x muse-dl).

Docker

A Docker image is available at captn3m0/muse-dl on Docker Hub. The working directory for the image is set to /data, so you'll need to mount your output directory as /data for it to work. Sample invocations:

# Download the book, and put it in your Downloads directory
docker run -it -v /home/nemo/Downloads:/data captn3m0/muse-dl:edge https://muse.jhu.edu/book/875

# If you have a list.txt file in your Downloads directory, then you can run
docker run -it -v /home/nemo/Downloads:/data captn3m0/muse-dl:edge /data/list.txt

# If you want to keep the temporary files on your host, and not delete them
docker run -it -v /home/nemo/Downloads:/data -v /tmp:/musetmp captn3m0/muse-dl:edge --tmp-dir /musetmp --no-cleanup https://muse.jhu.edu/book/875

Replace edge with the latest version number if you'd like to run a tagged release.
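
The invocations above can be wrapped in a small shell function so the current directory is always the one mounted as /data. This is only a sketch: the function name muse_dl_docker and the MUSE_DL_TAG and DRY_RUN variables are hypothetical and not part of muse-dl, and --rm is a standard docker flag added here so the container is removed after it exits.

```shell
# Hypothetical wrapper (not part of muse-dl): runs the image with the
# current directory mounted as /data, or prints the command in dry-run mode.
muse_dl_docker() {
    # MUSE_DL_TAG is an assumed variable name; it defaults to the edge image.
    tag="${MUSE_DL_TAG:-edge}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of starting a container.
        echo docker run --rm -it -v "$PWD:/data" "captn3m0/muse-dl:$tag" "$@"
    else
        docker run --rm -it -v "$PWD:/data" "captn3m0/muse-dl:$tag" "$@"
    fi
}

# Dry run: show the command that would be executed.
DRY_RUN=1 muse_dl_docker https://muse.jhu.edu/book/875
```

Setting MUSE_DL_TAG=v1.3.1 before the call would select the tagged release instead of edge.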

Docker Images

The following images are available:

  • edge: Runs muse-dl built from the latest master.
  • edge-static: Provides the pre-built static binary built from the latest master.
  • v1.3.1: Runs muse-dl built from that specific release.
  • v1.3.1-static: Provides the pre-built static binary built from that specific release.

Requirements

Please ensure you have pdftk installed, unless you're running via docker.
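
A quick way to check this requirement before running muse-dl is to test whether pdftk is on your PATH. The helper name have_cmd below is hypothetical and only used for illustration; how you install pdftk depends on your system and is not covered here.

```shell
# Hypothetical helper: succeeds if the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd pdftk; then
    echo "pdftk found at: $(command -v pdftk)"
else
    echo "pdftk not found; install it before running muse-dl" >&2
fi
```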

Usage

Usage: muse-dl [--flags] [URL|INPUT_FILE]

URL: A link to a book on the Project MUSE website, eg https://muse.jhu.edu/book/875
INPUT_FILE: Path to a file containing a list of links

    --no-cleanup                     Don't cleanup temporary files
    --tmp-dir PATH                   Temporary Directory to use
    --output FILE                    Output Filename
    --no-bookmarks                   Don't add bookmarks in the PDF
    --clobber                        Overwrite the output file, if it already exists.
    --dont-strip-first-page          Disables first page from being stripped. Use carefully
    --cookie COOKIE                  Cookie-header
    -h, --help                       Show this help

Sample Run

muse-dl https://muse.jhu.edu/book/875
Saved final output to Accommodating Revolutions- Virginia's Northern Neck in an Era of Transformations, 1760-1810.pdf

Alternatively, you can pass an input file containing a list of links (for example, input.txt) as the sole parameter.

muse-dl input.txt

And it will download all the links in that file.
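
As a rough sketch, an input file could be created and used as follows. The one-link-per-line layout is an assumption, since the usage text only says the file contains a list of links. The guard around the muse-dl call is there only so the snippet is safe to paste on a machine where muse-dl is not installed.

```shell
# Assumption: one Project MUSE book URL per line.
cat > input.txt <<'EOF'
https://muse.jhu.edu/book/875
https://muse.jhu.edu/book/475
EOF

# Run muse-dl only if it is available on PATH.
if command -v muse-dl >/dev/null 2>&1; then
    muse-dl input.txt
else
    echo "muse-dl not found on PATH; created input.txt only" >&2
fi
```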

License

Licensed under the MIT License. See LICENSE file for details.

Contributors

captn3m0, pradn

muse-dl's Issues

Switch to GHCR

DockerHub is no longer hosting old images, and GHCR is much better.

--help shows help twice

This is with the newest static build on the releases page.

$ ./muse-dl --help
Usage: muse-dl [--flags] [URL|INPUT_FILE]

URL: A link to a book on the Project MUSE website, eg https://muse.jhu.edu/book/875
INPUT_FILE: Path to a file containing a list of links

    --no-cleanup                     Don't cleanup temporary files
    --tmp-dir PATH                   Temporary Directory to use
    --output FILE                    Output Filename
    --no-bookmarks                   Don't add bookmarks in the PDF
    --input-pdf INPUT                Input Stitched PDF. Will not download anything
    --clobber                        Overwrite the output file, if it already exists. Not compatible with input-pdf
    --cookie COOKIE                  Cookie-header
    -h, --help                       Show this help
Usage: muse-dl [--flags] [URL|INPUT_FILE]

URL: A link to a book on the Project MUSE website, eg https://muse.jhu.edu/book/875
INPUT_FILE: Path to a file containing a list of links

    --no-cleanup                     Don't cleanup temporary files
    --tmp-dir PATH                   Temporary Directory to use
    --output FILE                    Output Filename
    --no-bookmarks                   Don't add bookmarks in the PDF
    --input-pdf INPUT                Input Stitched PDF. Will not download anything
    --clobber                        Overwrite the output file, if it already exists. Not compatible with input-pdf
    --cookie COOKIE                  Cookie-header
    -h, --help                       Show this help

Instructions for use on Catalina 10.15.6 or High Sierra

Hello, would you be able to provide instructions on how to install this on macOS Catalina or High Sierra? I am not very knowledgeable but can follow instructions. I am very bored of downloading chapter by chapter and then stitching, so I would hugely appreciate your script working in my environment. Thank you for your time.

Easier captcha bypass

Maybe automatically open a browser window pointing at a locally running web server that shows the captcha, and submit the captcha on the backend?

Set output directory

Fairly easy to do. Right now we drop everything in the current working directory. It would be nice to have an --output-directory flag so it can be run more easily.

Actually cleanup

The --dont-cleanup flag doesn't make a difference, since we aren't really cleaning up properly.

Fix this on priority.

Crash on downloading

Hi,

Downloading the book

https://muse.jhu.edu/book/475

crashes the software:

Downloaded 1865151
Downloaded 1865152
Downloaded 2183800
Downloaded 2183798
Downloaded 8830
Downloaded 8831
Downloaded 8832
Downloaded 8833
Downloaded 8834
Downloaded 8835
Downloaded 8836
Downloaded 8837
Downloaded 8838
Downloaded 8839
Downloaded 8840
Downloaded 8841
Downloaded 8842
Downloaded 8843
Unhandled exception: (no message) (Muse::Dl::Errors::CorruptFile)
  from /Users/fox/Downloads/muse-dl/muse-dl/src/pdftk.cr:103:9 in 'stitch'
  from /Users/fox/Downloads/muse-dl/muse-dl/src/muse-dl.cr:37:11 in 'dl'
  from /Users/fox/Downloads/muse-dl/muse-dl/src/muse-dl.cr:59:11 in 'run'
  from /Users/fox/Downloads/muse-dl/muse-dl/src/muse-dl.cr:68:1 in '__crystal_main'
  from /usr/local/Cellar/crystal/0.33.0/src/crystal/main.cr:106:5 in 'main_user_code'
  from /usr/local/Cellar/crystal/0.33.0/src/crystal/main.cr:92:7 in 'main'
  from /usr/local/Cellar/crystal/0.33.0/src/crystal/main.cr:115:3 in 'main'

Is it fixable?

Regards,
S.
