
zippyshare-scraper's Introduction

Check the CHANGELOG for updates.

Zippyshare Scraper:

This is a script to get direct download links to files from zippyshare. If you've used zippyshare to download anything, you know that you have to open the file's page and click the download now button to start the download.

This script extracts the real download link from the page. You can directly feed that link to a downloader to get your download started.

This script is useful when working on remote servers where you don't have access to GUI software.

Reset Zippyshare Uploads:

If you're a zippyshare uploader, you know that zippyshare uploads are taken down if they have not been downloaded in the last 30 days.

You can use this script to reset the last download date without actually downloading the complete upload.

This script initiates the download of the file to test whether the link is working. Because of this, the last downloaded date for the file is also updated.

In this way, you can very easily extend the lifetime of your upload without wasting valuable time and bandwidth.
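
As a rough illustration of the idea, the sketch below opens a direct link, reads only a small probe from the response, and then closes the connection. It uses only the standard library. The function name `touch_download` and the `probe_bytes` parameter are hypothetical and are not taken from the script; whether a partial read is enough to update the last download date is the behavior described above, not something this sketch verifies.

```python
import urllib.request


def touch_download(direct_url, probe_bytes=1024, timeout=30):
    """Start a download, read a small probe, then drop the connection.

    Returns the number of bytes actually read. Hypothetical helper, not
    part of the script itself.
    """
    with urllib.request.urlopen(direct_url, timeout=timeout) as response:
        # Reading a few bytes confirms that the link serves data. Leaving
        # the "with" block closes the connection before the rest of the
        # file is transferred.
        probe = response.read(probe_bytes)
    return len(probe)
```

A return value of 0 would suggest the link served no data, which is a reasonable signal to regenerate the link before trying again.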

Dependencies :

  1. You need a Python 3 environment to execute the script. You can easily install it from here.
  2. Install the Python dependencies:
	pip install -r requirements.txt

Options:

Arg          Value     Description
--in-file    filepath  Path of the file containing zippyshare links to parse.
--out-file   filepath  Path of the file in which generated links will be stored.
--dlc        filepath  Path of a .dlc file. Takes precedence over --in-file.
--filecrypt  link      Link of a filecrypt container page. Note: It should not have a password or captcha.
--engine     js/text   Which engine to use for generating links. js by default. See Engines below for explanation.
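
For readers who want to see how options like these fit together, here is a minimal `argparse` sketch. It is not the script's actual code: the function names `build_parser` and `pick_source` are hypothetical, and the position of `--filecrypt` in the precedence order is an assumption, since only "--dlc over --in-file" is documented.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Generate direct download links from zippyshare pages."
    )
    parser.add_argument("--in-file", metavar="filepath",
                        help="file containing zippyshare links to parse")
    parser.add_argument("--out-file", metavar="filepath",
                        help="file in which generated links will be stored")
    parser.add_argument("--dlc", metavar="filepath",
                        help="path of a .dlc file")
    parser.add_argument("--filecrypt", metavar="link",
                        help="link of a filecrypt container page")
    parser.add_argument("--engine", choices=["js", "text"], default="js",
                        help="engine used for generating links")
    return parser


def pick_source(args):
    """Decide where the input links come from."""
    if args.dlc:
        # Documented: --dlc takes precedence over --in-file.
        return ("dlc", args.dlc)
    if args.filecrypt:
        # Assumption: the real ordering of --filecrypt is not documented.
        return ("filecrypt", args.filecrypt)
    if args.in_file:
        return ("in-file", args.in_file)
    # No source option given: links are entered from the terminal.
    return ("terminal", None)


if __name__ == "__main__":
    print(pick_source(build_parser().parse_args()))
```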

Engines:

History

  • This library used to work by scraping the zippyshare webpage.
  • It parsed the JavaScript code on the page with regex matching to generate the link.
  • Whenever the source code of the site changed even slightly, the regex matchers broke.
  • Hence we ended up with multiple different patterns to cover the forms the site source code can take.
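
The regex approach described above can be sketched roughly as follows. The JavaScript line in `SAMPLE_SCRIPT` is a made-up example of the kind of code such a page might contain, not a copy of the real site source, and `LINK_PATTERN` and `extract_path` are hypothetical names. The sketch also shows why the approach is fragile: any change in the shape of the expression makes the pattern miss.

```python
import re

# Hypothetical page snippet: the link is assembled from two string
# literals and a small arithmetic expression.
SAMPLE_SCRIPT = (
    "document.getElementById('dlbutton').href = "
    '"/d/7DpZTYfi/" + (13971 % 9321 + 13971 % 7) + '
    '"/Ghost.In.The.Shell.S2.x265.7z.003";'
)

# One pattern for one known shape of the script. A real text engine would
# need one such pattern per variant of the site source.
LINK_PATTERN = re.compile(
    r'href\s*=\s*"(?P<prefix>[^"]+)"\s*\+\s*'
    r'\(\s*(?P<a>\d+)\s*%\s*(?P<b>\d+)\s*\+\s*'
    r'(?P<c>\d+)\s*%\s*(?P<d>\d+)\s*\)\s*'
    r'\+\s*"(?P<suffix>[^"]+)"'
)


def extract_path(script_text):
    """Rebuild the download path by evaluating the matched arithmetic."""
    match = LINK_PATTERN.search(script_text)
    if match is None:
        raise ValueError("no known link pattern matched")
    token = (int(match["a"]) % int(match["b"])
             + int(match["c"]) % int(match["d"]))
    return f"{match['prefix']}{token}{match['suffix']}"


if __name__ == "__main__":
    print(extract_path(SAMPLE_SCRIPT))
```

If the site swapped `%` for another operator or reordered the terms, `extract_path` would raise `ValueError` until a new pattern was added.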

Update

  • Instead of parsing the JavaScript using regex matching, the library has switched to executing the JavaScript code.
  • js2py, a pure Python implementation of a JavaScript engine, is used for this.
  • This should make the library more robust.

For now, I am keeping both approaches for getting the download links. These are the two engines:

  • JsEngine
  • TextEngine

Usage :

  1. Input links using an input file:
python zippyshare.py --in-file input_links.txt --out-file links.txt
  2. Input links using a dlc file:
python zippyshare.py --dlc filename.dlc --out-file links.txt
  3. Input links from the terminal:
python zippyshare.py

Examples :

Example of unprocessed link (this type of link will be input): http://www120.zippyshare.com/v/7DpZTYfi/file.html

Example of Direct Downloadable link: http://www120.zippyshare.com/d/7DpZTYfi/4656/Ghost.In.The.Shell.S2.x265.7z.003
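
To make the relationship between the two example links concrete, the sketch below splits a page link into its parts and assembles a direct link of the same shape. The helper names `parse_page_link` and `build_direct_link` are hypothetical, and the middle path segment (`4656` in the example) is simply passed in, since producing it is the job of the engines.

```python
import re
from urllib.parse import urlsplit

# Path shape of an unprocessed link, based on the example above.
PAGE_PATH = re.compile(r"^/v/(?P<file_id>[^/]+)/file\.html$")


def parse_page_link(url):
    """Return (scheme, host, file_id) for a zippyshare page link."""
    parts = urlsplit(url)
    match = PAGE_PATH.match(parts.path)
    if match is None or not parts.netloc.endswith(".zippyshare.com"):
        raise ValueError(f"not a zippyshare page link: {url}")
    return parts.scheme, parts.netloc, match.group("file_id")


def build_direct_link(scheme, host, file_id, token, filename):
    """Assemble a direct link with the same layout as the example above."""
    return f"{scheme}://{host}/d/{file_id}/{token}/{filename}"


if __name__ == "__main__":
    scheme, host, file_id = parse_page_link(
        "http://www120.zippyshare.com/v/7DpZTYfi/file.html"
    )
    print(build_direct_link(scheme, host, file_id, 4656,
                            "Ghost.In.The.Shell.S2.x265.7z.003"))
```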

(Optional) :

  • You can then download from links.txt as follows:

aria2c -i links.txt --file-allocation=none -c --auto-file-renaming=false

wget -nc -i links.txt

  • Or you can download using any other downloader you prefer.

Known Issues :

  • You have to run the script from the same network that you will download the files from; otherwise the links may not work.

  • The direct download links stop working after a few hours (roughly 3-4 hours; the exact period is not known). At that point, rerun the script to get new, working download links to the same files.

  • The script runs into an error when zippyshare serves its "File not exist" page instead of the file page.
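
One possible way to work around the last issue is to check the fetched page for a "missing file" marker before handing it to an engine. The sketch below is an assumption-laden illustration, not a fix that exists in the script: `MissingFileError`, `ensure_file_exists`, and the marker strings are all hypothetical, and the exact wording on the real page may differ.

```python
class MissingFileError(Exception):
    """Raised when the page says the file is gone (hypothetical)."""


# Assumed marker texts; adjust them to whatever the page really shows.
MISSING_MARKERS = ("File not exist", "File does not exist")


def ensure_file_exists(page_html, markers=MISSING_MARKERS):
    """Return the page unchanged, or raise if it looks like a missing-file page."""
    lowered = page_html.lower()
    for marker in markers:
        if marker.lower() in lowered:
            raise MissingFileError(f"page contains marker: {marker!r}")
    return page_html
```

A caller could catch `MissingFileError`, log the dead link, and continue with the next one instead of aborting the whole run.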

License :

This project is licensed under the terms of the MIT license.
