gogoanime's Introduction

⚠️ Archived ⚠️

As the commit history indicates, this project has been abandoned for quite a while. More information on why can be found in this comment. Long story short, circumventing CAPTCHA on gogoanime was becoming more and more tricky, until it eventually surpassed my expertise.

Assuming you came here looking for a simple and nice download utility, I have personally moved on to ani-cli which has been working great so I can highly recommend it! It does everything I intended this script to do at the core, and has many more neat features!

About

This script extracts the MP4 video file links for episodes found on <www.gogoanime.so> (or wherever this domain currently points to). All episodes of a whole season are extracted automatically. Basic usage is as follows; the search term must not be quoted (see Using the script further down for more details).

./gogoanime.py <search term>

Note that the links are meant to be downloaded with aria2. The links alone are not enough to download an episode: the request must also carry the Referer header, which aria2 can set per link. aria2 also lets us name the downloaded files neatly. The recommended command is shown below (it can be copy-pasted); further relevant flags are listed in the section About downloading.

aria2c -i links.txt -c --auto-file-renaming=false
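In an aria2 input file, each URI sits on its own line and the options for that URI (such as out and referer) follow on lines that start with whitespace. The script's own writer is not shown here, so the following is only a minimal sketch, assuming each episode yields a direct MP4 URL, an output filename and a referer URL; `format_aria2_entry` and `write_links_file` are hypothetical names.

```python
from pathlib import Path
from typing import Iterable, Tuple


def format_aria2_entry(url: str, filename: str, referer: str) -> str:
    """Return one aria2 input-file entry: the URI line plus indented options."""
    # aria2c -i treats indented lines as options for the URI line above them.
    return f"{url}\n  out={filename}\n  referer={referer}\n"


def write_links_file(entries: Iterable[Tuple[str, str, str]], path: Path) -> None:
    """Write (url, filename, referer) tuples to an aria2 input file such as links.txt."""
    with path.open("w", encoding="utf-8") as fh:
        for url, filename, referer in entries:
            fh.write(format_aria2_entry(url, filename, referer))
```

A file written this way can be passed directly to aria2c -i, and aria2 will save each download under the name given by out.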

Installation

Install Python dependencies

The dependencies are managed with pipenv. See https://realpython.com/pipenv-guide/ for a nice introduction to the tool. The first command installs the packages into a new virtual environment; the second installs them globally on your system. The installation method affects how you run the code: with the first variant, the script has to be run inside the virtual environment (for example via pipenv run or pipenv shell). The global variant may therefore be preferable for end users.

pipenv install
pipenv install --system

Install System dependencies

Since we are relying on selenium to extract the links, you also need to install the corresponding browser and driver. We use Firefox here, for which you need the GeckoDriver.

On KDE Neon (which is based on Ubuntu) you can install them as follows.

# apt install firefox firefox-geckodriver

Using the script

The links are saved to a file in a corresponding folder inside $HOME/Videos/Anime. The download location can be changed by modifying the download_folder variable in the code. The script also tries to pick the highest-resolution video listed on the download page. The download itself is not performed by this program; the links file is formatted for use with the aria2 download utility.
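The README does not describe how the download page lists its videos, so this sketch of "pick the highest resolution" assumes the page offers (label, url) pairs whose labels contain a height such as 1080P, e.g. "Download (1080P - mp4)". Both that label format and `pick_highest_resolution` are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical label format: a number followed by "P", e.g. "Download (1080P - mp4)".
RESOLUTION = re.compile(r"(\d+)\s*[pP]\b")


def pick_highest_resolution(options: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the URL whose label names the largest resolution.

    Labels without a resolution are ignored; None is returned when no label matches.
    """
    best_url, best_height = None, -1
    for label, url in options:
        match = RESOLUTION.search(label)
        if match and int(match.group(1)) > best_height:
            best_url, best_height = url, int(match.group(1))
    return best_url
```

Comparing the numbers as integers (rather than as strings) avoids ranking "720" above "1080".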

Usage

./gogoanime.py <search term>

After you press Enter, the script parses the gogoanime search results for the given term and prompts you to select one by typing the number shown next to the entry. Entering a negative number aborts and exits the script; pressing Ctrl+C works as well.
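The selection step can be sketched as a small prompt loop. This is an illustration, not the script's actual code: `choose_result` is a hypothetical name, entries are assumed to be numbered from 1, and `read` is a parameter only so the function can be tested without a keyboard. Ctrl+C needs no special handling because Python raises KeyboardInterrupt on its own.

```python
from typing import Callable, List, Optional


def choose_result(results: List[str], read: Callable[[str], str] = input) -> Optional[str]:
    """Ask until a valid entry number is typed; return None for a negative number."""
    for number, title in enumerate(results, start=1):
        print(f"[{number}] {title}")
    while True:
        answer = read("Select an entry: ")
        try:
            choice = int(answer)
        except ValueError:
            print("Please type a number.")
            continue
        if choice < 0:
            return None  # negative input means "abort"
        if 1 <= choice <= len(results):
            return results[choice - 1]
        print("Number out of range.")
```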

Example

Note that the search term must not be quoted. Internally, all command-line arguments are concatenated, which makes it less of a hassle to search for anime.

./gogoanime.py yahari ore
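Concatenating the arguments presumably amounts to joining them with spaces, so the shell splitting `yahari ore` into two arguments does not matter. A minimal sketch, with `search_term_from_argv` as a hypothetical helper:

```python
from typing import List


def search_term_from_argv(argv: List[str]) -> str:
    """Join every argument after the script name into a single search term."""
    term = " ".join(argv[1:]).strip()
    if not term:
        raise SystemExit("usage: ./gogoanime.py <search term>")
    return term
```

Called as `search_term_from_argv(sys.argv)`, the example above yields the search term "yahari ore".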

About downloading

Once you have extracted the links with this script, run the following command from inside the directory where the links are stored; by default, they are always stored in a file called links.txt.

aria2c -i links.txt

To resume partially downloaded files use the -c or --continue flag.

aria2c -i links.txt -c

However, the following form may be preferable. By default, aria2 downloads existing files again and simply renames the new copy by appending a number. The additional flag tells aria2 not to rename files, so it skips the download instead, because by default existing files are not overwritten (see the --allow-overwrite flag of aria2c).

aria2c -i links.txt -c --auto-file-renaming=false

This last form is the most robust, since partially downloaded episodes will resume their download. At the same time, already downloaded episodes will not be re-downloaded or overwritten.

Fine tuning the download process

These commands are already enough to download the episodes. However, aria2 has many options, and the following flags are a relevant selection for fine-tuning the download further if needed.

aria2c -i links.txt -c -x12 -s12 -k10M --auto-file-renaming=false --max-overall-download-limit=3M

Flags are as follows:

Short  Long                          Description
-i     --input-file                  File in which the URLs are stored
-c     --continue                    Resume partially downloaded files
-x     --max-connection-per-server   Maximum connections to one server per download (default 1)
-s     --split                       Number of connections used to download one file (default 5)
-k     --min-split-size              Minimum size of each range when a file is split (default 20M)
       --auto-file-renaming          Rename the new file if a file with the same name already exists
       --max-overall-download-limit  Limit the overall download speed

About the implementation

Accessing the download page (reached by pressing the yellow download button of an episode on gogoanime) requires you to first complete a CAPTCHA verification. This verification can seemingly only be passed by rendering the page in a real browser (it does not require any human interaction). For this reason, selenium is used to render the page in a headless browser and then extract the download links. The whole process is automated.

The first section of the code, called "Parameters" and located right after the imports, defines two dictionaries that you can, and may have to, tweak: 'headers' and 'cookies'. Setting the User-Agent header is important, because otherwise requests sent from Python are rejected as coming from a bot. Older versions of gogoanime required cookies to access the download page containing the video links; currently, the cookies are not used anywhere in the code.

gogoanime's People

Contributors

nivek77pur

Forkers

devvratmiglani

gogoanime's Issues

Specify download location?

Currently there is no way to specify the root download folder, other than modifying the source code. The aim is to keep this script simple and straightforward, therefore such an option was not included yet.

The ideal solution would be to read an environment variable. An optional flag might make calling the script inconsistent, since all command-line arguments are currently treated as the search term. For example, an environment variable such as GOGO_SAVEDIR could be set. It could then be used as follows.

  1. Set environment variable "globally" via the .profile by adding a line saying export GOGO_SAVEDIR=xxx
  2. Environment variable can be specified when calling the script as GOGO_SAVEDIR=xxx gogoanime.py yyy

With this, the only change to the code would be to read the environment variable if it is set and fall back to the default location otherwise. This keeps the implementation simple.
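The proposed change could look like the sketch below. GOGO_SAVEDIR is the variable suggested in this issue and the default mirrors $HOME/Videos/Anime from the README; `download_root` is a hypothetical helper, and `environ` is a parameter only so the lookup can be tested.

```python
import os
from pathlib import Path
from typing import Mapping

# Default location described in the README.
DEFAULT_DOWNLOAD_FOLDER = Path.home() / "Videos" / "Anime"


def download_root(environ: Mapping[str, str] = os.environ) -> Path:
    """Return $GOGO_SAVEDIR if it is set and non-empty, else the default folder."""
    value = environ.get("GOGO_SAVEDIR")
    return Path(value).expanduser() if value else DEFAULT_DOWNLOAD_FOLDER
```

Treating an empty value like an unset variable means `GOGO_SAVEDIR= gogoanime.py yyy` still falls back to the default.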

Captcha triggered?

When downloading anime, the script very often fails at the "Extracting video links ..." step with the following error message. The page written to /tmp/gogoanime.html shows a button that must be pressed to verify that I'm not a bot.

Could not extract HTML for the following site. Timeout reached.
https://goload.pro/download?id=MTQ4NTMy&typesub=Gogoanime-SUB&title=Shingeki+no+Kyojin%3A+The+Final+Season+Episode+1
Page written to /tmp/gogoanime.html for debugging.
Traceback (most recent call last):
  File "/home/kuni/Videos/Anime/gogoanime/./gogoanime.py", line 206, in <module>
    soup = getDownloadPageHTML(browser, episode)
  File "/home/kuni/Videos/Anime/gogoanime/./gogoanime.py", line 54, in getDownloadPageHTML
    what_is_this = WebDriverWait(browser, timeout).until(
  File "/home/kuni/.local/lib/python3.10/site-packages/selenium/webdriver/support/wait.py", line 89, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:183:5
NoSuchElementError@chrome://remote/content/shared/webdriver/Errors.jsm:395:5
element.find/</<@chrome://remote/content/marionette/element.js:300:16

The script creates geckodriver.log files

A geckodriver.log file is created in whichever directory you run the script from.

This is most likely caused by selenium, and there is surely an option to prevent it from writing a log file.
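Older Selenium releases appear to default the Firefox driver service log to geckodriver.log in the working directory, which would explain this. A minimal sketch, assuming Selenium 4.11 or newer, where the Firefox `Service` accepts `log_output` (on Selenium 4.0 to 4.10, `Service(log_path=os.devnull)` should serve the same purpose). `make_headless_firefox` is a hypothetical name, not the script's current setup code.

```python
import subprocess


def make_headless_firefox():
    """Start headless Firefox with geckodriver's log discarded instead of written to disk.

    Assumes Selenium >= 4.11 (Service(log_output=...)) and geckodriver on PATH.
    """
    # Imported here so this module can still be imported where selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service

    options = Options()
    options.add_argument("-headless")  # render pages without opening a window
    service = Service(log_output=subprocess.DEVNULL)  # no geckodriver.log
    return webdriver.Firefox(options=options, service=service)
```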

Anime search results with no year cause error

Anime that have no release date yet may be listed without a year, which makes the script fail with the following error.

Traceback (most recent call last):
  File "./gogoanime.py", line 93, in <module>
    year = re.search(r'\d+', rls.text).group()
AttributeError: 'NoneType' object has no attribute 'group'
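The traceback shows `re.search` returning None when the release line contains no digits, and `.group()` then being called on it. One way to fix this, sketched with a hypothetical `parse_year` helper that keeps the original `\d+` pattern, is to check the match before using it:

```python
import re
from typing import Optional


def parse_year(text: str) -> Optional[int]:
    """Return the first number in a release line, or None when no year is listed."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None
```

In the loop from the traceback this would become `year = parse_year(rls.text)`, with the caller displaying the entry without a year when the result is None.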

New episodes for seasonals don't have the link included

Fetching the latest episode of a seasonal anime does not seem to put the actual link into the links.txt file. Only the out and referer fields are added, so the file ends up looking like this, where a link is obviously missing before the second out.

...
<link to an episode here>
    out=...
    referer=...
    out=...
    referer=...

Possible workarounds if this occurs:

  1. Delete the last two lines from the file and retry.
  2. If it still does not work, delete the last five lines shown in the above code block (or the last few episodes if you want to be sure).
  3. Worst case: delete the file and retry (which will re-download all the links).
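Before deleting lines by hand, the broken spot can be located automatically. Because indented lines in an aria2 input file belong to the most recent URI line, an option key that repeats before the next URI (such as the second out above) points to a missing link. This is a sketch; `find_missing_links` is a hypothetical helper, not part of the script.

```python
from typing import List, Optional, Set


def find_missing_links(lines: List[str]) -> List[int]:
    """Return 1-based numbers of option lines that seem to lack their own URI line."""
    suspicious: List[int] = []
    seen: Optional[Set[str]] = None  # option keys since the last URI; None before any URI
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if line[0].isspace():
            key = stripped.split("=", 1)[0]
            if seen is None or key in seen:
                suspicious.append(number)
            else:
                seen.add(key)
        else:
            seen = set()  # a URI line starts a new entry
    return suspicious
```

For example, `find_missing_links(Path("links.txt").read_text().splitlines())` (with `pathlib.Path`) returns the line numbers from which the file no longer matches the expected structure.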

Gogoanime episode link is not formulated correctly

Episode link constructed:

https://gogoanime.fi/tate-no-yuusha-no-nariagari-2nd-season-episode-1

Actual episode link:

https://gogoanime.fi/tate-no-yuusha-no-nariagari-season-2-episode-1

Proposed fix: do not construct the episode link from the link on the category page; instead, build it from the link found in the first episode button.
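Following this proposal, once the URL of episode 1 has been read from the first episode button, the other episode URLs only differ in the trailing number. A minimal sketch, assuming URLs always end in `-episode-<n>` as in the example above; `episode_url` is a hypothetical helper and the scraping of the button itself is not shown.

```python
import re


def episode_url(first_episode_url: str, number: int) -> str:
    """Build the URL of episode `number` from the URL of episode 1."""
    url, count = re.subn(r"-episode-\d+$", f"-episode-{number}", first_episode_url.rstrip("/"))
    if count != 1:
        raise ValueError(f"unexpected episode URL: {first_episode_url}")
    return url
```

This keeps whatever slug gogoanime uses (here `season-2` rather than `2nd-season`) instead of guessing it from the category page.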

Refactor writing of webpage for debugging

Lines 63 to 66 as well as 213 to 216 are debug statements that write the webpage to a file for further inspection. These lines should be abstracted into a function to clean up the code and to make it easy to add more such debug statements when needed.

Additional reasons for these debug statements:
The script sometimes cannot extract what it needs from a webpage, which causes it to break and error out. Writing the webpage to a file at the moment of the error makes it quick and easy to inspect what caused the issue. For example, a triggered CAPTCHA will be visible in the written file (opening it in a browser also renders the webpage as the script saw it).
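The refactoring could look like the following sketch. The function name `dump_page_for_debugging` is hypothetical; the default path and the printed messages are copied from the error output quoted in the Captcha issue, and with selenium the HTML would typically come from `browser.page_source`.

```python
from pathlib import Path

# Path taken from the error output quoted in the Captcha issue.
DEBUG_PAGE = Path("/tmp/gogoanime.html")


def dump_page_for_debugging(html: str, url: str, reason: str, path: Path = DEBUG_PAGE) -> Path:
    """Write the page as the script saw it to `path` and report where it went."""
    path.write_text(html, encoding="utf-8")
    print(f"Could not extract HTML for the following site. {reason}")
    print(url)
    print(f"Page written to {path} for debugging.")
    return path
```

Each debug site then shrinks to one call, e.g. `dump_page_for_debugging(browser.page_source, url, "Timeout reached.")`.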

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.