dreamcobbler / fiction-dl

A content downloader, capable of retrieving works of (fan)fiction from the web and saving them in a few common file formats.

License: GNU General Public License v3.0

Python 92.99% HTML 6.97% CSS 0.05%

fiction fanfiction scrapper formatter downloader fiction-dl downloading-stories ebook ebook-downloader epub

fiction-dl's People

Forkers

john-trapasso cptspacemanspiff

fiction-dl's Issues

Disable text features?

I'm not good at coding, so I just want to know, how do I comment out/disable some of the text formatting stuff? Specifically the line break editor. There's plenty of older fic that has weirder typography that it automatically reads as 'intended to be line breaks' and replaces it with a

even when it isn't, cutting off text. I want to be able to turn it off.

Can't add gif images

When I tried to convert an HTML file containing single-frame GIFs, the images failed to download, with the message "! Failed to download image...".

I had to install dreamy-utilities manually to get it working

I installed this using pip, but when I tried to download a story from HF I got an error related to dreamy-utilities (I don't remember exactly what). It worked perfectly after I installed dreamy-utilities using pip, though.
On a side note, I was trying to scrape stories from HF when I found this on PyPI.
I'm new to programming and was trying to build an EPUB from the scraped HTML strings (chapters). Now that I've found this, I don't need to go through that anymore. Thanks for creating this tool. I really appreciate it.

Entire program terminates because of an error in one link.

Here is the error (I didn't reduce any details, please don't be offended!)

Creating the extractor...

Extractor created: "ExtractorHentaiFoundry".

Scanning the story...
┌─────────────────┬───────────────────────┐
│ Title: │ Bitchbreaker Lucifer │
├─────────────────┼───────────────────────┤
│ Author: │ Delaware │
├─────────────────┼───────────────────────┤
│ Date published: │ Oct 16, 2020 │
├─────────────────┼───────────────────────┤
│ Date updated: │ Oct 19, 2020 │
├─────────────────┼───────────────────────┤
│ Chapter count: │ 3 │
├─────────────────┼───────────────────────┤
│ Word count: │ 19,945 │
└─────────────────┴───────────────────────┘

Extracting content...

Extracted chapter 3/3: |█████████████████████████████████████████████|

Downloading images...

Found 0 image(s).

Processing content...

Content processed.

Formatting and saving the story...
ERROR:root:Failed to format the stories as HTML.
Traceback (most recent call last):
  File "c:\users\anangaya\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\anangaya\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\anangaya\AppData\Local\Programs\Python\Python38-32\Scripts\fiction-dl.exe\__main__.py", line 7, in <module>
  File "c:\users\anangaya\appdata\local\programs\python\python38-32\lib\site-packages\fiction_dl\__main__.py", line 93, in Main
    Application(
  File "c:\users\anangaya\appdata\local\programs\python\python38-32\lib\site-packages\fiction_dl\Core\Application.py", line 181, in Launch
    self._FormatAndSaveStoryOrPackage(newlyDownloadedStory)
  File "c:\users\anangaya\appdata\local\programs\python\python38-32\lib\site-packages\fiction_dl\Core\Application.py", line 486, in _FormatAndSaveStoryOrPackage
    if not formatter.FormatAndSave(story, filePaths["ODT"]):
  File "c:\users\anangaya\appdata\local\programs\python\python38-32\lib\site-packages\fiction_dl\Formatters\FormatterODT.py", line 215, in FormatAndSave
    with ZipFile(filePath, mode = "a") as outputArchive:
  File "c:\users\anangaya\appdata\local\programs\python\python38-32\lib\zipfile.py", line 1251, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'fiction-dl Downloads\Delaware\Bitchbreaker Lucifer \Bitchbreaker Lucifer .odt'
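
The path in the final line has a space before each separator and before the extension, which suggests the story title ended with a space. Windows generally does not preserve trailing spaces or periods in file and directory names, so a path built from the raw title may point at a directory that was never created under that exact name. Whether this is the actual cause here is an assumption. The sketch below shows one way to clean a title before using it as a path component; `sanitize_component` and its `fallback` parameter are hypothetical names, not part of fiction-dl.

```python
import re

# Characters that Windows does not allow in file or directory names.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_component(name: str, fallback: str = "untitled") -> str:
    """Return a version of `name` usable as a single path component.

    Hypothetical helper for illustration; not fiction-dl's own code.
    """
    cleaned = _FORBIDDEN.sub("", name)
    # Drop leading spaces, and the trailing spaces and periods that
    # Windows would not keep anyway.
    cleaned = cleaned.lstrip(" ").rstrip(" .")
    # Never return an empty name.
    return cleaned or fallback


if __name__ == "__main__":
    print(repr(sanitize_component("Some Title ")))
```

Applying the same cleaned name both when creating the directory and when building the file path keeps the two consistent.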

It happened when I was processing multiple links in a text file, so the program getting terminated is really annoying. Especially because the links are being processed randomly.

It would be great if you could make it report the failure and move on to the next link instead of terminating the entire program. A fix for the error itself would also be welcome.
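
The requested behaviour (report the failure, then continue with the next link) can be sketched as a loop that catches errors per link. This is a minimal illustration, not fiction-dl's implementation; `process_all` and the `download` callable are hypothetical stand-ins for whatever handles a single URL.

```python
import logging
from typing import Callable, Iterable, List, Tuple


def process_all(
    urls: Iterable[str],
    download: Callable[[str], None],
) -> List[Tuple[str, str]]:
    """Run `download` on every URL and return the (url, error) pairs that failed."""
    failures: List[Tuple[str, str]] = []
    for url in urls:
        try:
            download(url)
        except Exception as error:
            # Deliberately broad: one bad link should not stop the batch.
            logging.error('Failed to process "%s": %s', url, error)
            failures.append((url, str(error)))
    return failures
```

Returning the list of failures lets the caller print a summary at the end, so the failed links can be retried later.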

Nifty problem

Most Nifty links do not work for me.
It returns:

Scanning the story...
ERROR:root:Failed to read metadata from the first chapter of the story.
ERROR:root:Failed to scan the story.

Scanning the story...
ERROR:root:List of chapters not found.
ERROR:root:Failed to scan the story.

In my experience, under the same circumstances, some links always work, but most do not.

Unable to use fiction-dl, pops out an error

f-dl : The term 'f-dl' is not recognized as the name of a cmdlet, function, script file, or operable program. Check
the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1

f-dl https://www.fanfiction.net/s/14220835/1/Uzumaki-Shinigami (this is a test url)

  + CategoryInfo          : ObjectNotFound: (f-dl:String) [], CommandNotFoundException
  + FullyQualifiedErrorId : CommandNotFoundException

I tried everything, including upgrading, and it still errors out on any link I provide from other sites. Nifty.org does not work either.

Isn't there supposed to be an f-dl exe file included with this?
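
The PowerShell message above means the shell could not find anything named `f-dl` on `PATH`. pip normally places command launchers in a scripts directory, and that directory is not always on `PATH`. The sketch below, using only the standard library, checks whether a command is visible and lists the directories where pip would usually put launchers for the running interpreter. The function names are hypothetical, and whether the installed package actually provides a launcher named `f-dl` is an assumption.

```python
import os
import shutil
import sysconfig
from typing import List, Optional


def find_command(name: str) -> Optional[str]:
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def candidate_script_dirs() -> List[str]:
    """Directories where pip usually installs command launchers."""
    dirs = [sysconfig.get_path("scripts")]
    # Per-user installs ("pip install --user") use a separate scheme.
    user_scheme = f"{os.name}_user"
    if user_scheme in sysconfig.get_scheme_names():
        dirs.append(sysconfig.get_path("scripts", user_scheme))
    return dirs


if __name__ == "__main__":
    for command in ("f-dl", "fiction-dl"):
        print(command, "->", find_command(command))
    for directory in candidate_script_dirs():
        print("scripts directory:", directory)
```

If a command prints `None` but a launcher exists in one of the listed directories, adding that directory to `PATH` is the usual remedy.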

Not working for the new version of literotica

It no longer works for Literotica. Please fix it. The channel support feature is going to be missed for a while 😔

minor issue with spaces with ff.net

Something I've noticed is some older fic (pre ~2012 or so) has an issue, where, likely due to the older formatting of the fic, words become mashed together likethis. From what I can see, it's due to the original's site formatting in the html, where basically almost arbitrarily lines break mid-sentence rather than text (but with no

or anything). Due to this, the text is pulled together without a space. An example can be seen here; the html arbitrarily cuts itself in half. The resulting downloads have the words on the ends of the lines smashed together.
Is there any way to remedy this? I've tried tweaking the source, but I only have incredibly rudimentary coding skills so there isn't a ton I can really do.
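
If the cause is as described (a line break in the source is dropped rather than turned into a space), one possible remedy is to replace each single line break with a space while keeping blank lines as paragraph breaks. The sketch below works on plain text and is only an illustration; `join_wrapped_lines` is a hypothetical name, and how fiction-dl actually handles this text is not shown in the source.

```python
import re


def join_wrapped_lines(text: str) -> str:
    """Join hard-wrapped lines with a space, keeping blank-line paragraph breaks."""
    # Blank lines separate paragraphs; handle each paragraph on its own.
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside a paragraph, a line break (plus surrounding whitespace)
    # becomes exactly one space, so adjacent words are not fused.
    joined = [re.sub(r"\s*\n\s*", " ", paragraph).strip() for paragraph in paragraphs]
    return "\n\n".join(paragraph for paragraph in joined if paragraph)


if __name__ == "__main__":
    print(join_wrapped_lines("words become mashed\ntogether"))
```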

Add site Rtenzo.net/rtenzo

Could you add support for rtenzo.net/rtenzo, so that EPUBs can be made from its stories, including their images?

Image duplicates

When converting an HTML file that uses the same image in several places, fiction-dl downloads and saves that image several times, which is unnecessary.
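
One way to avoid the repeated downloads is to remove duplicate URLs before fetching, while keeping the order in which images first appear. The sketch below is a minimal illustration; `unique_in_order`, `download_once`, and the `fetch` callable are hypothetical names, not fiction-dl functions.

```python
from typing import Callable, Dict, Iterable, List


def unique_in_order(urls: Iterable[str]) -> List[str]:
    """Drop repeated URLs, keeping the first occurrence of each."""
    # dict preserves insertion order, so this keeps first-seen order.
    return list(dict.fromkeys(urls))


def download_once(urls: Iterable[str], fetch: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Fetch each distinct URL exactly once and map URL -> content."""
    return {url: fetch(url) for url in unique_in_order(urls)}
```

Every place in the document that refers to the same URL can then look up the single saved copy in the returned mapping.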

Images are not properly added in LOCAL TEXT STORIES.

Fiction-dl downloads all the images, but most of the img tags are missing from the xhtml file(s). So the images are not displayed when reading the ebook.

Unable to download due to author name

I tried to download an article from Quotev.com, but the author's name is in Simplified Chinese, so the output directory could not be created.
Is there any way to skip that problem? Thanks!

[Improvement] Use AO3 official downloads.

AO3 offers downloads in multiple formats, namely AZW3, EPUB, MOBI, PDF, and HTML.
Direct links to these are very easy to construct: https://archiveofourown.org/downloads/<story_id>/a.<extension> (the a here can be any text; by default it is the story's name, but it does not matter).

This has the potential to simplify the downloading and reduce load on ao3 and potential 429 - Too Many Requests errors.
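
The URL pattern described above can be sketched as a small helper. It relies only on the pattern and format list given in this issue; `ao3_download_url` is a hypothetical name, and the story ID in the usage line is a made-up placeholder.

```python
from urllib.parse import quote

# Formats listed in the issue text.
SUPPORTED_EXTENSIONS = ("azw3", "epub", "mobi", "pdf", "html")


def ao3_download_url(story_id: int, extension: str, name: str = "a") -> str:
    """Build a direct download URL following the pattern from the issue."""
    ext = extension.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported extension: {extension!r}")
    # The file name part can be any text; escape it so the URL stays valid.
    return f"https://archiveofourown.org/downloads/{story_id}/{quote(name, safe='')}.{ext}"


if __name__ == "__main__":
    # 12345 is a placeholder ID, not a real story.
    print(ao3_download_url(12345, "epub"))
```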

Cloudflare challenge failure

When downloading fanfiction and a Cloudflare challenge occurs during the request, the request fails and the download process is aborted.

Using mypdf makes that unusable, unless you have a doctorate in how to install that bullshit library on Linux.

Feature Request: Nifty.org HTML chapters.

Seems F-DL only works on plain-text chapters, but unfortunately some authors use HTML pages for each chapter and the extractor doesn't seem to work with those, generating the error:

ERROR:root:Failed to read metadata from the first chapter of the story.
ERROR:root:Failed to scan the story.

Literotica.com updated its layout and broke the extractor

Literotica recently did a small site redesign, and it seems to have broken the extractor. However I try to download (either the story directly or via the author page), I get a "Failed to download a story" error.

[Feature] AO3 series support

Please support AO3 Series, to make downloading an entire series easier.
Adding to this, a feature to combine all downloaded stories into one document would be awesome (maybe a command-line flag that also works when providing a list?).

Root error?

I've been getting this error the past few days with ff.net fics:

ERROR:root:Failed to download page: "[URL]".
ERROR:root:Failed to scan the story.

Is it something on my end? or has ff.net updated something? I've tried updating f-dl but it still keeps happening.

installation error: Could not find a version

system: linuxmint 19.3, pip 9.0.1
command: python3 -m pip install --upgrade fiction-dl
error:

Collecting fiction-dl
Could not find a version that satisfies the requirement fiction-dl (from versions: )
No matching distribution found for fiction-dl

command: python3 -m pip install ficiton-dl
error:

Collecting ficiton-dl
Exception:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/lib/python3/dist-packages/pip/commands/install.py", line 353, in run
    wb.build(autobuilding=True)
  File "/usr/lib/python3/dist-packages/pip/wheel.py", line 749, in build
    self.requirement_set.prepare_files(self.finder)
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 554, in _prepare_file
    require_hashes
  File "/usr/lib/python3/dist-packages/pip/req/req_install.py", line 278, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 465, in find_requirement
    all_candidates = self.find_all_candidates(req.name)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 423, in find_all_candidates
    for page in self._get_pages(url_locations, project_name):
  File "/usr/lib/python3/dist-packages/pip/index.py", line 568, in _get_pages
    page = self._get_page(location)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 683, in _get_page
    return HTMLPage.get_page(link, session=self.session)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 795, in get_page
    resp.raise_for_status()
  File "/usr/share/python-wheels/requests-2.18.4-py2.py3-none-any.whl/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/ficiton-dl/
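
The second traceback ends in a 404 for https://pypi.org/simple/ficiton-dl/, which follows from the misspelled package name in that command. For the first error, an empty "(from versions: )" list can occur when every published release requires a newer Python than the interpreter pip is running under. Whether that is the cause here is an assumption, and the minimum version used below is a hypothetical placeholder, not a value taken from the project. The sketch only shows how to compare the running interpreter against a required minimum.

```python
import sys
from typing import Optional, Tuple

# Hypothetical placeholder; check the project's own requirements for the real value.
ASSUMED_MINIMUM = (3, 8)


def interpreter_satisfies(
    minimum: Tuple[int, int],
    current: Optional[Tuple[int, int]] = None,
) -> bool:
    """Return True if `current` (default: the running Python) is at least `minimum`."""
    if current is None:
        current = (sys.version_info.major, sys.version_info.minor)
    # Tuples compare element by element: (3, 10) >= (3, 8) is True.
    return tuple(current) >= tuple(minimum)


if __name__ == "__main__":
    print("Running Python:", sys.version.split()[0])
    print("Meets assumed minimum:", interpreter_satisfies(ASSUMED_MINIMUM))
```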

Story has 144 threadmarks

Some of them are hidden, and hidden threadmarks are not retrieved. For example, with https://forums.spacebattles.com/threads/this-wont-end-well-30k-isekai.587209/threadmarks the result is 100 chapters. Command used: fiction-dl https://forums.spacebattles.com/threads/this-wont-end-well-30k-isekai.587209/threadmarks

AddToPATH, GetPackageDirectory

I just installed the latest release, and now when I try to do anything I get the following error message:

Traceback (most recent call last):
  File "__main__.py", line 34, in <module>
    from Utilities.Filesystem import AddToPATH, GetPackageDirectory
ImportError: cannot import name 'GetPackageDirectory' from 'Utilities.Filesystem' (C:\Users\Betsybugaboo\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\fiction_dl\Utilities\Filesystem.py)

What do I need to do to resolve this?

Feature Request: Better way to process local text stories

I think it would be better if the HTML were not stored inside a txt file. Instead, fiction-dl could be given a txt file that contains the metadata and the names (or paths) of the HTML files that make up the story's chapters. This would make it easier to create an EPUB of a story that has a lot of chapters.
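
To make the proposal concrete, here is a sketch of how such a file could be read. The format is entirely hypothetical (it is not an existing fiction-dl format): "Key: value" metadata lines, then a blank line, then one chapter file path per line. `parse_manifest` is likewise a made-up name.

```python
from typing import Dict, List, Tuple


def parse_manifest(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Split a manifest into a metadata dict and an ordered list of chapter paths."""
    # Everything before the first blank line is metadata; the rest is chapters.
    header, _, body = text.partition("\n\n")
    metadata: Dict[str, str] = {}
    for line in header.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            metadata[key.strip()] = value.strip()
    chapters = [line.strip() for line in body.splitlines() if line.strip()]
    return metadata, chapters


if __name__ == "__main__":
    example = "Title: Example Story\nAuthor: Someone\n\nchapter01.html\nchapter02.html\n"
    print(parse_manifest(example))
```

Keeping chapters as separate files means that adding one only requires appending a line to the list.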