bitbybyte / fantiadl

284.0 8.0 52.0 64 KB

Download posts and media from Fantia

License: MIT License

Python 100.00%

fantia

fantiadl's Issues

Set post date as modified date on post files

A crucial (in my opinion) function that is hopefully easy to implement, either as default or with a parameter.
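
A minimal sketch of how this request could work, assuming the post date is available as an ISO 8601 string with a UTC offset (the exact field and format Fantia returns are not shown here, so `posted_at` is a hypothetical input). It only uses `os.utime` from the standard library:

```python
import os
from datetime import datetime


def set_modified_time(path, posted_at):
    """Set the access and modification time of `path` to the post date."""
    # Assumption: posted_at looks like "2020-06-14T12:00:00+09:00".
    timestamp = datetime.fromisoformat(posted_at).timestamp()
    os.utime(path, (timestamp, timestamp))
    return timestamp
```

This would be called once per file after its download finishes, either by default or behind a parameter.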

Images embedded in posts aren't downloaded in the highest resolution available

Example posts (500 Yen plan):

Resolution differences (downloaded with fantiadl vs highest resolution):

Landscape: 800x565 vs 5016x3541
Portrait: 800x1130 vs 2508x3541

PS: I'd highly appreciate it if you would release a new binary. Thanks a lot for your work!

Files with the same filename are overwritten by the latest downloaded one

In models.py, FantiaDownloader.perform_download() does not handle the case where files have the same name but different sizes.

def perform_download(self, url, filename, server_filename=False):
        """Perform a download for the specified URL while showing progress."""
        request = self.session.get(url, stream=True)

        if request.status_code == 404:
            self.output("Download URL returned 404. Skipping...\n")
            return

        request.raise_for_status()

        if server_filename:
            filename = os.path.join(os.path.dirname(filename), os.path.basename(unquote(request.url.split("?", 1)[0])))

        file_size = int(request.headers["Content-Length"])
        if os.path.isfile(filename):
            if os.stat(filename).st_size == file_size:
                self.output("File found (skipping): {}\n".format(filename))
                return
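            # Proposed change: a file with the same name but a different size
            # already exists, so pick a new name instead of overwriting it.
            else:
                directory, basename = os.path.split(filename)
                while os.path.isfile(filename):
                    basename = "_" + basename
                    filename = os.path.join(directory, basename)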

        self.output("File: {}\n".format(filename))

        downloaded = 0
        with open(filename, "wb") as file:
            for chunk in request.iter_content(self.chunk_size):
                downloaded += len(chunk)
                file.write(chunk)
                done = int(25 * downloaded / file_size)
                percent = int(100 * downloaded / file_size)
                self.output("\r|{0}{1}| {2}% ".format("\u2588" * done, " " * (25 - done), percent))
        self.output("\n")

Thanks!

Better path sanitizing for Windows

Trailing spaces, reserved words.
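
One possible shape for this, as a sketch rather than fantiadl's actual sanitizer: the function name and the replacement character are assumptions. It handles forbidden characters, trailing spaces and dots, and the reserved device names on Windows:

```python
import re

# Device names that Windows reserves, with or without an extension.
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM{}".format(i) for i in range(1, 10)}
    | {"LPT{}".format(i) for i in range(1, 10)}
)


def sanitize_windows_name(name, replacement="_"):
    """Return a single path component that is safe to create on Windows."""
    # Replace characters Windows forbids in file names, plus control characters.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', replacement, name)
    # Windows drops trailing spaces and dots, so strip them up front.
    cleaned = cleaned.rstrip(" .")
    # Reserved names are matched on the part before the first dot.
    stem = cleaned.split(".", 1)[0]
    if stem.upper() in WINDOWS_RESERVED:
        cleaned = replacement + cleaned
    return cleaned or replacement
```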

About filename

It would be better to offer a filename option: save the original image filename or a number (0, 1, 2, ...).

Force file extensions

.jpg, .png, .gif, .mp4, .webm

Do a dictionary lookup before calling guess_extension().
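
The lookup described above could be sketched like this. The table and function names are illustrative, not taken from fantiadl; `mimetypes.guess_extension` is the standard library fallback:

```python
import mimetypes

# Preferred extensions for the types listed above, checked before mimetypes.
PREFERRED_EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "video/mp4": ".mp4",
    "video/webm": ".webm",
}


def extension_for(mimetype):
    """Return a file extension for a MIME type, preferring the fixed table."""
    # Drop parameters such as "; charset=..." and normalize case.
    essence = mimetype.split(";", 1)[0].strip().lower()
    if essence in PREFERRED_EXTENSIONS:
        return PREFERRED_EXTENSIONS[essence]
    # Fall back to the standard library; this may return None.
    return mimetypes.guess_extension(essence)
```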

"No valid input provided"

I already have Requests and Python 3.7 installed, but it still gives me "Error: No valid input provided".

Fee Content

Is it possible to download from fantia.jp/product?
Also, it sometimes shows the text "post content not available on current plan". So can I only download free-plan photos/videos?

?

Invalid session cookie

Getting following error message
Error: Invalid session. Please verify your session cookie.

I am receiving the same error when I use the cookie text file exported by the cookie.txt extension while logged in. I have tried the Firefox and Chrome browsers and tried entering the session ID from devtools directly when prompted; all methods lead to the same error.

Did something change, or is it user error?

Can the folder be automatically renamed with the title of the post?

Currently the folder name is the post number. Could the downloader automatically name the folder after the post title instead of the number, for a better look?
Thanks

Files with the same name on the same page are overwritten

For articles with multiple billing plans on the same page, the owner can upload different files with the same filename, but in that case only the last file on the page is saved.
I can work around this by using use-server-filenames, but please do not overwrite them if possible.
Thank you.
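
A small sketch of one way to avoid the overwrite, assuming a numbered suffix is acceptable. The helper name and the " (1)" naming scheme are made up for illustration:

```python
import os


def unique_path(path):
    """Return `path`, or a numbered variant if `path` already exists."""
    if not os.path.exists(path):
        return path
    root, ext = os.path.splitext(path)
    counter = 1
    # Try "name (1).ext", "name (2).ext", ... until a free name is found.
    while os.path.exists("{} ({}){}".format(root, counter, ext)):
        counter += 1
    return "{} ({}){}".format(root, counter, ext)
```

A downloader would call this only after deciding the existing file is a different file (for example, when the sizes differ), so that reruns still skip files that are already complete.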

Download only from fanclubs backed with paid plans

Error: Invalid session. Please verify your session cookie

Hello, I am having an issue where it doesn't recognize the cookie.
I have deleted the cookies and logged in again to create a new one, but it still does not recognize the valid cookie.

Thanks in advance.

Not downloading all images if there's two posts with the same title

Command line: fantiadl_v1.7.exe -c [session cookie] -o [output folder] -r -x -t https://fantia.jp/fanclubs/5744/posts

Example URL: https://fantia.jp/posts/658107

On the page above there are two posts with the same title for the paid plan. In this case fantiadl v1.7 only downloads the first set of images for the paid plan and not the second one because files with the same name already exist. To work around this issue I have to pause fantiadl, move the images of the first set to another folder and resume to get the second set.

Command prompt output:
─────────────────
Downloading fanclub 5744...
Collecting fanclub posts...
Collected 98 posts.
Downloading post 658107...
File: .\かるたも\658107\thumb.png
|█████████████████████████| 100%
File: .\かるたも\658107\フリープラン\0.png
|█████████████████████████| 100%
File: .\かるたも\658107\フリープラン\1.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\0.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\1.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\2.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\3.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\4.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\5.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\6.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\7.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\8.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\9.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\10.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\11.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\12.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\13.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\14.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\15.png
|█████████████████████████| 100%
File: .\かるたも\658107\感謝プラン\0.png
|█████████████████████████| 100%
Traceback (most recent call last):
File "fantiadl.py", line 108, in
File "models.py", line 175, in download_fanclub
File "models.py", line 387, in download_post
File "models.py", line 317, in download_post_content
File "models.py", line 298, in download_photo
File "models.py", line 286, in perform_download
FileExistsError: [WinError 183] Cannot create a file when that file already exists: '.\かるたも\658107\感謝プラン\0.incomplete' -> '.\かるたも\658107\感謝プラン\0.png'
[12784] Failed to execute script fantiadl

Note: I shortened the path of the files.

PS: Thanks a lot for coding this tool!

another "No Valid Input Provided" using -c or --cookie

Sorry for bothering you, but I have a login problem in v1.7.
I already logged in to Fantia from Chrome to get the session ID, but I still get "No valid input provided" even after using -c cookies.txt / -c "insert session id here" / --cookie cookies.txt / --cookie "insert session ID here".
I already checked the README and followed the guide from a previous issue with the same problem, but it still happens.
Is this a bug, or is there a step I am missing?
Thanks for reading this.

Handle terminal encoding on output

Hi, I'm using the latest build of fantiadl and encountered this issue:

Encountered an error downloading URL. Skipping...
Traceback (most recent call last):
  File "fantiadl.py", line 88, in <module>
    downloader.download_fanclub(fanclub, cmdl_opts.limit)
  File "F:\fantia\models.py", line 142, in download_fanclub
    self.download_fanclub_metadata(fanclub)
  File "F:\fantia\models.py", line 122, in download_fanclub_metadata
    self.perform_download(header_url, header_filename, server_filename=self.use_server_filenames)
  File "F:\fantia\models.py", line 210, in perform_download
    self.output("File: {}\n".format(filename))
  File "F:\fantia\models.py", line 68, in output
    sys.stdout.write(output)
UnicodeEncodeError: 'gbk' codec can't encode character '\u2615' in position 8: illegal multibyte sequence

It seems there is no exception handling for when a filename contains characters that are illegal in a certain charset. It would be much appreciated if filenames with illegal characters could be automatically renamed or normalized.

Could you look into the issue? Thank you very much!
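
As a rough illustration of handling the terminal encoding, the text can be round-tripped through the stream's encoding with `errors="replace"` before writing. This is a sketch under the assumption that replacing unencodable characters with "?" is acceptable; `safe_write` is a hypothetical helper, not fantiadl's `output` method:

```python
import sys


def safe_write(text, stream=None):
    """Write text, replacing characters the stream's encoding cannot represent."""
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # Round-trip through the target encoding so unencodable characters
    # become "?" instead of raising UnicodeEncodeError.
    stream.write(text.encode(encoding, errors="replace").decode(encoding))
```

Only the console output is altered this way; the filename used on disk stays unchanged.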

Windows AV reports trojan in unmodified fantiadl_v1.7.exe

Trojan:Win32/Glupteba!ml
Trojan:Win32/Wacatac.D2!ml

"no valid input provided"

Maybe I don't know how to use this program, but every time I click on the executable it closes itself, and before that happens it shows the message "no valid input".

Skip posts with already downloaded content as a whole instead of each file on its own

When rerunning the program with the same parameters (download all paid content for the current month) after the last run failed and only downloaded some posts, or after a new creator has been added, the download takes forever, especially on posts with larger image galleries, because every file is requested and then skipped if it is already on disk.

Would it be possible to keep track of already downloaded posts, or at least check whether a post has been modified since the last download?
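
One way to keep track of finished posts is a small JSON file that is consulted before any request is made. The class below is a hypothetical sketch (the file name and layout are assumptions), not an existing fantiadl feature:

```python
import json
import os


class DownloadLog:
    """Remember completed post IDs in a JSON file between runs."""

    def __init__(self, path):
        self.path = path
        self.completed = set()
        if os.path.isfile(path):
            with open(path, "r", encoding="utf-8") as handle:
                self.completed = set(json.load(handle))

    def is_done(self, post_id):
        return str(post_id) in self.completed

    def mark_done(self, post_id):
        # Persist after every post so an interrupted run loses nothing.
        self.completed.add(str(post_id))
        with open(self.path, "w", encoding="utf-8") as handle:
            json.dump(sorted(self.completed), handle)
```

A post would be marked done only after all of its files were written, so a failed run is retried on the next pass. Detecting posts modified since the last download would need an extra stored timestamp, which this sketch leaves out.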

any x86 versions?

any x86 version?

it keeps saying no valid input provided

Max retries exceeded with url: /api/v1/me

python fantiadl.py -c 6c14... https://fantia.jp/fanclubs/6561
Traceback (most recent call last):
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 696, in urlopen
    self._prepare_proxy(conn)
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 964, in _prepare_proxy
    conn.connect()
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connection.py", line 359, in connect
    conn = self._connect_tls_proxy(hostname, conn)
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connection.py", line 496, in _connect_tls_proxy
    return ssl_wrap_socket(
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\util\ssl_.py", line 432, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls)
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\util\ssl_.py", line 474, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock)
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 1040, in _create
    self.do_handshake()
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\adapters.py", line 439, in send
    resp = conn.urlopen(
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\util\retry.py", line 573, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='fantia.jp', port=443): Max retries exceeded with url: /api/v1/me (Caused by ProxyError('Cannot connect to proxy.', FileNotFoundError(2, 'No such file or directory')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\15516\Desktop\fantia\fantiadl.py", line 77, in <module>
    downloader = models.FantiaDownloader(session_arg=session_arg, dump_metadata=cmdl_opts.dump_metadata, parse_for_external_links=cmdl_opts.parse_for_external_links, download_thumb=cmdl_opts.download_thumb, directory=cmdl_opts.output_path, quiet=cmdl_opts.quiet, continue_on_error=cmdl_opts.continue_on_error, use_server_filenames=cmdl_opts.use_server_filenames, mark_incomplete_posts=cmdl_opts.mark_incomplete_posts, month_limit=cmdl_opts.month_limit)
  File "C:\Users\15516\Desktop\fantia\models.py", line 75, in __init__
    self.login()
  File "C:\Users\15516\Desktop\fantia\models.py", line 99, in login
    check_user = self.session.get(ME_API,verify =  False)
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\15516\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\adapters.py", line 510, in send
    raise ProxyError(e, request=request)
requests.exceptions.ProxyError: HTTPSConnectionPool(host='fantia.jp', port=443): Max retries exceeded with url: /api/v1/me (Caused by ProxyError('Cannot connect to proxy.', FileNotFoundError(2, 'No such file or directory')))

I am sorry to trouble you. And here is the error information. My python version is 3.9.0.

And I use global system proxy to access the website.

When calling login, it shows
requests.exceptions.ProxyError: HTTPSConnectionPool(host='fantia.jp', port=443): Max retries exceeded with url: /api/v1/me (Caused by ProxyError('Cannot connect to proxy.', FileNotFoundError(2, 'No such file or directory')))

Is that caused by proxy software?

I am looking forward to your earliest reply, thank you!

https://fantia.jp/posts/365206

https://fantia.jp/posts/365206
gif

External link detection is wonky

Downloading post 131033...
Traceback (most recent call last):
  File "fantiadl.py", line 66, in <module>
    downloader.download_fanclub_posts(fanclub, cmdl_opts.limit)
  File "/dev/shm/models.py", line 78, in download_fanclub_posts
    self.download_post(post_id)
  File "/dev/shm/models.py", line 169, in download_post
    self.parse_external_links(post_description, os.path.abspath(post_directory))
  File "/dev/shm/models.py", line 174, in parse_external_links
    link_matches = self.EXTERNAL_LINKS_RE.findall(post_description)
TypeError: expected string or bytes-like object
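
The traceback suggests the post description was not a string (most likely None) when it reached `findall`. A guard along these lines would avoid the crash; the pattern here is a stand-in for illustration, since the real `EXTERNAL_LINKS_RE` is not shown:

```python
import re

# Hypothetical pattern; fantiadl's actual EXTERNAL_LINKS_RE may differ.
LINK_RE = re.compile(r"https?://[^\s<>\"']+")


def find_links(post_description):
    """Return all links in a description, treating non-strings as empty text."""
    if not isinstance(post_description, str):
        return []
    return LINK_RE.findall(post_description)
```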

Download club icon, header, and custom background

Dump external links in metadata

Produce an output of external links (e.g. Mega) that can be easily processed with another downloader. Find out if JDownloader or another downloader has a way to import links with an assigned directory.

Error: no valid input provided

filename definition

Thanks for the great tool! I have a feature request: could you please add an option to set the filename to the post number?
ex: https://fantia.jp/posts/******/post_content_photo/[*******] or https://cc.fantia.jp/uploads/post_content_photo/file/[*******]

unable to open the executable programme

Hi,

For some reason, whenever I open 'fantiadl_v1.3.3.exe', it always returns the message 'Error: No valid input provided'. While I am able to execute the Python source code directly (i.e. using Visual Studio or Spyder), I just cannot open 'fantiadl_v1.3.3.exe'. How can I resolve this?

Cheers

Identify when content is not available

Rename folders or otherwise if content is not available on your currently backed plan.

Better rate limiting

I got error 429 (Too Many Requests) for a bit, so I would suggest adding rate limiting, or increasing the existing limits a little, to avoid this.
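
A sketch of what backing off on 429 could look like, assuming the response object exposes `status_code` and `headers` the way a requests response does. The function name and default delays are illustrative only:

```python
import time


def call_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it stops returning HTTP 429, waiting longer each time."""
    for attempt in range(max_attempts):
        response = fetch()
        if response.status_code != 429:
            return response
        if attempt + 1 < max_attempts:
            # Honor Retry-After when the server sends a number of seconds;
            # otherwise double the delay on every attempt.
            retry_after = response.headers.get("Retry-After", "")
            if retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
    raise RuntimeError("still rate limited after {} attempts".format(max_attempts))
```

With a requests session this might be used as `call_with_backoff(lambda: session.get(url))`.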

File flagged as a threat by Windows Defender

Windows Defender flags fantiadl as a threat (Program:Win32/Uwamson.A!ml). Is this actually safe to use?

[BUG] Unable to download due to invalid session

Getting "Error: Invalid session. Please verify your session cookie" every time I attempt to run fantiadl.
Things were working the last time I ran this back in December.

command run: .\fantiadl_v1.8.exe -i -s -d 2022-01 -c cookies.txt https://fantia.jp/fanclubs/<FAN_CLUB_ID>
tested: .\fantiadl_v1.8.exe -i -s -d 2022-01 -c <_SESSION_ID VALUE> https://fantia.jp/fanclubs/<FAN_CLUB_ID>
I have also logged out of fantia and logged back in to generate a new cookie/_session_id to test with.

Traceback when downloading certain fanclub(s)

Downloading fanclub 36599...
File found (skipping): .\fanclub\tokorot\6b70d7db-1fe2-459c-8065-54b4b088617b.png
Traceback (most recent call last):
File "fantiadl.py", line 88, in
File "models.py", line 139, in download_fanclub
File "models.py", line 126, in download_fanclub_metadata
File "models.py", line 191, in perform_download
File "site-packages\requests\models.py", line 940, in raise_for_status
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://fantia.jp/images/fallback/fanclub/icon_image/_default5.png
[11932] Failed to execute script fantiadl

I've seen this happen with other fanclubs, but I didn't take note of the IDs, so this is my only example.

https://fantia.jp/fanclubs/4071

@bitbybyte @Suika @hinata @utilael @hibikidesu
https://fantia.jp/posts/205729

Where can I find this number?

Galleries without titles

Downloading post 132271...
Traceback (most recent call last):
  File "fantiadl.py", line 62, in <module>
    downloader.download_fanclub_posts(fanclub, cmdl_opts.limit)
  File "/tmp/fantiadl/models.py", line 75, in download_fanclub_posts
    self.download_post(post_id)
  File "/tmp/fantiadl/models.py", line 157, in download_post
    self.download_post_content(post, post_directory)
  File "/tmp/fantiadl/models.py", line 134, in download_post_content
    gallery_directory = os.path.join(post_directory, sanitize_for_path(photo_gallery_title))
  File "/tmp/fantiadl/models.py", line 172, in sanitize_for_path
    return re.sub(r'[<>\"\?\\\/\*:]', replace, value)
  File "/usr/lib/python3.5/re.py", line 182, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

If value is None, the sanitizer will fail.
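
A guarded version of the sanitizer might look like the following. The character class matches the one in the traceback, while the `replace` default and the "untitled" fallback are assumptions made for this sketch:

```python
import re


def sanitize_for_path(value, replace=" ", default="untitled"):
    """Remove characters invalid in paths; use `default` for None or empty titles."""
    if not value:
        return default
    return re.sub(r'[<>"?\\/*:]', replace, value)
```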

esther rosenthal images

@hibikidesu https://fantia.jp/posts/210318
Download

Error: No valid input provided

Have tried multiple versions.. running the exe gives the error: "no valid input provided" before immediately closing. Trying to run the python file from the zip with cmd yields the same error. I'm on Python 3.10.

can't login with _session_id cookie or cookies.txt

Hello, when I try to connect to my Fantia account with the _session_id cookie or cookies.txt, I get an error telling me that my session cookie is invalid. I have tried waiting, deleting it, and connecting with a new one, but I still get this error. Am I doing something wrong?

Preserve data on post title change

From time to time, creators will change the title of their post.
Due to the way the paths are constructed in fantiadl, that causes file duplication.

The post ID is the only thing that doesn't change.

Solution:

if post ID directory already exists, check files and download new files if size differs
rename old metadata .json and download new metadata as usual
rename directory to new title
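
The rename step of the solution above could be sketched as follows, assuming a hypothetical directory layout of `<post_id>_<title>` (fantiadl's real layout may differ). File size checks and metadata handling are left out:

```python
import os


def locate_post_directory(parent, post_id, title):
    """Return the directory for a post, renaming an older one if the title changed."""
    prefix = "{}_".format(post_id)
    wanted = os.path.join(parent, prefix + title)
    if not os.path.isdir(wanted):
        for entry in os.listdir(parent):
            existing = os.path.join(parent, entry)
            # Same post ID under an old title: rename instead of re-downloading.
            if entry.startswith(prefix) and os.path.isdir(existing):
                os.rename(existing, wanted)
                break
    os.makedirs(wanted, exist_ok=True)
    return wanted
```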

Invalid session despite _session_id Cookie not being expired

As the title says, I keep getting this error now for some reason and I don't understand why as the cookie isn't expired, it started a couple days ago.

I tried logging out and back in but it didn't change anything.

Status code 304 from API "fantia.jp/api/v1/me"

Hi bitbybyte,

On June 14th, 2020, a download was interrupted while I was using the downloader. The program has kept returning "Error: Invalid session" since then.

I tried to troubleshoot it. I reviewed the source code and checked the page "https://fantia.jp/api/v1/me" with the developer tools in Firefox 77.0.1 and Chrome 83.0.4103.106. The page status code is 304 instead of 200.

Did you have the same problem? I am not sure whether Fantia changed something on their server or there is a cache issue on my side.

Thank you and have a nice day,

No empty directories

Skip posts when -m is none and the user has no access to post content.

Error because of invalid session cookie

For over a week fantiadl_v1.8.exe has been displaying the error message "Error: Invalid session. Please verify your session cookie" on my PC although the session ID is correct. Of course, I copied and pasted it. I also re-logged on fantia.jp to get a new one, to no avail. Before that, it worked perfectly.

Screenshot:

Login process now requires reCAPTCHA

As the title says, fantiadl returns "Error: Invalid session" after a password change, and it is not a wrong password either: with a wrong password it throws "Error: Failed to login. Please verify your username and password", but with the new one it just returns "Invalid session" instead.

Tried again on a different network and OS, didn't work.

Could you look into the issue? Thanks in advance!

Where do I put the url?

I've been looking for where to put the url but can't find it.

Metadata check on paid and free plan scrape overwrites

For example, when a free plan user performs a scrape for a club already scraped with a paid plan, metadata will be overwritten. To fix this we can check the number of post_contents available to the scraping user and compare it with the existing metadata. If more are available to the user, overwrite and continue downloading. If more are available in the existing metadata, skip the post entirely.

We can also check which plan status is currently joined, or even simplify this entirely by specifying the type of metadata e.g. metadata_plan0.json.
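
The count comparison could be sketched as below. The `available_contents` key is a hypothetical field for the stored count, and treating a tie as "continue" is a choice made here so that reruns under the same plan still proceed:

```python
import json
import os


def should_download(metadata_path, available_now):
    """Decide whether a fresh scrape should replace the stored metadata.

    `available_now` is the number of post contents the current user can access.
    """
    if not os.path.isfile(metadata_path):
        return True
    with open(metadata_path, "r", encoding="utf-8") as handle:
        stored = json.load(handle)
    # Assumed layout: the stored file records how many contents were accessible.
    return available_now >= stored.get("available_contents", 0)
```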

Can this project set the download directory?

This script is very convenient and it helps me a lot, thank you.
But I cannot find how to set the download directory. For example, you can see this script.

If you edit "config.ini" and type "%artist%/%urlFilename%", then all files are downloaded into the same directory, as in the following pictures.

If you edit "config.ini" to "%artist%/%member_id%/%urlFilename%/", then the script creates a directory for every post, like this.

By default, fantiadl creates a directory for every post, like example 2. Please tell me how to make fantiadl download files like example 1.

Adding arguments for all posts within a specific back number/month

Don't know if possible, but would be really nice to be able to do this if you have a specific month(s) of content you'd like to download instead of the entire discography or manually entering each post you want downloaded.
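
Filtering by month could be as simple as the sketch below, assuming each post is a dict with an ISO 8601 `posted_at` value (a hypothetical field name; the real API response may use another key or format):

```python
from datetime import datetime


def posts_in_month(posts, month):
    """Return posts whose `posted_at` falls in `month`, given as "YYYY-MM"."""
    target = datetime.strptime(month, "%Y-%m")
    selected = []
    for post in posts:
        posted = datetime.fromisoformat(post["posted_at"])
        if (posted.year, posted.month) == (target.year, target.month):
            selected.append(post)
    return selected
```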
