
hect0x7 / jmcomic-crawler-python


Python API for JMComic | Provides a Python API for accessing JMComic (禁漫天堂), supporting both the web and mobile clients | JMComic GitHub Actions downloader 🚀

Home Page: https://jmcomic.readthedocs.io/zh-cn/latest/option_file_syntax/#

License: MIT License

Python 100.00%
18comic crawler github-actions pypi python readthedocs downloader jmcomic

jmcomic-crawler-python's Introduction

Python API For JMComic (禁漫天堂)

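This project wraps a Python API that can be used to crawl JM.

With just a few lines of Python you can download albums from JM to your local machine, with the images already processed.

【Pointer】Tutorial: Downloading JM albums with GitHub Actions

【Pointer】Tutorial: Exporting and downloading your JM favorites

Friendly reminder: cherish JM. To reduce the load on JM's servers, please don't crawl too many albums at once. Amen 🙏🙏🙏.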

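Project Introduction

The core feature of this project is downloading albums.

Built around that, it provides a framework that is easy to use, easy to extend, and able to handle some special download requirements.

The core functionality is fairly stable, and the project is currently in maintenance mode.

Besides downloading, some other JM endpoints are implemented on demand. Currently available features:

  • Login
  • Album search (all search options supported)
  • Image download and decoding
  • Categories / rankings
  • Album / chapter details
  • Personal favorites
  • Endpoint encryption/decryption (app API)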

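Installation

⚠ If you haven't installed Python yet, install it first before following the steps below; version >= 3.7 is required (click here to download it from the official Python website)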

  • Install from the official PyPI index (recommended; the same command is also used to update)

    pip install jmcomic -i https://pypi.org/simple -U
  • Install from source

    pip install git+https://github.com/hect0x7/JMComic-Crawler-Python

Quick Start

1. Downloading an album

The following code downloads the images of all chapters of album JM422866:

import jmcomic  # import the module (install it first)
jmcomic.download_album('422866')  # pass the album id to download the whole album locally

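The download_album method above also accepts an option parameter that controls the download configuration, including JM domains, network proxies, image format conversion, plugins, and more.

You will probably need some of these settings. The recommended way is to create an option from a configuration file and download albums with it, as described in the next section:

2. Downloading albums with an option configuration

  1. First, create a configuration file, say option.yml

    The file has a specific syntax; refer to this document → Option File Guide

    As a demonstration, suppose you want to convert the downloaded images to PNG. Write the following into option.yml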

download:
  image:
    suffix: .png # converts the downloaded images to PNG
  2. Second, run the following Python code
import jmcomic

# create the option object
option = jmcomic.create_option_by_file('path to your config file, e.g. D:/option.yml')
# download the album using the option object
jmcomic.download_album(422866, option)
# equivalent: option.download_album(422866)

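Advanced Usage

See the documentation home page → jmcomic.readthedocs.io

(Tip: jmcomic provides many download options; for most download needs you can look for a relevant option or plugin.)

Features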

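  • Bypasses Cloudflare's anti-crawler protection

  • Implements the latest encryption/decryption algorithm of the JM app API (1.6.3)

  • Versatile usage:

  • Supports both web and mobile client implementations, switchable via configuration (the mobile client has no IP restrictions and good compatibility; the web client is restricted by IP region but more efficient)

  • Automatic retries and domain switching

  • Multithreaded downloading (down to one thread per image, very efficient)

  • Highly configurable

    • Works without any configuration, which is very convenient
    • Options can be generated from a config file; multiple file formats are supported
    • Configurable items: request domains, client implementation, whether to use the disk cache, number of chapters/images downloaded concurrently, image format conversion, download path rules, request metadata (headers, cookies, proxies)
  • Highly extensible

    • Custom callbacks before and after downloading an album/chapter/image
    • Custom classes: Downloader (scheduling), Option (configuration), Client (requests), entity classes
    • Custom logging and exception listeners
    • Plugins, making it easy to extend functionality or use other people's plugins; current built-in plugins include
      • Login plugin
      • Hardware usage monitoring plugin
      • Download-only-new-chapters plugin
      • Zip compression plugin
      • Download-images-with-a-specific-suffix plugin
      • Send QQ email plugin
      • Log topic filter plugin
      • Automatically-use-browser-cookies plugin
      • Merge jpg images into one PDF plugin
      • Export favorites to CSV plugin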

Usage Notes

  • Python >= 3.7
  • This is a personal project, so documentation and examples may lag behind; feel free to ask questions in Issues

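Project Folder Layout

  • .github: GitHub Actions configuration files

  • assets: non-code resource files

    • docs: project documentation
    • option: configuration files
  • src: source code

    • jmcomic: the jmcomic module
  • tests: test directory containing the test code (uses unittest)

  • usage: usage directory containing examples and usage code

Thanks to the following projects

Image splitting algorithm code + JM mobile API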


jmcomic-crawler-python's People

Contributors

hect0x7


jmcomic-crawler-python's Issues

Beginner question: what does “请求重试全部失败” (all request retries failed) mean here?

Here is the failure output from PowerShell:
PS C:\Windows\system32> python
Python 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

>>> import jmcomic
>>> jmcomic.download_album('285085')
2023-09-21 11:11:10:【获取禁漫URL】[https://jm365.work/3YeBdF] → [https://18-comic.work]
2023-09-21 11:11:10:【html】https://18-comic.work/album/285085
2023-09-21 11:11:10:【req.error】Failed to perform, ErrCode: 35, Reason: 'BoringSSL SSL_connect: Connection was reset in connection to 18-comic.work:443 '
2023-09-21 11:11:10:【req.retry】次数: [1/5], 域名: [0 of ['18-comic.work']], 路径: [https://18-comic.work/album/285085], 参数: [{}]
2023-09-21 11:11:10:【req.error】Failed to perform, ErrCode: 35, Reason: 'BoringSSL SSL_connect: Connection was reset in connection to 18-comic.work:443 '
2023-09-21 11:11:10:【req.retry】次数: [2/5], 域名: [0 of ['18-comic.work']], 路径: [https://18-comic.work/album/285085], 参数: [{}]
2023-09-21 11:11:11:【req.error】Failed to perform, ErrCode: 35, Reason: 'BoringSSL SSL_connect: Connection was reset in connection to 18-comic.work:443 '
2023-09-21 11:11:11:【req.retry】次数: [3/5], 域名: [0 of ['18-comic.work']], 路径: [https://18-comic.work/album/285085], 参数: [{}]
2023-09-21 11:11:11:【req.error】Failed to perform, ErrCode: 35, Reason: 'BoringSSL SSL_connect: Connection was reset in connection to 18-comic.work:443 '
2023-09-21 11:11:11:【req.retry】次数: [4/5], 域名: [0 of ['18-comic.work']], 路径: [https://18-comic.work/album/285085], 参数: [{}]
2023-09-21 11:11:11:【req.error】Failed to perform, ErrCode: 35, Reason: 'BoringSSL SSL_connect: Connection was reset in connection to 18-comic.work:443 '
2023-09-21 11:11:11:【req.retry】次数: [5/5], 域名: [0 of ['18-comic.work']], 路径: [https://18-comic.work/album/285085], 参数: [{}]
2023-09-21 11:11:12:【req.error】Failed to perform, ErrCode: 35, Reason: 'BoringSSL SSL_connect: Connection was reset in connection to 18-comic.work:443 '
2023-09-21 11:11:12:【req.fallback】请求重试全部失败: [https://18-comic.work/album/285085], ['18-comic.work']
2023-09-21 11:11:12:【dler.exception】JmDownloader Exit with exception: (<class 'jmcomic.jm_config.JmcomicException'>, JmcomicException("请求重试全部失败: [https://18-comic.work/album/285085], ['18-comic.work']"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\curdea\AppData\Local\Programs\Python\Python311\Lib\site-packages\jmcomic\api.py", line 45, in download_album
    dler.download_album(jm_album_id)
  File "C:\Users\curdea\AppData\Local\Programs\Python\Python311\Lib\site-packages\jmcomic\jm_downloader.py", line 73, in download_album
    album = client.get_album_detail(album_id)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\curdea\AppData\Local\Programs\Python\Python311\Lib\site-packages\common\util\decorator_util.py", line 67, in __call__
    value = self.invoke(args, kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\curdea\AppData\Local\Programs\Python\Python311\Lib\site-packages\common\util\decorator_util.py", line 55, in invoke
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\curdea\AppData\Local\Programs\Python\Python311\Lib\site-packages\jmcomic\jm_client_impl.py", line 181, in get_album_detail
    resp = self.get_jm_html(f"/album/{album_id}")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\curdea\AppData\Local\Programs\Python\Python311\Lib\site-packages\jmcomic\jm_client_impl.py", line 265, in get_jm_html
    resp = self.get(url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\curdea\AppData\Local\Programs\Python\Python311\Lib\site-packages\jmcomic\jm_client_impl.py", line 28, in get
    return self.request_with_retry(self.postman.get, url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\curdea\AppData\Local\Programs\Python\Python311\Lib\site-packages\jmcomic\jm_client_impl.py", line 81, in request_with_retry
    return self.request_with_retry(request, url, domain_index, retry_count + 1, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\curdea\AppData\Local\Programs\Python\Python311\Lib\site-packages\jmcomic\jm_client_impl.py", line 81, in request_with_retry
    return self.request_with_retry(request, url, domain_index, retry_count + 1, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\curdea\AppData\Local\Programs\Python\Python311\Lib\site-packages\jmcomic\jm_client_impl.py", line 81, in request_with_retry
    return self.request_with_retry(request, url, domain_index, retry_count + 1, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 2 more times]
  File "C:\Users\curdea\AppData\Local\Programs\Python\Python311\Lib\site-packages\jmcomic\jm_client_impl.py", line 83, in request_with_retry
    return self.request_with_retry(request, url, domain_index + 1, 0, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\curdea\AppData\Local\Programs\Python\Python311\Lib\site-packages\jmcomic\jm_client_impl.py", line 52, in request_with_retry
    self.fallback(request, url, domain_index, retry_count, **kwargs)
  File "C:\Users\curdea\AppData\Local\Programs\Python\Python311\Lib\site-packages\jmcomic\jm_client_impl.py", line 152, in fallback
    raise JmModuleConfig.exception(msg)
jmcomic.jm_config.JmcomicException: 请求重试全部失败: [https://18-comic.work/album/285085], ['18-comic.work']

ChatGPT said it might be a network problem, but my network should be fine:
PS C:\Windows\system32> Invoke-RestMethod -Uri https://ipinfo.io

ip       : (private)
city     : Hong Kong
region   : Central and Western
country  : HK
loc      : 22.2783,114.1747
org      : AS41378 Kirino LLC
timezone : Asia/Hong_Kong
readme   : https://ipinfo.io/missingauth

Other notes:

Opening
https://18-comic.work/album/285085
directly in Chrome brings up a human verification check; after passing it, the page opens normally.
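ErrCode 35 in the log above is a TLS-level failure: the connection was reset during the TLS handshake. To check whether a TLS handshake to the same domain succeeds from Python outside jmcomic, here is a minimal standard-library sketch. tls_handshake_ok is a hypothetical helper; because it uses Python's ssl module rather than curl_cffi (the HTTP library that appears in a later traceback on this page), a success here does not guarantee that jmcomic's requests will succeed.

```python
import socket
import ssl
from typing import Tuple


def tls_handshake_ok(host: str, port: int = 443, timeout: float = 5.0) -> Tuple[bool, str]:
    """Open a TCP connection to host:port and attempt a TLS handshake.

    Returns (True, negotiated protocol version) on success,
    (False, error description) otherwise.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname sends SNI and enables certificate hostname checks
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version() or "unknown"
    except OSError as exc:
        # ssl.SSLError, socket.timeout, ConnectionResetError and DNS errors
        # (socket.gaierror) are all subclasses of OSError
        return False, f"{type(exc).__name__}: {exc}"


# Example (needs network access):
# print(tls_handshake_ok("18-comic.work"))
```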

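Possible bug: the publish date and update date are not matched correctly

I wrote a simple Python program to download albums along with their attributes, overriding the module's JmDownloader class. When storing the publish and update dates, I found the values were 0. Changing the regular expressions in jm_toolkit or switching to API mode did not help. Finally I added a print directly in the JmAlbumDetail class in jm_entity and found the values were already 0 there, so my initial conclusion is that the problem is in the module, but I'm not skilled enough to find the cause.

Below is the program code. The option only enables the login plugin; it contains my account and password, so I'm not showing it.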

import time
import jmcomic
import sqlite3
import queue
import threading
import sys

albumData_q = queue.Queue(64)
album_conn = sqlite3.connect("../album.db", check_same_thread=False)
# album_conn.execute('''CREATE TABLE Album(
#                         ID          INTEGER KEY NOT NULL UNIQUE,
#                         name        TEXT NOT NULL,
#                         chapter     TEXT,
#                         chapterID   TEXT,
#                         chapterName TEXT,
#                         chapterPage TEXT,
#                         author      TEXT,
#                         actor       TEXT,
#                         tag         TEXT,
#                         pub_date    TEXT,
#                         upd_date    TEXT);''')

getOption = jmcomic.create_option("opt.yml")
jm_log = jmcomic.JmModuleConfig.jm_log

baseDir = "E:/JMAlbum/"

# the class where the problem shows up
class superDownloader(jmcomic.JmDownloader):
    def __init__(self, option: jmcomic.JmOption):
        super().__init__(option)
        self.data: dict = {}
        self.chapterID: list = []
        self.chapterName: list = []
        self.chapterPage: list = []

    def after_album(self, album: jmcomic.JmAlbumDetail):
        super().after_album(album)
        self.data = {"ID": album.album_id, "name": album.name, "chapter": len(album),
                     "chapterID": self.chapterID, "chapterName": self.chapterName, "chapterPage": self.chapterPage,
                     "author": album.authors, "actor": album.actors, "tag": album.tags,
                     "pub_date": album.pub_date, "upd_date": album.update_date} # album.pub_date and album.update_date turn out to be 0
        albumData_q.put(self.data)
        jmcomic.default_jm_logging("album.after.q", "report succeeded")
        self.option.call_all_plugin(
            'after_album',
            album=album,
            downloader=self,
        )

    def after_photo(self, photo: jmcomic.JmPhotoDetail):
        super().after_photo(photo)
        self.chapterID.append(photo.photo_id)
        self.chapterName.append(photo.name)
        print(len(photo))
        self.chapterPage.append(len(photo))
        jmcomic.default_jm_logging("photo.after.q", "refresh succeeded")
        self.option.call_all_plugin(
            'after_photo',
            photo=photo,
            downloader=self,
        )

# thread that writes the data to the database
class dataBaseExecutor(threading.Thread):
    def __init__(self):
        super().__init__()
        self.album_cs_w = album_conn.cursor()

    def run(self):
        for i in range(sys.maxsize):
            data: dict = albumData_q.get(True, None)
            data: list = [str(i) for i in data.values()]
            print(data)
            try:
                self.album_cs_w.execute(f'''INSERT INTO Album VALUES ({"?, "*(len(data)-1)}?);''', data)
                jmcomic.default_jm_logging("db", "insert succeeded")
            except Exception as error:
                jmcomic.default_jm_logging("db",
                                           f"insert failed: SQL-INSERT INTO Album VALUES ({'?, '*(len(data)-1)}?); error-{error} data-{data}")
            if i % 32 == 31:
                album_conn.commit()
                jmcomic.default_jm_logging("db", "commit succeeded")

    def quit(self):
        album_conn.commit()
        self.album_cs_w.close()

# main thread
def main():
    dbMaster = dataBaseExecutor()
    dbMaster.daemon = True
    dbMaster.start()
    id = 4 #aid
    try:
        jmcomic.download_album(id, getOption, superDownloader)
    except jmcomic.JmcomicException as e:
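        # "本子不存在" means "album does not exist"; match it against the message text,
        # since `in` on the exception object itself raises TypeError
        if "本子不存在" in str(e):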
            data = {"ID": id, "name": "", "chapter": "",
                    "chapterID": "", "chapterName": "", "chapterPage": "",
                    "author": "", "actor": "", "tag": "",
                    "pub_date": "", "upd_date": ""}
            albumData_q.put(data)
    time.sleep(10)
    album_conn.commit()

if __name__ == "__main__":
    main()
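In the database thread above, converting every value with str() stores lists in Python's repr form (for example "['a', 'b']"), which is awkward to read back. A minimal sketch that stores list fields as JSON text instead; to_row and insert_album are hypothetical helpers, and the column order is taken from the commented-out CREATE TABLE statement in the code above.

```python
import json
import sqlite3
from typing import Any, Dict, List

# Column order follows the commented-out CREATE TABLE Album statement above.
COLUMNS = ["ID", "name", "chapter", "chapterID", "chapterName", "chapterPage",
           "author", "actor", "tag", "pub_date", "upd_date"]


def to_row(data: Dict[str, Any]) -> List[Any]:
    """Return values in COLUMNS order; lists become JSON text, other values stay as-is."""
    row = []
    for column in COLUMNS:
        value = data[column]
        row.append(json.dumps(value, ensure_ascii=False) if isinstance(value, list) else value)
    return row


def insert_album(conn: sqlite3.Connection, data: Dict[str, Any]) -> None:
    """Insert one album dict using named columns and ? placeholders."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = f"INSERT INTO Album ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    conn.execute(sql, to_row(data))
```

Reading a list column back is then json.loads(value).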

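A small problem with downloading chapters

For example, 社团学姐, album id 564268:
jmcomic 564268 downloads all of it. But for its latest chapter 561909, jmcomic p561909 reports that the comic does not exist. A very strange problem.
Another one is
大學生活就從社團開始 564184: jmcomic 564184 succeeds,
but downloading the chapter alone with jmcomic p563992 fails.

Sometimes I don't want to download the whole comic because I've already read the earlier chapters; I just want the latest ones.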

C:\Users\Administrator>jmcomic p563992
[2024-04-22 15:18:20] [MainThread]:【command_line】start downloading...
- using option: [D:\myoption.yml]
  to be downloaded:
  - album: []
  - photo: ['563992']
[2024-04-22 15:18:20] [MainThread]:【plugin.invoke】调用插件: [login]
[2024-04-22 15:18:20] [MainThread]:【api】https://www.jmapinode.xyz/setting
[2024-04-22 15:18:20] [MainThread]:【api.setting】change APP_VERSION from [1.6.7] to [1.6.8]
[2024-04-22 15:18:20] [MainThread]:【api】https://www.jmapinode.xyz/login
[2024-04-22 15:18:20] [MainThread]:【plugin.login】登录成功
[2024-04-22 15:18:20] [Thread-1]:【api】https://www.jmapinode.xyz/chapter?id=563992
[2024-04-22 15:18:21] [Thread-1]:【api】https://www.jmapinode.xyz/album?id=396774
[2024-04-22 15:18:21] [Thread-1]:【dler.exception】JmDownloader Exit with exception: (<class 'jmcomic.jm_exception.MissingAlbumPhotoException'>, MissingAlbumPhotoException('请求的本子不存在!(https://18comic.vip/album/396774/)\n原因可能为:\n1. id有误,检查你的本子id\n2. 该漫画只对登录用户可见,请配置你的cookies,或者使用移动端Client(api)\n', {'resp': <jmcomic.jm_client_interface.JmApiResp object at 0x0000028BEE6BDD60>, 'missing_jm_id': '396774'}))
Exception in thread Thread-1:
Traceback (most recent call last):
  File "c:\program files\python\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "c:\program files\python\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "c:\program files\python\lib\site-packages\jmcomic\api.py", line 35, in <lambda>
    apply_each_obj_func=lambda aid: download_api(aid,
  File "c:\program files\python\lib\site-packages\jmcomic\api.py", line 86, in download_photo
    photo = dler.download_photo(jm_photo_id)
  File "c:\program files\python\lib\site-packages\jmcomic\jm_downloader.py", line 77, in download_photo
    photo = client.get_photo_detail(photo_id)
  File "c:\program files\python\lib\site-packages\jmcomic\jm_client_impl.py", line 635, in get_photo_detail
    self.fetch_photo_additional_field(photo, fetch_album, fetch_scramble_id)
  File "c:\program files\python\lib\site-packages\jmcomic\jm_client_impl.py", line 714, in fetch_photo_additional_field
    photo.from_album = self.get_album_detail(photo.album_id)
  File "c:\program files\python\lib\site-packages\jmcomic\jm_client_impl.py", line 622, in get_album_detail
    return self.fetch_detail_entity(album_id,
  File "c:\program files\python\lib\site-packages\jmcomic\jm_client_impl.py", line 180, in cache_wrapper
    return func(*args, **kwargs)
  File "c:\program files\python\lib\site-packages\jmcomic\jm_client_impl.py", line 663, in fetch_detail_entity
    resp = self.req_api(self.append_params_to_url(
  File "c:\program files\python\lib\site-packages\jmcomic\jm_client_impl.py", line 869, in req_api
    self.require_resp_success(resp, url)
  File "c:\program files\python\lib\site-packages\jmcomic\jm_client_impl.py", line 916, in require_resp_success
    ExceptionTool.raise_missing(resp, JmcomicText.parse_to_jm_id(url))
  File "c:\program files\python\lib\site-packages\jmcomic\jm_exception.py", line 144, in raise_missing
    cls.raises(
  File "c:\program files\python\lib\site-packages\jmcomic\jm_exception.py", line 100, in raises
    raise e
jmcomic.jm_exception.MissingAlbumPhotoException: 请求的本子不存在!(https://18comic.vip/album/396774/)
原因可能为:
1. id有误,检查你的本子id
2. 该漫画只对登录用户可见,请配置你的cookies,或者使用移动端Client(api)

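【Feedback】Downloading JM albums with GitHub Actions (new)

✨Notes for filing Issues✨

  • The topic of this Issue is downloading JM albums with GitHub Actions
  • Before opening an Issue, please read through the existing replies and search first; don't re-file duplicate questions
  • Please open a separate Issue for bugs unrelated to this topic

✨What's new✨

Previously you had to edit and commit a file to trigger GitHub Actions; now you don't have to!

Error when zipping by album

Execution failed: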

[2024-04-09 20:45:28] [MainThread]:【plugin.invoke】调用插件: [zip]
[2024-04-09 20:45:28] [MainThread]:【plugin.error】插件 [zip],运行遇到未捕获异常,异常信息: ['NoneType' object has no attribute 'name']
Traceback (most recent call last):
  File "C:\workspace\codes\edit\JMComic-Crawler-Python\src\jmcomic\jm_option.py", line 561, in call_all_plugin
    self.invoke_plugin(pclass, kwargs, extra, pinfo)
  File "C:\workspace\codes\edit\JMComic-Crawler-Python\src\jmcomic\jm_option.py", line 604, in invoke_plugin
    self.handle_plugin_unexpected_error(e, pinfo, kwargs, plugin, pclass)
  File "C:\workspace\codes\edit\JMComic-Crawler-Python\src\jmcomic\jm_option.py", line 634, in handle_plugin_unexpected_error
    raise e
  File "C:\workspace\codes\edit\JMComic-Crawler-Python\src\jmcomic\jm_option.py", line 592, in invoke_plugin
    plugin.invoke(**kwargs)
  File "C:\workspace\codes\edit\JMComic-Crawler-Python\src\jmcomic\jm_plugin.py", line 304, in invoke
    zip_path = self.get_zip_path(album, None, filename_rule, suffix, zip_dir)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\workspace\codes\edit\JMComic-Crawler-Python\src\jmcomic\jm_plugin.py", line 398, in get_zip_path
    filename = DirRule.apply_rule_directly(album, photo, filename_rule)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\workspace\codes\edit\JMComic-Crawler-Python\src\jmcomic\jm_option.py", line 184, in apply_rule_directly
    return cls.apply_rule_solver(album, photo, cls.get_rule_solver(rule))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\workspace\codes\edit\JMComic-Crawler-Python\src\jmcomic\jm_option.py", line 180, in apply_rule_solver
    return func(detail)
           ^^^^^^^^^^^^
  File "C:\workspace\codes\edit\JMComic-Crawler-Python\src\jmcomic\jm_option.py", line 152, in solve_func
    return fix_windir_name(str(DetailEntity.get_dirname(detail, rule[1:])))
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\workspace\codes\edit\JMComic-Crawler-Python\src\jmcomic\jm_entity.py", line 164, in get_dirname
    return getattr(detail, ref)
           ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'name'

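Configuration:

plugins:
  after_album:
    - plugin: zip # zip plugin
      kwargs:
        level: album # by chapter: one zip file per chapter
        filename_rule: Pname # naming rule for the zip files
        zip_dir: xxxxxx # folder where the zip files are stored
        delete_original_file: true # after zipping succeeds, delete all original files and folders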
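The traceback shows what goes wrong: with level: album the plugin calls get_zip_path(album, None, ...), so no photo is available, and the rule Pname then tries to read name from that missing photo. The simplified model below is hypothetical code, not jmcomic's implementation; it mimics a one-letter-prefix rule to show why an album-level zip needs an album-based rule such as Atitle, which is what the GitHub Actions setup later on this page pairs with level: album.

```python
from types import SimpleNamespace
from typing import Optional


def resolve_name_rule(rule: str, album: object, photo: Optional[object]) -> str:
    """Hypothetical, simplified rule resolver: 'A<field>' reads the album, 'P<field>' the photo."""
    prefix, field = rule[0], rule[1:]
    if prefix == "A":
        source = album
    elif prefix == "P":
        source = photo
    else:
        raise ValueError(f"unknown rule prefix in {rule!r}")
    if source is None:
        # At level: album no photo is passed in, so a P-rule has nothing to read.
        raise ValueError(f"rule {rule!r} needs a photo, but none is available at this level")
    return str(getattr(source, field))


# Example: an album-level rule resolves, a photo-level rule cannot.
album = SimpleNamespace(title="Some Album", name="Some Album")
print(resolve_name_rule("Atitle", album, None))  # prints: Some Album
```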

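A bug with invalid JSON

Below is the error message. It may be hard to reproduce: I found this bug after 6 hours of stress testing. (I suspect the HTML client has a similar problem, but my network is too unreliable to test it.)
The client could check whether the content is valid when parsing the JSON. I suggest adding a setting to option that controls whether this kind of error raises an exception immediately or lets execution continue.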

2024-01-15 20:52:11:【dler.exception】superDownloader Exit with exception: (<class 'json.decoder.JSONDecodeError'>, JSONDecodeError('Expecting value: line 2 column 1 (char 1)'))
Exception in thread Thread-4:
Traceback (most recent call last):
  File "E:\Python\3.11.6\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "E:\PythonProject\JMDownload\new\apiGet.py", line 127, in run
    jmcomic.download_album(aid, getOption, superDownloader)
  File "E:\Python\3.11.6\Lib\site-packages\jmcomic\api.py", line 48, in download_album
    dler.download_album(jm_album_id)
  File "E:\Python\3.11.6\Lib\site-packages\jmcomic\jm_downloader.py", line 58, in download_album
    album = client.get_album_detail(album_id)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\site-packages\jmcomic\jm_client_impl.py", line 607, in get_album_detail
    return self.fetch_detail_entity(album_id,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\site-packages\jmcomic\jm_client_impl.py", line 174, in cache_wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\site-packages\jmcomic\jm_client_impl.py", line 648, in fetch_detail_entity
    resp = self.req_api(
           ^^^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\site-packages\jmcomic\jm_client_impl.py", line 854, in req_api
    self.require_resp_success(resp, url)
  File "E:\Python\3.11.6\Lib\site-packages\jmcomic\jm_client_impl.py", line 890, in require_resp_success
    resp.require_success()
  File "E:\Python\3.11.6\Lib\site-packages\jmcomic\jm_client_interface.py", line 43, in require_success
    if self.is_not_success:
       ^^^^^^^^^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\site-packages\jmcomic\jm_client_interface.py", line 24, in is_not_success
    return not self.is_success
               ^^^^^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\site-packages\jmcomic\jm_client_interface.py", line 103, in is_success
    return super().is_success and self.json()['code'] == 200
                                  ^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\site-packages\common\util\decorator_util.py", line 63, in func_exec
    attr = func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\site-packages\jmcomic\jm_client_interface.py", line 89, in json
    return self.resp.json()
           ^^^^^^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\site-packages\curl_cffi\requests\models.py", line 129, in json
    return loads(self.content, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)
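The message "Expecting value: line 2 column 1 (char 1)" means the parser skipped a leading newline and found no valid JSON value at character 1; for example, json.loads("\n") fails with exactly this message. The option switch proposed above could look roughly like the sketch below. parse_json_response and its strict flag are hypothetical and not an existing jmcomic setting.

```python
import json
from typing import Any, Optional


def parse_json_response(text: str, strict: bool = True) -> Optional[Any]:
    """Parse a response body; on invalid JSON either re-raise or return None.

    strict=True  -> behave like today and raise json.JSONDecodeError
    strict=False -> return None so the caller can log, retry, or skip the album
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        if strict:
            raise
        return None
```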

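Fails to run in Python

I'm a beginner running Python in VS. After installing with pip, downloading worked fine yesterday, for both albums and photos, but today it suddenly stopped working.
The code is:
import jmcomic
jmcomic.download_photo('519829')
# download a chapter

jmcomic.download_album('519829')
# download the whole album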

The error message is:
Message=partially initialized module 'jmcomic' has no attribute 'download_photo' (most likely due to a circular import) AttributeError: partially initialized module 'jmcomic' has no attribute 'download_photo' (most likely due to a circular import)
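A common cause of "partially initialized module 'jmcomic' ... (most likely due to a circular import)" is a file in the project folder named jmcomic.py (or a folder named jmcomic) that shadows the installed package; whether that applies here is an assumption. A quick check is to ask Python which file it would import. module_origin is a hypothetical helper built on importlib.util.find_spec.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file Python would load for a top-level module, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Expected: a path ending in ...\site-packages\jmcomic\__init__.py.
    # A path to one of your own scripts means that file is shadowing the package.
    print(module_origin("jmcomic"))
```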

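How do I use find_update?

I'd like to know how to download all new chapters after a specific chapter in an album. I read usage_advanced.py and getting_started.py, but I still don't quite understand how to do it. I'm a complete beginner; thank you.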

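Suggestion: add a download mode that prefixes chapter names with an index, because many comics have unordered chapter names

[screenshot: Snipaste_2023-04-04_23-05-18]

Complete beginner here. I tried adding an index parameter to deside_image_save_dir and passing the index from download_photo all the way down, but, perhaps because of multithreading or something else, the chapter folder names I ended up with did not match their indices. I also found no related setting in the yml config file, so I hope the author can add this download mode, ideally by first creating a folder named after the comic at the target location and then creating the chapter folders inside it by index. My skills are limited, so I'm asking the author for advice.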
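Whatever mechanism is used, mismatches caused by thread ordering go away if each chapter's index is computed once, from its position in the album, before any download thread starts. A minimal sketch of that idea; indexed_chapter_dirs is a hypothetical helper and the NNN_name folder format is an assumption. (The GitHub Actions setup later on this page uses the dir rule Bd_Atitle_Pindex, which suggests jmcomic's own dir rules may already cover part of this.)

```python
from typing import List


def indexed_chapter_dirs(album_title: str, chapter_names: List[str]) -> List[str]:
    """Build '<album_title>/<NNN>_<chapter name>' paths in album order.

    The index comes from each chapter's position in the list, computed up
    front, so it does not depend on which download thread finishes first.
    """
    width = max(3, len(str(len(chapter_names))))  # zero-pad so folders sort correctly
    return [f"{album_title}/{i:0{width}d}_{name}"
            for i, name in enumerate(chapter_names, start=1)]
```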

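Batch downloading via site search keywords

The author has already made this fairly complete. Apart from looking up a specific album, most searches are by author name or tag. It's just that the author's example code removed the if statement inside the for loop of the tag search, which printed the search results before downloading. If I want that feature (mainly so I can comment out the download and just look at the search results), do I add a for loop before download(search())?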

from jmcomic import *

option = create_option('option.yml')
client = option.new_jm_client()

author = '種付'


def search():
    # site search, main_tag=0.
    # search the first page.
    page: JmSearchPage = client.search_site(author, page=1)
    # return all album ids on this page directly
    return list(page.iter_id())


def download(id_list):
    # 自定义author字段的解析:一律使用'author值'
    JmModuleConfig.AFIELD_ADVICE['author'] = lambda album: author
    download_album(id_list, option)

# is this where the for loop goes?
# if
# print

download(search())
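A loop over the returned ids before calling download() is one way to do this. A minimal sketch that prints each hit and only downloads when asked; preview_then_download is a hypothetical helper, while search() and download() are the functions from the snippet above.

```python
from typing import Callable, Iterable, List


def preview_then_download(album_ids: Iterable[str],
                          download: Callable[[List[str]], None],
                          dry_run: bool = True) -> List[str]:
    """Print every search hit; call download(ids) only when dry_run is False."""
    ids = list(album_ids)  # materialize once so the ids can be printed and then reused
    for album_id in ids:
        print(f"search hit: {album_id}")
    if not dry_run:
        download(ids)
    return ids


# Usage with the snippet above:
# preview_then_download(search(), download)                 # only prints the results
# preview_then_download(search(), download, dry_run=False)  # prints, then downloads
```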

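Also, when I tried running the code with other keywords, I got quite a few encoding errors, probably a gbk vs utf-8 encoding problem. Google turned up lots of fixes, but that only left me unsure how to write it.

UnicodeEncodeError: 'gbk' codec can't encode character '\u30fb' in position 190: illegal multibyte sequence

(One of the encoding errors, picked at random.)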
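This error appears when text containing a character outside GBK (here U+30FB, the katakana middle dot) is written through a stream whose encoding is GBK, for example stdout on Chinese-locale Windows when output goes to some IDE consoles or is redirected, or a file opened without encoding="utf-8". Common fixes are running Python in UTF-8 mode (python -X utf8, Python 3.7+) or setting PYTHONIOENCODING=utf-8. The sketch below does the same for stdout from inside the script; force_utf8_stdout is a hypothetical helper, and it assumes the error is raised while printing rather than while writing a file.

```python
import sys


def force_utf8_stdout() -> str:
    """Switch sys.stdout to UTF-8 with errors='replace' instead of raising.

    Returns the encoding stdout ends up with. Streams without reconfigure()
    (for example io.StringIO used by some tools) are left unchanged.
    """
    reconfigure = getattr(sys.stdout, "reconfigure", None)  # TextIOWrapper, Python 3.7+
    if reconfigure is not None:
        reconfigure(encoding="utf-8", errors="replace")
    return getattr(sys.stdout, "encoding", None) or ""


if __name__ == "__main__":
    force_utf8_stdout()
    print("種付 \u30fb")  # U+30FB is the character from the error above
```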

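Convert all downloaded webp files into a single PDF

Hi, I have a question. I'd like to convert all the downloaded webp files into one PDF file. Normally I would have to convert every webp to jpg by hand and then print them to a PDF with Acrobat. Can this be done in code? I'm sure many people, like me, would like one PDF per chapter for reading web comics. Thanks!

categories_filter_gen returns empty results

A strange bug: visiting https://18comic-c.art/albums/?page=1&o=mr&t=a directly shows that there is content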

import sys; print('Python %s on %s' % (sys.version, sys.platform))
E:\Python\3.11.6\python.exe -X pycache_prefix=C:\Users\云熙awa\AppData\Local\JetBrains\PyCharmCE2023.3\cpython-cache D:/Pycharm/plugins/python-ce/helpers/pydev/pydevd.py --multiprocess --client 127.0.0.1 --port 54780 --file E:\PythonProject\JMDownload\new\apiWeb.py 
Connected to pydev debugger (build 233.13763.11)
2024-02-03 16:06:21:【plugin.invoke】调用插件: [login]
2024-02-03 16:06:21:【html】https://18comic-c.art/login
2024-02-03 16:06:22:【plugin.login】登录成功
2024-02-03 16:06:59:【html】https://18comic-c.art/albums/?page=1&o=mr&t=a
(Code I entered while debugging: next(next(jmclt.categories_filter_gen()).iter_id()))
PyDev console: starting.
2024-02-03 16:07:10:【html】https://18comic-c.art/albums/?page=1&o=mr&t=a
Traceback (most recent call last):
  File "D:\Pycharm\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
StopIteration

I feel like I've stirred up a whole nest of bugs; I've practically turned jmcomic inside out.
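In next(next(jmclt.categories_filter_gen()).iter_id()), a bare next() raises StopIteration as soon as either level is empty, so the console output cannot tell whether the generator yielded no page or the page had no ids. Passing a default to next() avoids the exception. first_or_none is a hypothetical debugging helper, and the commented usage assumes the jmclt client object from the session above.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first_or_none(items: Iterable[T]) -> Optional[T]:
    """Return the first element of items, or None if it is empty (no StopIteration)."""
    return next(iter(items), None)


# Debugging usage, one level at a time:
# page = first_or_none(jmclt.categories_filter_gen())
# print("no page" if page is None else first_or_none(page.iter_id()))
```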

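Please add a download_album_for_pdf function that downloads directly as a PDF

Images downloaded by this program are in webp format, which is very inconvenient to view on some devices, such as a Windows 7 computer. I'd like a function that not only downloads the webp files but also converts all the images into a single PDF document, so they are easier to view on older devices. Here is a webp-to-PDF function I wrote, for your reference: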

import os
import img2pdf


def convert_webp_to_pdf(folder_path, output_filename):
    """
    Converts all WebP files in the specified folder to a single PDF file.

    Args:
        folder_path (str): Path to the folder containing WebP files.
        output_filename (str): Name of the output PDF file.

    Returns:
        None
    """
    # os.listdir() returns entries in arbitrary order; sort so the pages stay in order
    webp_files = sorted(file for file in os.listdir(folder_path) if file.lower().endswith(".webp"))

    if not webp_files:
        print("No WebP files found in the specified folder.")
        return

    images = [os.path.join(folder_path, file) for file in webp_files]

    try:
        with open(output_filename, "wb") as pdf_file:
            pdf_content = img2pdf.convert(images)
            pdf_file.write(pdf_content)
        print(f"Conversion successful! Output saved as {output_filename}")
    except Exception as e:
        print(f"Error during conversion: {e}")
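os.listdir() returns names in arbitrary order, so page files need sorting before they are merged into a PDF. Plain sorted() is enough for zero-padded names; if names like 2.webp and 10.webp can occur, a natural sort key keeps numeric order. A minimal sketch; natural_key is a hypothetical helper.

```python
import re
from typing import List, Union


def natural_key(filename: str) -> List[Union[int, str]]:
    """Sort key that compares digit runs as numbers, so '2.webp' < '10.webp'."""
    # re.split with a capturing group alternates text and digit parts,
    # so ints are always compared with ints and strings with strings
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", filename)]
```

In the function above it would be used as sorted(names, key=natural_key).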

Manually restructuring the Actions download package: 下载完成的本子.zip/<book title>.zip

Downloading with JMComic-Crawler-Python via GitHub Actions

1 .github/workflows/download_dispatch.yml

Add:

# fixed value
JM_ZIP_DOWNLOAD_DIR: /home/runner/work/jmcomic/zip/

Modify:

DIR_RULE:
  ...
  default: 'Bd_Atitle_Pindex'
  ...


- name: 上传结果 # "upload results"
  uses: actions/upload-artifact@v3
  with:
    ...
    path: ${{ env.JM_ZIP_DOWNLOAD_DIR }}
    ...

Delete (remove the whole step):

  - name: 压缩文件 # "compress files"
    run: |
      cd $JM_DOWNLOAD_DIR
      tar -zcvf "../$ZIP_NAME" ./
      mv "../$ZIP_NAME" .

2 assets/option/option_workflow_download.yml

Append at the end (keep the tree structure aligned):

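after_download: # after all downloads finish
  ...
  ...

after_album:
  - plugin: zip # zip plugin
    kwargs:
      level: album # by album: one zip file per album, containing all chapters of that album

      filename_rule: Atitle

      zip_dir: ${JM_ZIP_DOWNLOAD_DIR} # folder where the zip files are stored
      delete_original_file: true # after zipping succeeds, delete all original files and folders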

3 Test the download

https://github.com/你的用户名/JMComic-Crawler-Python/actions/workflows/download_dispatch.yml

JM560008, 20 MB

下载完成的本子.zip (the downloaded file name)

Extract to the current directory:
	<book title> (folder)
		Chapter 01
		Chapter 02
		Chapter 03
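When running locally rather than on GitHub Actions, the same layout (one zip per album, with the album folder at the top and the chapter folders inside) can be produced with the standard library. A minimal sketch, assuming the album has already been downloaded into its own folder; zip_album_folder is a hypothetical helper, not the zip plugin's implementation.

```python
import os
import zipfile


def zip_album_folder(album_dir: str, zip_path: str) -> None:
    """Zip album_dir so the archive keeps the album folder as its top level.

    Layout inside the zip: <album folder>/<chapter folder>/<image files>.
    zip_path should be outside album_dir, otherwise the zip would include itself.
    """
    album_dir = os.path.abspath(album_dir)
    parent = os.path.dirname(album_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(album_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # paths relative to the parent keep "<album folder>/" as the prefix
                zf.write(full_path, arcname=os.path.relpath(full_path, parent))
```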

impersonate chrome is not supported

[2024-04-10 00:25:23] [MainThread]:【html】https://jm-comic.org/album/324930

 * Serving Flask app 'plugin_jm_server.app'
 * Debug mode: off
[2024-04-10 00:25:23] [MainThread]:【req.error】impersonate chrome is not supported
[2024-04-10 00:25:23] [MainThread]:【html】https://jm-comic.org/album/324930
[2024-04-10 00:25:23] [MainThread]:【req.retry】次数: [1/5], 域名: [0 of ['jm-comic.org', 'jm-comic2.cc', '18comic.vip', '18comic.org']], 路径: [https://jm-comic.org/album/324930], 参数: [{'headers': {'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7', 'accept-language': 'zh-CN,zh;q=0.9', 'sec-ch-ua': '"Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"', 'sec-ch-ua-mobile': '?0', 'sec-ch-ua-platform': '"Windows"', 'sec-fetch-dest': 'document', 'sec-fetch-mode': 'navigate', 'sec-fetch-site': 'none', 'sec-fetch-user': '?1', 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36', 'authority': 'jm-comic.org', 'origin': 'https://jm-comic.org', 'referer': 'https://jm-comic.org'}}]
[2024-04-10 00:25:23] [MainThread]:【req.error】impersonate chrome is not supported

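My environment is fine and the site is reachable through my proxy, but downloading reports that all request retries failed (请求重试全部失败).
[screenshot]
Also, album 324930 isn't even in my download list.
[screenshot]
[screenshot]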

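jm1, jm2, jm3, jm4 raise an error

My Python version is 3.10.
When I enter 1, 2, 3, 4 directly, no request is sent:
【api】https://18comic.vip/album/1
[screenshot]
When I enter jm1, jm2, jm3, jm4 instead,
the response comes back correctly, but no images are downloaded, even though the corresponding CDN URLs do serve the images when I request them myself.
[screenshot]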

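Automatically detecting new chapters

I wrote some code that automatically detects whether an album has new chapters and downloads all of them.
You only need to enter the corresponding album id and chapter id in the list.
After updating, it prints the newest chapter id; put that new chapter id back into the list and you can run it again next time.
Sharing it for anyone who needs it.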

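import jmcomic

option = jmcomic.create_option(
    'D:/config.yml'
)
client = option.build_jm_client()

# Given an album and a chapter id (chapter x), collect all chapters that come after chapter x
def find_update(album, last_photo_id):
    result = []
    found = False

    for photo in album:
        if found:
            result.append(photo)

        if photo.photo_id == last_photo_id:
            found = True

    return result

# Given an album id and a chapter id (chapter x), download every chapter after x and return the newest chapter id
def check_download(album_id, photo_id):
    album = client.get_album_detail(album_id)
    targets = find_update(album, photo_id)
    latest_id = ""
    # download each new chapter and remember the last id
    for photo in targets:
        # pass the option so the settings from D:/config.yml also apply to the download
        jmcomic.download_photo(photo.photo_id, option)
        latest_id = photo.photo_id
    # return the last id
    if latest_id != photo_id and latest_id != "":
        return {"album_id": album_id, "photo_id": latest_id}
    else:
        return None

# Take the watch list and return the update results
def start(watch_list):
    result = []
    for item in watch_list:
        # start checking
        item_result = check_download(item['album_id'], item['photo_id'])
        # if there is an update, store the newest chapter id
        if item_result is not None:
            result.append(item_result)
    return result

# list of albums to check for updates
watch_list = [
    # album id, chapter id read so far (only fetch chapters after yyy, excluding yyy)
    {"album_id": 'xxx', "photo_id": 'yyy'},
    # album id, chapter id read so far (only fetch chapters after bbb, excluding bbb)
    {"album_id": 'aaa', "photo_id": 'bbb'}
]

result_list = start(watch_list)

# results; if nothing is printed, there were no updates
for item in result_list:
    print("album id " + item['album_id'] + ": latest chapter id updated to " + item['photo_id'])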
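The scanning loop in find_update can also be written with itertools.dropwhile. This sketch works on plain chapter-id strings (for example [p.photo_id for p in album], assuming iterating an album yields photo objects as the code above does); chapters_after is a hypothetical name, and like find_update it returns an empty list when the given id is not found.

```python
from itertools import dropwhile
from typing import Iterable, List


def chapters_after(chapter_ids: Iterable[str], last_seen: str) -> List[str]:
    """Return the ids that come after last_seen, in order, excluding last_seen itself."""
    remaining = dropwhile(lambda cid: cid != last_seen, chapter_ids)
    next(remaining, None)  # drop last_seen itself (no-op if it was never found)
    return list(remaining)
```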

NameError: name 'AdvancedDict' is not defined in 'Fun usage: test which JM domains your IP can access' (趣味用法:测试你的ip可以访问哪些禁漫域名)

Traceback (most recent call last):
  File "D:\jmdownload\jmdownload\测试.py", line 5, in <module>
    from jmcomic import *
  File "C:\ProgramData\anaconda3\lib\site-packages\jmcomic\__init__.py", line 7, in <module>
    from .api import *
  File "C:\ProgramData\anaconda3\lib\site-packages\jmcomic\api.py", line 1, in <module>
    from .jm_downloader import *
  File "C:\ProgramData\anaconda3\lib\site-packages\jmcomic\jm_downloader.py", line 1, in <module>
    from .jm_option import *
  File "C:\ProgramData\anaconda3\lib\site-packages\jmcomic\jm_option.py", line 1, in <module>
    from .jm_client_impl import *
  File "C:\ProgramData\anaconda3\lib\site-packages\jmcomic\jm_client_impl.py", line 3, in <module>
    from .jm_client_interface import *
  File "C:\ProgramData\anaconda3\lib\site-packages\jmcomic\jm_client_interface.py", line 1, in <module>
    from .jm_toolkit import *
  File "C:\ProgramData\anaconda3\lib\site-packages\jmcomic\jm_toolkit.py", line 356, in <module>
    class JmPageTool:
  File "C:\ProgramData\anaconda3\lib\site-packages\jmcomic\jm_toolkit.py", line 475, in JmPageTool
    def parse_api_to_search_page(cls, data: AdvancedDict) -> JmSearchPage:
NameError: name 'AdvancedDict' is not defined

Execution error

Running the following code raises an error. Is it an installation problem or a version problem?
import jmcomic
jmcomic.download_album('360537')

[screenshot]
[screenshot]

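【Feedback】Downloading JM albums with GitHub Actions (closed as of 2023-08-12)

✨Notes for filing Issues✨

  • The topic of this Issue is downloading JM albums with GitHub Actions
  • Before opening an Issue, please read through the existing replies and search first; don't re-file duplicate questions
  • Please open a separate Issue for bugs unrelated to this topic

✨What's new✨

Previously you had to edit and commit a file to trigger GitHub Actions; now you don't have to!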

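Downloading a single chapter

✨Background✨

1. Concepts

  • What album and photo mean in JM: album means a comic (本子) and photo means a chapter; one album can contain multiple photos.
    For example, the URL of album 145504 is album-145504, and this album has 104 chapters.
    [screenshot]

2. Code version

  • The code below is based on jmcomic v2.2.3, the latest version at the time of writing.
    You can make sure jmcomic is up to date with the following command:
pip install jmcomic -i https://pypi.org/simple --upgrade

3. Determine your need

  • If you are only interested in one chapter of an album and want to download just that chapter, see Need 1 below.
  • If you keep following an album's new chapters and want to download the chapters after a certain one, see Need 2 below.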

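New plugin idea: an “offline” JMComic (禁漫天堂)

Where the idea came from

After downloading an album, you usually have to open the image files in an image viewer and flip through them page by page.
I find this tedious; reading local albums in an image viewer is a worse experience than reading them in a browser,
because a browser lets you zoom the whole page, scroll with the mouse to turn pages, and use browser extensions.
So I had an idea: build an “offline” JMComic.
Concretely, run a local file server so that downloaded album images can be viewed in a browser.

PS: There is already a built-in plugin that does something similar: #183
That plugin merges the downloaded chapter images into one PDF, so a whole chapter can be read as a single PDF.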
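A minimal sketch of the local-file-server half of this idea, using only the standard library's http.server rather than whatever framework the eventual plugin uses. make_album_server is a hypothetical name, and the download folder in the usage comment is a placeholder for your own download directory. It serves a directory listing and the image files; it does not render a reader page.

```python
import functools
import http.server


def make_album_server(root_dir: str, port: int = 0) -> http.server.ThreadingHTTPServer:
    """Create (but do not start) an HTTP server serving the files under root_dir.

    Binds to 127.0.0.1 only. port=0 lets the OS pick a free port;
    read the actual port from server.server_address[1].
    """
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=root_dir)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (the path is a placeholder for your own download folder):
# server = make_album_server("D:/jmcomic_download", port=8000)
# print("Open http://127.0.0.1:8000/ in a browser")
# server.serve_forever()
```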
