fuergaosi233 / gitbook2pdf



Grabs the contents of a GitBook document and converts it to PDF.

Python 100.00%

gitbook2pdf's Introduction

Welcome to Gitbook2pdf 👋

A simple but powerful tool for converting GitBook pages to PDF.

🏠 Homepage


Feature

Asynchronous crawling: pages are fetched concurrently with aiohttp, so the data can be captured in a few seconds.
Extracted text is selectable and can be copied.
Preserves the original directory (table-of-contents) structure.
Retains the original hyperlinks.
Fully retains the original formatting (content rendered by JavaScript cannot be retained 🤷‍♂️).
Small output: an 800-page PDF is only 4.6 MB.
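Asynchronous crawling boils down to starting many requests at once and collecting the results in table-of-contents order. A minimal sketch of that pattern, assuming `fetch` is any coroutine function that returns a page's HTML (with aiohttp it would wrap `session.get(...)`). The names `crawl_all` and `max_concurrency` are hypothetical, not gitbook2pdf's own. The semaphore caps how many requests are in flight, which several issues below suggest would help against bans and disconnects:

```python
import asyncio


async def crawl_all(urls, fetch, max_concurrency=5):
    """Fetch every URL concurrently, at most max_concurrency at a time.

    Results come back in the same order as urls, so chapters stay in
    table-of-contents order no matter which request finishes first.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url):
        async with semaphore:  # wait for a free slot before requesting
            return await fetch(url)

    # gather() returns results in the order its awaitables were passed in.
    return await asyncio.gather(*(fetch_one(u) for u in urls))
```

With `max_concurrency=1` the same function crawls one page at a time.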

Sample files

KubernetesHandbook.pdf

Install

Notice!

PDF generation relies on WeasyPrint, whose system dependencies pip cannot install on its own, so you need to install it manually by following the WeasyPrint installation tutorial. If you don't want to install the dependencies, you can use the Docker image made by Su Yang.

pip install -r requirements.txt

Usage

python gitbook.py {url}

Run tests

python gitbook.py http://self-publishing.ebookchain.org

Custom

The generated output is styled with CSS; to add other styles, modify gitbook.css.
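One such customization is the output font, which several issues below ask about. Garbled or missing Chinese text is also often caused by no CJK font being available where WeasyPrint runs. A sketch that appends a `body` font-family rule to the stylesheet text, assuming a CJK font such as Noto Sans CJK SC is installed (the font name and helper are examples, not part of gitbook2pdf). Rules in gitbook.css that set font-family on more specific elements still take precedence:

```python
# Generic CSS families must stay unquoted; named fonts are quoted.
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy", "system-ui"}


def add_font_rule(css_text, families):
    """Return css_text with a body font-family rule appended.

    families is an ordered fallback list such as ["Noto Sans CJK SC", "sans-serif"].
    The named fonts must be installed on the machine running WeasyPrint.
    """
    quoted = ", ".join(
        f if f in GENERIC_FAMILIES else '"{}"'.format(f) for f in families
    )
    return css_text + "\nbody {{ font-family: {}; }}\n".format(quoted)
```

The result can be written back to gitbook.css or passed to WeasyPrint via `weasyprint.CSS(string=...)`.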

Author

👤 fuergaosi233

Twitter: @fuergaosi

👤 LiaoChangjiang

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check the issues page.

Show your support

Give a ⭐️ if this project helped you!

warning⚠️

Generating PDFs with WeasyPrint is fairly memory-intensive, so make sure your machine has enough memory.

gitbook2pdf's People

Contributors

Stargazers

Watchers

Forkers

liaochangjiang yu-yv ldw5821cn mango-svip awesome-archive papajia lgb020 huakucha pureal lubaoyilang imfht nofeetbird0321 kuibuke zoeleee zhy52 will-grindelwald horacehe15 tjumick pengjunkun vincke lvzhouyang jonason91 regzhuce abirdcfly diazraelwang tangyuxiaoyao poplp chenjunming alonegg alanlonglong cliffordlai amber1990zhang xingag syncyourmind paladinzh wall-wxk liuxinfsky ibehujun junk-chuan linhui songfangquan dawell lidongdongbuaa zhaihaisheng d4wner amtech cjluzzl prettyfish yeqing112 q-e-d muyixi315 caoshuaitong 823893795 kanewww asiacny zero0r1 hnxyzhw zherop utmcontent hahaleyile ten0n bitbitbyte lasfox tainenko zachkeer changpioner prettyhe huangjiaju titor007 maxwellyu1024 tangent11 pxdawn li-dev2020 hhhcommon timyeung a1609jk cdlz hongshu-share redfoxfox moodykeke idweball yuly jiev chenzhenguo kailiu119 zjsxwc mfiianon foundation-maker umuism terryhu08 jiangfengyuhuo weiling103 zonglu666 walternater bueasy jyg0723 dayudaoren acheng-floyd ttddtd aplot249

gitbook2pdf's Issues

How do I change the output font?

Error on macOS with Python 3.7

python3 gitbook.py http://self-publishing.ebookchain.org

Traceback (most recent call last):
File "gitbook.py", line 2, in <module>
import requests
ModuleNotFoundError: No module named 'requests'

Can links be made relative?

The document uses local addresses, which no longer work after export.

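Could a single-threaded mode be added? DNS resolution errors occur frequently

I'm not sure whether this error is caused by too much concurrency,
but I think the script could handle errors gracefully instead of simply crashing.
It would also be better if it could save the progress of a failed crawl so the next run doesn't have to start over.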
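The resume idea in this issue can be sketched as a small checkpoint file: save each page once it has been parsed, and on the next run skip URLs that are already saved. The file name, JSON layout and helper names below are hypothetical illustrations, not part of gitbook2pdf:

```python
import json
import os


def load_progress(path):
    """Return {url: html} saved by a previous run, or {} if there is none."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_progress(path, done):
    """Write finished pages via a temp file, so a crash mid-write can't corrupt them."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(done, f, ensure_ascii=False)
    os.replace(tmp, path)  # atomic rename on the same file system


def pending_urls(all_urls, done):
    """URLs still to crawl, kept in table-of-contents order."""
    return [u for u in all_urls if u not in done]
```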

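Image output seems to be broken

The original page was:
After export it became <img src="http://192.168.179.101/../assets/demo.jpg alt=" "="">
so the image can't be displayed, because its file name became demo.jpg alt=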
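The broken tag looks as if the page-relative path was glued onto the host by string concatenation and the closing quote after the src value was lost when the tag was rebuilt. Without the original page this is a guess. A sketch of a sturdier approach, assuming you have the page URL and the raw src value: resolve the path with `urllib.parse.urljoin` and escape attribute values when building the tag (helper names hypothetical):

```python
import html
from urllib.parse import urljoin


def absolute_src(page_url, src):
    """Resolve an <img> src against the URL of the page it appeared on."""
    return urljoin(page_url, src)


def img_tag(src, alt=""):
    """Build an <img> tag with both attribute values escaped and quoted."""
    return '<img src="{}" alt="{}">'.format(
        html.escape(src, quote=True), html.escape(alt, quote=True)
    )
```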

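Could you support downloading images for offline use before converting to PDF?

Our company's GitBook can only be accessed from the company network, but I'd like to keep reading at home. The text is still readable in the converted PDF, but the images can't be viewed because of the company server's restrictions.
Adding a parameter that downloads the images and packs them into the PDF seems feasible, and I expect other people need this too…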
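One way to do this, sketched under the assumption that the images are downloaded while the network is still reachable: replace each `<img>` src with a `data:` URI, so WeasyPrint never has to fetch the image itself. The helper name is hypothetical:

```python
import base64
import mimetypes


def to_data_uri(data, filename):
    """Encode image bytes as a data: URI that can replace an <img> src.

    The MIME type is guessed from the file extension; unknown extensions
    fall back to application/octet-stream.
    """
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    return "data:{};base64,{}".format(mime, base64.b64encode(data).decode("ascii"))
```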

Error when crawling HTML pages whose URLs contain Chinese characters
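The issue gives no traceback, so the cause is a guess. One common source of trouble is a URL path containing raw non-ASCII characters. A sketch that percent-encodes the path as UTF-8 before requesting it (the helper `encode_url` is hypothetical; the query string is left untouched):

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(url):
    """Percent-encode non-ASCII characters in a URL's path as UTF-8.

    "%" and the other characters with meaning in a path stay as-is, so an
    already-encoded URL is not encoded twice.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%:@!$&'()*+,;=")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```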

Memory usage too high: killed on Ubuntu with 4 GB RAM

crawl : all done!
Generating pdf,please wait patiently
Killed

Exported document is blank

Hello, sorry to bother you. After installation, books exported from gitbook.io come out as blank documents.

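Generated PDF text is garbled

[feature] Give the generated PDF an outline (table-of-contents structure)

Relevant documentation

Override HTML in weasyprint
so that the outline structure can be set by manually added classes instead of the default h1-h6 tags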
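It may be possible to do this without patching WeasyPrint's HTML class. WeasyPrint builds PDF bookmarks from the `bookmark-level` and `bookmark-label` CSS properties, which its default stylesheet sets on h1-h6. A sketch that switches off the heading defaults and assigns outline levels by class instead; the class names are hypothetical, and the generated CSS would be appended to gitbook.css:

```python
def bookmark_css(class_levels, disable_headings=True):
    """Build CSS that assigns PDF outline (bookmark) levels by class name.

    class_levels maps a class name to an outline level, e.g. {"toc-1": 1}.
    With disable_headings=True the default h1-h6 bookmarks are switched off,
    so only the listed classes appear in the outline.
    """
    rules = []
    if disable_headings:
        rules.append("h1, h2, h3, h4, h5, h6 { bookmark-level: none; }")
    for class_name, level in class_levels.items():
        # A class selector is more specific than h1..h6, so it wins over the
        # rule above even when the class is on a heading element.
        rules.append(".{} {{ bookmark-level: {}; bookmark-label: content(text); }}"
                     .format(class_name, level))
    return "\n".join(rules)
```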

[feature] Add more metadata

The official documentation is here: documentation

Make the output PDF include more useful metadata:

title
authors
description
keywords
created
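WeasyPrint takes PDF metadata from the HTML head; according to its documentation it reads `<title>` and `<meta>` tags named author, description, keywords, generator, dcterms.created and dcterms.modified (check the docs for the version you use). So the fields listed above could be filled by emitting those tags in the generated HTML. A sketch with a hypothetical function name and arguments:

```python
import html


def _esc(value):
    """Escape a value for use inside a double-quoted HTML attribute."""
    return html.escape(value, quote=True)


def metadata_head(title, authors=(), description=None, keywords=(), created=None):
    """Build a <head> carrying the metadata tags WeasyPrint copies into the PDF."""
    parts = ["<title>{}</title>".format(_esc(title))]
    for author in authors:  # one tag per author
        parts.append('<meta name="author" content="{}">'.format(_esc(author)))
    if description:
        parts.append('<meta name="description" content="{}">'.format(_esc(description)))
    if keywords:
        parts.append('<meta name="keywords" content="{}">'.format(_esc(", ".join(keywords))))
    if created:  # an ISO 8601 date string such as "2019-05-07"
        parts.append('<meta name="dcterms.created" content="{}">'.format(_esc(created)))
    return "<head>" + "".join(parts) + "</head>"
```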

`pip3 install -r requirements.txt` fails on macOS

Following the tutorial, running the script fails on Windows 10 64-bit

I followed the tutorial exactly.
Error message:
cairo = dlopen(ffi, 'cairo', 'cairo-2', 'cairo-gobject-2', 'cairo.so.2')
File "C:\Users\Qinvz\AppData\Local\Programs\Python\Python37\lib\site-packages\cairocffi\__init__.py", line 36, in dlopen
raise OSError("dlopen() failed to load a library: %s" % ' / '.join(names))
OSError: dlopen() failed to load a library: cairo / cairo-2 / cairo-gobject-2 / cairo.so.2
Environment:
win10 1809 64bit
python 3.7.3
pip 19.0.3

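Thanks to the author, but images really can't be output

All Chinese text is garbled

OS: CentOS
Python version: 3.7

Two small issues

1. The table of contents is not crawled completely

For example, to crawl the content of this page: https://wizardforcel.gitbooks.io/python-quant-uqer/content/

This check in the current code is problematic: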

if len(level.split(".")) == 2:
    ...

As a result, more deeply nested entries (data-level values such as 1.2.3) are never crawled.

Change the collect_toc function to:

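import requests
from bs4 import BeautifulSoup


def collect_toc(base_url, headers=None):
    """Return the data-path of every chapter in the GitBook summary, at any depth."""
    text = requests.get(base_url, headers=headers).text
    soup = BeautifulSoup(text, 'html.parser')
    summary = soup.find('ul', class_='summary')
    if summary is None:  # page has no GitBook summary sidebar
        return []

    content_urls = []
    # find_all() searches recursively, so nested entries (1.2.3 ...) are included
    for li in summary.find_all('li'):
        element_class = li.attrs.get('class')  # list of class names, or None
        if not element_class:
            continue
        if "chapter" in element_class:
            data_path = li.attrs.get('data-path')
            if data_path:  # skip entries without a page
                content_urls.append(data_path)
    return content_urls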

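That is, take every li tag in the summary; any li with the chapter class is chapter content, so just append its data-path to the content_urls list.

2. No need to remove the footer when there isn't one

footer = context.find('footer')
# Compare with None: an lxml element without children is falsy,
# so `if context.find('footer'):` would skip a text-only footer.
if footer is not None:
    context.remove(footer)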

After installing per the tutorial, I get a syntax error.

python gitbook.py https://*.com

  File "gitbook.py", line 220
    string = f"<h1 class='{class_}'>{title}</h1>"
                                                ^
SyntaxError: invalid syntax

Local environment:
Debian 9, python3, virtualenv.

I think the file paths have a problem, and the script doesn't support GitBooks that require authorization

First, I'm on Windows 10. When I run gitbook.py directly, the script fails with FileNotFoundError: [Errno 2] No such file or directory: './html5_ua.css', and the same happens for html5_ua.css, gitbook.css and the output folder. It looks like an absolute vs. relative path problem. When I change line 248 of gitbook.py to fname = "D:/MyWorkspace/python/gitbook2pdf-master/output/" + fname, the problem goes away.
Second, the GitBook I want to download requires a username and password, but the script can't handle that. : (
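Paths such as './html5_ua.css' are resolved against the current working directory, not the script's folder, so they break whenever gitbook.py is started from somewhere else. Instead of hard-coding an absolute path, a sketch that resolves resources relative to the script file (helper names hypothetical; call them with `__file__` from gitbook.py):

```python
from pathlib import Path


def resource_path(name, script_file):
    """Absolute path of a file that sits next to script_file (e.g. gitbook.css)."""
    return Path(script_file).resolve().parent / name


def output_path(fname, script_file):
    """Path inside the output/ folder next to the script, creating the folder if needed."""
    out_dir = resource_path("output", script_file)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / fname
```

For example, `open(resource_path("gitbook.css", __file__), encoding="utf-8")` works regardless of the directory the command is run from.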

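Completely blank

It runs without errors, but the generated content is completely blank.

Packaged a ready-to-use Docker image

This little tool is quite handy, but I noticed some people aren't comfortable setting up the environment, so I packaged it as an image. If you use containers too, you can generate an e-book with a single command.

Project: https://github.com/soulteary/docker-gitbook-pdf-generator
Packaging details: https://soulteary.com/2019/05/07/generate-small-gitbook-pdf-using-the-docker-with-python.html

Thanks to the author 🍻.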

python gitbook.py http://self-publishing.ebookchain.org fails

Following the docs gives a syntax error, and other URLs fail too.

python gitbook.py http://self-publishing.ebookchain.org
File "gitbook.py", line 14
async def request(url, headers, timeout=None):
^
SyntaxError: invalid syntax

`python3 gitbook.py http://self-publishing.ebookchain.org` fails

My Python version is 3.7.3, on macOS.
Also, besides GitBook, can this tool crawl other websites?

Abstract the code into a class for external callers

Abstract it into a Gitbook2PDF class that takes a start URL and an optional file name:

class Gitbook2PDF():
    def __init__(self, base_url, fname=None):
        self.fname = fname
        self.base_url = base_url
        ...

    def run(self):
        ...

External code then calls it like this:

Gitbook2PDF("https://hit-alibaba.github.io/interview/").run()

pip3 install error

pip3 install -r requirements.txt

    resp.raise_for_status()
  File "/usr/share/python-wheels/requests-2.18.4-py2.py3-none-any.whl/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/get/

pip3 -V
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)

CERTIFICATE_VERIFY_FAILED

python gitbook.py http://book.flutterchina.club
fails with:
ClientConnectorCertificateError: Cannot connect to host book.flutterchina.club:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1051)')]
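"unable to get local issuer certificate" means Python's CA store doesn't contain the root that signed the site's certificate. With the python.org installer on macOS, running the bundled "Install Certificates.command" is a common fix. Disabling verification (as the `verify=False` fragment at the end of this page does) hides the problem instead. A sketch of building a verifying SSL context from an explicit CA bundle (the helper name is hypothetical; `certifi` is an optional third-party package):

```python
import ssl


def make_ssl_context(cafile=None):
    """Return an SSL context that verifies certificates and host names.

    cafile can point at a specific CA bundle, e.g. certifi.where() if the
    certifi package is installed; None uses the system default store.
    """
    return ssl.create_default_context(cafile=cafile)
```

aiohttp accepts such a context per request, e.g. `session.get(url, headers=headers, ssl=make_ssl_context(certifi.where()))`.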

python3 gitbook.py https://book.flutterchina.club/ fails

# ...

done :  https://book.flutterchina.club/chapter1/
done :  https://book.flutterchina.club/chapter12/ios_implement.html
done :  https://book.flutterchina.club/chapter12/android_implement.html
done :  https://book.flutterchina.club/chapter10/
done :  https://book.flutterchina.club/chapter4/stack.html
done :  https://book.flutterchina.club/chapter13/multi_languages_support.html
Traceback (most recent call last):
  File "gitbook.py", line 306, in <module>
    Gitbook2PDF(url).run()
  File "gitbook.py", line 195, in run
    loop.run_until_complete(self.crawl_main_content(content_urls))
  File "/Users/zhangshuyao/anaconda3/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "gitbook.py", line 217, in crawl_main_content
    await asyncio.gather(*tasks)
  File "gitbook.py", line 238, in gettext
    text = ChapterParser(metatext, title, level, ).parser()
  File "gitbook.py", line 102, in parser
    return html.unescape(ET.tostring(context).decode())
  File "src/lxml/etree.pyx", line 3437, in lxml.etree.tostring
  File "src/lxml/serializer.pxi", line 139, in lxml.etree._tostring
  File "src/lxml/serializer.pxi", line 199, in lxml.etree._raiseSerialisationError
lxml.etree.SerialisationError: IO_ENCODER
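Line 102 serializes with `ET.tostring(context).decode()`, i.e. lxml's default byte encoding followed by a decode. One change worth trying for `IO_ENCODER`, offered only as a hypothesis, is asking for a Python string directly with `encoding="unicode"`, which both lxml and the standard library's ElementTree accept. If the error persists, the page probably contains characters that cannot be serialized at all. The sketch below uses the standard library so it runs without lxml; the helper name is hypothetical, and the original then passes the result through `html.unescape`:

```python
import xml.etree.ElementTree as ET


def element_to_html(element):
    """Serialize an element straight to str, skipping the bytes round trip.

    With encoding="unicode", tostring() returns str, so no .decode() is
    needed and non-ASCII text (e.g. Chinese) is kept as-is.
    """
    return ET.tostring(element, encoding="unicode")
```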

Trouble crawling a particular site

It was my mistake.

File "gitbook.py", line 14 async def request(url, headers, timeout=None): ^ SyntaxError: invalid syntax

File "gitbook.py", line 14
async def request(url, headers, timeout=None):
^
SyntaxError: invalid syntax

How do I fix garbled Chinese text in the exported PDF?

PDF output is garbled

/gitbook2pdf # python gitbook.py http://self-publishing.ebookchain.org
/usr/local/lib/python3.6/site-packages/weasyprint/document.py:34: UserWarning: There are known rendering problems and missing features with cairo < 1.15.4. WeasyPrint may work with older versions, but please read the note about the needed cairo version on the "Install" page of the documentation before reporting bugs. http://weasyprint.readthedocs.io/en/latest/install.html
'There are known rendering problems and missing features with '
crawling : http://self-publishing.ebookchain.org/3-如何打造自己的平台？/3-电子书的生成.html
crawling : http://self-publishing.ebookchain.org/5-附录/1-关于作者.html
crawling : http://self-publishing.ebookchain.org/1-干嘛要写书？/2-写书的好处/readme.html
crawling : http://self-publishing.ebookchain.org/3-如何打造自己的平台？/4-电子书的发布.html
crawling : http://self-publishing.ebookchain.org/2-什么是自出版平台？/readme.html
crawling : http://self-publishing.ebookchain.org/3-如何打造自己的平台？/readme.html
crawling : http://self-publishing.ebookchain.org/4-最佳实践/妙手偶得无须刻意.html
crawling : http://self-publishing.ebookchain.org/index.html
crawling : http://self-publishing.ebookchain.org/3-如何打造自己的平台？/1-Summary的安装.html
crawling : http://self-publishing.ebookchain.org/3-如何打造自己的平台？/2-Summary的使用.html
crawling : http://self-publishing.ebookchain.org/5-附录/0-参考信息.html
crawling : http://self-publishing.ebookchain.org/1-干嘛要写书？/1-写书的必要性/readme.html
done : http://self-publishing.ebookchain.org/2-什么是自出版平台？/readme.html
done : http://self-publishing.ebookchain.org/1-干嘛要写书？/1-写书的必要性/readme.html
done : http://self-publishing.ebookchain.org/index.html
done : http://self-publishing.ebookchain.org/3-如何打造自己的平台？/3-电子书的生成.html
done : http://self-publishing.ebookchain.org/3-如何打造自己的平台？/1-Summary的安装.html
done : http://self-publishing.ebookchain.org/3-如何打造自己的平台？/readme.html
done : http://self-publishing.ebookchain.org/5-附录/0-参考信息.html
done : http://self-publishing.ebookchain.org/3-如何打造自己的平台？/2-Summary的使用.html
done : http://self-publishing.ebookchain.org/4-最佳实践/妙手偶得无须刻意.html
done : http://self-publishing.ebookchain.org/5-附录/1-关于作者.html
done : http://self-publishing.ebookchain.org/3-如何打造自己的平台？/4-电子书的发布.html
done : http://self-publishing.ebookchain.org/1-干嘛要写书？/2-写书的好处/readme.html
crawl : all done!
Generating pdf,please wait patiently
Generated

[bug] 'ascii' codec can't encode characters in position 16-24

ubuntu 18.04
python 3.6.7

$ python3 gitbook.py http://hukai.me/android-training-course-in-chinese/

Traceback (most recent call last):
  File "gitbook.py", line 303, in <module>
    Gitbook2PDF(url).run()
  File "gitbook.py", line 205, in run
    self.write_pdf(self.fname, html_text, css_text)
  File "gitbook.py", line 246, in write_pdf
    with open(htmlname, 'w', encoding='utf-8') as f:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 16-24: ordinal not in range(128)

$ python3 gitbook.py http://self-publishing.ebookchain.org

crawling :  crawling :  crawling :  crawling :  crawling :  crawling :  crawling :  crawling :  crawling :  http://self-publishing.ebookchain.org/index.html
crawling :  crawling :  crawling :  Traceback (most recent call last):
  File "gitbook.py", line 303, in <module>
    Gitbook2PDF(url).run()
  File "gitbook.py", line 192, in run
    loop.run_until_complete(self.crawl_main_content(content_urls))
  File "/usr/lib/python3.6/asyncio/base_events.py", line 473, in run_until_complete
    return future.result()
  File "gitbook.py", line 214, in crawl_main_content
    await asyncio.gather(*tasks)
  File "gitbook.py", line 228, in gettext
    print("crawling : ", url)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 40-49: ordinal not in range(128)
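Both tracebacks fail in the 'ascii' codec. The first fails while opening a file whose name contains Chinese, since `encoding='utf-8'` only applies to the file's contents, not its name; the second fails while printing. This usually means Python 3.6 is running under an ASCII locale such as `LANG=C`, and running with a UTF-8 locale (for example `LC_ALL=C.UTF-8`) is the usual fix. As a code-level fallback, a sketch that keeps a file name only if the file system encoding can store it (helper name hypothetical):

```python
import sys
from urllib.parse import quote


def safe_filename(name, encoding=None):
    """Return name if the file system encoding can store it, else an ASCII version.

    The fallback percent-encodes the UTF-8 bytes of every character except
    ASCII letters, digits and a few marks such as _ . -, so unquote() restores it.
    """
    encoding = encoding or sys.getfilesystemencoding()
    try:
        name.encode(encoding)
        return name
    except UnicodeEncodeError:
        return quote(name, safe="")
```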

python gitbook.py http://self-publishing.ebookchain.org fails

python gitbook.py http://self-publishing.ebookchain.org

File "gitbook.py", line 14
async def request(url, headers, timeout=None):
^
SyntaxError: invalid syntax

Any plans to submit this package to PyPI?

It would be more convenient for Python beginners if you published this package on PyPI. The guide to submitting is here: Packaging Python Projects (Python Packaging User Guide); another unofficial guide is here: How to upload your python package to PyPi - joelbarmettlerUZH - Medium.

Thanks for your repository.

weasyprint write_pdf got stuck

gitbook2pdf/gitbook.py

Line 176 in 3339b87

tmphtml.write_pdf(fname, stylesheets=[tmpcss])

Execution may hang at this point without reporting any error.

Environment:

python: 3.7.2
url: https://learnyoua.haskell.sg/content/zh-cn/ch01/introduction.html

Missing library when running on macOS

python3 ../gitbook2pdf/gitbook.py https://books.studygolang.com/gopl-zh/
Traceback (most recent call last):
File "../gitbook2pdf/gitbook.py", line 5, in <module>
import weasyprint
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/weasyprint/__init__.py", line 393, in <module>
from .css import preprocess_stylesheet # noqa
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/weasyprint/css/__init__.py", line 26, in <module>
from . import computed_values
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/weasyprint/css/computed_values.py", line 17, in <module>
from .. import text
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/weasyprint/text.py", line 14, in <module>
import cairocffi as cairo
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/cairocffi/__init__.py", line 39, in <module>
cairo = dlopen(ffi, 'cairo', 'cairo-2', 'cairo-gobject-2', 'cairo.so.2')
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/cairocffi/__init__.py", line 36, in dlopen
raise OSError("dlopen() failed to load a library: %s" % ' / '.join(names))
OSError: dlopen() failed to load a library: cairo / cairo-2 / cairo-gobject-2 / cairo.so.2

pip install -r requirements.txt fails

Could not find a version that satisfies the requirement urllib3==1.25.3 (from -r requirements.txt (line 29)) (from versions: 0.3, 1.0, 1.0.1, 1.0.2, 1.1, 1.2, 1.2.1, 1.2.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.7.1, 1.8, 1.8.2, 1.8.3, 1.9, 1.9.1, 1.10, 1.10.1, 1.10.2, 1.10.3, 1.10.4, 1.11, 1.12, 1.13, 1.13.1, 1.14, 1.15, 1.15.1, 1.16, 1.17, 1.18, 1.18.1, 1.19, 1.19.1, 1.20, 1.21, 1.21.1, 1.22, 1.23, 1.24, 1.24.1, 1.24.2, 1.24.3, 1.25, 1.25.1, 1.25.2)
No matching distribution found for urllib3==1.25.3 (from -r requirements.txt (line 29))

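Could EPUB export be supported?

Why is the exported PDF blank?

Hello!
When I export a book from GitBook, the result is blank.
So I tried the example:
python gitbook.py http://self-publishing.ebookchain.org

but the files produced in output are still blank documents:
-rw-rw-r-- 1 340 Jul 10 08:58 NetworkError.html
-rw-rw-r-- 1 939 Jul 10 08:58 NetworkError.pdf

What could be causing this? Thanks!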

The script cannot handle table DOM

https://warsier.gitbooks.io/new_master_rule/content/3/32/322/3222.html
Traceback (most recent call last):
  File "gitbook.py", line 309, in <module>
    Gitbook2PDF(url).run()
  File "gitbook.py", line 195, in run
    loop.run_until_complete(self.crawl_main_content(content_urls))
  File "/home/ilab/.conda/envs/gitbook/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "gitbook.py", line 217, in crawl_main_content
    await asyncio.gather(*tasks)
  File "gitbook.py", line 238, in gettext
    text = ChapterParser(metatext, title, level, ).parser()
  File "gitbook.py", line 102, in parser
    return html.unescape(ET.tostring(context).decode())
  File "src/lxml/etree.pyx", line 3351, in lxml.etree.tostring
  File "src/lxml/serializer.pxi", line 139, in lxml.etree._tostring
  File "src/lxml/serializer.pxi", line 199, in lxml.etree._raiseSerialisationError
lxml.etree.SerialisationError: IO_ENCODER

The script seems to go wrong when the page contains a table.

Could not find a version that satisfies the requirement get==2019.4.13

ERROR: Could not find a version that satisfies the requirement get==2019.4.13 (from -r requirements.txt (line 13)) (from versions: none)
ERROR: No matching distribution found for get==2019.4.13 (from -r requirements.txt (line 13))

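Keeps showing re-crawling

When crawling the book at https://wizardforcel.gitbooks.io/daxueba-kali-linux-tutorial/55.html, only a small fraction of the pages are fetched successfully; all the others stay waiting.

Error on macOS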

File "gitbook.py", line 14
async def request(url, headers, timeout=None):
^
SyntaxError: invalid syntax

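Suggestion: add a sleep option

Concurrent crawling often gets banned.
Also, if SSL returns an error, the script crashes;
after a crash you have to crawl again,
a re-crawl can't finish in one go either,
so you end up repeating these steps.

I suggest adding a single-threaded mode that rests n seconds after each link.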
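The suggested mode can be sketched as a sequential crawl that pauses between requests and records failures instead of raising, so one SSL error doesn't throw away everything crawled so far. `fetch` stands for any coroutine function returning a page's HTML; the function and parameter names are hypothetical:

```python
import asyncio
import random


async def crawl_politely(urls, fetch, delay=2.0, jitter=1.0):
    """Fetch URLs one at a time, pausing delay..delay+jitter seconds between them.

    Returns (pages, failures): dicts keyed by URL holding the HTML or the error.
    """
    pages, failures = {}, {}
    for i, url in enumerate(urls):
        if i:  # no need to wait before the first request
            await asyncio.sleep(delay + random.uniform(0, jitter))
        try:
            pages[url] = await fetch(url)
        except Exception as exc:  # broad on purpose: record it and keep crawling
            failures[url] = repr(exc)
    return pages, failures
```

The `failures` dict lists exactly which URLs to retry on the next run.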

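Please note in the README that this tool requires Python 3

Installing with Python 2 fails.

It only works with Python 3.

In particular, pip must be invoked as pip3;

otherwise installation fails because aiohttp can't be found: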

Could not find a version that satisfies the requirement aiohttp==3.5.4 (from -r requirements.txt (line 1)) (from versions: 0.1, 0.2, 0.3, 0.4, 0.4.1, 0.4.2, 0.4.3, 0.4.4, 0.5.0, 0.6.0, 0.6.1, 0.6.2, 0.6.3, 0.6.4, 0.6.5, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.8.2, 0.8.3, 0.8.4, 0.9.0, 0.9.1, 0.9.2, 0.9.3, 0.10.0, 0.10.1, 0.10.2, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.14.2, 0.14.3, 0.14.4, 0.15.0, 0.15.1, 0.15.2, 0.15.3, 0.16.0, 0.16.1, 0.16.2, 0.16.3, 0.16.4, 0.16.5, 0.16.6, 0.17.0, 0.17.1, 0.17.2, 0.17.3, 0.17.4, 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.18.4, 0.19.0, 0.20.0, 0.20.1, 0.20.2, 0.21.0, 0.21.1, 0.21.2, 0.21.4, 0.21.5, 0.21.6, 0.22.0a0, 0.22.0b0, 0.22.0b1, 0.22.0b2, 0.22.0b3, 0.22.0b4, 0.22.0b5, 0.22.0b6, 0.22.0, 0.22.1, 0.22.2, 0.22.3, 0.22.4, 0.22.5, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.5, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.2.0, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5, 2.0.0rc1, 2.0.0, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.0.5, 2.0.6.post1, 2.0.7, 2.1.0, 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.2.4, 2.2.5, 2.3.0a4, 2.3.0, 2.3.1, 2.3.2b2)
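A version guard can turn these failures into a clear message. One catch: Python 2 and 3.5 fail while parsing gitbook.py (at `async def` and at the f-strings respectively), before any check inside that file could run. The guard therefore only helps in a separate launcher script that older interpreters can still parse, so the sketch sticks to syntax Python 2.7 also accepts. The function name and message are hypothetical:

```python
import sys

# gitbook.py uses async def (Python 3.5+) and f-strings (Python 3.6+).
MIN_PYTHON = (3, 6)


def python_version_error(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return an error message if the interpreter is too old, else None."""
    current = tuple(version_info[:2])
    if current < minimum:
        return ("gitbook2pdf needs Python {}.{} or newer, but this is Python {}.{}. "
                "Run it with python3 and install the requirements with pip3."
                .format(minimum[0], minimum[1], current[0], current[1]))
    return None


if __name__ == "__main__":
    message = python_version_error()
    if message:
        sys.exit(message)  # prints the message to stderr and exits with status 1
```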

Disconnect after crawling a certain number of pages

Command used (Python 3.6.5):
python gitbook.py https://wizardforcel.gitbooks.io/python-quant-uqer/content/
Based on the crawl log I located the relevant code and made one change: adding a sleep.
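import random
import time

# Method of the Gitbook2PDF class; request() and ChapterParser are defined in gitbook.py.
async def gettext(self, index, url, level, title):
    '''
    return path's html
    '''
    # Random 2-7 s pause so the server isn't overwhelmed. time.sleep() blocks
    # the whole event loop: that is what spaces the requests out here, but it
    # also stalls every download already in flight. asyncio.Semaphore plus
    # asyncio.sleep() would throttle without blocking.
    sec_rnd = random.randint(2, 7)
    time.sleep(sec_rnd)
    print("Pausing {}s to avoid overwhelming the server, crawling : {}".format(sec_rnd, url))
    try:
        metatext = await request(url, self.headers, timeout=10)
    except Exception:
        time.sleep(sec_rnd)
        print("Pausing {}s to avoid overwhelming the server, recrawling : {}".format(sec_rnd, url))
        metatext = await request(url, self.headers)  # a second failure propagates
    try:
        text = ChapterParser(metatext, title, level).parser()
        print("done : ", url)
        self.content_list[index] = text
    except IndexError:
        print('failed at : ', url, ' maybe content is empty?')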

But after crawling for a while, the disconnect error still occurs.
done : https://wizardforcel.gitbooks.io/python-quant-uqer/content/81.html
Traceback (most recent call last):
File "gitbook.py", line 5, in <module>
Gitbook2PDF(url).run()
File "E:\code\pythonCode\thirdparty\gitbook2pdf-master\gitbook2pdf\gitbook2pdf.py", line 202, in run
loop.run_until_complete(self.crawl_main_content(content_urls))
File "d:\ProgramData\Anaconda3\envs\python36\lib\asyncio\base_events.py", line 468, in run_until_complete
return future.result()
File "E:\code\pythonCode\thirdparty\gitbook2pdf-master\gitbook2pdf\gitbook2pdf.py", line 224, in crawl_main_content
await asyncio.gather(*tasks)
File "E:\code\pythonCode\thirdparty\gitbook2pdf-master\gitbook2pdf\gitbook2pdf.py", line 246, in gettext
metatext = await request(url, self.headers)
File "E:\code\pythonCode\thirdparty\gitbook2pdf-master\gitbook2pdf\gitbook2pdf.py", line 21, in request
async with session.get(url, headers=headers, timeout=timeout) as resp:
File "d:\ProgramData\Anaconda3\envs\python36\lib\site-packages\aiohttp\client.py", line 1005, in __aenter__
self._resp = await self._coro
File "d:\ProgramData\Anaconda3\envs\python36\lib\site-packages\aiohttp\client.py", line 497, in _request
await resp.start(conn)
File "d:\ProgramData\Anaconda3\envs\python36\lib\site-packages\aiohttp\client_reqrep.py", line 844, in start
message, payload = await self._protocol.read()  # type: ignore  # noqa
File "d:\ProgramData\Anaconda3\envs\python36\lib\site-packages\aiohttp\streams.py", line 588, in read
await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: None
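A `ServerDisconnectedError` is often transient, but the code above retries only once with a fixed delay, and the second failure propagates, as in this traceback. A sketch of retrying with exponential backoff and jitter instead. `fetch` stands for any coroutine function taking a URL (for gitbook.py's `request()`, wrap it with `functools.partial` to bind headers and timeout). With aiohttp you would pass `retry_on=(aiohttp.ClientError, asyncio.TimeoutError)`, since `ServerDisconnectedError` is a subclass of `aiohttp.ClientError`. Function and parameter names are hypothetical:

```python
import asyncio
import random


async def fetch_with_retry(fetch, url, retries=4, base_delay=1.0, retry_on=(Exception,)):
    """Return await fetch(url), retrying on the exception types in retry_on.

    After the n-th failure (n = 1, 2, ...) it waits base_delay * 2**(n-1)
    seconds plus up to base_delay of random jitter, so tasks that failed
    together don't all retry at the same moment.
    """
    for attempt in range(retries + 1):
        try:
            return await fetch(url)
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: let the caller log or record it
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            await asyncio.sleep(delay)
```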

def collect_urls_and_metadata(self, start_url):
    response = requests.get(start_url, headers=self.headers, verify=False)
