gitbook2pdf's Introduction

Welcome to Gitbook2pdf 👋

Twitter: fuergaosi

A simple but powerful tool for converting GitBook pages to PDF.

Python 3.6 English 中文

Feature

  • Asynchronous crawling: uses aiohttp to fetch pages, so the data can be captured in a few seconds

  • Text in the generated PDF can be selected and copied

  • Preserves the original directory structure

  • Retains the original hyperlinks

  • Fully retains the original formatting (content rendered by JavaScript cannot be retained 🤷‍♂️)
  • Small output size: an 800-page PDF file is only 4.6 MB

Sample files

KubernetesHandbook.pdf

Install

Notice!

PDF generation relies on WeasyPrint, and pip alone cannot complete the WeasyPrint installation, so you need to install it manually; see the WeasyPrint install tutorial. If you don't want to install the dependencies yourself, you can use the Docker image made by Su Yang.

pip install -r requirements.txt

Usage

python gitbook.py {url}

Run tests

python gitbook.py http://self-publishing.ebookchain.org

Custom

The appearance of the generated output is defined by CSS. To add other styles, modify gitbook.css.

Author

👤 fuergaosi233

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check issues page.

Show your support

Give a ⭐️ if this project helped you!

warning⚠️

Generating PDF files with WeasyPrint is fairly memory-hungry. Make sure you have enough memory available.

gitbook2pdf's People

Contributors

dependabot[bot], feeeei, fuergaosi233, imzhi, liaochangjiang

gitbook2pdf's Issues

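Could a single-threaded mode be added? DNS resolution errors occur frequently

I'm not sure whether this error is caused by too many concurrent requests,
but I think the script could handle errors gracefully instead of crashing outright.
Also, it would be better if the progress of a failed crawl were saved, so the next run doesn't have to start over.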
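
A minimal sketch of the graceful error handling requested above, assuming the crawl is driven by asyncio.gather as the tracebacks elsewhere on this page suggest. The names crawl_all and fake_fetch are hypothetical and not part of gitbook2pdf; fake_fetch stands in for the real network request.

```python
import asyncio


async def crawl_all(urls, fetch):
    """Fetch every URL, collecting failures instead of aborting the whole run."""
    results = await asyncio.gather(
        *(fetch(url) for url in urls),
        return_exceptions=True,  # exceptions come back as values, not raised
    )
    pages, failed = {}, []
    for url, result in zip(urls, results):
        if isinstance(result, Exception):
            failed.append((url, result))
        else:
            pages[url] = result
    return pages, failed


async def fake_fetch(url):
    # Stand-in for a real request (hypothetical, for illustration only).
    if "bad" in url:
        raise OSError("Temporary failure in name resolution")
    return "<html>" + url + "</html>"


pages, failed = asyncio.run(crawl_all(["a.html", "bad.html"], fake_fetch))
```

The failed list could be written to disk and retried on the next run, which would cover the resume request as well.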

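Could downloading images for offline use before converting to PDF be supported?

My company's GitBook is only reachable from the company network, but I'd like to keep reading at home. The text can be read in the converted PDF, but the images cannot be displayed because of the company server's access restrictions.
Adding an option that downloads the images and bundles them into the PDF seems feasible, and other people probably need this too...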
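
One way the request above might be approached is to rewrite each image URL into a base64 data URI before the HTML is handed to the PDF renderer. This is only a sketch: inline_images and the load_image callback are hypothetical names, the regular expression handles only double-quoted src attributes, and whether this fits gitbook2pdf's pipeline has not been verified.

```python
import base64
import mimetypes
import re


def inline_images(html_text, load_image):
    """Replace each <img src="..."> URL with a base64 data URI.

    load_image is a caller-supplied function (hypothetical here) that takes
    a URL and returns the raw image bytes, e.g. fetched on the company network.
    """
    def to_data_uri(match):
        url = match.group(2)
        if url.startswith("data:"):
            return match.group(0)  # already embedded, leave untouched
        mime = mimetypes.guess_type(url)[0] or "application/octet-stream"
        payload = base64.b64encode(load_image(url)).decode("ascii")
        return "{}data:{};base64,{}{}".format(
            match.group(1), mime, payload, match.group(3)
        )

    return re.sub(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', to_data_uri, html_text)


sample = '<p><img alt="x" src="http://intranet/a.png"></p>'
embedded = inline_images(sample, lambda url: b"PNG")
```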

Errors when crawling HTML pages whose URLs contain Chinese characters

faild at : http://39.108.191.211/19day/��/��.html maybe content is empty?
faild at : http://39.108.191.211/17day/third-chapter/����.html maybe content is empty?
faild at : http://39.108.191.211/21day/mini-web��-url��.html maybe content is empty?
faild at : http://39.108.191.211/21day/mini-web��-mysql-�.html maybe content is empty?
faild at : http://39.108.191.211/20day/mini-web��-mysql-�.html maybe content is empty?
faild at : http://39.108.191.211/22day/�类��ORM.html maybe content is empty?
faild at : http://39.108.191.211/20day/mini-web��-路���正�.html maybe content is empty?
faild at : http://39.108.191.211/21day/logging��模�.html maybe content is empty?
faild at : http://39.108.191.211/21day/mini-web��-mysql-�.html maybe content is empty?
faild at : http://39.108.191.211/22day/�类.html maybe content is empty?
faild at : http://39.108.191.211/20day/mini-web��-����为html格�.html maybe content is empty?
faild at : http://39.108.191.211/15day/mini web��-4-路�.html maybe content is empty?
faild at : http://39.108.191.211/20day/����.html maybe content is empty?
faild at : http://39.108.191.211/20day/mini-web��-��伪��url.html maybe content is empty?
faild at : http://39.108.191.211/15day/01-��.html maybe content is empty?
faild at : http://39.108.191.211/15day/02-�饰�.html maybe content is empty?
faild at : http://39.108.191.211/14day/mini web��-3-��模�.html maybe content is empty?
faild at : http://39.108.191.211/19day/账�管�.html maybe content is empty?
faild at : http://39.108.191.211/20day/mini-web��-�mysql中�询��.html maybe content is empty?
faild at : http://39.108.191.211/14day/mini web��-2-�示页�.html maybe content is empty?
faild at : http://39.108.191.211/19day/账�管�/����.html maybe content is empty?
faild at : http://39.108.191.211/19day/账�管�/账���.html maybe content is empty?
faild at : http://39.108.191.211/14day/����示�.html maybe content is empty?
faild at : http://39.108.191.211/19day/��/�交.html maybe content is empty?
faild at : http://39.108.191.211/20day/伪�����������.html maybe content is empty?
faild at : http://39.108.191.211/14day/Web�����-1-����.html maybe content is empty?
faild at : http://39.108.191.211/14day/mini web��-1-�件��.html maybe content is empty?
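
The garbled paths in the log hint at an encoding mismatch, although the issue does not confirm the cause. If the problem is that non-ASCII paths are sent unencoded, percent-encoding the path as UTF-8 is one thing to try. The helper name encode_url_path and the example.com URL below are made up for illustration.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def encode_url_path(url):
    """Percent-encode non-ASCII characters in the path of a URL as UTF-8."""
    parts = urlsplit(url)
    # unquote first so an already-encoded path is not encoded twice
    path = quote(unquote(parts.path), safe="/")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


original = "http://example.com/19day/账户管理.html"
encoded = encode_url_path(original)
```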

Error running the script on Windows 10 64-bit after following the tutorial

I followed the tutorial exactly.
Error message:
cairo = dlopen(ffi, 'cairo', 'cairo-2', 'cairo-gobject-2', 'cairo.so.2')
File "C:\Users\Qinvz\AppData\Local\Programs\Python\Python37\lib\site-packages\cairocffi\__init__.py", line 36, in dlopen
raise OSError("dlopen() failed to load a library: %s" % ' / '.join(names))
OSError: dlopen() failed to load a library: cairo / cairo-2 / cairo-gobject-2 / cairo.so.2
Environment:
win10 1809 64bit
python 3.7.3
pip 19.0.3

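Two small issues

1. The table of contents is not fully crawled

For example, when crawling this page: https://wizardforcel.gitbooks.io/python-quant-uqer/content/

This check in the current program is problematic: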

if len(level.split(".")) == 2:
    ... 

As a result, more deeply nested TOC entries (a data-level with three or more parts) are not crawled.
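
To make the effect of that check concrete, here is a small illustration. The data-level values are invented examples and is_captured is a hypothetical name that mirrors only the quoted condition.

```python
def is_captured(level):
    """Mirror of the quoted check: only two-part levels pass."""
    return len(level.split(".")) == 2


levels = ["1.1", "1.2", "1.2.1", "1.2.1.3"]  # invented data-level values
captured = [lv for lv in levels if is_captured(lv)]
dropped = [lv for lv in levels if not is_captured(lv)]
```

Here captured holds only the two-part levels, while the deeper entries end up in dropped.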

Change the collect_toc function to:

def collect_toc():
    text = requests.get(BASE_URL, headers=headers).text
    soup = BeautifulSoup(text, 'html.parser')
    lis = soup.find('ul', class_='summary').find_all('li')

    content_urls = []

    for li in lis:
        element_class = li.attrs.get('class')
        if not element_class:
            continue
        if "chapter" in element_class:
            data_path = li.attrs.get('data-path')
            content_urls.append(data_path)
    return content_urls

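This collects all li tags in the summary list; any li that carries the chapter class is chapter content, so its data-path is appended to the content_urls list.

2. Don't try to remove the footer when there isn't one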

 if context.find('footer'):
     context.remove(context.find('footer'))

Syntax error after installing per the tutorial

python gitbook.py https://*.com

  File "gitbook.py", line 220
    string = f"<h1 class='{class_}'>{title}</h1>"
                                                ^
SyntaxError: invalid syntax

Local environment:
Debian 9, python3, virtualenv.
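
The failing line uses an f-string, which requires Python 3.6 or later; the default python3 on Debian 9 is most likely 3.5, which would explain the SyntaxError. A hedged sketch of an explicit version check follows (check_python and MIN_VERSION are hypothetical names, not part of gitbook2pdf):

```python
import sys

MIN_VERSION = (3, 6)  # f-strings were added in Python 3.6


def check_python(version_info=sys.version_info):
    """Return True when the interpreter is new enough for f-strings."""
    return tuple(version_info[:2]) >= MIN_VERSION


if not check_python():
    sys.exit(
        "gitbook2pdf needs Python 3.6+, found {}.{}".format(*sys.version_info[:2])
    )
```

Such a check only helps if it lives in a file that itself avoids f-strings, because a SyntaxError is raised while the file is compiled, before any of its code runs.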

The file paths seem broken, and the script doesn't support GitBooks that require authorization

First, I'm on Windows 10. When I run gitbook.py directly, the script fails with FileNotFoundError: [Errno 2] No such file or directory: './html5_ua.css'. The same happens for html5_ua.css, gitbook.css, and the output folder. It looks like an absolute versus relative path problem. After I changed gitbook.py line 248 to fname = "D:/MyWorkspace/python/gitbook2pdf-master/output/" + fname, the problem went away.
Second, the GitBook I want to download requires a username and password, but the script doesn't support that. : (
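
Instead of hard-coding an absolute path, the bundled files could be resolved against a known base directory rather than the current working directory. A minimal sketch, assuming the CSS files sit next to gitbook.py; resource_path and the /opt/gitbook2pdf directory are hypothetical.

```python
from pathlib import Path


def resource_path(name, base_dir=None):
    """Resolve a bundled file against a base directory instead of the CWD."""
    # Inside gitbook.py the base would typically be Path(__file__).resolve().parent
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return base / name


css_file = resource_path("gitbook.css", base_dir="/opt/gitbook2pdf")
```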

Completely blank output

The script runs without errors, but the generated content is completely blank.

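Abstract the script into a class that can be called externally

Abstract it into a Gitbook2PDF class that takes the start URL and an optional file name: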

class Gitbook2PDF():
    def __init__(self, base_url, fname=None):
        self.fname = fname
        self.base_url = base_url
        ...

    def run(self):
        ...

External code can then call it like this:

Gitbook2PDF("https://hit-alibaba.github.io/interview/").run()

pip3 install error

pip3 install -r requirements.txt

    resp.raise_for_status()
  File "/usr/share/python-wheels/requests-2.18.4-py2.py3-none-any.whl/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/get/

pip3 -V
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)

CERTIFICATE_VERIFY_FAILED

python gitbook.py http://book.flutterchina.club
fails with:
ClientConnectorCertificateError: Cannot connect to host book.flutterchina.club:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1051)')]

python3 gitbook.py https://book.flutterchina.club/ fails

# ...

done :  https://book.flutterchina.club/chapter1/
done :  https://book.flutterchina.club/chapter12/ios_implement.html
done :  https://book.flutterchina.club/chapter12/android_implement.html
done :  https://book.flutterchina.club/chapter10/
done :  https://book.flutterchina.club/chapter4/stack.html
done :  https://book.flutterchina.club/chapter13/multi_languages_support.html
Traceback (most recent call last):
  File "gitbook.py", line 306, in <module>
    Gitbook2PDF(url).run()
  File "gitbook.py", line 195, in run
    loop.run_until_complete(self.crawl_main_content(content_urls))
  File "/Users/zhangshuyao/anaconda3/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "gitbook.py", line 217, in crawl_main_content
    await asyncio.gather(*tasks)
  File "gitbook.py", line 238, in gettext
    text = ChapterParser(metatext, title, level, ).parser()
  File "gitbook.py", line 102, in parser
    return html.unescape(ET.tostring(context).decode())
  File "src/lxml/etree.pyx", line 3437, in lxml.etree.tostring
  File "src/lxml/serializer.pxi", line 139, in lxml.etree._tostring
  File "src/lxml/serializer.pxi", line 199, in lxml.etree._raiseSerialisationError
lxml.etree.SerialisationError: IO_ENCODER

PDF output is garbled

/gitbook2pdf # python gitbook.py http://self-publishing.ebookchain.org
/usr/local/lib/python3.6/site-packages/weasyprint/document.py:34: UserWarning: There are known rendering problems and missing features with cairo < 1.15.4. WeasyPrint may work with older versions, but please read the note about the needed cairo version on the "Install" page of the documentation before reporting bugs. http://weasyprint.readthedocs.io/en/latest/install.html
'There are known rendering problems and missing features with '
crawling : http://self-publishing.ebookchain.org/3-如何打造自己的平台?/3-电子书的生成.html
crawling : http://self-publishing.ebookchain.org/5-附录/1-关于作者.html
crawling : http://self-publishing.ebookchain.org/1-干嘛要写书?/2-写书的好处/readme.html
crawling : http://self-publishing.ebookchain.org/3-如何打造自己的平台?/4-电子书的发布.html
crawling : http://self-publishing.ebookchain.org/2-什么是自出版平台?/readme.html
crawling : http://self-publishing.ebookchain.org/3-如何打造自己的平台?/readme.html
crawling : http://self-publishing.ebookchain.org/4-最佳实践/妙手偶得无须刻意.html
crawling : http://self-publishing.ebookchain.org/index.html
crawling : http://self-publishing.ebookchain.org/3-如何打造自己的平台?/1-Summary的安装.html
crawling : http://self-publishing.ebookchain.org/3-如何打造自己的平台?/2-Summary的使用.html
crawling : http://self-publishing.ebookchain.org/5-附录/0-参考信息.html
crawling : http://self-publishing.ebookchain.org/1-干嘛要写书?/1-写书的必要性/readme.html
done : http://self-publishing.ebookchain.org/2-什么是自出版平台?/readme.html
done : http://self-publishing.ebookchain.org/1-干嘛要写书?/1-写书的必要性/readme.html
done : http://self-publishing.ebookchain.org/index.html
done : http://self-publishing.ebookchain.org/3-如何打造自己的平台?/3-电子书的生成.html
done : http://self-publishing.ebookchain.org/3-如何打造自己的平台?/1-Summary的安装.html
done : http://self-publishing.ebookchain.org/3-如何打造自己的平台?/readme.html
done : http://self-publishing.ebookchain.org/5-附录/0-参考信息.html
done : http://self-publishing.ebookchain.org/3-如何打造自己的平台?/2-Summary的使用.html
done : http://self-publishing.ebookchain.org/4-最佳实践/妙手偶得无须刻意.html
done : http://self-publishing.ebookchain.org/5-附录/1-关于作者.html
done : http://self-publishing.ebookchain.org/3-如何打造自己的平台?/4-电子书的发布.html
done : http://self-publishing.ebookchain.org/1-干嘛要写书?/2-写书的好处/readme.html
crawl : all done!
Generating pdf,please wait patiently
Generated
Screenshot from 2019-03-11 15-00-49

[bug] 'ascii' codec can't encode characters in position 16-24

ubuntu 18.04
python 3.6.7

$ python3 gitbook.py http://hukai.me/android-training-course-in-chinese/

Traceback (most recent call last):
  File "gitbook.py", line 303, in <module>
    Gitbook2PDF(url).run()
  File "gitbook.py", line 205, in run
    self.write_pdf(self.fname, html_text, css_text)
  File "gitbook.py", line 246, in write_pdf
    with open(htmlname, 'w', encoding='utf-8') as f:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 16-24: ordinal not in range(128)

$ python3 gitbook.py http://self-publishing.ebookchain.org

crawling :  crawling :  crawling :  crawling :  crawling :  crawling :  crawling :  crawling :  crawling :  http://self-publishing.ebookchain.org/index.html
crawling :  crawling :  crawling :  Traceback (most recent call last):
  File "gitbook.py", line 303, in <module>
    Gitbook2PDF(url).run()
  File "gitbook.py", line 192, in run
    loop.run_until_complete(self.crawl_main_content(content_urls))
  File "/usr/lib/python3.6/asyncio/base_events.py", line 473, in run_until_complete
    return future.result()
  File "gitbook.py", line 214, in crawl_main_content
    await asyncio.gather(*tasks)
  File "gitbook.py", line 228, in gettext
    print("crawling : ", url)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 40-49: ordinal not in range(128)
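
Both tracebacks suggest that Python is running with an ASCII default encoding, as happens under a C/POSIX locale; setting a UTF-8 locale or PYTHONIOENCODING=utf-8 is the usual remedy. As a code-level fallback for the print call, a sketch like the following replaces unencodable characters instead of raising. The name safe_print is hypothetical, and this does not address the error raised by open().

```python
import io
import sys


def safe_print(*parts, stream=None):
    """Print like print(), but escape characters the stream cannot encode."""
    stream = stream or sys.stdout
    text = " ".join(str(part) for part in parts)
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # backslashreplace turns unencodable characters into \uXXXX escapes
    printable = text.encode(encoding, errors="backslashreplace").decode(encoding)
    stream.write(printable + "\n")


# Simulate an ASCII-only terminal to show the behaviour.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
safe_print("crawling : ", "http://example.com/中.html", stream=ascii_stream)
ascii_stream.flush()
captured_output = ascii_stream.buffer.getvalue().decode("ascii")
```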

Missing library when running on Mac

python3 ../gitbook2pdf/gitbook.py https://books.studygolang.com/gopl-zh/
Traceback (most recent call last):
File "../gitbook2pdf/gitbook.py", line 5, in <module>
import weasyprint
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/weasyprint/__init__.py", line 393, in <module>
from .css import preprocess_stylesheet # noqa
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/weasyprint/css/__init__.py", line 26, in <module>
from . import computed_values
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/weasyprint/css/computed_values.py", line 17, in <module>
from .. import text
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/weasyprint/text.py", line 14, in <module>
import cairocffi as cairo
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/cairocffi/__init__.py", line 39, in <module>
cairo = dlopen(ffi, 'cairo', 'cairo-2', 'cairo-gobject-2', 'cairo.so.2')
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/cairocffi/__init__.py", line 36, in dlopen
raise OSError("dlopen() failed to load a library: %s" % ' / '.join(names))
OSError: dlopen() failed to load a library: cairo / cairo-2 / cairo-gobject-2 / cairo.so.2

pip install -r requirements.txt fails

Could not find a version that satisfies the requirement urllib3==1.25.3 (from -r requirements.txt (line 29)) (from versions: 0.3, 1.0, 1.0.1, 1.0.2, 1.1, 1.2, 1.2.1, 1.2.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.7.1, 1.8, 1.8.2, 1.8.3, 1.9, 1.9.1, 1.10, 1.10.1, 1.10.2, 1.10.3, 1.10.4, 1.11, 1.12, 1.13, 1.13.1, 1.14, 1.15, 1.15.1, 1.16, 1.17, 1.18, 1.18.1, 1.19, 1.19.1, 1.20, 1.21, 1.21.1, 1.22, 1.23, 1.24, 1.24.1, 1.24.2, 1.24.3, 1.25, 1.25.1, 1.25.2)
No matching distribution found for urllib3==1.25.3 (from -r requirements.txt (line 29))

Why is the exported PDF blank?

Hello!
When I export a book from GitBook, the result is blank.
So I tried the example:
python gitbook.py http://self-publishing.ebookchain.org

But the files produced in output are still blank documents:
-rw-rw-r-- 1 340 Jul 10 08:58 NetworkError.html
-rw-rw-r-- 1 939 Jul 10 08:58 NetworkError.pdf

What could be causing this? Thanks!

The script cannot handle table DOM

https://warsier.gitbooks.io/new_master_rule/content/3/32/322/3222.html
Traceback (most recent call last):
  File "gitbook.py", line 309, in <module>
    Gitbook2PDF(url).run()
  File "gitbook.py", line 195, in run
    loop.run_until_complete(self.crawl_main_content(content_urls))
  File "/home/ilab/.conda/envs/gitbook/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "gitbook.py", line 217, in crawl_main_content
    await asyncio.gather(*tasks)
  File "gitbook.py", line 238, in gettext
    text = ChapterParser(metatext, title, level, ).parser()
  File "gitbook.py", line 102, in parser
    return html.unescape(ET.tostring(context).decode())
  File "src/lxml/etree.pyx", line 3351, in lxml.etree.tostring
  File "src/lxml/serializer.pxi", line 139, in lxml.etree._tostring
  File "src/lxml/serializer.pxi", line 199, in lxml.etree._raiseSerialisationError
lxml.etree.SerialisationError: IO_ENCODER

The script seems to go wrong when the page contains a table.

Error on Mac

File "gitbook.py", line 14
async def request(url, headers, timeout=None):
^
SyntaxError: invalid syntax

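Suggestion: add a sleep option

Multi-threaded crawling often gets the client banned.
Also, when SSL returns an error, the script crashes;
once the script crashes, the crawl has to start over;
a restarted crawl can't finish in one go either,
so the steps above repeat.

Please consider adding a single-threaded mode that pauses n seconds after each link.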
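
The throttling suggested above could be sketched with an asyncio.Semaphore that caps the number of requests in flight, plus a pause after each one. The function crawl_throttled, its parameters, and fake_fetch are hypothetical; with max_concurrency=1 the crawl is effectively sequential.

```python
import asyncio


async def crawl_throttled(urls, fetch, max_concurrency=1, delay=1.0):
    """Fetch URLs with at most max_concurrency in flight, pausing after each."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url):
        async with semaphore:
            page = await fetch(url)
            # Non-blocking pause before the slot is released to the next URL.
            await asyncio.sleep(delay)
            return page

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(worker(url) for url in urls))


async def fake_fetch(url):
    # Stand-in for a real request (hypothetical, for illustration only).
    await asyncio.sleep(0)
    return url.upper()


pages = asyncio.run(
    crawl_throttled(["a", "b", "c"], fake_fetch, max_concurrency=1, delay=0.01)
)
```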

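Please note in the docs that this tool requires Python 3

Installing with Python 2 fails.

It only works when installed with Python 3.

In particular, pip3 must be used for the installation.

Otherwise it fails because aiohttp cannot be found: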

Could not find a version that satisfies the requirement aiohttp==3.5.4 (from -r requirements.txt (line 1)) (from versions: 0.1, 0.2, 0.3, 0.4, 0.4.1, 0.4.2, 0.4.3, 0.4.4, 0.5.0, 0.6.0, 0.6.1, 0.6.2, 0.6.3, 0.6.4, 0.6.5, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.8.2, 0.8.3, 0.8.4, 0.9.0, 0.9.1, 0.9.2, 0.9.3, 0.10.0, 0.10.1, 0.10.2, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.14.2, 0.14.3, 0.14.4, 0.15.0, 0.15.1, 0.15.2, 0.15.3, 0.16.0, 0.16.1, 0.16.2, 0.16.3, 0.16.4, 0.16.5, 0.16.6, 0.17.0, 0.17.1, 0.17.2, 0.17.3, 0.17.4, 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.18.4, 0.19.0, 0.20.0, 0.20.1, 0.20.2, 0.21.0, 0.21.1, 0.21.2, 0.21.4, 0.21.5, 0.21.6, 0.22.0a0, 0.22.0b0, 0.22.0b1, 0.22.0b2, 0.22.0b3, 0.22.0b4, 0.22.0b5, 0.22.0b6, 0.22.0, 0.22.1, 0.22.2, 0.22.3, 0.22.4, 0.22.5, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.5, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.2.0, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5, 2.0.0rc1, 2.0.0, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.0.5, 2.0.6.post1, 2.0.7, 2.1.0, 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.2.4, 2.2.5, 2.3.0a4, 2.3.0, 2.3.1, 2.3.2b2)

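ServerDisconnectedError after crawling a certain number of pages

Python version: 3.6.5. Command used:
python gitbook.py https://wizardforcel.gitbooks.io/python-quant-uqer/content/
Based on the crawl log I located the relevant code and changed one place, adding a sleep: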
async def gettext(self, index, url, level, title):
    '''
    return path's html
    '''

    # Pause for a random 2-7 seconds before each request
    secRnd = random.randint(2, 7)
    time.sleep(secRnd)
    print("To avoid overloading the server, paused {} s, crawling : {}".format(secRnd, url))
    try:
        metatext = await request(url, self.headers, timeout=10)
    except Exception as e:
        # Pause again, then retry once without a timeout
        time.sleep(secRnd)
        print("To avoid overloading the server, paused {} s, recrawling : {}".format(secRnd, url))
        metatext = await request(url, self.headers)
    try:
        text = ChapterParser(metatext, title, level, ).parser()
        print("done : ", url)
        self.content_list[index] = text
    except IndexError:
        print('faild at : ', url, ' maybe content is empty?')

But after a certain number of pages have been crawled, the disconnect error still occurs.
done : https://wizardforcel.gitbooks.io/python-quant-uqer/content/81.html
Traceback (most recent call last):
File "gitbook.py", line 5, in <module>
Gitbook2PDF(url).run()
File "E:\code\pythonCode\thirdparty\gitbook2pdf-master\gitbook2pdf\gitbook2pdf.py", line 202, in run
loop.run_until_complete(self.crawl_main_content(content_urls))
File "d:\ProgramData\Anaconda3\envs\python36\lib\asyncio\base_events.py", line 468, in run_until_complete
return future.result()
File "E:\code\pythonCode\thirdparty\gitbook2pdf-master\gitbook2pdf\gitbook2pdf.py", line 224, in crawl_main_content
await asyncio.gather(*tasks)
File "E:\code\pythonCode\thirdparty\gitbook2pdf-master\gitbook2pdf\gitbook2pdf.py", line 246, in gettext
metatext = await request(url, self.headers)
File "E:\code\pythonCode\thirdparty\gitbook2pdf-master\gitbook2pdf\gitbook2pdf.py", line 21, in request
async with session.get(url, headers=headers, timeout=timeout) as resp:
File "d:\ProgramData\Anaconda3\envs\python36\lib\site-packages\aiohttp\client.py", line 1005, in __aenter__
self._resp = await self._coro
File "d:\ProgramData\Anaconda3\envs\python36\lib\site-packages\aiohttp\client.py", line 497, in _request
await resp.start(conn)
File "d:\ProgramData\Anaconda3\envs\python36\lib\site-packages\aiohttp\client_reqrep.py", line 844, in start

message, payload = await self._protocol.read()  # type: ignore  # noqa

File "d:\ProgramData\Anaconda3\envs\python36\lib\site-packages\aiohttp\streams.py", line 588, in read
await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: None
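
Calling time.sleep inside a coroutine blocks the whole event loop, so it delays every task at once rather than spacing out requests; asyncio.sleep is the non-blocking equivalent. Below is a hedged sketch of a retry helper with exponential backoff built on it. The names fetch_with_retry and flaky_fetch are hypothetical, and this has not been tested against the server in the report.

```python
import asyncio


async def fetch_with_retry(url, fetch, retries=3, base_delay=1.0):
    """Call fetch(url), retrying with exponential backoff on failure."""
    for attempt in range(retries):
        try:
            return await fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller decide what to do
            # asyncio.sleep yields to the event loop; time.sleep would block it
            await asyncio.sleep(base_delay * 2 ** attempt)


calls = []


async def flaky_fetch(url):
    # Stand-in that fails twice before succeeding (for illustration only).
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("server disconnected")
    return "ok"


result = asyncio.run(fetch_with_retry("page.html", flaky_fetch, base_delay=0.01))
```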

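Formulas cannot be captured

Bug description:
When gitbook2pdf converts an online document to PDF, none of the formulas appear in the generated PDF. My initial guess is that a bug in parsing or generation prevents WeasyPrint from converting the formulas, so they are not displayed. It is also possible that the JavaScript GitBook uses to render Markdown formulas runs with some delay.

Steps to reproduce:
python gitbook.py https://shichaog1.gitbooks.io/hand-book-of-speech-enhancement-and-recognition/content/
Compare
https://shichaog1.gitbooks.io/hand-book-of-speech-enhancement-and-recognition/content/chapter5.html
with the corresponding chapter of the generated PDF: none of the formulas shown on the web page appear in the PDF.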

Fix [SSL: CERTIFICATE_VERIFY_FAILED]

from aiohttp import ClientSession
from aiohttp import TCPConnector

async def request(url, headers, timeout=None):
    # verify_ssl=False turns off certificate checks; aiohttp 3.x deprecates it in favour of ssl=False
    async with ClientSession(connector=TCPConnector(verify_ssl=False)) as session:
        async with session.get(url, headers=headers, timeout=timeout) as resp:
            return await resp.text()
.....
.....

def collect_urls_and_metadata(self, start_url):
    response = requests.get(start_url, headers=self.headers, verify=False)
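
Turning verification off works around the error but also removes protection against tampered connections. An alternative worth trying is to keep verification on and point the client at a CA bundle that contains the missing issuer. The sketch below uses only the standard library; make_ssl_context is a hypothetical helper, and wiring it into gitbook2pdf is an untested assumption.

```python
import ssl


def make_ssl_context(cafile=None):
    """Build a verifying SSL context, optionally trusting an extra CA bundle."""
    # cafile=None falls back to the system's default trust store
    return ssl.create_default_context(cafile=cafile)


context = make_ssl_context()
# Assumed wiring (not exercised here): aiohttp.TCPConnector(ssl=context),
# and for requests: requests.get(url, verify="/path/to/ca-bundle.pem")
```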
