
geektime2pdf's Introduction

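Convert GeekTime (极客时间) column articles to PDF

Note: this project is for personal study only and must not be used commercially. If GeekTime officially asks for this repository to be removed, please contact me and I will delete it.

Usage

Configuration

Edit the required settings in the config file config.js: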

/**
 * Configuration for the PDF conversion
 */
module.exports = {
    url: 'https://time.geekbang.org/serv/v1/article', // no need to change
    commentUrl: 'https://time.geekbang.org/serv/v1/comments', // no need to change
    columnBaseUrl: 'https://time.geekbang.org/column/article/', // no need to change
    columnName: '玩转VScode', // column name
    firstArticalId: 18053, // ID of the column's first article
    articalIds: [201700,202772,204472,205784],  // article IDs to download; takes precedence, and once set, firstArticalId is ignored
    isdownloadVideo: false, // whether to download the audio
    isComment: false, // whether to export comments
    cookie: 'cookie' // cookie from your logged-in web session
};
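  • The first three options above do not need to be changed; only the column settings that follow them need editing.

  • A folder named geektime_{{columnName}} is created automatically to hold all exported PDF files, where columnName is the value configured above.

  • firstArticalId should preferably be the ID of the column's first article, so that every article in the column is fetched; if it is some other article's ID, only that article and the ones after it are fetched.

  • articalIds is the list of IDs of all the articles you want to fetch.

  • cookie is the cookie returned after you log in on the GeekTime website.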
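A minimal sketch of how the options above fit together, assuming (as the bullets state) that articalIds overrides firstArticalId and that output goes to a geektime_{{columnName}} folder. `resolveJob` is a hypothetical name, not a function from geektime2pdf, and treating an empty articalIds array as "not set" is an assumption.

```javascript
// Hypothetical sketch (not the tool's actual code): turn the config options
// into an output folder plus the set of articles to fetch.
const path = require("path");

function resolveJob(config, baseDir = process.cwd()) {
  // Output folder: geektime_{{columnName}}
  const outputDir = path.join(baseDir, `geektime_${config.columnName}`);

  // Assumption: a non-empty articalIds array wins; firstArticalId is then ignored.
  const hasExplicitIds =
    Array.isArray(config.articalIds) && config.articalIds.length > 0;

  return hasExplicitIds
    ? { outputDir, mode: "explicit", ids: [...config.articalIds] }
    : { outputDir, mode: "fromFirst", startId: config.firstArticalId };
}

// Example with the sample configuration shown above
const job = resolveJob({
  columnName: "玩转VScode",
  firstArticalId: 18053,
  articalIds: [201700, 202772, 204472, 205784],
});
console.log(job.mode, job.ids);
```

With the sample values this resolves to the "explicit" mode and the four listed IDs, because articalIds is set; removing articalIds would fall back to starting from article 18053.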

Running

  1. Clone the repository locally: git clone git@github.com:jjeejj/geektime2pdf.git
  2. Install the dependencies: npm i
  3. Run the main program with node columnArticleList.js and wait; the PDFs are ready when it finishes

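You can first set firstArticalId to fetch the whole column. If some articles fail along the way, ignore them; once the run finishes, set articalIds to the IDs of the articles that failed and run it again.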
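The two-pass workflow described above can be sketched as follows. `downloadWithReport` and the `downloadOne` callback are hypothetical stand-ins for whatever the tool does per article; the sketch only shows failures being collected instead of aborting the run, then printed in a form that can be pasted into articalIds.

```javascript
// Hypothetical sketch of the two-pass workflow: try every article once,
// keep going on errors, and report the failed IDs for a second run.
async function downloadWithReport(ids, downloadOne) {
  const failed = [];
  for (const id of ids) {
    try {
      await downloadOne(id); // e.g. fetch the article and print it to PDF
    } catch (err) {
      console.error(`article ${id} failed: ${err.message}`);
      failed.push(id);
    }
  }
  if (failed.length > 0) {
    // Paste this line into config.js and run again
    console.log(`articalIds: [${failed.join(",")}],`);
  }
  return failed;
}
```

Downloading one article at a time, rather than with Promise.all, is a deliberate choice in this sketch: it keeps a single browser busy and makes the failure list easy to read.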

Export results

Common issues

geektime2pdf's People

Contributors

dependabot[bot], jjeejj, niyalishanda


geektime2pdf's Issues

The bonus (特别放送) articles work, but the main articles throw errors

(node:9388) UnhandledPromiseRejectionWarning: Error: Protocol error (Target.closeTarget): Target closed.
at Promise (C:\Users\sy\geektime2pdf\node_modules\puppeteer\lib\Connection.js:74:56)
at new Promise ()
at Connection.send (C:\Users\sy\geektime2pdf\node_modules\puppeteer\lib\Connection.js:73:12)
at Page.close (C:\Users\sy\geektime2pdf\node_modules\puppeteer\lib\Page.js:991:38)
at Page. (C:\Users\sy\geektime2pdf\node_modules\puppeteer\lib\helper.js:111:23)
at generaterPdf (C:\Users\sy\geektime2pdf\generaterPdf.js:38:27)
(node:9388) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 5)
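In this trace the rejection comes from page.close() called at generaterPdf.js:38: the page target is apparently already gone when it runs (likely after an earlier failure), close() rejects with "Target closed.", and nothing handles that promise. A minimal sketch of a pattern that avoids the warning, assuming the page is opened with Puppeteer's browser.newPage(); `withPage` is a hypothetical helper, not code from the repository.

```javascript
// Hypothetical sketch: use a page and always try to close it, but never let
// a failing close() escape as an unhandled promise rejection.
async function withPage(openPage, use) {
  const page = await openPage();
  try {
    return await use(page);
  } finally {
    try {
      await page.close();
    } catch (closeErr) {
      // e.g. "Protocol error (Target.closeTarget): Target closed."
      console.warn(`ignoring close error: ${closeErr.message}`);
    }
  }
}
```

With Puppeteer this would be called as withPage(() => browser.newPage(), async (page) => { ... }). The caller still needs a try/catch (or .catch()) around it, because errors from the `use` step are re-thrown after the page is closed.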

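Purchased the column but cannot download it

I have purchased the column and made sure I am logged in to my account in the browser.
If I configure the column via firstArticalId, only the one free article is downloaded, and nothing else is printed afterwards;
if I configure articalIds manually, again only the first article is downloaded, after which it shows:
Visiting URL https://time.geekbang.org/column/article/186076 err user has not purchased this column (I am sure I have purchased it)
Hoping you can take a look.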

help me

MacBook-Pro:geektime2pdf stonetest$ node columnArticleList.js

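The folder already exists /Users/stonetest/geektime2pdf/geektime_MySQL实战45讲

Start fetching column article links

error msg user has not purchased this column

Visiting URL https://time.geekbang.org/column/article/78427 err user has not purchased this column

Finished fetching column article links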

/**
 * Configuration for the PDF conversion
 */
module.exports = {
    url: 'https://time.geekbang.org/serv/v1/article',
    commentUrl: 'https://time.geekbang.org/serv/v1/comments',
    columnBaseUrl: 'https://time.geekbang.org/column/article/',
    columnName: 'MySQL实战45讲',
    firstArticalId: 78427, // ID of the column's first article
    isdownloadVideo: false, // whether to download the audio
    isComment: true, // whether to export comments
    cookie: '_ga=GA1.2.2063249228.1573724722;GCID=bfdb8a9-972b001-95b8360-8a40fc6; GRID=bfdb8a9-972b001-95b8360-8a40fc6; _gid=GA1.2.1892212275.1587371465;  _gat=1; SERVERID=1fa1f330efedec1559b3abbcb6e30f50|1587537595|1587535691; Hm_lvt_022f847c4e3acd44d4a2481d9187f1e6=1587536660,1587536666,1587537528,1587537551; Hm_lpvt_022f847c4e3acd44d4a2481d9187f1e6=1587537594; GCESS=BAcEUpJfGQIEjeafXgQEAC8NAAoEAAAAAAwBAQsC

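Request for help

First, kudos to the author. I'm posting to ask whether you might have time in the next few days to help write a program that converts Lagou Education (拉钩教育) courses to PDF.

Exception: TimeoutError: Navigation Timeout Exceeded: 30000ms exceeded

When my code reaches the following line:
await page.setContent(utils.renderEjsArticle2Html(data, options));
it throws a timeout exception: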

generater pdf err { TimeoutError: Navigation Timeout Exceeded: 30000ms exceeded
   at Promise.then (E:\work_space\Java\geektime2pdf\node_modules\puppeteer\lib\LifecycleWatcher.js:143:21)
 -- ASYNC --
   at Frame.<anonymous> (E:\work_space\Java\geektime2pdf\node_modules\puppeteer\lib\helper.js:110:27)
   at Page.setContent (E:\work_space\Java\geektime2pdf\node_modules\puppeteer\lib\Page.js:647:42)
   at Page.<anonymous> (E:\work_space\Java\geektime2pdf\node_modules\puppeteer\lib\helper.js:111:23)
   at generaterPdf (E:\work_space\Java\geektime2pdf\generaterPdf.js:28:20)
   at process._tickCallback (internal/process/next_tick.js:68:7) name: 'TimeoutError' }

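How can I avoid this? Following suggestions I found online, I changed the code to:
await page.goto(`data:text/html,${utils.renderEjsArticle2Html(data, options)}`, { waitUntil: 'networkidle2' });
With this change the error no longer occurs, but the generated PDF is a blank document. I'd appreciate any help from the author, thanks.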
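One plausible cause of the blank PDF with the page.goto workaround: the HTML is placed into the data: URL without encoding, so the first `#` (common in CSS colours such as `#333` and in anchor links) is parsed as the start of a URL fragment and the rest of the document is dropped; `%` sequences can also be misread. Percent-encoding the whole document avoids this. The helper name `htmlToDataUrl` is hypothetical. Alternatively, keeping setContent and raising its limit may work: in the Puppeteer versions that support it, `page.setContent(html, { timeout, waitUntil })` accepts a timeout in milliseconds, with 0 disabling it; check this against the installed version.

```javascript
// Hypothetical helper: build a data: URL that survives characters such as
// '#', '%' and newlines by percent-encoding the whole HTML document.
function htmlToDataUrl(html) {
  return "data:text/html;charset=utf-8," + encodeURIComponent(html);
}

const html = "<style>body { color: #333 }</style><h1>开篇词</h1>";
const url = htmlToDataUrl(html);
// With Puppeteer this would be used as:
//   await page.goto(url, { waitUntil: "networkidle2" });
console.log(url.includes("#")); // false: '#' became %23
```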

Download failed

generater pdf err { Error: ENOENT: no such file or directory, open 'D:\mario\github\geektime2pdf\geektime_Kafka核心技术与实战\17 | 消费者组重平衡能避免吗?.pdf'
-- ASYNC --
at Page. (D:\mario\github\geektime2pdf\node_modules\puppeteer\lib\helper.js:110:27)
at generaterPdf (D:\mario\github\geektime2pdf\generaterPdf.js:31:20)
errno: -4058,
code: 'ENOENT',
syscall: 'open',
path:
'D:\mario\github\geektime2pdf\geektime_Kafka核心技术与实战\17 | 消费者组重平衡能避免吗?.pdf' }
(node:14764) UnhandledPromiseRejectionWarning: Error: Protocol error (Target.closeTarget): Target closed.
at Promise (D:\mario\github\geektime2pdf\node_modules\puppeteer\lib\Connection.js:74:56)
at new Promise ()
at Connection.send (D:\mario\github\geektime2pdf\node_modules\puppeteer\lib\Connection.js:73:12)
at Page.close (D:\mario\github\geektime2pdf\node_modules\puppeteer\lib\Page.js:991:38)
at Page. (D:\mario\github\geektime2pdf\node_modules\puppeteer\lib\helper.js:111:23)
at generaterPdf (D:\mario\github\geektime2pdf\generaterPdf.js:38:27)
(node:14764) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 17)
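The ENOENT here most likely comes from the file name rather than the folder: the article title "17 | 消费者组重平衡能避免吗?" contains `|` (and, as printed here, `?`), which Windows does not allow in file names, so the PDF cannot be created at that path. A hedged sketch of a sanitizer that could be applied to titles before building the PDF path; `toSafeFileName` is hypothetical and not part of the tool, and it does not handle reserved names such as CON or NUL.

```javascript
// Hypothetical helper (not part of geektime2pdf): make an article title safe
// to use as a file name on Windows, macOS and Linux.
function toSafeFileName(title, replacement = "_") {
  return title
    .replace(/[\\\/:*?"<>|\u0000-\u001f]/g, replacement) // characters Windows rejects
    .replace(/[. ]+$/, "") // Windows does not allow trailing dots or spaces
    .trim();
}

console.log(toSafeFileName("17 | 消费者组重平衡能避免吗?") + ".pdf");
// → 17 _ 消费者组重平衡能避免吗_.pdf
```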

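This is not an issue; it's a thank-you to the author 🙏🙏

Many thanks to the author for open-sourcing this project! With this tool I can download the courses I paid for to my own computer, and PDF files are easier to annotate and search than the GeekTime website and app, which helps a lot with both studying and reviewing.

By the way, I know piracy of GeekTime courses is rampant, but I can say responsibly that I want GeekTime to keep growing and release more high-quality courses. I will only use this tool to study more efficiently and will never use it to harm GeekTime's interests. I also sincerely hope that everyone using this tool supports the official content, so that together we build a healthy knowledge economy that benefits everyone.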

Cookie question

There are many cookies; which one should I use? Thanks
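Assuming the tool sends the `cookie` value unchanged as the HTTP Cookie request header (not confirmed here; check columnArticleList.js), the value should be the whole header copied from a logged-in request in the browser's developer tools, i.e. all `name=value` pairs joined by `; `, as in the MySQL实战45讲 config above, rather than a single entry. The hypothetical `parseCookieHeader` below only splits such a string so you can check which cookies it contains; it does not know which names GeekTime actually requires.

```javascript
// Hypothetical helper: split a copied Cookie header into name/value pairs,
// e.g. to check that the string pasted into config.js is complete.
function parseCookieHeader(header) {
  const cookies = {};
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip empty or malformed fragments
    const name = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim(); // values may contain '=' or '|'
    if (name) cookies[name] = value;
  }
  return cookies;
}

// Shortened sample values, for illustration only
const names = Object.keys(parseCookieHeader("_ga=GA1.2.1; _gid=GA1.2.2; SERVERID=abc|1|2"));
console.log(names); // [ '_ga', '_gid', 'SERVERID' ]
```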

Error message

What should I do about this error?
$ node columnArticleList.js
internal/modules/cjs/loader.js:985
throw err;
^

Error: Cannot find module 'superagent'
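`Cannot find module 'superagent'` means Node cannot resolve that package from the project; the usual fix is step 2 of the run instructions, running npm i inside the cloned geektime2pdf folder before node columnArticleList.js. A small pre-flight check, sketched as a hypothetical helper (`findMissingModules` is not part of the tool), lists which of the packages seen in these logs cannot be resolved.

```javascript
// Hypothetical pre-flight check: report which dependencies cannot be resolved.
// require.resolve looks relative to this script's location, so save it
// inside the geektime2pdf folder.
function findMissingModules(names) {
  return names.filter((name) => {
    try {
      require.resolve(name);
      return false;
    } catch (err) {
      if (err.code === "MODULE_NOT_FOUND") return true;
      throw err; // unexpected error: surface it
    }
  });
}

const missing = findMissingModules(["superagent", "puppeteer"]);
if (missing.length > 0) {
  console.log(`Missing: ${missing.join(", ")}. Run "npm i" in the project folder.`);
}
```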

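Can the code block styling and highlight notes be improved?

Exporting PDFs with the tool is very convenient, thanks~

I also found the following 2 issues:

  • Code block styling is missing, mainly the line numbers, which makes it inconvenient to read the code alongside the text
  • Some of the highlight notes I made in articles are shown and some are not; could this be made configurable: show all or show none?

I wonder whether there is room for further improvement ^-^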

generater pdf err TimeoutError: Navigation Timeout Exceeded: 30000ms exceeded

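Both courses I downloaded today hit the error in the title.

  • 《Kafka核心技术与实战》 (Kafka Core Technology and Practice): 3 errors.
  • 《数据结构与算法之美》 (The Beauty of Data Structures and Algorithms): 1 error.

The chapters that errored never produced a PDF file; the other chapters, which had no errors, were still generated normally.

Full error output: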

开篇词 | 从今天起,跨过“数据结构与算法”这道坎
Start fetching comments for https://time.geekbang.org/column/article/39922
Finished fetching comments for https://time.geekbang.org/column/article/39922, total comments: 641
generater pdf start
generater pdf err TimeoutError: Navigation Timeout Exceeded: 30000ms exceeded
    at /Users/liyang/github_projects/geektime2pdf/node_modules/puppeteer/lib/LifecycleWatcher.js:143:21
  -- ASYNC --
    at Frame.<anonymous> (/Users/liyang/github_projects/geektime2pdf/node_modules/puppeteer/lib/helper.js:110:27)
    at Page.setContent (/Users/liyang/github_projects/geektime2pdf/node_modules/puppeteer/lib/Page.js:647:42)
    at Page.<anonymous> (/Users/liyang/github_projects/geektime2pdf/node_modules/puppeteer/lib/helper.js:111:23)
    at generaterPdf (/Users/liyang/github_projects/geektime2pdf/generaterPdf.js:28:20)
    at processTicksAndRejections (internal/process/task_queues.js:85:5)
    at async getNextColumnArticleUrl (/Users/liyang/github_projects/geektime2pdf/columnArticleList.js:62:13)
    at async getColumnArticleList (/Users/liyang/github_projects/geektime2pdf/columnArticleList.js:85:5) {
  name: 'TimeoutError'
}
(node:4383) UnhandledPromiseRejectionWarning: Error: Protocol error (Target.closeTarget): Target closed.
    at /Users/liyang/github_projects/geektime2pdf/node_modules/puppeteer/lib/Connection.js:74:56
    at new Promise (<anonymous>)
    at Connection.send (/Users/liyang/github_projects/geektime2pdf/node_modules/puppeteer/lib/Connection.js:73:12)
    at Page.close (/Users/liyang/github_projects/geektime2pdf/node_modules/puppeteer/lib/Page.js:991:38)
    at Page.<anonymous> (/Users/liyang/github_projects/geektime2pdf/node_modules/puppeteer/lib/helper.js:111:23)
    at generaterPdf (/Users/liyang/github_projects/geektime2pdf/generaterPdf.js:38:27)
    at async getNextColumnArticleUrl (/Users/liyang/github_projects/geektime2pdf/columnArticleList.js:62:13)
    at async getColumnArticleList (/Users/liyang/github_projects/geektime2pdf/columnArticleList.js:85:5)
(node:4383) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
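The failing chapter shown here times out in page.setContent right after fetching 641 comments, so one possibility is that the generated HTML simply takes longer than Puppeteer's default 30 s to load; this is an assumption, not a confirmed diagnosis. A hedged sketch of a retry wrapper (hypothetical `retryOnTimeout`, not part of the tool) that retries only errors named TimeoutError, as in the trace, and lengthens the timeout on each attempt:

```javascript
// Hypothetical sketch: retry a step that fails with a Puppeteer-style
// TimeoutError, giving it a longer timeout on each attempt.
async function retryOnTimeout(step, { attempts = 3, baseTimeoutMs = 30000 } = {}) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    const timeoutMs = baseTimeoutMs * (i + 1); // 30 s, 60 s, 90 s, ...
    try {
      return await step(timeoutMs);
    } catch (err) {
      if (err.name !== "TimeoutError") throw err; // only retry timeouts
      lastErr = err;
      console.warn(`attempt ${i + 1} timed out after ${timeoutMs} ms`);
    }
  }
  throw lastErr;
}

// With Puppeteer, the step could be (where setContent accepts options):
//   (timeout) => page.setContent(html, { timeout, waitUntil: "load" })
```

Turning off isComment for the affected articles may also shrink the page enough to avoid the timeout; that has not been tested here.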

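Please support downloading specific chapters

Because of the error reported in #6, not all chapters of the course were downloaded, so I'd like the author to provide a way to download specific chapters. Many thanks.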
