jean-humann / docs-to-pdf

Generate a PDF from a documentation website

Home Page: https://www.npmjs.com/package/docs-to-pdf

License: MIT License

Docs to PDF

Introduction

This is a PDF generator for documentation websites such as Docusaurus. It is a fork of mr-pdf, which is no longer maintained. Feel free to contribute to this project.

Installation

npm install -g docs-to-pdf

Quick Start

npx docs-to-pdf --initialDocURLs="https://docusaurus.io/docs/" --contentSelector="article" --paginationSelector="a.pagination-nav__link.pagination-nav__link--next" --excludeSelectors=".margin-vert--xl a,[class^='tocCollapsible'],.breadcrumbs,.theme-edit-this-page" --coverImage="https://docusaurus.io/img/docusaurus.png" --coverTitle="Docusaurus v2"

Usage

For Docusaurus v2

npx docs-to-pdf docusaurus --initialDocURLs="https://docusaurus.io/docs/"

OR

npx docs-to-pdf --initialDocURLs="https://docusaurus.io/docs/" --contentSelector="article" --paginationSelector="a.pagination-nav__link.pagination-nav__link--next" --excludeSelectors=".margin-vert--xl a,[class^='tocCollapsible'],.breadcrumbs,.theme-edit-this-page" --coverImage="https://docusaurus.io/img/docusaurus.png" --coverTitle="Docusaurus v2"

CLI Global Options

| Option | Required | Description |
| --- | --- | --- |
| --initialDocURLs | Yes | URL to start generating the PDF from. |
| --contentSelector | No | CSS selector used to find the main content of each page. |
| --paginationSelector | No | CSS selector used to find the next page to print, for looping. |
| --excludeURLs | No | URLs to exclude from the PDF. |
| --excludeSelectors | No | Selectors to exclude from the PDF. Separate selectors with a comma and no space; spaces are allowed within a selector. ex: --excludeSelectors=".nav,.next > a" |
| --cssStyle | No | CSS to adjust the PDF output. ex: --cssStyle="body{padding-top: 0;}" If you are the project owner, you can use @media print { } to edit the CSS for the PDF. |
| --outputPDFFilename | No | Name of the output PDF file. Default is docs-to-pdf.pdf |
| --pdfMargin | No | Margin around the PDF file. Separate each margin with a comma and no space. ex: --pdfMargin="10,20,30,40" sets margin top: 10px, right: 20px, bottom: 30px, left: 40px |
| --paperFormat | No | PDF paper format. ex: --paperFormat="A3". See the Puppeteer documentation for available formats. |
| --disableTOC | No | Optional toggle controlling whether the table of contents is shown. |
| --coverTitle | No | Title for the PDF cover. |
| --coverImage | No | Image source (`<src>`) for the PDF cover. SVG is not supported. |
| --coverSub | No | Subtitle for the PDF cover. Add `<br/>` tags for multiple lines. |
| --headerTemplate | No | HTML template for the print header. See the Puppeteer documentation for details on injecting values. |
| --footerTemplate | No | HTML template for the print footer. See the Puppeteer documentation for details on injecting values. |
| --puppeteerArgs | No | Puppeteer BrowserLaunchArgumentOptions arguments. ex: --sandbox. See the Puppeteer documentation. |
| --protocolTimeout | No | Timeout for individual protocol calls, in milliseconds. If omitted, the default of 180000 ms (3 min) is used. |
| --filterKeyword | No | Only add pages containing the given meta keyword to the PDF. This makes it possible to generate PDFs of selected pages. |
| --baseUrl | No | Base URL for all relative URLs. Allows rendering the PDF on localhost (CI/GitHub Actions) while referencing the deployed page. |
| --excludePaths | No | URL paths to exclude. |
| --restrictPaths | No | Keep only URL paths with the same root path as --initialDocURLs. |
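
To make the --pdfMargin format concrete, here is a minimal sketch of how a value such as "10,20,30,40" could be split into the four sides in the order the table describes (top, right, bottom, left). The `PdfMargin` type and `parsePdfMargin` function are hypothetical names for illustration, not taken from the docs-to-pdf source. Treating a single value such as "20" as applying to all four sides is an assumption.

```typescript
// Hypothetical shape for the parsed margins; not from the docs-to-pdf source.
interface PdfMargin {
  top: string;
  right: string;
  bottom: string;
  left: string;
}

// Parse a comma-separated margin string such as "10,20,30,40" into
// top/right/bottom/left pixel values.
// Assumption: a single value (e.g. "20") applies to all four sides.
function parsePdfMargin(input: string): PdfMargin {
  const parts = input.split(",").map((p) => p.trim());
  // Reject empty or non-numeric entries early.
  if (parts.some((p) => p === "" || Number.isNaN(Number(p)))) {
    throw new Error(`Invalid margin value: "${input}"`);
  }
  if (parts.length === 1) {
    const v = `${parts[0]}px`;
    return { top: v, right: v, bottom: v, left: v };
  }
  if (parts.length !== 4) {
    throw new Error(`Expected 1 or 4 margin values, got ${parts.length}`);
  }
  const [top, right, bottom, left] = parts.map((p) => `${p}px`);
  return { top, right, bottom, left };
}
```

Calling `parsePdfMargin("10,20,30,40")` yields an object whose `top` is "10px" and whose `left` is "40px", matching the order given in the option description.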

Docusaurus Options

| Option | Required | Description |
| --- | --- | --- |
| --version | No | Docusaurus version. Default is 2. |
| --builDir | No | Path to the Docusaurus build dir. Either absolute or relative to the shell's working directory. |

Examples and Demo PDF

Docusaurus v2

https://docusaurus.io/

initialDocURLs: https://docusaurus.io/docs

demoPDF: https://github.com/jean-humann/docs-to-pdf/blob/master/pdf/v2-docusaurus.pdf

command:

npx docs-to-pdf docusaurus --initialDocURLs="https://docusaurus.io/docs/"

OR

npx docs-to-pdf --initialDocURLs="https://docusaurus.io/docs/" --contentSelector="article" --paginationSelector="a.pagination-nav__link.pagination-nav__link--next" --excludeSelectors=".margin-vert--xl a,[class^='tocCollapsible'],.breadcrumbs,.theme-edit-this-page" --coverImage="https://docusaurus.io/img/docusaurus.png" --coverTitle="Docusaurus v2"

Docusaurus v1 - Legacy

https://docusaurus.io/en/

initialDocURLs: https://docusaurus.io/docs/en/installation

demoPDF: https://github.com/jean-humann/docs-to-pdf/blob/master/pdf/v1-docusaurus.pdf

command:

npx docs-to-pdf docusaurus --initialDocURLs="https://docusaurus.io/docs/en/installation" --version=1

OR

npx docs-to-pdf --initialDocURLs="https://docusaurus.io/docs/en/installation" --contentSelector="article" --paginationSelector=".docs-prevnext > a.docs-next" --excludeSelectors=".fixedHeaderContainer,footer.nav-footer,#docsNav,nav.onPageNav,a.edit-page-link,div.docs-prevnext" --cssStyle=".navPusher {padding-top: 0;}" --pdfMargin="20"

PRs that add examples for other documentation sites are welcome here.

How docs-to-pdf works

  1. Puppeteer can turn HTML into a PDF, just as you can print an HTML page in the Chrome browser.
  2. So the idea of docs-to-pdf is to build one big HTML document by looping through the page links, then run page.pdf() from Puppeteer to generate the PDF.
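
The looping step can be sketched as follows. This is a simplified illustration, not the docs-to-pdf implementation: `DocPage`, `PageLoader`, and `collectHtml` are hypothetical names, and the loader is passed in so the sketch stays independent of Puppeteer. In a real run the loader would open each URL in the browser, read the content selector, and read the pagination link. The combined HTML would then be rendered and passed to page.pdf().

```typescript
// One fetched page: its main content and the URL of the next page, if any.
interface DocPage {
  contentHtml: string;
  nextUrl: string | null;
}

// Hypothetical loader signature; a real one would drive a browser page.
type PageLoader = (url: string) => Promise<DocPage>;

// Follow "next" links from startUrl and join every page's content
// into one HTML string. Already-visited URLs end the loop, so a
// pagination cycle cannot run forever.
async function collectHtml(startUrl: string, load: PageLoader): Promise<string> {
  const visited = new Set<string>();
  const chunks: string[] = [];
  let url: string | null = startUrl;
  while (url !== null && !visited.has(url)) {
    visited.add(url);
    const page: DocPage = await load(url);
    chunks.push(page.contentHtml);
    url = page.nextUrl;
  }
  return chunks.join("\n");
}
```

Keeping the loader separate from the loop makes the traversal easy to test with canned pages before wiring it to a real browser.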

(Diagram: docs-to-pdf workflow)

Thanks

This repo's code comes from https://github.com/KohheePeace/mr-pdf.

Thanks for the awesome code written by @KohheePeace, @maxarndt and @aloisklink.

@bojl's approach to generating the TOC was an awesome breakthrough.

docs-to-pdf's People

Contributors

codingluke, dependabot[bot], jafin, jean-humann, kohheepeace, ksmarty, lidkxx, meddbase-steve, mrdrivingduck, release-please[bot]

docs-to-pdf's Issues

bookmarks

Could docs-to-pdf support generating PDF bookmarks?

Hyperlinks in PDF linking to web documentation

The links (apart from TOC) inside the PDF open up the corresponding web page instead of the PDF page. Is there a way to ensure the links point to the heading in the PDF instead of the web page?

Basic Auth support

Hi Jean,

thanks for creating this project.
It works great for me.

The production version of my documentation is behind basic auth.
Would it be possible to add the credentials at startup of the crawler?

Kind regards

Error on generating - timeout

I am trying to generate a PDF with:

npx docs-to-pdf --initialDocURLs="https://ignatandrei.github.io/RSCG_Examples/v2/docs/List-of-RSCG" --contentSelector="article" --paginationSelector="a.pagination-nav__link.pagination-nav__link--next" --excludeSelectors=".margin-vert--xl a,[class^='tocCollapsible'],.breadcrumbs,.theme-edit-this-page"  --coverTitle="RSCG --protocolTimeout=54000"

Everything works until the final step:
[30.08.2023 23:15.27.852] [LOG] Start generating PDF...
[30.08.2023 23:15.27.852] [LOG] Generate cover...
[30.08.2023 23:15.27.852] [LOG] Start generating TOC...
[30.08.2023 23:15.27.958] [LOG] Restructuring the html of a document...
[30.08.2023 23:15.35.378] [LOG] Remove unnecessary HTML...
[30.08.2023 23:15.35.379] [LOG] Scroll to the bottom of the page...
[30.08.2023 23:16.29.393] [ERROR] ProtocolError: Runtime.callFunctionOn timed out. Increase the 'protocolTimeout' setting in launch/connect calls for a higher timeout if needed.
at <instance_members_initializer> (C:\Users\ignat\AppData\Local\npm-cache_npx\c16ac64a6c7aba73\node_modules\puppeteer-core\lib\cjs\puppeteer\common\Connection.js:49:14)
at new Callback (C:\Users\ignat\AppData\Local\npm-cache_npx\c16ac64a6c7aba73\node_modules\puppeteer-core\lib\cjs\puppeteer\common\Connection.js:53:16)
at CallbackRegistry.create (C:\Users\ignat\AppData\Local\npm-cache_npx\c16ac64a6c7aba73\node_modules\puppeteer-core\lib\cjs\puppeteer\common\Connection.js:93:26)

Could you please help?

Quick Start example doesn't work

I tried running the example from the README

npx docs-to-pdf --initialDocURLs="https://docusaurus.io/docs/" --contentSelector="article" --paginationSelector="a.pagination-nav__link.pagination-nav__link--next" --excludeSelectors=".margin-vert--xl a,[class^='tocCollapsible'],.breadcrumbs,.theme-edit-this-page" --coverImage="https://docusaurus.io/img/docusaurus.png" --coverTitle="Docusaurus v2"

and I got this error:

[10.10.2023 11:08.19.379] [DEBUG] Using Chromium from /home/kkovacs/.cache/puppeteer/chrome/linux-117.0.5938.149/chrome-linux64/chrome
[10.10.2023 11:08.19.607] [DEBUG] Chrome user data dir: /tmp/puppeteer_dev_chrome_profile-2V52e1
[10.10.2023 11:08.19.646] [LOG]   Retrieving html from https://docusaurus.io/docs/
[10.10.2023 11:08.21.047] [DEBUG] Found 0 elements
[10.10.2023 11:08.21.049] [LOG]   Success
[10.10.2023 11:08.21.051] [LOG]   Retrieving html from https://docusaurus.io/docs/category/getting-started
[10.10.2023 11:08.22.165] [DEBUG] Found 0 elements
[10.10.2023 11:08.22.166] [LOG]   Success


...


[10.10.2023 11:09.23.630] [LOG]   Success
[10.10.2023 11:09.23.634] [LOG]   Retrieving html from https://docusaurus.io/docs/deployment
[10.10.2023 11:09.25.372] [DEBUG] Found 6 elements
[10.10.2023 11:09.25.379] [DEBUG] Clicking summary: How much resource (person-hours, money) am I willing to invest in this?
[10.10.2023 11:09.26.267] [DEBUG] Clicking summary: How much server-side configuration would I need?
[10.10.2023 11:09.27.104] [DEBUG] Clicking summary: Do I have needs to cooperate?
[10.10.2023 11:09.27.944] [DEBUG] Clicking summary: GitHub action files
[10.10.2023 11:09.28.771] [DEBUG] Clicking summary: GitHub action file
[10.10.2023 11:09.28.780] [ERROR] Error: Node is either not clickable or not an Element
    at CdpElementHandle.clickablePoint (/home/kkovacs/.npm/_npx/c16ac64a6c7aba73/node_modules/puppeteer-core/lib/cjs/puppeteer/api/ElementHandle.js:680:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async CdpElementHandle.<anonymous> (/home/kkovacs/.npm/_npx/c16ac64a6c7aba73/node_modules/puppeteer-core/lib/cjs/puppeteer/api/ElementHandle.js:258:32)
    at async CdpElementHandle.click (/home/kkovacs/.npm/_npx/c16ac64a6c7aba73/node_modules/puppeteer-core/lib/cjs/puppeteer/api/ElementHandle.js:710:30)
    at async CdpElementHandle.<anonymous> (/home/kkovacs/.npm/_npx/c16ac64a6c7aba73/node_modules/puppeteer-core/lib/cjs/puppeteer/api/ElementHandle.js:261:36)
    at async openDetails (/home/kkovacs/.npm/_npx/c16ac64a6c7aba73/node_modules/docs-to-pdf/lib/utils.js:212:13)
    at async generatePDF (/home/kkovacs/.npm/_npx/c16ac64a6c7aba73/node_modules/docs-to-pdf/lib/utils.js:82:21)

Just wanted to point this out because I'm struggling to get this to work on my own site, so I wanted a working example reference.

Templates for arguments

--contentSelector="article" --paginationSelector="a.pagination-nav__link.pagination-nav__link--next" --excludeSelectors=".margin-vert--xl a,[class^='tocCollapsible'],.breadcrumbs,.theme-edit-this-page"

This software always requires very long options, so long that no one can type them without reading the README. It would be nice if we could shorten this to something like:

--template docusaurus2
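
One way such a preset could work is a lookup table that expands a template name into the long selector options. The sketch below is hypothetical: `SelectorOptions`, `TEMPLATES`, and `resolveTemplate` are invented names, and --template is the flag proposed in this request, not an existing option. The selector values are the ones from the README commands. The `docusaurus` subcommand shown under Usage appears to serve a similar purpose.

```typescript
// The three long options a preset would fill in.
interface SelectorOptions {
  contentSelector: string;
  paginationSelector: string;
  excludeSelectors: string[];
}

// Hypothetical preset table keyed by template name.
// The selector values are copied from the README commands.
const TEMPLATES = new Map<string, SelectorOptions>([
  [
    "docusaurus2",
    {
      contentSelector: "article",
      paginationSelector: "a.pagination-nav__link.pagination-nav__link--next",
      excludeSelectors: [
        ".margin-vert--xl a",
        "[class^='tocCollapsible']",
        ".breadcrumbs",
        ".theme-edit-this-page",
      ],
    },
  ],
]);

// Look up a preset; explicitly passed options override preset values.
function resolveTemplate(
  name: string,
  overrides: Partial<SelectorOptions> = {}
): SelectorOptions {
  const preset = TEMPLATES.get(name);
  if (preset === undefined) {
    throw new Error(`Unknown template: ${name}`);
  }
  return { ...preset, ...overrides };
}
```

Letting explicit options win over the preset keeps the short form convenient while still allowing a single selector to be adjusted per site.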

An option to control whether all `<details>` elements are opened

https://docusaurus.io/docs/markdown-features#details

<details> lets us hide content that is meant only for experts. It would be nice to be able to control whether <details> elements are opened.

In the current version, all <details> elements are closed.

For beginners: [image]

For experts: [image]

Can Puppeteer do this operation before printing the joined page?

flowchart TD

S(Start) --> F[Find and open closed elements]
F --> C{New closed\nelements appeared?}
C -->|Yes| F
C -->|No| Done(Done)
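
The loop in the flowchart can be sketched with a small in-memory model. This is an illustration only, not how docs-to-pdf handles `<details>`: `DetailsNode`, `findClosed`, and `openAll` are hypothetical names. The model assumes that nested closed elements only become reachable once their parent is open, which is what makes repeated passes necessary. A real version would query and update DOM nodes through Puppeteer.

```typescript
// Minimal stand-in for a <details> element.
interface DetailsNode {
  open: boolean;
  // Nested <details>; assumed reachable only once this node is open.
  children: DetailsNode[];
}

// Collect closed nodes whose ancestors are all open.
function findClosed(nodes: DetailsNode[]): DetailsNode[] {
  const closed: DetailsNode[] = [];
  for (const node of nodes) {
    if (!node.open) {
      closed.push(node);
    } else {
      closed.push(...findClosed(node.children));
    }
  }
  return closed;
}

// Repeat "find and open" until a pass finds nothing closed.
// Returns the number of passes that opened at least one node.
function openAll(roots: DetailsNode[], maxPasses = 100): number {
  let passes = 0;
  let closed = findClosed(roots);
  while (closed.length > 0) {
    if (passes >= maxPasses) {
      throw new Error("Too many passes; giving up");
    }
    for (const node of closed) {
      node.open = true;
    }
    passes += 1;
    closed = findClosed(roots);
  }
  return passes;
}
```

The `maxPasses` guard is a safety net for pages that keep producing new closed elements, so the loop always terminates.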
Option to restrict the subpath range

npx docs-to-pdf --initialDocURLs="https://docusaurus.io/docs/markdown-features" --contentSelector="article" --paginationSelector="a.pagination-nav__link.pagination-nav__link--next" --excludeSelectors=".margin-vert--xl a,[class^='tocCollapsible'],.breadcrumbs,.theme-edit-this-page" --coverImage="https://docusaurus.io/img/docusaurus.png" --coverTitle="Docusaurus v2"
[13.08.2023 17:17.08.551] [DEBUG] Using Chromium from C:\Program Files\Google\Chrome\Application\chrome.exe
[13.08.2023 17:17.08.781] [DEBUG] Chrome user data dir: C:\Users\tatsu\AppData\Local\Temp\puppeteer_dev_chrome_profile-wjQgPd
[13.08.2023 17:17.08.870] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features
[13.08.2023 17:17.10.684] [LOG]   Success
[13.08.2023 17:17.10.689] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/react
[13.08.2023 17:17.12.843] [LOG]   Success
[13.08.2023 17:17.12.844] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/tabs
[13.08.2023 17:17.14.508] [LOG]   Success
[13.08.2023 17:17.14.510] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/code-blocks
[13.08.2023 17:17.16.113] [LOG]   Success
[13.08.2023 17:17.16.114] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/admonitions
[13.08.2023 17:17.17.707] [LOG]   Success
[13.08.2023 17:17.17.711] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/toc
[13.08.2023 17:17.19.122] [LOG]   Success
[13.08.2023 17:17.19.127] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/assets
[13.08.2023 17:17.21.602] [LOG]   Success
[13.08.2023 17:17.21.603] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/links
[13.08.2023 17:17.23.143] [LOG]   Success
[13.08.2023 17:17.23.144] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/plugins
[13.08.2023 17:17.24.639] [LOG]   Success
[13.08.2023 17:17.24.641] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/math-equations
[13.08.2023 17:17.26.649] [LOG]   Success
[13.08.2023 17:17.26.650] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/diagrams
[13.08.2023 17:17.28.193] [LOG]   Success
[13.08.2023 17:17.28.194] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/head-metadata
[13.08.2023 17:17.29.655] [LOG]   Success
[13.08.2023 17:17.29.658] [LOG]   Retrieving html from https://docusaurus.io/docs/styling-layout
[13.08.2023 17:17.30.985] [LOG]   Success
[13.08.2023 17:17.30.987] [LOG]   Retrieving html from https://docusaurus.io/docs/swizzling
[13.08.2023 17:17.32.235] [LOG]   Success
...

Is there an option to prevent this software from fetching pages outside of https://docusaurus.io/docs/markdown-features?
This can't be covered by --excludeURLs.
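
The kind of check being asked for can be sketched as a path-prefix test against the starting URL. The options table lists --restrictPaths with a similar purpose. The function below is a hypothetical illustration, not that option's implementation, and `isWithinRoot` is an invented name.

```typescript
// Return true when `candidate` is on the same origin as `root` and its
// path is either root's path or a sub-path of it.
function isWithinRoot(candidate: string, root: string): boolean {
  const c = new URL(candidate);
  const r = new URL(root);
  if (c.origin !== r.origin) {
    return false;
  }
  // Drop trailing slashes so "/docs/" and "/docs" behave the same.
  const rootPath = r.pathname.replace(/\/+$/, "");
  // Require a "/" after the root path so that "/docs/markdown-features-x"
  // is not treated as a child of "/docs/markdown-features".
  return c.pathname === rootPath || c.pathname.startsWith(rootPath + "/");
}
```

With https://docusaurus.io/docs/markdown-features as the root, https://docusaurus.io/docs/markdown-features/react passes the check, while https://docusaurus.io/docs/styling-layout from the log above does not.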

How to disable the cover and TOC title

Without the coverTitle, coverImage, and coverSub options, a blank cover is still generated.
The TOC title "Table of contents:" cannot be modified or disabled.

Search / Select in Mac Preview not working

Hi @jean-humann

I just noticed something very strange. When I open the generated PDF in Firefox, I can select and search text just fine. However, when I open the same file in Mac Preview, the text is not correctly selectable.

Here is a video showing it with the example PDF.

Screenshot_2023-08-10_000075.mp4

When I try the same with PDFs generated by marp, which also uses Puppeteer/Chromium to generate PDFs from HTML, everything works fine. @yhatt, do you maybe have some idea on this?

Best codingluke

Error: Node is either not clickable or not an Element when <details> is inside <tabs>

Hello!

I have a page with <tabs>, one of which contains <details>.

Last logs before the error:

[LOG]   Retrieving html from <page url>
[DEBUG] Found 1 elements
[DEBUG] Clicking summary: <element name>

and then the error:

Error: Node is either not clickable or not an Element
    at CdpElementHandle.clickablePoint (C:\Users\user\AppData\Roaming\npm\node_modules\docs-to-pdf\node_modules\puppeteer-core\lib\cjs\puppeteer\api\ElementHandle.js:682:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async CdpElementHandle.<anonymous> (C:\Users\user\AppData\Roaming\npm\node_modules\docs-to-pdf\node_modules\puppeteer-core\lib\cjs\puppeteer\api\ElementHandle.js:259:32)
    at async CdpElementHandle.click (C:\Users\user\AppData\Roaming\npm\node_modules\docs-to-pdf\node_modules\puppeteer-core\lib\cjs\puppeteer\api\ElementHandle.js:712:30)
    at async CdpElementHandle.<anonymous> (C:\Users\user\AppData\Roaming\npm\node_modules\docs-to-pdf\node_modules\puppeteer-core\lib\cjs\puppeteer\api\ElementHandle.js:262:36)
    at async openDetails (C:\Users\user\AppData\Roaming\npm\node_modules\docs-to-pdf\lib\utils.js:212:13)
    at async generatePDF (C:\Users\user\AppData\Roaming\npm\node_modules\docs-to-pdf\lib\utils.js:82:21)

Idea: Align headers level to the sidebar nesting, or make page level configurable by meta keywords

At the moment, when generating a PDF from a website, every subpage starts with an <h1>. However, on the website some pages are nested under higher-level pages.

For example:

Screenshot_2023-08-16_000095

Here "Getting started" is the entry point and has multiple subpages such as "Installation" and "Configuration".

I wonder whether it would be good to find out whether a page is a parent or a child, and automatically shift the heading level down by one when it is a child. On "Installation" the <h1> would become an <h2>, and so on.

We could also manage this with meta keywords, so it would be manually configurable per page :)
Together with the bookmarks enhancement, this would make it superior to Word and Google Docs.

What do you think?
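
As a rough illustration of the heading shift proposed above, the sketch below demotes every heading in an HTML fragment by a given nesting depth. It is an assumption-laden example, not part of docs-to-pdf: `shiftHeadings` is a hypothetical name, and the regular expression is used only for brevity. A real implementation would more likely work on DOM nodes.

```typescript
// Demote every heading tag in an HTML fragment by `depth` levels
// (h1 -> h2 for depth 1), capping at h6. Attributes and content
// are left untouched because only the tag name prefix is rewritten.
function shiftHeadings(html: string, depth: number): string {
  return html.replace(
    /<(\/?)h([1-6])\b/gi,
    (_match: string, slash: string, level: string) => {
      const shifted = Math.min(6, Number(level) + depth);
      return `<${slash}h${shifted}`;
    }
  );
}
```

A child page nested one level under its parent would be passed through `shiftHeadings(html, 1)`, while top-level pages would use a depth of 0 and remain unchanged.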
