Percollate – How to Turn Web Pages into PDF Files?

When you find a well-designed web page, you may want to save it as a nicely formatted PDF file, and a dedicated tool makes that much easier. In this post, we introduce a very useful tool for transforming web pages into PDF files.

Percollate is a command-line tool that turns web pages into beautifully formatted PDFs. It works as follows:

  1. Fetch the page(s) using got
  2. If an AMP version of the page exists, use that instead (disable with --no-amp flag)
  3. Enhance the DOM using jsdom
  4. Pass the DOM through mozilla/readability to strip unnecessary elements
  5. Apply the HTML template and the print stylesheet to the resulting HTML
  6. Use puppeteer to generate a PDF from the page

Here is an example spread from the generated PDF of a chapter in Dimensions of Colour; rendered here in black & white for a smaller image file size.

The image on the web page

The PDF file transformed from the web page

How to install Percollate?

percollate needs Node.js version 8.6.0 or later, as it uses new(ish) JavaScript syntax. If you get SyntaxError: Unexpected token errors, check your Node version with node --version.
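
If you would rather check the version requirement in a script before installing, the following is a minimal sketch. The version_ge helper is a hypothetical name introduced here (it is not part of percollate), and the sketch assumes a sort that supports the -V (version sort) option, as GNU coreutils does.

```shell
# Hypothetical helper (not part of percollate): succeeds when version $1
# is greater than or equal to version $2.
# Assumes a `sort` that supports -V (version sort), e.g. GNU coreutils.
version_ge() {
  # If the required version sorts first (or both are equal), $1 >= $2.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="8.6.0"

if command -v node >/dev/null 2>&1; then
  # `node --version` prints something like "v8.6.0"; strip the leading "v".
  current="$(node --version | sed 's/^v//')"
  if version_ge "$current" "$required"; then
    echo "Node.js $current is new enough for percollate"
  else
    echo "Node.js $current is older than $required; please upgrade"
  fi
else
  echo "node was not found on PATH"
fi
```

The comparison is done on the full version string rather than only the major number, so 8.5.x is correctly reported as too old.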

You can install percollate globally:

# using npm
npm install -g percollate

# using yarn
yarn global add percollate

To keep the package up-to-date, you can run:

# using npm, upgrading is the same command as installing
npm install -g percollate

# yarn has a separate command
yarn global upgrade --latest percollate

How to use Percollate?

Run percollate --help for a list of available commands. For a particular command, percollate <command> --help lists all available options.

Available commands

Command          What it does
percollate pdf   Bundles one or more web pages into a PDF
percollate epub  Not implemented yet
percollate html  Not implemented yet

Available options

The pdf, epub, and html commands have these options:

Option        What it does
-o, --output  The path of the resulting bundle; when omitted, the output file name is derived from the title of the web page
--individual  Export each web page as an individual file
--template    Path to a custom HTML template
--style       Path to a custom CSS stylesheet
--css         Additional CSS styles you can pass from the command line to override the default/custom stylesheet styles
--no-amp      Don’t prefer the AMP version of the web page
--debug       Print more detailed information
--toc         Include a Table of Contents page
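
Several of these options can be combined in one call. The sketch below is only an illustration: the run wrapper and the DRY_RUN variable are hypothetical conveniences introduced here so the command can be inspected before it is executed, and the CSS passed to --css is an arbitrary example value.

```shell
# Hypothetical dry-run wrapper (not part of percollate): prints the command
# when DRY_RUN=1 and executes it otherwise.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

DRY_RUN=1   # set to 0 to actually generate the PDF

# Table of contents, no AMP version, and an extra print rule (example CSS).
run percollate pdf --output some.pdf --toc --no-amp \
  --css '@page { size: A3 landscape }' https://w3cgeek.com
```

Note the single quotes around the CSS: they keep the shell from interpreting the braces and spaces, so the whole rule reaches percollate as one argument.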

Generate Basic PDF

To transform a single web page to PDF:

percollate pdf --output some.pdf https://w3cgeek.com

To bundle several web pages into a single PDF, specify them as separate arguments to the command:

percollate pdf --output some.pdf https://w3cgeek.com/page1 https://w3cgeek.com/page2

You can use common Unix commands and keep the list of URLs in a newline-delimited text file:

cat urls.txt | xargs percollate pdf --output some.pdf
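
If the URL list also contains blank lines or notes to yourself, it may help to filter those out before handing the list to xargs. The following is a small sketch under that assumption; clean_urls is a hypothetical helper name, and treating lines that start with # as comments is a convention chosen here, not something percollate defines.

```shell
# Hypothetical helper (not part of percollate): print the URLs in a file,
# skipping blank lines and lines whose first non-space character is '#'.
clean_urls() {
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# Only call percollate when it is installed and urls.txt exists.
if command -v percollate >/dev/null 2>&1 && [ -f urls.txt ]; then
  clean_urls urls.txt | xargs percollate pdf --output some.pdf
fi
```

This keeps urls.txt readable as a reading list while still passing only real URLs to the command.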

To transform several web pages into individual PDF files at once, use the --individual flag:

percollate pdf --individual https://w3cgeek.com/page1 https://w3cgeek.com/page2
