pdfcpu: a golang pdf processor

Package pdfcpu is a simple PDF processing library written in Go that supports encryption. It provides both an API and a CLI. All PDF versions up to PDF 1.7 (ISO 32000) are supported.

Status

Version: 0.1.15

  • Marks the first release under the Apache-2.0 license.
  • Adds stamp and watermark commands for selected pages, supporting both text and images.
  • Additional watermark configuration for font name/size/color, absolute/relative scaling, render mode, opacity and rotation is also supported.
  • Optional intelligent rotation aligns the rotation angle with one of the two page diagonals.
  • -pages now also supports odd and even. For example, -pages odd,n1 stamps all odd pages except the title page.
  • extract -mode image now natively supports PNG and TIFF, with optional LZW compression.
  • github.com/hhrutter/pdfcpu/lzw is an improved version of compress/lzw. (A corresponding Go proposal exists.)
  • github.com/hhrutter/pdfcpu/tiff is an improved version of golang.org/x/image/tiff.
  • Bug fixes.

Motivation

This is an effort to build a PDF processing library from the ground up written in Go with strong support for batch processing via a rich command line. Over time pdfcpu aims to support the standard range of PDF processing features and also any interesting use cases that may present themselves along the way.

One example is reducing the size of large PDF files for mass mailings by optimizing them down to the bare minimum. This is achieved by analyzing a PDF's cross-reference table, removing redundant embedded resources such as font files or images, and always writing the file back with maximum PDF compression. I also wanted my own Swiss Army knife for PDFs, written entirely in Go, that lets me trim, split, stamp and merge PDF content.
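One way to spot redundant embedded resources is to hash their stream content and redirect every duplicate to the first copy. The sketch below shows that idea in isolation. The stream type and the dedupe function are hypothetical stand-ins, not pdfcpu's data structures, and comparing raw bytes only is a simplifying assumption (a real optimizer would also need to consider the stream dictionaries, e.g. filters).

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// stream is a hypothetical stand-in for an embedded resource (font file,
// image) identified by its object number in the xref table.
type stream struct {
	objNr int
	data  []byte
}

// dedupe maps every object number to the object number of the first stream
// with identical content, so references to duplicates can be redirected
// and the duplicate objects dropped.
func dedupe(streams []stream) map[int]int {
	firstByHash := map[[sha256.Size]byte]int{}
	redirect := map[int]int{}
	for _, s := range streams {
		h := sha256.Sum256(s.data)
		if keep, ok := firstByHash[h]; ok {
			redirect[s.objNr] = keep // duplicate: point at the kept copy
			continue
		}
		firstByHash[h] = s.objNr
		redirect[s.objNr] = s.objNr // first occurrence: keep as-is
	}
	return redirect
}

func main() {
	streams := []stream{
		{objNr: 10, data: []byte("font A")},
		{objNr: 11, data: []byte("image X")},
		{objNr: 12, data: []byte("font A")},
	}
	fmt.Println(dedupe(streams)) // map[10:10 11:11 12:10]
}
```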

Features

  • Validate (validates PDF files up to version 1.7)
  • Read (builds xref table from PDF file)
  • Write (writes xref table to PDF file)
  • Optimize (gets rid of redundancies like duplicate fonts, images)
  • Split (split a multi page PDF file into single page PDF files)
  • Merge (a set of PDF files into one consolidated PDF file)
  • Extract Images (extract all embedded images of a PDF file into a given dir)
  • Extract Fonts (extract all embedded fonts of a PDF file into a given dir)
  • Extract Pages (extract specific pages into a given dir)
  • Extract Content (extract the PDF source into a given dir)
  • Trim (generate a custom version of a PDF file)
  • Stamp/Watermark (stamp or watermark selected pages)
  • Manage (add, remove, list, extract) embedded file attachments
  • Encrypt (sets password protection)
  • Decrypt (removes password protection)
  • Change user/owner password
  • Manage (add, list) user access permissions
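User access permissions live in the P entry of a PDF encryption dictionary, a 32-bit flag word whose bit positions are defined in ISO 32000-1. The sketch below computes P for a given set of granted permissions. That pdfcpu's -perm none and -perm all map to exactly these two values is an assumption; the constant names and the pValue helper are hypothetical.

```go
package main

import "fmt"

// Permission bits of the P entry (bit positions are 1-based in the spec,
// hence bit n corresponds to 1 << (n-1)).
const (
	permPrint         uint32 = 1 << 2  // bit 3: print
	permModify        uint32 = 1 << 3  // bit 4: modify contents
	permExtract       uint32 = 1 << 4  // bit 5: copy text and graphics
	permAnnotate      uint32 = 1 << 5  // bit 6: add/modify annotations
	permFillForms     uint32 = 1 << 8  // bit 9: fill form fields
	permAccessibility uint32 = 1 << 9  // bit 10: extract for accessibility
	permAssemble      uint32 = 1 << 10 // bit 11: assemble document
	permPrintHQ       uint32 = 1 << 11 // bit 12: high-quality print
)

const allPerms = permPrint | permModify | permExtract | permAnnotate |
	permFillForms | permAccessibility | permAssemble | permPrintHQ

// pValue returns P as the signed 32-bit integer written into the PDF.
// Bits 1-2 must be 0; the reserved bits (7-8 and 13-32) are set to 1.
func pValue(granted uint32) int32 {
	base := ^uint32(3) &^ allPerms // reserved bits on, bits 1-2 and all permissions off
	return int32(base | (granted & allPerms))
}

func main() {
	fmt.Println(pValue(0), pValue(allPerms)) // -3904 -4
}
```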

Installation

Required build version: go1.9 and up

go get github.com/hhrutter/pdfcpu/cmd/...

Usage

pdfcpu validate [-verbose] [-mode strict|relaxed] [-upw userpw] [-opw ownerpw] inFile
pdfcpu optimize [-verbose] [-stats csvFile] [-upw userpw] [-opw ownerpw] inFile [outFile]
pdfcpu split [-verbose] [-upw userpw] [-opw ownerpw] inFile outDir
pdfcpu merge [-verbose] outFile inFile...
pdfcpu extract [-verbose] -mode image|font|content|page [-pages pageSelection] [-upw userpw] [-opw ownerpw] inFile outDir
pdfcpu trim [-verbose] -pages pageSelection [-upw userpw] [-opw ownerpw] inFile outFile
pdfcpu stamp [-verbose] -pages pageSelection description inFile [outFile]
pdfcpu watermark [-verbose] -pages pageSelection description inFile [outFile]

pdfcpu attach list [-verbose] [-upw userpw] [-opw ownerpw] inFile
pdfcpu attach add [-verbose] [-upw userpw] [-opw ownerpw] inFile file...
pdfcpu attach remove [-verbose] [-upw userpw] [-opw ownerpw] inFile [file...]
pdfcpu attach extract [-verbose] [-upw userpw] [-opw ownerpw] inFile outDir [file...]

pdfcpu encrypt [-verbose] [-mode rc4|aes] [-key 40|128] [-perm none|all] [-upw userpw] [-opw ownerpw] inFile [outFile]
pdfcpu decrypt [-verbose] [-upw userpw] [-opw ownerpw] inFile [outFile]
pdfcpu changeupw [-verbose] [-opw ownerpw] inFile upwOld upwNew
pdfcpu changeopw [-verbose] [-upw userpw] inFile opwOld opwNew

pdfcpu perm list [-verbose] [-upw userpw] [-opw ownerpw] inFile
pdfcpu perm add [-verbose] [-perm none|all] [-upw userpw] -opw ownerpw inFile

pdfcpu version

Please read the documentation

Contributing

  • Please open an issue if you find a bug or want to propose a change.
  • Feature requests - always welcome
  • Bug fixes - always welcome
  • PRs - also welcome, although I can't promise to merge them right now, since pdfcpu is stable but still alpha and occasionally undergoes heavy changes.

Disclaimer

Usage of pdfcpu assumes you know about and respect all copyrights of any PDF content you may be processing. This applies to the PDF files as such, their content and in particular all embedded resources like font files or images. Credit goes to Renee French for creating our beloved Gopher.

License

Apache-2.0
