serhack / pdf-diff
A tool for visualizing differences between two pdf files.
License: MIT License
As a command-line tool, pdf-diff is the perfect choice for nerdy people. I was wondering whether there is an alternative way to show newcomers what pdf-diff can do, too. What came to mind is a local server that someone can run and then use to compare PDF files.
Currently we use a strong red, (255, 0, 0), as the background color. This is fine for a demo, but for a more practical tool the background color should be more transparent. This issue tracks the progress of making it more transparent.
Finding Poppler for Windows is a herculean task: the latest build is not available anywhere. So far, I could only find old Windows builds at the link below, and they work once you add them to the PATH environment variable.
https://blog.alivate.com.au/poppler-windows/
Please link this in the README so Windows users don't have to spend hours getting this tool to work.
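To give Windows users a clearer error than a failed command, the tool could check up front that Poppler is reachable through PATH, using Go's `exec.LookPath`. This sketch assumes the Poppler executable in question is `pdftoppm`; that name is an assumption, so substitute whichever Poppler binary pdf-diff actually calls.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// checkTool reports whether an executable can be found through PATH.
// The binary name passed in main ("pdftoppm") is an assumption.
func checkTool(name string) error {
	path, err := exec.LookPath(name)
	if err != nil {
		return fmt.Errorf("%s not found in PATH; on Windows, add Poppler's bin folder to PATH: %w", name, err)
	}
	fmt.Println("found", name, "at", path)
	return nil
}

func main() {
	if err := checkTool("pdftoppm"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```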
We should implement a flag such as -bg-color that specifies a hex color to use as the background color when highlighting differences across PDF files.
I ran some sample files.
One of them differs by just one line with a single added word.
It seems the current comparison logic does not flag single-word differences?
I can send a sample if needed.
Hi! Very cool project.
I'm having some problems; I hope you can offer some guidance in case I'm doing something wrong.
go run main.go dummy.pdf dummy2.pdf
The images generated from a PDF are placed in a folder named after the hash of the PDF file's content. E.g. if the file's hash is fc324.., the images are in the fc324 folder. If a folder with that name already exists, pdf-diff will not create any images, since it considers them already generated.
Once run, the images are created in the generated folder.
I got two separate folders named with different SHAs, but with no content. The generated folder also remains empty.
What am I doing wrong?
We can replace line 48 (as of commit 5535f71) with a go-fitz-based approach such as:
```go
package main

import (
	"flag"
	"fmt"
	"image/jpeg"
	"os"
	"path/filepath"

	"github.com/gen2brain/go-fitz"
)

func main() {
	// Flags for the source PDF and the output directory (images, text, HTML).
	sourceFile := flag.String("source", ".", "source PDF file")
	targetDir := flag.String("target", ".", "target directory")
	flag.Parse()

	fmt.Println("sourceFile:", *sourceFile)
	fmt.Println("targetDir:", *targetDir)

	doc, err := fitz.New(*sourceFile)
	if err != nil {
		panic(err)
	}
	defer doc.Close()

	// Create the output directory if it does not exist yet.
	if err := os.MkdirAll(*targetDir, os.ModePerm); err != nil {
		panic(err)
	}

	// Extract pages as JPEG images.
	for n := 0; n < doc.NumPage(); n++ {
		img, err := doc.Image(n)
		if err != nil {
			panic(err)
		}
		f, err := os.Create(filepath.Join(*targetDir, fmt.Sprintf("test%03d.jpg", n)))
		if err != nil {
			panic(err)
		}
		if err := jpeg.Encode(f, img, &jpeg.Options{Quality: jpeg.DefaultQuality}); err != nil {
			panic(err)
		}
		if err := f.Close(); err != nil {
			panic(err)
		}
	}

	// Extract pages as plain text.
	for n := 0; n < doc.NumPage(); n++ {
		text, err := doc.Text(n)
		if err != nil {
			panic(err)
		}
		path := filepath.Join(*targetDir, fmt.Sprintf("test%03d.txt", n))
		if err := os.WriteFile(path, []byte(text), 0o644); err != nil {
			panic(err)
		}
	}

	// Extract pages as HTML (the second argument includes the HTML header).
	for n := 0; n < doc.NumPage(); n++ {
		html, err := doc.HTML(n, true)
		if err != nil {
			panic(err)
		}
		path := filepath.Join(*targetDir, fmt.Sprintf("test%03d.html", n))
		if err := os.WriteFile(path, []byte(html), 0o644); err != nil {
			panic(err)
		}
	}
}
```
This will build on every OS, because the libraries for all of them are included at https://github.com/gen2brain/go-fitz/tree/master/libs
It works for me on Mac. Maybe test it on Windows and Linux too.
It would replace Poppler, which is very heavy IMHO, and make the Go binary fully self-contained in a single file.