seehuhn / go-pdf
Support for reading and writing PDF files in Go.
License: GNU General Public License v3.0
It would be really useful to be able to embed an SVG in a PDF.
Can I merge multiple PDF files into one PDF file?
I tried this
func extractText(fname string) error {
	fd, err := os.Open(fname)
	if err != nil {
		return err
	}
	defer fd.Close()

	r, err := pdf.NewReader(fd, nil)
	if err != nil {
		return err
	}

	contents := reader.New(r, nil)
	contents.Text = func(text string) error {
		fmt.Print(text)
		return nil
	}

	pages := pagetree.NewIterator(r)
	pageNo := 0
	pages.All()(func(_ pdf.Reference, pageDict pdf.Dict) bool {
		fmt.Println("Page", pageNo)
		err := contents.ParsePage(pageDict, matrix.Identity)
		if err != nil {
			log.Fatal(err)
		}
		pageNo++
		return true
	})
	return nil
}
and was hoping this would extract text from machine-generated PDFs or PDFs with OCR information added, but it prints nothing for any of the PDFs I tried. What am I missing?
How to import an existing pdf file?
How to automatically wrap long text?
I tried to open an encrypted PDF created with the latest version of Adobe Acrobat and got the error: encryption dictionary: standard security handler: invalid Encrypt.O.
Test file (password is "password"): password_protected.pdf
I tested with both v0.3.4 and v0.3.5-0.20230822001153-4ee04e1da286 / 4ee04e1.
Something seems problematic reading the file ending:
I used (*document.MultiPage).AddPage() and (*document.Page).SetPageSize to create multiple pages with different sizes in a document.MultiPage. If text is added to these pages with (*graphics.Page).TextStart, (*graphics.Page).TextSetFont, (*graphics.Page).TextFirstLine, (*graphics.Page).TextShow, and (*graphics.Page).TextEnd, the text is stretched on each page according to that page's aspect ratio.
func TraverseTree(tree *outline.Tree, level int) {
	if tree == nil {
		return
	}
	var num uint32
	var ref gopdf.Reference
	n := ""
	if a, ok := tree.Action["D"]; ok {
		arr := a.(gopdf.Array)
		ref = arr[0].(gopdf.Reference)
		num = ref.Number()
		n = strconv.FormatInt(int64(num), 10)
	}
	// Print the Title, or do whatever you want to do with Open/Action
	fmt.Printf("%s%s:%d:%s:%s\n", strings.Repeat("\t", level), tree.Title, num, n, ref.String())
	for _, child := range tree.Children {
		TraverseTree(child, level+1)
	}
}
I just can’t figure out how to draw an image where I need it. Could you please give me an example?
The example shown draws it at the bottom left.
Is there a way to replace text using this library?
Great library. I fall back on Ghostscript when a PDF is corrupt.
I get a string "fish" panic when invoking content.ForAllText
seehuhn.de/go/pdf/content.ForAllText.func2.1
/Users/foxyboy/go/pkg/mod/seehuhn.de/go/[email protected]/content/extract.go:223
if I use my own code I change line 223 to:
panic(perrors.ErrorfPF("unknown graphics state key: %q", key))
// import "github.com/haraldrudell/parl/perrors"
In a 2006 pdf, “SM” is used
Attached is the erroring pdf
some thoughts:
it is good software engineering practice to only use error values as panic arguments
ErrorfPF contains a stack trace that can be printed using perrors.Long
perrors.Short prints message with a short code reference
error messages should be actionable
it is better to return error rather than panic for anything that is recoverable, such as corrupt file data
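The last point can be sketched as follows. This is a minimal, hypothetical illustration and not the library's actual code: the names ErrUnknownKey, knownKeys and checkGraphicsStateKey are made up for this example, and the key table is only a small subset of the graphics state parameter keys defined in the PDF specification. The idea is that an unrecognised key found in file data becomes an error value the caller can inspect, skip or report.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnknownKey is a hypothetical sentinel error; callers can test for it
// with errors.Is and decide whether to skip the key or abort.
var ErrUnknownKey = errors.New("unknown graphics state key")

// knownKeys is an illustrative subset of graphics state parameter keys
// (assumption: a real implementation would cover the full list).
var knownKeys = map[string]bool{
	"LW": true, "LC": true, "LJ": true, "ML": true,
	"D": true, "RI": true, "Font": true, "SM": true,
}

// checkGraphicsStateKey returns an error instead of panicking when the
// key is not recognised, because the key comes from untrusted file data.
func checkGraphicsStateKey(key string) error {
	if !knownKeys[key] {
		return fmt.Errorf("%w: %q", ErrUnknownKey, key)
	}
	return nil
}

func main() {
	for _, key := range []string{"LW", "SM", "XX"} {
		err := checkGraphicsStateKey(key)
		if errors.Is(err, ErrUnknownKey) {
			// Recoverable: report the problem and continue with the next key.
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Println("ok:", key)
	}
}
```

With this shape, a caller processing a damaged or unusual file can log the message and carry on, or fall back to another tool, without needing recover.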
The object model is peculiar: generic pdf.Object values are handed to top-level functions of sub-packages. What value goes with which function is unclear. It is therefore difficult to understand how the library is to be used.
Objects should be based on real-world concepts, so Document would be the top-level object.
Document then has methods to return the number of pages, the pages themselves, and so forth. In this way, library usage would be obvious.
What are separate packages in 0.3.6 should be different types (structs with methods) in the same package. It is important that the API reflects real-world objects and usage, even if the internal organization comes from the PDF specification.
Strings like “MediaBox” should be constants that can be searched for, used by API consumers, and cannot be misspelled. I also ran into the MediaBox value being either an Array or a Rectangle. When supporting multiple versions, shims are a good approach because they tag code with the reason it exists.
pagetree should only be a separate package if it is intended to be used separately from Open. It would be better as a type in the same package.
Note that in Go, methods for the same object can go in different files, so many methods is not a problem
New-functions enhance encapsulation. A New function should not store pointers to its created object even in the object itself, or launch threads or return anything other than a single pointer to the value created
If a function literal ends up being used, in Go that means another struct needs to be created with the literal function as a method
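The remarks above about a MediaBox constant and about the value arriving either as an Array or as a Rectangle could be combined in a small shim. The sketch below is hypothetical: Number, Array and Rectangle are local stand-ins defined only for this example (they are not the library's types), and KeyMediaBox and asRectangle are invented names.

```go
package main

import "fmt"

// Stand-in types for this sketch only; assumptions, not the library's API.
type Number float64
type Array []any
type Rectangle struct{ LLx, LLy, URx, URy float64 }

// KeyMediaBox is a searchable constant that replaces the string literal.
const KeyMediaBox = "MediaBox"

// asRectangle is a shim that accepts either representation of a box and
// always returns a *Rectangle. Malformed input yields an error, not a panic.
func asRectangle(obj any) (*Rectangle, error) {
	switch v := obj.(type) {
	case *Rectangle:
		return v, nil
	case Array:
		if len(v) != 4 {
			return nil, fmt.Errorf("%s: expected 4 numbers, got %d", KeyMediaBox, len(v))
		}
		var c [4]float64
		for i, x := range v {
			n, ok := x.(Number)
			if !ok {
				return nil, fmt.Errorf("%s: element %d has type %T", KeyMediaBox, i, x)
			}
			c[i] = float64(n)
		}
		return &Rectangle{LLx: c[0], LLy: c[1], URx: c[2], URy: c[3]}, nil
	default:
		return nil, fmt.Errorf("%s: unexpected type %T", KeyMediaBox, obj)
	}
}

func main() {
	inputs := []any{
		Array{Number(0), Number(0), Number(612), Number(792)},
		&Rectangle{LLx: 0, LLy: 0, URx: 595, URy: 842},
		"not a rectangle",
	}
	for _, in := range inputs {
		r, err := asRectangle(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s = [%g %g %g %g]\n", KeyMediaBox, r.LLx, r.LLy, r.URx, r.URy)
	}
}
```

Keeping the conversion in one named function documents why the two cases exist and gives API consumers a single type to work with. The sketch does not normalise the corner order, which a fuller version would probably want to do.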