
go-pdf's People

Contributors

seehuhn

Forkers

conkayyan

go-pdf's Issues

SVG support

It would be really useful to be able to embed an SVG in a PDF.

extract text not working

I tried this

func extractText(fname string) error {
    fd, err := os.Open(fname)
    if err != nil {
        return err
    }
    defer fd.Close()

    r, err := pdf.NewReader(fd, nil)
    if err != nil {
        return err
    }

    contents := reader.New(r, nil)
    contents.Text = func(text string) error {
        fmt.Print(text)
        return nil
    }

    pages := pagetree.NewIterator(r)
    pageNo := 0
    pages.All()(func(_ pdf.Reference, pageDict pdf.Dict) bool {
        fmt.Println("Page", pageNo)

        err := contents.ParsePage(pageDict, matrix.Identity)
        if err != nil {
            log.Fatal(err)
        }

        pageNo++
        return true
    })
    return nil
}

I was hoping this would extract text from machine-generated PDFs or PDFs with OCR information added, but it prints nothing for any of the PDFs I tried. What am I missing?

Text is stretched

I used (*document.MultiPage).AddPage() and (*document.Page).SetPageSize to create multiple pages in a document.MultiPage with different sizes. If text is added to these pages with (*graphics.Page).TextStart, (*graphics.Page).TextSetFont, (*graphics.Page).TextFirstLine, (*graphics.Page).TextShow, and (*graphics.Page).TextEnd, the text is stretched on each page based on each page's aspect ratio.

outline.Tree: how to get the page number?

func TraverseTree(tree *outline.Tree, level int) {
    if tree == nil {
        return
    }
    var num uint32
    var ref gopdf.Reference
    n := ""
    if a, ok := tree.Action["D"]; ok {
        arr := a.(gopdf.Array)
        ref = arr[0].(gopdf.Reference)
        num = ref.Number()
        n = strconv.FormatInt(int64(num), 10)
    }
    fmt.Printf("%s%s:%d:%s:%s\n", strings.Repeat("\t", level), tree.Title, num, n, ref.String()) // Print the Title, or do whatever you want to do with Open/Action
    for _, child := range tree.Children {
        TraverseTree(child, level+1)
    }
}
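
A note on the snippet above: in a PDF destination array, the first element is a reference to the page object, so ref.Number() yields an object number rather than a page number. One possible way to bridge the gap is to collect the page references in document order (for example from the pdf.Reference values that pages.All() yields in the text extraction snippet earlier on this page) and build a lookup table. The sketch below is self-contained and uses a hypothetical Reference type and a hypothetical pageNumbers helper as stand-ins; neither is part of the library.

```go
package main

import "fmt"

// Reference is a hypothetical stand-in for the library's object reference type.
type Reference uint32

// pageNumbers maps each page object reference to its 1-based page number.
// pageRefs must list the page references in document order.
func pageNumbers(pageRefs []Reference) map[Reference]int {
	m := make(map[Reference]int, len(pageRefs))
	for i, ref := range pageRefs {
		m[ref] = i + 1
	}
	return m
}

func main() {
	// Assumed example data: object numbers of three page objects, in page order.
	refs := []Reference{12, 7, 31}
	lookup := pageNumbers(refs)

	// dest is the reference taken from the first element of an outline destination.
	dest := Reference(7)
	if n, ok := lookup[dest]; ok {
		fmt.Printf("outline entry points to page %d\n", n)
	} else {
		fmt.Println("destination does not refer to a known page")
	}
}
```

The table is built once per document, so each outline entry can then be resolved with a single map lookup.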

fish panic when invoking content.ForAllText

Great library, I fall back on ghostscript on pdf corruption

I get a panic with the string value "fish" when invoking content.ForAllText:
seehuhn.de/go/pdf/content.ForAllText.func2.1
/Users/foxyboy/go/pkg/mod/seehuhn.de/go/[email protected]/content/extract.go:223

if I use my own code I change line 223 to:

panic(perrors.ErrorfPF("unknown graphics state key: %q", key))
// import "github.com/haraldrudell/parl/perrors"

In a 2006 pdf, “SM” is used
Attached is the erroring pdf

some thoughts:
it is good software engineering practice to only use error values as panic arguments
ErrorfPF contains a stack trace that can be printed using perrors.Long
perrors.Short prints message with a short code reference
error messages should be actionable

it is better to return an error than to panic for anything that is recoverable, such as corrupt file data

The object model is peculiar: generic pdf.Object values are handed to top-level functions of sub-packages, and it is unclear which value goes with which function. This makes it difficult to understand how the library is meant to be used.
Objects should be based on real-world concepts, so Document would be the top-level type.
Document would then have methods to return the number of pages, the pages themselves, and so forth. That way, library usage would be obvious.
What are separate packages in 0.3.6 should be different types (structs with methods) in the same package. It is important that the API reflects real-world objects and usage, even if the internal layout follows the PDF specification.
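
To make the suggestion concrete, here is a rough sketch of what a Document-centred API could look like. Every name in it (Document, Page, NumPages, Page) is hypothetical and chosen for illustration; it does not describe the library's existing types.

```go
package main

import "fmt"

// Page is a hypothetical type holding per-page data.
type Page struct {
	Number   int        // 1-based page number
	MediaBox [4]float64 // llx, lly, urx, ury in PDF units
}

// Document is a hypothetical top-level type that owns the pages.
type Document struct {
	pages []Page
}

// NumPages returns the number of pages in the document.
func (d *Document) NumPages() int { return len(d.pages) }

// Page returns the page with the given 1-based number.
func (d *Document) Page(n int) (*Page, error) {
	if n < 1 || n > len(d.pages) {
		return nil, fmt.Errorf("page %d out of range 1..%d", n, len(d.pages))
	}
	return &d.pages[n-1], nil
}

func main() {
	// Assumed example data; a real Document would be produced by opening a file.
	doc := &Document{pages: []Page{
		{Number: 1, MediaBox: [4]float64{0, 0, 612, 792}},
		{Number: 2, MediaBox: [4]float64{0, 0, 595, 842}},
	}}

	for n := 1; n <= doc.NumPages(); n++ {
		p, err := doc.Page(n)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("page %d: %v\n", p.Number, p.MediaBox)
	}
}
```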

Strings like “MediaBox” should be constants that can be searched for, can be used by API consumers, and cannot be misspelled. I also ran into the MediaBox value being either an Array or a Rectangle. When supporting multiple versions, shims are a good approach because they tag code with the reason it exists.
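
A small sketch of both ideas, named constants for dictionary keys and a shim that accepts either representation of a box value, might look as follows. The constant names, the Rectangle type and the rectangleFrom helper are all hypothetical, and the array form is modelled as a plain []float64 for simplicity.

```go
package main

import "fmt"

// Hypothetical constants for page dictionary keys, so that they can be
// searched for and cannot be misspelled silently.
const (
	KeyMediaBox = "MediaBox"
	KeyCropBox  = "CropBox"
)

// Rectangle is a hypothetical normalized box type.
type Rectangle struct {
	LLx, LLy, URx, URy float64
}

// rectangleFrom is a shim: it accepts either a Rectangle or a four-element
// numeric array and always returns a Rectangle.
func rectangleFrom(v any) (Rectangle, error) {
	switch x := v.(type) {
	case Rectangle:
		return x, nil
	case []float64:
		if len(x) != 4 {
			return Rectangle{}, fmt.Errorf("box array has %d elements, want 4", len(x))
		}
		return Rectangle{x[0], x[1], x[2], x[3]}, nil
	default:
		return Rectangle{}, fmt.Errorf("unsupported box value type %T", v)
	}
}

func main() {
	// Assumed example data: the same key stored in two different forms.
	pageA := map[string]any{KeyMediaBox: []float64{0, 0, 612, 792}}
	pageB := map[string]any{KeyMediaBox: Rectangle{0, 0, 595, 842}}

	for _, page := range []map[string]any{pageA, pageB} {
		box, err := rectangleFrom(page[KeyMediaBox])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %+v\n", KeyMediaBox, box)
	}
}
```

Keeping the conversion in one named function means the reason for handling two forms is documented in a single place.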

pagetree should only be a separate package if it is intended to be used separately from Open. It would be better as a type in the same package

Note that in Go, methods for the same object can go in different files, so many methods is not a problem

New-functions enhance encapsulation. A New function should not store pointers to its created object even in the object itself, or launch threads or return anything other than a single pointer to the value created

If a function literal ends up being used, in Go that means another struct needs to be created with the literal function as a method
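
One way to read the last point: where a callback is needed, the state it touches can live in a small struct, and the callback becomes a method passed as a method value. The sketch below assumes a hypothetical forEachText function standing in for any API that takes a func(string) error callback; textCollector is likewise invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// textCollector is a hypothetical struct holding the state that a function
// literal would otherwise capture.
type textCollector struct {
	buf strings.Builder
}

// Collect appends one chunk of text; it has the callback's signature.
func (c *textCollector) Collect(text string) error {
	c.buf.WriteString(text)
	return nil
}

// forEachText is a hypothetical stand-in for a library function that invokes
// a callback for every chunk of text it finds.
func forEachText(chunks []string, fn func(string) error) error {
	for _, s := range chunks {
		if err := fn(s); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	c := &textCollector{}
	// c.Collect is a method value, so no function literal is needed.
	if err := forEachText([]string{"Hello, ", "PDF"}, c.Collect); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.buf.String())
}
```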

aspect.pdf
