j-f-liu / lopdf
A Rust library for PDF document manipulation.
License: MIT License
https://github.com/yurydelendik/pdf.js/raw/5973d40afe5a1f82474438caae71c4039dc3ba84/test/pdfs/bug864847.pdf has a fun example of an LZWDecode filter being used on a ToUnicode CMap
I tried to run the code in the README and got two errors: one complaining about the dictionary! macro, and the other complaining about an extra comma.
error: cannot find macro `dictionary!` in this scope
To fix this, I copied the dictionary! macro from object.rs. Is there a better way to do this?
error: no rules expected the token `,`
--> src/main.rs:27:30
|
27 | "BaseFont" => "Courier",,
| ^
To fix this I just removed the extra comma.
It would be very valuable if lopdf also supported parsing pdf content streams. I'm not sure how easy these are to parse and how well pom would deal with them but it seems like an interesting challenge.
In order to build out signature support, we need the ability to get the underlying bytestream of segments of the file around certain objects. It would be helpful if that was exposed in the library, even if just the ability to get the byte offset of an object in the pdf file.
It gives: thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: InvalidInput, error: StringError("corrupt deflate stream") }', libcore/result.rs:945:5
This seems like a natural addition to the API.
Hi,
Is this lib able to scale/resize PDF pages?
What I need is something that takes a PDF with multiple pages and resizes/scales them so that only A4 and A3 print sizes remain.
So everything under A4 goes to A4, everything between A4 and A3 goes to A3, and any page over A3 goes to A3.
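The sizing rule described above can be sketched as plain arithmetic. This is only an illustration of the rule, not a lopdf feature: the names PaperSize, choose_target and scale_factor are hypothetical, the A4/A3 dimensions are rounded to whole PDF points, and how the factor would then be applied to a page is left open.

```rust
/// Target print sizes from the request above (hypothetical type, not part of lopdf).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PaperSize {
    A4,
    A3,
}

impl PaperSize {
    /// Portrait (short side, long side) in PDF points (1/72 inch), rounded.
    fn dimensions(self) -> (f64, f64) {
        match self {
            PaperSize::A4 => (595.0, 842.0),
            PaperSize::A3 => (842.0, 1191.0),
        }
    }
}

/// Order the sides so that landscape and portrait pages are treated alike.
fn sorted_sides(width: f64, height: f64) -> (f64, f64) {
    if width <= height {
        (width, height)
    } else {
        (height, width)
    }
}

/// Pages that fit inside A4 go to A4; everything larger goes to A3.
fn choose_target(width: f64, height: f64) -> PaperSize {
    let (short, long) = sorted_sides(width, height);
    let (a4_short, a4_long) = PaperSize::A4.dimensions();
    if short <= a4_short && long <= a4_long {
        PaperSize::A4
    } else {
        PaperSize::A3
    }
}

/// Uniform factor that fits the page into the target while keeping its aspect ratio.
fn scale_factor(width: f64, height: f64, target: PaperSize) -> f64 {
    let (short, long) = sorted_sides(width, height);
    let (target_short, target_long) = target.dimensions();
    (target_short / short).min(target_long / long)
}

fn main() {
    for &(w, h) in &[(420.0, 595.0), (700.0, 900.0), (1684.0, 2382.0)] {
        let target = choose_target(w, h);
        let factor = scale_factor(w, h, target);
        println!("{} x {} pt -> {:?}, scale {:.3}", w, h, target, factor);
    }
}
```

Taking the minimum of the two ratios keeps the whole page inside the target; a page smaller than A4 gets a factor above 1 and is enlarged.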
Is there a method to extract pages in a PDF document as images?
Found another error for http://mirrors.ibiblio.org/CTAN/macros/latex/contrib/ksp-thesis/ksp-thesis.pdf which gives:
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (xref_and_trailer).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 267986 }") }
I don't know if this was intentional or not, but when extracting the text from a document, I get the following output in the console:
{"F1": "WinAnsiEncoding", "F2": "WinAnsiEncoding", "F3": "WinAnsiEncoding", "F4": "WinAnsiEncoding"}
{"F1": "WinAnsiEncoding", "F2": "WinAnsiEncoding", "F3": "WinAnsiEncoding", "F4": "WinAnsiEncoding", "F5": "WinAnsiEncoding"}
{"F1": "WinAnsiEncoding", "F2": "WinAnsiEncoding", "F3": "WinAnsiEncoding", "F4": "WinAnsiEncoding", "F5": "WinAnsiEncoding"}
Here is the code doing the printing:
Line 190 in fa3a198
I get a bunch of the following:
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 9137).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 9383 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 24036).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 24282 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 38935).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 39181 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 53834).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 54080 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 68733).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 68979 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 83632).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 83878 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 98531).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 98777 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 113430).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 113676 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 164220).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 164466 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 215010).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 215256 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 265755).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 265914 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 266025).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 266184 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 266970).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 267129 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 275356).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 275515 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 279461).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 279620 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 310100).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 310259 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 310855).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 311014 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 357054).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 357213 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 363298).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 363457 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 369716).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 369875 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 374713).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 374872 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 375742).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 375901 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 379986).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 380145 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 389773).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 389932 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 391382).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 391541 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 404469).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 404628 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 446244).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 446403 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 490353).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 490512 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 538405).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 538564 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 538913).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 539072 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 545760).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 545919 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 547091).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 547250 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 588499).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 588658 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 634527).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 634686 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 638297).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 638456 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 721012).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 721171 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 721633).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 721792 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 805803).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 805962 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 808239).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 808398 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 808676).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 808835 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 820750).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 820909 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 846137).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 846296 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 849787).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 849946 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 850280).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 850439 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 850918).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 851077 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 851558).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 851717 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 852197).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 852356 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 852675).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 852834 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 853163).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 853322 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 853657).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 853816 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 854201).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 854360 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 854680).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 854839 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 855264).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 855423 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 855774).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 855933 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 856291).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 856450 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 856873).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 857032 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 857465).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 857624 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 857948).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 858107 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 858384).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 858543 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 858902).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 859061 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 859556).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 859616 }") }
and then a panic at:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: FromUtf8Error { bytes: [70, 109, 48, 95, 48, 95, 45, 52, 95, 49, 55, 95, 70, 108, 228, 99, 104, 101, 95, 79, 98, 106, 101, 107, 116, 95, 67, 61, 48, 95, 77, 61, 50, 51, 48, 95, 89, 61, 50, 51, 48, 95, 75, 61, 48], error: Utf8Error { valid_up_to: 14, error_len: Some(1) } }', libcore/result.rs:945:5
stack backtrace:
0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
1: std::sys_common::backtrace::print
at libstd/sys_common/backtrace.rs:71
at libstd/sys_common/backtrace.rs:59
2: std::panicking::default_hook::{{closure}}
at libstd/panicking.rs:207
3: std::panicking::default_hook
at libstd/panicking.rs:223
4: std::panicking::begin_panic
at libstd/panicking.rs:402
5: std::panicking::try::do_call
at libstd/panicking.rs:349
6: std::panicking::try::do_call
at libstd/panicking.rs:325
7: core::ptr::drop_in_place
at libcore/panicking.rs:72
8: core::result::unwrap_failed
at /Users/travis/build/rust-lang/rust/src/libcore/macros.rs:26
9: <core::result::Result<T, E>>::unwrap
at /Users/travis/build/rust-lang/rust/src/libcore/result.rs:782
10: lopdf::parser::dictionary::{{closure}}::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/lopdf-0.15.1/src/parser.rs:107
11: core::iter::iterator::Iterator::fold::{{closure}}
at /Users/travis/build/rust-lang/rust/src/libcore/iter/iterator.rs:1594
12: core::iter::iterator::Iterator::try_fold
at /Users/travis/build/rust-lang/rust/src/libcore/iter/iterator.rs:1481
13: core::iter::iterator::Iterator::fold
at /Users/travis/build/rust-lang/rust/src/libcore/iter/iterator.rs:1594
14: lopdf::parser::dictionary::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/lopdf-0.15.1/src/parser.rs:105
15: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &'a F>::call_once
at /Users/travis/build/rust-lang/rust/src/libcore/ops/function.rs:252
16: <core::result::Result<T, E>>::map
at /Users/travis/build/rust-lang/rust/src/libcore/result.rs:468
17: <pom::parser::Parser<'a, I, O>>::map::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:34
18: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
19: <pom::parser::Parser<'a, I, O>>::map::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:34
20: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
21: <pom::parser::Parser<'a, I, O> as core::ops::bit::BitOr>::bitor::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:520
22: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
23: <pom::parser::Parser<'a, I, O> as core::ops::arith::Sub<pom::parser::Parser<'b, I, U>>>::sub::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:469
24: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
25: pom::parser::call::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:426
26: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
27: <pom::parser::Parser<'a, I, O> as core::ops::arith::Add<pom::parser::Parser<'b, I, U>>>::add::{{closure}}::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:453
28: <core::result::Result<T, E>>::and_then
at /Users/travis/build/rust-lang/rust/src/libcore/result.rs:621
29: <pom::parser::Parser<'a, I, O> as core::ops::arith::Add<pom::parser::Parser<'b, I, U>>>::add::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:453
30: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
31: <pom::parser::Parser<'a, I, O>>::repeat::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:129
32: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
33: <pom::parser::Parser<'a, I, O> as core::ops::arith::Mul<pom::parser::Parser<'b, I, U>>>::mul::{{closure}}::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:485
34: <core::result::Result<T, E>>::and_then
at /Users/travis/build/rust-lang/rust/src/libcore/result.rs:621
35: <pom::parser::Parser<'a, I, O> as core::ops::arith::Mul<pom::parser::Parser<'b, I, U>>>::mul::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:485
36: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
37: <pom::parser::Parser<'a, I, O> as core::ops::arith::Sub<pom::parser::Parser<'b, I, U>>>::sub::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:469
38: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
39: <pom::parser::Parser<'a, I, O>>::map::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:34
40: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
41: <pom::parser::Parser<'a, I, O>>::map::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:34
42: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
43: <pom::parser::Parser<'a, I, O> as core::ops::bit::BitOr>::bitor::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:520
44: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
45: <pom::parser::Parser<'a, I, O> as core::ops::arith::Sub<pom::parser::Parser<'b, I, U>>>::sub::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:469
46: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
47: pom::parser::call::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:426
48: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
49: <pom::parser::Parser<'a, I, O> as core::ops::arith::Add<pom::parser::Parser<'b, I, U>>>::add::{{closure}}::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:453
50: <core::result::Result<T, E>>::and_then
at /Users/travis/build/rust-lang/rust/src/libcore/result.rs:621
51: <pom::parser::Parser<'a, I, O> as core::ops::arith::Add<pom::parser::Parser<'b, I, U>>>::add::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:453
52: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
53: <pom::parser::Parser<'a, I, O>>::repeat::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:129
54: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
55: <pom::parser::Parser<'a, I, O> as core::ops::arith::Mul<pom::parser::Parser<'b, I, U>>>::mul::{{closure}}::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:485
56: <core::result::Result<T, E>>::and_then
at /Users/travis/build/rust-lang/rust/src/libcore/result.rs:621
57: <pom::parser::Parser<'a, I, O> as core::ops::arith::Mul<pom::parser::Parser<'b, I, U>>>::mul::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:485
58: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
59: <pom::parser::Parser<'a, I, O> as core::ops::arith::Sub<pom::parser::Parser<'b, I, U>>>::sub::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:469
60: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
61: <pom::parser::Parser<'a, I, O>>::map::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:34
62: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
63: <pom::parser::Parser<'a, I, O> as core::ops::arith::Sub<pom::parser::Parser<'b, I, U>>>::sub::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:469
64: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
65: <pom::parser::Parser<'a, I, O> as core::ops::arith::Sub<pom::parser::Parser<'b, I, U>>>::sub::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:469
66: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
67: <pom::parser::Parser<'a, I, O> as core::ops::arith::Sub<pom::parser::Parser<'b, I, U>>>::sub::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:469
68: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
69: <pom::parser::Parser<'a, I, O> as core::ops::bit::Shr<F>>::shr::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:501
70: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
71: <pom::parser::Parser<'a, I, O>>::map::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:34
72: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
73: <pom::parser::Parser<'a, I, O> as core::ops::bit::BitOr>::bitor::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:520
74: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
75: <pom::parser::Parser<'a, I, O> as core::ops::bit::BitOr>::bitor::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:516
76: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
77: <pom::parser::Parser<'a, I, O> as core::ops::arith::Sub<pom::parser::Parser<'b, I, U>>>::sub::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:469
78: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
79: <pom::parser::Parser<'a, I, O> as core::ops::arith::Add<pom::parser::Parser<'b, I, U>>>::add::{{closure}}::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:453
80: <core::result::Result<T, E>>::and_then
at /Users/travis/build/rust-lang/rust/src/libcore/result.rs:621
81: <pom::parser::Parser<'a, I, O> as core::ops::arith::Add<pom::parser::Parser<'b, I, U>>>::add::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:453
82: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
83: <pom::parser::Parser<'a, I, O> as core::ops::arith::Sub<pom::parser::Parser<'b, I, U>>>::sub::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:469
84: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
85: <pom::parser::Parser<'a, I, O> as core::ops::arith::Sub<pom::parser::Parser<'b, I, U>>>::sub::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:469
86: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
87: <pom::parser::Parser<'a, I, O> as core::ops::arith::Sub<pom::parser::Parser<'b, I, U>>>::sub::{{closure}}
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:469
88: <pom::parser::Parser<'a, I, O>>::parse
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
89: lopdf::reader::Reader::read_object
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/lopdf-0.15.1/src/reader.rs:139
90: <pom::input::DataInput<'a, T> as pom::input::Input<T>>::position
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/lopdf-0.15.1/src/reader.rs:91
91: lopdf::reader::<impl lopdf::document::Document>::load_internal
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/lopdf-0.15.1/src/reader.rs:38
92: lopdf::reader::<impl lopdf::document::Document>::load
at /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/lopdf-0.15.1/src/reader.rs:19
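The FromUtf8Error in the panic above comes from bytes that are not valid UTF-8 (byte 228 at index 14). As a minimal sketch of a non-panicking alternative, the function below tries UTF-8 first and otherwise maps each byte to the code point of the same value. Treating the bytes as Latin-1 is an assumption made for illustration, and decode_name is a hypothetical helper, not lopdf's API.

```rust
/// Decode raw name bytes without panicking (hypothetical helper, not part of lopdf).
/// Valid UTF-8 is used as-is; anything else is read byte by byte as Latin-1,
/// which is an assumption about the producer of the file.
fn decode_name(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(text) => text.to_owned(),
        // `u8 as char` maps a byte to the code point U+0000..=U+00FF of the same value.
        Err(_) => bytes.iter().map(|&b| b as char).collect(),
    }
}

fn main() {
    // The bytes reported in the FromUtf8Error above.
    let bytes: &[u8] = &[
        70, 109, 48, 95, 48, 95, 45, 52, 95, 49, 55, 95, 70, 108, 228, 99, 104, 101, 95, 79,
        98, 106, 101, 107, 116, 95, 67, 61, 48, 95, 77, 61, 50, 51, 48, 95, 89, 61, 50, 51,
        48, 95, 75, 61, 48,
    ];
    println!("{}", decode_name(bytes));
}
```

Whether a lossy or Latin-1 decoding is acceptable depends on how the name is used afterwards; keeping the raw bytes would avoid the question entirely.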
When I try to read this document: http://mirror.hmc.edu/ctan/macros/latex/contrib/iwhdp/Back_2015.pdf I get:
Not a valid PDF file (prev xref_and_trailer).
Mismatch { message: "expect repeat at least 1 times, found 0 times", position: 3117 }
This document https://www.uspsoig.gov/sites/default/files/document-library-files/2016/RARC-WP-16-001.pdf
gives:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: InvalidInput, error: StringError("corrupt deflate stream") }', libcore/result.rs:945:5
Gives StringError("Not a valid PDF file (read object at 0).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 0 }")
Hi, I've tried your library for creating a programming language parser. Everything seemed fine until I reached recursion. I have code like this (simplified):
fn expr<'a>(n: AST, c: Context) -> Combinator<impl Parser<'a, u8, Output = AST>> {
(one_of(b"<>=|~,^#_$?@:'") + phrase(c.clone())).map(move |(v, e)| {...}) |
empty().map(move |_| n.clone())
}
//
fn phrase<'a>(c: Context) -> Combinator<impl Parser<'a, u8, Output = AST>> {
noun(c.clone()) >> move |n: AST| expr(n, c.clone())
}
//
fn prg<'a>(c: Context) -> Combinator<impl Parser<'a, u8, Output = AST>> {
phrase(c.clone()) - endp()
}
Compiler says: error[E0275]: overflow evaluating the requirement `impl pom::Parser<u8>`...
I've found call() and comb(); unfortunately, they don't accept additional arguments (like Context and AST in my case).
Maybe I've missed something?
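One general way around this kind of overflow is to stop returning an opaque impl type from the mutually recursive functions and return a boxed trait object instead, building the inner parser lazily at parse time. The sketch below is hand-rolled and does not use pom's API: BoxedParser, Ast and the single-digit noun rule are assumptions made for illustration, and the Context argument is left out for brevity.

```rust
/// Toy AST, loosely mirroring the shape in the question (hypothetical).
#[derive(Clone, Debug, PartialEq)]
enum Ast {
    Noun(u8),
    Apply(u8, Box<Ast>, Box<Ast>), // verb, left operand, right operand
}

/// A parser takes the input and a position and returns the value plus the new position.
/// Because this is a boxed trait object, its type is finite even when parsers recurse.
type BoxedParser<'a> = Box<dyn Fn(&'a [u8], usize) -> Option<(Ast, usize)> + 'a>;

/// A noun is a single ASCII digit in this sketch.
fn noun<'a>() -> BoxedParser<'a> {
    Box::new(|input: &'a [u8], pos: usize| match input.get(pos) {
        Some(&b) if b.is_ascii_digit() => Some((Ast::Noun(b), pos + 1)),
        _ => None,
    })
}

/// Takes an extra argument (the already parsed left operand) and recurses into phrase.
fn expr<'a>(left: Ast) -> BoxedParser<'a> {
    Box::new(move |input: &'a [u8], pos: usize| match input.get(pos) {
        Some(&verb) if b"<>=|~,^#_$?@:'".contains(&verb) => {
            // phrase() is built only when this branch runs, so construction terminates.
            let (right, next) = phrase()(input, pos + 1)?;
            Some((Ast::Apply(verb, Box::new(left.clone()), Box::new(right)), next))
        }
        // Empty alternative: no verb follows, so the left operand is the result.
        _ => Some((left.clone(), pos)),
    })
}

fn phrase<'a>() -> BoxedParser<'a> {
    Box::new(|input: &'a [u8], pos: usize| {
        let (n, next) = noun()(input, pos)?;
        expr(n)(input, next)
    })
}

fn main() {
    let input: &[u8] = b"1,2=3";
    println!("{:?}", phrase()(input, 0));
}
```

The cost is one allocation and a dynamic call per parser, which is usually acceptable for a recursive grammar; whether pom offers an equivalent boxed form for parsers with arguments is not something this sketch claims.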
See https://docs.rs/crate/lopdf/. It looks like an issue building pom-0.6.1
First of all: really nice work!
I am looking for a Rust crate to parse a PDF and extract the content from it.
For example, to extract each text line and metadata from the first page, to get which font style and/or font family belongs to the given line, etc.
I am asking if this crate supports a good depth of extraction (for example, if the crate supports already font family and font size extraction for a single line, or word, things like that...).
Can you give me some information about that please?
Thanks a lot
http://web.archive.org/web/20070317213312/http://www.bottledwater.org/public/pdf/IBWA05ModelCode_Mar2.pdf gives Error { repr: Custom(Custom { kind: InvalidData, error: StringError("Not a valid PDF file (xref_and_trailer).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 116 }") }) }
Hello,
It's difficult to identify what is changing with this library (and as there have been two api changes in as many weeks there's quite a lot to follow).
Would it be possible to add a changelog (ideally for the past few versions)?
When I try to read http://ctan.math.washington.edu/tex-archive/macros/latex/contrib/multibibliography/tug-paper.pdf then I get a panic:
thread 'main' panicked at 'Stream Length should be an integer.', libcore/option.rs:989:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
Thanks for lopdf. Does it allow adding password protection?
The current decompression functionality requires mutating the document. It would be nice to have an API that supports decompression without mutation, for consumers that just want to read the document.
Another panic for http://ctan.math.washington.edu/tex-archive/macros/latex/contrib/bg/description.pdf
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: FromUtf8Error { bytes: [139], error: Utf8Error { valid_up_to: 0, error_len: Some(1) } }', libcore/result.rs:945:5
I use tectonic to generate a PDF from LaTeX.
The replace_text method does not seem to work on that generated document.
I recreated the bug here: https://github.com/J-F-Liu/lopdf/compare/master...efx:text-not-replaced?expand=1
If you clone that repo and pull in the branch, just run:
cargo run --example replace_text
I am new to Rust but can help fix this with some guidance.
Hi,
I want to merge pages from two PDF files into one. I already have such a tool that I wrote in Go, but I want to rewrite it in Rust using lopdf.
This is the state of where I am: https://gist.github.com/bn3t/1508f3526bc4ca894f818182bf23e602. It tries to copy page 1 of the input document to the doc document but still produces a white page.
Would you be so kind as to indicate the steps needed to achieve this?
Trying to load http://www.adobe.com/content/dam/Adobe/en/devnet/font/pdfs/5014.CIDFont_Spec.pdf gives Not a valid PDF file (xref_and_trailer).
Hi, I made a completely new project and copied the example code.
I had to switch to Rust nightly because #![feature(field_init_shorthand)] is not allowed on stable. I compiled and ran the get started guide. What I got was a corrupted PDF (I'll attach it if I can).
I do not know why the PDF is corrupted, sadly. It would be nice if someone could look into it.
I had a quick look but couldn't find anything related to password protection. Does the library support it?
While working on tests for rsvg-convert (a tool that's built with librsvg), I found several issues with the Object::as_datetime() implementation. Maybe I'm just using this incorrectly or it's some other misunderstanding, but I would like to raise the issues here.
The implementation uses chrono::Local.datetime_from_str(). This method will return an error (ParseError::Impossible) if the timezone offset in the string doesn't match the local timezone offset. So while this may work for PDF files that are created in the local timezone, it is likely going to fail quite often.
The chrono crate offers another method which is DateTime::parse_from_str():
https://docs.rs/chrono/0.3.0/chrono/datetime/struct.DateTime.html#method.parse_from_str
This method seems more appropriate as it can handle different timezone offsets.
In my opinion it would also make sense to consider changing the return value of Object::as_datetime() to return a DateTime in the UTC offset instead of using Local.
I've run into problems because the PDF I tested did not specify the minutes of the timezone offset. So the CreationDate string looked like this:
D:20200211085039+00'
Instead of the proper
D:20200211085039+00'00'
While this was due to a bug in the library that created the PDF, it still seems valid according to the PDF spec. According to the spec all fields after the year are optional. However the code in Object::as_datetime() will raise ParseError(TooShort) unless the complete datetime string is given. So I think the parser should be changed to deal gracefully with the optional fields missing.
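A minimal sketch of what a more tolerant parser could look like, using only the standard library. The PdfDate struct and the parse_pdf_date and take_digits functions are hypothetical names invented for illustration; they are not part of lopdf or chrono. The sketch assumes the defaults of month 1, day 1 and zero for the remaining fields, and it does no range validation:

```rust
/// Hypothetical container for a parsed PDF date (not a lopdf or chrono type).
#[derive(Debug, PartialEq)]
pub struct PdfDate {
    pub year: u32,
    pub month: u32,
    pub day: u32,
    pub hour: u32,
    pub minute: u32,
    pub second: u32,
    /// Offset from UTC in minutes; 0 when the offset is absent or `Z`.
    pub utc_offset_minutes: i32,
}

/// Read exactly `len` ASCII digits at `*pos`, advancing `*pos` only on success.
fn take_digits(s: &[u8], pos: &mut usize, len: usize) -> Option<u32> {
    let end = pos.checked_add(len)?;
    let chunk = s.get(*pos..end)?;
    if !chunk.iter().all(|b| b.is_ascii_digit()) {
        return None;
    }
    *pos = end;
    Some(chunk.iter().fold(0, |acc, b| acc * 10 + u32::from(b - b'0')))
}

/// Parse `D:YYYYMMDDHHmmSSOHH'mm'` where everything after the year is optional.
pub fn parse_pdf_date(input: &str) -> Option<PdfDate> {
    let s = input.strip_prefix("D:").unwrap_or(input).as_bytes();
    let mut pos = 0;
    // Only the year is mandatory.
    let year = take_digits(s, &mut pos, 4)?;
    // A missing field falls back to its default instead of failing.
    let month = take_digits(s, &mut pos, 2).unwrap_or(1);
    let day = take_digits(s, &mut pos, 2).unwrap_or(1);
    let hour = take_digits(s, &mut pos, 2).unwrap_or(0);
    let minute = take_digits(s, &mut pos, 2).unwrap_or(0);
    let second = take_digits(s, &mut pos, 2).unwrap_or(0);

    let sign = match s.get(pos) {
        Some(b'+') => 1,
        Some(b'-') => -1,
        _ => 0, // `Z`, or no offset given
    };
    let mut utc_offset_minutes = 0;
    if sign != 0 {
        pos += 1;
        let off_h = take_digits(s, &mut pos, 2).unwrap_or(0);
        // The apostrophe between hours and minutes may be present or not.
        if s.get(pos) == Some(&b'\'') {
            pos += 1;
        }
        // Minutes may be missing entirely, as in `+00'`.
        let off_m = take_digits(s, &mut pos, 2).unwrap_or(0);
        utc_offset_minutes = sign * (off_h * 60 + off_m) as i32;
    }
    Some(PdfDate { year, month, day, hour, minute, second, utc_offset_minutes })
}
```

With this sketch, parse_pdf_date("D:20200211085039+00'") succeeds with an offset of 0 minutes, and parse_pdf_date("D:2020") yields 1 January 2020 at midnight. Converting the result into a chrono DateTime is left out on purpose.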
When I read in a large pdf (34 pages), load a page and then try and iterate over the contents I am getting an empty array
Content { operations: [Operation { operator: "x", operands: [] }] }
Could you shed some light on why this may be happening?
#[macro_use]
extern crate lopdf;
use lopdf::content::{Content, Operation};
use lopdf::{Document, Object, Stream};
fn main() {
let mut doc = Document::with_version("1.5");
let pages_id = doc.new_object_id();
let font_id = doc.add_object(dictionary! {
"Type" => "Font",
"Subtype" => "Type1",
"BaseFont" => "Courier",
});
let resources_id = doc.add_object(dictionary! {
"Font" => dictionary! {
"F1" => font_id,
},
});
let content = Content {
operations: vec![
Operation::new("BT", vec![]),
Operation::new("Tf", vec!["F1".into(), 48.into()]),
Operation::new("Td", vec![100.into(), 600.into()]),
//change text to unicode (arabic)
Operation::new("Tj", vec![Object::string_literal("مرحبا بالعالم!")]),
Operation::new("ET", vec![]),
],
};
let content_id = doc.add_object(Stream::new(dictionary! {}, content.encode().unwrap()));
let page_id = doc.add_object(dictionary! {
"Type" => "Page",
"Parent" => pages_id,
"Contents" => content_id,
});
let pages = dictionary! {
"Type" => "Pages",
"Kids" => vec![page_id.into()],
"Count" => 1,
"Resources" => resources_id,
"MediaBox" => vec![0.into(), 0.into(), 595.into(), 842.into()],
};
doc.objects.insert(pages_id, Object::Dictionary(pages));
let catalog_id = doc.add_object(dictionary! {
"Type" => "Catalog",
"Pages" => pages_id,
});
doc.trailer.set("Root", catalog_id);
doc.compress();
doc.save("example.pdf").unwrap();
}
and the result is
<img width="1029" alt="Screen Shot 2019-11-16 at 1 36 12 PM" src="https://user-images.githubusercontent.com/169691/68992001-40876680-0876-11ea-8c05-a1f20fbd824a.png">
target/debug/pdfutil print_streams -i "Lifetimes I and II - v0.9.1.pdf"
Open Lifetimes I and II - v0.9.1.pdf
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 0).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 0 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 0).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 0 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 0).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 0 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 0).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 0 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 0).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 0 }") }
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 0).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 0 }") }
This would be more robust. Output can be controlled by users of lopdf crate or by users of crates which use lopdf.
There are nice println!-like macros:
Printing to stdout can hinder the use of lopdf in command-line utilities. For example, someone might write a program that prints the number of pages to stdout and then use it in a bash script (or as input to another CLI program) that does something with that number. This would hardly work for PDF files that trigger a lopdf warning.
It also looks nicer in code: println!("Warning: {}", err) -> warn!("{}", err)
Or at least print to stderr using eprintln!, or use conditional compilation and add a feature to turn off printing to stdout.
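To illustrate the stderr variant without pulling in a logging crate, here is a small standard-library-only sketch. The function names warn_to and warn are made up for this example and do not exist in lopdf; the point is only that the warning goes to a sink the caller chooses, with stderr as the default:

```rust
use std::fmt::Display;
use std::io::{self, Write};

/// Hypothetical helper: write a warning to a caller-chosen sink.
pub fn warn_to<W: Write>(sink: &mut W, err: &dyn Display) -> io::Result<()> {
    writeln!(sink, "Warning: {}", err)
}

/// Hypothetical default: send the warning to stderr so stdout stays clean.
pub fn warn(err: &dyn Display) {
    // A failed write to stderr is ignored rather than turned into a panic.
    let _ = warn_to(&mut io::stderr(), err);
}
```

A command-line tool built this way could still pipe its stdout into another program, and a test can pass a Vec<u8> as the sink to inspect what was written.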
The Filter entry for streams can be an array of filters. i.e. /Filter [/FlateDecode] vs /Filter /FlateDecode
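A rough sketch of how both forms could be normalized into one list before decoding. The Obj enum below is a deliberately tiny stand-in defined just for this example; it is not lopdf's Object type, and filter_names is a hypothetical helper:

```rust
/// Minimal stand-in for a PDF object (hypothetical, not lopdf's `Object`).
#[derive(Debug, Clone, PartialEq)]
pub enum Obj {
    Name(String),
    Array(Vec<Obj>),
    Null,
}

/// Return the filter names in order, whether /Filter is a single name
/// or an array of names. A missing or unexpected value gives an empty list.
pub fn filter_names(filter: Option<&Obj>) -> Vec<String> {
    match filter {
        Some(Obj::Name(name)) => vec![name.clone()],
        Some(Obj::Array(items)) => items
            .iter()
            .filter_map(|item| match item {
                Obj::Name(name) => Some(name.clone()),
                _ => None, // non-name entries are skipped in this sketch
            })
            .collect(),
        _ => Vec::new(),
    }
}
```

With this, /Filter /FlateDecode and /Filter [/FlateDecode] both produce a one-element list, so the decoding code only has to handle one shape.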
A name tree, according to the PDF 32000-1:2008 specification (7.9.6 Name Trees), is like a dictionary but it may be arbitrarily large, the keys are strings (not name objects) and are ordered, and there are various criteria on the values.
e.g. (from the spec):
<</Limits [(Xenon) (Zirconium)]
/Names [(Xenon) 129 0 R
(Ytterbium) 130 0 R
(Yttrium) 131 0 R
(Zinc) 132 0 R
(Zirconium) 133 0 R
]
>>
A number tree, according to the PDF 32000-1:2008 specification (7.9.7 Number Trees), is 'similar to a name tree' except the keys are integers, sorted in ascending numerical order.
/Nums [ 0 << /S /r >>
4 << /S /D >>
7 << /S /D
/P (A-)
/St 8
>>
]
They'll probably need an extension to the Object enum.
Should the library support this?
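For discussion, a sketch of what an exact-key lookup in a number tree might look like once the nodes are in memory. NumberTreeNode is a hypothetical type invented here, not an existing or proposed lopdf API; it assumes leaf entries are sorted by ascending key and that each child carries its /Limits pair:

```rust
/// Hypothetical in-memory model of a number tree node
/// (PDF 32000-1:2008, 7.9.7). Not part of lopdf.
pub enum NumberTreeNode<V> {
    /// Leaf node: the /Nums array as (key, value) pairs, ascending by key.
    Leaf(Vec<(i64, V)>),
    /// Intermediate node: each kid with its /Limits [least greatest].
    Kids(Vec<((i64, i64), NumberTreeNode<V>)>),
}

impl<V> NumberTreeNode<V> {
    /// Look up the value stored under exactly `key`, if any.
    pub fn get(&self, key: i64) -> Option<&V> {
        match self {
            // Keys are sorted, so a binary search is enough in a leaf.
            NumberTreeNode::Leaf(pairs) => pairs
                .binary_search_by_key(&key, |(k, _)| *k)
                .ok()
                .map(|i| &pairs[i].1),
            // Descend only into the kid whose limits contain the key.
            NumberTreeNode::Kids(kids) => kids
                .iter()
                .find(|((lo, hi), _)| *lo <= key && key <= *hi)
                .and_then(|(_, child)| child.get(key)),
        }
    }
}
```

A name tree could follow the same shape with byte-string keys instead of i64. Whether this belongs in the Object enum or in a separate helper type is exactly the open design question.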
edit: whoops, accidentally submitted the issue part way through writing it.
I've done a bit of comparison benchmarking for extracting URLs from PDF files.
Test file: PDF 1.7 specification https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
(This file is encrypted with an empty password, that is another feature I would like to bring to lopdf. My code using lopdf cannot yet unscramble strings.)
Using PyPDF2, the process takes 2.7s while just loading the file takes 43s using lopdf. PyPDF2 has some stability / looping issues that are a no-go for me. I would like to improve lopdf performance by a lot.
On load, most of the time is spent in the pom parser in parser.rs.
I see three approaches that would be workable:
From a design perspective, what would be the approaches that make the most sense to you ? I have some time to spend on improving lopdf over the coming weeks.
I recently started fuzzing this crate and have found a few crashes. The first two issues I've found are a stack overflow and a panic from subtraction overflow; examples are attached. As I continue to find new issues I'll add them as comments on this issue.
stackoverflow.pdf
subtractoverflow.pdf
The test I'm using to open these files and cause the crash is simply:
use lopdf::Document;
#[test]
fn f1(){
let _ = Document::load("stackoverflow.pdf");
}
Stream content (the Vec<u8>) does not get written if it is inside another lopdf::Object.
The length gets calculated and written (but the length is wrong, maybe it overflows a certain buffer?), but the following stream corrupts the PDF document and doesn't get handled correctly.
Example - this works correctly (stream as reference):
use lopdf::{Document as LoDocument, Dictionary as LoDictionary, Object as LoObject, Stream as LoStream};
use lopdf::Object::*;
let problematic_text = "<xml>test</xml>".to_string();
let mut doc = LoDocument::new();
let stream = Stream(LoStream::new(LoDictionary::from_iter(vec![
("Type", "Metadata".into()),
("Subtype", "XML".into()), ]),
problematic_text.as_bytes().to_vec() ));
let catalog = Dictionary(LoDictionary::from_iter(vec![
("Metadata", Reference(doc.add_object(stream)) )]));
doc.add_object(catalog);
doc.save("working.pdf");
The above works and the length gets calculated correctly.
Now let's try putting the stream into the catalog as a direct object instead of a reference:
let mut doc = LoDocument::new();
let catalog = Dictionary(LoDictionary::from_iter(vec![
("Metadata", Stream(LoStream::new(
LoDictionary::from_iter(vec![
("Type", "Metadata".into()),
("Subtype", "XML".into()), ]),
problematic_text.as_bytes().to_vec()
))
)]));
This will corrupt the PDF (something with the / and \ is off). It will not create a string, and it calculates a wrong "Length" for the stream.
I am still working out why this happens. It's merely inconvenient, but I can't explain why the content doesn't get written.
For reference, here is the full context in which I discovered the issue: https://github.com/sharazam/printpdf/blob/master/src/api/types/pdf_document.rs#L156-L184
It's a bit too big to put it directly in an issue (the repository compiles). xmp_metadata is basically this file with the necessary fields filled out (yes, I know, UTF-8 weirdness). The important part is Line 156. If you copy-paste the stream inside there, the PDF will get written without error, but it will be corrupt.
Hi there, how to use lopdf to convert a pdf document to markdown format?
Example pdf file: bi.pdf
Content stream contains:
100 0 0 100 0 0 cm
BI /W 4 /H 4 /CS /RGB /BPC 8
ID
00000z0z00zzz00z0zzz0zzzEI aazazaazzzaazazzzazzz
EI
There is chapter 4.8.6 about inline images in pdf reference.
extern crate lopdf;
fn main()
{
let doc = lopdf::Document::load("bi.pdf").unwrap();
let cont = doc.get_and_decode_page_content(doc.get_pages()[&1]);
println!("{:#?}", cont);
}
Ok(
Content {
operations: [
Operation {
operator: "cm",
operands: [
100,
0,
0,
100,
0,
0,
],
},
Operation {
operator: "BI",
operands: [],
},
Operation {
operator: "ID",
operands: [
/W,
4,
/H,
4,
/CS,
/RGB,
/BPC,
8,
],
},
Operation {
operator: "z",
operands: [
0,
],
},
Operation {
operator: "z",
operands: [
0,
],
},
Operation {
operator: "zzz",
operands: [
0,
],
},
Operation {
operator: "z",
operands: [
0,
],
},
Operation {
operator: "zzz",
operands: [
0,
],
},
Operation {
operator: "zzzEI",
operands: [
0,
],
},
Operation {
operator: "aazazaazzzaazazzzazzz",
operands: [],
},
Operation {
operator: "EI",
operands: [],
},
],
},
)
To handle this properly, the parser needs to calculate the size of the decoded image data from parameters such as width, height, bits per component and color space, and decode it using the filters (note the "EI " byte sequence in the middle of the image data; any byte sequence can occur there). Unfortunately there is no required "Length" key that could be used to skip the stream data as in normal PDF streams.
This also affects other lopdf functionality that depends on content decoding, such as text extraction. For example, there can be a false-positive "Tj" inside an image. In some circumstances lopdf could also return an error, for instance when a byte sequence in the image data is not a valid UTF-8 string.
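As a rough illustration of the size calculation for the unfiltered case, here is a standard-library-only sketch. The function names components and inline_image_len are hypothetical, and the sketch only covers the device color spaces and their inline abbreviations; indexed or other color spaces and any filters would need extra handling. It assumes each image row is padded to a whole number of bytes:

```rust
/// Number of color components for the color spaces handled in this sketch.
/// Anything else (for example an indexed color space) returns None.
pub fn components(color_space: &str) -> Option<u64> {
    match color_space {
        "G" | "DeviceGray" => Some(1),
        "RGB" | "DeviceRGB" => Some(3),
        "CMYK" | "DeviceCMYK" => Some(4),
        _ => None,
    }
}

/// Expected byte length of unfiltered inline image data.
/// Returns None for an unknown color space or on arithmetic overflow.
pub fn inline_image_len(width: u64, height: u64, bpc: u64, color_space: &str) -> Option<u64> {
    let bits_per_row = width
        .checked_mul(components(color_space)?)?
        .checked_mul(bpc)?;
    // Round each row up to a whole byte.
    let bytes_per_row = bits_per_row.checked_add(7)? / 8;
    bytes_per_row.checked_mul(height)
}
```

For the parameters in the example above (/W 4 /H 4 /CS /RGB /BPC 8) this gives 48 bytes, so a parser could skip that many bytes after ID before looking for EI instead of scanning for the first "EI" it sees.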
I tried the below:
use lopdf::{Document, Object, Stream};
fn main() {
println!("Hello, world!");
let doc = Document::load("HasanResume.pdf").unwrap();
println!("version: {:#?}, trailer: {:#?}", &doc.version, &doc.trailer);
let pages = doc.get_pages().iter()
.map(|page| println!("pagenum: {:#?}, pageid: {:#?}",
page.0, page.1)).count();
println!("{}", pages);
}
But got the below:
Finished dev [unoptimized + debuginfo] target(s) in 1.13s
Running `target/debug/read-pdf`
Hello, world!
version: "1.7", trailer: <</Info 92 0 R/Root 1 0 R/Size 93>>
0
let mut doc = Document::load("bbb.pdf").unwrap();
for page_id in doc.page_iter() {
let x = doc.get_and_decode_page_content(page_id).unwrap();
let y = x.operations;
for i in y.iter() {
if i.operator == "Tj".to_string() {
let i2 = &i.operands[0];
println!("{:?}", i2);
} else {
// println!("{:?}", i.operator);
}
}
}
The printed values are a pile of garbled characters. How can I convert them to normal characters? The output looks like this:
(�N)
(�N)
(�Q)
(��)
(�Y)
(�Q)
(�T)
(�N)
(�F)
(��)
(��)
(��)
(�_)
Caradoc is a parser and validator of PDF files written in OCaml. Caradoc provides many commands to analyze PDFs, as well as an interactive console user interface. They have an interesting set of PDF files, which could make good test cases for you.
The test files are here: https://github.com/caradoc-org/caradoc/tree/master/test_files
See more information in their presentation:
Is it currently possible to edit running content, aka footer and header?
I can't find any suitable method in the documentation.
If it's already possible, maybe you could provide a convenience method for it?
In this pdf http://unpan1.un.org/intradoc/groups/public/documents/NISPAcee/UNPAN004710.pdf I get a TJ operator with no operands because the operands are in the previous object.
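If the page's Contents entry is an array of streams, the PDF specification treats them as if they were concatenated into a single stream, so an operator can legally sit in a different stream from its operands. A minimal sketch of joining the decoded parts before parsing, assuming that is what happens in this file; concat_content_streams is a hypothetical helper, not a lopdf function, and the newline separator is a defensive choice of this sketch rather than something the specification requires:

```rust
/// Join the decoded bytes of every stream in a page's /Contents array
/// so that the result can be parsed as one content stream.
pub fn concat_content_streams(parts: &[Vec<u8>]) -> Vec<u8> {
    let capacity = parts.iter().map(|p| p.len() + 1).sum();
    let mut joined = Vec::with_capacity(capacity);
    for part in parts {
        joined.extend_from_slice(part);
        // Keep tokens from adjacent streams apart.
        joined.push(b'\n');
    }
    joined
}
```

Parsing the joined bytes once, instead of parsing each stream on its own, would keep the TJ operator together with the array that precedes it.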
Loading https://github.com/isocpp/CppCoreGuidelines/raw/master/docs/Lifetimes%20I%20and%20II%20-%20v0.9.1.pdf gives an empty result from get_pages(). Perhaps it has to do with object streams?
In my project, I need to split a pdf file by the outlines. Is it possible to do it with lopdf and how if so? Thank you very much!
Trying to parse this file crashes with a Stream Length should be an integer error. It seems to be caused by the figure included using the LaTeX svg package.
Here's the full stack trace:
thread 'main' panicked at 'Stream Length should be an integer.', libcore/option.rs:914:5
stack backtrace:
0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
1: std::sys_common::backtrace::print
at libstd/sys_common/backtrace.rs:71
at libstd/sys_common/backtrace.rs:59
2: std::panicking::default_hook::{{closure}}
at libstd/panicking.rs:205
3: std::panicking::default_hook
at libstd/panicking.rs:221
4: <std::panicking::begin_panic::PanicPayload<A> as core::panic::BoxMeUp>::get
at libstd/panicking.rs:457
5: std::panicking::try::do_call
at libstd/panicking.rs:344
6: std::panicking::try::do_call
at libstd/panicking.rs:322
7: <&'a T as core::fmt::Display>::fmt
at libcore/panicking.rs:71
8: core::ptr::drop_in_place
at libcore/option.rs:914
9: alloc::raw_vec::alloc_guard
at /Users/travis/build/rust-lang/rust/src/libcore/option.rs:302
10: lopdf::parser::stream::{{closure}}
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/lopdf-0.15.1/src/parser.rs:114
11: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:501
12: lopdf::encodings::bytes_to_string::{{closure}}
at /Users/travis/build/rust-lang/rust/src/libcore/result.rs:621
13: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:501
14: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
15: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:34
16: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
17: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:520
18: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
19: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:516
20: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
21: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:469
22: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
23: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:453
24: lopdf::encodings::bytes_to_string::{{closure}}
at /Users/travis/build/rust-lang/rust/src/libcore/result.rs:621
25: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:453
26: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
27: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:469
28: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
29: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:469
30: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
31: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:469
32: <pdf_word_count::WordCount as core::default::Default>::default
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/pom-1.1.0/src/parser.rs:23
33: lopdf::reader::Reader::read_object
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/lopdf-0.15.1/src/reader.rs:139
34: <std::collections::hash::map::RandomState as core::hash::BuildHasher>::build_hasher
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/lopdf-0.15.1/src/reader.rs:91
35: lopdf::reader::<impl lopdf::document::Document>::load_internal
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/lopdf-0.15.1/src/reader.rs:38
36: lopdf::reader::<impl lopdf::document::Document>::load_from
at /Users/ek/.cargo/registry/src/github.com-1ecc6299db9ec823/lopdf-0.15.1/src/reader.rs:26
37: pdf_word_count::Collector::process_document
at ./src/lib.rs:27
38: pdf_wc::main
at src/main.rs:23
39: std::rt::lang_start::{{closure}}
at /Users/travis/build/rust-lang/rust/src/libstd/rt.rs:74
40: std::panicking::try::do_call
at libstd/rt.rs:59
at libstd/panicking.rs:304
41: panic_unwind::dwarf::eh::read_encoded_pointer
at libpanic_unwind/lib.rs:105
42: std::sys_common::cleanup
at libstd/panicking.rs:283
at libstd/panic.rs:361
at libstd/rt.rs:58
43: std::rt::lang_start
at /Users/travis/build/rust-lang/rust/src/libstd/rt.rs:74
44: pdf_wc::main
I can't publish my own library if it depends on a github repository (required by cargo, because this repo could be deleted). Please publish a new version. Thanks.