librepdf / openpdf

OpenPDF is a free Java library for creating and editing PDF files, with an LGPL and MPL open source license. OpenPDF is based on a fork of iText. We welcome contributions from other developers. Please feel free to submit pull requests and bug reports to this GitHub repository.

License: Other

Java 99.79% HTML 0.18% Shell 0.03%
hacktoberfest itext java openpdf pdf pdf-generation

openpdf's People

Contributors

albfernandez, andreasrosdal, asturio, bengolder, brooklyn-0, bsanchezb, chappyer, codecracker2014, daviddurand, dependabot[bot], epic0000, fellmann, kindrat, lapo-luchini, lonzak, mluppi, mrestivill, noavarice, obsismc, prashantbhat, roschdahl, rosdal, sixdouglas, syakovyn, tlxtellef, ubermichael, v-f, vk-github18, wugengxian, ymasory


openpdf's Issues

Limited functionality under Google App Engine

Hi, is there a way to get rid of the dependency on java.awt.Color? It's not white listed on GAE, so setting cell colors, etc. will not work. This library is amazing otherwise. I've tried PDFBox and it is years from where this library is.

PAdES signatures support

Nullpointer on Font create

In class BaseFont, line 856:

You always get null if you do not cache the font (cache = false).
The lines before it only put the font into the cache.
You just have to return the variable here.
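
The pattern the report asks for can be sketched as follows. This is a minimal illustration with hypothetical names (FontCacheSketch, createFont, and a plain String standing in for the font object); it is not the actual BaseFont code, only the "build, optionally cache, always return" shape.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the caching logic described above; not the real BaseFont code.
final class FontCacheSketch {
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    // Builds a "font" (a plain String here) and returns it whether or not it is cached.
    static String createFont(String name, boolean cached) {
        if (cached) {
            String hit = CACHE.get(name);
            if (hit != null) {
                return hit;
            }
        }
        String built = "font:" + name; // placeholder for the expensive font construction
        if (cached) {
            CACHE.put(name, built);
        }
        return built; // returned on both paths, so cached == false never yields null
    }

    static boolean isCached(String name) {
        return CACHE.containsKey(name);
    }
}
```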

Broken retro-compatibility

Commit ae40ae2 changed the signature of PdfSignatureAppearance.setCrypto removing some arrays.
Unfortunately the diff can't be seen on GitHub because the file was dropped and re-added, but here's a part of the diff:

--- PdfSignatureAppearance-fdb76b2.java 2018-03-29 18:15:23.827579000 +0200
+++ PdfSignatureAppearance-ae40ae2.java 2018-03-29 18:15:30.675425000 +0200
@@ -247,36 +271,65 @@
     
     /**
      * Sets the cryptographic parameters.
-     * @param privKey the private key
+   * 
-     * @param certChain the certificate chain
+   * @param privKey
+   *          the private key
+   * @param certificate
+   *          the certificate
+   * @param crl
-     * @param crlList the certificate revocation list. It may be <CODE>null</CODE>
+   *          the certificate revocation list. It may be <CODE>null</CODE>
-     * @param filter the crytographic filter type. It can be SELF_SIGNED, VERISIGN_SIGNED or WINCER_SIGNED
+   * @param filter
+   *          the cryptographic filter type. It can be SELF_SIGNED,
+   *          VERISIGN_SIGNED or WINCER_SIGNED
      */    
-    public void setCrypto(PrivateKey privKey, Certificate[] certChain, CRL[] crlList, PdfName filter) {
+  public void setCrypto(PrivateKey privKey, X509Certificate certificate,
+      CRL crl, PdfName filter) {
         this.privKey = privKey;
-        this.certChain = certChain;
+    this.certificate = certificate;
-        this.crlList = crlList;
+    this.crl = crl;
         this.filter = filter;
     }

As far as the change goes I guess it's fine, because the array was never used for anything more than [0] but it breaks binary compatibility with itext-4.2.0 for no real reason.
I would suggest adding a method such as:

  /**
   * Sets the cryptographic parameters.
   * @deprecated use {@link #setCrypto(PrivateKey, X509Certificate, CRL, PdfName)}
   */
  public void setCrypto(PrivateKey privKey, Certificate[] certChain, CRL[] crlList, PdfName filter) {
    setCrypto(privKey, (X509Certificate) certChain[0], crlList != null ? crlList[0] : null, filter);
  }

Tell me if you'd like a PR for that.

NullPointerException due to missing trailer (on bad startxref?)

In 1.0.5, we got a NullPointerException with the following stacktrace while trying to read a PDF:

PdfReader.java:1112 - com.lowagie.text.pdf.PdfReader.readPages
PdfReader.java:622 - com.lowagie.text.pdf.PdfReader.readPdf
PdfReader.java:282 - com.lowagie.text.pdf.PdfReader.<init>
PdfReader.java:295 - com.lowagie.text.pdf.PdfReader.<init>

Based on the line numbers, trailer must be null. Tracing through the execution, this can happen in the following sequence of events:

  1. readPdf calls readXref, which is supposed to set trailer (among other things).
  2. readXref doesn't find a valid startxref and throws an exception, or readXrefSection throws an exception due to an invalid xref.
  3. readPdf catches the exception and calls rebuildXref
  4. That method tries to set the trailer, too, but it can return without actually setting it.
  5. readPdf proceeds to readPages, trailer is unset, and we get an NPE.

I'm not sure what the proper fix would be, though. Should one of the caught exceptions instead bubble out of readPdf? Should rebuildXref set trailer to an empty PdfDictionary if it doesn't find the actual trailer?

Sorry, I'm pretty ignorant about the PDF format in general. This report is just based on working through this exception's execution path.
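
One possible answer to the question above is to fail with a descriptive exception when neither path produces a trailer. The sketch below models that flow with hypothetical names (TrailerGuardSketch, loadTrailer) and a plain Map standing in for the trailer dictionary; it is an assumption about how a fix could look, not the fix the maintainers chose.

```java
import java.io.IOException;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of the readPdf flow described above; names and types are illustrative only.
final class TrailerGuardSketch {
    // Tries the normal xref read first, then the rebuild, and refuses to continue without a trailer.
    static Map<String, Object> loadTrailer(Supplier<Map<String, Object>> readXref,
                                           Supplier<Map<String, Object>> rebuildXref) throws IOException {
        Map<String, Object> trailer;
        try {
            trailer = readXref.get();
        } catch (RuntimeException e) {
            trailer = rebuildXref.get(); // the fallback may return null if it finds no trailer
        }
        if (trailer == null) {
            // Report the damaged file here instead of hitting an NPE later in readPages.
            throw new IOException("PDF trailer not found; the file may be damaged");
        }
        return trailer;
    }
}
```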

DocumentException should be unchecked

DocumentException extends Exception, which makes it a checked exception.

Basically, it's useless to force callers to catch it, and it makes the library harder to use (e.g. in lambda expressions).
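
To make the lambda point concrete, here is a small sketch that contrasts the two styles. CheckedDocException and UncheckedDocException are hypothetical types invented for this example, and a List of strings stands in for a document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception types used only to contrast checked and unchecked behaviour.
final class CheckedVsUncheckedSketch {
    static class CheckedDocException extends Exception {
        CheckedDocException(String message) { super(message); }
    }

    static class UncheckedDocException extends RuntimeException {
        UncheckedDocException(String message) { super(message); }
    }

    static void addChecked(List<String> doc, String text) throws CheckedDocException {
        if (text == null) throw new CheckedDocException("null element");
        doc.add(text);
    }

    static void addUnchecked(List<String> doc, String text) {
        if (text == null) throw new UncheckedDocException("null element");
        doc.add(text);
    }

    // Adds every part twice, once through each variant, to show the difference at the call site.
    static List<String> build(List<String> parts) {
        List<String> doc = new ArrayList<>();
        // Checked variant: the lambda needs its own try/catch to compile.
        parts.forEach(p -> {
            try {
                addChecked(doc, p);
            } catch (CheckedDocException e) {
                throw new IllegalStateException(e);
            }
        });
        // Unchecked variant: a plain lambda is enough.
        parts.forEach(p -> addUnchecked(doc, p));
        return doc;
    }
}
```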

Duplicate entry

Hi,

It looks like XmlDomWriter in the openpdf dependency is the same as XmlDomWriter in the pdf-xml dependency.
Both have the same package name and the same content.
I have a dependency on pdf-html sources in my Android project:
compile 'com.github.librepdf:pdf-html:1.0.1'
When I try to build my project, I get:

Error:Execution failed for task ':touchPoApp:transformClassesWithJarMergingForTstingDebug'.
> com.android.build.api.transform.TransformException: java.util.zip.ZipException: duplicate entry: com/lowagie/text/xml/XmlDomWriter.class

Make Travis use JDK 7.

This includes testing that it works as intended in regards to Java 7 code and dependency compatibility.

Bouncy Castle is not optional

A simple test that only instantiates a PdfReader on an empty PDF requires Bouncy Castle on the classpath. The pom declares it as optional in the manifest.
This problem does not happen with the iText version at ymasory/iText-4.2.0.

Problem signing a PDF with an external signing service (eSign)

Hi,

We want to sign a PDF with an external signature provided by an eSign service. We are using OpenPDF 1.0.1.
The problem is that we are unable to calculate the exclusion size of the signature appearance before preClose.
Please find the code below:

	byte[] signeddata = null;
	PdfSignatureAppearance pdfSigApp=null;
	File destFile=null;
	PdfReader reader=null;
	 ByteArrayOutputStream arrayOutputStream = new ByteArrayOutputStream();
	
	reader = new PdfReader(signingHelper.getSrc());
	destFile = new File(signingHelper.getDest());
   
	OutputStream os = arrayOutputStream;
	
	PdfStamper pdfStamper = PdfStamper.createSignature(reader,os, '\0',null, true);
	pdfSigApp = pdfStamper.getSignatureAppearance();

	

	// Use the existing signature field; pass the name of the field.
	pdfSigApp.setVisibleSignature("SignatureField1[0]");
	SimpleDateFormat dt = new SimpleDateFormat("dd-MMM-yyyy");
	String formatedDate = dt.format(new Date());
	
	pdfSigApp.setLayer2Text("Digitally Signed" + "\nReason: " + signingHelper.getReason()
	+"\nDate: "+formatedDate+"\nLocation: " + signingHelper.getLocation());
	Font font = new Font();
	//font.setColor(Color);

	font.setSize(9);
	pdfSigApp.setLayer2Font(font);
	

	pdfSigApp.setLocation(signingHelper.getLocation());
	// pdfSigApp.set
	PdfSignature sigDic = new PdfSignature(PdfName.ADOBE_PPKMS, PdfName.ADBE_PKCS7_DETACHED);
	
	sigDic.put(PdfName.FT, PdfName.SIG);
	sigDic.setReason(signingHelper.getReason());
	sigDic.setLocation(signingHelper.getLocation());

	
	pdfSigApp.setCryptoDictionary(sigDic);
	------------------------------
	HashMap exclusions = new HashMap();
	
	
	exclusions.put(PdfName.CONTENTS,new Integer(7622));  // Problem: how should this value be calculated?
	

	pdfSigApp.preClose(exclusions);
	LOGGER.info("exclusions:"+exclusions);
	
	String hashPdf = generateSha256HashInHexForPdf(pdfSigApp.getRangeStream());

	// Send this hash to the external service for signing; the response contains the PKCS#7 signature.
			
		signeddata = Base64.getDecoder().decode(pkcs7Signature);
		
		byte out[] = new byte[signeddata.length];
		System.arraycopy(signeddata, 0, out, 0, signeddata.length);
		updates.put(PdfName.CONTENTS, new PdfString(out).setHexWriting(true));
		pdfSigApp.close(updates);
		
		reader.close();
		FileOutputStream fileOutputStream = new FileOutputStream(destFile);
		fileOutputStream.write(arrayOutputStream.toByteArray());
		fileOutputStream.close(); 

Please provide suggestions or help with the code above.

Thanks,
Arjun
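
On the arithmetic behind the reserved value: a PDF hex string needs two characters per byte plus the '<' and '>' delimiters, so a placeholder for n signature bytes takes 2 * n + 2 characters. The sketch below shows that calculation and the matching padding step. SignatureSizeSketch and its methods are hypothetical helpers, and the byte estimate itself is an assumption: it has to be at least as large as the biggest PKCS#7 blob the signing service can return.

```java
// Hypothetical helper for sizing the /Contents placeholder; not part of OpenPDF.
final class SignatureSizeSketch {
    // A hex string of n bytes occupies 2 * n characters plus the '<' and '>' delimiters.
    static int reservedLength(int estimatedSignatureBytes) {
        return estimatedSignatureBytes * 2 + 2;
    }

    // Copies the signature into a zero-filled buffer of the estimated size so it fills the reserved space.
    static byte[] pad(byte[] signature, int estimatedSignatureBytes) {
        if (signature.length > estimatedSignatureBytes) {
            throw new IllegalArgumentException("signature is " + signature.length
                    + " bytes but only " + estimatedSignatureBytes + " were reserved");
        }
        byte[] padded = new byte[estimatedSignatureBytes];
        System.arraycopy(signature, 0, padded, 0, signature.length);
        return padded;
    }
}
```

Read this way, the hard-coded 7622 in the post corresponds to an estimate of 3810 signature bytes, and an estimate of 8192 bytes would reserve 16386 characters.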

Recent changes to PdfArray broke the Kids field

Recent changes in PdfArray.java broke PDF creation. PdfPages.writePageTree creates an empty "Kids" field, and other parts of the code may be affected too. Reverting getArrayList (and getElements) to return the internal list fixes the issue.
Pull request #80 (partially) addresses the same issue.
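
A likely mechanism, stated here as an assumption rather than a confirmed diagnosis, is that some callers add elements to the list they get back, so a getter that returns a copy silently drops those additions. The sketch below models only that difference; ArrayAccessSketch is a hypothetical class, not PdfArray.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical array wrapper; it only models the copy-versus-live-list difference described above.
final class ArrayAccessSketch {
    private final List<String> elements = new ArrayList<>();

    // Returns the internal list: additions made by the caller are stored in this object.
    List<String> liveList() {
        return elements;
    }

    // Returns a snapshot: additions made by the caller are silently lost.
    List<String> copiedList() {
        return new ArrayList<>(elements);
    }

    int size() {
        return elements.size();
    }
}
```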

Compilation error on Java 7

The master branch of OpenPDF doesn't compile with Java 7. If Java 8 now is a requirement, then it would be nice if the README could be updated to show that OpenPDF now requires Java 8.

This is the compilation error I get with Java 7:

[ERROR] /C:/OpenPDF-master/pdf-html/src/test/java/com/lowagie/text/html/simpleparser/FactoryPropertiesTest.java:[24,19] cannot access java.util.stream.Stream
class file for java.util.stream.Stream not found
[INFO] 1 error
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] OpenPDF - Free and Open PDF ........................ SUCCESS [ 1.030 s]
[INFO] openpdf ............................................ SUCCESS [ 31.981 s]
[INFO] pdf-xml ............................................ SUCCESS [ 2.155 s]
[INFO] pdf-rtf ............................................ SUCCESS [ 7.805 s]
[INFO] pdf-html ........................................... FAILURE [ 0.687 s]
[INFO] pdf-swing .......................................... SKIPPED
[INFO] pdf-toolbox ........................................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 45.628 s
[INFO] Finished at: 2017-10-24T12:38:45+02:00
[INFO] Final Memory: 41M/613M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.2:testCompile (default-testCompile) on project pdf-html: Compilation failure
[ERROR] /C:/OpenPDF-master/pdf-html/src/test/java/com/lowagie/text/html/simpleparser/FactoryPropertiesTest.java:[24,19] cannot access java.util.stream.Stream
[ERROR] class file for java.util.stream.Stream not found

Improper handling of line-height by HTML to PDF parser

Hi,

I've prepared the test example showing an issue: https://gist.github.com/syakovyn/6ead2da4f00716b25a4803e36b64bb90

The attached PDF (test.pdf) shows paragraphs with a workaround that fixes the paragraph leading, alongside paragraphs without the fix.

The issue is in com.lowagie.text.html.simpleparser.FactoryProperties#insertStyle(java.util.HashMap, com.lowagie.text.html.simpleparser.ChainedProperties), which doesn't account for the case where line-height is a number, e.g. "line-height:1.3".

Serhiy
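
For reference, a unitless line-height in CSS is a multiplier of the font size, so "1.3" on a 10pt font means a leading of 13pt. A minimal sketch of a parser that covers that case follows. LineHeightSketch is hypothetical and handles only bare numbers, percentages and pt values; it is not the FactoryProperties implementation.

```java
import java.util.Locale;

// Hypothetical parser for a CSS line-height value; not the FactoryProperties implementation.
final class LineHeightSketch {
    // Returns the leading in points for the given font size, or -1 if the value is not recognised.
    static float leading(String lineHeight, float fontSizePt) {
        String v = lineHeight.trim().toLowerCase(Locale.ROOT);
        try {
            if (v.endsWith("%")) {
                return fontSizePt * Float.parseFloat(v.substring(0, v.length() - 1)) / 100f;
            }
            if (v.endsWith("pt")) {
                return Float.parseFloat(v.substring(0, v.length() - 2));
            }
            // A bare number such as "1.3" is a multiplier of the font size.
            return fontSizePt * Float.parseFloat(v);
        } catch (NumberFormatException e) {
            return -1f; // keywords such as "normal" and other units are left to the caller
        }
    }
}
```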

Manual is missing

I want to evaluate this fork, but I miss a manual or simple demo applications. I tried searching for iText tutorials, but most of them do not compile with OpenPDF. So I suggest that demos or a wiki would be nice for newcomers.

Btw is there some chat for a discussion with OpenPDF developers?

Unicode Characters

Hi everyone,

I'm new to your repo. I'm trying to print a special Unicode character, \u25b2, which would be a triangle in the Times New Roman font.

Unfortunately, it is ignored when looking at the PDF:
private static final Font NORMALFONT = new Font(Font.TIMES_ROMAN, 7, Font.NORMAL, Color.black);
[image]

And the result:

[image]
There should be a triangle before every number.

Am I doing something wrong here?

New behaviour of how text is extracted from a page

Hi everyone

I just updated from the original iText 4.2.0 (https://github.com/ymasory/iText-4.2.0) to your OpenPDF 1.0.5. So far it works fine, but I noticed a change in how text is extracted from a PDF.
With the previous version, PdfTextExtractor.getTextFromPage(i) returned the text as "plain text"; now every word is surrounded by markup tags.

For example:
before:
Hello

after:
<br class='t-pdf' /><span class="t-word" style="bottom: 81.79%; left: 56.18%; width: 17.45%; height: 0.83%;" id="word7">Hello</span>

I found out that this change was introduced by the following fork, specifically by the following commit:
kulatamicuda/iText-4.2.0@7d7c218#diff-b2e0f949a7f5d2e581f63cedf5f30922

Is there a way to get the old behaviour without using the old "SimpleTextExtractingPdfContentRenderListener" class? I don't want to integrate old code because of maintainability...

Thanks in advance!
M.T.

P.S.: I know this change was made in another repository, but the original repository has not been updated for at least 3 years...

Possibility to customize "producer"-Flag in PDF-Metadata

I would like to change the producer-flag in the meta-data of a generated PDF-file.
Which way of doing that would you prefer?

  • Exposing an API for metadata manipulation?
  • A protected method for metadata processing that can be overridden in a custom class?
  • Any other option?

Please let me know which way you would prefer from an architectural perspective.
I can then try to implement it and create a pull request.

Thanks in advance!

Issue with subsetting on OTF/CFF fonts

I use OpenPDF in flying saucer to generate PDFs from HTML and I've run into a problem that I cannot use fonts such as NotoSansCJKjp (an OTF/CFF font) because CFF font subsetting does not work correctly.

The subsequent PDF output is broken in Acrobat, stating that the embedded font cannot be extracted. It does work in other readers, but I believe that is because they are more lenient than Acrobat on this issue, but unfortunately using another reader is not an option.

I created a fork of OpenPDF and turned off font subsetting entirely, and the output works fine, but obviously this is not a real solution either: these fonts can be quite large, and this particular font results in a minimum PDF size of 12MB, so the problem compounds when using multiple fonts.

I inspected the PDF output with PDFBox's preflight and it errors with "Font DICT invalid without "Private" entry", which does indeed seem to point again to the subsetting being broken, not including a private section in each font dict, which would explain why Acrobat is falling over as well.

I did my best to try and fix this myself, but I've not made much headway so I thought I would reach out to the community and see if anyone has the necessary experience with CFF font subsetting in order to fix this issue.

Thanks

OpenPDF makes no distinctions between reading password vs editing password

I have a PDF that is password protected for editing (PDF/A compatible PDF), but can be read without password. If you open Acrobat Reader, go to File -> Properties -> Security -> Show Details.., you can see that there are actually two passwords possible and only for editing it is enabled. Acrobat can even force the PDF to be editable without password, losing the PDF/A compatibility in the process. So either way, a password protected PDF should be viewable in OpenPDF.

OpenPDF detects that the document is encrypted, but since I don't have a password it fails the following check:

public final boolean isOpenedWithFullPermissions() {
  return !encrypted || ownerPasswordUsed;
}

I can open the PDF in other readers just fine as long as I don't enable editing mode. In OpenPDF I would expect something like the following check instead:

public final boolean isOpenedWithFullPermissions() {
  return !encrypted || ownerPasswordUsed || (!pdfRequiresReadingPassword && readonlyMode);
}

If I force this method to return true, it actually is able to read the PDF without issues (this is my current workaround, unfortunately).

Update README

  • Update links to refer to this new repository location (LibrePDF/OpenPDF)
  • Document recent changes

PDF Metadata producer is always "OpenPDF 1.0.0-SNAPSHOT"

I have found this code in class "com.lowagie.text.Document" :

private static final String OPENPDF = "OpenPDF";
private static final String RELEASE = "1.0.0-SNAPSHOT";
private static final String OPENPDF_VERSION = OPENPDF + " " + RELEASE;

It would be great if the version were read from a property file that is automatically updated during the Maven build phase through resource filtering.
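
A minimal sketch of that idea, using only java.util.Properties: the class name VersionSketch, the property key "openpdf.version" and the "unknown" fallback are all assumptions made for this example, and a Reader is used so the source can be a classpath resource or a test string.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Hypothetical loader; the property key "openpdf.version" and the fallback value are assumptions.
final class VersionSketch {
    // Builds the producer string from a properties source, falling back to "unknown".
    static String producer(Reader propertiesSource) {
        Properties props = new Properties();
        try {
            props.load(propertiesSource);
        } catch (IOException e) {
            // Fall through to the default below if the resource cannot be read.
        }
        return "OpenPDF " + props.getProperty("openpdf.version", "unknown");
    }
}
```

With Maven resource filtering enabled, the filtered file would contain a line such as openpdf.version=${project.version}, which Maven replaces with the real version at build time.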

Image.getInstance: mono PNG with color ICC profile displays wrong

Although it doesn't make a lot of sense to me, a monochrome PNG (1 component) might have a color ICC profile (3 components). One way to create a file like that is to use GhostScript and ImageMagick:

  • Start with a black-and-white text PDF
  • Use GhostScript to render to color PNG: gs -sDEVICE=png256 -o test.png test.pdf
  • Use ImageMagick to convert to B&W PNG: convert test.png test2.png

Here's such an image. It displays fine in any browser:
test2.png

But when importing it into a PDF with Image.getInstance, it displays incorrectly, because the raster is 1-component but the /ColorSpace is 3-component:
out.pdf

This problem occurs with PNGs, but not with TIFFs (the TiffImage class ignores an ICC if its getNumComponents() doesn't match).
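
The check described for TiffImage can be sketched as a small guard. This example uses the JDK's java.awt.color.ICC_Profile purely for illustration, which is an assumption: OpenPDF may use its own profile class, and IccGuardSketch is a hypothetical name.

```java
import java.awt.color.ICC_Profile;

// Hypothetical guard; it mirrors the component-count comparison the report attributes to TiffImage.
final class IccGuardSketch {
    // Returns the profile only when its component count matches the image raster, otherwise null.
    static ICC_Profile usableProfile(ICC_Profile profile, int imageComponents) {
        if (profile == null || profile.getNumComponents() != imageComponents) {
            return null;
        }
        return profile;
    }
}
```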

FontAwesome icons bundled in openpdf

Hi, I got this idea from Vaadin:

Could FontAwesome icons also be added to openpdf in a similar way?

https://github.com/vaadin/framework/blob/7.7/scripts/generateFontAwesomeEnum.sh
https://github.com/vaadin/framework/blob/7.7/server/src/main/java/com/vaadin/server/FontAwesome.java
https://github.com/vaadin/framework/blob/7.7/server/src/main/java/com/vaadin/server/FontAwesome.java#L773
https://github.com/vaadin/framework/blob/7.7/server/src/main/java/com/vaadin/server/GenericFontIcon.java#L94

So if openpdf supports HTML evaluation, this should be doable?

Some years ago I used the commercial one (iText) with XHTML/CSS pipelines, but I don't know exactly how HTML is converted to PDF with openpdf, or whether it is possible at all.

ref: http://fontawesome.io/

Unable to add Group3/Group4 TIFFs into Version 1.2.0

Hello, when trying to add a Group3 or Group4 TIFF image into a PDF in release 1.2.0, there is an exception thrown by the underlying sanselan library for "unknown compression":

ExceptionConverter: org.apache.sanselan.ImageReadException: Tiff: unknown compression: 4

	at org.apache.sanselan.formats.tiff.datareaders.DataReader.decompress(DataReader.java:135)
	at org.apache.sanselan.formats.tiff.datareaders.DataReaderStrips.readImageData(DataReaderStrips.java:96)
	at org.apache.sanselan.formats.tiff.TiffImageParser.getBufferedImage(TiffImageParser.java:505)
	at org.apache.sanselan.formats.tiff.TiffDirectory.getTiffImage(TiffDirectory.java:163)
	at org.apache.sanselan.formats.tiff.TiffImageParser.getBufferedImage(TiffImageParser.java:441)
	at com.lowagie.text.ImageLoader.getTiffImage(ImageLoader.java:163)
	at com.lowagie.text.Image.getInstance(Image.java:363)

This was working previously in version 1.0.5, I presume because the method by which TIFFs were read has changed.

My initial research indicates that Group3/Group4 support was added in some later version of the sanselan/commons-imaging project, but I am unsure how stable those releases are.

Provide CSS resolver

If an HTML document contains <style> tags in the head, the HTMLWorker cannot parse them; instead, it renders the style tags into the PDF file. Can we support inline styles like iText5+?


        OutputStream outputStream = new FileOutputStream(optionalPath);        
        Document document = new Document(PageSize.A4, 30, 30, 30, 30);
        PdfWriter w = PdfWriter.getInstance(document, outputStream);
        HTMLWorker worker = new HTMLWorker(document);
        document.open();
        worker.parse(new StringReader(HTMLUtil.getLongContent()));

        worker.close();
        document.close();
        w.close();

Unable to parse HTML table with whitespace inside it

Document doc1 = new Document();
doc1.open();
HtmlParser.parse(doc1, new StringReader("<table><tr><td>test</td></tr></table>")); // succeeds

Document doc2 = new Document();
doc2.open();
HtmlParser.parse(doc2, new StringReader("<table> <tr><td>test</td></tr></table>")); // fails


The last line throws this exception:

Exception in thread "main" java.lang.ClassCastException: com.lowagie.text.Table cannot be cast to com.lowagie.text.TextElementArray
	at com.lowagie.text.xml.SAXiTextHandler.handleStartingTags(SAXiTextHandler.java:229)
	at com.lowagie.text.html.SAXmyHtmlHandler.startElement(SAXmyHtmlHandler.java:206)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:509)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1359)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2784)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:327)
	at com.lowagie.text.html.HtmlParser.go(HtmlParser.java:85)
	at com.lowagie.text.html.HtmlParser.parse(HtmlParser.java:190)
	at com.example.PDF.main(PDF.java:17)

pom.xml

<dependency>
	<groupId>com.github.librepdf</groupId>
	<artifactId>openpdf</artifactId>
	<version>1.0.5</version>
</dependency>
<dependency>
	<groupId>com.github.librepdf</groupId>
	<artifactId>pdf-html</artifactId>
	<version>1.0.5</version>
</dependency>

Calls to String.toLowerCase(), and friends should be checked for proper use of locales

Case-folding is used for various pieces of data and metadata used in PDF syntax (perhaps also in user content). Those transformations are based on Adobe's general assumption of the use of US_ASCII/ISO-Latin-1 encoding. As noted in the comment on pull request #76, this fails for (at least) Turkish locales, where capital I can fold to a non-Latin-1 lowercase dotless i (ı).

Where PDF syntax is being processed the ROOT (no-language) locale should be used. Each case has to be examined, at least superficially, to determine if:

  1. The system locale should be used (e.g. for filenames).
  2. The ROOT locale should be used, as discussed.
  3. A locale defined in the PDF itself needs to be used, as transformations are being performed on the content streams. (I do not know for sure that there are any such cases at this point.)
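
The Turkish case is easy to reproduce with the JDK alone. The sketch below compares folding with Locale.ROOT against folding with an explicit Turkish locale; CaseFoldingSketch and its method names are hypothetical.

```java
import java.util.Locale;

// Demonstrates why locale-sensitive case folding is risky for PDF syntax keywords.
final class CaseFoldingSketch {
    // Locale-neutral folding, suitable for syntax tokens.
    static String foldForSyntax(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Locale-sensitive folding, which can change the result for the same input.
    static String foldForLocale(String token, Locale locale) {
        return token.toLowerCase(locale);
    }
}
```

With the Turkish locale (Locale.forLanguageTag("tr")), "TITLE" folds to "tıtle" with a dotless ı (U+0131), while Locale.ROOT gives "title".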

Error while retrieving text from pdf with an empty page

While reading the text of a PDF page by page, I ran into issues when the PDF had a blank page. All other content was loaded just fine.

java.lang.NullPointerException
  at com.lowagie.text.pdf.parser.PdfTextExtractor.getContentBytesFromContentObject(PdfTextExtractor.java:157)
  at com.lowagie.text.pdf.parser.PdfTextExtractor.getContentBytesForPage(PdfTextExtractor.java:138)
  at com.lowagie.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:223)
  at com.lowagie.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:199)
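
Assuming the NPE comes from a blank page having no content object at all, which the stack trace suggests but does not prove, a guard along these lines would treat such a page as empty. EmptyPageSketch and contentBytes are hypothetical names, and the byte[] parameter stands in for the real content object.

```java
// Hypothetical guard; the method name and the byte[] type are illustrative, not the OpenPDF API.
final class EmptyPageSketch {
    private static final byte[] EMPTY = new byte[0];

    // Treats a missing content object (a blank page) as empty content rather than dereferencing null.
    static byte[] contentBytes(byte[] contentObject) {
        return contentObject == null ? EMPTY : contentObject.clone();
    }
}
```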

Setup Travis CI automatic tests

We should set up automatic builds on Travis CI for this project.

(This requires organization access to the LibrePDF organization)

Develop version - SNAPSHOT?

I noticed that all of the code in the master branch has 1.0 as a version number in the pom.xml files.

Shouldn't we follow the Maven standard of using SNAPSHOT versions, and use the full version number only for the released version? That would mean changing the 1.0 in the pom files to 1.1-SNAPSHOT or 1.0.1-SNAPSHOT.

Since I'm preparing a couple of pull requests for this project, I'd like to understand how this is handled in OpenPDF. Happy to also create a PR for adjusting the version numbers - just let me know.

Change package name to com.github.librepdf

Since the Maven groupId and the Java package name of the library have nothing in common now, wouldn't it be better to move all classes from com.lowagie to com.github.librepdf / com.github.librepdf.openpdf?

Some input files use or override a deprecated API

I get these warnings of "Some input files use or override a deprecated API" when compiling the latest version of OpenPDF:

[INFO] --- maven-compiler-plugin:3.2:compile (default-compile) @ openpdf ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 344 source files to C:\Users\andreas\librepdf\OpenPDF\openpdf\target\classes
[INFO] /C:/Users/andreas/librepdf/OpenPDF/openpdf/src/main/java/com/lowagie/text/pdf/PdfPKCS7.java: Some input files use or override a deprecated API.
[INFO] /C:/Users/andreas/librepdf/OpenPDF/openpdf/src/main/java/com/lowagie/text/pdf/PdfPKCS7.java: Recompile with -Xlint:deprecation for details.
[INFO] /C:/Users/andreas/librepdf/OpenPDF/openpdf/src/main/java/com/lowagie/text/pdf/PdfReader.java: Some input files use unchecked or unsafe operations.
[INFO] /C:/Users/andreas/librepdf/OpenPDF/openpdf/src/main/java/com/lowagie/text/pdf/PdfReader.java: Recompile with -Xlint:unchecked for details.

Assertj dependency should have test scope

Openpdf version 1.0.4 has assertj-core as a compile scope dependency (inherited from openpdf-parent). I believe this dependency should have test scope.

By the way: Thanks for providing OpenPDF. 👍

Issue closed: LGPL license

The readme says that this fork is based on iText 4, but to be precise:

  • iText 2.1.7 (7 Jul 2009) was the last MPL/LGPL release by iText Software.
  • 4.2.0 was an internal SVN tag, used to sync up versions between iText (Java) and iTextSharp (.NET). The latter was at 4.1.6 at that point. However, iText Software never released a build based on the 4.2.0 tag. It was a mid-development construct and the software wasn't guaranteed to be stable at that point. When iText migrated from SVN to Git, some technical constructs were cleaned up (by me personally, see full disclosure below), including the internal 4.2.0 tag. As far as iText Software is concerned, there never was a release of "iText 4".
  • iText 5.0.0 (8 December 2009) was the first AGPL release by iText Software.
  • On 31 August 2010, GitHub user ymasory uploaded a version of iText "MPL/LGPL" to GitHub. It is unclear if this was based on 2.1.7 or on 2.1.7-59-g935969371a. They did not accept pull requests or do any other development.
  • On 19 September 2012, a now-defunct New York software startup called InProTopia Corporation (as far as I can tell, founded by a student of the Columbia University) took ymasory's repo and used that to upload a Maven build of "iText 4.2.0" and "iText 4.2.1" to Maven Central. However, they used (or hijacked?) com.lowagie as GroupId, which they were not allowed to do according to Apache's Guide to uploading artifacts to the Central Repository. This is explained in a blog post on iText's website: http://itextpdf.com/maven-update-problem-with-itext-4.2.2. See also this Stack Overflow answer: http://stackoverflow.com/a/14213851/766786
  • For clarity of this overview, I skipped some of the intermediate forks.

Conclusion: this project should do its due diligence and make absolutely sure what it is based upon: is it iText 2.1.7 or is it iText 2.1.7-59-g935969371a? I don't want to sound like I'm spreading FUD, but you need to make absolutely sure that your users aren't in uncharted territory.

As a side note (maybe this should be a separate issue?), if you ever plan to upload to Maven Central, then you need to change every reference to com.lowagie to something else, as described in the link above.

I recommend that you contact Software Freedom Conservancy for legal and technical advice. They have extensive experience with community developers taking over an Open Source project after a license change.

Full disclosure: I am QA & Release Engineer at iText Software, but I've been an Open Source user & advocate since decades before I joined iText Software. From a personal point of view, I wish this project good luck because it is 8 years behind in development. The fact that I took considerable time to do my research for this issue, should give you a clue that I don't want to intentionally harm this project. From a professional point of view, I welcome the competition. It keeps us on edge. :-)
