Comments (17)
See also https://github.com/LibrePDF/OpenPDF/wiki/Accents,-DIN-91379,-non-Latin-scripts
from openpdf.
Thank you very much for your answer; it has helped a great deal with my questions about Khmer PDFs!
The output looks almost correct, but I noticed a small case that OpenPDF does not seem to handle well.
Below is an example image. I am using OpenPDF 1.3.43.
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.FontFactory;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.BaseFont;
import com.lowagie.text.pdf.LayoutProcessor;
import com.lowagie.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;

public class HelloWorld {
    public static void main(String[] args) {
        // Enable kerning and ligatures in the layout processor
        LayoutProcessor.enableKernLiga();
        // Register a TrueType font that supports Khmer
        FontFactory.register("D:\\devwork\\thirddemo\\openPDF\\src\\main\\resources\\KhmerOSSiemreap.ttf", "khmerFont");
        Document document = new Document();
        try {
            PdfWriter.getInstance(document,
                    new FileOutputStream("C:\\Users\\xr\\Desktop\\fonts\\openPDF.pdf"));
            document.open();
            document.add(new Paragraph(
                    "បន្ថែមនេះនឹងមានសុពលភាពចាប់ពី ថ្អែទី ២០ ខែ កញ្ញា ឆ្នាំ ២០២៣ តទៅ។ ក្រុមហ៊ុនមិនតម្រូវឲ្យលោកអ្នកធ្វើអ្វីបន្ថែមឡើយ ហើយបុព្វលាភរ៉ាប់រងរបស់",
                    FontFactory.getFont("khmerFont", BaseFont.IDENTITY_H, false, 10)));
        } catch (DocumentException de) {
            System.err.println(de.getMessage());
        } catch (IOException ioe) {
            System.err.println(ioe.getMessage());
        }
        document.close();
    }
}
from openpdf.
@wang0331, could you provide a smaller example containing only the incorrect letters?
Please compare the output of OpenPDF/LayoutProcessor with the output of HarfBuzz hb-view, see https://github.com/harfbuzz/harfbuzz/releases/tag/8.4.0
from openpdf.
Thank you very much for your reply. For a minimum example, please refer to this:
ហ៍្វ
I compared this with the output of iText 8 + pdfCalligraphy, which displays it clearly correctly.
from openpdf.
With OpenPDF/LayoutProcessor (2.0.x trunk), the minimal example is rendered as [image].
This should be correct.
from openpdf.
The OpenPDF version I am using is 1.3.43
I did not use OpenPDF 2.0.x because I need a Java 8 development environment to investigate whether OpenPDF can be integrated. If this is not supported in OpenPDF 1.3.x, can the cause be identified and fixed?
I tested 1.4.2 and 2.0.2, and both export this character correctly when paired with the matching JDK version. Only 1.3.43 with Java 8 fails to export it properly. If 1.3.x is still being maintained, can it be adapted?
from openpdf.
@wang0331, so you are not talking about displaying the characters in the PDF, but about extracting text from the PDF file using a PDF viewer.
This task is quite complicated, and the exported characters seem incorrect even with the current source code on GitHub.
OpenPDF (master branch, compiled on 2024-05-11), default settings:
ហ៍្
With LayoutProcessor.setWriteActualText():
ហ៍្វ
Only the output with the experimental option LayoutProcessor.setWriteActualText() seems correct.
from openpdf.
Analysis:
Font used: https://fonts.google.com/specimen/Siemreap
Using LayoutProcessor, the input '0x17a0', '0x17cd', '0x17d2', '0x179c' is converted by java.awt.Font.layoutGlyphVector to the glyph array [68, 111, 165].
These glyphs map to the following Unicode characters according to the GlyphOrder of the font (converted using ttx):
68 uni17A0
111 uni17CD
165 uni17D2_uni179C.zz02
Glyph 165 is a ligature and corresponds to two Unicode characters.
The method java.awt.font.GlyphVector.getGlyphCharIndex does not return this correspondence.
I don't see a way to store a one-to-many correspondence in the toUnicode map of TrueTypeFontUnicode.
So if the PDF text shown in a PDF viewer is selected and copied, the last character is lost.
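To illustrate the one-to-many correspondence described in this analysis, here is a minimal sketch that decodes ttx-style glyph names of the form uniXXXX_uniYYYY.suffix into the Unicode text they stand for. The class GlyphNameDecoder and its methods are hypothetical names introduced for illustration. Decoding by glyph name is an assumption of this sketch and is not how OpenPDF builds its toUnicode map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenPDF.
final class GlyphNameDecoder {
    // Matches name components such as uni17A0 or uni17D2179C (uppercase hex, groups of 4).
    private static final Pattern UNI = Pattern.compile("uni(?:[0-9A-F]{4})+");

    /** Decodes a glyph name such as "uni17D2_uni179C.zz02" into its Unicode string. */
    static String decode(String glyphName) {
        int dot = glyphName.indexOf('.');
        // Drop a variant suffix such as ".zz02".
        String base = dot >= 0 ? glyphName.substring(0, dot) : glyphName;
        StringBuilder sb = new StringBuilder();
        for (String part : base.split("_")) { // '_' separates ligature components
            if (!UNI.matcher(part).matches()) {
                continue; // not a uniXXXX component; this sketch ignores it
            }
            for (int i = 3; i < part.length(); i += 4) {
                sb.appendCodePoint(Integer.parseInt(part.substring(i, i + 4), 16));
            }
        }
        return sb.toString();
    }

    /** Concatenates the decoded text of every glyph in the array. */
    static String textFor(int[] glyphs, Map<Integer, String> glyphOrder) {
        StringBuilder sb = new StringBuilder();
        for (int glyph : glyphs) {
            String name = glyphOrder.get(glyph);
            if (name != null) {
                sb.append(decode(name));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Glyph IDs and names taken from the analysis above.
        Map<Integer, String> glyphOrder = new LinkedHashMap<>();
        glyphOrder.put(68, "uni17A0");
        glyphOrder.put(111, "uni17CD");
        glyphOrder.put(165, "uni17D2_uni179C.zz02");
        String text = textFor(new int[] {68, 111, 165}, glyphOrder);
        System.out.println(text.length() + " characters recovered from 3 glyphs");
    }
}
```

Three glyphs decode to four characters here, which is exactly the case a strict one-glyph-to-one-character table cannot represent.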
from openpdf.
Using Branch 1.3 or Branch 1.4 with LayoutProcessor I get a correct visual appearance
and incorrect text export ហ៍¥
from openpdf.
Thank you for your patient answer, @vk-github18!
But I think you may have misunderstood me. I am not trying to copy text from the PDF; I am only trying to export Khmer text, copied from Microsoft Office Word, to a PDF correctly.
I am unable to export the given minimal example correctly using Java 8 and OpenPDF 1.3, but versions 1.4 and 2.0 work. If you can export this minimal Khmer example successfully with 1.3, please share your OpenPDF 1.3 code example.
from openpdf.
@wang0331, I tested the minimal example:
OpenJDK Java 1.8.0 OpenPDF Branch 1.3-Java8
chars: 17a0 17cd 17d2 179c
glyphVector = awtFont.layoutGlyphVector(...)
glyphVector.getNumGlyphs()=5
glyphs: 68 111 694 165 65535
charIndizes=0
charIndizes=1
charIndizes=2
charIndizes=2
charIndizes=3
ttx/GlyphOrder
68 uni17A0
111 uni17CD
694 uni25CC ???
165 uni17D2_uni179C.zz02
65535 ???
The method java.awt.Font.layoutGlyphVector() seems to return incorrect results in Java 1.8. The results with Java 11 or newer are correct.
This is a problem with the built-in Java classes in version 1.8. I don't see a way to deal with this.
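A diagnostic listing like the one above can be produced with a small program along the following lines. This is a sketch and not the exact code used for the test: the class name GlyphDump, the describe helper, the font size, and passing the font path as the first command-line argument are all assumptions. It uses only standard java.awt calls.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.io.File;

// Hypothetical diagnostic tool, not part of OpenPDF.
final class GlyphDump {
    /** Formats one glyph entry as "glyph=<code> charIndex=<index>". */
    static String describe(int glyphCode, int charIndex) {
        return "glyph=" + glyphCode + " charIndex=" + charIndex;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a TrueType font file, e.g. the Khmer font from this thread.
        Font font = Font.createFont(Font.TRUETYPE_FONT, new File(args[0])).deriveFont(10f);
        // The minimal example from this thread: 17a0 17cd 17d2 179c.
        char[] text = {'\u17a0', '\u17cd', '\u17d2', '\u179c'};
        FontRenderContext frc = new FontRenderContext(null, true, true);
        GlyphVector gv = font.layoutGlyphVector(
                frc, text, 0, text.length, Font.LAYOUT_LEFT_TO_RIGHT);
        System.out.println("numGlyphs=" + gv.getNumGlyphs());
        for (int i = 0; i < gv.getNumGlyphs(); i++) {
            // Glyph code and the index of the input character it is attributed to.
            System.out.println(describe(gv.getGlyphCode(i), gv.getGlyphCharIndex(i)));
        }
    }
}
```

Running the same program under Java 8 and under Java 11 or newer with the same font makes it possible to compare the glyph arrays directly, independent of OpenPDF.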
from openpdf.
Using OpenJDK Java 1.8.0 and OpenPDF Branch 1.3-Java8 with the FOP dependency, the result is: [image]
I used
System.out.println(FopGlyphProcessor.isFopSupported()?"fop is supported":"fop is NOT supported");
to verify that FOP is found. (I had to use Project/Context Menu/Maven/Reload Project so that IntelliJ found FOP.)
See https://github.com/LibrePDF/OpenPDF/wiki/Multi-byte-character-language-support-with-TTF-fonts
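If it is unclear whether the IDE or build tool has actually put FOP on the runtime classpath, a generic check that does not depend on OpenPDF can help. The sketch below is an assumption-laden illustration: ClasspathProbe is a hypothetical class, and org.apache.fop.apps.FopFactory is used only as a well-known FOP class name, which is not necessarily the class that FopGlyphProcessor.isFopSupported() looks for.

```java
// Hypothetical helper, not part of OpenPDF or FOP.
final class ClasspathProbe {
    /** Returns true if the named class can be located without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed probe class: a well-known entry point of Apache FOP.
        String fopClass = "org.apache.fop.apps.FopFactory";
        System.out.println(fopClass + (isOnClasspath(fopClass) ? " found" : " NOT found"));
    }
}
```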
from openpdf.
@vk-github18
I imported the two Maven dependencies for FOP with JDK 1.8 and OpenPDF 1.3, and the code reports that FOP is supported.
Unfortunately, there still seem to be issues with the exported PDF. Could you share your code example for this JDK version? I want to know whether I missed some detail myself.
from openpdf.
@wang0331, sure, here is the example file:
App.java.txt
Running under Linux:
/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java -cp lib/commons-io-2.16.1.jar:lib/commons-logging-1.3.1.jar:lib/fop-core-2.9.jar:lib/openpdf-1.3.43.jar:lib/xmlgraphics-commons-2.9.jar:target/openpdf-khmer-1.0-SNAPSHOT.jar khmer.App
from openpdf.
@vk-github18
I used the code example you provided and found a very interesting situation.
If I don't call LayoutProcessor.enableKernLiga(), 'ហ៍្វ' appears to be displayed correctly, but there may be problems exporting other Khmer words, which makes the result unusable.
If I call LayoutProcessor.enableKernLiga(), 'ហ៍្វ' is displayed incorrectly, but after a simple check the export of other Khmer text seems to be correct.
Can I conclude that with JDK 1.8 and OpenPDF 1.3.x I am unable to export Khmer text fully correctly?
from openpdf.
@wang0331, I don't see a simple solution for Java 1.8.
Possibly you could create the PDF file directly with Apache FOP if you can't use a current Java version.
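Since the findings in this thread depend on the Java version, an application could check the runtime before relying on java.awt text layout. The following is a minimal sketch under stated assumptions: JavaVersionCheck and its methods are hypothetical names, and the threshold of 11 is taken only from the observation above that Java 11 or newer gave correct results (Java 9 and 10 were not tested here).

```java
// Hypothetical helper, not part of OpenPDF.
final class JavaVersionCheck {
    /** Parses a java.specification.version value: "1.8" -> 8, "11" -> 11, "17" -> 17. */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Versions up to Java 8 are reported as "1.x".
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    /** Assumption from this thread: layout results were correct on Java 11 or newer. */
    static boolean layoutLooksReliable(String specVersion) {
        return majorVersion(specVersion) >= 11;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (layoutLooksReliable(spec)) {
            System.out.println("Java " + spec + ": java.awt layout is expected to work for this example");
        } else {
            System.out.println("Java " + spec + ": consider a newer Java version or Apache FOP");
        }
    }
}
```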
from openpdf.
Using FOP for your examples looks as follows
k.pdf
Input and configuration file:
fop.xconf.txt
k.fo.txt
fop -c fop.xconf -fo k.fo -pdf k.pdf
from openpdf.