alessioluciani / flutter-pdf-text
A plugin for Flutter that allows you to read the text content of PDF documents and convert it into strings.
License: MIT License
Hi, I'm unable to print text from the PDF. The error says that the path is not defined properly or is null. How do I get the path of the file as a string?
Here's a message I got after extracting text from a PDF file on Android:
W/PdfBox-Android(10576): Warning: You did not close a PDF Document
I was wondering what the timeline for a null-safe version of the package is?
Dear Alessio,
Congratulations on the great idea.
It would be very helpful to be able to convert PDF text into JSON. In my specific case (and I guess it could be the same for many users) I need to convert a PDF that is very well structured into topics, subtopics and body text. I could therefore imagine a feature that parses it into JSON, which could then be used inside Flutter.
Thank you
This is my source code. I have attached a PDF file in which my Bangla font is breaking. I haven't found any solution.
app_flutterMY_PDF.pdf
import 'dart:io';
import 'dart:typed_data';
import 'package:flutter/services.dart';
import 'package:open_file/open_file.dart';
import 'package:path_provider/path_provider.dart';
import 'package:pdf/pdf.dart';
import 'package:pdf/widgets.dart';
import 'pdf_to_img_conveter.dart';

class PdfGenerator {
  static late Font arFont;

  // Loads the Bengali font from the asset bundle; call once before createPdf().
  static Future<void> init() async {
    arFont = Font.ttf(
        (await rootBundle.load("assets/fonts/NotoSerifBengaliRegular.ttf")));
  }

  static Future<void> createPdf() async {
    String path = (await getApplicationDocumentsDirectory()).path;
    // A separator is needed here; path + "MY_PDF.pdf" creates
    // "app_flutterMY_PDF.pdf" next to the documents directory instead.
    File file = File("$path/MY_PDF.pdf");
    Document pdf = Document();
    pdf.addPage(_createPage());
    Uint8List bytes = await pdf.save();
    await file.writeAsBytes(bytes);
    // createImg(file.path);
    await OpenFile.open(file.path);
  }

  static Page _createPage() {
    return Page(
        // textDirection: TextDirection.rtl,
        theme: ThemeData.withFont(
          base: arFont,
        ),
        pageFormat: PdfPageFormat.a4,
        build: (context) {
          return Center(
              child: Container(
                  child: Text(
                      textDirection: TextDirection.ltr,
                      "জ্বীন জাতি বিকল্প হলো ইসলাম ধর্মের মূল গ্রন্থ কুরআনে বর্ণিত একটি জীব সৃষ্টি। প্রাক ইসলামী যুগেও জ্বীন জাতি ",
                      style: TextStyle(font: arFont, wordSpacing: 1),
                      tightBounds: true,
                      softWrap: true)));
        });
  }
}
How do I open a PDF document from Assets folder?
Is it possible to add web support using, for example, this package?
https://github.com/mozilla/pdf.js
Does anybody have an idea how to circumvent this?
This leads to ugly displayed text.
Simply replacing \n in the string does not work, as this removes actual line breaks too.
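One possible direction, sketched here as a heuristic rather than anything the plugin provides: join a line with the next one unless it looks like the end of a paragraph. The helper name unwrapSoftLineBreaks and the rule "a line ending in . ! ? or : is a real break" are assumptions made for illustration; real documents will need tuning.

```dart
/// Joins lines that look like soft wraps and keeps lines that look like
/// real breaks. Hypothetical helper; the rules are heuristics only.
String unwrapSoftLineBreaks(String text) {
  final lines = text.split('\n');
  final sentenceEnd = RegExp(r'[.!?:]$');
  final buffer = StringBuffer();
  for (var i = 0; i < lines.length; i++) {
    final line = lines[i].trimRight();
    buffer.write(line);
    if (i == lines.length - 1) break; // nothing follows the last line
    final next = lines[i + 1].trimLeft();
    // Keep the break after sentence-ending punctuation or around blank lines.
    final keepBreak =
        sentenceEnd.hasMatch(line) || line.isEmpty || next.isEmpty;
    buffer.write(keepBreak ? '\n' : ' ');
  }
  return buffer.toString();
}

void main() {
  const extracted = 'This is a\nwrapped line.\nNext paragraph';
  print(unwrapSoftLineBreaks(extracted));
  // This is a wrapped line.
  // Next paragraph
}
```

The input would typically be the string read from the PDF. Headings and list items that lack closing punctuation will be merged by this rule, so it is only a starting point.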
It converts only PDF files. Is there any way to convert a Word document into strings in the same way as a PDF? Kindly help; I need your support.
Support PDFDoc.fromUrl, to be able to fetch a pdf directly from a website.
Certain offending PDF files may cause this native crash:
E/AndroidRuntime( 7710): FATAL EXCEPTION: Thread-6
E/AndroidRuntime( 7710): Process: com.valorbyte.****, PID: 7710
E/AndroidRuntime( 7710): java.lang.StringIndexOutOfBoundsException: length=2; index=2
E/AndroidRuntime( 7710): at java.lang.String.charAt(Native Method)
E/AndroidRuntime( 7710): at com.tom_roush.pdfbox.util.DateConverter.parseDate(DateConverter.java:608)
E/AndroidRuntime( 7710): at com.tom_roush.pdfbox.util.DateConverter.toCalendar(DateConverter.java:676)
E/AndroidRuntime( 7710): at com.tom_roush.pdfbox.util.DateConverter.toCalendar(DateConverter.java:654)
E/AndroidRuntime( 7710): at com.tom_roush.pdfbox.cos.COSDictionary.getDate(COSDictionary.java:816)
E/AndroidRuntime( 7710): at com.tom_roush.pdfbox.pdmodel.PDDocumentInformation.getCreationDate(PDDocumentInformation.java:210)
E/AndroidRuntime( 7710): at dev.aluc.pdf_text.PdfTextPlugin.initDoc(PdfTextPlugin.kt:93)
E/AndroidRuntime( 7710): at dev.aluc.pdf_text.PdfTextPlugin.access$initDoc(PdfTextPlugin.kt:20)
E/AndroidRuntime( 7710): at dev.aluc.pdf_text.PdfTextPlugin$onMethodCall$1.invoke(PdfTextPlugin.kt:53)
E/AndroidRuntime( 7710): at dev.aluc.pdf_text.PdfTextPlugin$onMethodCall$1.invoke(PdfTextPlugin.kt:47)
E/AndroidRuntime( 7710): at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)
This plugin doesn't work with platform :ios, '10.0'. I really need to set my project to that version so that other plugins work properly, but this plugin fails to build with that iOS version.
I only tested on Android, so the error might come from PdfBox.
W/PdfBox-Android( 6641): No Unicode mapping for CID+222 (222) in font TAUElangoArunthathi
W/PdfBox-Android( 6641): No Unicode mapping for CID+254 (254) in font TAUElangoArunthathi
I/chatty ( 6641): uid=10281(com.example.text_audio) Thread-5 identical 2 lines
W/PdfBox-Android( 6641): No Unicode mapping for CID+254 (254) in font TAUElangoArunthathi
W/PdfBox-Android( 6641): No Unicode mapping for CID+270 (270) in font TAUElangoArunthathi
W/PdfBox-Android( 6641): No Unicode mapping for CID+270 (270) in font TAUElangoArunthathi
W/PdfBox-Android( 6641): No Unicode mapping for CID+262 (262) in font TAUElangoArunthathi
W/PdfBox-Android( 6641): No Unicode mapping for CID+262 (262) in font TAUElangoArunthathi
W/PdfBox-Android( 6641): No Unicode mapping for CID+223 (223) in font TAUElangoArunthathi
W/PdfBox-Android( 6641): No Unicode mapping for CID+223 (223) in font TAUElangoArunthathi
I think specifying the font when calling the method might solve it. Just a guess; I'm not sure.
PDFDoc.fromPath(file_path, password: 'test')
throws:
PlatformException(INVALID_PATH, File path or password (in case of encrypted document) is invalid, null, null)
The password is test, and the PDF in question can be found at:
https://download.novapdf.com/download/samples/pdf-example-password.pdf
Please support password-protected PDF files. Currently I'm getting the following error while opening a password-protected file:
Exception has occurred.
PlatformException (PlatformException(INVALID_PATH, File path is invalid, null))
It is not safe to call PDFDoc.text multiple times in succession, that is, to call it again before the previous call returns.
It's a pretty common scenario for an app to process more than one PDF file at a time.
If the program enters this function more than once before the previous call returns, the app will crash.
I have tested this on a real Android device only, and it crashes every single time when multiple PDFs are processed. I'm not sure how it behaves on iOS.
There seems to be a workaround and it could be outlined as follows:
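import 'dart:io';
import 'package:pdf_text/pdf_text.dart';
import 'package:synchronized/synchronized.dart';

class PDFReader {
  // One lock shared by all calls, so only one text extraction runs at a time.
  static final _lock = Lock();
  ...
  Future<String> pdfToString(File aPdfFile) async {
    ...
    PDFDoc doc = await PDFDoc.fromFile(aPdfFile);
    // Serialize access to doc.text; synchronized() returns the callback's result.
    final String result = await _lock.synchronized(() => doc.text);
    ...
    return result;
  }
  ...
}
Here, Lock comes from https://pub.dev/packages/synchronized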
Hi, it would be very helpful if the output could be HTML, so that I could get attributes such as H1, H2 and more from the text of the PDF. Thanks
Any Support for https://developer.apple.com/documentation/pdfkit/pdfoutline?
It works fine with most PDFs, but if a PDF contains images, it won't convert the text in the images to text; instead it shows black pages.
Hello and thanks for flutter-pdf-text.
Unfortunately, I am not getting any useful text from PDFBox on Android. The result is different on iOS.
Android also seems to skip many lines. Below is an example with the same content.
Any idea what can cause this and/or how to fix it?
Output on Android:
IndivualtypforHm,Ms(1356)NeL/CwEWGbRK07S28:4Pg
Period:07Sp2-3
Mon07
Of
Tue8Wd9
Sim
0715
1425
h1
by
90
Fri
FlD
4
Sat2
9
3
3
83
456
2
Vac
dateHuyRprACinfo
Output on Windows with PDFBox:
Individual duty plan for Hoppmann, Marius (113566) NetLine/Crew(EWG) printed by CREWLINK 28Aug20 10:59 Page 1
Period: 28Aug20 - 30Sep20
Fri28
FlD
0900
1945
Sat29
Sim
1120
1740
Sun30
Sim
0805
1425
Mon31
Off
Tue01
FlD
1400
2050
Wed02
FlD
1020
1810
Thu03
FlD
0920
2100
Fri04
Off
Sat05
Off
Sun06
Off
Mon07
Off
Tue08
Off
Wed09
Sim
0715
1425
Thu10
Sby
0700
1900
Fri11
FlD
1105
2030
Sat12
FlD
1155
1855
Sun13
FlD
1535
2030
Mon14
Off
Tue15
Off
Wed16
Off
Thu17
FlD
1245
2220
Fri18
FlD
1205
2050
Sat19
FlD
1235
2100
Sun20
FlD
1630
2210
Mon21
FlD
1505
2050
Tue22
Off
Wed23
Off
Thu24
Vac
Fri25
Vac
Sat26
Vac
Sun27
Vac
date H duty R dep arr AC info date H duty R dep arr AC info date H duty R dep arr AC info
Hi Alessio,
Thanks for this great library.
I have one question: how can I track the progress of the PDF extraction?
Thanks
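One way to approximate progress, sketched under assumptions, is to read the document page by page and report after each page. The function extractWithProgress, the PageReader typedef and the callback shape below are hypothetical names for illustration, not part of the plugin. In an app, the reader could perhaps be wired to the plugin's per-page text (for example doc.pageAt(n).text with doc.length as the page count, assuming pages are numbered from 1).

```dart
/// Reads the text of one page; a stand-in for a plugin call.
typedef PageReader = Future<String> Function(int pageNumber);

/// Reads pages 1..pageCount one after another and calls [onProgress]
/// after each page. Hypothetical helper, not a plugin API.
Future<String> extractWithProgress(
  int pageCount,
  PageReader readPage,
  void Function(int done, int total) onProgress,
) async {
  final buffer = StringBuffer();
  for (var n = 1; n <= pageCount; n++) {
    // Awaiting each page in turn means no two reads overlap.
    buffer.writeln(await readPage(n));
    onProgress(n, pageCount);
  }
  return buffer.toString();
}

Future<void> main() async {
  // Fake reader so the sketch runs without a PDF.
  final text = await extractWithProgress(
    3,
    (n) async => 'page $n',
    (done, total) => print('$done / $total'),
  );
  print(text.trim());
}
```

The progress callback could drive a progress indicator in the UI. Reading pages strictly in sequence is a design choice here; it trades speed for a simple, predictable progress count.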
FAILURE: Build failed with an exception.
Hello Alessio,
Thanks for this great library, it has really been useful.
I'm just wondering if you could give some advice on how to highlight text in the PDF image after searching for it.
Currently I'm using the library to extract the text and save the page number of the text in the PDF. But now I want to highlight the text itself in the PDF.
Looking forward to hearing from you.
Cheers
Using this option it can be decided whether to initialize the text stripper engine on Android immediately or at the first text read.
Does this plugin support Arabic content (Arabic PDFs)?
I am using your great plugin, many thanks.
This error is not actually causing me any problems, but I have noticed that when I call
_pdfDoc = await PDFDoc.fromPath(filePickerResult.files.single.path!);
String text = await _pdfDoc!.text; // <-- line that causes the error
I get an error:
I/or.myapp(19996): Background concurrent copying GC freed 306941(8MB) AllocSpace objects, 17(1916KB) LOS objects, 50% free, 14MB/28MB, paused 161us total 266.489ms
W/System (19996): A resource failed to call end.
2 W/System (19996): A resource failed to call close.
2 W/System (19996): A resource failed to call end.
Unfortunately there is no stack trace and I am not experienced with the memory inspector
flutter doctor -v
[✓] Flutter (Channel stable, 2.0.4, on Linux, locale en_GB.UTF-8)
• Flutter version 2.0.4 at /home/george/snap/flutter/common/flutter
• Framework revision b1395592de (8 days ago), 2021-04-01 14:25:01 -0700
• Engine revision 2dce47073a
• Dart version 2.12.2
[✓] Android toolchain - develop for Android devices (Android SDK version 30.0.3)
• Android SDK at /home/george/Android/Sdk
• Platform android-30, build-tools 30.0.3
• Java binary at: /snap/android-studio/current/android-studio/jre/bin/java
• Java version OpenJDK Runtime Environment (build 1.8.0_242-release-1644-b3-6222593)
• All Android licenses accepted.
[✓] Chrome - develop for the web
• CHROME_EXECUTABLE = /snap/bin/chromium
[✓] Android Studio
• Android Studio at /snap/android-studio/current/android-studio
• Flutter plugin can be installed from:
🔨 https://plugins.jetbrains.com/plugin/9212-flutter
• Dart plugin can be installed from:
🔨 https://plugins.jetbrains.com/plugin/6351-dart
• android-studio-dir = /snap/android-studio/current/android-studio
• Java version OpenJDK Runtime Environment (build 1.8.0_242-release-1644-b3-6222593)
[✓] Connected device (2 available)
• TA 1033 (mobile) • PLEGAR1763000928 • android-arm64 • Android 9 (API 28)
• Chrome (web) • chrome • web-javascript • Chromium 89.0.4389.114 snap
• No issues found!
Stack Overflow suggests that enabling "strict mode" will provide a better stack trace, but I am not sure how to do that in a Flutter app.
Using breakpoints, I tracked the error to this line
flutter-pdf-text/lib/pdf_text.dart
Line 101 in 6e042e8
and then specifically this line within the call to _invokeMethod
codec.encodeMethodCall(MethodCall(method, arguments)),