Comments (6)
I added the confirmed Java versions to the README (afe4554).
Oracle Java 9 is not released yet; what is more, this is a matter for tabula-java, not tabula-py. I think it is enough to add the confirmed versions.
from tabula-py.
Note this issue is resolved with the recently released Tabula 0.9.2.
from tabula-py.
Or would it be possible to add a check of the Java version just before it is called, and raise an error if it is unsupported?
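A rough sketch of what such a check could look like, assuming the wrapper keeps calling `java` through `subprocess` as the tracebacks in this thread show. The names `parse_java_major`, `check_java_version` and `CONFIRMED_MAJORS` are made up for illustration and are not part of tabula-py; the set of confirmed versions is simply the ones reported as working in this thread.

```python
import re
import subprocess

# Hypothetical: major versions reported as working in this thread.
CONFIRMED_MAJORS = {7, 8}


def parse_java_major(version_output):
    """Extract the major Java version from the text of `java -version`."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no version string found in: %r" % version_output)
    numbers = re.findall(r"\d+", match.group(1))
    if not numbers:
        raise ValueError("no digits in version string: %r" % match.group(1))
    major = int(numbers[0])
    # Legacy scheme: "1.8.0_92" means Java 8, "1.7.0_79" means Java 7.
    if major == 1 and len(numbers) > 1:
        major = int(numbers[1])
    return major


def check_java_version(confirmed=CONFIRMED_MAJORS):
    """Raise a readable error before the jar is invoked on an unconfirmed Java."""
    # `java -version` writes to stderr, so fold stderr into the captured output.
    raw = subprocess.check_output(["java", "-version"], stderr=subprocess.STDOUT)
    major = parse_java_major(raw.decode("utf-8", errors="replace"))
    if major not in confirmed:
        raise RuntimeError(
            "Java %d is not a confirmed version (confirmed: %s)"
            % (major, sorted(confirmed))
        )
    return major
```

Calling `check_java_version()` before the `java -jar ...` call would turn the opaque `CalledProcessError` into a message that names the Java version. The parser only looks for the quoted version string, so extra lines such as `Picked up JAVA_TOOL_OPTIONS: ...` do not get in the way.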
from tabula-py.
In my environment on macOS 10.12.2, tabula-py works with Java 8. I googled and found a 2015 issue that says it works with Java 8. It seems to depend on your environment.
~/t/tabula-py (master=) (tabula) python tests/test_read_pdf_table.py 16:27:23
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
..Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Jan 28, 2017 4:27:37 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 4:27:37 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 4:27:38 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 4:27:38 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 4:27:39 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 4:27:39 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
.Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
.Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
.Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
.Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Jan 28, 2017 4:27:52 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 4:27:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 4:27:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 4:27:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 4:27:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 4:27:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
.
----------------------------------------------------------------------
Ran 7 tests in 28.430s
OK
~/t/tabula-py (master=) (tabula) java -version 16:27:55
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
from tabula-py.
You are correct, Java 1.8 also works.
I am using 64-bit Linux and tried with both Ubuntu and Fedora.
The test case fails for
(eniric) ~/P/C/R/tabula-py ❯❯❯ java -version ⏎
openjdk version "9-internal"
OpenJDK Runtime Environment (build 9-internal+0-2016-04-14-195246.buildd.src)
OpenJDK 64-Bit Server VM (build 9-internal+0-2016-04-14-195246.buildd.src, mixed mode)
I have included the tests below.
openjdk version 9
(eniric) ~/P/C/R/tabula-py ❯❯❯ python tests/test_read_pdf_table.py
Exception in thread "main" java.lang.ExceptionInInitializerError
at technology.tabula.CommandLineApp.buildOptions(CommandLineApp.java:262)
at technology.tabula.CommandLineApp.main(CommandLineApp.java:44)
Caused by: java.lang.NumberFormatException: For input string: "9-internal"
at java.lang.NumberFormatException.forInputString(java.base@9-internal/NumberFormatException.java:65)
at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:695)
at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:813)
at technology.tabula.Utils.useCustomQuickSort(Utils.java:138)
at technology.tabula.Utils.<clinit>(Utils.java:30)
... 2 more
E.Exception in thread "main" java.lang.ExceptionInInitializerError
at technology.tabula.CommandLineApp.buildOptions(CommandLineApp.java:262)
at technology.tabula.CommandLineApp.main(CommandLineApp.java:44)
Caused by: java.lang.NumberFormatException: For input string: "9-internal"
at java.lang.NumberFormatException.forInputString(java.base@9-internal/NumberFormatException.java:65)
at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:695)
at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:813)
at technology.tabula.Utils.useCustomQuickSort(Utils.java:138)
at technology.tabula.Utils.<clinit>(Utils.java:30)
... 2 more
EException in thread "main" java.lang.ExceptionInInitializerError
at technology.tabula.CommandLineApp.buildOptions(CommandLineApp.java:262)
at technology.tabula.CommandLineApp.main(CommandLineApp.java:44)
Caused by: java.lang.NumberFormatException: For input string: "9-internal"
at java.lang.NumberFormatException.forInputString(java.base@9-internal/NumberFormatException.java:65)
at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:695)
at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:813)
at technology.tabula.Utils.useCustomQuickSort(Utils.java:138)
at technology.tabula.Utils.<clinit>(Utils.java:30)
... 2 more
EException in thread "main" java.lang.ExceptionInInitializerError
at technology.tabula.CommandLineApp.buildOptions(CommandLineApp.java:262)
at technology.tabula.CommandLineApp.main(CommandLineApp.java:44)
Caused by: java.lang.NumberFormatException: For input string: "9-internal"
at java.lang.NumberFormatException.forInputString(java.base@9-internal/NumberFormatException.java:65)
at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:695)
at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:813)
at technology.tabula.Utils.useCustomQuickSort(Utils.java:138)
at technology.tabula.Utils.<clinit>(Utils.java:30)
... 2 more
EException in thread "main" java.lang.ExceptionInInitializerError
at technology.tabula.CommandLineApp.buildOptions(CommandLineApp.java:262)
at technology.tabula.CommandLineApp.main(CommandLineApp.java:44)
Caused by: java.lang.NumberFormatException: For input string: "9-internal"
at java.lang.NumberFormatException.forInputString(java.base@9-internal/NumberFormatException.java:65)
at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:695)
at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:813)
at technology.tabula.Utils.useCustomQuickSort(Utils.java:138)
at technology.tabula.Utils.<clinit>(Utils.java:30)
... 2 more
EException in thread "main" java.lang.ExceptionInInitializerError
at technology.tabula.CommandLineApp.buildOptions(CommandLineApp.java:262)
at technology.tabula.CommandLineApp.main(CommandLineApp.java:44)
Caused by: java.lang.NumberFormatException: For input string: "9-internal"
at java.lang.NumberFormatException.forInputString(java.base@9-internal/NumberFormatException.java:65)
at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:695)
at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:813)
at technology.tabula.Utils.useCustomQuickSort(Utils.java:138)
at technology.tabula.Utils.<clinit>(Utils.java:30)
... 2 more
E
======================================================================
ERROR: test_conver_from (__main__.TestReadPdfTable)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/test_read_pdf_table.py", line 47, in test_conver_from
tabula.convert_into(pdf_path, temp.name, output_format='csv')
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/wrapper.py", line 108, in convert_into
subprocess.check_output(args)
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 626, in check_output
**kwargs).stdout
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 708, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-jar', '/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/tabula-0.9.1-jar-with-dependencies.jar', '--pages', '1', '--guess', '--format', 'CSV', '--outfile', '/tmp/jneal/tmpisip3jc3', 'tests/resources/data.pdf']' returned non-zero exit status 1
======================================================================
ERROR: test_convert_remote_file (__main__.TestReadPdfTable)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/test_read_pdf_table.py", line 57, in test_convert_remote_file
tabula.convert_into(uri, temp.name, output_format='csv')
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/wrapper.py", line 108, in convert_into
subprocess.check_output(args)
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 626, in check_output
**kwargs).stdout
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 708, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-jar', '/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/tabula-0.9.1-jar-with-dependencies.jar', '--pages', '1', '--guess', '--format', 'CSV', '--outfile', '/tmp/jneal/tmpfdnllhk4', '3763.pdf']' returned non-zero exit status 1
======================================================================
ERROR: test_read_pdf (__main__.TestReadPdfTable)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/test_read_pdf_table.py", line 15, in test_read_pdf
df = tabula.read_pdf(pdf_path)
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/wrapper.py", line 50, in read_pdf_table
output = subprocess.check_output(args)
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 626, in check_output
**kwargs).stdout
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 708, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-jar', '/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/tabula-0.9.1-jar-with-dependencies.jar', '--pages', '1', '--guess', 'tests/resources/data.pdf']' returned non-zero exit status 1
======================================================================
ERROR: test_read_pdf_into_json (__main__.TestReadPdfTable)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/test_read_pdf_table.py", line 27, in test_read_pdf_into_json
json_data = tabula.read_pdf(pdf_path, output_format='json')
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/wrapper.py", line 50, in read_pdf_table
output = subprocess.check_output(args)
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 626, in check_output
**kwargs).stdout
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 708, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-jar', '/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/tabula-0.9.1-jar-with-dependencies.jar', '--pages', '1', '--guess', '--format', 'JSON', 'tests/resources/data.pdf']' returned non-zero exit status 1
======================================================================
ERROR: test_read_pdf_with_option (__main__.TestReadPdfTable)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/test_read_pdf_table.py", line 37, in test_read_pdf_with_option
self.assertTrue(tabula.read_pdf(pdf_path, pages=1).equals(pd.read_csv(expected_csv1)))
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/wrapper.py", line 50, in read_pdf_table
output = subprocess.check_output(args)
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 626, in check_output
**kwargs).stdout
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 708, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-jar', '/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/tabula-0.9.1-jar-with-dependencies.jar', '--pages', '1', '--guess', 'tests/resources/data.pdf']' returned non-zero exit status 1
======================================================================
ERROR: test_read_remote_pdf (__main__.TestReadPdfTable)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/test_read_pdf_table.py", line 21, in test_read_remote_pdf
df = tabula.read_pdf(uri)
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/wrapper.py", line 50, in read_pdf_table
output = subprocess.check_output(args)
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 626, in check_output
**kwargs).stdout
File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 708, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-jar', '/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/tabula-0.9.1-jar-with-dependencies.jar', '--pages', '1', '--guess', '3763.pdf']' returned non-zero exit status 1
----------------------------------------------------------------------
Ran 7 tests in 4.627s
FAILED (errors=6)
(eniric) ~/P/C/R/tabula-py ❯❯❯ java -version ⏎
openjdk version "9-internal"
OpenJDK Runtime Environment (build 9-internal+0-2016-04-14-195246.buildd.src)
OpenJDK 64-Bit Server VM (build 9-internal+0-2016-04-14-195246.buildd.src, mixed mode)
Java 1.8
(eniric) ~/P/C/e/bin ❯❯❯ export PATH=/opt/java/jre1.8.0_111/bin/:$PATH
(eniric) ~/P/C/R/tabula-py ❯❯❯ python tests/test_read_pdf_table.py
..Jan 28, 2017 9:47:45 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:47:46 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:47:46 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:47:47 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:47:49 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:47:49 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
....Jan 28, 2017 9:48:08 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:48:09 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:48:10 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:48:10 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:48:12 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:48:13 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
.
----------------------------------------------------------------------
Ran 7 tests in 46.419s
OK
(eniric) ~/P/C/R/tabula-py ❯❯❯ java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
Java 1.7
(eniric) ~/P/C/R/tabula-py ❯❯❯ export PATH=/opt/java/jre1.7.0_79/bin:$PATH
(eniric) ~/P/C/R/tabula-py ❯❯❯ python tests/test_read_pdf_table.py
java -version
..Jan 28, 2017 9:50:05 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:50:07 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:50:09 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:50:09 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:50:13 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:50:13 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
....Jan 28, 2017 9:50:42 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:50:44 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:50:46 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:50:46 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:50:49 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:50:49 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
.
----------------------------------------------------------------------
Ran 7 tests in 79.421s
OK
(eniric) ~/P/C/R/tabula-py ❯❯❯ java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
from tabula-py.
Yes, the confirmed versions are enough. I have suggested the same to tabula-java.
from tabula-py.