
Comments (6)

chezou commented on September 21, 2024

I added the confirmed Java versions to the README (afe4554).

Oracle Java 9 has not been released yet. What is more, this is a matter for tabula-java, not tabula-py. I think adding the confirmed versions is enough.

from tabula-py.

jason-neal commented on September 21, 2024

Note: this issue is resolved in the recently released Tabula 0.9.2.


jason-neal commented on September 21, 2024

Or would it be possible to add a check for the Java version that is about to be called, and raise an error if it is not a supported one?
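A check like the one suggested here could look roughly like the sketch below. It is only an illustration, not tabula-py's actual code: the function names and the `CONFIRMED_MAJOR_VERSIONS` tuple are hypothetical, and the tuple simply reflects the versions reported as working in this thread (Java 7 and 8). It relies on `java -version` writing its report to stderr, and it accepts both the legacy `1.8.0_92` style and strings such as `9-internal`.

```python
import re
import subprocess

# Assumption: the versions reported as working in this thread (Java 7 and 8).
CONFIRMED_MAJOR_VERSIONS = (7, 8)


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no version string found in: %r" % version_output)
    # Split on '.', '_', '+' and '-' so that both "1.8.0_92" and
    # "9-internal" yield a purely numeric first component.
    parts = re.split(r"[._+-]", match.group(1))
    major = int(parts[0])
    # Legacy scheme: "1.8.0_92" means Java 8, "1.7.0_79" means Java 7.
    if major == 1 and len(parts) > 1:
        major = int(parts[1])
    return major


def check_java_version(version_output, confirmed=CONFIRMED_MAJOR_VERSIONS):
    """Raise RuntimeError if the reported Java version is not confirmed."""
    major = parse_java_major(version_output)
    if major not in confirmed:
        raise RuntimeError(
            "Java %d is not a confirmed version (confirmed: %s)"
            % (major, ", ".join(str(v) for v in confirmed))
        )
    return major


def read_java_version_output(java="java"):
    """Run `java -version` and return its report as text."""
    # `java -version` prints to stderr, so merge stderr into the output.
    raw = subprocess.check_output([java, "-version"], stderr=subprocess.STDOUT)
    return raw.decode("utf-8", "replace")
```

A wrapper could then call `check_java_version(read_java_version_output())` once before launching the jar, so that an unconfirmed Java produces a readable error instead of a bare `CalledProcessError`.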


chezou commented on September 21, 2024

In my environment on macOS 10.12.2, tabula-py works with Java 8. I searched and found an issue from 2015 reporting that it works with Java 8. It seems to depend on your environment.

~/t/tabula-py (master=) (tabula) python tests/test_read_pdf_table.py                                           16:27:23
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
..Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Jan 28, 2017 4:27:37 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 4:27:37 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 4:27:38 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 4:27:38 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 4:27:39 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 4:27:39 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
.Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
.Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
.Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
.Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Jan 28, 2017 4:27:52 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 4:27:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 4:27:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 4:27:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 4:27:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 4:27:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
.
----------------------------------------------------------------------
Ran 7 tests in 28.430s

OK
~/t/tabula-py (master=) (tabula) java -version                                                                 16:27:55
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)


jason-neal commented on September 21, 2024

You are correct: Java 1.8 also works.

I am using 64-bit Linux and tried both Ubuntu and Fedora.

The test case fails for:

(eniric) ~/P/C/R/tabula-py ❯❯❯ java -version ⏎
openjdk version "9-internal"
OpenJDK Runtime Environment (build 9-internal+0-2016-04-14-195246.buildd.src)
OpenJDK 64-Bit Server VM (build 9-internal+0-2016-04-14-195246.buildd.src, mixed mode)

I have included the tests below.

openjdk version 9

(eniric) ~/P/C/R/tabula-py ❯❯❯ python tests/test_read_pdf_table.py
Exception in thread "main" java.lang.ExceptionInInitializerError
	at technology.tabula.CommandLineApp.buildOptions(CommandLineApp.java:262)
	at technology.tabula.CommandLineApp.main(CommandLineApp.java:44)
Caused by: java.lang.NumberFormatException: For input string: "9-internal"
	at java.lang.NumberFormatException.forInputString(java.base@9-internal/NumberFormatException.java:65)
	at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:695)
	at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:813)
	at technology.tabula.Utils.useCustomQuickSort(Utils.java:138)
	at technology.tabula.Utils.<clinit>(Utils.java:30)
	... 2 more
E.Exception in thread "main" java.lang.ExceptionInInitializerError
	at technology.tabula.CommandLineApp.buildOptions(CommandLineApp.java:262)
	at technology.tabula.CommandLineApp.main(CommandLineApp.java:44)
Caused by: java.lang.NumberFormatException: For input string: "9-internal"
	at java.lang.NumberFormatException.forInputString(java.base@9-internal/NumberFormatException.java:65)
	at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:695)
	at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:813)
	at technology.tabula.Utils.useCustomQuickSort(Utils.java:138)
	at technology.tabula.Utils.<clinit>(Utils.java:30)
	... 2 more
EException in thread "main" java.lang.ExceptionInInitializerError
	at technology.tabula.CommandLineApp.buildOptions(CommandLineApp.java:262)
	at technology.tabula.CommandLineApp.main(CommandLineApp.java:44)
Caused by: java.lang.NumberFormatException: For input string: "9-internal"
	at java.lang.NumberFormatException.forInputString(java.base@9-internal/NumberFormatException.java:65)
	at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:695)
	at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:813)
	at technology.tabula.Utils.useCustomQuickSort(Utils.java:138)
	at technology.tabula.Utils.<clinit>(Utils.java:30)
	... 2 more
EException in thread "main" java.lang.ExceptionInInitializerError
	at technology.tabula.CommandLineApp.buildOptions(CommandLineApp.java:262)
	at technology.tabula.CommandLineApp.main(CommandLineApp.java:44)
Caused by: java.lang.NumberFormatException: For input string: "9-internal"
	at java.lang.NumberFormatException.forInputString(java.base@9-internal/NumberFormatException.java:65)
	at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:695)
	at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:813)
	at technology.tabula.Utils.useCustomQuickSort(Utils.java:138)
	at technology.tabula.Utils.<clinit>(Utils.java:30)
	... 2 more
EException in thread "main" java.lang.ExceptionInInitializerError
	at technology.tabula.CommandLineApp.buildOptions(CommandLineApp.java:262)
	at technology.tabula.CommandLineApp.main(CommandLineApp.java:44)
Caused by: java.lang.NumberFormatException: For input string: "9-internal"
	at java.lang.NumberFormatException.forInputString(java.base@9-internal/NumberFormatException.java:65)
	at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:695)
	at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:813)
	at technology.tabula.Utils.useCustomQuickSort(Utils.java:138)
	at technology.tabula.Utils.<clinit>(Utils.java:30)
	... 2 more
EException in thread "main" java.lang.ExceptionInInitializerError
	at technology.tabula.CommandLineApp.buildOptions(CommandLineApp.java:262)
	at technology.tabula.CommandLineApp.main(CommandLineApp.java:44)
Caused by: java.lang.NumberFormatException: For input string: "9-internal"
	at java.lang.NumberFormatException.forInputString(java.base@9-internal/NumberFormatException.java:65)
	at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:695)
	at java.lang.Integer.parseInt(java.base@9-internal/Integer.java:813)
	at technology.tabula.Utils.useCustomQuickSort(Utils.java:138)
	at technology.tabula.Utils.<clinit>(Utils.java:30)
	... 2 more
E
======================================================================
ERROR: test_conver_from (__main__.TestReadPdfTable)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_read_pdf_table.py", line 47, in test_conver_from
    tabula.convert_into(pdf_path, temp.name, output_format='csv')
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/wrapper.py", line 108, in convert_into
    subprocess.check_output(args)
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-jar', '/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/tabula-0.9.1-jar-with-dependencies.jar', '--pages', '1', '--guess', '--format', 'CSV', '--outfile', '/tmp/jneal/tmpisip3jc3', 'tests/resources/data.pdf']' returned non-zero exit status 1

======================================================================
ERROR: test_convert_remote_file (__main__.TestReadPdfTable)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_read_pdf_table.py", line 57, in test_convert_remote_file
    tabula.convert_into(uri, temp.name, output_format='csv')
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/wrapper.py", line 108, in convert_into
    subprocess.check_output(args)
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-jar', '/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/tabula-0.9.1-jar-with-dependencies.jar', '--pages', '1', '--guess', '--format', 'CSV', '--outfile', '/tmp/jneal/tmpfdnllhk4', '3763.pdf']' returned non-zero exit status 1

======================================================================
ERROR: test_read_pdf (__main__.TestReadPdfTable)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_read_pdf_table.py", line 15, in test_read_pdf
    df = tabula.read_pdf(pdf_path)
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/wrapper.py", line 50, in read_pdf_table
    output = subprocess.check_output(args)
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-jar', '/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/tabula-0.9.1-jar-with-dependencies.jar', '--pages', '1', '--guess', 'tests/resources/data.pdf']' returned non-zero exit status 1

======================================================================
ERROR: test_read_pdf_into_json (__main__.TestReadPdfTable)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_read_pdf_table.py", line 27, in test_read_pdf_into_json
    json_data = tabula.read_pdf(pdf_path, output_format='json')
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/wrapper.py", line 50, in read_pdf_table
    output = subprocess.check_output(args)
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-jar', '/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/tabula-0.9.1-jar-with-dependencies.jar', '--pages', '1', '--guess', '--format', 'JSON', 'tests/resources/data.pdf']' returned non-zero exit status 1

======================================================================
ERROR: test_read_pdf_with_option (__main__.TestReadPdfTable)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_read_pdf_table.py", line 37, in test_read_pdf_with_option
    self.assertTrue(tabula.read_pdf(pdf_path, pages=1).equals(pd.read_csv(expected_csv1)))
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/wrapper.py", line 50, in read_pdf_table
    output = subprocess.check_output(args)
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-jar', '/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/tabula-0.9.1-jar-with-dependencies.jar', '--pages', '1', '--guess', 'tests/resources/data.pdf']' returned non-zero exit status 1

======================================================================
ERROR: test_read_remote_pdf (__main__.TestReadPdfTable)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_read_pdf_table.py", line 21, in test_read_remote_pdf
    df = tabula.read_pdf(uri)
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/wrapper.py", line 50, in read_pdf_table
    output = subprocess.check_output(args)
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/home/jneal/miniconda3/envs/eniric/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-jar', '/home/jneal/miniconda3/envs/eniric/lib/python3.5/site-packages/tabula/tabula-0.9.1-jar-with-dependencies.jar', '--pages', '1', '--guess', '3763.pdf']' returned non-zero exit status 1

----------------------------------------------------------------------
Ran 7 tests in 4.627s

FAILED (errors=6)
(eniric) ~/P/C/R/tabula-py ❯❯❯ java -version                                  ⏎
openjdk version "9-internal"
OpenJDK Runtime Environment (build 9-internal+0-2016-04-14-195246.buildd.src)
OpenJDK 64-Bit Server VM (build 9-internal+0-2016-04-14-195246.buildd.src, mixed mode)


Java 1.8

(eniric) ~/P/C/e/bin ❯❯❯ export PATH=/opt/java/jre1.8.0_111/bin/:$PATH
(eniric) ~/P/C/R/tabula-py ❯❯❯ python tests/test_read_pdf_table.py
..Jan 28, 2017 9:47:45 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:47:46 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:47:46 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:47:47 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:47:49 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:47:49 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
....Jan 28, 2017 9:48:08 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:48:09 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:48:10 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:48:10 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:48:12 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:48:13 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
.
----------------------------------------------------------------------
Ran 7 tests in 46.419s

OK
(eniric) ~/P/C/R/tabula-py ❯❯❯ java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

Java 1.7

(eniric) ~/P/C/R/tabula-py ❯❯❯ export PATH=/opt/java/jre1.7.0_79/bin:$PATH
(eniric) ~/P/C/R/tabula-py ❯❯❯ python tests/test_read_pdf_table.py
java -version
..Jan 28, 2017 9:50:05 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:50:07 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:50:09 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:50:09 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:50:13 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:50:13 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
....Jan 28, 2017 9:50:42 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:50:44 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:50:46 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:50:46 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Jan 28, 2017 9:50:49 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Jan 28, 2017 9:50:49 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
.
----------------------------------------------------------------------
Ran 7 tests in 79.421s

OK
(eniric) ~/P/C/R/tabula-py ❯❯❯ java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)


jason-neal commented on September 21, 2024

Yes, the confirmed versions are enough. I have suggested the same to tabula-java.
