
ocr_table_image's Introduction

Table OCR

This project uses cnocr to convert an image of a table into an Excel file.

Before: (image)

After: (image)

Line: (image)

Before: (image)

After: (image)

Introduction

ocr_resume.py and ocr_dataframe.py are two table OCR examples. Change the input image path in either script to generate a new Excel file.

The local_models directory contains a model trained for dataframe.jpg that recognizes numbers with high accuracy.

The sub_cut_df and sub_cut_resume directories contain the sub-images split from the input image.
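
The repository's own splitting code is not reproduced here. As a rough illustration of how a ruled table image can be cut into sub-images, the sketch below uses a simple projection profile: rows and columns that are mostly dark are treated as ruling lines, and the regions between them are cropped as cells. The function names, the threshold of 128, and the 0.8 fraction are assumptions made for this example, not values taken from ocr_resume.py or ocr_dataframe.py.

```python
import numpy as np


def line_spans(dark, axis, min_fraction=0.8):
    """Return (start, stop) index spans of ruling lines in a boolean ink mask.

    axis=1 averages across each row (finds horizontal lines);
    axis=0 averages down each column (finds vertical lines).
    """
    is_line = dark.mean(axis=axis) >= min_fraction
    spans, start = [], None
    for i, flag in enumerate(is_line):
        if flag and start is None:
            start = i                      # a line begins here
        elif not flag and start is not None:
            spans.append((start, i))       # the line ended at i - 1
            start = None
    if start is not None:
        spans.append((start, len(is_line)))
    return spans


def split_cells(gray, threshold=128):
    """Crop the regions between consecutive ruling lines of a grayscale image."""
    gray = np.asarray(gray)
    dark = gray < threshold                # True where the pixel is ink
    rows = line_spans(dark, axis=1)
    cols = line_spans(dark, axis=0)
    cells = []
    for (_, top), (bottom, _) in zip(rows, rows[1:]):
        row_cells = []
        for (_, left), (right, _) in zip(cols, cols[1:]):
            row_cells.append(gray[top:bottom, left:right])
        cells.append(row_cells)
    return cells
```

Each cropped cell could then be passed to an OCR model and the recognized text written to the matching spreadsheet cell. Real scans with skewed or broken lines would likely need a more tolerant line detector than this.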

Prerequisites

  1. Python 3
  2. cnocr
  3. scikit-image (skimage)
  4. NumPy
  5. Matplotlib

Author

Ivan Chen


ocr_table_image's Issues

Both Python files report the same error

[WARNING 2023-10-23 08:46:02,114 _showwarnmsg:109] I:\scrap\OCR_Table_Image-main\ocr_resume.py:285: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
plt.show()

[WARNING 2023-10-23 08:46:02,118 _showwarnmsg:109] I:\scrap\OCR_Table_Image-main\ocr_resume.py:288: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
plt.show()

[WARNING 2023-10-23 08:46:02,354 _showwarnmsg:109] I:\scrap\OCR_Table_Image-main\ocr_resume.py:383: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
plt.show()

[WARNING 2023-10-23 08:46:02,432 _showwarnmsg:109] I:\scrap\OCR_Table_Image-main\ocr_resume.py:412: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
plt.show()

[WARNING 2023-10-23 08:46:02,437 _showwarnmsg:109] I:\scrap\OCR_Table_Image-main\ocr_resume.py:415: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
plt.show()

Traceback (most recent call last):
File "H:\ProgramData\Anaconda3\envs\dyk\Lib\site-packages\PIL\JpegImagePlugin.py", line 647, in _save
rawmode = RAWMODE[im.mode]
~~~~~~~^^^^^^^^^
KeyError: 'F'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "I:\scrap\OCR_Table_Image-main\ocr_resume.py", line 610, in <module>
predict('resume.jpg', cut=False)
File "I:\scrap\OCR_Table_Image-main\ocr_resume.py", line 595, in predict
io.imsave(dir, img)
File "H:\ProgramData\Anaconda3\envs\dyk\Lib\site-packages\skimage\io\_io.py", line 143, in imsave
return call_plugin('imsave', fname, arr, plugin=plugin, **plugin_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\ProgramData\Anaconda3\envs\dyk\Lib\site-packages\skimage\io\manage_plugins.py", line 205, in call_plugin
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "H:\ProgramData\Anaconda3\envs\dyk\Lib\site-packages\imageio\v3.py", line 139, in imwrite
with imopen(
File "H:\ProgramData\Anaconda3\envs\dyk\Lib\site-packages\imageio\core\v3_plugin_api.py", line 367, in __exit__
self.close()
File "H:\ProgramData\Anaconda3\envs\dyk\Lib\site-packages\imageio\plugins\pillow.py", line 123, in close
self._flush_writer()
File "H:\ProgramData\Anaconda3\envs\dyk\Lib\site-packages\imageio\plugins\pillow.py", line 466, in _flush_writer
primary_image.save(self._request.get_file(), **self.save_args)
File "H:\ProgramData\Anaconda3\envs\dyk\Lib\site-packages\PIL\Image.py", line 2438, in save
save_handler(self, fp, filename)
File "H:\ProgramData\Anaconda3\envs\dyk\Lib\site-packages\PIL\JpegImagePlugin.py", line 650, in _save
raise OSError(msg) from e
OSError: cannot write mode F as JPEG
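
The final error says that Pillow was handed a floating-point image (mode F), which the JPEG format cannot store. One possible workaround, assuming the array passed to io.imsave is a float image, is to convert it to uint8 before saving. The helper name to_uint8 and the rule for telling a 0-1 range from a 0-255 range are assumptions made for this sketch, not code from the repository.

```python
import numpy as np


def to_uint8(img):
    """Convert a float image to uint8 so it can be written as JPEG."""
    arr = np.asarray(img)
    if arr.dtype == np.uint8:
        return arr
    arr = arr.astype(np.float64)
    # Heuristic (assumption): a maximum of at most 1.0 means normalized
    # intensities in [0, 1]; anything larger is treated as a 0-255 scale.
    if arr.size and arr.max() <= 1.0:
        arr = arr * 255.0
    # Round, clamp to the valid byte range, then change the dtype.
    return np.clip(np.rint(arr), 0, 255).astype(np.uint8)
```

With such a helper, the failing call in predict could be written as io.imsave(dir, to_uint8(img)). If the image is known to use a 0-255 scale, the range heuristic can be dropped and the array clipped and cast directly.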

How was result_resume _edit.xlsx produced, and other questions

Question 1:
The recognition run that produces result_resume.xlsx emits the following messages, although they do not affect the result.
UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
plt.show()
Lossy conversion from float64 to uint8. Range [0.0, 255.0]. Convert image to uint8 prior to saving to suppress this warning.
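
The Matplotlib warning appears because plt.show() has nothing to display when the non-interactive Agg backend is active. One way to keep the intermediate plots without a GUI is to write them to files instead. The sketch below assumes that approach; save_preview and the output file name are hypothetical and do not come from the repository.

```python
import os

import matplotlib

matplotlib.use("Agg")  # choose the file-only backend before pyplot is imported
import matplotlib.pyplot as plt


def save_preview(image, path):
    """Render a 2D array as a grayscale figure and save it instead of showing it."""
    fig, ax = plt.subplots()
    ax.imshow(image, cmap="gray")
    ax.axis("off")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)  # release the figure so repeated calls do not accumulate memory
    return os.path.abspath(path)
```

Replacing a plt.show() call with something like save_preview(img, "step1.png") avoids the warning. On a machine with a display, switching to an interactive backend is the alternative.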

Question 2: How was result_resume _edit.xlsx obtained? The recognition run produces result_resume.xlsx.

cnocr-related code

Could the author upload the complete code? The cnocr-related part is currently missing. Thanks!
