
lukas-blecher / latex-ocr

10.8K stars · 66 watchers · 900 forks · 9.24 MB

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Home Page: https://lukas-blecher.github.io/LaTeX-OCR/

License: MIT License

Languages: Python 96.61%, JavaScript 1.75%, Jupyter Notebook 1.56%, Dockerfile 0.05%, Shell 0.02%
Topics: machine-learning, transformer, im2latex, deep-learning, image2text, latex, dataset, pytorch, im2markup, ocr

latex-ocr's Introduction

pix2tex - LaTeX OCR


The goal of this project is to create a learning-based system that takes an image of a math formula and returns the corresponding LaTeX code.


Using the model

To run the model you need Python 3.7+.

If you don't have PyTorch installed, follow the instructions here.

Install the package pix2tex:

pip install "pix2tex[gui]"

Model checkpoints will be downloaded automatically.

There are four ways to get a prediction from an image.

  1. You can use the command line tool by calling pix2tex. Here you can process existing images from disk as well as images in your clipboard.
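
    For example, assuming an image saved at path/to/formula.png (a hypothetical path; run pix2tex --help for the exact options in your version):

    pix2tex path/to/formula.png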

  2. Thanks to @katie-lim, you can use a nice user interface as a quick way to get the model's prediction. Just call the GUI with latexocr. From there you can take a screenshot, and the predicted LaTeX code is rendered using MathJax and copied to your clipboard.

    Under Linux, it is possible to use the GUI with gnome-screenshot (which comes with multi-monitor support) if gnome-screenshot was installed beforehand. For Wayland, grim and slurp will be used when they are both available. Note that gnome-screenshot is not compatible with wlroots-based Wayland compositors. Since gnome-screenshot is preferred when available, you may have to set the environment variable SCREENSHOT_TOOL to grim in this case (other available values are gnome-screenshot and pil).
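
    For example, on a wlroots-based compositor you could launch the GUI like this (a one-line sketch using the environment variable described above):

    SCREENSHOT_TOOL=grim latexocr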


    If the model is unsure about what's in the image, it might output a different prediction every time you click "Retry". With the temperature parameter you can control this behavior (a low temperature will produce the same result every time).

  3. You can use an API. This has additional dependencies. Install via pip install -U "pix2tex[api]" and run

    python -m pix2tex.api.run

    to start a Streamlit demo that connects to the API at port 8502. There is also a docker image available for the API: https://hub.docker.com/r/lukasblecher/pix2tex

    docker pull lukasblecher/pix2tex:api
    docker run --rm -p 8502:8502 lukasblecher/pix2tex:api
    

    To also run the Streamlit demo, run

    docker run --rm -it -p 8501:8501 --entrypoint python lukasblecher/pix2tex:api pix2tex/api/run.py
    

    and navigate to http://localhost:8501/
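
    If you'd rather query the API programmatically, here is a minimal client sketch. The endpoint path and upload field name are assumptions, not confirmed by this README; check the interactive docs the running API serves (FastAPI-style servers typically expose them at /docs) for the actual route.

    import requests

    # Hypothetical route and field name -- verify against the running API.
    with open('path/to/image.png', 'rb') as f:
        resp = requests.post('http://localhost:8502/predict/', files={'file': f})
    print(resp.text)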

  4. Use it from within Python:

    from PIL import Image
    from pix2tex.cli import LatexOCR
    
    img = Image.open('path/to/image.png')
    model = LatexOCR()
    print(model(img))

The model works best with images of smaller resolution. That's why I added a preprocessing step where another neural network predicts the optimal resolution of the input image. This model will automatically resize the custom image to best resemble the training data and thus increase performance on images found in the wild. Still, it's not perfect and might not handle huge images optimally, so don't zoom in all the way before taking a picture.

Always double-check the result carefully. You can try to redo the prediction with a different resolution if the answer was wrong.
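
If you do end up with a very large screenshot, a rough pre-scaling step like the sketch below can help. This is a minimal sketch, not part of the package; the 1024 px cap and the file path are arbitrary assumptions.

from PIL import Image
from pix2tex.cli import LatexOCR

model = LatexOCR()
img = Image.open('path/to/huge_screenshot.png')  # hypothetical path

# Cap the width before prediction; the built-in resizer still runs afterwards.
MAX_WIDTH = 1024  # arbitrary cap, tune to your inputs
if img.width > MAX_WIDTH:
    ratio = MAX_WIDTH / img.width
    img = img.resize((MAX_WIDTH, int(img.height * ratio)), Image.LANCZOS)
print(model(img))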

Want to use the package?

I'm putting together documentation right now.

Visit here: https://pix2tex.readthedocs.io/

Training the model

Install a couple of dependencies: pip install "pix2tex[train]".

  1. First we need to combine the images with their ground truth labels. I wrote a dataset class (which needs further improvement) that saves the relative paths to the images together with the LaTeX code they were rendered with. To generate the dataset pickle file, run
python -m pix2tex.dataset.dataset --equations path_to_textfile --images path_to_images --out dataset.pkl

To use your own tokenizer, pass it via --tokenizer (see below).

You can find my generated training data on Google Drive as well (formulae.zip - images, math.txt - labels). Repeat the step for the validation and test data; all use the same label text file.
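
For example (directory and file names here are hypothetical), the three splits could be generated from the same label file like this:

python -m pix2tex.dataset.dataset --equations math.txt --images train/ --out train.pkl
python -m pix2tex.dataset.dataset --equations math.txt --images val/ --out val.pkl
python -m pix2tex.dataset.dataset --equations math.txt --images test/ --out test.pkl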

  2. Edit the data (and valdata) entries in the config file to point to the newly generated .pkl files. Change other hyperparameters if you want to. See pix2tex/model/settings/config.yaml for a template.
  3. Now for the actual training, run
python -m pix2tex.train --config path_to_config_file

If you want to use your own data, you might be interested in creating your own tokenizer with

python -m pix2tex.dataset.dataset --equations path_to_textfile --vocab-size 8000 --out tokenizer.json

Don't forget to update the path to the tokenizer in the config file and set num_tokens to your vocabulary size.
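
If you're unsure what to set num_tokens to, you can read the vocabulary size straight out of the generated tokenizer file with the Hugging Face tokenizers package (a small sketch; the file name matches the command above):

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file('tokenizer.json')
print(tokenizer.get_vocab_size())  # use this value for num_tokens in the config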

Model

The model consists of a ViT [1] encoder with a ResNet backbone and a Transformer [2] decoder.
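
To make the data flow concrete, here is a purely conceptual PyTorch sketch of such a hybrid encoder-decoder. It is not the project's actual implementation; the ResNet variant, layer counts, and the omitted positional embeddings are all simplifications.

import torch
import torch.nn as nn
import torchvision

class HybridViTSketch(nn.Module):
    def __init__(self, d_model=256, num_tokens=8000):
        super().__init__()
        # CNN backbone: produces a feature map instead of raw pixel patches.
        resnet = torchvision.models.resnet18(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # drop pool/fc
        self.embed = nn.Conv2d(512, d_model, kernel_size=1)           # "patch" embedding
        enc = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        dec = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=4)
        self.decoder = nn.TransformerDecoder(dec, num_layers=4)
        self.token_emb = nn.Embedding(num_tokens, d_model)
        self.to_logits = nn.Linear(d_model, num_tokens)

    def forward(self, images, tgt_tokens):
        # images: (B, 3, H, W) float tensor; tgt_tokens: (B, T) token ids.
        feats = self.embed(self.backbone(images))                 # (B, d, H', W')
        memory = self.encoder(feats.flatten(2).transpose(1, 2))   # (B, H'*W', d)
        tgt = self.token_emb(tgt_tokens)                          # (B, T, d)
        T = tgt_tokens.size(1)                                    # causal mask
        mask = torch.triu(torch.full((T, T), float('-inf'), device=tgt.device), 1)
        out = self.decoder(tgt, memory, tgt_mask=mask)            # autoregressive
        return self.to_logits(out)                                # (B, T, num_tokens)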

Performance

BLEU score | normed edit distance | token accuracy
0.88       | 0.10                 | 0.60
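
For reference, "normed edit distance" is usually defined as the Levenshtein distance between prediction and ground truth divided by the reference length (lower is better). A self-contained sketch under that assumed definition:

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normed_edit_distance(pred: str, ref: str) -> float:
    return levenshtein(pred, ref) / max(len(ref), 1)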

Data

We need paired data for the network to learn. Luckily there is a lot of LaTeX code on the internet, e.g. on Wikipedia or arXiv. We also use the formulae from the im2latex-100k [3] dataset. All of it can be found here.

Dataset Requirements

In order to render the math in many different fonts, we use XeLaTeX, generate a PDF, and finally convert it to a PNG. The last step requires some third-party tools (e.g. ImageMagick with Ghostscript for the PDF-to-PNG conversion).
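
For instance, the PDF-to-PNG step could look like this with ImageMagick (an illustrative command, not the project's exact pipeline; file names are placeholders):

convert -density 200 formula.pdf formula.png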

Fonts

Latin Modern Math, GFSNeohellenicMath.otf, Asana Math, XITS Math, Cambria Math

TODO

  • add more evaluation metrics
  • create a GUI
  • add beam search
  • support handwritten formulae (kinda done, see training colab notebook)
  • reduce model size (distillation)
  • find optimal hyperparameters
  • tweak model structure
  • fix data scraping and scrape more data
  • trace the model (#2)

Contribution

Contributions of any kind are welcome.

Acknowledgment

Code taken and modified from lucidrains, rwightman, im2markup, arxiv_leaks, pkra: MathJax, harupy: snipping tool

References

[1] An Image is Worth 16x16 Words

[2] Attention Is All You Need

[3] Image-to-Markup Generation with Coarse-to-Fine Attention

latex-ocr's People

Contributors

frankfrank9, frankier, freed-wu, jcgoran, joepdejong, katie-lim, kxxt, llxlr, lukas-blecher, moetayuko, muyuuuu, r-haecker, rainyl, titc, yongwookha, zhouzq-thu


latex-ocr's Issues

snip failed

I found some problems that happened during the snipping process:

/Users/lmy86263/SourceTreeRepo/OCR-Latex/LaTeX-OCR/utils/utils.py:84: RuntimeWarning: invalid value encountered in true_divide
data = (data-data.min())/(data.max()-data.min())*255
/Users/lmy86263/SourceTreeRepo/OCR-Latex/LaTeX-OCR/utils/utils.py:94: RuntimeWarning: Degrees of freedom <= 0 for slice
if rect[..., -1].var() == 0:
/usr/local/lib/python3.8/site-packages/numpy/core/_methods.py:221: RuntimeWarning: invalid value encountered in true_divide
arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/usr/local/lib/python3.8/site-packages/numpy/core/_methods.py:253: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
height and width must be > 0

(the same block of warnings repeats four more times)

This problem causes the LaTeX prediction to fail.

"--no-cuda" does not work

When using the --no-cuda argument, it returns an error.

(env) λ python pix2tex.py --no-cuda
Traceback (most recent call last):
  File "H:\pytlat\ocr\pix2tex.py", line 84, in <module>
    args, model, tokenizer = initialize(args)
  File "H:\pytlat\ocr\pix2tex.py", line 33, in initialize
    model.load_state_dict(torch.load(args.checkpoint))
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 853, in _load
    result = unpickler.load()
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 845, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 834, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 151, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 135, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I am using torch 1.7+cpu; CUDA is not installed, so it can't be used.
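
As the error message itself suggests, the checkpoint was saved on a GPU and has to be remapped when loading on a CPU-only machine; a minimal sketch of that fix (the checkpoint path is a placeholder):

import torch

# Remap CUDA-saved tensors onto the CPU at load time.
state_dict = torch.load('checkpoints/weights.pth', map_location=torch.device('cpu'))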

The problem of mismatched evaluation metrics

Hi, thank you for your excellent work. I reproduced your work with the config file named default.yaml, but cannot get the same result (BLEU = 0.74). I also found that the training loss increased after a few epochs. Can you give some advice?

[attached screenshot]

Broken links in README

The Wikipedia and arXiv links under the Data header are broken. (They weren't prefixed with https://.)

How to get the data?

Data
We need paired data for the network to learn. Luckily there is a lot of LaTeX code on the internet, e.g. wikipedia, arXiv. We also use the formulae from the im2latex-100k dataset. All of it can be found here.

Where is the Wikipedia data, and how do I use it?
Where is the arXiv data, and how do I use it?

Deficiencies in recognition

It always recognizes 0 as () or O. It seems that the simpler the formula, the less likely it is to be predicted successfully.

test failed with this file: pix2tex.py

Hi, hello, I am a newbie. I would like to ask how to use pix2tex.py for testing. I entered the image path as shown in the figure below:

[attached WeChat screenshot]
Thank you very much for your help!

Return unbalanced latex equations

I tested with a lot of images, but for most of them it results in an unbalanced LaTeX equation. Do you have any idea how to resolve this?
I have attached the produced LaTeX equation and image for reference.

$$U_{s2A_{-}k}=\bar{\cal B}{s}{s\bar{G}^{+}}\cdot\bar{Y}{s2A{-}\bar{k}}+\bar{\cal B}{r}\frac{\displaystyle\frac{\displaystyle\cal L}{m}^{2}}{\displaystyle\frac{\displaystyle\cal L}{\displaystyle\frac{\displaystyle\cal E}{m}}{\displaystyle\cal P}{s}\cdot\bar{\cal A}{2}\cdot\bar{\cal I}{s22}\cdot\bar{Y}{s2\bar{\cal B}{-}\bar{\cal F}{s}}{\displaystyle\cal E}{s2\bar{\cal F}{-}\displaystyle\cal A}^{-}\left(\frac{\displaystyle\frac{\displaystyle\frac{\displaystyle\cal E}{m}\cdot\bar{\cal S}{s}}{\displaystyle\cal E}{s}^{2}}\right)\cdot\bar{\mathrm Y}{s4\bar{\cal B}{s2}\cdot\bar{\cal A}{s2}\displaystyle\cal E{s}\bar{\cal F}_{s}\right},$$
[attached image: patent_8_60]

Snipping failing everytime

I finally managed to get it to launch (I was getting torch errors), but it is showing "prediction failed" each and every time. Even expressions like 1=2-1 are not being captured. I don't think this software is this lame; there must be some problem.
Also, I am using it in a virtual environment, if that matters. I am attaching a screenshot with the error messages.

[attached screenshot]

Unexpected key(s) in state_dict and size mismatch when running python3 pix2tex.py

MacBook CPU inference

python3 pix2tex.py
/usr/local/lib/python3.9/site-packages/albumentations/augmentations/transforms.py:913: FutureWarning: This class has been deprecated. Please use ImageCompression
warnings.warn(
Traceback (most recent call last):
  File "/mnt/disk/LaTeX-OCR/pix2tex.py", line 144, in <module>
    args, *objs = initialize(arguments)
  File "/mnt/disk/LaTeX-OCR/pix2tex.py", line 55, in initialize
    model.load_state_dict(torch.load(args.checkpoint, map_location=args.device))
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
Unexpected key(s) in state_dict: "encoder.patch_embed.backbone.stem.conv.weight", "encoder.patch_embed.backbone.stem.norm.weight", "encoder.patch_embed.backbone.stem.norm.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.downsample.conv.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.downsample.norm.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.downsample.norm.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.conv1.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm1.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm1.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.conv2.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm2.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm2.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.conv3.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm3.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm3.bias", "encoder.patch_embed.backbone.stages.0.blocks.1.conv1.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm1.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm1.bias", "encoder.patch_embed.backbone.stages.0.blocks.1.conv2.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm2.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm2.bias", "encoder.patch_embed.backbone.stages.0.blocks.1.conv3.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm3.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.downsample.conv.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.downsample.norm.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.downsample.norm.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.1.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.1.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.1.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.2.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.2.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.2.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.downsample.conv.weight", 
"encoder.patch_embed.backbone.stages.2.blocks.0.downsample.norm.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.downsample.norm.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.1.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.1.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.1.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.2.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.2.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.2.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.3.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.3.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.3.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.4.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.4.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.4.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.5.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.5.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.5.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm3.bias", 
"encoder.patch_embed.backbone.stages.2.blocks.6.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.6.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.6.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm3.bias".
size mismatch for encoder.patch_embed.proj.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 1, 16, 16]).

Model trained with the latest commit seems not to be working

Hi, I retrained the model with your latest commit c7898ab, and when I tried to run pix2tex.py I got the errors below. It seems it is not able to load the trained model; do you have any idea about that?

Traceback (most recent call last):
  File "pix2tex.py", line 136, in <module>
    args, *objs = initialize(arguments)
  File "pix2tex.py", line 49, in initialize
    model.load_state_dict(torch.load(args.checkpoint, map_location=args.device))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1224, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Model:
Missing key(s) in state_dict: "decoder.net.attn_layers.layers.0.1.to_out.0.weight", "decoder.net.attn_layers.layers.0.1.to_out.0.bias", "decoder.net.attn_layers.layers.1.1.to_out.0.weight", "decoder.net.attn_layers.layers.1.1.to_out.0.bias", "decoder.net.attn_layers.layers.2.1.net.0.proj.weight", "decoder.net.attn_layers.layers.2.1.net.0.proj.bias", "decoder.net.attn_layers.layers.3.1.to_out.0.weight", "decoder.net.attn_layers.layers.3.1.to_out.0.bias", "decoder.net.attn_layers.layers.4.1.to_out.0.weight", "decoder.net.attn_layers.layers.4.1.to_out.0.bias", "decoder.net.attn_layers.layers.5.1.net.0.proj.weight", "decoder.net.attn_layers.layers.5.1.net.0.proj.bias", "decoder.net.attn_layers.layers.6.1.to_out.0.weight", "decoder.net.attn_layers.layers.6.1.to_out.0.bias", "decoder.net.attn_layers.layers.7.1.to_out.0.weight", "decoder.net.attn_layers.layers.7.1.to_out.0.bias", "decoder.net.attn_layers.layers.8.1.net.0.proj.weight", "decoder.net.attn_layers.layers.8.1.net.0.proj.bias", "decoder.net.attn_layers.layers.9.1.to_out.0.weight", "decoder.net.attn_layers.layers.9.1.to_out.0.bias", "decoder.net.attn_layers.layers.10.1.to_out.0.weight", "decoder.net.attn_layers.layers.10.1.to_out.0.bias", "decoder.net.attn_layers.layers.11.1.net.0.proj.weight", "decoder.net.attn_layers.layers.11.1.net.0.proj.bias".
Unexpected key(s) in state_dict: "encoder.patch_embed.backbone.stages.0.blocks.2.conv1.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm1.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm1.bias", "encoder.patch_embed.backbone.stages.0.blocks.2.conv2.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm2.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm2.bias", "encoder.patch_embed.backbone.stages.0.blocks.2.conv3.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm3.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.3.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.3.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.3.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.7.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.7.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.7.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.8.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.8.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.8.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm3.bias", "decoder.net.attn_layers.layers.0.1.to_out.weight", "decoder.net.attn_layers.layers.0.1.to_out.bias", "decoder.net.attn_layers.layers.1.1.to_out.weight", "decoder.net.attn_layers.layers.1.1.to_out.bias", "decoder.net.attn_layers.layers.2.1.net.0.0.weight", "decoder.net.attn_layers.layers.2.1.net.0.0.bias", "decoder.net.attn_layers.layers.3.1.to_out.weight", "decoder.net.attn_layers.layers.3.1.to_out.bias", "decoder.net.attn_layers.layers.4.1.to_out.weight", "decoder.net.attn_layers.layers.4.1.to_out.bias", "decoder.net.attn_layers.layers.5.1.net.0.0.weight", "decoder.net.attn_layers.layers.5.1.net.0.0.bias", "decoder.net.attn_layers.layers.6.1.to_out.weight", "decoder.net.attn_layers.layers.6.1.to_out.bias", "decoder.net.attn_layers.layers.7.1.to_out.weight", "decoder.net.attn_layers.layers.7.1.to_out.bias", "decoder.net.attn_layers.layers.8.1.net.0.0.weight", "decoder.net.attn_layers.layers.8.1.net.0.0.bias", "decoder.net.attn_layers.layers.9.1.to_out.weight", "decoder.net.attn_layers.layers.9.1.to_out.bias", "decoder.net.attn_layers.layers.10.1.to_out.weight", "decoder.net.attn_layers.layers.10.1.to_out.bias", "decoder.net.attn_layers.layers.11.1.net.0.0.weight", "decoder.net.attn_layers.layers.11.1.net.0.0.bias".

Feature extraction

Hi,

Can we use your trained model to extract math features from an image? I mean, can we take the layer before the prediction layer for feature extraction?

How to OCR a low-width LaTeX image

[attached images: 0123, 0123-result]

If the image width is too low, the OCR result will be useless.

I have tried to reduce patch_size to 8, but this error occurred:

Exception has occurred: RuntimeError: The size of tensor a (33) must match the size of tensor b (129) at non-singleton dimension 1
  File "F:\code\LaTeX-OCR\models.py", line 81, in forward_features
    x += self.pos_embed[:, pos_emb_ind]
  File "F:\code\LaTeX-OCR\train.py", line 48, in train
    encoded = encoder(im.to(device))
  File "F:\code\LaTeX-OCR\train.py", line 88, in <module>
    train(args)

I have struggled with this issue for several days. Please tell me what I can do in this situation.

Thank you very much!

What is the --tokenizer path_to_tokenizer?

What does the --tokenizer path_to_tokenizer argument refer to in the following command?
python dataset/dataset.py --equations path_to_textfile --images path_to_images --tokenizer path_to_tokenizer --out dataset.pkl

Compatibility : Missing key(s)

Hi, author.
Thanks for your work. It is very convenient for converting formulas to LaTeX code offline with this software.
I installed its requirements under Manjaro Linux today; however, it threw the error below when I executed the command python gui.py:

Traceback (most recent call last):
  File ".../LaTeX-OCR/gui.py", line 274, in <module>
    ex = App(arguments)
  File ".../LaTeX-OCR/gui.py", line 26, in __init__
    self.initModel()
  File ".../LaTeX-OCR/gui.py", line 33, in initModel
    args, *objs = pix2tex.initialize(self.args)
  File ".../LaTeX-OCR/pix2tex.py", line 55, in initialize
    model.load_state_dict(torch.load(args.checkpoint, map_location=args.device))
  File "/usr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
        Missing key(s) in state_dict: "decoder.net.attn_layers.layers.2.1.net.3.weight", "decoder.net.attn_layers.layers.2.1.net.3.bias", "decoder.net.attn_layers.layers.5.1.net.3.weight", "decoder.net.attn_layers.layers.5.1.net.3.bias", "decoder.net.attn_layers.layers.8.1.net.3.weight", "decoder.net.attn_layers.layers.8.1.net.3.bias", "decoder.net.attn_layers.layers.11.1.net.3.weight", "decoder.net.attn_layers.layers.11.1.net.3.bias". 
        Unexpected key(s) in state_dict: "decoder.net.attn_layers.layers.2.1.net.2.weight", "decoder.net.attn_layers.layers.2.1.net.2.bias", "decoder.net.attn_layers.layers.5.1.net.2.weight", "decoder.net.attn_layers.layers.5.1.net.2.bias", "decoder.net.attn_layers.layers.8.1.net.2.weight", "decoder.net.attn_layers.layers.8.1.net.2.bias", "decoder.net.attn_layers.layers.11.1.net.2.weight", "decoder.net.attn_layers.layers.11.1.net.2.bias". 

I would like to know what caused this problem and how I can run the code correctly.
Thanks.

Code understanding

I want to understand the whole codebase. Please direct me to where I can find help.

training time and equipment

Hi, thanks for sharing your code. I want to use this dataset to train a similar model, so I'd like to know how long your model was trained for and what kind of machine you used.

What settings to achieve BLEU: 0.88?

Hi Lukas,

Thanks for the work.

I trained on the same dataset you mentioned in the README.
But I only get BLEU: 0.719, ED: 3.18e-01. After that, the training diverges and the BLEU decreases.
I would like to reproduce your training to get that BLEU of 0.88.

Thanks,
Hung

gui.py cannot capture screen in the second monitor

Hi, thank you for the excellent work. I ran into some problems when using the GUI. I am using Ubuntu 20.04 with a KDE desktop. After starting the screen snip (button or Alt+S), I can only start the area selection in the main monitor, and the opacity only changes in the main monitor. If I press my mouse in the main screen and drag it to the second screen, the program can accurately select the area, but the selection rectangle only shows the main-screen part. I cannot select an area at all if I first press my mouse in the second screen.
When using the gui.py script, I slightly modified it, and I'm quite sure that these changes are not related to this problem. Anyway, here are the changes I made. ImageGrab from PIL kept throwing errors, so I changed from PIL import ImageGrab to import pyscreenshot as ImageGrab. I also removed the all_screens parameter in line 252, img = ImageGrab.grab(bbox=(x1, y1, x2, y2), all_screens=True), since this parameter is only available on Windows.

Model training speed

How fast should the training of the model be? I'm using the data provided in the Google Drive link and default.yaml. At the current rate, I need more than a day to train the model. Is there a way to shorten this considerably?

Runtime error while running

I am getting the following error:

RuntimeError: Error(s) in loading state_dict for ResNetV2:
  size mismatch for head.fc.weight: copying a param with shape torch.Size([21, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([22, 1024, 1, 1]).
  size mismatch for head.fc.bias: copying a param with shape torch.Size([21]) from checkpoint, the shape in current model is torch.Size([22]).

I don't really know how to resolve this issue; please help.

Generating the CROHME tokenizer.json gives an error; how to fix it?

(tf_1.12) root@f15b165683e6:/home/code/LaTeX-OCR# python dataset/dataset.py --equations latex-ocr-data/crohme/CROHME_math.txt --vocab-size 8000 --out crohme-tokenizer.json
Generate tokenizer
Traceback (most recent call last):
  File "dataset/dataset.py", line 244, in <module>
    generate_tokenizer(args.equations, args.out, args.vocab_size)
  File "dataset/dataset.py", line 228, in generate_tokenizer
    trainer = BpeTrainer(special_tokens=["[PAD]", "[BOS]", "[EOS]"], vocab_size=vocab_size, show_progress=True)
TypeError: 'str' object cannot be interpreted as an integer

How do I fix this?

[image_resizer.pth] RuntimeError: Error(s) in loading state_dict for Model

Hello, I tried using the image_resizer.pth weights because my data often has large image sizes, but when I changed the pix2tex checkpoint argument to image_resizer.pth, I got a runtime error, as shown in the screenshot I attached. Has anyone tried image_resizer.pth, or are there any suggestions to solve this issue?

[attached screenshot: Screen Shot 2021-12-06 at 15 16 43]

Thank you in advance.

Here's how I modified the checkpoint argument:
[attached screenshot: Screen Shot 2021-12-06 at 15 23 09]

Use GPU to train

Dear author:
Hello! I would like to ask how to use a GPU to train on my own dataset.

Installation Help

I'm trying to install using
pip install -r requirements.txt
but it seems like my computer is stuck in an infinite loop. I successfully installed PyTorch and Python 3.7 before running the requirements line. At first it was unsuccessful and the error message recommended trying --user. No dice. I appreciate your help! I'm excited to try out your code.

Is there any solution for bad prediction on images with a long width?

I'm working with your great code! Thanks :)

I already finished training with my own LaTeX data, and by using pix2tex.py I get output for my own test set.

Most of the test set is predicted well, but some images with a relatively long width are predicted badly.

Are there any tips for this problem (like using a smaller patch size, etc.)?

Can't snip outside window (i3-wm)

I just found your great tool, and it seems that the way you capture in a window does not work with tiling window managers like i3-wm. The snipping seems to open a separate window instead of creating a layer on top of all windows.

[attached screenshot: pix2tex]

Use model on Android?

Hi! Your model is working great on PC, but is it possible to use it on an Android device?
As far as I know, the model has to be converted to TorchScript format to work on a mobile device, but that's not enough. We also need to port the call_model function from the pix2tex.py script to the Android app, because the model requires a specific image resize to work. How can we do that? Thank you :)

Convert to onnx model

Your work is very helpful, thank you! But when I try to convert this PyTorch model to an ONNX file, I get some errors. Have you tried this? Thanks!

Model fails for a simple tex

Hi,

it's a great project, many thanks for it! The model needs some work though; I found it failing for relatively simple examples like this one: $\mathcal{X} \times \Theta \to \overbar{R}$, spitting out very different results in consecutive prediction attempts. Fundamentally this issue could be solved by the introduction of confidence thresholds...

Cheers,
Wojtek

gui does not show the original text

Hey guys,

thanks for working on this, it's a cool project. I have installed it and am using the GUI on Win10. Here is what I see:

[attached screenshot]

There is no upper part of the GUI. Perhaps it's just the copy of the screenshot, so it's nothing, but I'm just asking whether this might impact the functionality.

Finally, I would like to know how I can translate the LaTeX OCR output:

$\scriptstyle\pi(\infty);=;1\times1$

from

[attached image of the rendered formula]

back into a readable format.

Thanks!

Completely unusable

At first, when I tried a complicated formula, it would get stuck for a long time and then return the wrong result.

Later, I found that even the simplest cases are not recognized by this program.

[attached screenshot]

Getting unbalanced latex equations as results

Hi, I tried with your recent commit; this time I tested with different images, but it doesn't seem to fix the issue. I am still getting unbalanced LaTeX equations. You also mentioned that you trained a new model; did you upload it to the drive?

  • I also observed that when we try to predict the same image multiple times, it produces different LaTeX results. Is that the expected behaviour?

[attached image: patent_258_2]

Training?

Really nice work.
Can you please provide a proper pipeline for how to train with one's own data?
I tried to use your formula images and config (from Google Drive) to train but got the error
RuntimeError: merge_sort: failed to synchronize: device-side assert triggered
Can you help with training?
Thanks

Missing key(s) in state_dict

raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
Missing key(s) in state_dict:

Help to speed up inference processing

Hi authors.
Thank you so much for this awesome project. It works very well. Currently I have an issue with time consumption: when I run multiple cropped formula images (about 50 images), it takes about 9 s to process them all. I run the model via the call_model function in pix2tex.py on a 2080 Ti GPU. Do you have any ideas to speed up inference? Thank you so much.

Munch attribute error

[attached screenshot]

I get the above error when trying to run pix2tex.py. How can I resolve this? I am running on Windows 10.

Error: Index out of range in self during the model training

I tried to train the model on the CPU, but received the error below; I'm not sure what the cause could be.

Loss: 1.0180: 2%|█▉ | 421/18013 [09:41<6:45:16, 1.38s/it]
Traceback (most recent call last):
  File "train.py", line 94, in <module>
    train(args)
  File "train.py", line 52, in train
    loss = decoder(tgt_seq, mask=tgt_mask, context=encoded)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/x_transformers/autoregressive_wrapper.py", line 102, in forward
    out = self.net(xi, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/x_transformers/x_transformers.py", line 738, in forward
    x += self.pos_emb(x)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/x_transformers/x_transformers.py", line 107, in forward
    return self.emb(n)[None, :, :]
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 156, in forward
    return F.embedding(
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/functional.py", line 1916, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

No module named 'PyQt5.QtWebEngineWidgets'

I'm getting this error every time I try to run gui.py:

Traceback (most recent call last):
  File "gui.py", line 6, in <module>
    from PyQt5.QtWebEngineWidgets import QWebEngineView
ModuleNotFoundError: No module named 'PyQt5.QtWebEngineWidgets'

Regarding PyQt5-related packages, this is what I have installed:

PyQt5==5.15.5
PyQt5-Qt5==5.15.2
PyQt5-sip==12.9.0
PyQtWebEngine-Qt5==5.15.2
