
lukas-blecher / latex-ocr

10.8K stars · 66 watchers · 900 forks · 9.24 MB

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Home Page: https://lukas-blecher.github.io/LaTeX-OCR/

License: MIT License

Languages: Python 96.61%, JavaScript 1.75%, Jupyter Notebook 1.56%, Dockerfile 0.05%, Shell 0.02%
Topics: machine-learning, transformer, im2latex, deep-learning, image2text, latex, dataset, pytorch, im2markup, ocr

latex-ocr's Introduction

pix2tex - LaTeX OCR


The goal of this project is to create a learning-based system that takes an image of a math formula and returns the corresponding LaTeX code.


Using the model

To run the model you need Python 3.7+.

If you don't have PyTorch installed, follow the instructions here.

Install the package pix2tex:

pip install "pix2tex[gui]"

Model checkpoints will be downloaded automatically.

There are four ways to get a prediction from an image.

  1. You can use the command line tool by calling pix2tex. Here you can process existing images from disk as well as images in your clipboard.
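
    For example, assuming an image saved at path/to/formula.png (a hypothetical path; run pix2tex --help for the exact options in your version):

    pix2tex path/to/formula.png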

  2. Thanks to @katie-lim, you can use a nice user interface as a quick way to get the model's prediction. Just call the GUI with latexocr. From there you can take a screenshot, and the predicted LaTeX code is rendered using MathJax and copied to your clipboard.

    Under Linux, it is possible to use the GUI with gnome-screenshot (which comes with multi-monitor support) if gnome-screenshot was installed beforehand. For Wayland, grim and slurp will be used when they are both available. Note that gnome-screenshot is not compatible with wlroots-based Wayland compositors. Since gnome-screenshot is preferred when available, you may have to set the environment variable SCREENSHOT_TOOL to grim in this case (other available values are gnome-screenshot and pil).
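
    For example, on a wlroots-based compositor you could launch the GUI like this (a one-line sketch using the environment variable described above):

    SCREENSHOT_TOOL=grim latexocr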


    If the model is unsure about what's in the image, it might output a different prediction every time you click "Retry". With the temperature parameter you can control this behavior (a low temperature will produce the same result every time).

  3. You can use an API. This has additional dependencies. Install via pip install -U "pix2tex[api]" and run

    python -m pix2tex.api.run

    to start a Streamlit demo that connects to the API at port 8502. There is also a docker image available for the API: https://hub.docker.com/r/lukasblecher/pix2tex

    docker pull lukasblecher/pix2tex:api
    docker run --rm -p 8502:8502 lukasblecher/pix2tex:api
    

    To also run the Streamlit demo, run

    docker run --rm -it -p 8501:8501 --entrypoint python lukasblecher/pix2tex:api pix2tex/api/run.py
    

    and navigate to http://localhost:8501/
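
    If you'd rather query the API programmatically, here is a minimal client sketch. The endpoint path and upload field name are assumptions, not confirmed by this README; check the interactive docs the running API serves (FastAPI-style servers typically expose them at /docs) for the actual route.

    import requests

    # Hypothetical route and field name -- verify against the running API.
    with open('path/to/image.png', 'rb') as f:
        resp = requests.post('http://localhost:8502/predict/', files={'file': f})
    print(resp.text)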

  4. Use it from within Python:

    from PIL import Image
    from pix2tex.cli import LatexOCR
    
    img = Image.open('path/to/image.png')
    model = LatexOCR()
    print(model(img))

The model works best with images of smaller resolution. That's why I added a preprocessing step where another neural network predicts the optimal resolution of the input image. This model will automatically resize the custom image to best resemble the training data and thus increase performance on images found in the wild. Still, it's not perfect and might not handle huge images optimally, so don't zoom in all the way before taking a picture.

Always double-check the result carefully. You can try to redo the prediction with a different resolution if the answer was wrong.
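
If you do end up with a very large screenshot, a rough pre-scaling step like the sketch below can help. This is a minimal sketch, not part of the package; the 1024 px cap and the file path are arbitrary assumptions.

from PIL import Image
from pix2tex.cli import LatexOCR

model = LatexOCR()
img = Image.open('path/to/huge_screenshot.png')  # hypothetical path

# Cap the width before prediction; the built-in resizer still runs afterwards.
MAX_WIDTH = 1024  # arbitrary cap, tune to your inputs
if img.width > MAX_WIDTH:
    ratio = MAX_WIDTH / img.width
    img = img.resize((MAX_WIDTH, int(img.height * ratio)), Image.LANCZOS)
print(model(img))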

Want to use the package?

I'm putting together documentation right now.

Visit here: https://pix2tex.readthedocs.io/

Training the model

Install a couple of dependencies: pip install "pix2tex[train]".

  1. First we need to combine the images with their ground truth labels. I wrote a dataset class (which needs further improvement) that saves the relative paths to the images together with the LaTeX code they were rendered with. To generate the dataset pickle file, run
python -m pix2tex.dataset.dataset --equations path_to_textfile --images path_to_images --out dataset.pkl

To use your own tokenizer, pass it via --tokenizer (see below).

You can find my generated training data on Google Drive as well (formulae.zip - images, math.txt - labels). Repeat the step for the validation and test data; all use the same label text file.
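
For example (directory and file names here are hypothetical), the three splits could be generated from the same label file like this:

python -m pix2tex.dataset.dataset --equations math.txt --images train/ --out train.pkl
python -m pix2tex.dataset.dataset --equations math.txt --images val/ --out val.pkl
python -m pix2tex.dataset.dataset --equations math.txt --images test/ --out test.pkl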

  2. Edit the data (and valdata) entries in the config file to point to the newly generated .pkl files. Change other hyperparameters if you want to. See pix2tex/model/settings/config.yaml for a template.
  3. Now for the actual training, run
python -m pix2tex.train --config path_to_config_file

If you want to use your own data, you might be interested in creating your own tokenizer with

python -m pix2tex.dataset.dataset --equations path_to_textfile --vocab-size 8000 --out tokenizer.json

Don't forget to update the path to the tokenizer in the config file and set num_tokens to your vocabulary size.
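
If you're unsure what to set num_tokens to, you can read the vocabulary size straight out of the generated tokenizer file with the Hugging Face tokenizers package (a small sketch; the file name matches the command above):

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file('tokenizer.json')
print(tokenizer.get_vocab_size())  # use this value for num_tokens in the config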

Model

The model consists of a ViT [1] encoder with a ResNet backbone and a Transformer [2] decoder.
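
To make the data flow concrete, here is a purely conceptual PyTorch sketch of such a hybrid encoder-decoder. It is not the project's actual implementation; the ResNet variant, layer counts, and the omitted positional embeddings are all simplifications.

import torch
import torch.nn as nn
import torchvision

class HybridViTSketch(nn.Module):
    def __init__(self, d_model=256, num_tokens=8000):
        super().__init__()
        # CNN backbone: produces a feature map instead of raw pixel patches.
        resnet = torchvision.models.resnet18(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # drop pool/fc
        self.embed = nn.Conv2d(512, d_model, kernel_size=1)           # "patch" embedding
        enc = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        dec = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=4)
        self.decoder = nn.TransformerDecoder(dec, num_layers=4)
        self.token_emb = nn.Embedding(num_tokens, d_model)
        self.to_logits = nn.Linear(d_model, num_tokens)

    def forward(self, images, tgt_tokens):
        # images: (B, 3, H, W) float tensor; tgt_tokens: (B, T) token ids.
        feats = self.embed(self.backbone(images))                 # (B, d, H', W')
        memory = self.encoder(feats.flatten(2).transpose(1, 2))   # (B, H'*W', d)
        tgt = self.token_emb(tgt_tokens)                          # (B, T, d)
        T = tgt_tokens.size(1)                                    # causal mask
        mask = torch.triu(torch.full((T, T), float('-inf'), device=tgt.device), 1)
        out = self.decoder(tgt, memory, tgt_mask=mask)            # autoregressive
        return self.to_logits(out)                                # (B, T, num_tokens)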

Performance

BLEU score | normed edit distance | token accuracy
0.88       | 0.10                 | 0.60
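
For reference, "normed edit distance" is usually defined as the Levenshtein distance between prediction and ground truth divided by the reference length (lower is better). A self-contained sketch under that assumed definition:

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normed_edit_distance(pred: str, ref: str) -> float:
    return levenshtein(pred, ref) / max(len(ref), 1)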

Data

We need paired data for the network to learn. Luckily there is a lot of LaTeX code on the internet, e.g. on Wikipedia or arXiv. We also use the formulae from the im2latex-100k [3] dataset. All of it can be found here.

Dataset Requirements

In order to render the math in many different fonts, we use XeLaTeX, generate a PDF, and finally convert it to a PNG. The last step requires some third-party tools (e.g. ImageMagick with Ghostscript for the PDF-to-PNG conversion).
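
For instance, the PDF-to-PNG step could look like this with ImageMagick (an illustrative command, not the project's exact pipeline; file names are placeholders):

convert -density 200 formula.pdf formula.png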

Fonts

Latin Modern Math, GFSNeohellenicMath.otf, Asana Math, XITS Math, Cambria Math

TODO

  • add more evaluation metrics
  • create a GUI
  • add beam search
  • support handwritten formulae (kinda done, see training colab notebook)
  • reduce model size (distillation)
  • find optimal hyperparameters
  • tweak model structure
  • fix data scraping and scrape more data
  • trace the model (#2)

Contribution

Contributions of any kind are welcome.

Acknowledgment

Code taken and modified from lucidrains, rwightman, im2markup, arxiv_leaks, pkra: MathJax, harupy: snipping tool

References

[1] An Image is Worth 16x16 Words

[2] Attention Is All You Need

[3] Image-to-Markup Generation with Coarse-to-Fine Attention

latex-ocr's People

Contributors

frankfrank9, frankier, freed-wu, jcgoran, joepdejong, katie-lim, kxxt, llxlr, lukas-blecher, moetayuko, muyuuuu, r-haecker, rainyl, titc, yongwookha, zhouzq-thu


latex-ocr's Issues

snip failed

I found some problems that happened during the snipping process:

/Users/lmy86263/SourceTreeRepo/OCR-Latex/LaTeX-OCR/utils/utils.py:84: RuntimeWarning: invalid value encountered in true_divide
data = (data-data.min())/(data.max()-data.min())*255
/Users/lmy86263/SourceTreeRepo/OCR-Latex/LaTeX-OCR/utils/utils.py:94: RuntimeWarning: Degrees of freedom <= 0 for slice
if rect[..., -1].var() == 0:
/usr/local/lib/python3.8/site-packages/numpy/core/_methods.py:221: RuntimeWarning: invalid value encountered in true_divide
arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/usr/local/lib/python3.8/site-packages/numpy/core/_methods.py:253: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
height and width must be > 0

(the same block of warnings repeats four more times)

This problem causes the LaTeX prediction to fail.

"--no-cuda" does not work

When using the --no-cuda argument, it returns an error.

(env) λ python pix2tex.py --no-cuda
Traceback (most recent call last):
  File "H:\pytlat\ocr\pix2tex.py", line 84, in <module>
    args, model, tokenizer = initialize(args)
  File "H:\pytlat\ocr\pix2tex.py", line 33, in initialize
    model.load_state_dict(torch.load(args.checkpoint))
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 853, in _load
    result = unpickler.load()
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 845, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 834, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 151, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 135, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I am using torch 1.7+cpu; CUDA is not installed, so it can't be used.
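
As the error message itself suggests, the checkpoint was saved on a GPU and has to be remapped when loading on a CPU-only machine; a minimal sketch of that fix (the checkpoint path is a placeholder):

import torch

# Remap CUDA-saved tensors onto the CPU at load time.
state_dict = torch.load('checkpoints/weights.pth', map_location=torch.device('cpu'))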

The problem of mismatched evaluation metrics

Hi, thank you for your excellent work. I reproduced your work with the config file named default.yaml, but cannot get the same result (BLEU = 0.74). I also found that the training loss increased after a few epochs. Can you give some advice?

[attached screenshot]

Broken links in README

The Wikipedia and arXiv links under the Data header are broken. (They weren't prefixed with https://.)

How to get the data?

Data
We need paired data for the network to learn. Luckily there is a lot of LaTeX code on the internet, e.g. wikipedia, arXiv. We also use the formulae from the im2latex-100k dataset. All of it can be found here.

Where is the Wikipedia data, and how do I use it?
Where is the arXiv data, and how do I use it?

Deficiencies in recognition

It always recognizes 0 as () or O. It seems that the simpler the formula, the less likely it is to be predicted successfully.

test failed with this file: pix2tex.py

Hi, hello, I am a newbie. I would like to ask how to use pix2tex.py for testing. I entered the image path as shown in the figure below:

[attached WeChat screenshot]
Thank you very much for your help!

Return unbalanced latex equations

I tested with a lot of images, but for most of them it results in an unbalanced LaTeX equation. Do you have any idea how to resolve this?
I have attached the produced LaTeX equation and image for reference.

$$U_{s2A_{-}k}=\bar{\cal B}{s}{s\bar{G}^{+}}\cdot\bar{Y}{s2A{-}\bar{k}}+\bar{\cal B}{r}\frac{\displaystyle\frac{\displaystyle\cal L}{m}^{2}}{\displaystyle\frac{\displaystyle\cal L}{\displaystyle\frac{\displaystyle\cal E}{m}}{\displaystyle\cal P}{s}\cdot\bar{\cal A}{2}\cdot\bar{\cal I}{s22}\cdot\bar{Y}{s2\bar{\cal B}{-}\bar{\cal F}{s}}{\displaystyle\cal E}{s2\bar{\cal F}{-}\displaystyle\cal A}^{-}\left(\frac{\displaystyle\frac{\displaystyle\frac{\displaystyle\cal E}{m}\cdot\bar{\cal S}{s}}{\displaystyle\cal E}{s}^{2}}\right)\cdot\bar{\mathrm Y}{s4\bar{\cal B}{s2}\cdot\bar{\cal A}{s2}\displaystyle\cal E{s}\bar{\cal F}_{s}\right},$$
[attached image: patent_8_60]

Snipping failing everytime

I finally managed to get it to launch (I was getting torch errors), but it is showing "prediction failed" each and every time. Even expressions like 1=2-1 are not being captured. I don't think this software is this lame; there must be some problem.
Also, I am using it in a virtual environment, if that matters. I am attaching a screenshot with the error messages.

[attached screenshot]

Unexpected key(s) in state_dict and size mismatch when running python3 pix2tex.py

MacBook CPU inference

python3 pix2tex.py
/usr/local/lib/python3.9/site-packages/albumentations/augmentations/transforms.py:913: FutureWarning: This class has been deprecated. Please use ImageCompression
warnings.warn(
Traceback (most recent call last):
  File "/mnt/disk/LaTeX-OCR/pix2tex.py", line 144, in <module>
    args, *objs = initialize(arguments)
  File "/mnt/disk/LaTeX-OCR/pix2tex.py", line 55, in initialize
    model.load_state_dict(torch.load(args.checkpoint, map_location=args.device))
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
Unexpected key(s) in state_dict: "encoder.patch_embed.backbone.stem.conv.weight", "encoder.patch_embed.backbone.stem.norm.weight", "encoder.patch_embed.backbone.stem.norm.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.downsample.conv.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.downsample.norm.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.downsample.norm.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.conv1.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm1.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm1.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.conv2.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm2.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm2.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.conv3.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm3.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm3.bias", "encoder.patch_embed.backbone.stages.0.blocks.1.conv1.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm1.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm1.bias", "encoder.patch_embed.backbone.stages.0.blocks.1.conv2.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm2.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm2.bias", "encoder.patch_embed.backbone.stages.0.blocks.1.conv3.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm3.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.downsample.conv.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.downsample.norm.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.downsample.norm.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.1.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.1.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.1.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.2.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.2.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.2.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.downsample.conv.weight", 
"encoder.patch_embed.backbone.stages.2.blocks.0.downsample.norm.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.downsample.norm.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.1.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.1.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.1.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.2.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.2.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.2.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.3.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.3.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.3.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.4.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.4.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.4.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.5.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.5.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.5.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm3.bias", 
"encoder.patch_embed.backbone.stages.2.blocks.6.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.6.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.6.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm3.bias".
size mismatch for encoder.patch_embed.proj.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 1, 16, 16]).

Model trained with the latest commit seems not to be working

Hi, I retrained the model with your latest commit c7898ab, and when I tried to run pix2tex.py I got the errors below. It seems it is not able to load the trained model; do you have any idea about that?

Traceback (most recent call last):
  File "pix2tex.py", line 136, in <module>
    args, *objs = initialize(arguments)
  File "pix2tex.py", line 49, in initialize
    model.load_state_dict(torch.load(args.checkpoint, map_location=args.device))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1224, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Model:
Missing key(s) in state_dict: "decoder.net.attn_layers.layers.0.1.to_out.0.weight", "decoder.net.attn_layers.layers.0.1.to_out.0.bias", "decoder.net.attn_layers.layers.1.1.to_out.0.weight", "decoder.net.attn_layers.layers.1.1.to_out.0.bias", "decoder.net.attn_layers.layers.2.1.net.0.proj.weight", "decoder.net.attn_layers.layers.2.1.net.0.proj.bias", "decoder.net.attn_layers.layers.3.1.to_out.0.weight", "decoder.net.attn_layers.layers.3.1.to_out.0.bias", "decoder.net.attn_layers.layers.4.1.to_out.0.weight", "decoder.net.attn_layers.layers.4.1.to_out.0.bias", "decoder.net.attn_layers.layers.5.1.net.0.proj.weight", "decoder.net.attn_layers.layers.5.1.net.0.proj.bias", "decoder.net.attn_layers.layers.6.1.to_out.0.weight", "decoder.net.attn_layers.layers.6.1.to_out.0.bias", "decoder.net.attn_layers.layers.7.1.to_out.0.weight", "decoder.net.attn_layers.layers.7.1.to_out.0.bias", "decoder.net.attn_layers.layers.8.1.net.0.proj.weight", "decoder.net.attn_layers.layers.8.1.net.0.proj.bias", "decoder.net.attn_layers.layers.9.1.to_out.0.weight", "decoder.net.attn_layers.layers.9.1.to_out.0.bias", "decoder.net.attn_layers.layers.10.1.to_out.0.weight", "decoder.net.attn_layers.layers.10.1.to_out.0.bias", "decoder.net.attn_layers.layers.11.1.net.0.proj.weight", "decoder.net.attn_layers.layers.11.1.net.0.proj.bias".
Unexpected key(s) in state_dict: "encoder.patch_embed.backbone.stages.0.blocks.2.conv1.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm1.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm1.bias", "encoder.patch_embed.backbone.stages.0.blocks.2.conv2.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm2.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm2.bias", "encoder.patch_embed.backbone.stages.0.blocks.2.conv3.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm3.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.3.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.3.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.3.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.7.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.7.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.7.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.8.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.8.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.8.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm3.bias", "decoder.net.attn_layers.layers.0.1.to_out.weight", "decoder.net.attn_layers.layers.0.1.to_out.bias", "decoder.net.attn_layers.layers.1.1.to_out.weight", "decoder.net.attn_layers.layers.1.1.to_out.bias", "decoder.net.attn_layers.layers.2.1.net.0.0.weight", "decoder.net.attn_layers.layers.2.1.net.0.0.bias", "decoder.net.attn_layers.layers.3.1.to_out.weight", "decoder.net.attn_layers.layers.3.1.to_out.bias", "decoder.net.attn_layers.layers.4.1.to_out.weight", "decoder.net.attn_layers.layers.4.1.to_out.bias", "decoder.net.attn_layers.layers.5.1.net.0.0.weight", "decoder.net.attn_layers.layers.5.1.net.0.0.bias", "decoder.net.attn_layers.layers.6.1.to_out.weight", "decoder.net.attn_layers.layers.6.1.to_out.bias", "decoder.net.attn_layers.layers.7.1.to_out.weight", "decoder.net.attn_layers.layers.7.1.to_out.bias", "decoder.net.attn_layers.layers.8.1.net.0.0.weight", "decoder.net.attn_layers.layers.8.1.net.0.0.bias", "decoder.net.attn_layers.layers.9.1.to_out.weight", "decoder.net.attn_layers.layers.9.1.to_out.bias", "decoder.net.attn_layers.layers.10.1.to_out.weight", "decoder.net.attn_layers.layers.10.1.to_out.bias", "decoder.net.attn_layers.layers.11.1.net.0.0.weight", "decoder.net.attn_layers.layers.11.1.net.0.0.bias".

Feature extraction

Hi,

Can we use your trained model to extract math features from an image? I mean, can we take the layer before the prediction layer for feature extraction?

How to OCR a low-width LaTeX image

[attached images: 0123, 0123-result]

If the image width is too low, the OCR result will be useless.

I have tried to reduce patch_size to 8, but this error occurred:

Exception has occurred: RuntimeError: The size of tensor a (33) must match the size of tensor b (129) at non-singleton dimension 1
  File "F:\code\LaTeX-OCR\models.py", line 81, in forward_features
    x += self.pos_embed[:, pos_emb_ind]
  File "F:\code\LaTeX-OCR\train.py", line 48, in train
    encoded = encoder(im.to(device))
  File "F:\code\LaTeX-OCR\train.py", line 88, in <module>
    train(args)

I have struggled with this issue for several days. Please tell me what I can do in this situation.

Thank you very much!

What is the --tokenizer path_to_tokenizer?

What does the --tokenizer path_to_tokenizer argument refer to in the following command?
python dataset/dataset.py --equations path_to_textfile --images path_to_images --tokenizer path_to_tokenizer --out dataset.pkl

Compatibility : Missing key(s)

Hi, author.
Thanks for your work. It is very convenient for converting formulas to LaTeX code offline with this software.
I installed its requirements under Manjaro Linux today; however, it threw the error below when I executed the command python gui.py:

Traceback (most recent call last):
  File ".../LaTeX-OCR/gui.py", line 274, in <module>
    ex = App(arguments)
  File ".../LaTeX-OCR/gui.py", line 26, in __init__
    self.initModel()
  File ".../LaTeX-OCR/gui.py", line 33, in initModel
    args, *objs = pix2tex.initialize(self.args)
  File ".../LaTeX-OCR/pix2tex.py", line 55, in initialize
    model.load_state_dict(torch.load(args.checkpoint, map_location=args.device))
  File "/usr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
        Missing key(s) in state_dict: "decoder.net.attn_layers.layers.2.1.net.3.weight", "decoder.net.attn_layers.layers.2.1.net.3.bias", "decoder.net.attn_layers.layers.5.1.net.3.weight", "decoder.net.attn_layers.layers.5.1.net.3.bias", "decoder.net.attn_layers.layers.8.1.net.3.weight", "decoder.net.attn_layers.layers.8.1.net.3.bias", "decoder.net.attn_layers.layers.11.1.net.3.weight", "decoder.net.attn_layers.layers.11.1.net.3.bias". 
        Unexpected key(s) in state_dict: "decoder.net.attn_layers.layers.2.1.net.2.weight", "decoder.net.attn_layers.layers.2.1.net.2.bias", "decoder.net.attn_layers.layers.5.1.net.2.weight", "decoder.net.attn_layers.layers.5.1.net.2.bias", "decoder.net.attn_layers.layers.8.1.net.2.weight", "decoder.net.attn_layers.layers.8.1.net.2.bias", "decoder.net.attn_layers.layers.11.1.net.2.weight", "decoder.net.attn_layers.layers.11.1.net.2.bias". 

I would like to know what caused this problem and how I can run the code correctly.
Thanks.

Code understanding

I want to understand the whole codebase. Please direct me to where I can find help.

training time and equipment

Hi, thanks for sharing your code. I want to use this dataset to train a similar model, so I'd like to know how long your model was trained for and what kind of machine you used.

What settings to achieve BLEU: 0.88?

Hi Lukas,

Thanks for the work.

I trained on the same dataset you mentioned in the README.
But I only get BLEU: 0.719, ED: 3.18e-01. After that, the training diverges and the BLEU decreases.
I would like to reproduce your training to get that BLEU of 0.88.

Thanks,
Hung

gui.py cannot capture screen in the second monitor

Hi, thank you for the excellent work. I ran into some problems when using the GUI. I am using Ubuntu 20.04 with a KDE desktop. After starting the screen snip (button or Alt+S), I can only start the area selection in the main monitor, and the opacity only changes in the main monitor. If I press my mouse in the main screen and drag it to the second screen, the program can accurately select the area, but the selection rectangle only shows the main-screen part. I cannot select an area at all if I first press my mouse in the second screen.
When using the gui.py script, I slightly modified it, and I'm quite sure that these changes are not related to this problem. Anyway, here are the changes I made. ImageGrab from PIL kept throwing errors, so I changed from PIL import ImageGrab to import pyscreenshot as ImageGrab. I also removed the all_screens parameter in line 252, img = ImageGrab.grab(bbox=(x1, y1, x2, y2), all_screens=True), since this parameter is only available on Windows.

Model training speed

How fast should the training of the model be? I'm using the data provided in the Google Drive link and default.yaml. At the current rate, I need more than a day to train the model. Is there a way to shorten this considerably?

Runtime error while running

I am getting the following error:

RuntimeError: Error(s) in loading state_dict for ResNetV2:
  size mismatch for head.fc.weight: copying a param with shape torch.Size([21, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([22, 1024, 1, 1]).
  size mismatch for head.fc.bias: copying a param with shape torch.Size([21]) from checkpoint, the shape in current model is torch.Size([22]).

I don't really know how to resolve this issue; please help.

Generating the CROHME tokenizer.json gives an error; how to fix it?

(tf_1.12) root@f15b165683e6:/home/code/LaTeX-OCR# python dataset/dataset.py --equations latex-ocr-data/crohme/CROHME_math.txt --vocab-size 8000 --out crohme-tokenizer.json
Generate tokenizer
Traceback (most recent call last):
  File "dataset/dataset.py", line 244, in <module>
    generate_tokenizer(args.equations, args.out, args.vocab_size)
  File "dataset/dataset.py", line 228, in generate_tokenizer
    trainer = BpeTrainer(special_tokens=["[PAD]", "[BOS]", "[EOS]"], vocab_size=vocab_size, show_progress=True)
TypeError: 'str' object cannot be interpreted as an integer

How do I fix this?

[image_resizer.pth] RuntimeError: Error(s) in loading state_dict for Model

Hello, I tried using the image_resizer.pth weights because my data often has large image sizes, but when I changed the pix2tex checkpoint argument to image_resizer.pth, I got a runtime error, as shown in the screenshot I attached. Has anyone tried image_resizer.pth, or are there any suggestions to solve this issue?

[attached screenshot: Screen Shot 2021-12-06 at 15 16 43]

Thank you in advance.

Here's how I modified the checkpoint argument:
[attached screenshot: Screen Shot 2021-12-06 at 15 23 09]

Use GPU to train

Dear author:
Hello! I would like to ask how to use a GPU to train on my own dataset.

Installation Help

I'm trying to install using
pip install -r requirements.txt
but it seems like my computer is stuck in an infinite loop. I successfully installed PyTorch and Python 3.7 before running the requirements line. At first it was unsuccessful and the error message recommended trying --user. No dice. I appreciate your help! I'm excited to try out your code.

Is there any solution for bad prediction on images with a long width?

I'm working with your great code! Thanks :)

I already finished training with my own LaTeX data, and by using pix2tex.py I get output for my own test set.

Most of the test set is predicted well, but some images with a relatively long width are predicted badly.

Are there any tips for this problem (like using a smaller patch size, etc.)?

Can't snip outside window (i3-wm)

I just found your great tool, and it seems that the way you capture in a window does not work with tiling window managers like i3-wm. The snipping seems to open a separate window instead of creating a layer on top of all windows.

[attached screenshot: pix2tex]

Use model on Android?

Hi! Your model is working great on PC, but is it possible to use it on an Android device?
As far as I know, the model has to be converted to TorchScript format to work on a mobile device, but that's not enough. We also need to port the call_model function from the pix2tex.py script to the Android app, because the model requires a specific image resize to work. How can we do that? Thank you :)

Convert to onnx model

Your work is very helpful, thank you! But when I try to convert this PyTorch model to an ONNX file, I get some errors. Have you tried this? Thanks!

Model fails for a simple tex

Hi,

it's a great project, many thanks for it! The model needs some work though; I found it failing for relatively simple examples like this one: $\mathcal{X} \times \Theta \to \overbar{R}$, spitting out very different results in consecutive prediction attempts. Fundamentally this issue could be solved by the introduction of confidence thresholds...

Cheers,
Wojtek

gui does not show the original text

Hey guys,

thanks for working on this, it's a cool project. I have installed it and am using the GUI on Win10. Here is what I see:

[attached screenshot]

There is no upper part of the GUI. Perhaps it's just the copy of the screenshot, so it's nothing, but I'm just asking whether this might impact the functionality.

Finally, I would like to know how I can translate the LaTeX OCR output:

$\scriptstyle\pi(\infty);=;1\times1$

from

[attached image of the rendered formula]

back into a readable format.

Thanks!

Completely unusable

At first, when I tried a complicated formula, it would get stuck for a long time and then return the wrong result.

Later, I found that even the simplest cases are not recognized by this program.

[attached screenshot]

Getting unbalanced latex equations as results

Hi, I tried with your recent commit; this time I tested with different images, but it doesn't seem to fix the issue. I am still getting unbalanced LaTeX equations. You also mentioned that you trained a new model; did you upload it to the drive?

  • I also observed that when we try to predict the same image multiple times, it produces different LaTeX results. Is that the expected behaviour?

[attached image: patent_258_2]

Training?

Really nice work.
Can you please provide a proper pipeline for how to train with one's own data?
I tried to use your formula images and config (from Google Drive) to train but got the error
RuntimeError: merge_sort: failed to synchronize: device-side assert triggered
Can you help with training?
Thanks

Missing key(s) in state_dict

raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
Missing key(s) in state_dict:

Help to speed up inference processing

Hi authors.
Thank you so much for this awesome project. It works very well. Currently I have an issue with time consumption: when I run multiple cropped formula images (about 50 images), it takes about 9 s to process them all. I run the model via the call_model function in pix2tex.py on a 2080 Ti GPU. Do you have any ideas to speed up inference? Thank you so much.

Munch attribute error

[attached screenshot]

I get the above error when trying to run pix2tex.py. How can I resolve this? I am running on Windows 10.

Error: Index out of range in self during the model training

I tried to train the model on the CPU, but received the error below; I'm not sure what the cause could be.

Loss: 1.0180: 2%|█▉ | 421/18013 [09:41<6:45:16, 1.38s/it]
Traceback (most recent call last):
  File "train.py", line 94, in <module>
    train(args)
  File "train.py", line 52, in train
    loss = decoder(tgt_seq, mask=tgt_mask, context=encoded)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/x_transformers/autoregressive_wrapper.py", line 102, in forward
    out = self.net(xi, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/x_transformers/x_transformers.py", line 738, in forward
    x += self.pos_emb(x)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/x_transformers/x_transformers.py", line 107, in forward
    return self.emb(n)[None, :, :]
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 156, in forward
    return F.embedding(
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/functional.py", line 1916, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

No module named 'PyQt5.QtWebEngineWidgets'

I'm getting this error every time I try to run gui.py:

Traceback (most recent call last):
  File "gui.py", line 6, in <module>
    from PyQt5.QtWebEngineWidgets import QWebEngineView
ModuleNotFoundError: No module named 'PyQt5.QtWebEngineWidgets'

Regarding PyQt5-related packages, this is what I have installed:

PyQt5==5.15.5
PyQt5-Qt5==5.15.2
PyQt5-sip==12.9.0
PyQtWebEngine-Qt5==5.15.2
