Translating Math Formula Images To LaTeX Sequences

Scaling Up Image-to-LaTeX Performance: Sumen An End-to-End Transformer Model With Large Dataset.

Performance

Setup

To run the model you need Python >= 3.8:

conda create --name img2latex python=3.8 -y
conda activate img2latex

Install the environment with either pip:

pip install -r requirements.txt

or conda:

conda env create -f environment.yml

Uses

Available Model Checkpoint

We provide the Sumen base model (349M parameters) on Hugging Face. It can be downloaded at hoang-quoc-trung/sumen-base.
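One way to put the checkpoint into the src/checkpoints folder that the commands below pass as --ckpt is to download the Hugging Face repository with huggingface_hub's snapshot_download. This is a sketch under assumptions: it assumes the repository scripts can load a local copy of hoang-quoc-trung/sumen-base through --ckpt, and the helper name download_checkpoint is hypothetical.

```python
from pathlib import Path

REPO_ID = "hoang-quoc-trung/sumen-base"


def download_checkpoint(local_dir: str = "src/checkpoints", repo_id: str = REPO_ID) -> Path:
    """Hypothetical helper: download every file of a Hugging Face model repo into local_dir.

    Assumption: train.py / inference.py / test.py can load this folder via --ckpt.
    """
    # Imported lazily so this module can be imported even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the path of the local folder holding the files.
    path = snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return Path(path)
```

Calling download_checkpoint() once, with network access, fills src/checkpoints; whether the scripts accept that folder as-is is the assumption stated above.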

Training

python train.py --config_path src/config/base_config.yaml --resume_from_checkpoint true

arguments:
    -h, --help                   Show this help message and exit
    --config_path                Path to configuration file
    --resume_from_checkpoint     Continue training from saved checkpoint (true/false)
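The --resume_from_checkpoint flag takes the string true or false. A common pitfall when writing such a flag with argparse is type=bool, which turns any non-empty string, including "false", into True. The sketch below shows one way to parse it; it is an illustrative parser, not the actual contents of train.py, and the str2bool helper and both defaults are assumptions.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'true'/'false' style strings; argparse's type=bool would turn 'false' into True."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_train_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented train.py arguments (-h is added automatically)."""
    parser = argparse.ArgumentParser(description="Train Sumen (illustrative sketch)")
    parser.add_argument(
        "--config_path",
        default="src/config/base_config.yaml",  # assumed default, taken from the example command
        help="Path to configuration file",
    )
    parser.add_argument(
        "--resume_from_checkpoint",
        type=str2bool,
        default=False,  # assumed default
        help="Continue training from saved checkpoint (true/false)",
    )
    return parser


# Parse the same arguments as the example command above.
args = build_train_parser().parse_args(
    ["--config_path", "src/config/base_config.yaml", "--resume_from_checkpoint", "true"]
)
```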

Inference

python inference.py --input_image assets/example_1.png --ckpt src/checkpoints

arguments:
    -h, --help                   Show this help message and exit
    --input_image                Path to image file
    --ckpt                       Path to the checkpoint model
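inference.py takes one image per call. To run it over a whole folder, one option is to build one command per image and run each with subprocess. A minimal sketch, assuming inference.py sits in the current working directory; build_inference_commands, run_all, and the set of image suffixes are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Hypothetical set of file extensions treated as images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def build_inference_commands(image_dir: str, ckpt: str = "src/checkpoints") -> List[List[str]]:
    """Return one `python inference.py --input_image ... --ckpt ...` argv list per image, sorted by path."""
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    return [
        [sys.executable, "inference.py", "--input_image", str(p), "--ckpt", ckpt]
        for p in images
    ]


def run_all(image_dir: str, ckpt: str = "src/checkpoints") -> None:
    """Run inference.py once per image; check=True stops at the first failing call."""
    for cmd in build_inference_commands(image_dir, ckpt):
        subprocess.run(cmd, check=True)
```

For example, run_all("assets") would call the script on every image in the assets folder.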

Test

python test.py --config_path src/config/base_config.yaml --ckpt src/checkpoints

arguments:
    -h, --help                   Show this help message and exit
    --config_path                Path to configuration file
    --ckpt                       Path to the checkpoint model

Web Demo

streamlit run streamlit_app.py -- --ckpt src/checkpoints

arguments:
    -h, --help                   Show this help message and exit
    --ckpt                       Path to the checkpoint model

python gradio_app.py --ckpt src/checkpoints

arguments:
    -h, --help                   Show this help message and exit
    --ckpt                       Path to the checkpoint model

Dataset

The dataset is available here: Fusion Image To Latex Datasets

To prepare the data, save all images in one folder and set root in the config file to that folder's path. Then prepare a CSV file with 2 columns:
- image_filename: the name of the image file.
- latex: the LaTeX code.

Samples:

image_filename	latex
200922-1017-140.bmp	\sqrt { \frac { c } { N } }
78cd39ce-71fc-4c86-838a-defa185e0020.jpg	\lim_{w\to1}\cos{w}
KME2G3_19_sub_30.bmp	\sum _ { i = 2 n + 3 m } ^ { 1 0 } i x
1d801f89870fb81_basic.png	\sqrt { \varepsilon _ { \mathrm { L J } } / m \sigma ^ { 2 } }
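The annotation file described above can be read with Python's csv module. A minimal sketch, assuming a comma-separated file with a header row containing image_filename and latex; load_annotations is a hypothetical helper, not part of the repository. Because LaTeX often contains commas, writing the file with a CSV writer, which quotes such fields, keeps the columns aligned.

```python
import csv
from pathlib import Path
from typing import List, Tuple

REQUIRED_COLUMNS = ("image_filename", "latex")


def load_annotations(csv_path: str, image_root: str) -> List[Tuple[Path, str]]:
    """Read (image_path, latex) pairs and check that every referenced image exists under image_root."""
    root = Path(image_root)
    pairs = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"CSV is missing columns: {missing}")
        for row in reader:
            image_path = root / row["image_filename"]
            if not image_path.is_file():
                raise FileNotFoundError(str(image_path))
            pairs.append((image_path, row["latex"]))
    return pairs
```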

Random Output

import torch
from PIL import Image
from transformers import AutoProcessor, VisionEncoderDecoderModel

# Load model & processor
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = VisionEncoderDecoderModel.from_pretrained('hoang-quoc-trung/sumen-base').to(device)
processor = AutoProcessor.from_pretrained('hoang-quoc-trung/sumen-base')
task_prompt = processor.tokenizer.bos_token
decoder_input_ids = processor.tokenizer(
    task_prompt,
    add_special_tokens=False,
    return_tensors="pt"
).input_ids

# Load image
image_path = '/content/image42.png'  # replace with your local image path
image = Image.open(image_path).convert('RGB')
pixel_values = processor.image_processor(
    image,
    return_tensors="pt",
    data_format="channels_first",
).pixel_values

# Generate LaTeX expression
with torch.no_grad():
    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_length,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        num_beams=4,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )
sequence = processor.tokenizer.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(
    processor.tokenizer.eos_token, ""
).replace(
    processor.tokenizer.pad_token, ""
).replace(processor.tokenizer.bos_token, "")
print(sequence)

This is the output for the given image:
\operatorname* { l i m } _ { x \to \infty } \frac { \frac { d } { d x } \left( e ^ { x } + - 2 \frac { 2 } { x } \right) } { \frac { d } { d x } x ^ { - 2 } }

The expected output is $2 \times 10^{-3}$.

It gives the same output for every other image I upload as well. Kindly help.
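Two small helpers can make the decoding step in the script above more robust and help check results like the one reported here. The chained .replace() calls raise a TypeError if any special token is None (for example, a tokenizer without a pad token), and the model emits space-separated tokens such as "1 0 ^ { - 3 }", so a character-by-character comparison with a hand-written formula is misleading. A minimal sketch; the helper names are hypothetical, the <s>, </s>, <pad> strings used when testing are example tokens rather than necessarily Sumen's own, and the whitespace-insensitive comparison is a crude heuristic, not an evaluation metric used by the project.

```python
import re
from typing import Iterable, Optional


def strip_special_tokens(sequence: str, special_tokens: Iterable[Optional[str]]) -> str:
    """Remove the given special tokens from a decoded sequence and trim whitespace.

    None entries (e.g. a tokenizer without a pad token) are skipped, and longer tokens
    are removed first so that a token contained in another cannot split it.
    """
    for token in sorted((t for t in special_tokens if t), key=len, reverse=True):
        sequence = sequence.replace(token, "")
    return sequence.strip()


def same_latex(predicted: str, expected: str) -> bool:
    """Crude whitespace-insensitive comparison: '1 0 ^ { - 3 }' equals '10^{-3}'."""
    return re.sub(r"\s+", "", predicted) == re.sub(r"\s+", "", expected)
```

In the script, strip_special_tokens(sequence, [processor.tokenizer.bos_token, processor.tokenizer.eos_token, processor.tokenizer.pad_token]) could replace the chained .replace() calls; alternatively, batch_decode(outputs.sequences, skip_special_tokens=True) lets the tokenizer drop special tokens while decoding.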

hoang-quoc-trung / sumen