
sumen's Introduction

Translating Math Formula Images To LaTeX Sequences

Scaling Up Image-to-LaTeX Performance: Sumen An End-to-End Transformer Model With Large Dataset.

Performance

Setup

  • To run the model you need Python >= 3.8:

    conda create --name img2latex python=3.8 -y
  • Install the dependencies:

    pip install -r requirements.txt

    or

    conda env create -f environment.yml

Usage

Available Model Checkpoint

We provide the Sumen base model (349M parameters) on Hugging Face; it can be downloaded from hoang-quoc-trung/sumen-base.

Training

python train.py --config_path src/config/base_config.yaml --resume_from_checkpoint true

arguments:
    -h, --help                   Show this help message and exit
    --config_path                Path to configuration file
    --resume_from_checkpoint     Continue training from saved checkpoint (true/false)
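
The flags above could be parsed along the following lines. This is a minimal sketch, not the repository's actual train.py: the helper names `str2bool` and `build_parser` and the default of `False` are assumptions made for illustration. It shows one way to accept a literal true/false value, since argparse's `type=bool` would treat any non-empty string (including "false") as True.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert 'true'/'false' style strings into a real boolean."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags; the real script may differ.
    parser = argparse.ArgumentParser(description="Sumen training (illustrative sketch)")
    parser.add_argument("--config_path", required=True, help="Path to configuration file")
    parser.add_argument(
        "--resume_from_checkpoint",
        type=str2bool,
        default=False,  # assumed default, not confirmed by the source
        help="Continue training from saved checkpoint (true/false)",
    )
    return parser


# Parse the same arguments as the command shown above.
args = build_parser().parse_args(
    ["--config_path", "src/config/base_config.yaml", "--resume_from_checkpoint", "true"]
)
print(args.config_path, args.resume_from_checkpoint)
```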

Inference

python inference.py --input_image assets/example_1.png --ckpt src/checkpoints

arguments:
    -h, --help                   Show this help message and exit
    --input_image                Path to image file
    --ckpt                       Path to the checkpoint model
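
Since inference.py takes a single image, a folder of images can be processed by calling it once per file. The sketch below only builds the command lines, using the two documented flags; the helper name `build_inference_commands` and the list of accepted file extensions are assumptions made for illustration.

```python
import sys
from pathlib import Path

# Assumed set of image formats; adjust to match your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def build_inference_commands(image_dir, ckpt="src/checkpoints"):
    """Return one inference.py argument list per image in image_dir, sorted by name."""
    commands = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            commands.append(
                [sys.executable, "inference.py", "--input_image", str(path), "--ckpt", ckpt]
            )
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Note that this reloads the checkpoint for every image, so it favors simplicity over speed.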

Test

python test.py --config_path src/config/base_config.yaml --ckpt src/checkpoints

arguments:
    -h, --help                   Show this help message and exit
    --config_path                Path to configuration file
    --ckpt                       Path to the checkpoint model

Web Demo

streamlit run streamlit_app.py --ckpt src/checkpoints

arguments:
    -h, --help                   Show this help message and exit
    --ckpt                       Path to the checkpoint model

or

python gradio_app.py --ckpt src/checkpoints

arguments:
    -h, --help                   Show this help message and exit
    --ckpt                       Path to the checkpoint model

Dataset

The dataset is available here: Fusion Image To Latex Datasets

To use your own data, organize it as follows:

  • Save all images in one folder and set that folder's path as root in the config file.
  • Prepare a CSV file with 2 columns:
    • image_filename: The name of the image file.
    • latex: The corresponding LaTeX code.

Samples:

image_filename                            latex
200922-1017-140.bmp                       \sqrt { \frac { c } { N } }
78cd39ce-71fc-4c86-838a-defa185e0020.jpg  \lim_{w\to1}\cos{w}
KME2G3_19_sub_30.bmp                      \sum _ { i = 2 n + 3 m } ^ { 1 0 } i x
1d801f89870fb81_basic.png                 \sqrt { \varepsilon _ { \mathrm { L J } } / m \sigma ^ { 2 } }
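
Before training on your own data, it can help to check that the CSV and the image folder agree. The sketch below assumes a comma-delimited file with a header row containing the two column names described above; the function name `load_annotations` is hypothetical and not part of the repository.

```python
import csv
from pathlib import Path

REQUIRED_COLUMNS = ("image_filename", "latex")


def load_annotations(csv_path, image_root):
    """Read (image_path, latex) pairs and collect filenames missing from image_root."""
    root = Path(image_root)
    samples, missing = [], []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Fail early if the header lacks either expected column.
        absent = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if absent:
            raise ValueError(f"CSV is missing columns: {absent}")
        for row in reader:
            image_path = root / row["image_filename"]
            if image_path.is_file():
                samples.append((image_path, row["latex"]))
            else:
                missing.append(row["image_filename"])
    return samples, missing
```

Calling `load_annotations("labels.csv", "images")` returns the usable pairs and a list of filenames that appear in the CSV but not in the folder, so broken rows can be fixed before they cause a failure mid-training.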

sumen's People

Contributors

hoang-quoc-trung, baodree


sumen's Issues

Random Output

import torch
import requests
from PIL import Image
from transformers import AutoProcessor, VisionEncoderDecoderModel

# Load model & processor
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = VisionEncoderDecoderModel.from_pretrained('hoang-quoc-trung/sumen-base').to(device)
processor = AutoProcessor.from_pretrained('hoang-quoc-trung/sumen-base')
task_prompt = processor.tokenizer.bos_token
decoder_input_ids = processor.tokenizer(
    task_prompt,
    add_special_tokens=False,
    return_tensors="pt"
).input_ids

# Load image
image_path = '/content/image42.png'  # replace with your local image path
image = Image.open(image_path).convert('RGB')
pixel_values = processor.image_processor(
    image,
    return_tensors="pt",
    data_format="channels_first",
).pixel_values

# Generate LaTeX expression
with torch.no_grad():
    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_length,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        num_beams=4,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )
sequence = processor.tokenizer.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(
    processor.tokenizer.eos_token, ""
).replace(
    processor.tokenizer.pad_token, ""
).replace(processor.tokenizer.bos_token, "")
print(sequence)

This is the output for the given image:
\operatorname* { l i m } _ { x \to \infty } \frac { \frac { d } { d x } \left( e ^ { x } + - 2 \frac { 2 } { x } \right) } { \frac { d } { d x } x ^ { - 2 } }

The output should be $2 x 10^{-3}$

It also gives the same output for every other image I upload. Kindly help.
