
Image to LaTeX


An application that maps an image of a LaTeX math equation to LaTeX code.

Image to Latex streamlit app

Introduction

The problem of image-to-markup generation was tackled by Deng et al. (2016). They extracted about 100K formulas by parsing the LaTeX sources of papers from arXiv. They rendered the formulas using pdflatex and converted the rendered PDF files to PNG format. The raw and preprocessed versions of their dataset are available online. In their model, a CNN is first used to extract image features. The rows of the features are then encoded using an RNN. Finally, the encoded features are used by an RNN decoder with an attention mechanism. The model has 9.48 million parameters in total. Recently, Transformers have overtaken RNNs on many language tasks, so I thought I might give them a try on this problem.

Methods

Using their dataset, I trained a model that uses ResNet-18 as the encoder (with 2D positional encoding) and a Transformer as the decoder, optimized with a cross-entropy loss. (It is similar to the one described in Singh et al. (2021), except that I used ResNet only up to block 3 to reduce computational costs, and I excluded the line number encoding as it doesn't apply to this problem.) The model has about 3 million parameters.

Model Architecture

Model architecture. Taken from Singh et al. (2021).
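
For reference, here is a rough sketch of this encoder-decoder pairing in PyTorch. It is only illustrative: the layer names, the 1x1 bottleneck convolution, the default hyperparameter values, and the omitted positional encoding and masking are my assumptions, not the project's exact implementation.

import torch.nn as nn
import torchvision

class ResNetTransformerSketch(nn.Module):
    """Illustrative ResNet-18 (up to block 3) encoder + Transformer decoder."""

    def __init__(self, num_classes, d_model=128, nhead=4, num_decoder_layers=3):
        super().__init__()
        resnet = torchvision.models.resnet18()
        # Keep ResNet-18 only up to block 3 (layer3) to reduce computational cost.
        self.encoder = nn.Sequential(
            resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
            resnet.layer1, resnet.layer2, resnet.layer3,
        )
        self.bottleneck = nn.Conv2d(256, d_model, kernel_size=1)  # layer3 outputs 256 channels
        self.embedding = nn.Embedding(num_classes, d_model)
        decoder_layer = nn.TransformerDecoderLayer(d_model, nhead)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_decoder_layers)
        self.fc = nn.Linear(d_model, num_classes)

    def forward(self, images, targets):
        x = self.bottleneck(self.encoder(images))       # (B, d_model, H', W')
        # A 2D positional encoding would be added to x here (omitted for brevity).
        memory = x.flatten(2).permute(2, 0, 1)          # (H'*W', B, d_model)
        tgt = self.embedding(targets).permute(1, 0, 2)  # (Sy, B, d_model)
        # A causal target mask is also needed during training (omitted here).
        out = self.decoder(tgt, memory)                 # (Sy, B, d_model)
        return self.fc(out)                             # (Sy, B, num_classes)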

Initially, I used the preprocessed dataset to train my model, because the preprocessed images are downsampled to half of their original size for efficiency, and are grouped and padded into similar sizes to facilitate batching. However, this rigid preprocessing turned out to be a huge limitation. Although the model could achieve reasonable performance on the test set (which was preprocessed the same way as the training set), it did not generalize well to images outside the dataset, most likely because their image quality, padding, and font size differ so much from those of the images in the dataset. This phenomenon has also been observed by others who have attempted the same problem using the same dataset (e.g., this project, this issue and this issue).

To address this, I used the raw dataset and included image augmentation (e.g., random scaling, Gaussian noise) in my data processing pipeline to increase the diversity of the samples. Moreover, unlike Deng et al. (2016), I did not group images by size. Rather, I sampled them uniformly and padded them to the size of the largest image in the batch, so that the model has to learn how to adapt to different padding sizes.
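
As a concrete illustration, an augmentation pipeline of this kind can be built with albumentations (which the project uses); the specific transforms and parameter values below are only an assumed example, not the project's actual settings.

import numpy as np
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Illustrative values; the project's actual transforms and ranges may differ.
transform = A.Compose([
    A.Affine(scale=(0.6, 1.0), p=0.5),          # random rescaling
    A.GaussNoise(p=0.5),                        # add Gaussian noise
    A.GaussianBlur(blur_limit=(1, 3), p=0.3),   # slight blur
    ToTensorV2(),
])

image = np.random.randint(0, 255, (64, 256, 3), dtype=np.uint8)  # dummy formula image
augmented = transform(image=image)["image"]  # a (3, 64, 256) torch tensor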

Additional problems that I faced in the dataset:

  • Some LaTeX code produces visually identical outputs (e.g., \left( and \right) look the same as ( and )), so I normalized them (a minimal sketch of this normalization is shown after this list).
  • Some LaTeX code is used to add space (e.g., \vspace{2px} and \hspace{0.3mm}). However, the length of the space is difficult to judge even for humans. Also, there are many ways to express the same spacing (e.g., 1 cm = 10 mm). Finally, I don't want the model to generate code for blank images, so I removed these commands. (I only removed \vspace and \hspace, but it turns out there are a lot of commands for horizontal spacing. I only realized that during error analysis. See below.)
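
A minimal sketch of this kind of normalization (the regular expressions are illustrative and not necessarily the exact rules used in the project):

import re

def normalize_formula(formula: str) -> str:
    """Normalize visually ambiguous LaTeX before training."""
    # \left( and \right) render the same as plain parentheses, so map them to ( and ).
    formula = re.sub(r"\\left\s*\(", "(", formula)
    formula = re.sub(r"\\right\s*\)", ")", formula)
    # Drop explicit spacing commands whose length cannot be judged from the image.
    formula = re.sub(r"\\[vh]space\s*\{[^}]*\}", "", formula)
    return formula

print(normalize_formula(r"\left( x \right) \hspace{0.3mm} y"))  # ( x )  y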

Results

The best run has a character error rate (CER) of 0.17 on the test set. Here is an example from the test dataset:

  • The input image and the model prediction look identical. But in the ground truth label, the horizontal spacing was created using ~, whereas the model used \,, so this was still counted as an error.
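
For context, the character error rate can be computed with the editdistance package (a project dependency); a minimal sketch, which may differ from the project's exact implementation:

import editdistance

def char_error_rate(prediction: str, target: str) -> float:
    """Edit distance between prediction and target, normalized by the target length."""
    if not target:
        return float(bool(prediction))
    return editdistance.eval(prediction, target) / len(target)

# The ~ vs \, mismatch described above still counts toward the error rate:
print(char_error_rate(r"a \, b", r"a ~ b"))  # 0.4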

I also took some screenshots of formulas from random Wikipedia articles to see whether the model generalizes to images outside of the dataset:

Screenshots of formulas from Wikipedia articles

  • The model output is actually correct but for some reason Streamlit can't render code with \cal.

  • The model incorrectly bolded some of the symbols.

The model also seems to have some trouble when the image is larger than those in the dataset. Perhaps I should have increased the range of the rescaling factor in the data augmentation process.

Discussion

I think I should have defined the scope of the project better. Questions about what the model should and should not support would then have guided the data cleaning process.

I found a pretty established tool called Mathpix Snip that converts handwritten formulas into LaTeX code. Its vocabulary size is around 200. Excluding numbers and English letters, the number of LaTeX commands it can produce is actually just above 100. (The vocabulary size of im2latex-100k is almost 500.) It only includes two horizontal spacing commands (\quad and \qquad), and it doesn't recognize different sizes of parentheses. Perhaps confining the model to a limited vocabulary is what I should have done, since there are so many ambiguities in real-world LaTeX.

Obvious possible improvements of this work include (1) training the model for more epochs (for the sake of time, I only trained the model for 15 epochs, but the validation loss was still going down), (2) using beam search (I only implemented greedy search), and (3) using a larger model (e.g., ResNet-34 instead of ResNet-18) and doing some hyperparameter tuning. I didn't do any of these because I had limited computational resources (I was using Google Colab). But ultimately, I believe that having data without ambiguous labels and doing more data augmentation are the keys to success on this problem.

The model performance is not as good as I want it to be, but I hope the lessons I learned from this project are useful to anyone who wants to tackle similar problems in the future.

How To Use

Setup

Clone the repository to your computer and position your command line inside the repository folder:

git clone https://github.com/kingyiusuen/image-to-latex.git
cd image-to-latex

Then, create a virtual environment named venv and install required packages:

make venv
make install-dev

Data Preprocessing

Run the following command to download the im2latex-100k dataset and do all the preprocessing. (The image cropping step may take over an hour.)

python scripts/prepare_data.py

Model Training and Experiment Tracking

Model Training

An example command to start a training session:

python scripts/run_experiment.py trainer.gpus=1 data.batch_size=32

Configurations can be modified in conf/config.yaml or in command line. See Hydra's documentation to learn more.

Experiment Tracking using Weights & Biases

The best model checkpoint will be uploaded to Weights & Biases (W&B) automatically (you will be asked to register or login to W&B before the training starts). Here is an example command to download a trained model checkpoint from W&B:

python scripts/download_checkpoint.py RUN_PATH

Replace RUN_PATH with the path of your run. The run path should be in the format of <entity>/<project>/<run_id>. To find the run path for a particular experiment run, go to the Overview tab in the dashboard.

For example, you can use the following command to download my best run:

python scripts/download_checkpoint.py kingyiusuen/image-to-latex/1w1abmg1

The checkpoint will be downloaded to a folder named artifacts under the project directory.
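
The downloaded checkpoint can then be loaded in Python, for example as below; the import path is inferred from the repository layout and the checkpoint filename is a placeholder.

from image_to_latex.lit_models.lit_resnet_transformer import LitResNetTransformer

# Replace the path with the actual .ckpt file downloaded into the artifacts folder.
lit_model = LitResNetTransformer.load_from_checkpoint("artifacts/model.ckpt")
lit_model.eval()  # switch to inference mode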

Testing and Continuous Integration

The following tools are used to lint the codebase:

isort: Sorts and formats import statements in Python scripts.

black: A code formatter that adheres to PEP8.

flake8: A code linter that reports stylistic problems in Python scripts.

mypy: Performs static type checking in Python scripts.

Use the following command to run all the checkers and formatters:

make lint

See pyproject.toml and setup.cfg at the root directory for their configurations.

Similar checks are done automatically by the pre-commit framework when a commit is made. Check out .pre-commit-config.yaml for the configurations.

Deployment

An API is created to make predictions using the trained model. Use the following command to get the server up and running:

make api

You can explore the API via the generated documentation at http://0.0.0.0:8000/docs.
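
For example, assuming the prediction endpoint is /predict/ and the upload field is named file (as in typical FastAPI file-upload endpoints; check the generated docs to confirm), you could query the API with curl:

# the field name 'file' is an assumption; verify it at http://0.0.0.0:8000/docs
curl -X 'POST' 'http://0.0.0.0:8000/predict/' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@path/to/your/equation.png;type=image/png'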

To run the Streamlit app, create a new terminal window and use the following command:

make streamlit

The app should be opened in your browser automatically. You can also open it by visiting http://localhost:8501. For the app to work, you need to download the artifacts of an experiment run (see above) and have the API up and running.

To create a Docker image for the API:

make docker

Acknowledgement


image-to-latex's Issues

`make venv` fails: `error: Multiple top-level packages discovered in a flat-layout: ['api', 'conf', 'figures', 'streamlit', 'image_to_latex']`

 ❯ make venv
python3 -m venv venv
source venv/bin/activate && \
	python -m pip install --upgrade pip setuptools wheel && \
	make install-dev
Requirement already satisfied: pip in ./venv/lib/python3.10/site-packages (22.2.2)
Requirement already satisfied: setuptools in ./venv/lib/python3.10/site-packages (64.0.3)
Requirement already satisfied: wheel in ./venv/lib/python3.10/site-packages (0.37.1)
python -m pip install -e ".[dev]" --no-cache-dir
Obtaining file:///Users/evar/Base/_Code/uni/image-to-latex
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build editable did not run successfully.
  │ exit code: 1
  ╰─> [14 lines of output]
      error: Multiple top-level packages discovered in a flat-layout: ['api', 'conf', 'figures', 'streamlit', 'image_to_latex'].
      
      To avoid accidental inclusion of unwanted files or directories,
      setuptools will not proceed with this build.
      
      If you are trying to create a single distribution with multiple packages
      on purpose, you should not rely on automatic discovery.
      Instead, consider the following options:
      
      1. set up custom discovery (`find` directive with `include` or `exclude`)
      2. use a `src-layout`
      3. explicitly set `py_modules` or `packages` with a list of names
      
      To find more information, look for "package discovery" on setuptools docs.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
make[1]: *** [install-dev] Error 1
make: *** [venv] Error 2

Checkpoint export to ONNX fails

I tried to export a checkpoint to ONNX, but it fails:
lit_model = LitResNetTransformer.load_from_checkpoint("artifacts/model_basic_2.ckpt")
# lit_model.freeze()
lit_model.eval()
x = torch.randn((1, 16))
lit_model.to_onnx("xxx.onnx", x)

Traceback (most recent call last):
  File "C:/Users/Administrator/PycharmProjects/image-to-latex/model_test.py", line 44, in <module>
    load_model()
  File "C:/Users/Administrator/PycharmProjects/image-to-latex/model_test.py", line 21, in load_model
    torch.onnx.export(lit_model, x, "hpocr_torch.onnx", verbose=True, input_names=input_names,
  File "C:\Users\Administrator\anaconda3\envs\image-to-latex\lib\site-packages\torch\onnx\__init__.py", line 275, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "C:\Users\Administrator\anaconda3\envs\image-to-latex\lib\site-packages\torch\onnx\utils.py", line 88, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "C:\Users\Administrator\anaconda3\envs\image-to-latex\lib\site-packages\torch\onnx\utils.py", line 689, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "C:\Users\Administrator\anaconda3\envs\image-to-latex\lib\site-packages\torch\onnx\utils.py", line 458, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args,
  File "C:\Users\Administrator\anaconda3\envs\image-to-latex\lib\site-packages\torch\onnx\utils.py", line 422, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "C:\Users\Administrator\anaconda3\envs\image-to-latex\lib\site-packages\torch\onnx\utils.py", line 373, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
  File "C:\Users\Administrator\anaconda3\envs\image-to-latex\lib\site-packages\torch\jit\_trace.py", line 1160, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "C:\Users\Administrator\anaconda3\envs\image-to-latex\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Administrator\anaconda3\envs\image-to-latex\lib\site-packages\torch\jit\_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "C:\Users\Administrator\anaconda3\envs\image-to-latex\lib\site-packages\torch\jit\_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "C:\Users\Administrator\anaconda3\envs\image-to-latex\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Administrator\anaconda3\envs\image-to-latex\lib\site-packages\torch\nn\modules\module.py", line 1039, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "C:\Users\Administrator\anaconda3\envs\image-to-latex\lib\site-packages\pytorch_lightning\core\lightning.py", line 529, in forward
    return super().forward(*args, **kwargs)
  File "C:\Users\Administrator\anaconda3\envs\image-to-latex\lib\site-packages\torch\nn\modules\module.py", line 201, in _forward_unimplemented
    raise NotImplementedError

Error: virtual environment named venv

make : The term 'make' is not recognized as the name of a cmdlet, function, script file, or operable program. Check
the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ make venv
    + CategoryInfo          : ObjectNotFound: (make:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

get image by screenshot

Feature request: add a feature with which you can provide the input image via a screenshot instead of drag-and-drop.

Failed to install editdistance library in Win11

  Building wheel for editdistance (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: 'D:\Applications\WPy64-3850\python-3.8.5.amd64\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'E:\\Temp\\pip-install-lr00wmov\\editdistance\\setup.py'"'"'; __file__='"'"'E:\\Temp\\pip-install-lr00wmov\\editdistance\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'E:\Temp\pip-wheel-uujy4pu5'
       cwd: E:\Temp\pip-install-lr00wmov\editdistance\
  Complete output (31 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-3.8
  creating build\lib.win-amd64-3.8\editdistance
  copying editdistance\__init__.py -> build\lib.win-amd64-3.8\editdistance
  copying editdistance\_editdistance.h -> build\lib.win-amd64-3.8\editdistance
  copying editdistance\def.h -> build\lib.win-amd64-3.8\editdistance
  running build_ext
  building 'editdistance.bycython' extension
  creating build\temp.win-amd64-3.8
  creating build\temp.win-amd64-3.8\Release
  creating build\temp.win-amd64-3.8\Release\editdistance
  C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30037\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I./editdistance -ID:\Applications\WPy64-3850\python-3.8.5.amd64\include -ID:\Applications\WPy64-3850\python-3.8.5.amd64\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30037\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30037\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IE:\Windows Kits\10\include\10.0.19041.0\ucrt" "-IE:\Windows Kits\10\include\10.0.19041.0\shared" "-IE:\Windows Kits\10\include\10.0.19041.0\um" "-IE:\Windows Kits\10\include\10.0.19041.0\winrt" "-IE:\Windows Kits\10\include\10.0.19041.0\cppwinrt" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30037\include" "-IE:\Windows Kits\10\Include\10.0.19041.0\ucrt" "-IE:\Windows Kits\10\Include\10.0.19041.0\um" "-IE:\Windows Kits\10\Include\10.0.19041.0\shared" "-IE:\Windows Kits\10\Include\10.0.19041.0\winrt" /EHsc /Tpeditdistance/_editdistance.cpp /Fobuild\temp.win-amd64-3.8\Release\editdistance/_editdistance.obj
  _editdistance.cpp
  editdistance/_editdistance.cpp(1): warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss
  editdistance/_editdistance.cpp(117): error C2059: syntax error: 'if'
  editdistance/_editdistance.cpp(118): error C2059: syntax error: 'else'
  editdistance/_editdistance.cpp(119): error C2059: syntax error: 'else'
  editdistance/_editdistance.cpp(120): error C2059: syntax error: 'else'
  editdistance/_editdistance.cpp(121): error C2059: syntax error: 'else'
  editdistance/_editdistance.cpp(122): error C2059: syntax error: 'else'
  editdistance/_editdistance.cpp(123): error C2059: syntax error: 'else'
  editdistance/_editdistance.cpp(124): error C2059: syntax error: 'else'
  editdistance/_editdistance.cpp(125): error C2059: syntax error: 'else'
  editdistance/_editdistance.cpp(126): error C2059: syntax error: 'else'
  editdistance/_editdistance.cpp(127): error C2059: syntax error: 'return'
  editdistance/_editdistance.cpp(128): error C2059: syntax error: '}'
  editdistance/_editdistance.cpp(128): error C2143: syntax error: missing ';' before '}'
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Tools\\MSVC\\14.29.30037\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
  ----------------------------------------
  ERROR: Failed building wheel for editdistance

Suggest to loosen the dependency on albumentations

Hi, your project image-to-latex requires "albumentations==1.0.3" as a dependency. After analyzing the source code, we found that the following versions of albumentations can also be suitable without affecting your project, i.e., albumentations 1.0.0, 1.0.1, and 1.0.2. Therefore, we suggest loosening the dependency on albumentations from "albumentations==1.0.3" to "albumentations>=1.0.0,<=1.0.3" to avoid any possible conflict when importing more packages or for downstream projects that may use image-to-latex.

May I open a pull request to further loosen the dependency on albumentations?

By the way, could you please tell us whether such dependency analysis may be potentially helpful for making dependency maintenance easier during your development?



We also give our detailed analysis as follows for your reference:

Your project image-to-latex directly uses 5 APIs from package albumentations.

albumentations.augmentations.geometric.transforms.Affine.__init__, albumentations.pytorch.transforms.ToTensorV2.__init__, albumentations.core.composition.Compose.__init__, albumentations.augmentations.transforms.GaussianBlur.__init__, albumentations.augmentations.transforms.GaussNoise.__init__

Beginning from the 5 APIs above, 15 functions are then indirectly called, including 14 of albumentations's internal APIs and 1 outside API. The specific call graph is listed as follows (neglecting some repeated function occurrences).

[/kingyiusuen/image-to-latex]
+--albumentations.augmentations.geometric.transforms.Affine.__init__
|      +--albumentations.core.transforms_interface.BasicTransform.__init__
|      +--albumentations.augmentations.geometric.transforms.Affine._handle_dict_arg
|      |      +--albumentations.core.transforms_interface.to_tuple
|      +--albumentations.augmentations.geometric.transforms.Affine._handle_translate_arg
|      +--albumentations.core.transforms_interface.to_tuple
+--albumentations.pytorch.transforms.ToTensorV2.__init__
|      +--albumentations.core.transforms_interface.BasicTransform.__init__
+--albumentations.core.composition.Compose.__init__
|      +--albumentations.core.composition.BaseCompose.__init__
|      |      +--albumentations.core.composition.Transforms.__init__
|      |      |      +--albumentations.core.composition.Transforms._find_dual_start_end
|      |      |      |      +--albumentations.core.composition.Transforms._find_dual_start_end
|      +--albumentations.augmentations.bbox_utils.BboxProcessor.__init__
|      |      +--albumentations.core.utils.DataProcessor.__init__
|      +--albumentations.core.composition.BboxParams.__init__
|      |      +--albumentations.core.utils.Params.__init__
|      +--albumentations.augmentations.keypoints_utils.KeypointsProcessor.__init__
|      |      +--albumentations.core.utils.DataProcessor.__init__
|      +--albumentations.core.composition.KeypointParams.__init__
|      |      +--albumentations.core.utils.Params.__init__
|      +--albumentations.core.composition.BaseCompose.add_targets
+--albumentations.augmentations.transforms.GaussianBlur.__init__
|      +--albumentations.core.transforms_interface.BasicTransform.__init__
|      +--albumentations.core.transforms_interface.to_tuple
|      +--warnings.warn
+--albumentations.augmentations.transforms.GaussNoise.__init__
|      +--albumentations.core.transforms_interface.BasicTransform.__init__

We scanned albumentations's versions and observed that, during its evolution between any of the versions [1.0.0, 1.0.1, 1.0.2] and 1.0.3, the changed functions (diffs listed below) have no intersection with any function or API mentioned above (either directly or indirectly called by this project).

diff: 1.0.3(original) 1.0.0
['albumentations.core.composition.Compose._check_data_post_transform', 'albumentations.core.utils.DataProcessor.postprocess', 'albumentations.augmentations.transforms.PadIfNeeded.update_params', 'albumentations.core.composition.Compose.__call__', 'albumentations.core.composition.Compose', 'albumentations.core.utils.get_shape', 'albumentations.augmentations.transforms.Normalize', 'albumentations.augmentations.geometric.transforms.Affine', 'albumentations.augmentations.crops.transforms.CropAndPad._get_px_params', 'albumentations.augmentations.crops.transforms.CropAndPad', 'albumentations.augmentations.transforms.Superpixels', 'albumentations.augmentations.transforms.PadIfNeeded', 'albumentations.augmentations.bbox_utils.convert_bbox_from_albumentations', 'albumentations.core.utils.DataProcessor', 'albumentations.augmentations.transforms.PadIfNeeded.PositionType', 'albumentations.augmentations.transforms.PadIfNeeded.__update_position_params', 'albumentations.augmentations.bbox_utils.convert_bbox_to_albumentations', 'albumentations.augmentations.transforms.PadIfNeeded.__init__', 'albumentations.augmentations.bbox_utils.check_bbox']

diff: 1.0.3(original) 1.0.1
['albumentations.core.composition.Compose.__call__', 'albumentations.core.composition.Compose', 'albumentations.core.composition.Compose._check_data_post_transform', 'albumentations.augmentations.bbox_utils.convert_bbox_to_albumentations', 'albumentations.augmentations.transforms.Superpixels', 'albumentations.core.utils.DataProcessor.postprocess', 'albumentations.core.utils.get_shape', 'albumentations.augmentations.bbox_utils.convert_bbox_from_albumentations', 'albumentations.core.utils.DataProcessor', 'albumentations.augmentations.bbox_utils.check_bbox']

diff: 1.0.3(original) 1.0.2
['albumentations.core.composition.Compose', 'albumentations.core.composition.Compose._check_data_post_transform', 'albumentations.augmentations.transforms.Superpixels', 'albumentations.core.utils.DataProcessor.postprocess', 'albumentations.core.utils.get_shape', 'albumentations.core.utils.DataProcessor', 'albumentations.augmentations.bbox_utils.check_bbox']

As for other packages, the APIs of warnings are called by albumentations in the call graph and the dependencies on these packages also stay the same in our suggested versions, thus avoiding any outside conflict.

Therefore, we believe that it is quite safe to loosen your dependency on albumentations from "albumentations==1.0.3" to "albumentations>=1.0.0,<=1.0.3". This will improve the applicability of image-to-latex and reduce the possibility of any further dependency conflict with other projects.

Explain the configurations of the Transformer decoder please!

Firstly, thank you for the awesome project. I am a bit confused about why you configured the Transformer decoder like this:

  d_model: 128
  dim_feedforward: 256
  nhead: 4
  dropout: 0.3
  num_decoder_layers: 3
  max_output_len: 150

How can I configure these parameters correctly?
Thank you for reading, and I hope you will respond to me soon.
@kingyiusuen
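
For reference, here is a rough sketch of how hyperparameters like these typically map onto PyTorch's Transformer decoder. It is only illustrative and may not match the project's exact code; note that max_output_len is not a decoder parameter at all, it caps how many tokens are generated at inference time.

import torch.nn as nn

d_model, dim_feedforward, nhead, dropout, num_decoder_layers = 128, 256, 4, 0.3, 3

decoder_layer = nn.TransformerDecoderLayer(
    d_model=d_model,                    # width of the token/feature embeddings
    nhead=nhead,                        # attention heads; d_model must be divisible by nhead
    dim_feedforward=dim_feedforward,    # hidden size of the position-wise feed-forward block
    dropout=dropout,
)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=num_decoder_layers)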

Train error with self-created data

/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [37,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [37,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [37,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [37,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [37,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [37,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [37,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Error executing job with overrides: ['trainer.gpus=1', 'data.batch_size=8']
Traceback (most recent call last):
  File "run_experiment.py", line 42, in <module>
    main()
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/hydra/main.py", line 49, in decorated_main
    _run_hydra(
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/hydra/_internal/utils.py", line 367, in _run_hydra
    run_and_report(
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
    raise ex
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
    return func()
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/hydra/_internal/utils.py", line 368, in <lambda>
    lambda: hydra.run(
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 110, in run
    _ = ret.return_value
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/hydra/core/utils.py", line 233, in return_value
    raise self._return_value
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/hydra/core/utils.py", line 160, in run_job
    ret.return_value = task_function(task_cfg)
  File "run_experiment.py", line 36, in main
    trainer.tune(lit_model, datamodule=datamodule)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 688, in tune
    result = self.tuner._tune(model, scale_batch_size_kwargs=scale_batch_size_kwargs, lr_find_kwargs=lr_find_kwargs)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/tuner/tuning.py", line 54, in _tune
    result['lr_find'] = lr_find(self.trainer, model, **lr_find_kwargs)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/tuner/lr_finder.py", line 250, in lr_find
    trainer.tuner._run(model)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/tuner/tuning.py", line 64, in _run
    self.trainer._run(*args, **kwargs)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 758, in _run
    self.dispatch()
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 799, in dispatch
    self.accelerator.start_training(self)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in run_stage
    return self.run_train()
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 871, in run_train
    self.train_loop.run_training_epoch()
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 499, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 738, in run_training_batch
    self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 434, in optimizer_step
    model_ref.optimizer_step(
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 1403, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 214, in step
    self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 134, in __optimizer_step
    trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 329, in optimizer_step
    self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in run_optimizer_step
    self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 193, in optimizer_step
    optimizer.step(closure=lambda_closure, **kwargs)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/torch/optim/adamw.py", line 65, in step
    loss = closure()
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 732, in train_step_and_backward_closure
    result = self.training_step_and_backward(
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 823, in training_step_and_backward
    result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 290, in training_step
    training_step_output = self.trainer.accelerator.training_step(args)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 204, in training_step
    return self.training_type_plugin.training_step(*args)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 155, in training_step
    return self.lightning_module.training_step(*args, **kwargs)
  File "/home/nd/PycharmProjects/imagetolatex/image_to_latex/lit_models/lit_resnet_transformer.py", line 55, in training_step
    logits = self.model(imgs, targets[:, :-1])
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nd/PycharmProjects/imagetolatex/image_to_latex/models/resnet_transformer.py", line 88, in forward
    output = self.decode(y, encoded_x)  # (Sy, B, num_classes)
  File "/home/nd/PycharmProjects/imagetolatex/image_to_latex/models/resnet_transformer.py", line 122, in decode
    y = self.embedding(y) * math.sqrt(self.d_model)  # (Sy, B, E)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/home/nd/anaconda3/envs/img2latex/lib/python3.8/site-packages/torch/nn/functional.py", line 2043, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered

integrate with Lightning ecosystem CI

Hello, and so happy to see you use PyTorch Lightning! 🎉
Just wondering if you have already heard about the quite new PyTorch Lightning (PL) ecosystem CI, which we would like to invite you to. You can check out our blog post about it: Stay Ahead of Breaking Changes with the New Lightning Ecosystem CI
As you use the PL framework for your cool project, we would like to enhance your experience and offer you safe updates to our future releases. At this moment, you run tests with a particular PL version, but it may accidentally happen that the next version will be incompatible with your project... 😕 We do not intend to change anything on our project side, but we still have a solution here - the ecosystem CI. By testing both your latest development head and ours, we can find incompatibilities very early and prevent eventually releasing a bad version... 👍

What is needed to do?

What will you get?

  • scheduled nightly testing configured for development/stable versions
  • slack notification if something went wrong to investigate
  • testing also on multi-GPU machine as our gift to you 🐰

cc: @Borda

A fault when establishing the virtual environment

Dear author:
I am new to GitHub, and I'm sorry to bother you with an error that may seem quite weird to you. When I try to download the repository to my local computer and run the third command you offered in the "How To Use" section, both VS Code and Python report the errors shown below:
[screenshot of the error]
Could you explain why this happens and how to solve the problem?
Thanks a lot

find_and_replace.sh: No such file or directory

When I run

python scripts/prepare_data.py

it finally prints

Cleaning data...
sh: /Users/xxxxx/xxxxx/image-to-latex/data/scripts/find_and_replace.sh: No such file or directory

But I found that the file find_and_replace.sh is actually located at

/Users/xxxxx/xxxxx/image-to-latex/scripts/find_and_replace.sh

The file path is wrong.

Best metrics on im2latex-100K?

Hey, your work is cool! If possible, can you tell me the best metrics you got when training on im2latex-100K, such as BLEU, WER, ROUGE, etc.?

Model file can't download

I want to download kingyiusuen/image-to-latex/1w1abmg1,
but it keeps showing "Downloading model checkpoint...".

Can anyone provide a download link for this model file? thanks

How to train the model with multiple GPUs

I have a problem: when I run the script as
python scripts/run_experiment.py trainer.gpus=4 data.batch_size=32
it causes the bug shown below.

[screenshot of the error]
How to handle this problem?
Thanks!

I can't get good results from the API

I got CER of 0.06 on im2latex-100K.
But I can't get good results from the API.
Please give me some advice.

ls -l checkpoints

drwxr-xr-x 3 root root 4096 Apr 13 15:40 'epoch=1-val'
drwxr-xr-x 3 root root 4096 Apr 13 15:54 'epoch=3-val'
drwxr-xr-x 3 root root 4096 Apr 13 16:07 'epoch=5-val'
drwxr-xr-x 3 root root 4096 Apr 13 16:21 'epoch=7-val'
drwxr-xr-x 3 root root 4096 Apr 13 16:34 'epoch=9-val'
drwxr-xr-x 3 root root 4096 Apr 13 16:48 'epoch=11-val'
drwxr-xr-x 3 root root 4096 Apr 13 17:01 'epoch=13-val'

ls -l epoch=13-val

drwxr-xr-x 2 root root 4096 Apr 15 09:42 'loss=0.12-val'

ls -l loss=0.12-val

-rw-r--r-- 1 root root 2062314067 Apr 13 17:01 'cer=0.06.ckpt'

vi test_predictions.txt

\alpha _ { 1 } ^ { r } \gamma _ { 1 } + \ldots + \alpha _ { N } ^ { r } \gamma _ { N } = 0 \quad ( r = 1 , . . . , R ) \ ,
\eta = - \frac { 1 } { 2 } \operatorname { l n } ( \frac { \operatorname { c o s h } ( \sqrt { 2 } b _ { \infty } \sqrt { 1 + \alpha ^ { 2 } } y - \operatorname { a r c s i n h } \alpha ) } { \sqrt { 1 + \alpha ^ { 2 } } } } } )
P _ { ( 2 ) } ^ { - } = \int \beta d \beta d ^ { 9 } p d ^ { 8 } \lambda \Phi ( - p , - \lambda ) ( - \frac { p ^ { I } p ^ { I } } { 2 \beta } ) \Phi ( p , \lambda ) \Phi ( p , \lambda ) \Phi ( p , \lambda ) \Phi ( p , \lambda ) \Phi ( p , \lambda ) \Phi ( p , \lambda ) \Phi ( p , \lambda ) \Phi ( p , \lambda ) \Phi ( p , \lambda ) \Phi ( p , \lambda ) \Phi ( p , \lambda ) \Phi ( p , \lambda ) \Phi ( p , \lambda ) \Phi ( p , \lambda ) \Phi ( p ,
\Gamma ( z + 1 ) = \int _ { 0 } ^ { \infty } ; d x ; e ^ { - x } x ^ { z } .
\frac { d } { d s } { \bf C } _ { i } = \frac { 1 } { 2 } \epsilon _ { i j k } { \bf C } _ { j } \times { \bf C } _ { k } , .

API Test

curl -X 'POST' 'http://0.0.0.0:8000/predict/' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -F '[email protected];type=image/png'

{"message":"OK","status-code":200,"data":{"pred":"E = m c ^ { 2 } \qquad \qquad E = m c ^ { 2 } \qquad E = m c ^ { 2 }"}}

curl -X 'POST' 'http://0.0.0.0:8000/predict/' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -F '[email protected];type=image/png'

{"message":"OK","status-code":200,"data":{"pred":"{ I } _ { \mathrm { I } } { \mit { U } } } ( { \bf { I } } { \bf { U } } ) } ) \; \; \Longrightarrow \; { \bf { U } } { \binom { \bf { I } } } { { \bf { H } } } ) \; { \frac { 1 } { \longrightarrow } } \; { \bf { U } } } { \bf { I } } } \Bigg \{ { \bf { I } } } { \bf { I } } ) \; { \bf { T } } \; { \bf { U } } { \bf { U } } \; { \bf { U } } { \bf { U } } } \; { \bf { U } } { \bf { U } } { \bf"}}

curl -X 'POST' 'http://0.0.0.0:8000/predict/' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -F '[email protected];type=image/png'

{"message":"OK","status-code":200,"data":{"pred":"{ \bf { I } } ~ \longrightarrow ~ { \bf { I } } _ { I } \prod _ { I } \bigoplus _ { I } \bigoplus _ { \bf { I } } \bigoplus _ { \bf { O } } { \bf { I } } { \bf { I } } } { \bf { I } } { \bf { I } } { \bf { I } } } { \bf { I } } } { \bf { I } } { \bf { I } } } { \bf { I } } { \bf { I } } { \bf { I } } } { \bf { I } } } { \bf { I } } } { \bf { I } } } } { \bf { I } } } { \bf { I } }"}}

curl -X 'POST' 'http://0.0.0.0:8000/predict/' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -F '[email protected];type=image/png'

{"message":"OK","status-code":200,"data":{"pred":"\begin{array} { c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c"}}
