
hmeg's Introduction

HMEG

Yu Chen*, Fei Gao*, Yanguang Zhang, Maoying Qiao, Nannan Wang**. Generating Handwritten Mathematical Expressions From Symbol Graphs: An End-to-End Pipeline. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, June 17-21, 2024.

* Equal Contributions.

** Corresponding Author.

Video@YouTube

Abstract

In this paper, we explore a novel and challenging generation task, i.e., Handwritten Mathematical Expression Generation (HMEG) from symbolic sequences. Since symbolic sequences are naturally graph-structured data, we formulate HMEG as a graph-to-image (G2I) generation problem. Unlike the generation of natural images, HMEG requires critical layout clarity for synthesizing correct and recognizable formulas, but has no real masks available to supervise the learning process. To alleviate this challenge, we propose a novel end-to-end G2I generation pipeline (i.e., graph → layout → mask → image), which requires no real masks or non-differentiable alignment between layouts and masks. Technically, to boost the capacity of predicting detailed relations among adjacent symbols, we propose a Less-is-More (LiM) learning strategy. In addition, we design a differentiable layout refinement module, which maps bounding boxes to pixel-level soft masks, so as to further alleviate ambiguous layout areas. Our whole model, including layout prediction, mask refinement, and image generation, can be jointly optimized in an end-to-end manner. Experimental results show that our model can generate high-quality HME images and outperforms previous generative methods. Besides, a series of ablation studies demonstrates the effectiveness of the proposed techniques. Finally, we validate that our generated images promisingly boost the performance of HME recognition models through data augmentation.

Pipeline
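
At a high level, the pipeline first predicts a layout (one bounding box per symbol node), refines it into pixel-level soft masks with a differentiable module, and finally renders the image, so the whole chain can be trained end to end. The snippet below is a minimal, hypothetical sketch of that chain; all module names, feature sizes, and the soft-mask formula are illustrative assumptions, not the repository's actual code.

# Minimal, hypothetical sketch of the graph -> layout -> mask -> image stages
# described in the abstract. Module names and dimensions are illustrative and
# do NOT reflect the actual implementation in this repository.
import torch
import torch.nn as nn

class LayoutPredictor(nn.Module):
    """Predicts a bounding box (x1, y1, x2, y2) per symbol node from graph features."""
    def __init__(self, node_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(node_dim, 128), nn.ReLU(), nn.Linear(128, 4))

    def forward(self, node_feats):                   # (N, node_dim)
        return torch.sigmoid(self.mlp(node_feats))   # normalized boxes in [0, 1]

def boxes_to_soft_masks(boxes, size=64, sharpness=50.0):
    """Differentiable box -> soft-mask rasterization (one mask per symbol)."""
    ys = torch.linspace(0, 1, size).view(1, size, 1)
    xs = torch.linspace(0, 1, size).view(1, 1, size)
    x1, y1, x2, y2 = [boxes[:, i].view(-1, 1, 1) for i in range(4)]
    inside_x = torch.sigmoid(sharpness * (xs - x1)) * torch.sigmoid(sharpness * (x2 - xs))
    inside_y = torch.sigmoid(sharpness * (ys - y1)) * torch.sigmoid(sharpness * (y2 - ys))
    return inside_x * inside_y                       # (N, size, size), values in (0, 1)

class ImageGenerator(nn.Module):
    """Renders an HME image from the stacked soft masks (placeholder conv net)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, masks):                        # (N, H, W)
        canvas = masks.sum(dim=0, keepdim=True).clamp(0, 1).unsqueeze(0)  # (1, 1, H, W)
        return self.net(canvas)

# Toy end-to-end pass: 5 symbol nodes with random graph features.
node_feats = torch.randn(5, 64)
boxes = LayoutPredictor()(node_feats)                # graph -> layout
masks = boxes_to_soft_masks(boxes)                   # layout -> soft masks (differentiable)
image = ImageGenerator()(masks)                      # masks -> image
print(boxes.shape, masks.shape, image.shape)

Because every stage above is differentiable, gradients from the image generator can flow back through the soft masks into the layout predictor, which is the property the paper's end-to-end training relies on.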

Datasets and Weights

Download the CROHME 2019 dataset from Google Drive and put it in datasets/.

All pre-trained weights are also available at the Google Drive link above.

About Training

Before training, set the training data path at line 96 of train_image_generator.py, then run:

python train_image_generator.py
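
For orientation only, the assignment around that line typically points at the dataset root; the variable name below is a hypothetical placeholder, so check the actual script before editing.

# Hypothetical example only -- open train_image_generator.py and adjust the
# actual assignment around line 96 to point at your local copy of the data.
data_root = "datasets/CROHME2019"   # replace with the path where you unpacked the dataset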

About Testing with CAN

We have released the test code for CAN; please follow the CAN link to run the tests.

@inproceedings{gao2024hmeg,
  title={Generating Handwritten Mathematical Expressions From Symbol Graphs: An End-to-End Pipeline},
  author={Chen, Yu and Gao, Fei and Zhang, Yanguang and Qiao, Maoying and Wang, Nannan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024},
  url={https://openreview.net/forum?id=8r3rs0Mub6}
}

hmeg's People

Contributors

fei-aiart, vmaibex


hmeg's Issues

How are npy files in the directory "link_npy" generated?

I printed npy[0] and it shows:

{
  'name': '101_alfonso',
  'bbox': tensor([[3.2000e+01, 0.0000e+00, 4.6000e-01, 7.3333e-02, 5.4000e-01],
                  [1.8000e+01, 9.6667e-02, 4.9667e-01, 1.5000e-01, 5.2333e-01],
                  [1.0000e+00, 1.9667e-01, 4.0000e-01, 2.3000e-01, 5.8667e-01],
                  [6.7000e+01, 2.5333e-01, 4.5667e-01, 3.5667e-01, 5.2000e-01],
                  [8.2000e+01, 2.5000e-01, 5.4333e-01, 2.8333e-01, 5.8333e-01],
                  [1.8000e+01, 2.9667e-01, 5.6333e-01, 3.2667e-01, 5.8333e-01],
                  [9.0000e+00, 3.4000e-01, 5.5000e-01, 3.8000e-01, 5.8667e-01],
                  [8.7000e+01, 2.8000e-01, 4.1333e-01, 3.1333e-01, 4.4667e-01],
                  [6.9000e+01, 3.9333e-01, 4.5667e-01, 4.4000e-01, 5.1333e-01],
                  [8.2000e+01, 4.3333e-01, 5.0667e-01, 4.7000e-01, 5.4333e-01],
                  [5.0000e+00, 4.8000e-01, 4.9000e-01, 5.2667e-01, 4.9667e-01],
                  [1.0000e+00, 5.7333e-01, 4.0667e-01, 5.9667e-01, 5.5000e-01],
                  [8.7000e+01, 6.1000e-01, 4.6333e-01, 6.5333e-01, 5.1333e-01],
                  [5.0000e+00, 6.7333e-01, 4.8667e-01, 7.0667e-01, 4.9333e-01],
                  [1.0000e+01, 7.2000e-01, 4.5667e-01, 7.6667e-01, 5.0667e-01],
                  [2.0000e+00, 7.5000e-01, 4.0333e-01, 7.8667e-01, 5.5000e-01],
                  [6.0000e+01, 8.0333e-01, 4.5667e-01, 8.5333e-01, 5.1333e-01],
                  [2.0000e+00, 8.1000e-01, 3.8667e-01, 8.7000e-01, 6.0667e-01],
                  [9.1000e+01, 9.0333e-01, 4.6000e-01, 9.5000e-01, 5.2333e-01],
                  [1.0000e+01, 9.5667e-01, 4.0333e-01, 1.0000e+00, 4.5000e-01]]), 
  'edge_type': tensor([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                       [2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                       [0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                       [0, 0, 2, 0, 3, 0, 0, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                       [0, 0, 0, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                       [0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                       [0, 0, 0, 2, 0, 0, 0, 0, 0, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0],
                       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0],
                       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 4],
                       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0]])
}

If I want to switch to my own dataset, my annotations look like this:

[
  {
      "image": "xxxx.jpg",
      "annotations": [
        {
          "label": "a",
          "coordinates": {
            "x": 78,
            "y": 83.5,
            "width": 29.5,
            "height": 161
          }
        },
        {
          "label": "b",
          "coordinates": {
            "x": 120,
            "y": 68.5,
            "width": 31.5,
            "height": 137
          }
        },
        {
          "label": "c",
          "coordinates": {
            "x": 57.5,
            "y": 143,
            "width": 33.5,
            "height": 44.5
          }
        },
        ......
      ],
      "edges": ["'a', 'b', 'right'", "'b', 'c', 'sub'"]
  },
  ......
]

How should I preprocess my dataset so it works with your code?
I'd appreciate any help with this.
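
There is no official answer in this thread, but from the printed structure above, each entry appears to hold a 'name', a 'bbox' tensor whose rows look like [symbol_id, x1, y1, x2, y2] with coordinates normalized to [0, 1], and an N×N 'edge_type' matrix of integer relation codes. Under those assumptions, a rough, unofficial sketch of converting the questioner's JSON annotations (center x/y plus width/height in pixels) into that format could look like the following; the symbol and relation vocabularies are placeholders that must be replaced with the ones the repository actually uses.

# Unofficial preprocessing sketch -- the bbox row layout ([symbol_id, x1, y1, x2, y2],
# normalized by image size) and the edge codes are assumptions inferred from the
# printed npy[0]; replace SYMBOL_VOCAB and RELATION_CODES with the real vocabularies.
import json
import numpy as np
import torch

SYMBOL_VOCAB = {"a": 1, "b": 2, "c": 5}      # placeholder: label -> integer symbol id
RELATION_CODES = {"right": 1, "sub": 3}      # placeholder: relation name -> edge code

def convert_sample(sample, img_w, img_h):
    anns = sample["annotations"]
    index = {ann["label"]: i for i, ann in enumerate(anns)}   # assumes unique labels per image

    rows = []
    for ann in anns:
        c = ann["coordinates"]               # center x/y and width/height in pixels
        x1 = (c["x"] - c["width"] / 2) / img_w
        y1 = (c["y"] - c["height"] / 2) / img_h
        x2 = (c["x"] + c["width"] / 2) / img_w
        y2 = (c["y"] + c["height"] / 2) / img_h
        rows.append([SYMBOL_VOCAB[ann["label"]], x1, y1, x2, y2])

    n = len(anns)
    edge_type = torch.zeros(n, n, dtype=torch.long)
    for edge in sample["edges"]:             # e.g. "'a', 'b', 'right'"
        src, dst, rel = [p.strip().strip("'") for p in edge.split(",")]
        edge_type[index[src], index[dst]] = RELATION_CODES[rel]
        # the printed matrix also carries a nonzero entry in the reverse direction;
        # how that inverse code is chosen is unclear from the dump, so it is left out here

    return {
        "name": sample["image"].rsplit(".", 1)[0],
        "bbox": torch.tensor(rows, dtype=torch.float),
        "edge_type": edge_type,
    }

if __name__ == "__main__":
    with open("annotations.json") as f:      # hypothetical file name
        data = json.load(f)
    converted = [convert_sample(s, img_w=640, img_h=480) for s in data]   # supply real image sizes
    np.save("my_link.npy", np.array(converted, dtype=object), allow_pickle=True)

This is only a starting point under the stated assumptions; the actual symbol-id vocabulary, the relation codes (including the reverse-edge convention), and the expected file naming should be confirmed against the repository's data-loading code before training on a custom dataset.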
