Detypstify uses OCR to generate Typst code from images of math formulas, running as a fully client-side web app.

Getting started

Using the model

The model is hosted here.

Installation

Obtaining data

We use oxen to version control our data. To get the oxen executable, run nix develop. Then, from the root of this repo, clone the oxen repo:

oxen clone https://hub.oxen.ai/DiracDelta/data

The datasets we use for this project will now be available in data/.

Training the model

Detypstify uses a custom dataset, generated by transpiling the im2latex-230k dataset from LaTeX to Typst with pandoc and cleaning the resulting data (see scraper/). The final dataset is available on Kaggle.
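To make the transpiling step concrete, here is a minimal sketch of converting a single LaTeX formula to Typst by calling pandoc from Python. It is not the code in scraper/; the function names build_pandoc_cmd and latex_to_typst are hypothetical, and it assumes a pandoc version with the typst writer is on your PATH.

```python
import shutil
import subprocess


def build_pandoc_cmd(pandoc: str = "pandoc") -> list[str]:
    """Return a command that reads LaTeX on stdin and writes Typst to stdout."""
    return [pandoc, "--from", "latex", "--to", "typst"]


def latex_to_typst(formula: str) -> str:
    """Transpile one LaTeX math formula to Typst markup via pandoc.

    Hypothetical helper: the formula is wrapped in $...$ so that pandoc
    parses it as inline math rather than as plain text.
    """
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    result = subprocess.run(
        build_pandoc_cmd(),
        input=f"${formula}$",
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pandoc rejects the input
    )
    return result.stdout.strip()
```

Calling latex_to_typst(r"\frac{a}{b}") returns the Typst markup that pandoc emits for the formula. Formulas that pandoc cannot parse raise an error, which is one place where a cleaning pass would be needed.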

  1. Download the dataset and unzip it
  2. Run poetry run train_val_split to perform a train/validation split
  3. Generate formulas.txt by running scripts/mk_formulas_txt.sh on the train and val directories
  4. Install pix2tex
    1. Follow the pix2tex instructions to generate the tokenizer, train.pkl, and val.pkl
    2. Create a config.yaml based on the template
    3. Train the model with python -m pix2tex.train --config config.yaml
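
For readers unfamiliar with step 2, a train/validation split can be sketched as below. This is an illustration only, not the project's train_val_split script; the function name split_train_val, the 10% default validation fraction, and the fixed seed are all assumptions.

```python
import random


def split_train_val(items, val_fraction=0.1, seed=0):
    """Split items into (train, val) lists with a reproducible shuffle.

    Hypothetical helper: sorting first makes the result independent of
    the input order, and the seeded RNG makes it repeatable across runs.
    """
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    # The first n_val shuffled items form the validation set.
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    names = [f"{i}.png" for i in range(10)]
    train, val = split_train_val(names, val_fraction=0.2)
    print(len(train), len(val))  # prints "8 2"
```

Fixing the seed keeps the validation set stable between runs, so results from different training runs stay comparable.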
