Untexify - a clone of Detexify
However, its mnemonic conventions for symbols are old and arguably antiquated. Many beginning researchers and students of the sciences have difficulty remembering the
This is accomplished using an OCR machine learning model written in Python with TensorFlow, which is then run in a web app created with Django and hosted online with the help of fly.io.
The training dataset was not hand-drawn, as is the case for many other MNIST-like models. Rather, it was programmatically generated using an image-transformation pipeline created with the Albumentations library. This way, I didn’t have to find or draw thousands of hand-drawn
This entire project was conceived and planned in the lovely Org-mode, as Emacs is my primary development environment. The website itself was written as a series of HTML and JavaScript code blocks which were tangled together using Org-mode’s exporting functionality.
- Backend
- Create the dataset
- Pull a large list of symbols from the OEIS
I simply copied a table’s symbols and formatted them into a file such that each piece of $\LaTeX$ was on its own line.
- Convert the images into .png files
I used latex2image to convert the list of commands into small, square images of each symbol. The program is a bit finicky, so for future reference, I placed the file containing my equations, called equations.txt, in the root of the git repository, then ran from the root:
cd src/
. set.sh $absolute_path_to_equations_txt
cd ..
cd equations.txt_aux
python generate_latex.py
I numbered the resulting files using Dired.
- Sort the images into classes based on their $\LaTeX$ code
I created my images folder, and used this bit of bash magic to sort them into subfolders sharing their same names:
for i in $(seq 0 $IMAGE_COUNT); do mkdir $i; mv $i.png $i/; done
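For anyone who would rather not rely on the shell, the same sorting step can be sketched in Python using only the standard library. This is an illustrative equivalent, not the code used for this project; the function name `sort_into_class_folders` is hypothetical. Unlike the bash loop, it only creates a folder for images that actually exist, so no `IMAGE_COUNT` is needed.

```python
import shutil
from pathlib import Path


def sort_into_class_folders(images_dir: Path) -> int:
    """Move every `<n>.png` in images_dir into a subfolder named `<n>`.

    Returns the number of images moved.
    """
    moved = 0
    # sorted() materialises the listing first, so moving files
    # does not disturb the iteration.
    for png in sorted(images_dir.glob("*.png")):
        if not png.is_file():
            continue
        class_dir = images_dir / png.stem  # e.g. images/7 for images/7.png
        class_dir.mkdir(exist_ok=True)
        shutil.move(str(png), str(class_dir / png.name))
        moved += 1
    return moved
```

Running it a second time on the same folder moves nothing, because `glob("*.png")` only looks at the top level of the folder.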
- Simulate handwriting
To do this, I need a series of “transforms”, each of which randomly alters one aspect of a given image. This prevents overfitting and, in the first phase, is what makes the model functional at all. Here are the aspects of the image I chose to transform:
| Writing aspect | Transform name |
|---|---|
| “Wiggliness” or poor handwriting | A.ElasticTransform() |
| Sharpening | A.Sharpen() |
| Uniform color | A.Equalize() |
| Orientation/rotation | |
| Scale | |
- Translation and scale
Although a textbook cited in the Keras docs mentions that convolution layers should be translation invariant, a cursory test of my model indicates they are definitely not. So, I need to alter the transformation stack accordingly. The model is also not resistant to the scale of the input, so I need to fix that as well.
- Stroke
The model is not resistant to different strokes. Depending on the way I implement the frontend, there may be no reason to train the model to recognize this.
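The idea of a transform stack, where each transform fires independently with some probability, can be sketched without any image library. The sketch below is not the Albumentations pipeline used for this project: it treats an image as a plain list of rows, and the names `translate`, `random_translate`, and `augment` are hypothetical. It shows random translation, one of the aspects the model turned out to be sensitive to.

```python
import random

Image = list[list[int]]  # rows of pixel values; 0 is background


def translate(image: Image, dx: int, dy: int) -> Image:
    """Shift the image by (dx, dy) pixels, padding exposed cells with 0."""
    height, width = len(image), len(image[0])
    shifted = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            # Pixels pushed past the border are dropped.
            if 0 <= new_x < width and 0 <= new_y < height:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def random_translate(image: Image, max_shift: int, rng: random.Random) -> Image:
    """Translate by a random offset of at most max_shift pixels per axis."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return translate(image, dx, dy)


def augment(image, transforms, rng):
    """Apply each (transform, probability) pair independently, in order."""
    for transform, probability in transforms:
        if rng.random() < probability:
            image = transform(image, rng)
    return image


# Example: five randomly shifted variants of a single-pixel "symbol".
rng = random.Random(0)
symbol = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
stack = [(lambda img, r: random_translate(img, 1, r), 0.5)]
variants = [augment(symbol, stack, rng) for _ in range(5)]
```

Each call to `augment` can produce a different variant of the same source image, which is how a small set of rendered symbols can be stretched into a much larger training set.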
- Choose a list of symbols
Initially, I chose a sample of 50 symbols picked mostly arbitrarily. The initial sample included multiple sets of symbols which would be drawn similarly ($\prec$ and $<$, for example), and also made liberal use of ¬’s. Because no large public-facing database of small $\LaTeX$ symbols in the model’s format exists, and the transform stack is prohibitively computationally expensive, I had to decide what my relatively small dataset would contain. I decided on a set composed mostly of some of the most popular mathematical symbols.
This might seem a bit paradoxical, because the most popular symbols are surely the best remembered. This may be true, but it is also true that there are probably more beginning researchers and students in need of a reference for basic symbols than there are people who need to look up the more esoteric symbols. Since Detexify exists and has a more comprehensive database, I chose for my tool to be more of a quick reference.
- [ ] Train the model
- Frontend
- Hosting
To host this project I used fly.io for its excellent integration with Django, which was used to construct the frontend. Fly.io has extremely simple installation instructions for a number of web-app libraries for popular languages, and it was overall very simple to use for someone not experienced in website hosting like myself.
- Website structure
The frontend’s structure was made entirely using Django, which was excellent for me as someone with lots of Python experience, and little HTML or CSS experience.
Most of the interface between the model (which was made using another Python library, TensorFlow) and the page was handled in a single views.py file. Python acted as the glue between Django and TensorFlow, which was extremely helpful and satisfying to work with.
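As an illustration of one small piece of that glue, here is a minimal sketch of turning a model’s output into a single symbol to show on the page. It is not taken from the project’s views.py: it assumes the model returns one score per symbol class, and the names `best_symbol`, `scores`, and `labels` are hypothetical.

```python
def best_symbol(scores: list[float], labels: list[str]) -> str:
    """Return the label whose score is highest (an argmax over class scores)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("scores must not be empty")
    # Index of the largest score; ties resolve to the first occurrence.
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return labels[best_index]
```

A view could pass the returned string to the template as the `symbol` context variable.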
The parts of the website that actually required me to delve into HTML were done almost entirely using Org-mode’s helpful HTML export. I could export large swaths of Org-mode documents to a nice-looking CSS “frame”, while embedding HTML within the plain Org text for seamless integration into the final product.
The utility of a literate configuration in this case is debatable; it exists mostly as a proof of concept, and as a convenience, since most of my other design work lives within Emacs. With the use of custom stylesheet functionality, however, I can quickly alter the look of the site at any time, with minimal effort.
This block is the JavaScript code for the HTML canvas that accepts user input, in the form of a hand-drawn approximation of the symbol the user is trying to look up.
// Select the drawing surface: the first element in the document with class "myCanvas".
// The script tag uses "defer", so the document has been parsed by the time this runs.
const canvas = document.querySelector('.myCanvas');
const width = canvas.width = window.innerWidth;
const height = canvas.height = window.innerHeight - 85;
const ctx = canvas.getContext('2d');
// start with a black canvas
ctx.fillStyle = 'rgb(0,0,0)';
ctx.fillRect(0, 0, width, height);
// The color input is optional: when the page has none, draw in white (an assumed default).
const colorPicker = document.querySelector('input[type="color"]');
const brushRadius = 4; // fixed brush radius in pixels
const clearBtn = document.querySelector('.clearButton');
// convert degrees to radians
function degToRad(degrees) {
  return degrees * Math.PI / 180;
}
// store mouse pointer coordinates, and whether the button is pressed
let curX;
let curY;
let pressed = false;
// update mouse pointer coordinates (pageX/pageY already include the scroll offset)
document.addEventListener('mousemove', e => {
  curX = e.pageX;
  curY = e.pageY;
});
canvas.addEventListener('mousedown', () => pressed = true);
canvas.addEventListener('mouseup', () => pressed = false);
// repaint the whole canvas black
clearBtn.addEventListener('click', () => {
  ctx.fillStyle = 'rgb(0,0,0)';
  ctx.fillRect(0, 0, width, height);
});
// paint a filled circle at the pointer on every animation frame while the button is held
function draw() {
  if (pressed) {
    ctx.fillStyle = colorPicker ? colorPicker.value : 'rgb(255,255,255)';
    ctx.beginPath();
    // the 85-pixel offset matches the height subtracted from the canvas above
    ctx.arc(curX, curY - 85, brushRadius, degToRad(0), degToRad(360), false);
    ctx.fill();
  }
  requestAnimationFrame(draw);
}
draw();
Now, we embed the user-facing HTML elements onto the page.
<!DOCTYPE html>
{% load static %}
<html lang="en-us">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=5.0">
<title>Canvas</title>
<script src="{% static 'testapp/script.js' %}" defer></script>
</head>
<body>
<canvas class="myCanvas" id="canvas" style="border: 3px solid black;">
<p>Add suitable fallback here.</p>
</canvas>
<div class="toolbar">
<form enctype="multipart/form-data" action="" method="post">
{% csrf_token %}
{{ form }}
<input type="submit" value="Submit">
</form>
<button class="clearButton">Clear canvas</button>
</div>
<canvas class="background">
</canvas>
{{ symbol }}
</body>
</html>