mathSCAN: We break with rigid formats. Upload your snipped math formula, whether from lectures, books, or PDF slides, and get it back in editable LaTeX format, enabled by OCR AI.

Jun 27, 2023

This project was carried out as part of the TechLabs “Digital Shaper Program” in Dortmund (winter term 2022/23).

In a nutshell:

Our project offers a Python-based web application that recognizes mathematical expressions in images and converts them into digital text. Users upload an image of a mathematical expression, and the model generates the corresponding LaTeX code. Our business case is a tool that saves time and effort for anyone who deals with mathematical expressions, such as researchers, students, or educators.

Introduction:

We all share a common experience: as either current or former students, we’ve had to grapple with mathematical expressions presented in lectures, books, or PDF slides. However, these formats are often non-editable. Our project offers a solution for anyone who needs to work with mathematical expressions — be it a student or an engineer — by allowing them to effortlessly customize formulas in an editable LaTeX format, without having to resort to pen and paper.

Methodology:

We developed our web app incrementally, following the CRISP-DM (Cross-Industry Standard Process for Data Mining) process model.

Data Preparation

The code reads all images from a designated data directory. The dataset comprises 104,000 PNG images, classified into 530 different classes. Each sample is encoded by reading the image, converting it to grayscale, resizing it to the target size, and transforming it. The training and validation data are read from CSV files and stored in arrays. The formulas are then tokenized, a vocabulary is built from the tokens, and each formula is brought to the desired length. Finally, the lists are converted to tensors for further processing.
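The text-side steps above (tokenize, build a vocabulary, bring each formula to a fixed length) can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the `<pad>` and `<unk>` tokens, the whitespace-based tokenization, and the padding/truncation strategy are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical special tokens; the project may use different ones.
PAD, UNK = "<pad>", "<unk>"


def tokenize(formula):
    """Split a formula into tokens, assuming tokens are separated by spaces."""
    return formula.split()


def build_vocab(formulas):
    """Map every token seen in the formulas to an integer id."""
    counts = Counter(tok for f in formulas for tok in tokenize(f))
    vocab = {PAD: 0, UNK: 1}
    for tok in sorted(counts):  # sorted for a reproducible id assignment
        vocab[tok] = len(vocab)
    return vocab


def encode(formula, vocab, max_len):
    """Convert a formula to ids, then truncate or pad it to max_len."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(formula)]
    ids = ids[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


formulas = [r"\frac { a } { b }", r"a ^ { 2 } + b ^ { 2 }"]
vocab = build_vocab(formulas)
encoded = [encode(f, vocab, max_len=8) for f in formulas]
```

Every row in `encoded` has the same length, so the list can be converted to a tensor directly. Tokens that are missing from the vocabulary map to the `<unk>` id instead of raising an error.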

Modeling and Evaluation

To achieve the described functionality, we use TensorFlow and Keras to implement and train our deep learning model. The model is an optical character recognition (OCR) network that combines Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) and is trained with a Connectionist Temporal Classification (CTC) loss function. It takes an image as input and produces a sequence of characters as output; the CTC loss trains it to predict the correct sequence. We implemented a custom layer, CTCLayer, that computes the CTC loss between the predicted and the true character sequence and is added to the model during training. We also incorporated an attention layer, which significantly enhances the model’s performance. For the model to perform well on images from online sources, however, data augmentation is necessary to introduce more noise into the training data.
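To illustrate how a CTC-trained model's per-timestep output becomes a character sequence, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It is not taken from the project: the function name, the toy scores, the small token table, and the choice of index 0 as the blank label are assumptions for this example, and frameworks differ in which index they reserve for the blank.

```python
BLANK = 0  # assumption: index 0 is the CTC blank label


def ctc_greedy_decode(scores, id_to_token, blank=BLANK):
    """Pick the best label per timestep, collapse repeats, drop blanks."""
    best_path = [max(range(len(step)), key=step.__getitem__) for step in scores]
    decoded = []
    prev = None
    for idx in best_path:
        # A label is emitted only when it differs from the previous timestep
        # and is not the blank; a blank between two equal labels keeps both.
        if idx != prev and idx != blank:
            decoded.append(id_to_token[idx])
        prev = idx
    return decoded


# Toy vocabulary and scores (rows = timesteps, columns = labels).
id_to_token = {1: "x", 2: "^", 3: "2"}
scores = [
    [0.10, 0.80, 0.05, 0.05],  # "x"
    [0.10, 0.70, 0.10, 0.10],  # "x" again, collapsed with the previous step
    [0.90, 0.05, 0.03, 0.02],  # blank
    [0.10, 0.10, 0.70, 0.10],  # "^"
    [0.60, 0.10, 0.20, 0.10],  # blank
    [0.10, 0.10, 0.10, 0.70],  # "2"
    [0.10, 0.10, 0.10, 0.70],  # "2" again, collapsed
]
tokens = ctc_greedy_decode(scores, id_to_token)
latex = " ".join(tokens)
```

The blank label is what lets the network emit the same character twice in a row: two identical labels separated by a blank are kept as two characters, while identical labels on adjacent timesteps are merged into one.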

Deployment

Our interface is built using Streamlit, a Python library for building and hosting web applications.

Results:

The outcome of this project is a Python-based web app that converts mathematical expressions in image format into LaTeX text. The focus and impact of the project lie in transforming the image data securely, leveraging a well-trained AI model while minimizing data loss. Going forward, our long-term goal is to expand the application through optimized training so that it also recognizes handwriting, tables, and diagrams.

GitHub repository:

https://github.com/TechLabs-Dortmund/document-scanner

Team:

Leon Rosenkranz (24), field of study: Master Mechanical Engineering and Research for Data Management in industrial production applications https://www.linkedin.com/in/leon-rosenkranz-1b7997225/

Joshua Göcking (25), field of study: Master Chemical Engineering https://www.linkedin.com/in/joshua-g%C3%B6cking-42a221267/

Ayman Soultana (21), field of study: Applied Computer Science https://de.linkedin.com/in/ayman-soultana-3408a5192

Tobias Averbeck (31), field of study: Research of machine learning applications for thermodynamic properties prediction https://de.linkedin.com/in/tobias-averbeck-771390157

Mentors:

Web Development: Tom Stein, Deep Learning and AI: Miguel Krause

Sources:

Dataset: https://www.kaggle.com/datasets/shahrukhkhan/im2latex100k

Figures & Images: