lt2326-ml2-h23-a1

Car Plate Recognition

Repo of original notebook: https://github.com/Deepayan137/Adapting-OCR (Das & Jawahar)

Part 1: Basic Implementation

I had trouble setting up the virtual environment from the env.txt file as the authors recommend: the installation failed with conflicting dependencies. I ended up installing requirements.txt with pip instead, but with later versions of many of the dependencies. Running the code as is (with minor tweaks) gave the following results:

part1 results

Character Accuracy    Word Accuracy
95.42                 0.78

Part 2: Using Our New Data

My first attempt was to use the existing loader and collator unchanged to see what would break. The first issue was the AssertionError: the height of conv must be 1, which suggested that the image sizes and/or colours might be the problem. That wasn't unreasonable, given that the new images are both larger than 32 px and coloured. I tried to account for that in the collator, following what the original authors had done for the height, but it didn't seem to help.

In the end I decided to fit the dataset to the code rather than the other way round: I resized the images to a height of 32 px and made them black and white. Since that allowed me to keep the collator class and the rest of the code the same, I only updated the DataLoader to read the annotation labels from the XML files. This seemed to work, in the sense that the code ran without errors. See part_2.ipynb for reference.
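The preprocessing described above could look roughly like the sketch below. It is not the code from part_2.ipynb. It assumes Pillow for image handling, treats "black and white" as 8-bit grayscale (mode "L"), and assumes a Pascal VOC-like annotation layout with the plate string in object/name. None of these details are confirmed here, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TARGET_HEIGHT = 32  # image height the unchanged collator is assumed to expect


def scaled_size(width, height, target_height=TARGET_HEIGHT):
    """Return (new_width, target_height), preserving the aspect ratio."""
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


def load_plate_image(path):
    """Open an image, convert it to grayscale, and resize it to a height of 32 px."""
    # Pillow is imported here so the other helpers can be used without it.
    from PIL import Image

    with Image.open(path) as img:
        gray = img.convert("L")  # assumption: grayscale rather than 1-bit
        return gray.resize(scaled_size(*gray.size))


def read_plate_label(xml_text):
    """Extract the plate string from the contents of one annotation file.

    Assumes the label is stored in <object><name>, as in Pascal VOC files.
    """
    root = ET.fromstring(xml_text)
    node = root.find("./object/name")
    if node is None or node.text is None:
        raise ValueError("no <object><name> element found")
    return node.text.strip()
```

Keeping the aspect ratio means the width varies per image, so padding to a common width is still left to the collator.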

Part 3: Training and Evaluation

Even after adjusting the dataset, training and evaluating the model gave a character accuracy of only around 6-7 % and a word accuracy of zero:

part2 results

Character Accuracy    Word Accuracy
6.56                  0.00
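For reference, here is one common way to compute the two metrics. It is a sketch under assumptions, not the evaluation code from the original repo: character accuracy is taken as one minus the edit distance divided by the total reference length, and word accuracy as the share of exact matches. Both are returned as percentages, which may not match the scale the original code prints.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def character_accuracy(predictions, references):
    """Percentage of reference characters recovered, floored at zero."""
    total_chars = sum(len(r) for r in references)
    total_errors = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    return 100.0 * max(0.0, 1.0 - total_errors / total_chars)


def word_accuracy(predictions, references):
    """Percentage of predictions that match their reference exactly."""
    exact = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * exact / len(references)
```

Under this definition, a prediction that is only one or two characters long scores low on character accuracy even when those characters are correct, because every missing character counts as an error.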

Part 4: Evaluation and Error Analysis

It's intriguing that the predictions are only 1-2 characters long. I'm guessing the code still needs adjustments (maybe the prediction length has to be set somewhere?), so it's hard to do a fair error analysis. At first I thought I might have corrupted the images while making them smaller, but they still seem readable. The small size of the dataset (around 200 images) may at least partly explain the poor results. Since many, but not all, of the pictures are taken from skewed angles, augmenting the existing images, e.g. by also skewing the straight ones to increase the variance in the dataset, might improve results further.
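The augmentation idea could be tried with something like the sketch below. It was not run for this assignment. It assumes Pillow and uses a random horizontal shear as a simple stand-in for "skewing"; a perspective transform would be closer to a real change of camera angle. The function names and the max_shear default are hypothetical.

```python
import random


def shear_coefficients(shear, height):
    """Affine coefficients (a, b, c, d, e, f) for a horizontal shear.

    Pillow maps each output pixel (x, y) to the input pixel
    (a*x + b*y + c, d*x + e*y + f). The offset c keeps the middle row in place.
    """
    return (1.0, shear, -shear * height / 2.0, 0.0, 1.0, 0.0)


def random_shear(img, max_shear=0.3, rng=random):
    """Return a copy of a Pillow image sheared by a random factor."""
    # Pillow is imported here so shear_coefficients can be used without it.
    from PIL import Image

    shear = rng.uniform(-max_shear, max_shear)
    coeffs = shear_coefficients(shear, img.height)
    # The output keeps the input size, so corners may be cropped slightly.
    return img.transform(img.size, Image.AFFINE, coeffs)
```

Applying this only to the training split, and leaving the labels unchanged, would give several variants per plate without touching the evaluation data.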

Part 5: Exploring New Architectures

This part made me revisit the arguments passed to the model for training, and I realized that one issue all along could have been that epochs (8) x batch_size (12) < train set size! Increasing the number of epochs to 13 more than doubled the character accuracy:

part2 results

Character Accuracy    Word Accuracy
14.48                 0.00
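The arithmetic behind that observation can be checked with a small helper. This is only a sketch: the training set size of 150 is hypothetical (only "around 200 images" in total is known), and whether epochs x batch_size is the quantity that matters depends on how the training loop consumes the data, which is not shown here.

```python
import math


def training_budget(num_train, batch_size, epochs):
    """Summarise a run under two ways of counting how much data it touches."""
    steps_per_epoch = math.ceil(num_train / batch_size)
    return {
        "epochs_x_batch_size": epochs * batch_size,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": epochs * steps_per_epoch,
    }


NUM_TRAIN = 150  # hypothetical training set size

for epochs in (8, 13):
    print(epochs, training_budget(NUM_TRAIN, batch_size=12, epochs=epochs))
```

With these numbers, 8 epochs give a product of 96 and 13 epochs give 156, so only the second setting reaches the assumed training set size.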

I didn't manage to change the number of hidden dimensions without getting errors, but I did experiment with the alphabet input (changing it to """0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ""") to see whether that would help the model further. I expected it to, since it reduces the number of possible classes and thus the complexity of the task, but the character accuracy stayed at around 14 % without further improvement.
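To make the effect of the smaller alphabet concrete, the sketch below builds the class mapping for it and decodes a sequence of per-frame class indices. It assumes a CTC-style output with one extra blank class at index 0, as is common for CRNN models; the original code's converter may be organised differently, and all names here are hypothetical.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
BLANK = 0  # assumption: index 0 is reserved for the CTC blank

CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
INDEX_TO_CHAR = {i: ch for ch, i in CHAR_TO_INDEX.items()}
NUM_CLASSES = len(ALPHABET) + 1  # 36 characters plus the blank


def encode(text):
    """Map a plate string to class indices (raises KeyError on unknown characters)."""
    return [CHAR_TO_INDEX[ch] for ch in text.upper()]


def greedy_decode(frame_indices):
    """Collapse repeated indices, then drop blanks, to get the predicted string."""
    chars = []
    previous = BLANK
    for idx in frame_indices:
        if idx != previous and idx != BLANK:
            chars.append(INDEX_TO_CHAR[idx])
        previous = idx
    return "".join(chars)
```

For example, greedy_decode([0, 11, 11, 0, 2, 2, 0]) returns "A1". A decoded string can never be longer than the number of frames, and runs of blanks or repeats shorten it further, which is one place to look when predictions come out only 1-2 characters long.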

Contributors: datatjej
