avs-admin – Page 3 – Alver Valley Software Limited

For two client projects this summer, I’ve needed an OCR solution, and I’ve ended up using Tesseract. It seemed like the obvious choice – open source, in development since the ’80s, development ‘sponsored by Google’ since 2006, etc.

Initial signs were good. I installed both the command line tool and the SDK on Linux. Within 5 minutes I was getting results from the command line tool, and within an hour I was also getting results from my own test program using the API. Only another few minutes after that, and I had got it using images provided by OpenCV, rather than by Leptonica, which it uses by default. All was looking good.

But since then, things have gone downhill somewhat. Maybe I’m using it in a case that it isn’t really designed for, and/or maybe I haven’t put enough time into training it with the specific font in question.

My ‘use case’ (without giving away client-specific details) is that I’m trying to recognise a sequence of numbers and letters, which may not be dictionary words – they may be acronyms, or just ‘random’ strings, and in some cases will be individual letters.

For some characters it seems to work fairly well. Some of its failures are almost understandable: an upper case letter ‘O’ does look a bit like an upper case letter ‘D’, and I can understand it confusing the upper case letter ‘I’ with the numeral ‘1’. But it also almost always seems to confuse upper case ‘B’ and ‘E’, even when the difference (i.e. the right hand side) is clearly visible. Why?!
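To put numbers on confusions like these, one option is to compare OCR output against known ground truth and count which character is read as which. The following is only a minimal sketch: the function name `confusion_counts` and the sample strings are made up for illustration, and it assumes the expected text for each test image is available.

```python
from collections import Counter


def confusion_counts(pairs):
    """Tally (expected, recognised) character pairs that differ.

    `pairs` is an iterable of (ground_truth, ocr_output) strings.
    Strings are compared position by position, so only pairs of equal
    length are used; a length mismatch means characters were dropped
    or inserted, which this simple sketch does not try to align.
    """
    counts = Counter()
    for truth, result in pairs:
        if len(truth) != len(result):
            continue
        for expected, got in zip(truth, result):
            if expected != got:
                counts[(expected, got)] += 1
    return counts


# Made-up sample data, purely for illustration.
samples = [("B1OD", "E1DD"), ("EBI", "EE1"), ("AB7", "AB7")]
for (expected, got), n in confusion_counts(samples).most_common():
    print(f"{expected!r} read as {got!r}: {n}")
```

A table like this at least shows whether a confusion such as ‘B’ read as ‘E’ is systematic or occasional, which is useful to know before spending more time on training.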

For customisation, it seems to want training on languages, which I can understand – but surely there should be the option to just train it on a new font and have it simply recognise on a character-by-character basis too? There are options to switch off whole-word recognition, but they don’t seem to make much difference.
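For reference, the kind of options meant here look roughly like the sketch below, which only builds a command line for the `tesseract` tool and does not run it. The helper name `build_tesseract_args` and the file name are hypothetical. `tessedit_char_whitelist`, `load_system_dawg` and `load_freq_dawg` are Tesseract configuration variables, and page segmentation mode 10 treats the image as a single character. The flag spelling is an assumption about the installed version: newer releases use `--psm`, while older 3.x releases spell it `-psm`.

```python
import string


def build_tesseract_args(image_path, output_base="stdout", psm=10,
                         whitelist=string.ascii_uppercase + string.digits,
                         use_dictionaries=False):
    """Build an argument list for the tesseract command line tool.

    psm=10 asks for single-character segmentation; the whitelist
    restricts the output to the given characters; switching off the
    two dictionary (dawg) variables is meant to stop word-level
    knowledge from influencing the result.
    """
    args = ["tesseract", image_path, output_base, "--psm", str(psm),
            "-c", f"tessedit_char_whitelist={whitelist}"]
    if not use_dictionaries:
        args += ["-c", "load_system_dawg=0", "-c", "load_freq_dawg=0"]
    return args


# "glyph.png" is a placeholder file name.
print(build_tesseract_args("glyph.png"))
```

The resulting list could be passed to `subprocess.run`. Whether these settings change the results much depends on the Tesseract version and engine in use.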

Finally, the whole thing is very under-documented and unstable. One wrong parameter, and it crashes without an error message. The training process in particular is long and cumbersome, and then crashes without any explanation.

I’ve spent a lot of time on this recently, and am probably about to give up for now. On the plus side, I did get it working on Android, thanks to the tess-two library, but the OCR results themselves were of course the same.

I’m hoping Google will pump some serious resource into getting Tesseract up to scratch – or that someone will come up with a good (i.e. documented, stable, and working) open source alternative.

[rant ends]

I spent a fair bit of time over the summer getting set up and acquainted with OpenCV4Android, starting with the Android SDK itself, and also including the NDK for native C++ development.

It’s all quite a learning curve: you end up learning Eclipse, the ADT add-ons, the Android framework, Java, the NDK, and OpenCV4Android all more or less at the same time – and that’s assuming you’re happy with OpenCV and C++ to start with. Anyway, I have got there with plenty of help from the web. (I also have my own set of notes, taking me through the entire process again step by step, which I have since done twice on two new laptops.)

But I am now completely up and running on this environment, and have successfully ported a number of my own OpenCV apps to run on various Android devices. They run well, too – the processing horsepower available on a modern smartphone turns out to be more than enough to handle images from the on-board camera, and run ‘average’ OpenCV apps, and I am currently working on two such apps for clients.