Playing with Google Glass

I was very pleased to be invited last weekend to a ‘wearable devices hackathon’ at Google Campus in central London.  Having never been to such a thing before, I wasn’t sure what to expect.

We were asked to go with an Android app, ready to port to Google Glass.  I have plenty of my own Android apps, but I had little idea in advance of whether Glass would offer the capabilities my apps need (namely direct access to the camera, and plenty of CPU horsepower).  Would I need to link to a back-end Android phone, and if so, how?  Would the Mirror API offer what I needed?  Would I be able to learn the GDK, which had been announced as a ‘sneak preview’ a few days earlier, quickly enough?  Would I be twice the age of everyone else there?  On a more serious note, how much would I be able to achieve in one day?  I’ve had single bugs that have taken longer than that to fix.  I am persistent, and don’t often give up on getting something working, but I rarely make promises as to how long something will take.

I went well-prepared:  Two laptops, both identically configured with the entire Android SDK / OpenCV4Android / Tesseract stack (see this post), and well-rehearsed in the code of my own app.  It was an interesting day – we started at 9:00am with some short presentations, followed by splitting into teams.  As we had gone with our own app, my client and I formed our own team.  The day ended at midnight.

On the whole, the day and the end result were positive.  Having got the initial ‘Let me try it on!  Someone take my picture!’ moment out of the way (I had to join the queue…), it turned out that porting our app to run on Google Glass was not as difficult as I had feared.  We didn’t need any Glass-specific code, other than installing the ‘sneak preview’ GDK into Eclipse.  With some help from a couple of real (and very jet-lagged) Glass experts, we got our app installed on a Glass device.  With a couple of tweaks, it was running – quite a moment to see your own app running on brand-new hardware.

We had one problem, which spoiled our day to some extent:  The image that came back from the Glass camera was distorted, seemingly by a driver problem.  A quick search revealed that other people had hit the same problem in the couple of weeks since the ‘sneak preview’ was released.  Various work-arounds were suggested, but none that would work for OpenCV.  As much as we tried to work around it, there was nothing to be done – the incoming image (from the point of view of our app) was garbled.  A real shame, as our app itself was working well – debug windows, output to the Glass screen, and all – and furthermore, it was running reasonably fast:  at least 2 frames per second, not far off the 3-4 I expect on a Samsung Galaxy S4.  Not bad at all for a first version of a brand-new headset.
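
For context, OpenCV4Android hands the app each camera frame through a callback, and that callback is where the garbled image showed up.  Below is a minimal sketch of that kind of frame listener (illustrative only, not our actual app code):

```java
import org.opencv.android.CameraBridgeViewBase.CvCameraViewFrame;
import org.opencv.android.CameraBridgeViewBase.CvCameraViewListener2;
import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;

// Illustrative only: the general shape of an OpenCV4Android frame listener.
// On Glass (before the fix), the Mat delivered here was already distorted
// before any of our own processing ran.
public class GlassFrameListener implements CvCameraViewListener2 {

    private Mat grey;

    @Override
    public void onCameraViewStarted(int width, int height) {
        grey = new Mat();
    }

    @Override
    public void onCameraViewStopped() {
        grey.release();
    }

    @Override
    public Mat onCameraFrame(CvCameraViewFrame inputFrame) {
        Mat rgba = inputFrame.rgba();   // the frame as the app sees it
        Imgproc.cvtColor(rgba, grey, Imgproc.COLOR_RGBA2GRAY);
        // ... real image processing / OCR would go here ...
        return rgba;                    // whatever is returned is drawn to the screen
    }
}
```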

So, we didn’t quite get there.  However, I have since reported the bug to OpenCV, had it accepted, and it is slated to be fixed in the next release (2.4.8).  At that point, we will be up and running on Glass, and ready to move on to tweaking the UI for Glass-specific gestures.  My client has a specific market that will then open up fairly rapidly.

[EDIT Nov 2014:  My client recently took delivery of a new Google Glass device, with the new drivers, and the app we had developed worked immediately and very well.  To quote him, “it works like a miracle”].

Summary:  As soon as OpenCV 2.4.8 is out, we should be there – and we now know that Google Glass is a capable platform for running OpenCV4Android apps, on board (and how to achieve that).  Exciting times ahead, I think.

 

[Image: Smile! Google Glass first play]

Tesseract

For two client projects this summer, I’ve needed an OCR solution, and I’ve ended up using Tesseract.  It seemed like the obvious choice – open source, in development since the ’80s, ‘sponsored by Google’ since 2006, etc.

Initial signs were good.  I installed both the command line tool and the SDK on Linux.  Within 5 minutes I was getting results from the command line tool, and within an hour I was also getting results from my own test program using the API.  Only a few minutes after that, I had it accepting images provided by OpenCV, rather than by Leptonica, which it uses by default.  All was looking good.

But since then, things have gone downhill somewhat.  Maybe I’m using it in a case that it isn’t really designed for, and/or maybe I haven’t put enough time into training it with the specific font in question.

My ‘use case’ (without giving away client-specific details) is that I’m trying to recognise a sequence of numbers and letters, which may not be dictionary words – they may be acronyms, or just ‘random’ strings, and in some cases will be individual letters.

For some characters it seems to work fairly well.  In some of the cases where it doesn’t, the confusion is almost understandable:  An upper case letter ‘O’ does look a bit like an upper case letter ‘D’, and I can understand it confusing the upper case letter ‘I’ with the numeral ‘1’.  But in other examples it almost always seems to confuse upper case ‘B’ and ‘E’, even when the difference (i.e. the right-hand side) is clearly visible.  Why?!

For customisation, it seems to want training on languages, which I can understand – but surely there should be the option to just train it on a new font and have it simply recognise on a character-by-character basis too?  There are options to switch off whole-word recognition, but they don’t seem to make much difference.
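
For reference, the sort of options in question are Tesseract ‘variables’.  Via tess-two they can be set along these lines – a sketch only, with a purely illustrative whitelist, and I can’t promise these are exactly the right switches:

```java
import com.googlecode.tesseract.android.TessBaseAPI;

// A sketch of the kind of Tesseract variables involved; values are illustrative.
public final class TessTuning {

    private TessTuning() {}

    public static void applyTo(TessBaseAPI baseApi) {
        // Restrict recognition to an explicit character set (illustrative whitelist).
        baseApi.setVariable("tessedit_char_whitelist",
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");

        // Discourage dictionary-driven, whole-word correction.  As far as I can
        // tell these are init-time parameters in Tesseract, so setting them this
        // late may be ignored, which could be why they seem to make little difference.
        baseApi.setVariable("load_system_dawg", "F");
        baseApi.setVariable("load_freq_dawg", "F");
    }
}
```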

Finally, the whole thing is very under-documented, and unstable.  One wrong parameter and it crashes without an error message.  In particular, the training process is long and cumbersome, and then crashes without further explanation.

I’ve spent a lot of time on this recently, and am probably about to give up for now.  On the plus side, I did get it working on Android, thanks to the tess-two library, but the OCR results themselves were of course the same.
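
For anyone trying the same thing, the tess-two flow is roughly as follows.  This is a minimal sketch under my own assumptions: the traineddata files are already on the device, the path and language are placeholders, and the Mat-to-Bitmap step is just one way of feeding it an OpenCV image rather than a Leptonica one.

```java
import android.graphics.Bitmap;

import org.opencv.android.Utils;
import org.opencv.core.Mat;

import com.googlecode.tesseract.android.TessBaseAPI;

// Minimal sketch of OCR via tess-two, fed from an OpenCV Mat.
// Assumes <dataPath>/tessdata/eng.traineddata already exists on the device.
public final class TessTwoSketch {

    private TessTwoSketch() {}

    public static String recognise(Mat frame, String dataPath) {
        // tess-two takes an Android Bitmap, so convert the (8-bit) OpenCV Mat first.
        Bitmap bmp = Bitmap.createBitmap(frame.cols(), frame.rows(),
                Bitmap.Config.ARGB_8888);
        Utils.matToBitmap(frame, bmp);

        TessBaseAPI baseApi = new TessBaseAPI();
        baseApi.init(dataPath, "eng");   // placeholder path and language
        baseApi.setImage(bmp);
        String text = baseApi.getUTF8Text();
        baseApi.end();
        return text;
    }
}
```

(tess-two is essentially a JNI wrapper around the same Tesseract engine, which is why the results match the desktop build.)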

I’m hoping Google will pump some serious resource into getting Tesseract up to scratch – or that someone will come up with a good (i.e. documented, stable, and working) open source alternative.

[rant ends]