Colour and pattern matching

Lots of work for a client recently in the area of colour matching, and moving on from there, the more challenging problem of matching patterns (i.e. groups of colours, arranged into shapes or regions of different sizes).

Although OpenCV provides the basic tools needed to work with colour (primarily, the ability to switch between colour spaces like RGB and HSV), matching colours is more challenging than it first seems.  In this instance, we were trying to match colours as the human eye would perceive them, which is different from simply measuring how ‘close’ two colours are in an RGB cube.  From there, a further complication is that much of the matching was done from photographs taken in the ‘real world’, where lighting and shading vary widely – even something which is the ‘same colour’ across all its surface may appear very different at different points in the photo, or across several photos.  Deciding what is a difference in colour, as opposed to a difference in lighting, is tricky, but there are techniques that can be used to help.  In our case, we also had to decide which part of the image was the ‘object’, and which was the background to be ignored completely.
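
As a rough illustration of why distance in the RGB cube and perceived difference disagree, the sketch below converts colours to CIELAB (a space designed so that equal distances correspond more closely to equal perceived differences) and compares the two measures. This is not the code from the client project: the function names are made up for the example, and the simple CIE76 distance is only a stand-in for whatever measure suits a given application.

```python
import math


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB, assuming a D65 white point."""

    def to_linear(c):
        # Undo the sRGB gamma curve to get linear-light values in 0..1.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(c) for c in rgb)

    # Linear RGB -> CIE XYZ (sRGB primaries, D65 white).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    delta = 6 / 29

    def f(t):
        # Lab transfer function: cube root, with a linear segment near black.
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalise by the D65 reference white before applying f.
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def rgb_distance(rgb1, rgb2):
    """Straight-line distance in the RGB cube."""
    return math.dist(rgb1, rgb2)


def delta_e76(rgb1, rgb2):
    """CIE76 colour difference: straight-line distance in Lab space."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))


black, dark_green, dark_blue = (0, 0, 0), (0, 50, 0), (0, 0, 50)
for other in (dark_green, dark_blue):
    print(other,
          "RGB distance:", rgb_distance(black, other),
          "Lab distance:", round(delta_e76(black, other), 1))
```

Both test colours are the same distance from black in the RGB cube, but their Lab distances differ, which is the kind of mismatch described above. It does nothing about lighting or shading, which is a separate problem.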

Moving on to matching patterns adds further complexity.  Is it more important to match colours as exactly as possible, regardless of size/shape of the pattern, or vice versa?  We created a test set containing around 20,000 images to experiment with the best ways of finding a pattern that most closely matches another pattern, and have found a good balance – but it’s not a simple problem.

Chess – the classic AI problem

To settle a pub argument that took place around 20 years ago (I’ve been busy, OK?!) about artificial intelligence and emergent behaviour, and because I have some work coming up in this area, I have finally got around to writing a simple chess engine.

In terms of the pub argument, it has immediately proved that (a) yes, a chess engine can be very easily capable of beating the person who wrote it, and (b) even though it has no built-in knowledge of tactics such as forks, skewers or discovered attacks, it can and will use all such techniques during a game to very good effect. In other words, that behaviour ‘emerges’ from the raw computation.

Neither of those things will come as any surprise to anyone who understands anything about AI, of course. But it was fun to write, and it’s actually quite fun to play against, and also fun to play *with*. For example, it’s interesting to see how it chooses different moves depending on how many moves ahead it’s allowed to look.

When away from the field of computer vision, the speed of a modern computer is astounding. My code is currently very unoptimised, and was written mostly for ease of writing rather than with performance in mind – and yet, on a normal desktop PC, running on one core only, it is capable of generating, and analysing, approximately one million moves (board positions) per second.

Of course, chess is a famous example of a problem with a ‘combinatorial explosion’, meaning that for even relatively small numbers of moves to look ahead, the number of possible board positions rapidly becomes fantastically large. At 5 moves ahead, it is looking at around 100 million board positions – suddenly a million moves per second doesn’t seem so much.
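
The arithmetic behind that can be sketched in a few lines. The branching factor of 40 is an assumption chosen for illustration (it was not measured from the engine), and ‘moves ahead’ is taken here to mean single moves by either side.

```python
# Assumption: about 40 legal moves in a typical position (illustrative only).
BRANCHING_FACTOR = 40
# The search rate quoted above.
POSITIONS_PER_SECOND = 1_000_000


def positions_at_depth(depth, branching=BRANCHING_FACTOR):
    """Number of positions reached when every line is followed to `depth`."""
    return branching ** depth


for depth in range(1, 8):
    n = positions_at_depth(depth)
    seconds = n / POSITIONS_PER_SECOND
    print(f"{depth} moves ahead: {n:>18,} positions, about {seconds:,.1f} s")
```

Under that assumption, depth 5 comes out at roughly 100 million positions, in line with the figure above, and each extra move of look-ahead multiplies the time by about 40.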

I am currently playing it ‘against’ another computer chess game, which after a few games will provide an estimated Elo rating – I’ll report back in a day or two with the results.

The current algorithm is pure ‘Minimax’, with no modifications.  It has no opening book or endgame database, so the play is a bit ragged at those stages. The first couple of moves are fairly random, but as the two sides ‘engage’ it starts making proper moves. By the end-game (at least when playing against me) it’s usually got enough of an advantage to be fairly decisive – it usually manages a check-mate during the ‘main’ part of the game, rather than waiting for a typical ‘endgame’ situation where we’ve only got 2-3 pieces each.
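
Pure minimax is easier to show on a much smaller game than chess. The sketch below uses a toy game chosen for the example (players alternately take 1, 2 or 3 stones from a pile, and whoever takes the last stone wins); it is not the chess engine's code. The search is told nothing about the well-known strategy for this game, which is to leave the opponent a multiple of four stones, yet it plays that way.

```python
def minimax(stones, maximising):
    """Return +1 if the maximising player wins with best play, else -1."""
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -1 if maximising else 1
    scores = [minimax(stones - take, not maximising)
              for take in (1, 2, 3) if take <= stones]
    # Each side picks the outcome that is best from its own point of view.
    return max(scores) if maximising else min(scores)


def best_move(stones):
    """Choose how many stones to take by searching every continuation."""
    moves = [take for take in (1, 2, 3) if take <= stones]
    return max(moves, key=lambda take: minimax(stones - take, False))


for pile in (5, 6, 7, 9):
    take = best_move(pile)
    print(f"pile of {pile}: take {take}, leaving {pile - take}")
```

In each case the chosen move leaves a multiple of four, even though that rule appears nowhere in the code.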

When time allows, I have other plans for this – after building in simple alpha-beta pruning to speed it up, I want to start to work on strategies – but from an AI point of view, I want it to be able to learn strategies itself, rather than be pre-programmed with them. I have some ideas in mind, and it should be interesting.
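
For reference, here is a minimal sketch of what alpha-beta pruning changes, using a toy game rather than chess (take 1, 2 or 3 stones from a pile; taking the last stone wins). The game, the function name and the node counter are all choices made for this example. The idea is that once one reply is known to refute a move, the remaining replies to that move need not be searched, and the final result is the same as plain minimax.

```python
def search(stones, maximising, alpha=-2, beta=2, prune=True, counter=None):
    """Score a position: +1 if the maximiser wins with best play, else -1.

    With prune=False this is plain minimax; with prune=True the same search
    uses alpha-beta cut-offs.  `counter` is a one-element list that counts
    how many positions were visited.
    """
    if counter is not None:
        counter[0] += 1
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -1 if maximising else 1

    best = -2 if maximising else 2
    for take in (1, 2, 3):
        if take > stones:
            break
        score = search(stones - take, not maximising, alpha, beta, prune, counter)
        if maximising:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if prune and alpha >= beta:
            # The other side already has a better option elsewhere,
            # so the remaining moves here cannot affect the result.
            break
    return best


for pile in (8, 12, 16):
    full, pruned = [0], [0]
    plain = search(pile, True, prune=False, counter=full)
    fast = search(pile, True, prune=True, counter=pruned)
    assert plain == fast  # pruning must not change the answer
    print(f"{pile} stones: minimax visited {full[0]}, alpha-beta visited {pruned[0]}")
```

How much is saved depends heavily on the order in which moves are tried, so the counts from a toy game say little about what to expect in chess.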

More OCR work, and the start of some ‘pure’ AI

Another busy month getting some OCR work up to production standard – there’s a big difference between ‘basically working’ and ‘industrial grade’.  It’s taken a lot of work, but is there now.

I have written a test harness that allows bulk amounts of test data to be tested and retested, in as little as 1 second each time, and in an automated way after each round of training.  A bit of ‘infrastructure’ and formal testing goes a long way at this stage.

Success rates in the test suite are now up to 100%, and the real production environment beckons.

In other news, I have some work to do on a ‘pure’ AI project – some software to take one player’s turn in a ‘full information’ two-player game, and associated research on scoring algorithms.  The idea is to see what level of ‘intelligent behaviour’ emerges from just the raw computation, given enough time and CPU cores.  It sounds like my idea of fun 🙂

2D point matching (or pose estimation) based just on relative position

For a client’s project recently I needed to be able to correlate the positions of a number of points in 2D space from one image to another.  These aren’t ‘features’ as usually handled by routines such as SIFT/SURF/ORB, etc, but just 2D points with no other attributes at all.  Most of the points in image A will be present in image B, but some may be missing, there may be extras, and image B may be rotated, scaled, and translated in x and y by any amount.

It turns out to be quite an interesting problem.  Luckily the number of points in the images was fairly small (<50), so a brute force approach works – and it does work well.  As long as most of the points in image A are present in image B, in something close to the same positions relative to each other, then they are found correctly in image B – and importantly the algorithm knows which point correlates with which, so the position of each point can be ‘followed’ into the new image.  Finally, it returns the amount of rotation, scaling and translation applied.

If there were a large number of points, the efficiency of the algorithm would be a problem, and a different approach would be needed, but for this application it has worked very well.
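
One possible brute-force approach is sketched below. It is an illustration written for this post, not necessarily the algorithm used on the project, and the function name and tolerance are assumptions. It treats each point as a complex number, so that rotation, scale and translation together become z' = m*z + t, guesses which two points of A correspond to which two points of B, and keeps the guess that brings the most points of A onto points of B.

```python
import cmath
import math


def match_points(a, b, tol=0.5):
    """Find the rotation/scale/translation that maps most points of a onto b.

    Returns (matches, scale, rotation_radians, (tx, ty)), where matches maps
    an index in `a` to the index of its counterpart in `b`.  Assumes the
    points within each list are distinct.  For simplicity it does not stop
    two points of `a` from claiming the same point of `b`.
    """
    za = [complex(x, y) for x, y in a]
    zb = [complex(x, y) for x, y in b]
    best = ({}, 1.0, 0.0, (0.0, 0.0))
    for i in range(len(za)):
        for j in range(len(za)):
            if i == j:
                continue
            for k in range(len(zb)):
                for l in range(len(zb)):
                    if k == l:
                        continue
                    # Hypothesis: a[i] -> b[k] and a[j] -> b[l].
                    m = (zb[l] - zb[k]) / (za[j] - za[i])
                    t = zb[k] - m * za[i]
                    matches = {}
                    for p, z in enumerate(za):
                        w = m * z + t  # where a[p] should land in image B
                        q = min(range(len(zb)), key=lambda n: abs(zb[n] - w))
                        if abs(zb[q] - w) <= tol:
                            matches[p] = q
                    if len(matches) > len(best[0]):
                        best = (matches, abs(m), cmath.phase(m), (t.real, t.imag))
    return best


# Image B is image A rotated 90 degrees, scaled by 2 and shifted by (10, 20),
# with one point missing, one extra point added, and the order shuffled.
a = [(0, 0), (4, 1), (1, 5), (6, 7), (9, 2)]
b = [(0, 22), (50, 50), (10, 20), (-4, 32), (8, 28)]
matches, scale, rotation, translation = match_points(a, b, tol=0.01)
print(matches, scale, math.degrees(rotation), translation)
```

The nested loops make the cost grow very quickly with the number of points, which is why this only suits small point sets like the ones described here.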

Computer Vision with OpenCV on a Raspberry Pi

This week I have taken delivery of a Raspberry Pi 2, and a Pi camera module:  Total cost around UKP50.  The aim of the experiment is to see whether the Pi is powerful enough to be used for computer vision applications in the real world.  More of that over the coming days, but the short version is:  Yes it is.

I also needed several other Pi-related components (again, more details of the fun we’re having at a later date).  For various reasons mostly to do with who had what in stock, I split the purchases between two UK companies – 4Tronix, who supply all sorts of superb robotics stuff for Pi and Arduino, and The Pi Hut, who as the name implies sell all things Pi-related.  Both orders were handled quickly, and I recommend both companies highly.

Setting up the new Pi took 2 minutes, and attaching the camera module is easy, if slightly fiddly.

I used the ‘picamera’ module and was getting images displayed on screen, and saved to the filesystem, all within a further few minutes.  The ‘picamera’ module appears to be a very well written library, and the API is certainly powerful.

It was then time to build OpenCV.  This is a slightly more involved process (build it from the source code), which took a few minutes of hands-on time, followed by about 4 hours of waiting for it to compile.  A quick experiment then showed OpenCV working properly from both C++ and Python.

The picamera module can process images in such a way that they can be handled by OpenCV – the interface between the two is straightforward.  As such, within a few more minutes I was grabbing images live from the Pi camera module, and processing them with normal OpenCV Python calls.  I don’t yet know what would be involved in getting images from the camera from C++, but with a Python interface this good, it may not be necessary to worry about it (Python can of course call C/C++ routines anyway).

Initial impressions are that it all works beautifully.  On the *initial* setup, it seems to take about one second to capture a frame from the camera, but the good news is that OpenCV processing (standard pre-processing such as blurring, and Canny edge detection) is faster than I’d expect from a computer this size.  After playing with a few settings, I am now able to increase the frame rate to many frames per second at capture, and around 4 FPS even including some OpenCV work (colour conversion, blur, and Canny edge detection) – bearing in mind some of those are compute-intensive tasks, I think that’s impressive.

So yes:  The Raspberry Pi 2 and the Pi camera module are certainly suitable for computer vision tasks using OpenCV, and I have two contracts lined up already to work on this.

Some more OpenCV tricks

A busy month of OpenCV contracting for a number of clients, including some work in areas of OpenCV I’ve not used much, if at all, before (non-chargeable, of course – I only charge for productive time).

I am now more familiar than I ever thought I’d be with the HoughLines(P) and HoughCircles functions – the former of which is more complex than it first seems.  Like many things in computer vision, it takes some coaxing to get good results, and even more coaxing to get really robust results across a range of ‘real life’ images in the problem domain.

I have also worked a lot this month with the whole ‘camera calibration’ suite of functions, and then followed that up by gaining experience with the ‘project image points into the plane’ routines, which can lead to some interesting ‘augmented reality’ applications.  However, in my case, I’ve used them to simply determine exactly where (in the 2D image) a specific point in the 3D space would appear.  It works very well, and I have a project lined up ready to put this into action.
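
The core of that projection step is the pinhole camera model, sketched below in plain Python. The focal lengths and principal point are made-up values standing in for a real calibration result, and the point is assumed to be already expressed in the camera's own coordinate frame. OpenCV's projectPoints function additionally applies the camera pose and lens distortion, which this sketch leaves out.

```python
def project_point(point_3d, fx, fy, cx, cy):
    """Project a 3D point (camera coordinates) to pixel coordinates.

    Ideal pinhole model with no lens distortion: fx and fy are the focal
    lengths in pixels, and (cx, cy) is the principal point.
    """
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Divide by depth, then scale and shift into pixel units.
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical intrinsics for a 640x480 image.
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0
print(project_point((0.0, 0.0, 5.0), fx, fy, cx, cy))  # a point on the optical axis
print(project_point((1.0, 0.5, 2.0), fx, fy, cx, cy))
```

A point on the optical axis lands on the principal point, and moving a point further away pulls its projection towards that centre.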

I’ve revisited one of my ‘favourite’ (i.e. most used) parts of the library:  contour finding, and associated pre- and post-processing, but this time all from Python.

During the last few days, I’ve started looking at 2D pose estimation:  specifically in this case, trying to determine the location of a known set of 2D points in a target image, allowing for possible translation, rotation and scaling.  Not finished with that one, yet.

Last (but not least – this isn’t going to go away) I’ve been making an effort to learn Git.  I was pleased to find this simple guide, which at least let me get on with my work while I learn the rest.

OpenCV with Python – first impressions

I’ve spent a month or so trying to make an effort to learn Python, mostly by forcing myself to do any new ‘prototype’ vision / OpenCV work in the language.  This has cost me some money – I only charge for ‘productive’ time, not ‘learning’ time, and at times the temptation to go back to ‘nice familiar C++’ has been great.  But I’ve made good progress with Python, and I’m glad I’ve stuck at it.  Apart from anything else, the language itself isn’t hard to pick up.

The pros and cons from a computer vision perspective are roughly as expected.  It can be slower to run, but depending on how the code is written, it’s not a big difference.  Once ‘inside’ the OpenCV functions, the speed appears to be about the same (as you’d expect:  it’s just a wrapper for the same code), but any code actually run in Python needs careful planning, and if large amounts of computation were going to be done, C++ would no doubt still be the best bet.

But anything it lacks in runtime speed, it certainly makes up for in speed of development.  As a prototyping language, I think I’m already more productive in Python than C++ (and that’s after 20+ years of C++, and a month of part-time Python).  There will always be more to learn, of course, but I think I’m at the point where the learning curve is beginning to get less steep.

Python

Python has been around for years (since the late 80’s, I was surprised to discover, although not in mainstream use until much more recently).  I have used it occasionally for very simple scripts, usually where the larger ‘ecosystem’ of the project I’m involved with has also been Python-based.

However, it’s now becoming clear that Python has broken through as a mainstream language in the scientific community, and also specifically in computer vision and AI.  OpenCV – my main area of work – has a good Python binding.

Time to learn this language more deeply then, I think.  I have shied away from it a little until now, on the basis that C/C++ are bound to be faster for compute-intensive tasks such as vision.  However, initial tests show that the Python binding is roughly as fast (perhaps because it is exactly that – a binding, to the core of OpenCV which is still C/C++).  It may be the case that C/C++ will remain faster when much of the functionality of the application is above the OpenCV level – but if the application is mainly just calls to OpenCV, then perhaps C/C++ doesn’t have such an advantage.

I will be creating some test apps in both languages as a way of learning Python, and will post comparative results here in due course.

OpenCV on Google Glass now working properly

Almost exactly a year ago, I was invited by a client to Google’s Campus in London, to attempt to port our app to the new Google Glass device, which was still pre-release in the UK at that point.  As I wrote at the time, our app worked well (and surprisingly quickly), but due to a bug at the device driver level, the images being received from the camera were garbled, so our app wasn’t able to do anything very useful.

We reported the bug to Google and to OpenCV on the day. I am pleased to say that my client now has access to a current Google Glass device, containing the latest drivers – he has just contacted me to inform me that the app works perfectly now, without any modification, and is processing images as intended.

As you can imagine, I’m very pleased to hear this – not only does it prove that our app was working properly in the first place, but it now gives us an exciting new platform to develop computer vision apps on in the future. Watch this space.

Fourier Transforms for the non-mathematician

There are a few areas of computer vision and image processing where a little bit of maths is hard to avoid. Luckily for me (I’m no mathematician) these are few and far between – in most cases these days, either the maths is not too advanced, or the popular libraries (such as OpenCV) help hide the worst of it and let us get on with being ‘practitioners’.

However, one exception that keeps cropping up is Fourier Transforms. They are everywhere in computer vision, and for good reason: they help solve a lot of problems (I’ll write another post about this when time allows, but my current project has been revolutionised by using Fourier Transforms).

However, almost all explanations plunge straight into maths, involving the so-called complex numbers: the square root of minus one, and all that. The simple truth is that my school maths (hi, Mr. Feakes!) didn’t equip me for this, and I strongly suspect I’m not alone. While OpenCV helps hide the real nuts and bolts, an intuitive explanation of what is going on is essential to help decide when to use this tool, and just on a basic level, how it works.

So I was very pleased to find the following: An Intuitive Explanation of Fourier Theory, with pictures, and no hard stuff. Just enough for me to understand intuitively how this works – perfect. Thanks to Steven Lehar for writing it.
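
In the same spirit, here is a small sketch that avoids complex numbers altogether. It is an illustration written for this post, not taken from the article above, and the function name is made up. For each candidate frequency it measures how well the signal lines up with a cosine and a sine wave of that frequency, and combines the two into a single ‘strength’. A signal built from two sine waves then shows up as two clear peaks.

```python
import math


def frequency_strengths(samples):
    """For each whole number of cycles k, estimate how strongly it is present.

    The signal is compared against a cosine and a sine wave making k cycles
    over the sample window; for k >= 1 the result is the amplitude of that
    component.
    """
    n = len(samples)
    strengths = []
    for k in range(n // 2):
        cos_part = sum(s * math.cos(2 * math.pi * k * t / n)
                       for t, s in enumerate(samples))
        sin_part = sum(s * math.sin(2 * math.pi * k * t / n)
                       for t, s in enumerate(samples))
        strengths.append(2 * math.hypot(cos_part, sin_part) / n)
    return strengths


# A test signal: 3 cycles at amplitude 1.0 plus 10 cycles at amplitude 0.5.
n = 64
signal = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 10 * t / n)
          for t in range(n)]
strengths = frequency_strengths(signal)
peaks = sorted(range(len(strengths)), key=lambda k: strengths[k], reverse=True)[:2]
print(peaks, [round(strengths[k], 3) for k in peaks])
```

The two strongest entries sit at 3 and 10 cycles, with strengths matching the amplitudes used to build the signal. Library routines compute the same information far more efficiently; this version is only meant to show what is being measured.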

EDIT 2014-04-16: Having been in touch with Steven to thank him personally, he has recommended a number of other articles for people who, like me, prefer ‘intuitive’ approaches to things. In particular, I’m looking forward to studying two – one of his own, and one other he recommends:

A Visual, Intuitive Guide to Imaginary Numbers

Clifford Algebra: A visual introduction

Thanks again, Steven.