Traditional Computer Vision vs. Deep Learning with Yolo v7

A tricky computer vision project

On a recent computer vision project, we were trying to detect objects in images. Nothing unusual there, but we had some specific problems. The objects could vary in size from ‘very big’ (taking up nearly half the image) down to ‘very small’ (just a few pixels across). The objects were set against a very ‘noisy’ background, namely the sky: complete with clouds and the Sun, sometimes rain, and other foreground objects such as trees. Detection accuracy had to be very high and false positives very low, because there could be no human intervention to correct mistakes. The software had to run on a credit-card sized computer (a Jetson NX), alongside other software including a guidance system – the whole thing was on board a drone. And finally, it had to be fast: the camera we used was running at 30 frames per second, and the guidance system we were feeding into expects inputs at that rate, so we had to keep up.

Traditional Computer Vision pipelines, and Genetic Algorithms

We had developed ‘pipelines’ of ‘traditional’ computer vision techniques, using OpenCV:  colour space conversions, blurring, thresholding, contour-finding, etc.  The system was working well in about 80% of cases – surprisingly well, given the range of object sizes we were trying to detect against such a noisy background.

But we got stuck chasing down that last 20% or so.  Whenever we made a change to get things working on one specific class of image, it would break another that was previously working.  We even tried automatically generating pipelines using Genetic Algorithms (not my idea, but I wish it had been!) – this generated some pipelines that worked well, but still we couldn’t achieve a system that worked well enough in all cases we might encounter.

Deep Learning: Yolo v7

The main reason for using traditional techniques was speed – as I mentioned, we had very tight timing constraints – but also because, the last time we tried deep-learning models (Yolo v1 and v2), they were very bad at detecting objects that were small in the image.

But having hit a blocker in our progress, as a ‘last throw of the dice’, we decided to review the state of the art in deep-learning detectors.

After a review of the options, we settled for various reasons on Yolo v7:  Even then (summer 2024), this wasn’t by any means the newest version of Yolo, but it gave a good combination of being fully open-source, well-documented and supported, and well-integrated with the languages we wanted to use (Python and C++).

The work itself took a while:  There were a number of problems that we had to overcome, including some very technical ‘gotchas’ that nearly caused us to give up on it a few times.  Of course, we also needed a large set of labeled training data, but we already had that.

Results

In short, the results are staggeringly good.  We are now able to detect objects down to just a few pixels across, but more to the point, against very noisy backgrounds:  In some cases ‘lens flares’ caused by the camera facing directly into the sun make the target object almost invisible to the human eye – but our system based on Yolo v7 detects the objects successfully in a very high percentage of cases.  Also, the performance is exceptional – on a Jetson NX, running on the GPU, we are doing inference in around 8ms, allowing time for pre- and post-processing steps to be added and still achieve 30FPS, which is our camera frame-rate.
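The timing budget behind that last claim is worth spelling out. Using only the numbers from the text (30 FPS camera, ~8 ms inference):

```python
FPS = 30
frame_budget_ms = 1000.0 / FPS   # ~33.3 ms available per frame at 30 FPS
inference_ms = 8.0               # approximate Yolo v7 inference time on the Jetson NX GPU
remaining_ms = frame_budget_ms - inference_ms
print(f"{remaining_ms:.1f} ms left for pre/post-processing")  # → 25.3 ms
```

So inference uses roughly a quarter of each frame's budget, leaving comfortable headroom for the pre- and post-processing stages.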

Yolo v7 is not a plug-and-play solution straight out of the box: even just for training, we had to do some careful setup and configuration, ensure a well-balanced training and testing set, and then train and test until we were satisfied we had a good model that could not only detect our target object but exclude all others. Inference (i.e. at runtime), especially from C++, was far more difficult – there were one or two fairly esoteric problems. In particular, detections were often centered correctly, but with wildly wrong ‘rectangular’ bounding boxes – it took a while to work out the solution to that one.
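The text doesn't spell out the exact fix, so purely for illustration: one common cause of distorted bounding boxes in hand-rolled Yolo inference code (not necessarily the problem we hit) is failing to undo the ‘letterbox’ resize that Yolo applies to make input images square. The mapping back to original image coordinates looks roughly like this – all names here are hypothetical:

```python
def undo_letterbox(x, y, w, h, net_size, img_w, img_h):
    """Map a detection (x, y, w, h) from the square network input
    (net_size x net_size, letterboxed with padding) back to original
    image coordinates. Illustrative sketch, not the project's code."""
    scale = min(net_size / img_w, net_size / img_h)  # uniform resize factor
    pad_x = (net_size - img_w * scale) / 2           # horizontal padding
    pad_y = (net_size - img_h * scale) / 2           # vertical padding
    return ((x - pad_x) / scale, (y - pad_y) / scale,
            w / scale, h / scale)
```

Forgetting the padding term (or scaling the two axes independently) gives exactly the symptom described: boxes in roughly the right place but with badly wrong shapes.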

Summary 

There’s still a place for traditional computer vision techniques (and we still use some in this project), but Yolo and other deep-learning detectors are well worth considering.

Contact me (tom@alvervalleysoftware.com) to discuss whether I can help with your computer vision project.  If you’re thinking about using Yolo from Python and C++, I probably can…

Genetic Algorithms, Particle Swarm Optimisation, Evolutionary Computing

Genetic algorithms (GAs) are a search and optimisation technique inspired by the ideas of “survival of the fittest” in populations of individuals, as well as other genetic concepts such as crossover and mutation.

GAs can often find good solutions to problems very quickly – often finding solutions in complex, multi-dimensional, non-linear problem “spaces” that other algorithms struggle badly with.

Successfully applying a Genetic Algorithm to a problem involves steps such as:

  • Identifying whether the problem “space” is suited to a GA.
  • Encoding the problem into a “genome” that the GA can work with.
  • Writing a GA (or using a standard library).
  • Defining and writing a fitness function.
  • Avoiding pitfalls such as using a weak random number generator, using encodings with big “step” values in them which can block improvements, etc.
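The steps above can be sketched in code. This is a deliberately minimal generational GA over bit-string genomes – tournament selection, single-point crossover, per-bit mutation – with textbook default parameters, not a production implementation (no elitism, no convergence tracking):

```python
import random

def run_ga(fitness, genome_len, pop_size=50, generations=100,
           crossover_rate=0.7, mutation_rate=0.01):
    """Minimal generational Genetic Algorithm over bit-string genomes.
    Illustrative sketch only: tournament selection, single-point
    crossover, per-bit mutation."""
    pop = [[random.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]

    def select():
        # Tournament of 3: the fittest of three random individuals wins
        return max(random.sample(pop, 3), key=fitness)

    for _ in range(generations):
        next_pop = []
        while len(next_pop) < pop_size:
            a, b = select()[:], select()[:]  # copy parents
            if random.random() < crossover_rate:
                cut = random.randrange(1, genome_len)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for i in range(genome_len):
                    if random.random() < mutation_rate:
                        child[i] ^= 1  # flip the bit
                next_pop.append(child)
        pop = next_pop[:pop_size]
    return max(pop, key=fitness)

# Usage: maximise the number of 1-bits ("OneMax" toy problem)
best = run_ga(sum, genome_len=20)
```

The encoding step is where most of the real work lies: for the pipeline-generation experiment mentioned above, the genome would have to encode a sequence of image operations and their parameters, and the fitness function would score a pipeline's detections against labelled images.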

Unlike with neural networks, where I favour a pre-written open-source library, with Genetic Algorithms I prefer to write my own – the algorithm itself is small and simple, and it is best to have control over some of the other aspects mentioned above.

I have used my own GAs as part of commercial projects mentioned elsewhere on this website, including Computer Vision, and other data analysis projects.

I have also implemented other evolutionary computing algorithms, such as variations of Particle Swarm Optimisation and Ant Colony Optimisation. Each algorithm has its own “class” of problem that it solves better than most other algorithms.
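For comparison with the GA sketch above, here is a similarly minimal Particle Swarm Optimisation for continuous problems: each particle remembers its personal best position, the swarm shares a global best, and velocities blend the two. Parameter values are common textbook defaults, not settings from any of my projects:

```python
import random

def pso(objective, dim, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimal Particle Swarm Optimisation (minimisation).
    Illustrative sketch with textbook default parameters."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal bests
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Inertia + pull towards personal best + pull towards global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Usage: minimise the sphere function; the optimum is at the origin
best_pos, best_val = pso(lambda x: sum(v * v for v in x), dim=3)
```

Note the contrast with the GA: continuous positions and velocities rather than discrete genomes, and no crossover or mutation – which is part of why each algorithm suits a different class of problem.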

Please email me to discuss your project and we’ll see if I can help.