OpenCV on CUDA

I recently had the opportunity to do some work on an NVidia Jetson TK1 – a customer is hoping to use these (or other powerful devices) for some high-end embedded vision tasks.

First impressions of the TK1 were good.  After installing the host environment on a Ubuntu 14.04 64 bit box and ‘flashing’ the TK1 with the latest version of all the software (including Linux4Tegra), I ran some of the NVidia demos – all suitably impressive.  It has an ARM quad-core CPU, but the main point of it is the CUDA GPU, with 192 cores, giving a stated peak of 326 GFLOPS – not bad for board that is under 13cm square.  It’s a SIMD (Single Instruction Multiple Data) processor, also known as a vector processor – so their claim that it is a ‘minisupercomputer’ isn’t too wildly unrealistic – although just calling it a ‘graphics cards with legs’ would also be fair.

I wrote some sample OpenCV programs using OpenCV4Tegra, utilising the GPU and CPU interchangeably, so we could do some performance benchmarks.  The results were OK, but not overwhelming.  Some code ran up to 4x faster on the GPU than the CPU, while other programs didn’t see that much benefit.  Of course, GPU programming is quite different from CPU programming, and not all tasks will ‘translate’ well to a vector processor.  One task we need in particular – stereo matching – might benefit more.

We will do more work on this in due course.  We will also be comparing the processing power to a Raspberry Pi 3, and some Odroids, as part of our evaluation of suitable hardware for this demanding embedded project.  More results will be posted here as we get them.