

I'm looking to optimize C++ code (mainly some for loops) using the NEON capability of computing 4 or 8 array elements at a time. Is there some kind of library or set of functions that can be used in a C++ environment? I use the Eclipse IDE on Gentoo Linux to write C++ code.

After reading the answers I did some tests with the software. I compiled my project with the following flags: -O3 -mcpu=cortex-a9 -ftree-vectorize -mfloat-abi=hard -mfpu=neon. Keep in mind that this project includes extensive libraries such as openFrameworks, OpenCV, and OpenNI, and everything was compiled with these flags. To compile for the ARM board we use a Linaro toolchain cross-compiler, and GCC's version is 4.8.3. Would you expect this to improve the performance of the project? We experienced no changes at all, which is rather weird considering all the answers I read here.

Another question: all the for loops have an apparent number of iterations, but many of them iterate through custom data types (structs or classes). Can GCC optimize these loops even though they iterate through custom data types?

From your update, you may misunderstand what the NEON processor does. It is a SIMD (Single Instruction, Multiple Data) vector processor. That means it is very good at applying one instruction (say, "multiply by 4") to several pieces of data at the same time. It also loves to do things like "add all these numbers together" or "add each element of these two lists of numbers to create a third list of numbers." So if your problem looks like those things, the NEON processor is going to be a huge help.

To get that benefit, you must put your data in very specific formats so that the vector processor can load multiple values simultaneously, process them in parallel, and then write them back out simultaneously. You need to organize things such that the math avoids most conditionals (because looking at the results too soon means a round trip to the NEON unit). Vector programming is a different way of thinking about your program.

Now, for many very common kinds of problems, the compiler can work all of this out automatically. But it's still about working with numbers, and numbers in particular formats. For example, you almost always need to get all of your numbers into a contiguous block in memory.
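To make the data-layout point concrete, here is a minimal C++ sketch of the kind of loop NEON handles well: adding two contiguous float arrays element by element. The names PointAoS, PointsSoA, to_soa, add_arrays and sum_components are hypothetical and made up for illustration; they do not come from the project in the question. The intrinsics (vld1q_f32, vaddq_f32, vst1q_f32) come from arm_neon.h and are only compiled in when the compiler reports NEON support; otherwise the plain loop runs, and that plain loop is also the shape the auto-vectorizer can usually handle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

// Hypothetical custom data type. In a std::vector<PointAoS> the x and y
// values are interleaved in memory, so four consecutive x values cannot
// be fetched with one plain vector load.
struct PointAoS {
    float x;
    float y;
};

// The same data as a "structure of arrays": each field is one contiguous block.
struct PointsSoA {
    std::vector<float> x;
    std::vector<float> y;
};

// Copy interleaved points into the contiguous layout.
PointsSoA to_soa(const std::vector<PointAoS>& pts) {
    PointsSoA out;
    out.x.reserve(pts.size());
    out.y.reserve(pts.size());
    for (const PointAoS& p : pts) {
        out.x.push_back(p.x);
        out.y.push_back(p.y);
    }
    return out;
}

// out[i] = a[i] + b[i] for i in [0, n). No branches inside the loop body.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if HAVE_NEON
    // Process four floats per iteration with 128-bit NEON registers.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    // Scalar tail (or the whole loop when NEON is not available).
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Adds the x and y fields of every point using the contiguous layout.
std::vector<float> sum_components(const PointsSoA& p) {
    assert(p.x.size() == p.y.size());
    std::vector<float> result(p.x.size());
    add_arrays(p.x.data(), p.y.data(), result.data(), result.size());
    return result;
}
```

A caller would convert once with to_soa and then run sum_components on the result. Whether this is faster than the original loops depends on how often the conversion is needed and on how much of the run time those loops actually account for, so it is worth measuring before restructuring real code.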
