Optimizing floating point code for ARM Cortex A8 and later CPUs

Difficulty Levels: Advanced
Date added: July 4, 2018
Affected Products: TAM-3517, TAO-3530, TDM-3730, THB-3517

Summary

This tech note shows compiler options that get more performance out of an ARM Cortex A8 processor.

Compile command

This guide is written with the Code Sourcery G++ 2010.09-50 (gcc 4.5.1) compiler in mind, but other versions should behave similarly.

Before starting, remember to use the cross-compiling versions of all toolchain programs:
export CC=arm-none-linux-gnueabi-gcc
export AS=arm-none-linux-gnueabi-as
export CPP=arm-none-linux-gnueabi-cpp
etc.
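
As a minimal sketch, the exports above can be derived from a single toolchain prefix so that the whole set stays consistent. The variable name CROSS and the additional LD, AR, RANLIB and STRIP entries are assumptions filling in the "etc."; they are not taken from the original note.

```shell
#!/bin/sh
# Toolchain prefix used throughout this note.
CROSS=arm-none-linux-gnueabi

# Tools named explicitly in the note.
export CC="${CROSS}-gcc"
export AS="${CROSS}-as"
export CPP="${CROSS}-cpp"

# Assumed additions: other binutils programs a build commonly looks for.
export LD="${CROSS}-ld"
export AR="${CROSS}-ar"
export RANLIB="${CROSS}-ranlib"
export STRIP="${CROSS}-strip"

# Warn about any tool that cannot be found on PATH.
for tool in "$CC" "$AS" "$CPP" "$LD" "$AR" "$RANLIB" "$STRIP"; do
    command -v "$tool" >/dev/null 2>&1 || echo "warning: $tool not found on PATH" >&2
done
```

A warning here usually means the toolchain's bin directory has not been added to PATH yet.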

The optimization steps explained here are done by enabling ARM-friendly optimizations in GCC. The default settings are often very conservative or x86-focused.

The ARM Cortex A8 and later contain the NEON SIMD unit. While intended to do vector operations, it is also quite efficient as an FPU.
Flags to gcc that enable floating point using the NEON SIMD unit are:
-mfpu=neon -funsafe-math-optimizations -mfloat-abi=softfp

The switch enabling unsafe floating point optimizations should be used with care. It is, however, necessary for gcc to generate NEON floating point instructions, because NEON is not fully compliant with the IEEE 754 standard (for example, it treats denormal values as zero).

The softfp ABI switch lets the compiler emit hardware FP instructions while keeping the soft-float calling convention, in which floating point arguments are passed in integer registers.
Note: if your root filesystem uses hard floats, the last flag should be replaced with -mfloat-abi=hard.
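
One possible way to check which float ABI a root filesystem uses is to inspect the ARM build attributes of one of its libraries with readelf -A. The sketch below assumes that hard-float objects carry the attribute "Tag_ABI_VFP_args: VFP registers" and that objects without it use the soft-float calling convention. The function name float_abi_from_attrs and the library path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Reads `readelf -A` output on stdin and prints a matching -mfloat-abi value.
# Assumption: "Tag_ABI_VFP_args: VFP registers" marks hard-float objects;
# anything else is treated as compatible with softfp.
float_abi_from_attrs() {
    if grep -q 'Tag_ABI_VFP_args: VFP registers'; then
        echo hard
    else
        echo softfp
    fi
}

# Usage (hypothetical rootfs path, requires the cross readelf on PATH):
#   arm-none-linux-gnueabi-readelf -A "$ROOTFS/lib/libc.so.6" | float_abi_from_attrs
```

The printed value can then be passed to the compiler as -mfloat-abi=<value>.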

The TDM-3730 contains an ARM Cortex A8 core, which supports ARMv7-A instructions. These can be enabled by:
-marm -mcpu=cortex-a8 -march=armv7-a

The -mcpu flag defines which processor to optimize for, and the -march flag defines which instructions the compiler is allowed to use. The -marm flag selects the ARM instruction set rather than Thumb.

Another often useful flag is
-ftree-vectorize
It is not included in the -O2 optimization level, and adding it allows gcc to auto-generate SIMD code for the NEON unit.

All in all, one tuned compile command is:
arm-none-linux-gnueabi-gcc -marm -mcpu=cortex-a8 -march=armv7-a -mfpu=neon -funsafe-math-optimizations -ftree-vectorize -mfloat-abi=softfp
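
To keep track of what each flag is for, the command can be assembled from grouped variables. This is only a sketch: the variable names, the source file demo.c and the output name demo are hypothetical, and -O2 is taken from the CFLAGS used in the example section of this note. The script prints the resulting command line rather than running it.

```shell
#!/bin/sh
# Target selection: ARM instruction set, Cortex A8 tuning, ARMv7-A instructions.
ARCH_FLAGS="-marm -mcpu=cortex-a8 -march=armv7-a"

# Floating point: NEON unit, softfp calling convention, relaxed IEEE rules.
FP_FLAGS="-mfpu=neon -mfloat-abi=softfp -funsafe-math-optimizations"

# Optimization level plus auto-vectorization, which -O2 does not enable.
OPT_FLAGS="-O2 -ftree-vectorize"

CFLAGS="$OPT_FLAGS $ARCH_FLAGS $FP_FLAGS"

# Print the full command (demo.c is a placeholder source file).
echo "arm-none-linux-gnueabi-gcc $CFLAGS -o demo demo.c"
```

Grouping the flags this way makes it easy to swap a single setting, for instance changing softfp to hard, without retyping the whole command.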

Example

This is an example of how an autoconf-based source package can be compiled with optimizations enabled.
Note that the commands assume tcsh and not bash. To make them bash compatible, replace setenv with export and put = between the variable name and its value.

setenv ARMROOT /usr/src/tmp/tdm3730-default/rootfs/usr
setenv CC arm-none-linux-gnueabi-gcc
setenv AS arm-none-linux-gnueabi-as
setenv CPP arm-none-linux-gnueabi-cpp
setenv CFLAGS "-O2 -fwhole-program -marm -mcpu=cortex-a8 -march=armv7-a -mfpu=neon -funsafe-math-optimizations -ftree-vectorize -mfloat-abi=softfp -I${ARMROOT}/include -L${ARMROOT}/lib"
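./configure --prefix=$ARMROOT --build=i686-pc-linux-gnu --host=arm-none-linux-gnueabi

(In autoconf terms, build is the machine doing the compiling and host is the machine the programs will run on. Change build from i686-pc-linux-gnu to x86_64-pc-linux-gnu for 64-bit build machines.)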
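
For bash or any POSIX shell, the same setup might look like the sketch below. The BUILD and HOST triplets are assumptions that follow autoconf's own definitions: build is the machine doing the compiling, host is the machine the programs will run on. The configure step only runs when the script is started from an unpacked source tree.

```shell
#!/bin/sh
# Staging root from the example; adjust to your own rootfs location.
export ARMROOT=/usr/src/tmp/tdm3730-default/rootfs/usr
export CC=arm-none-linux-gnueabi-gcc
export AS=arm-none-linux-gnueabi-as
export CPP=arm-none-linux-gnueabi-cpp
export CFLAGS="-O2 -fwhole-program -marm -mcpu=cortex-a8 -march=armv7-a -mfpu=neon -funsafe-math-optimizations -ftree-vectorize -mfloat-abi=softfp -I${ARMROOT}/include -L${ARMROOT}/lib"

# Assumed triplets: BUILD is the development PC, HOST is the ARM target.
BUILD=i686-pc-linux-gnu
HOST=arm-none-linux-gnueabi

# Only run configure when one is present in the current directory.
if [ -x ./configure ]; then
    ./configure --prefix="$ARMROOT" --build="$BUILD" --host="$HOST"
else
    echo "no ./configure here; run this from the package source directory" >&2
fi
```

On a 64-bit build machine, BUILD would become x86_64-pc-linux-gnu.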

Further reading

For more in-depth details about optimizing ARM floating point math, read ahead here.
