Under the benchmark heading, we will cover all the techniques also known as performance analysis in computer architecture.
Background
With each new type of chip, it was usual for electronics magazines to do a special report where the main item was a synthetic table of features. So we became very adept at doing the same, and for a DSP a table of 20 features was largely sufficient. Even better, it could be reduced to two lines: MHz and number of MAC operations/s. Thinking about it, in those days all DSPs were single-MAC, so MHz alone was sufficient. The big issue was the number of data buses.
- We also used graphical analysis. In a nutshell, memory buses, MAC and ALU were drawn with their data path width; the bigger the area, the better.
Before going further we must note two very important concepts.
The first one is the concept of the benchmark mix. The biggest advance of BDTI was to standardize the mix in a simple and intelligent way.
The second one is that when using kernel benchmarks, we gave up on speed and instead used cycles/kernel. In other words, smaller was better, as opposed to the CA methodology of bigger is better (e.g. D-MIPS).
And then, we naturally went from kernel to application benchmarks. By that time we had plenty of competition from ersatz DSPs which used MHz as the measure of benchmarks, and so we came up with the concept of DSP MIPS.
- By then, the CA guys were completely lost. Our reasoning is complex, but trust us, this is the best way to tackle the problem.
And then, we were so clever, we could also use this figure of merit for DSPs. All DSPs were single-issue, so MHz and DSP MIPS were synonymous. So it was very easy and safe to predict that a 50 MHz DSP had the workpower of 50 DSP MIPS, enough to implement a G.723 speech coder.
By this simple technique we had merged chips and algorithms under one umbrella. Kernel benchmarks could also easily be integrated; for example, the BDT 256-point FFT cost 0.008 DSP MIPS (8000 cycles).
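The bookkeeping behind these figures is simple enough to sketch. The snippet below reuses the 50 MHz budget and the 8000-cycle FFT figure from above purely as illustrative numbers:

```python
# DSP MIPS bookkeeping, as described above. A single-issue DSP executes
# one instruction per cycle, so MHz and DSP MIPS are synonymous.

DSP_MHZ = 50          # 50 MHz single-issue DSP -> a budget of 50 DSP MIPS
FFT_CYCLES = 8000     # 256-point FFT kernel, cycles per invocation

def mips_load(cycles_per_call, calls_per_second):
    """DSP MIPS consumed by a kernel executed at a given rate."""
    return cycles_per_call * calls_per_second / 1e6

# One FFT per second costs 0.008 DSP MIPS out of the 50 available:
print(mips_load(FFT_CYCLES, 1))       # 0.008

# Conversely, the 50 DSP MIPS budget fits 6250 such FFTs per second:
print(DSP_MHZ * 1e6 / FFT_CYCLES)     # 6250.0
```

This is exactly the umbrella described above: chips contribute a MIPS budget, kernels and algorithms consume slices of it.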
But then, all hell broke loose for multiple reasons:
- DSPs became CPU-like, and the CPU standard is Dhrystone MIPS.
- C became the only acceptable means of benchmarking applications.
- Because multimedia was kind of becoming synonymous with DSP, we ended up with all the benchmark politics of data size.
- Audio is 24-bit, so how do you compare apples and oranges, etc.? To avoid the problem, BDTI uses the concept of native size.
- Even more to the point, CPUs and cores took a back seat to application platforms. Applications became the only respectable benchmarks (and, by the same token, totally impractical).
- Nobody in his right mind is going to implement a full application for benchmarking purposes.
- So, as of 2012, benchmarking is 90% Linux testbench (downloading, rebuilding, compiling with -O3, running, testing and measuring) and 10% optimization. We are miles away from evaluating the performance of a DSP.
List of techniques
- Table of features
- Problem: nowadays this is quantitatively more difficult. A standard SoC is made of hundreds of basic pieces of IP.
- Problem: when a 32-bit shifter is not equal to a 32-bit shifter
- Single figures of merit
- MIPS
- DSP MIPS
- MMAC/s
- the problem
- MOPS
- the problem: 1 MMAC = 2 MOPS
- ITU WMOPS
- D-MIPS (Dhrystone MIPS)
- Graphical figure of merit: the Lucent Cube
- Graphical analysis
- Manufacturer benchmarks (kernels)
- personal and custom benchmarks
- Industry standard DSP benchmark (assembler) - BDT
- Industry standard DSP benchmarks - the rest -> from bad to worse
- Benchmark results give a ranking linearly proportional to MHz! So why bother?
- Application benchmarks
- Models
- graphical models; fatter is better
- Bob Owen's nice little drawings.
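The "problem" flagged under MOPS in the list above is plain marketing arithmetic, worth spelling out. The 100 MHz device in this sketch is hypothetical:

```python
# A MAC is one multiply plus one add, so datasheets can quote the same
# hardware as N MMAC/s or as 2N MOPS. Hypothetical 100 MHz device:

mhz = 100                    # clock of a single-issue, single-MAC DSP
mmac_per_s = mhz * 1         # one MAC per cycle -> 100 MMAC/s
mops = mmac_per_s * 2        # 1 MMAC = 2 MOPS: same chip, doubled number

print(mmac_per_s, "MMAC/s ==", mops, "MOPS")
```

Two vendors quoting the same silicon with the two different units will look a factor of two apart, which is why a single figure of merit needs its definition attached.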
References
- BDTI web site
- Eric Martin