Sunday, January 15, 2012

Benchmarks

Benchmarking, Performance analysis, Technical comparison
Under the benchmark heading, we cover all the techniques known in Computer Architecture (CA) as performance analysis.

Background
With each new type of chip, it was usual for electronics magazines to run a special report whose main item was a summary table of features. So we became very adept at doing the same, and for a DSP a table of 20 features was largely sufficient. Even better, it could be reduced to 2 lines: MHz and number of MAC operations/s.
    • Thinking about it, in those days all DSPs were single-MAC, so MHz alone was sufficient. The big issue was the number of data buses.
    • We also used graphical analysis. In a nutshell, memory buses, MAC and ALU were drawn with their data path width; the bigger the area, the better.
Hence came the concept of figures of merit. The CPU world had MIPS, the DSP world had MMAC/s. The natural step was to introduce kernel benchmarks, since MMAC/s and the FIR filter were identical. Also, by the mid-80s all manufacturers had their own list of kernels, so it was easy to synthesize our own. And so we deeply believe that a benchmark mix of FIR, biquad and adaptive FIR is still sufficient to characterize a DSP.

Before going further we must note two very important concepts.
The first one is the concept of the benchmark mix. The biggest advance of BDTI was to standardize the mix in a simple and intelligent way.
The second one is that, when using kernel benchmarks, we gave up on speed and instead used cycles/kernel. In other words, smaller was better, as opposed to the CA methodology of bigger is better (e.g. D-MIPS).
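To make the two concepts concrete, here is a minimal sketch of a benchmark mix scored in cycles/kernel. The cycle counts and weights are hypothetical values chosen for illustration, and the weighted-ratio formula is an assumption of this sketch, not the BDTI method.

```python
# Hypothetical cycles/kernel figures (illustrative only, not measurements).
REFERENCE = {"fir": 1000, "biquad": 400, "adaptive_fir": 2200}
CANDIDATE = {"fir": 520, "biquad": 260, "adaptive_fir": 1100}

# Hypothetical weights of each kernel in the mix.
WEIGHTS = {"fir": 0.5, "biquad": 0.25, "adaptive_fir": 0.25}


def mix_score(cycles, reference, weights):
    """Weighted mean of cycle ratios against a reference DSP.

    Because the inputs are cycle counts, a smaller score is better.
    """
    total_weight = sum(weights.values())
    weighted = sum(weights[k] * cycles[k] / reference[k] for k in weights)
    return weighted / total_weight


if __name__ == "__main__":
    print("reference:", mix_score(REFERENCE, REFERENCE, WEIGHTS))
    print("candidate:", mix_score(CANDIDATE, REFERENCE, WEIGHTS))
```

The reference DSP scores 1.0 by construction, and any DSP needing fewer cycles on the mix scores below 1.0.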
And then we naturally went from kernel to application benchmarks. By that time we had plenty of competition from ersatz DSPs which used MHz as the benchmark measure, and so we came up with the concept of DSP MIPS.
  • By then, the CA guys were completely lost. Our reasoning is complex, but trust us, this is the best way to tackle the problem.
Instead of benchmarking the DSPs, we benchmarked the algorithms, such that a G711 was 0.5 DSP MIPS, a G729 was 10 DSP MIPS and a G723 was 30 DSP MIPS.  {Tbd  recheck}
And then, we were so clever, we could also use this figure of merit for DSPs. All DSPs were single-issue, so MHz and DSP MIPS were synonymous. So it was very easy and safe to predict that a 50 MHz DSP had the processing power of 50 DSP MIPS, enough to implement a G723 speech coder.
By this simple technique we had merged chips and algorithms under one umbrella. Kernel benchmarks could also easily be integrated: for example, the BDT 256-point FFT counted as 0.008 DSP MIPS (8000 cycles).
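The DSP MIPS bookkeeping can be sketched in a few lines. The algorithm costs are the figures quoted above (still flagged for recheck). The function names and the one-execution-per-second reading of the FFT figure are assumptions of this sketch.

```python
# DSP MIPS cost per algorithm, as quoted in the text (flagged "Tbd recheck").
ALGORITHM_DSP_MIPS = {"G711": 0.5, "G729": 10.0, "G723": 30.0}


def dsp_mips_capacity(clock_mhz):
    """Single-issue assumption from the text: 1 MHz gives 1 DSP MIPS."""
    return float(clock_mhz)


def kernel_dsp_mips(cycles, executions_per_second=1):
    """Convert a cycles/kernel figure into DSP MIPS.

    Assumption: the 0.008 figure for an 8000-cycle FFT corresponds to
    one execution per second.
    """
    return cycles * executions_per_second / 1e6


def fits(clock_mhz, algorithms):
    """Return (fits, load) for a list of algorithm names on one DSP."""
    load = sum(ALGORITHM_DSP_MIPS[name] for name in algorithms)
    return load <= dsp_mips_capacity(clock_mhz), load


if __name__ == "__main__":
    print(fits(50, ["G723"]))
    print(fits(50, ["G723", "G729", "G729", "G711"]))
    print(kernel_dsp_mips(8000))
```

With these figures a 50 MHz DSP carries one G723 with room to spare, while G723 plus two G729 and one G711 adds up to 50.5 DSP MIPS and no longer fits.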
But then, all hell broke loose for multiple reasons:
  • DSPs became CPU-like, and the CPU standard is Dhrystone MIPS.
  • C became the only acceptable way of benchmarking applications.
  • Because multimedia was becoming synonymous with DSP, we ended up with all the benchmark politics of data size.
    • Audio is 24-bit, so how do you compare apples and oranges, etc. To avoid the problem, BDT uses the concept of native size.
  • Even truer, CPUs and cores took a back seat to application platforms. Applications became the only respectable benchmarks (and by the same token totally impractical).
    • Nobody in his right mind is going to implement a full application for benchmarking purposes.
  • So as of 2012, benchmarking is 90% Linux testbench (downloading, rebuilding, compiling with -O3, running, testing and measuring) and 10% optimization. We are miles away from evaluating the performance of a DSP.

List of techniques
  • Table of features
    • Problem: nowadays this is quantitatively more difficult. A standard SOC is made of hundreds of basic pieces of IP.
    • Problem: when a 32-bit shifter is not equal to a 32-bit shifter
  • Single figure of merit
      • MIPS
      • DSP MIPS
      • MMAC/s
        • the problem
      • MOPS
        • the problem: 1 MMAC = 2 MOPS
      • ITU WMOPS  
      • D-MIPS (Dhrystone MIPS)
      • Graphical figure of merit: the Lucent Cube
      • Graphical analysis
      • Manufacturer benchmarks (kernels)
      • personal and custom benchmark
      • Industry standard DSP benchmark (assembler) - BDT
      • Industry standard DSP benchmarks - the rest -> from bad to worse
        • Benchmark results give a ranking linearly proportional to MHz! So why bother?
      • Application benchmarks
      • Models 
        • graphical models; fatter is better
        • Bob Owen's nice little drawings.
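The MMAC/s and MOPS entries in the list above can be related with a little arithmetic. The sketch below assumes a direct-form FIR needing one MAC per tap per output sample, and counts each MAC as one multiply plus one add. The function names and the 64-tap, 8 kHz example are hypothetical.

```python
def fir_mmacs(taps, sample_rate_hz):
    """MMAC/s needed by a direct-form FIR: one MAC per tap per sample."""
    return taps * sample_rate_hz / 1e6


def mmacs_to_mops(mmacs):
    """Each MAC is counted as two operations (multiply and add)."""
    return 2.0 * mmacs


if __name__ == "__main__":
    load = fir_mmacs(taps=64, sample_rate_hz=8000)
    print(load, "MMAC/s =", mmacs_to_mops(load), "MOPS")
```

The same filter is thus quoted at twice the figure in MOPS, which is why the two units cannot be compared without stating the counting rule.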


    References
    1.  BDTI web site
    2.  Eric Martin

    Saturday, January 7, 2012

    DSP of Any Kind

    DSP of any kind: boards, FPGA, custom chips, etc.
    The goal of this section is to cover the "free world" of DSP architecture.

    By definition, commercial GP (General Purpose) DSPs such as a TI C64xx are not free. They must follow the constraints of chip design, ISA design and, most of all, compiler friendliness and upward software compatibility.
    By comparison, there are many products which are point products or application specific. They are full of interesting tricks and original solutions. This is the case for custom DSP chips, and they can also be found among boards and FPGA-based platforms.

    The problem with all these custom solutions is the lack of public information.
    • FPGA designs are the most obscure because the description is both code (HDL) and proprietary.
    • Custom chips are better documented, in the form of conference papers [ref 1].
    • Boards, especially older boards, are the most accessible since the description is a block diagram.


    Background
    From 1978 to 1985: designing DSP boards with DSP building blocks (BB)
    A big application of bit-slice families was DSP boards. We largely used the architecture of these boards to design the first generation of integrated GP DSPs.
    From 1985 to 1995: designing DSP boards with GP DSPs
    Once integrated GP DSPs were available, the board architecture became predictable and most of the advances were largely outside DSP, such as host interfacing [ref 10..13].
    Since 1995: FPGA

    Around 1995, {with a different type of footprint}, the FPGA became a credible alternative to the board. Not surprisingly, the FPGA companies reinvented many of the BB and board architecture techniques of the 80s.

    DSP BOARDS
    The historical path to progress in electronics is to design boards based on both new components and a better understanding of architecture. New components are then developed by integrating what was previously a full board of electronics. Recent footprint changes (SIP) and nanotechnology did not fundamentally alter the process. An end-user board still represents a very good approximation of the next-generation SOC platform. So in a methodology like ours, which designs COPs by relying on proven DSP architectures, studying DSP boards has real design value.
    Let us start by categorizing DSP boards, keeping in mind that the same categories could be applied to COPs:
    1. Boards designed using  a bit slice family (AMD) or a building block family (ADI).
    2. Boards based on a General Purpose DSP (typically a TI DSP ) or high performance CPU.
    3. Special purpose boards such as Array processors or Vector Processors.
    4. Boards based on one or several custom chips  

      FPGA
      In a way, the FPGA can be seen as a new type of board. There are however deep differences.
      • The FPGA vendors (Xilinx, Altera, Quicklogic) offer catalogs of Intellectual Property (IP) which constitute an important reference for the COP designer.
      • They also offer multiple builder tools. Some can serve as examples for developing a COP.
      • Finally, they have extensive experience with "Matlab" to implementation flows (in the form of Simulink block to Xilinx hard macro).
      Note: we do not try to cover the implementation of a COP in an FPGA fabric. Instead we suggest that the FPGA ecosystem constitutes a vast pool of resources to be tapped.

      DSP CUSTOM CHIPS
      A long time ago, custom chips were a mine of information. For instance, a keyword search for 'FFT' on the IEEE ISSCC DVD brings in the region of 100 hits over the ten years of the 1980s. Nowadays the number of custom chips has dramatically diminished and the descriptions are rarely at the level of the building block. Still, like boards, they are a very good reference for communication and multiprocessing.
      Also we should not forget that a COP is a form of custom chip.

      FURTHER: WHERE TO START WITH DSP BOARDS?
      Motivation: "should a DSP COP architect be studying DSP boards?".
      1) Today's DSP boards: they are too sophisticated to serve as good examples. Their design advances have more to do with inter-board techniques and software than with "pure" DSP. Still, there are a few techniques worth studying, such as HOST-COP interfacing, serial inter-chip communication (SRIO) and heterogeneous MP.
      2) DSP boards of the 70s-80s: they are much more amenable to study. They had a large range of architectures and structures [ref 2-9]. They are nearer to the architecture of a current COP design than current DSP boards and chips.
      • One could think "If the DSP chip architects had done a good job, we would not need to go back to obscure stuff to get our architecture inputs". But this would be unfair, since designing a GP DSP came with many constraints imposed by software. We will not say that these constraints should be totally ignored by the COP designer, but the whole point of designing a COP is for these constraints to take a back seat.
      We will now do a quick review of some paper references and draw conclusions on what can be possibly reused:
      1. Most articles had a block diagram of (what was supposed to be) a DSP
      2. The definition of a DSP varied widely. For example:
        1. Usage of serial arithmetic.   
        2. Usage of shift register as opposed to memory
        3. A memory model reduced to I/O buffers (ADC/DAC) so that there was no need of AGU.
          1. Even a model with only DAC (MIT speech generator)
        4. AGU with flags going back to the ALU status registers. Obviously to do circular addressing.
        5. Single bus architecture, and the Zurich zip trick as an architecture feature 
      3. Veendrick [ref. 11]  had an interesting BB that we will call "MULIMIT"
        1. Z= K*A + L*B
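As a rough illustration of why a Z = K*A + L*B building block is attractive, the sketch below models the operation in plain Python and builds two common DSP primitives from it. The function names are hypothetical and nothing here reflects the actual chip datapath (word widths, pipelining, saturation).

```python
def dual_mac(k, a, l_coef, b):
    """Z = K*A + L*B, the building-block operation quoted in the text."""
    return k * a + l_coef * b


def complex_multiply(x, y):
    """(xr + j*xi) * (yr + j*yi) computed with two dual_mac operations."""
    xr, xi = x
    yr, yi = y
    real = dual_mac(xr, yr, -xi, yi)  # xr*yr - xi*yi
    imag = dual_mac(xr, yi, xi, yr)   # xr*yi + xi*yr
    return real, imag


def lerp(a, b, t):
    """Linear interpolation between a and b with one dual_mac operation."""
    return dual_mac(1 - t, a, t, b)


if __name__ == "__main__":
    print(complex_multiply((1, 2), (3, 4)))
    print(lerp(0.0, 10.0, 0.25))
```

A complex multiply, the core of an FFT butterfly, maps onto two such operations, and a linear interpolation onto one.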
      References
      1. Hot Chips, ICSPAT, ICASSP, ISSCC; see also IEEE DVDs (Com. Soc., SP Soc., SSC Soc.)
      2. Louis Schirm IV, TRW Inc. "Packing a signal processor on a single digital board" Electronics, Dec 20, 1979
      3. John Mick "Fast computational devices for DSP" likely at some conference before 1980 (?)
      4. Zaheer Ali "Know the LSI hardware of digital signal processors" EDN, June 21, 1979
      5. Capello, etc. "Completely pipelined architectures for DSP" IEEE Vol ASSP-31, August 1983
      6. S. Chin, C. Brooks "Microprogramming enhances signal processor's performance" Electronics, Nov 17, 1982
      7. Zeman, Troy Nagle "A high speed microprogrammable DSP employing distributed arithmetic" IEEE Vol SC-15, Feb 1980
      8. R. Shively "Architecture of a programmable DSP" IEEE Vol C-31, Jan 1983
      9. IBM SP16
      10. MIT "Klatt vocal tract model" ICASSP 82 ??
      11. Philips Labs "A 40 MHz Multiapplicable DSP chip" IEEE Vol SC-17, Feb 1982
      12. TRW, ICASSP 83
      13. David Karlin (Fairchild) "VLSI BB for DSP" ICASSP 1982
      14. Barral, Moreau "Circuits for Digital signal processing" ICASSP 1984
      15. NTT "LSI's for DSP" IEEE Vol SC-14, April 1979
      16. F. Mintzer, A. Peled "The architecture of the real-time signal processor" ICASSP 1982
      17. and dozens of others with key searches in IEEE Xplore
      18. "Plug-in DSP boards" EDN, April 26, 1990
      19. "DSP coprocessor boards" EDN Sep 13, 1991
      20. "DSP boards help tackle a tough class of AI tasks" electronics, aug 21,1986
      21. G.Pawle, T.Faherty "DSP development board offers host independence" computer design october15,1984

      Sunday, January 1, 2012

      COP: DSP boards, FPGA

      The goal of this section is to cover any DSP COP with a footprint different from a chip. The two main candidates are boards and FPGAs. While FPGAs are chips, the very large FPGAs are more like boards in terms of price, flexibility and topology. Also, they are direct competitors in the PC SOC socket.

      Background
      Sometime in the 90s, plug-in DSP boards became coprocessor boards. In a terminology which was largely marketing, a DSP board for the VME bus was called a plug-in, but the same board with degraded performance for the ISA bus was a COP board [ref 1,2,3,4].


      Description
      PC coprocessing socket
      One of the most interesting DSP applications of recent years was the financial coprocessor. The Intel architecture allows adding a COP on a very tightly coupled interface. A couple of vendors [ref 5,6] used this socket for a board which boils down to one or several highest-end FPGAs. For the record, what is accelerated is effectively a set of Matlab functions.
      Embedded coprocessing
      Also in the embedded world, an FPGA + C64xx is a standard sight, for instance in general-purpose boards or wireless infrastructure. While a part of the FPGA is for jelly beans, an even larger part takes care of pre/post-processing. Also note that the C64xx has no specific support for coprocessors.


      References
      1. "Plug-in DSP boards" EDN April 26, 1990
      2. "DSP coprocessor boards" EDN Sep 13, 1991
      3. "DSP boards help tackle a tough class of AI tasks" electronics, aug 21,1986
      4. G.Pawle, T.Faherty "DSP development board offers host independence" computer design october15,1984
      5. Nallatech
      6. HC 2009??