Sunday, May 20, 2012

T.O.C 20 May 2012

  1. So you want to be a DSP architect?
    1. The long story
    2. Background Checks 
    3. Building Block issues
    4. Methodology and tools issues
  2. Arithmetic BB
  3. Bit wise BB
  4. Vector BB
  5. Math BB
  6. DSP BB
  7. Matlab BB
  8. Found on the web
  9. Stop me if you've heard this one before!
  10. APPENDIX - DSP Architecture 
    1. DSP of the First Kind (1980-95)
    2. DSP of the Second Kind (1995-2005)
    3. DSP of the Third Kind (2010- ?)
    4. DSP of the lost kind (1995-2005)
    5. DSP  of any kind (1950-2050)
  11. APPENDIX - DSP structural Building Block
    1. AGU 
    2. PCU
    3. DAU
    4. SU (The Shuffle Unit) 
    5. RF (Register File)
    6. Bit slice: Bit slice building blocks

Bit Wise BB - an introduction

Bit Wise BB
This is  a heck of a topic!
Firstly, it overlaps ISA studies and DSP Building Blocks in at least two specific places (the DAU and the shuffle unit).
Secondly, the variety of function classes is very wide: for instance pack/unpack, Count Leading Signs, Galois fields, bit manipulation.
Thirdly, and this will serve as an introduction to this topic, the functions themselves can get amazingly out of hand.
Finally, Matlab's bit-wise capability is limited or incoherent. As of this writing, our latest bit-wise library is floating point based!

Background                                                                                                                                  
In 1996, when we were designing the TriCore ISA, Bruce came up with a set of so-called "permute" instructions, and I do remember Rod's reaction as somewhere between bafflement and irritation.
On Bruce's side, the truth is that all media processors had this class of instructions [ ref: search Ruby Lee].
On Rod's side, we all know that the most precious resource in ISA design is the opcode. And the biggest issue with bit-wise instructions is that they are opcode hogs (always requiring dozens of control bits).
For the sake of the story, I would add that since 1996 I have been involved with this issue multiple times and have seen a few people fall into this trap.

Example: TI C64+ : instruction PACKHL2                                                                                              
To illustrate this topic, we will now discuss in detail the PACKHL2 instruction of the TI C64+ DSP. We will then expand to all C64+ PACK instructions.

naming convention and syntax issue
The instruction mnemonic PACK implies that the destination register is smaller than the source register(s), HL is a subtype, and 2 in TI terminology means 2 sub-word operations. Since the registers are 32 bits wide, ADD2 (SUB2, etc.) means 2 x 16-bit additions and ADD4 means 4 x 8-bit additions in parallel.
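
To make the suffix convention concrete, here is a minimal Python sketch of sub-word addition inside a 32-bit word. The function names (`add_lanes`, `add2`, `add4`) are ours, and the wrap-around behaviour of each lane (no saturation, no carry into the neighbouring lane) is an assumption made for the illustration, not a statement copied from the TI manual.

```python
def add_lanes(a, b, lanes):
    """Add two 32-bit words as `lanes` independent sub-words.

    Assumption for this sketch: each lane wraps around on overflow
    and no carry propagates from one lane to the next.
    """
    width = 32 // lanes
    mask = (1 << width) - 1
    result = 0
    for i in range(lanes):
        shift = i * width
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift
    return result


def add2(a, b):
    """2 x 16-bit additions in one 32-bit register."""
    return add_lanes(a, b, 2)


def add4(a, b):
    """4 x 8-bit additions in one 32-bit register."""
    return add_lanes(a, b, 4)
```

With these definitions, `add2(0x0001FFFF, 0x00010001)` keeps the overflow of the low half word out of the high half word, which a plain 32-bit ADD would not do.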

The syntax is
      PACKHL2(src1,src2, dst)   where src1,src2, dst  are all 32 bit registers

Question: since the syntax is exactly the same as
      ADD (src1,src2,dst)   where src1,src2, dst  are all 32 bit registers
why do you need the suffix 2?
Answer: In a 32-bit architecture, the 32-bit register is generic. All instructions use 32-bit registers. What matters are the operations inside the 32-bit registers. In this case 2 implies 16-bit data.
Question: PACK implies some kind of register demotion. The syntax dst = src1 <op> src2 is just like any standard two-operand syntax: the 2 registers are operated upon and the result is written to the destination. Where is the demotion in this type of operation?
Effectively, we would be more comfortable with dst32 = PACK(src64), where src64 is a 32-bit register pair (such as A3_A2).
                         PACKHL A3_A2,A0       ; a syntax using a 32-bit register pair (64bit )  
                         PACKHL A2, A8, A0       ; TI syntax using  two 32-bit registers
but the advantage of the TI syntax is obvious: register flexibility.

Definition
We are now entering the core of the matter. What is the definition of the PACKHL2 instruction? First, we want something simple to describe this instruction. Writing a "C" definition is rather wordy, and the TI standard description is more complicated than needed. The simplest description is to consider the 2 registers side by side, src2 on the left (made of the two half words D_C) and src1 on the right (respectively B_A); the result will be C_B.
                      D_C B_A
                     C_B       
We have now the following definitions:
                        Z = concat(hi(X),lo(Y));    % hi() and lo() are self explaining functions
                        Z = [hi(X) lo(Y)];              % even more Matlab like
                        Z = [X(31:16) Y(15:0)];     % not so Matlab

And they all look clear. But concatenation is not the same as packing. In fact, intuitively it is the contrary: one increases the variable size, the other reduces it.
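
For readers without Matlab, the three definitions above can be sketched in Python as follows. The names `hi`, `lo` and `concat_hi_lo` are ours; X and Y are simply taken to be 32-bit unsigned values, and the sketch does not decide which of src1/src2 plays the role of X.

```python
def hi(x):
    """Upper half word, bits 31:16, of a 32-bit value."""
    return (x >> 16) & 0xFFFF


def lo(x):
    """Lower half word, bits 15:0, of a 32-bit value."""
    return x & 0xFFFF


def concat_hi_lo(x, y):
    """Z = [hi(X) lo(Y)]: hi(X) lands in bits 31:16, lo(Y) in bits 15:0."""
    return (hi(x) << 16) | lo(y)
```

Two 32-bit inputs go in and one 32-bit result comes out, which is where the "packing" feeling comes from even though the body is a concatenation.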

Towards a general definition of PACK
Let us start with the most general definition. The register size is 64 bits and the granularity is 8 bits. Both values are very reasonable in 32-bit architectures. We will call this instruction PERM(ute).
                                      PERM(src64, dst64, controlword);
We will see later that PACKHL2 is just a sub-case of PERM.
PERM is easily described, as shown in the following example:
  src64   HGFEDCBA
         8x8 switch
  dst64   AAGHFEDC
In this description each letter represents a byte; note the little-endian choice.
But then what is the control word, and how many bits do you need? The number of bits is the problem. In this case the destination is a 64-bit register made of 8 sub-components (bytes). Each byte can receive any of the 8 source bytes (a 3-bit control), and since there are 8 destination bytes, the total number of bits is 3 x 8 = 24 bits. Not an easy decision to make in a 32-bit opcode! And using a register to hold the control word is a poor solution: it means a 3-cycle sequence (MVK, MVKH, PERM).
Finally, the control word syntax is very straightforward. We just use the representation of the destination register. In the example above it is:
                              PERM(src64,dst64, AAGHFEDC)
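
A minimal Python sketch of this generic PERM follows. Everything in it is an assumption made for illustration: the function names, the convention that letter A is byte 0 (least significant) and H is byte 7, and the packing of the eight 3-bit selectors into a 24-bit integer with the leftmost letter in the top bits.

```python
LETTERS = "ABCDEFGH"  # assumed labelling: A = byte 0 (LSB) ... H = byte 7 (MSB)


def encode_control(pattern):
    """Turn a pattern such as 'AAGHFEDC' into a 24-bit control word.

    The leftmost letter describes the most significant destination byte,
    so it ends up in the top 3 bits of the control word.
    """
    assert len(pattern) == 8
    control = 0
    for letter in pattern:
        control = (control << 3) | LETTERS.index(letter)
    return control


def perm(src64, control):
    """8x8 byte switch: destination byte d takes the source byte chosen
    by bits 3d+2..3d of the control word."""
    dst = 0
    for d in range(8):
        sel = (control >> (3 * d)) & 0x7
        dst |= ((src64 >> (8 * sel)) & 0xFF) << (8 * d)
    return dst
```

Usage: with a source whose bytes are the ASCII codes of their own labels, `perm(0x4847464544434241, encode_control("AAGHFEDC"))` produces the bytes A, A, G, H, F, E, D, C from most to least significant.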

Matching PERM and PACK
Using a definition similar to the one above, it can be seen that PACKHL2 is equivalent to:
                                         PERM(src32,src32,dst32, FEDC)
In fact we can match all C64+ PACK instructions in the same way (see table).
It must be noted that the C64+ DPACK instructions are effectively more like PERM than PACK, since the sources (combined) and the destination have the same 64-bit width.


Conclusions
  • Defining bit wise functions just by looking at the datapath is relatively easy and can give simple yet very powerful structures. Architects (attracted by elegance) will always love that.
  • The problem is the control path, which can rapidly get out of hand.
  • To illustrate the problem, we took an instruction from the C64+ instruction set (PACKHL2) and extended it to a generic Permute instruction. The number of bits required to describe the instruction would be 24.
  • This problem applies equally to a building block or a coprocessor unit.
  • With reference to the C64+ we will now compare the two approaches:
    • Advantages of using a general PERM instruction
      • Conceptually, it is very simple.
      • Software implementation is very direct.
      • Very flexible: any source byte can go to any destination byte. Any source byte can be duplicated (or replicated n times) in destination. 
      • No need to do long studies and make a drastic selection to choose the right PACK datapath to implement (this is very often the case with bytes).
        • C64+ offers only 2 choices: byte even and byte odd
    • Shortcomings of using a general PERM instruction
      • Needs 24 bits of control. This is not realistic in a 32-bit ISA.
        • C64+ defines only 8 instructions. The footprint is minimal.
      • Having 24 bits gives power(2,24) possibilities; how to test that? (by construction?)
      • The flexibility advantage may be a delusion: some features are missing. Just looking at the C64+ ISA, for instance: sign extension and packing with saturation.
  • While TI made the right choices for the C64+ ISA, a different situation (64-bit ISA, dedicated COP, etc.) might give different results. The astute reader, we are sure, already has plenty of ideas and solutions.
  • BUT this is not the point of this section. The point is to make sure that you understand the main risk associated with bit-wise instructions (the control bits). To be forewarned is to be ...
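
On the "how to test that?" question, one possible reading of "by construction" is sketched below in Python. It rests on an assumption that must hold in the real datapath: each destination byte is an independent 8:1 multiplexer. If so, 8 lanes x 8 selections = 64 directed cases exercise every multiplexer input, instead of the 8^8 = 2^24 control words. The `perm` model and the function names are ours.

```python
def perm(src64, control):
    """Reference model: destination byte d takes the source byte chosen
    by its own 3-bit field of the 24-bit control word."""
    dst = 0
    for d in range(8):
        sel = (control >> (3 * d)) & 0x7
        dst |= ((src64 >> (8 * sel)) & 0xFF) << (8 * d)
    return dst


def lane_by_lane_check():
    """Drive each destination lane through all 8 selections, one lane at a time."""
    src = 0x0807060504030201  # byte i holds the value i + 1, so every byte is distinct
    cases = 0
    failures = 0
    for d in range(8):
        for sel in range(8):
            out = perm(src, sel << (3 * d))  # all other lanes select byte 0
            cases += 1
            if (out >> (8 * d)) & 0xFF != sel + 1:
                failures += 1
    return cases, failures


TOTAL_CONTROL_WORDS = 8 ** 8  # exhaustive alternative: 16,777,216 control words
```

This says nothing about coupling between lanes; if the implementation shares logic across bytes, the 64 cases are not sufficient.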






  


 



Saturday, February 18, 2012

SDR (Software Defined Radio)

This section covers an ambitious project. Let us imagine a wireless terminal made of 2 CPU blocks: a powerful host armed with all media capabilities, and a second block which can process any kind of RF signal and radio protocol (e.g. 3G, 4G, Wi-Fi, GPS, up to future standards such as cognitive radio). This second block is commonly called Software Defined Radio (SDR).

Background
The concept of SDR is as old as radio itself [ref 1], but in our context we will put the burden on the solid shoulders of Les Mintzer, circa 2000 [ref 2]. This was an FPGA implementation, which makes a lot of sense as the money and impetus came from the military. In the same school, see also Xilinx's Chris Dick, Spectrum's Lee Pucker and a few others [ref 3,4,5,6,7,8,9,10]. Now, at this stage their implementation of SDR was far from fitting the definition, but it made sense. Worse was to come. Firstly, in a brilliant case of taking the tree for the forest, a bunch of guys were using the PC/Pentium as their example of SDR. They could do everything except radio.
Frankly, the first time I heard about that, I thought of some genius generating radio from a PC by using the EMC radiation of the chips. This sounds silly, but after all this was the time of UWB, so why not?
And then we had the usual Silicon Valley trend: a new buzzword for a new type of chip architecture [ref 11,12,13].


Reference
  1. Dan Stransberg "A century old technology enters the digital age", EDN, October 28, 1999
  2. Les Mintzer "Soft Radios and modems and FPGA", CSD mag, February 2000, p.52
  3. Chris Dick, Fred Harris "The platform FPGA Software defined Radio"; good paper but no reference, likely a conference circa 2001
  4. Chris Dick "FPGAs cranked for software radio", EET, April 10, 2000
  5. Chris Dick "A case for using FPGAs in SDR Phy", EET, August 12, 2002
  6. Lee Pucker "Paving Paths to SDR", CSD mag, June 2001, p.19
  7. Lee Pucker, Systems Architect, Spectrum Signal Processing "Distributed Architecture for SDR", CSD 2001
  8. Robert Sgandura, PM Pentek "W221: Software Radio - From concepts to Implementation", likely at Communication System Design conference 2000 or 2001
  9. Cord Finlay "Understanding SDR requirements", Wireless System Design, July 2001
  10. "Soft Radio key to universal applications", EE Times Special Report, March 19, 2001
    1. so full of hope; fortunately Loring Wirbel deflated the hot air in EET, Aug 12, 2002.
  11. John Ralston "SDR emerges to address wireless industry needs", Wireless System Design, October 2000
    1. Describes the Morphics architecture
  12. Armin Nueckel, Prashant Rao "The use of Reconfigurable Processor Arrays in wireless Infrastructure Systems", July 25, 2001 {white paper?}
    1. Describes the PACT architecture
  13. Paul Master "The baseband solution for the worldphone", www.quicksilver.com, 2001

    Saturday, February 11, 2012

    DSP of the lost kind

    The goal of this section is to laugh through the multiple attempts at defining new DSPs, next-craze DSPs, further-than DSPs, beyond DSPs, everything and its contrary, as we went from 1995 to 2005. Among the most notable, we will have a quick pass at Media processors.
    Now, the obvious question is: why should we care? Once more, the answer is that there is a very large difference in use model between a GP full-market CPU and a COP.
    What is ridiculous in one case becomes brilliant, and maybe most effective, for a given application (see the NVIDIA evolution). Also, some of the brightest architects of our time were involved in the design of these new machines. Finally, this is still one of the richest veins of ideas available publicly (but not freely) (IEEE, ACM, MDR, etc.).

    BACKGROUND
    There are two reasons why we feel sad about this "Idiot Wind" period.
    1. Too many people were getting carried away and too many arguments just did not make sense.
      1. For instance, we computed that it would take 1200 man-years of application software (not tools) just to meet the minimum requirements of the claim of being a Media processor.
    2. For a while, we went in the same direction (and we have to thank the influence of Jim Turley for that; mind you, he changed his mind pretty quickly too).
    DESCRIPTION AND TYPES
    • Type 1
      • Video Signal Processors
      • Video DSP
      • Media Processors
    • Type 2  Legoland
      • Multi IP DSP, Cluster Based DSP
        • Bops, Cell, Improv, Equator
        • Atsana, ChipWrights, ClearSpeed, Cradle, e-lite, powerFFT, Sandbridge, Siroyan, synputer, Telairity, tops, trips
        • Infinite-tech RADarray  (neil stollon ICSPAT 99)
          • RADarray of RADcore coprocessors connected to host
          • Host and peripherals are 3rd-party IP
          • Each RADcore is a reconfigurable stream algorithm coprocessor.
          • Each RADcore is made of user selected Execution Units (EXUs) interconnected by reconfigurable data path bus architecture
          • EXUs form independent elemental processing blocks such as ALUs, MACs, memories, I/O,etc..
    • Type 3 Domain DSP
      • Wireless DSP,
      • Speech, Audio DSP
      • Broadband DSP
    • Custom DSP, configurable DSP, 
    • Reconfigurable DSP
    Media Processor (MP) (or MVP) (V for Video)
    Born around 92-94, the MVP had a peak around 1997 and went through multiple "down and up"s before its final death by 2005. The most famous (first wave) names were TriMedia, MicroUnity, Chromatics, NVidia [ref.1]
    They had several characteristics in common:
    • It was a new type of processor (the MEDIA processor)
      • from our DSP perspective it was a bit baffling. Why not use a DSP?
    • They were the latest incarnation of the MultiMedia (MM) craze.
      • very soon followed by MM extensions. Compare MP and MM extension
    • For some reasons most were based on VLIW.
      • Why?
    • While 5 of 7 MM functions could be done by a DSP,  they rightly believed that a DSP would not be able to do the last 2 (video and graphics). 
      • But then why do they think they could?
      • What were the 7 functions?
    • Finally, the most obvious characteristic was their disdain for the Pentium. In typical Silicon Valley fashion, after losing the RISC war, that was the new frontier. In a nutshell:
      • First they lost the PC seat to the Pentium (that was CISC versus RISC)
      • Then they (re)lost the PC MM seat to the Pentium (that was GP processor versus Media processor)
      • Finally they could not even find a small coprocessor seat in the PC, i.e. to be used as a Media COP!!
        • This is the point which merits attention. See NVIDIA below
      • In their final days, to add insult to injury, they were totally inadequate for the embedded market.
        • Mind you, good luck trying DVD in 1999 or cell phone in 2000 with a 288-bit wide opcode and no software support!
    NVIDIA
    In the most remarkable feat of the century, NVIDIA changed direction by focusing on graphics instead of a rainbow of applications [ref.2].
           Not sure about their recent direction towards general purpose, people never learn it seems!
    TriMedia
    In another example of adaptation, the champion of VLIW, after years of trying software optimization, instead redesigned its chips with multiple powerful video coprocessors. Also important, Philips concentrated on platforms, and TriMedia became "the other core" of the Nexperia platform. Instead of ARM+DSP it was MIPS+TriMedia. Mind you, the latest instantiation of the platform could be ARM + COPs.
    Definitely a good DSP core, the TriMedia was a relatively simple 5-issue machine with some neat tricks (fusing of operations at the register file ports) and very early code compression. [ref 6,7,8,9]
    MicroUnity MediaProcessor
    Original coiner of the term and famous for John Moussouris, MicroUnity was the first on the scene and the first to die; they set the architecture standard for single-core parallelism pretty high [ref 10,11]. They had a following with Equator.
    Chromatics MPACT
    The other Media Processor, noticed for its 792-bit datapath (if today it sounds like peanuts, at the time the standard was 64-bit)... currently dead. [ref 12]
    Offer from the East
    As one can expect, Japan and its all-powerful integrated consumer industry were not the last to follow the hype with their own offerings:
    - Sharp DDMP (1997)
    - NEC MP98   (MPR March2000)
    - Toshiba MEP (after giving up MPACT) (EPF 2002)
    - Fujitsu (was their VLIW a Media Processor?)
    - Hitachi ??
    - Matsushita ??
    Among the Noise
    While not serious commercial products, the MVPs developed by universities and research labs had interesting features. These are 2 examples we were familiar with at the time, but there are literally hundreds of them [ref 13,14].
    • University of Hannover HiPAR  {search Johannes Kneip} (IEEE Video for Circuit and systems 1996)
      • Impressive X,Y memory
    • Infineon VIP {search Uli Ramacher} (Hot Chips 13) 
    Finally the true MVP
    The TI C80, also known as MVP (1993), remains an excellent architecture example. It is NOT a Media processor. It was designed like any other DSP, with an ambitious target in MIPS for an application which happened to be video conferencing. Now, as part of the strategy, the target was obviously anything in the same order of MIPS magnitude. Most interesting, it was a host + 4 x DSP core solution. Even more: how can you be so wrong in the memory model? (refer to somewhere else in this blog).

    Lessons Learnt
    NVIDIA
    Headline1: MultiMedia processor becomes UniMedia processor.

    Headline2: Media processor becomes Graphics solution..
    And they had enormous success as Pentium "Coprocessor" .
    Now this is an important lesson for co-processing. As a general-purpose co-processing MM chip, the MVP architecture was a complete failure. As a "very focused" co-processing point solution, it is part of the standard PC architecture.
    Microunity, Chromatics, TriMedia
    It would be silly to forget these architectures. On one hand, more recent and better architectures have been developed but on the other hand they are not so visible. 
    Further: the GPU story
    Obviously there are still open questions (we have to wait another 3-5 years to learn the lesson):
    - KEY: is there a place for another type of computing (the GPU)?
           One has to bear in mind that in 40 years of silicon computing (except for a short time of DSP) all attempts to compete with the general-purpose model ended in abject failure.
    - what kind of processing model is the dual head (CPU+GPU) adopted by AMD?
    - in the same way, what kind of CPU+GPU is the Apple model?
     
    References
    1. John A. Watlington "Video signal processors", circa 1997, http://wad.www.media.mit.edu/peole/wad/vsp/node1.htm from a web site which (as usual) has gone pining for the woods. Too bad, it contained an excellent and succinct comparison table.
    2. Section news" NVIDIA changes direction" EBN December 23,1996 issue:1038 
    3. Bernard Cole "New processors up multimedia's punch" EET February 3, 1997.
      8. Lee, W., Kim, Y., Gove, R.J., and Reed, C.J., “MediaStation 5000: Integrating Video an
      1. it is more the answer from CPUs (MM extensions) to MM processors.
    4. Jim Turley "Multimedia chips complicate choices" MDR Feb 12, 1996 page 14
    5. Maury Wright "Media Processors target digital video Roles" EDN  sep 1, 1998
    6. Tom Halfhill "Philips Trimedia goes Mobile " MPR dec 5, 2005
    7. Gert Slavenburg and his biking pal "DSPCPU operations for TM1100" 1999, Appendix A, Preliminary information
    8. Gert Slavenburg, etc..'Custom operations for Multimedia" Chapter 4 of the same reference
    9. Peter Clarke " Compressed VLIW meets multimedia" EE Dec, 1995
    10. Craig Hansen "MicroUnity MediaProcessor Architecture" IEEE micro, 1996
      1. quote "A broadband mediaprocessor extends and streamlines a general-purpose computer system to attain the goal of communicating and processing digital video, audio, data and RF signals (SIC!) at broadband rates using compiled, downloadable, software rather than special-purpose hardware. The instruction set, system facilities, and initial implementations of an architectural family of broadband mediaprocessors are introduced, and compiled software development is illustrated with an example and description of the development environment".
    11. John Moussouris (easy to remember mouse+souris) "A roadmap of the Mediaprocessor Design space" Microdesign Resources Dinner May 9, 1996
    12. Yong Yao "Chromatics's Mpact2 boosts 3D", MPR Nov 18, 1996
    13. SSC 27 12, dec 92 p1886
    14. SSC 29 12, dec 94 p1474
    15. http://www.cse.fau.edu/~borko/Chapter21_mc.pdf
      1. Just added this morning; just wish I had read it before. Kind of a complement to our MM splash. Hey Borko, is it public?
    Answer to questions
    The 7 functions:
    • graphics including 3D
    • video (compression, editing , conferencing)
    • high quality Audio
    • computer telephony, Speech
    • communications, Fax, Modems
    • real time 3D games
    • DVD?

      Wednesday, February 1, 2012

      COP - DSP FUNCTIONS

      DSP Functional Building Blocks
      The goal of this section is to cover the long history of DSP functional blocks. As its name implies, this type of block is opposed to the DSP structural block (such as bit/byte/word slice logic).

      BACKGROUND
      Let us give a special mention to the CAFIR (a FIR filter from Motorola, 1986), to the C66xx FFT Engine (TI 2010) and to our friends of the Morphics Next Gen.


      LIST OF FUNCTIONS
      • Filters
        • including equalizers
      • FFTers
        • bit reverse (!!)
      • Convolvers
      • Bit convolvers and Galois field arithmetic
      • Hats off to unsung heroes of 30 years of custom DSP (see for instance ISSCC 1980-now)
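
Since bit reverse earns its two exclamation marks in the list above, here is a minimal Python sketch of the address sequence a radix-2 FFT block has to generate. The function names are ours and the transform size is assumed to be a power of two.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(n):
    """Addressing order for an n-point radix-2 FFT (n assumed to be a power of two)."""
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]
```

For n = 8 the order is 0, 4, 2, 6, 1, 5, 3, 7. In hardware this is pure wiring, whereas in software it costs a loop per address, which is why it keeps showing up as a dedicated function.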
      FILTERS
      DSP56200 also known as CAFIR (Motorola 1987)  [ref. 1]
      - Main function is adaptive filtering using LMS.
      - 256 x 24-bit coefficient RAM
      - 256 x 16-bit data RAM
      - Can be configured as single FIR, dual FIR or single adaptive FIR
      - Programmable loop gain in adaptive mode
      - Programmable coefficient leakage term
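
To show how loop gain and coefficient leakage fit together, here is a textbook leaky-LMS adaptive FIR in Python. It is only a sketch: the update equation is the generic one from the literature, not the DSP56200's documented arithmetic, and the names `mu` (loop gain) and `leak` (leakage term) are ours.

```python
def leaky_lms(x, d, taps, mu, leak):
    """Adaptive FIR with the generic leaky-LMS update.

    x: input samples, d: desired samples,
    mu: loop gain, leak: coefficient leakage (0.0 gives plain LMS).
    Returns the final coefficients and the error sequence.
    """
    w = [0.0] * taps       # coefficient memory
    delay = [0.0] * taps   # data memory, newest sample first
    errors = []
    for xn, dn in zip(x, d):
        delay = [xn] + delay[:-1]
        y = sum(wi * xi for wi, xi in zip(w, delay))
        e = dn - y
        # leakage pulls every coefficient slightly towards zero,
        # the loop gain scales the error-driven correction
        w = [(1.0 - leak) * wi + mu * e * xi for wi, xi in zip(w, delay)]
        errors.append(e)
    return w, errors
```

With `leak = 0.0` and a noise-free reference the coefficients settle on the unknown system; a non-zero leakage trades a small bias for protection against coefficient drift.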




      REFERENCES
       
      1. Motorola " DSP56200  data sheet", 1988,also called ADI1257R1
      2. Hossein Yassaie  "Digital filtering with the IMS A100" Inmos 1986
      3. FUJITSU " MB86795 data sheet" ,  aug 1987
      4. Atmel "ATC 76C001 programmable FIR filter", 1996
      5. Amphion " Cascadable FIR", Quicklogic, 2001
      6. ISS "Biquad IIR filter megafunction" Altera, 200
      7. see Xilinx, Altera etc.. Catalogues
      8. Jordan, mannock "Correlation-function, peak detector",  IEE Proc., March 1981
      9. Tao Lin, Dahn Le ngoc" Implementation of digital filters using IDT7320, IDT7210, IDT7216, and IDT7383" IDT A.N  AN-32, 1990

      Sunday, January 15, 2012

      Benchmarks

      Benchmarking, Performance analysis, Technical comparison
      Under the benchmark heading, we will cover all techniques which are also known as Performance Analysis in Computer architecture.

      Background
      With each new type of chip, it was usual for electronics magazines to do a special report where the main item was a synthetic table of features. So we became very adept at doing the same, and for a DSP a table of 20 features was largely sufficient. Even better, it could be reduced to 2 lines: MHz and number of MAC operations/s.
        • thinking about it, in those days all DSPs were single MAC so that MHz was sufficient. The big issue was the number of data buses.
        • We also used graphical analysis. In a nutshell, memory buses, MAC and ALU were drawn with their data path width; the bigger the area, the better.
      Hence came the concept of figures of merit. The CPU world had MIPS, the DSP world had MMAC/s. The natural step was to introduce kernel benchmarks, since MMAC/s and FIR filter were identical. Also, by the mid-80s all manufacturers had their own list of kernels, so it was easy to synthesize our own. And so we deeply believe that a benchmark mix of FIR, biquad and adaptive FIR is still sufficient to characterize a DSP.

      Before going further we must note two very important concepts.
      The first one is the concept of benchmark mix. The biggest advance of BDTI was to standardize the mix in a simple and intelligent way.
      The second one is that, when using kernel benchmarks, we gave up on speed and instead used cycles/kernel. In other words, smaller was better, as opposed to the CA methodology of bigger is better (e.g. D-MIPS).
      And then we naturally went from kernel to application benchmarks. By that time we had plenty of competition from ersatz DSPs which used MHz as the measure of benchmarks, and so we came up with the concept of DSP MIPS.
      • By then, the CA guys were completely lost. Our reasoning is complex, but trust us, this is the best way to tackle the problem.
      Instead of benchmarking the DSPs, we benchmarked the algorithms, such that a G711 was 0.5 DSP MIPS, a G729 was 10 MIPS and a G723 was 30 MIPS.  {Tbd  recheck}
      And then, we were so clever, we could also use this figure of merit for DSPs. All DSPs were single issue, so that MHz and DSP MIPS were synonymous. So it was very easy and safe to predict that a 50 MHz DSP had the workpower of 50 DSP MIPS, enough to implement a G723 speech coder.
      By this simple technique we had merged chips and algorithms under one umbrella. Kernel benchmarks could also easily be integrated: for instance, the BDT 256-point FFT was 0.008 DSP MIPS (8000 cycles).
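
The bookkeeping behind this figure of merit is simple enough to sketch in Python. The algorithm loads are the ones quoted above (still marked to be rechecked), the function names are ours, and converting the 8000-cycle FFT into 0.008 DSP MIPS assumes one call per second.

```python
# Loads in DSP MIPS as quoted in the text (marked "to be rechecked" there)
ALGORITHM_DSP_MIPS = {"G711": 0.5, "G729": 10.0, "G723": 30.0}


def dsp_mips_capacity(clock_mhz, issue_width=1):
    """Single-issue DSP: one DSP MIPS per MHz."""
    return clock_mhz * issue_width


def kernel_dsp_mips(cycles_per_call, calls_per_second):
    """Convert a kernel cycle count into a DSP MIPS load."""
    return cycles_per_call * calls_per_second / 1e6


def fits(clock_mhz, algorithms):
    """Return (does it fit, total load) for a list of algorithm names."""
    load = sum(ALGORITHM_DSP_MIPS[name] for name in algorithms)
    return load <= dsp_mips_capacity(clock_mhz), load
```

For example, `fits(50, ["G723"])` says that a 50 MHz single-issue DSP carries the 30 DSP MIPS coder with room to spare.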
      But then, all hell broke loose for multiple reasons:
      • DSPs became CPU-like, and the CPU standard is Dhrystone MIPS.
      • C became the only acceptable way of benchmarking applications.
      • Because MultiMedia was kind of becoming synonymous with DSP, we ended up with all the benchmark politics of data size.
        • audio is 24-bit, so how do you compare apples and oranges, etc. To avoid the problem, BDT uses the concept of native size.
      • Even truer, CPUs and cores took a back seat to application platforms. Applications became the only respectable benchmarks (and by the same token totally impractical).
        • Nobody in his right mind is going to implement a full application for benchmarking purposes.
      • So as of 2012, benchmarking is 90% Linux testbench (downloading, rebuilding, compiling -O3, running, testing and measuring) and 10% optimization. We are miles away from evaluating the performance of a DSP.

      List of techniques
      • Table of features
        • Problem: nowadays this is quantitatively more difficult. A standard SOC is made of hundreds of basic pieces of IP.
        • Problem: when a 32-bit shifter is not equal to a 32-bit shifter
      • Single figure of Merits
          • MIPS
          • DSP MIPS
          • MMAC/s
            • the problem
          • MOPS
            • the problem: 1 MMAC = 2MOPS
          • ITU WMOPS  
          • D-MIPS (Dhrystone MIPS)
          • Graphical figure of merit: the Lucent Cube
          • Graphical analysis
          • Manufacturer benchmarks (kernels)
          • personal and custom benchmark
          • Industry standard DSP benchmark (assembler) - BDT
          • Industry standard DSP benchmarks - the rest -> from bad to worse
            • Benchmark results give a ranking linearly proportional to MHz! So why bother?
          • Application benchmarks
          • Models 
            • graphical models; fatter is better
            • Bob Owen nice little drawings.


        References
        1.  BDTI web site
        2.  Eric Martin

        Saturday, January 7, 2012

        DSP of Any Kind

        DSP of Any kind : boards, FPGA,  Custom chips etc..
        The goal of this section is to cover the "free world" of DSP architecture.

        By definition, commercial GP (General Purpose) DSPs such as a TI C64xx are not free. They must follow the constraints of chip design and ISA design and, most of all, of compiler friendliness and upward software compatibility.
        By comparison, there are many products which are point products or application specific. They are full of interesting tricks and original solutions. This is the case for custom DSP chips, and they can also be found among boards and FPGA-based platforms.

        The problem with all these custom solutions is the lack of public information.
        • FPGA is the most obscure because the description is both code (HDL) and proprietary. 
        • Custom chips are more available in the form of conference papers [ref 1]. 
        • Boards, especially older boards, are the most accessible since the description is a block diagram  


        Background
        From 1978 to 1985, designing DSP boards with DSP BB
        A big application of bit slice families was DSP boards. We largely used the architecture of these boards to design the first generation of integrated GP DSPs.
        From 1985 to 1995, designing DSP boards with GP DSP
        Once integrated GP DSPs were available, the board architecture became predictable and most of the advances were largely outside DSP, such as host interfacing [ref 10..13]
        Since 1995, FPGA

        Around 1995 was the time when, {with a different type of footprint}, the FPGA became a credible alternative to the board. Not surprisingly, the FPGA companies reinvented many of the BB and board architecture techniques of the 80s.

        DSP BOARDS                                                                           
        The historical path to electronics progress is to design boards based both on new components and on better architecture understanding. New components are then developed by integrating what was until then a full board of electronics. Recent footprint changes (SIP) and nanotechnology did not fundamentally alter the process. An end-user board still represents a very good approximation of the next-generation SOC platform. So in a methodology like ours, which designs COPs by relying on proven DSP architectures, studying DSP boards has some design value.
        Let us start by categorizing DSP boards, keeping in mind that the same categories could be applied to COPs:
        1. Boards designed using  a bit slice family (AMD) or a building block family (ADI).
        2. Boards based on a General Purpose DSP (typically a TI DSP ) or high performance CPU.
        3. Special purpose boards such as Array processors or Vector Processors.
        4. Boards based on one or several custom chips  

          FPGA
          In a way, FPGA can be seen as a new type of board. There are however deep differences. 
          • The FPGA vendors (Xilinx, Altera, Quicklogic) propose catalogs of Intellectual Properties (IP) which constitute an important reference for the COP designer.
          • They also propose multiple builder tools. Some can serve as example for developing COP.
          • Finally, they have multiple experiences with "Matlab" to implementation (in the form of Simulink block to Xilinx hard macro). 
          Note: we do not try to cover the implementation of a COP in an FPGA fabric. Instead, we imply that the FPGA ecosystem constitutes a vast amount of resources to be tapped.

          DSP CUSTOM CHIPS
          A long time ago, custom chips were a mine of information. For instance, a keyword search for 'FFT' on the IEEE ISSCC DVD brings in the region of 100 hits for the ten years of the 1980s alone. Nowadays the number of custom chips has dramatically diminished and the descriptions are rarely at the level of the building block. Still, like boards, they are a very good reference for communication and multiprocessing.
          Also we should not forget that a COP is a form of custom chip.

          FURTHER: WHERE TO START WITH DSP BOARDS?
          Motivation: "should a DSP COP architect be studying DSP boards?".
          1) Today's DSP boards: they are too sophisticated to serve as good examples. Their design advances have more to do with inter-board techniques and software than with "pure" DSP. Still, there are a few techniques worth studying, such as HOST-COP interfacing, serial inter-chip communication (SRIO) and heterogeneous MP.
          2) DSP boards of the 70-80s: they are much more amenable to study. They had a large range of architectures and structures [ref 2-9]. They are nearer to the architecture of a current COP design than current DSP boards and chips.
          • One could think "If the DSP chip architects had done a good job, we would not need to go back to obscure stuff to get our architecture inputs". But this would be unfair, since designing a GP DSP involved many constraints imposed by software. We will not say that these constraints should be totally ignored by the COP designer, but the whole point of designing a COP is for these constraints to take a back seat.
          We will now do a quick review of some paper references and draw conclusions on what can possibly be reused:
          1. Most articles had a block diagram of (what was supposed to be) a DSP
          2. The definition of a DSP varied widely. For example:
            1. Usage of serial arithmetic.
            2. Usage of shift registers as opposed to memory.
            3. A memory model reduced to I/O buffers (ADC/DAC) so that there was no need for an AGU.
              1. Even a model with only a DAC (MIT speech generator).
            4. AGU with flags going back to the ALU status registers. Obviously to do circular addressing.
            5. Single bus architecture, and the Zurich zip trick as an architecture feature.
          3. Veendrick [ref. 11] had an interesting BB that we will call "MULIMIT"
            1. Z = K*A + L*B
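A minimal Python sketch of two items from the list above, under stated assumptions. `mulimit` computes Z = K*A + L*B with plain integers; the source gives only the equation, so word widths, rounding and saturation are deliberately left out. `FlagAGU` is a hypothetical model of the AGU-with-flags item: the AGU only increments the pointer and raises an end-of-buffer flag, and the program tests that flag to reload the base address. This is our reading of "flags going back to the ALU status registers", not a description of any specific board. `first_order_section` is just one illustrative use of the two-product block, not something taken from ref. 11.

```python
def mulimit(k, a, l, b):
    """Two-product building block: Z = K*A + L*B (exact integer arithmetic)."""
    return k * a + l * b


class FlagAGU:
    """Hypothetical AGU: post-increment pointer plus an end-of-buffer flag."""

    def __init__(self, base, length):
        self.base = base
        self.length = length
        self.pointer = base
        self.end_flag = False  # stands in for a bit in the ALU status register

    def next_address(self):
        """Return the current address, then advance the pointer."""
        address = self.pointer
        self.pointer += 1
        # The AGU only raises the flag; it does not wrap by itself.
        self.end_flag = self.pointer == self.base + self.length
        return address

    def reload(self):
        """What the program does after testing the flag: restart at base."""
        self.pointer = self.base
        self.end_flag = False


def circular_addresses(agu, count):
    """Generate `count` addresses, wrapping in software when the flag is set."""
    addresses = []
    for _ in range(count):
        addresses.append(agu.next_address())
        if agu.end_flag:  # conditional branch on the status flag
            agu.reload()
    return addresses


def first_order_section(samples, k, l):
    """Illustrative use of mulimit: y[n] = K*x[n] + L*y[n-1]."""
    outputs = []
    previous = 0
    for x in samples:
        previous = mulimit(k, x, l, previous)
        outputs.append(previous)
    return outputs


if __name__ == "__main__":
    print(mulimit(3, 4, 5, 6))
    print(circular_addresses(FlagAGU(base=100, length=4), 6))
    print(first_order_section([1, 0, 0, 0], k=2, l=3))
```

The cost of this style of circular addressing is visible in `circular_addresses`: every access is followed by a flag test, which is exactly the overhead that a later modulo AGU removes.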
          References                                                                                                                                      
          1. Hot Chips, ICSPAT, ICASSP, ISSCC; see also IEEE DVDs (Com. Soc., SP Soc., SSC Soc.)
          2. Louis Schirm IV, TRW Inc. "Packing a signal processor on a single digital board" Electronics, Dec 20, 1979
          3. John Mick "Fast computational devices for DSP" likely at some conference before 1980 (?)
          4. Zaheer Ali "Know the LSI hardware of digital signal processors" EDN, June 21, 1979
          5. Capello, et al. "Completely pipelined architectures for DSP" IEEE Vol ASSP-31, August 1983
          6. S.Chin, C.Brooks "Microprogramming enhances signal processor's performance" Electronics, Nov 17, 1982
          7. Zemam, Troy Nagle "A high speed microprogrammable DSP employing distributed arithmetic" IEEE Vol SC-15, Feb 1980
          8. R.Shively "Architecture of a programmable DSP" IEEE Vol C-31, Jan 1983
          9. IBM SP16
          10. MIT "Klatt vocal tract model" ICASSP 82 ??
          11. Philips Labs "A 40 MHz Multiapplicable DSP chip" IEEE Vol SC-17, Feb 1982
          12. TRW ICASSP 83
          13. David Karlin (Fairchild) "VLSI BB for DSP" ICASSP 1982
          14. Barral, Moreau "Circuits for Digital signal processing" ICASSP 1984
          15. NTT "LSI's for DSP" IEEE Vol SC-14, April 1979
          16. F. Mintzer, A.Peled "The architecture of the real-time signal processor" ICASSP 1982
          17. and dozens of others with key searches in IEEE Xplore
          18. "Plug-in DSP boards" EDN, April 26, 1990
          19. "DSP coprocessor boards" EDN, Sep 13, 1991
          20. "DSP boards help tackle a tough class of AI tasks" Electronics, Aug 21, 1986
          21. G.Pawle, T.Faherty "DSP development board offers host independence" Computer Design, October 15, 1984

          Sunday, January 1, 2012

          COP: DSP boards, FPGA

          The goal of this section is to cover any DSP COP with a footprint different from a chip. The two main candidates are boards and FPGAs. While FPGAs are chips, the very large FPGAs are more like boards in terms of price, flexibility and topology. Also they are direct competitors in the PC SOC socket.

          Background
          Sometime in the 90s, plug-in DSP boards became coprocessor boards. In a terminology which was largely marketing, a DSP board for the VME bus was called a plug-in, but the same board with degraded performance for the ISA bus was a COP board [ref 1,2,3,4].


          Description
          PC coprocessing socket
          One of the most interesting DSP applications of recent years was the financial coprocessor. The Intel architecture allows adding a COP on a very tightly coupled interface. A couple of people [ref 5,6] used this socket for a board which boils down to one or several highest-end FPGAs. For the record, what is accelerated is effectively some Matlab functions.
          Embedded coprocessing
          Also in the embedded world, an FPGA + C64xx is a standard sight, for instance in general purpose boards or wireless infrastructure. While a part of the FPGA is for jelly beans (glue logic), an even larger part takes care of pre/post processing. Also note that the C64xx has no specific support for coprocessors.


          References                                                                                                                                      
          1. "Plug-in DSP boards" EDN, April 26, 1990
          2. "DSP coprocessor boards" EDN, Sep 13, 1991
          3. "DSP boards help tackle a tough class of AI tasks" Electronics, Aug 21, 1986
          4. G.Pawle, T.Faherty "DSP development board offers host independence" Computer Design, October 15, 1984
          5. Nallatech
          6. HC 2009??