DSP Bricklayer: 2012

Sunday, May 20, 2012

T.O.C 20 May 2012

So you want to be a DSP architect?

The long story
Background Checks

Coprocessing
Benchmarking

BDT as an architectural tool

Building Block issues
Methodology and tools issues

Arithmetic BB
Bit wise BB
Vector BB
Math BB
DSP BB
Matlab BB
Found on the web
Stop me if you've heard this one before!
APPENDIX - DSP Architecture

DSP of the First Kind (1980-95)
DSP of the Second Kind (1995-2005)
DSP of the Third Kind (2010- ?)
DSP of the lost kind (1995-2005)
DSP of any kind (1950-2050)

APPENDIX - DSP structural Building Block

AGU
PCU
DAU
SU (The Shuffle Unit)
RF (Register File)
Bit slice: Bit slice building blocks

Bit Wise BB
This is a heck of a topic!
Firstly it overlaps ISA studies and DSP Building Blocks in at least 2 specific places (the DAU and the shuffle units).
Secondly, the variety of function classes is very wide: for instance, pack/unpack, Count Leading Signs, Galois fields, bit manipulation.
Thirdly, and this will serve as an introduction to this topic, the functions themselves can get amazingly out of hand.
Finally, Matlab's bit wise capability is limited or incoherent. As of this writing our latest bit wise library is floating point based!

Background

In 1996, when we were designing the Tricore ISA, Bruce came up with a set of so-called "permute" instructions, and I do remember Rod's reaction as somewhere between bafflement and irritation.
On Bruce's side, the truth is that all Media processors had this class of instructions [ ref: search Ruby Lee].
On Rod's side, we all know that the most precious resource in ISA design is the opcode. And the biggest issue with bit wise instructions is that they are opcode hogs (always requiring dozens of control bits).
For the sake of the story, I would add that since 1996 I have been involved with this issue multiple times and have seen a few people fall into this trap.

Example: TI C64+ : instruction PACKHL2
To illustrate this topic, we will now discuss in detail the PACKHL2 instruction of the TI C64+ DSP. We will then expand to all C64+ PACK instructions.

Naming convention and syntax issue
The instruction mnemonic PACK implies that the destination register is smaller than the source register(s), HL is a subtype, and 2 in the TI terminology means 2 sub-word operations. Since the registers are 32 bits wide, ADD2 (SUB2, etc.) means 2 x 16-bit additions, and ADD4 means 4 x 8-bit additions in parallel.
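To make the "2" and "4" suffixes concrete, here is a minimal Python sketch of sub-word addition inside a 32-bit register. The function names (add_subword, add2, add4) are ours, not TI's, and the wrap-around (non-saturating) behaviour of each lane is an assumption made for illustration.

```python
def add_subword(a, b, lanes):
    """Add two 32-bit words as `lanes` independent sub-words.

    lanes=2 means 2 x 16-bit additions, lanes=4 means 4 x 8-bit additions.
    Assumption: each lane wraps around on overflow (no saturation).
    """
    width = 32 // lanes
    lane_mask = (1 << width) - 1
    result = 0
    for i in range(lanes):
        shift = i * width
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        # The carry out of a lane is dropped instead of rippling into the next lane.
        result |= (lane_sum & lane_mask) << shift
    return result


def add2(a, b):
    """Two 16-bit additions in one 32-bit register."""
    return add_subword(a, b, 2)


def add4(a, b):
    """Four 8-bit additions in one 32-bit register."""
    return add_subword(a, b, 4)
```

With these definitions, add2(0x0001FFFF, 0x00010001) gives 0x00020000, whereas a plain 32-bit ADD of the same operands gives 0x00030000: the carry out of the low half word is dropped instead of rippling into the high one.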

The syntax is
PACKHL2(src1,src2, dst) where src1,src2, dst are all 32 bit registers

Question: since the syntax is exactly the same as
ADD (src1,src2,dst) where src1,src2, dst are all 32 bit registers
why do you need the suffix 2?
Answer: In a 32-bit architecture, the 32-bit register is generic. All instructions use 32-bit registers. What matters are the operations inside the 32-bit registers. In this case 2 implies 16-bit data.
Question: PACK implies some kind of register demotion. The syntax dst = src1 <op> src2 is just like any standard 2-operand syntax: the 2 registers are operated upon and the result is written to the destination. Where is the demotion in this type of operation?
In fact, we would be more comfortable with dst32 = PACK(src64), where src64 is a pair of 32-bit registers (such as A3_A2).
                         PACKHL A3_A2,A0 ; a syntax using a 32-bit register pair (64bit )
                         PACKHL A2, A8, A0 ; TI syntax using two 32-bit registers
but the advantage of the TI syntax is obvious: register flexibility.

Definition
We are now entering the core of the matter. What is the definition of the PACKHL instruction? First we want something simple to describe this instruction. Writing a "C" definition is rather wordy and the TI standard description is more complicated than needed. The simplest description is to consider the 2 registers side by side, src2 on the left (made of the two half words D_C) and src1 on the right (respectively B_A), the result will be C_B.
                      D_C B_A

C_B

We have now the following definitions:
                        Z = concat(hi(X),lo(Y));    % hi() and lo() are self-explanatory functions
                        Z = [hi(X) lo(Y)];          % even more Matlab like
                        Z = [X(31:16) Y(15:0)];     % not so Matlab

And they all look clear. But concatenation is not the same as packing. In fact, intuitively it is the opposite: one increases the variable size, the other reduces it.
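One way to reconcile the two views is to write both down. The sketch below (Python, with our own helper names) implements the hi/lo definition exactly as written above, Z = [X(31:16) Y(15:0)], and then the same operation seen as a demotion of a 64-bit register pair to 32 bits. Which of X and Y maps to src1 or src2 in the TI syntax is deliberately left open here; check the TI instruction reference before relying on it.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def hi(x):
    """Upper half word, bits 31..16."""
    return (x >> 16) & MASK16


def lo(x):
    """Lower half word, bits 15..0."""
    return x & MASK16


def concat16(upper, lower):
    """Concatenate two 16-bit values into one 32-bit value."""
    return ((upper & MASK16) << 16) | (lower & MASK16)


def pack_hl(x, y):
    """Z = [hi(X) lo(Y)]: the concatenation view, two 32-bit inputs."""
    return concat16(hi(x), lo(y))


def pack_hl_pair(pair64):
    """Same operation as a demotion: a 64-bit pair X_Y in, 32 bits out."""
    x = (pair64 >> 32) & MASK32
    y = pair64 & MASK32
    return pack_hl(x, y)
```

Both pack_hl(0x11112222, 0x33334444) and pack_hl_pair(0x1111222233334444) give 0x11114444. The result is built by concatenation, but 32 of the 64 input bits are thrown away, and that is where the packing is.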

Towards a general definition of PACK
Let us start with a most general definition. The register size is 64-bit and the granularity is 8-bit. Both values are very reasonable in 32-bit architectures. We will call this instruction PERM(ute).
PERM(src64, dst64, controlword);
We will see later that PACKHL2 is just a sub-case of PERM.
PERM is easily described, as shown in the following example:

  src64   HGFEDCBA

         8x8 switch

  dst64   AAGHFEDC

In this description each letter represents a byte; note the little endian choice.
But then what is the control word, and how many bits do you need? The number of bits is the problem. In this case the destination is a 64-bit register made of 8 sub-components (bytes). Each destination byte can receive any of the 8 source bytes (a 3-bit control), and since there are 8 destination bytes, the total is 3 x 8 = 24 bits. Not an easy decision to make in a 32-bit opcode! And using a register to hold the control word is a poor solution: it means a 3-cycle instruction (MVK, MVKH, PERM).
Finally, the control word syntax is very straightforward. We just use the representation of the destination register. In the example above it is:
PERM(src64,dst64, AAGHFEDC)
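The byte switch and its control word can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the control word is passed as a string in the notation used above (destination bytes listed from most to least significant, letter A naming source byte 0), and encode_control shows one possible 3-bit-per-byte binary encoding, not an actual opcode format.

```python
def perm(src64, control):
    """Route source bytes to destination bytes.

    control lists the destination bytes from most to least significant;
    each letter 'A'..'H' names a source byte, 'A' being byte 0 (the LSB).
    """
    dst = 0
    for letter in control:
        index = ord(letter) - ord("A")
        if not 0 <= index < 8:
            raise ValueError("control letters must be in A..H")
        source_byte = (src64 >> (8 * index)) & 0xFF
        dst = (dst << 8) | source_byte
    return dst


def encode_control(control):
    """Hypothetical binary encoding: 3 bits per destination byte.

    Returns (control_word, number_of_control_bits).
    """
    word = 0
    for letter in control:
        word = (word << 3) | (ord(letter) - ord("A"))
    return word, 3 * len(control)
```

Loading the source with the ASCII codes of H down to A (0x4847464544434241) makes the result readable: perm(src, "AAGHFEDC") returns 0x4141474846454443, and encode_control("AAGHFEDC") reports 24 control bits. A shorter control string simply selects fewer destination bytes.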

Matching PERM and PACK

Using a definition similar to the one above, it can be seen that PACKHL2 is equivalent to:

PERM(src32,src32,dst32, FEDC)
In fact we can match all C64+ PACK instructions in the same way (see table).

It must be noted that the C64+ DPACK instructions are effectively more like PERM than PACK since the sources (combined) and destination have the same 64-bit width.

Conclusions

Defining bit wise functions just by looking at the datapath is relatively easy and can give simple yet very powerful structures. Architects (attracted by elegance) will always love that.
The problem is the control path, which can rapidly get out of hand.
To illustrate the problem we took an instruction from the C64+ instruction set (PACKHL2) and extended it to a generic Permute instruction. The number of control bits required to describe that instruction would be 24.
This problem applies equally to a building block or a coprocessor unit.
With reference to the C64+ we will now compare the two approaches:

Advantages of using a general PERM instruction

Conceptually, it is very simple.
Software implementation is very direct.
Very flexible: any source byte can go to any destination byte. Any source byte can be duplicated (or replicated n times) in destination.
No need for long studies and drastic selection to choose the right PACK datapath to implement (this is very often the case with bytes).

C64+ offers only 2 choices: byte even and byte odd.

Shortcomings of using a general PERM instruction

Needs 24 bits of control. This is not realistic in a 32-bit ISA.

C64+ defines only 8 instructions. The footprint is minimal.

Having 24 bits gives power(2,24) possibilities (about 16.8 million); how to test that? (by construction?)
The flexibility advantage may be a delusion. Some features are missing: for instance, just looking at the C64+ ISA, sign extension and packing with saturation.

While TI made the right choices for the C64+ ISA, a different situation (64-bit ISA, dedicated COP, etc.) might give different results. The astute reader, we are sure, already has plenty of ideas and solutions.
BUT this is not the point of this section. The point is to make sure that you understand the main risk associated with bit wise instructions (the control bits). To be forewarned is to be ...

Saturday, February 18, 2012

SDR (Software Defined Radio)

This section covers an ambitious project. Let us imagine a wireless terminal made of 2 CPU blocks: a powerful host armed with all media capabilities, and a second block which can process any kind of RF signals and radio protocols (e.g. 3G, 4G, Wi-Fi, GPS, up to future standards such as cognitive radio). This second block is commonly called Software Defined Radio (SDR).

Background

The concept of SDR is as old as the radio [ref 1], but in our context we will put the burden on the solid shoulders of Les Mintzer circa 2000 [ref 2]. This was an FPGA implementation, which makes a lot of sense as the money and impetus came from the military. In the same school see also Xilinx's Chris Dick, Spectrum's Lee Pucker and a few others [ref 3,4,5,6,7,8,9,10]. Now at this stage their implementation of SDR was far from fitting the definition, but it made sense. Worse was to come. Firstly, in a brilliant case of mistaking the tree for the forest, a bunch of guys were using the PC/Pentium as their example of SDR. They could do everything except radio.

Frankly, the first time I heard about that, I thought of some genius generating radio from a PC by using the EMC radiation of the chips. This sounds silly, but after all this was the time of UWB, so why not?

And then we had the usual Silicon Valley trend: a new buzzword for a new type of chip architecture [ref 11,12,13].

References

Dan Stransberg, "A century old technology enters the digital age", EDN, October 28, 1999
Les Mintzer, "Soft Radios and modems and FPGA", CSD mag, February 2000, p.52
Chris Dick, Fred Harris, "The platform FPGA Software defined Radio"; good paper but no reference, likely a conference circa 2001
Chris Dick, "FPGAs cranked for software radio", EET, April 10, 2000
Chris Dick, "A case for using FPGAs in SDR Phy", EET, August 12, 2002
Lee Pucker, "Paving Paths to SDR", CSD mag, June 2001, p.19
Lee Pucker, Systems Architect, Spectrum Signal Processing, "Distributed Architecture for SDR", CSD 2001
Robert Sgandura, PM Pentek, "W221: Software Radio - From concepts to Implementation", likely at Communication System Design conference 2000 or 2001
Cord Finlay, "Understanding SDR requirements", Wireless System Design, July 2001
"Soft Radio key to universal applications", EE Times Special Report, March 19, 2001

So full of hope; fortunately Loring Wirbel deflated the hot air in EET, Aug 12, 2002.

John Ralston, "SDR emerges to address wireless industry needs", Wireless System Design, October 2000

Describes the Morphics architecture

Armin Nueckel, Prashant Rao, "The use of Reconfigurable Processor Arrays in wireless Infrastructure Systems", July 25, 2001 {white paper?}

Describes the PACT architecture

Paul Master, "The baseband solution for the worldphone", www.quicksilver.com, 2001

Saturday, February 11, 2012

DSP of the lost kind

The goal of this section is to laugh through the multiple attempts at defining new DSPs, next-craze DSPs, further-than DSPs, beyond DSPs, everything and its contrary, as we went from 1995 to 2005. Among the most notable, we will take a quick pass at Media processors.
Now, the obvious question is: why should we care? Once more, the answer is that there is a very large difference in use model between a GP full market CPU and a COP.
What is ridiculous in one case becomes brilliant, and maybe most effective, for the given application (see NVIDIA evolution). Also, some of the brightest architects of our time were involved in the design of these new machines. Finally, this is still one of the richest veins of ideas available publicly (but not freely) (IEEE, ACM, MDR, etc.).

BACKGROUND

There are two reasons why we feel sad about this "Idiot Wind" period.

Too many people were getting carried away and too many arguments just did not make sense.

For instance, we computed that it would take 1200 man-years of application software (not tools) just to meet the minimum requirements of fulfilling the claim of being a Media processor.

For a while, we went in the same direction (and we have to thank the influence of Jim Turley for that; mind you, he changed his mind pretty quickly too).

DESCRIPTION AND TYPES

Type 1

Video Signal Processors
Video DSP
Media Processors

Type 2 Legoland

Multi IP DSP, Cluster Based DSP

Bops, Cell, Improv, Equator
Atsana, ChipWrights, ClearSpeed, Cradle, e-lite, powerFFT, Sandbridge, Siroyan, Synputer, Telairity, TOPS, TRIPS
Infinite-tech RADarray (Neil Stollon, ICSPAT 99)

RADarray of RADcore coprocessors connected to host
Host and peripherals are 3rd-party IP
Each RADcore is a reconfigurable stream algorithm coprocessor.

Each RADcore is made of user selected Execution Units (EXUs) interconnected by reconfigurable data path bus architecture
EXUs form independent elemental processing blocks such as ALUs, MACs, memories, I/O, etc.

Type 3 Domain DSP

Wireless DSP,
Speech, Audio DSP
Broadband DSP

Custom DSP, configurable DSP,
Reconfigurable DSP

Media Processor (MP) (or MVP) (V for Video)

Born around 92-94, the MVP had a peak around 1997 and went through multiple ups and downs before its final death by 2005. The most famous (first wave) names were TriMedia, MicroUnity, Chromatic, NVidia [ref.1].
They had several characteristics in common:

It was a new type of processor (the MEDIA processor)

From our DSP perspective it was a bit baffling. Why not use a DSP?

They were the latest incarnation of the MultiMedia (MM) craze.

Very soon followed by MM extensions. Compare MP and MM extensions.

For some reason most were based on VLIW.

Why?

While 5 of 7 MM functions could be done by a DSP, they rightly believed that a DSP would not be able to do the last 2 (video and graphics).

But then why did they think they could?
What were the 7 functions?

Finally, the most obvious characteristic was their disdain for the Pentium. In typical Silicon Valley fashion, after losing the RISC war, that was the new frontier. In a nutshell:

First they lost the PC seat to the Pentium (that was CISC versus RISC).
Then they re-lost the PC MM seat to the Pentium (that was GP processor versus Media processor).
Finally they could not even find a small coprocessor seat in the PC, i.e. to be used as a Media COP!!

This is the point which merits attention. See NVIDIA below

In their final days, to add insult to injury they were totally inadequate for the embedded market.

Mind you, good luck trying DVD in 1999 or cell phone in 2000 with a 288 bit wide opcode and no software support!

NVIDIA
In the most remarkable feat of the century, NVIDIA changed direction by focusing on graphics instead of a rainbow of applications [ref.2].
Not sure about their recent direction towards general purpose, people never learn it seems!
TriMedia
In another example of adaptation, the champion of VLIW, after years of trying software optimization, instead redesigned their chips with multiple powerful video coprocessors. Also important, Philips concentrated on platforms and TriMedia became "the other core" of the Nexperia platform. Instead of ARM+DSP it was MIPS+TriMedia. Mind you, the latest instantiation of the platform could be ARM + COPs.
Definitely a good DSP core, the TriMedia was a relatively simple 5 issue machine, had some neat tricks (fusing of operations at the register file ports) and very early code compression. [ref 6,7,8,9]
MicroUnity MediaProcessor
Original coiner of the term, famous for John Moussouris, MicroUnity was the first on the scene and the first to die; they set the architecture standard for single-core parallelism pretty high [ref 10,11]. They had a following with Equator.
Chromatic MPACT
The other Media Processor, noted for its 792-bit datapath (if today it sounds like peanuts, at the time the standard was 64-bit)... currently dead. [ref 12]
Offer from the East
As one can expect, Japan and its all-powerful integrated consumer industry were not the last to follow the hype with their own offerings:
- Sharp DDMP (1997)
- NEC MP98 (MPR March 2000)
- Toshiba MEP (after giving up MPACT) (EPF 2002)
- Fujitsu (was their VLIW a Media Processor?)
- Hitachi ??
- Matsushita ??
Among the Noise
While not serious commercial products, the MVPs developed by universities and research labs had interesting features. These are 2 examples we were familiar with at the time, but there are literally hundreds of them [ref 13,14].

University of Hannover HiPAR {search Johannes Kneip} (IEEE Video for Circuit and systems 1996)

Impressive X,Y memory

Infineon VIP {search Uli Ramacher} (Hot Chips 13)

Finally the true MVP
The TI C80, also known as MVP (1993), remains an excellent architecture example. It is NOT a Media processor. It was designed like any other DSP, with an ambitious target in MIPS, and the target application happened to be video conferencing. As part of the strategy, obviously, the target was anything in the same order of MIPS magnitude. Most interestingly, it was a host + 4xDSP core solution. Even more interesting: how can you be so wrong in the memory model? (refer to somewhere else in this Blog).

Lessons Learnt

NVIDIA
Headline1: MultiMedia processor becomes UniMedia processor.

Headline2: Media processor becomes Graphics solution.
And they had enormous success as a Pentium "Coprocessor".
Now this is an important lesson for co-processing. As a general purpose co-processing MM chip, the MVP architecture was a complete failure. As a "very focused" co-processing point solution, it is part of the standard PC architecture.
MicroUnity, Chromatic, TriMedia
It would be silly to forget these architectures. On one hand, more recent and better architectures have been developed but on the other hand they are not so visible.
Further: the GPU story
Obviously there are still open questions (we have to wait another 3-5 years to learn the lesson):
- KEY: is there a place for another type of computing (the GPU)?
One has to bear in mind that in 40 years of silicon computing (except for a short time of DSP) all attempts to compete with the general purpose model ended in abject failure.
- what kind of processing model is the dual head (CPU+GPU) adopted by AMD?
- in the same way, what kind of CPU+GPU is the Apple model?

References

John A. Watlington, "Video signal processors", circa 1997, http://wad.www.media.mit.edu/peole/wad/vsp/node1.htm, from a web site which (as usual) has gone pining for the woods. Too bad, it contained an excellent and succinct comparison table.
Section news" NVIDIA changes direction" EBN December 23,1996 issue:1038
Bernard Cole "New processors up multimedia's punch" EET February 3, 1997.
8. Lee, W., Kim, Y., Gove, R.J., and Reed, C.J., “MediaStation 5000: Integrating Video an

it is more the answer from CPUs (MM extensions) to MM processors.

Jim Turley, "Multimedia chips complicate choices", MDR, Feb 12, 1996, page 14
Maury Wright, "Media Processors target digital video roles", EDN, Sep 1, 1998
Tom Halfhill, "Philips Trimedia goes Mobile", MPR, Dec 5, 2005
Gert Slavenburg and his biking pal, "DSPCPU operations for TM1100", 1999, Appendix A, Preliminary information
Gert Slavenburg, etc., "Custom operations for Multimedia", Chapter 4 of the same reference
Peter Clarke, "Compressed VLIW meets multimedia", EE, Dec 1995
Craig Hansen, "MicroUnity MediaProcessor Architecture", IEEE Micro, 1996

quote "A broadband mediaprocessor extends and streamlines a general-purpose computer system to attain the goal of communicating and processing digital video, audio, data and RF signals (SIC!) at broadband rates using compiled, downloadable, software rather than special-purpose hardware. The instruction set, system facilities, and initial implementations of an architectural family of broadband mediaprocessors are introduced, and compiled software development is illustrated with an example and description of the development environment".

John Moussouris (easy to remember: mouse + souris), "A roadmap of the Mediaprocessor Design space", Microdesign Resources Dinner, May 9, 1996
Yong Yao, "Chromatic's Mpact2 boosts 3D", MPR, Nov 18, 1996
SSC 27 12, dec 92 p1886
SSC 29 12, dec 94 p1474
http://www.cse.fau.edu/~borko/Chapter21_mc.pdf

Just added this morning; I just wish I had read it before. Kind of a complement to our MM splash. Hey Borko, is it public?

Answer to questions

The 7 functions:

graphics including 3D
video (compression, editing , conferencing)
high quality Audio
computer telephony, Speech
communications, Fax, Modems
real time 3D games
DVD?

Wednesday, February 1, 2012

COP - DSP FUNCTIONS

DSP Functional Building Blocks
The goal of this section is to cover the long history of DSP functional blocks. As its name implies, this type of block is opposed to the DSP structural block (such as bit/byte/word slice logic).

BACKGROUND

Let us give a special mention to the CAFIR (a FIR filter from Motorola, 1986), to the C66xx FFT engine (TI 2010), and to our friends of the Morphics Next Gen.

LIST OF FUNCTIONS

Filters

including equalizers

FFTers

bit reverse (!!)

Convolvers
Bit convolvers and Galois field arithmetic
Hats off to unsung heroes of 30 years of custom DSP (see for instance ISSCC 1980-now)

FILTERS

DSP56200 also known as CAFIR (Motorola 1987) [ref. 1]
- Main function is adaptive filtering using LMS.
- 256 x 24-bit coef RAM
- 256 x 16-bit data RAM
- Can be configured as single FIR, dual FIR or single adaptive FIR
- Programmable loop gain in adaptive mode
- Programmable coef. leakage term
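For readers who have not met these terms, the loop gain and the leakage term are the two knobs of a leaky LMS coefficient update. The sketch below is the generic textbook algorithm in floating-point Python, not a model of the DSP56200 itself (which used fixed-point RAMs as listed above); the function names, the tap count and the test signal are all our assumptions.

```python
import random


def lms_step(coefs, history, sample, desired, gain, leakage=0.0):
    """One adaptive FIR step; coefs and history are updated in place.

    gain    : loop gain (LMS step size)
    leakage : fraction of each coefficient leaked away per step (0 = plain LMS)
    Returns (filter_output, error).
    """
    # Shift the new sample into the delay line (newest sample first).
    history.insert(0, sample)
    history.pop()
    output = sum(c * x for c, x in zip(coefs, history))
    error = desired - output
    for i, x in enumerate(history):
        coefs[i] = (1.0 - leakage) * coefs[i] + gain * error * x
    return output, error


def identify(unknown, steps=2000, gain=0.1, leakage=0.0, seed=0):
    """Adapt a FIR filter until it mimics the `unknown` impulse response."""
    rng = random.Random(seed)
    taps = len(unknown)
    coefs = [0.0] * taps
    history = [0.0] * taps
    reference = [0.0] * taps  # delay line of the system being identified
    for _ in range(steps):
        sample = rng.uniform(-1.0, 1.0)
        reference.insert(0, sample)
        reference.pop()
        desired = sum(h * x for h, x in zip(unknown, reference))
        lms_step(coefs, history, sample, desired, gain, leakage)
    return coefs
```

Calling identify([0.5, -0.25, 0.125, 0.0]) drives the coefficients towards the unknown response. A non-zero leakage pulls every coefficient slightly towards zero at each step, which trades a small bias for protection against coefficient drift.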

REFERENCES

Motorola, "DSP56200 data sheet", 1988, also called ADI1257R1
Hossein Yassaie, "Digital filtering with the IMS A100", Inmos, 1986
Fujitsu, "MB86795 data sheet", Aug 1987
Atmel, "ATC 76C001 programmable FIR filter", 1996
Amphion, "Cascadable FIR", Quicklogic, 2001
ISS, "Biquad IIR filter megafunction", Altera, 200
See Xilinx, Altera, etc. catalogues
Jordan, Mannock, "Correlation-function, peak detector", IEE Proc., March 1981
Tao Lin, Dahn Le Ngoc, "Implementation of digital filters using IDT7320, IDT7210, IDT7216, and IDT7383", IDT A.N. AN-32, 1990

Sunday, January 15, 2012

Benchmarks

Benchmarking, Performance analysis, Technical comparison
Under the benchmark heading, we will cover all techniques which are also known as Performance Analysis in Computer architecture.

Background

With each new type of chip, it was usual for electronics magazines to do a special report where the main item was a synthetic table of features. So we became very adept at doing the same, and for a DSP a table of 20 features was largely sufficient. Even better, it could be reduced to 2 lines: MHz and number of MAC operations/s.

Thinking about it, in those days all DSPs were single MAC so that MHz was sufficient. The big issue was the number of data buses.
We also used graphical analysis. In a nutshell, memory buses, MAC and ALU were drawn with their data path width; the bigger the area, the better.

Hence came the concept of figures of merit. The CPU world had MIPS, the DSP world had MMAC/s. The natural step was to introduce kernel benchmarks, since MMAC/s and FIR filter were identical. Also, by the mid-80s all manufacturers had their own list of kernels, so it was easy to synthesize our own. And so we deeply believe that a benchmark mix of FIR, Biquad and adaptive FIR is still sufficient to characterize a DSP.

Before going further we must note two very important concepts.
The first one is the concept of benchmark mix. The biggest advance of BDTI was to standardize the mix in a simple and intelligent way.
The second one is that when using kernel benchmarks, we gave up on speed and instead used cycles/kernel. In other words, smaller was better, as opposed to the CA methodology of bigger is better (e.g. D-MIPS).
And then we naturally went from kernel to application benchmarks. By that time we had plenty of competition from ersatz DSPs which used MHz as the measure of benchmarks, and so we came up with the concept of DSP MIPS.

By then, the CA guys were completely lost. Our reasoning is complex, but trust us, this is the best way to tackle the problem.

Instead of benchmarking the DSPs, we benchmarked the algorithms, such that a G711 was 0.5 DSP MIPS, a G729 was 10 MIPS and a G723 was 30 MIPS. {Tbd recheck}
And then, we were so clever, we could also use this figure of merit for DSPs. All DSPs were single issue, so that MHz and DSP MIPS were synonymous. So it was very easy and safe to predict that a 50MHz DSP had the workpower of 50 DSP MIPS, enough to implement a G723 speech coder.
By this simple technique we had merged chips and algorithms under one umbrella. Kernel benchmarks could also easily be integrated: for example, the BDT 256 point FFT was 0.008 DSP MIPS (8000 cycles).
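The bookkeeping described above fits in a few lines. The following Python sketch uses the figures quoted in the text (themselves marked as to be rechecked) and our own function names; reading the FFT figure as 8000 cycles executed once per second is our interpretation of "0.008 DSP MIPS".

```python
# Algorithm costs in DSP MIPS, as quoted in the text (to be rechecked there).
ALGORITHM_DSP_MIPS = {"G711": 0.5, "G729": 10.0, "G723": 30.0}


def capacity_dsp_mips(clock_mhz):
    """Single-issue DSP: one instruction per cycle, so MHz equals DSP MIPS."""
    return float(clock_mhz)


def kernel_dsp_mips(cycles_per_call, calls_per_second=1):
    """Convert a kernel cycle count into a DSP MIPS load (assumed call rate)."""
    return cycles_per_call * calls_per_second / 1e6


def total_load(algorithms):
    """Sum the DSP MIPS of a list of algorithm names."""
    return sum(ALGORITHM_DSP_MIPS[name] for name in algorithms)


def fits(clock_mhz, algorithms):
    """True if the algorithms fit in the DSP MIPS budget of the chip."""
    return total_load(algorithms) <= capacity_dsp_mips(clock_mhz)
```

For example, fits(50, ["G723"]) is True (30 out of 50 DSP MIPS), while adding two G729 and one G711 on the same chip (50.5 DSP MIPS) no longer fits.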
But then, all hell broke loose for multiple reasons:

DSP became CPU like and the CPU standard is Dhrystone MIPS.
C became the only acceptable way of benchmarking applications.
Because MultiMedia was kind of becoming synonymous with DSP, we ended up with all the benchmark politics of data size.

Audio is 24-bit, so how do you compare apples and oranges, etc.? To avoid the problem, BDT uses the concept of native size.

Even truer, CPUs and cores took a back seat to application platforms. Applications became the only respectable benchmarks (and by the same token totally impractical).

Nobody in his right mind is going to implement a full application for benchmarking purpose.

So as of 2012, benchmarking is 90% Linux testbench (downloading, rebuilding, compiling with -O3, running, testing and measuring) and 10% optimization. We are miles away from evaluating the performance of a DSP.

List of techniques

Table of features

Problem: nowadays this is quantitatively more difficult. A standard SOC is made of hundreds of basic pieces of IP.
Problem: when a 32-bit shifter is not equal to a 32-bit shifter

Single figure of Merits

MIPS
DSP MIPS
MMAC/s

the problem

MOPS

the problem: 1 MMAC = 2MOPS

ITU WMOPS
D-MIPS (Dhrystone MIPS)
Graphical figure of merit: the Lucent Cube

Graphical analysis
Manufacturer benchmarks (kernels)
personal and custom benchmark
Industry standard DSP benchmark (assembler) - BDT
Industry standard DSP benchmarks - the rest -> from bad to worse

Benchmark results give rankings linearly proportional to MHz! So why bother?

Application benchmarks
Models

graphical models; fatter is better
Bob Owen nice little drawings.

References

BDTI web site
Eric Martin

Saturday, January 7, 2012

DSP of Any Kind

DSP of Any kind : boards, FPGA, Custom chips etc..
The goal of this section is to cover the "free world" of DSP architecture.

By definition, commercial GP (General Purpose) DSPs such as a TI C64xx are not free. They must follow the constraints of chip design, ISA design and, most of all, compiler friendliness and upward software compatibility.
By comparison, there are many products which are point products or application specific. They are full of interesting tricks and original solutions. This is the case for custom DSP chips, and they can also be found among boards and FPGA based platforms.

The problem with all these custom solutions is the lack of public information.

FPGA is the most obscure because the description is both code (HDL) and proprietary.
Custom chips are more available in the form of conference papers [ref 1].
Boards, especially older boards, are the most accessible since the description is a block diagram

Background

From 1978 to 1985, designing dsp boards with dsp BB
A big application of bit slice families was DSP boards. We largely used the architecture of these boards to design the first generation of integrated GP DSPs.
From 1985 to 1995, designing dsp boards with GP DSP
Once integrated GP DSPs were available, the board architecture became predictable, and most of the advances were largely outside DSP, such as host interfacing [ref 10..13].
Since 1995, FPGA

Around 1995 was the time that, {with a different type of footprint}, the FPGA became a credible alternative to boards. Not surprisingly, the FPGA companies reinvented many of the BB and board architecture techniques of the 80s.

DSP BOARDS

The historical path to electronics progress is to design boards based both on new components and on better architecture understanding. New components are then developed by integrating what was until then a full board of electronics. Recent footprint changes (SIP) and nanotechnology did not fundamentally alter the process. An end-user board still represents a very good approximation of the next generation SOC platform. So in a methodology like ours, which designs COPs by relying on proven DSP architectures, studying DSP boards has real design value.
Let us start by categorizing DSP boards, keeping in mind that the same categories could be applied to COPs:

Boards designed using a bit slice family (AMD) or a building block family (ADI).
Boards based on a General Purpose DSP (typically a TI DSP ) or high performance CPU.
Special purpose boards such as Array processors or Vector Processors.
Boards based on one or several custom chips

FPGA

In a way, FPGA can be seen as a new type of board. There are however deep differences.

The FPGA vendors (Xilinx, Altera, Quicklogic) propose catalogs of Intellectual Properties (IP) which constitute an important reference for the COP designer.
They also propose multiple builder tools. Some can serve as example for developing COP.
Finally, they have multiple experiences with "Matlab" to implementation (in the form of Simulink block to Xilinx hard macro).

Note: we do not try to cover the implementation of a COP in an FPGA fabric. Instead we imply that the FPGA ecosystem constitutes a vast amount of resources to be tapped.

DSP CUSTOM CHIPS

A long time ago, custom chips were a mine of information. For instance, a key search of 'FFT' on the IEEE ISSCC DVD will bring in the region of 100 hits over the 10-year span of the 1980s. Nowadays the number of custom chips has dramatically diminished and the descriptions are rarely at the level of the building block. Still, like boards, they are a very good reference for communication and multiple processing.

Also we should not forget that a COP is a form of custom chip.

FURTHER: WHERE TO START WITH DSP BOARDS?

Motivation: "should a DSP COP architect be studying DSP boards?".
1) Today's DSP boards: they are too sophisticated to serve as good examples. Their design advances have more to do with inter-board techniques and software than "pure" DSP. Still, there are a few techniques worth studying, such as HOST-COP interfacing, serial inter-chip communication (SRIO) and heterogeneous MP.
2) DSP boards of the 70-80s: they are much more amenable to study. They had a large range of architectures and structures [ref 2-9]. They are nearer to the architecture of a current COP design than current DSP boards and chips.

One could think "If the DSP chip architects had done a good job, we would not need to go back to obscure stuff to get our architecture inputs". But this would be unfair since designing a GP DSP had many constraints given by software. We will not say that these constraints should be totally ignored by the COP designer but the whole point of designing a COP is for these constraints to take a back seat.

We will now do a quick review of some paper references and draw conclusions on what can be possibly reused:

Most articles had a block diagram of (what was supposed to be) a DSP
The definition of a DSP varied widely. For example:

Usage of serial arithmetic.
Usage of shift register as opposed to memory
A memory model reduced to I/O buffers (ADC/DAC) so that there was no need of AGU.

Even a model with only DAC (MIT speech generator)

AGU with flags going back to the ALU status registers. Obviously to do circular addressing.
Single bus architecture, and the Zurich zip trick as an architecture feature

Veendrick [ref. 11] had an interesting BB that we will call "MULIMIT"

Z= K*A + L*B

References

Hot chips, ICSPAT, ICASSP, ISSCC, see also IEEE DVDs (Com. Soc., SP Soc., SSC Soc.)
Louis Schirm IV, TRW Inc., "Packing a signal processor on a single digital board", Electronics, Dec 20, 1979
John Mick, "Fast computational devices for DSP", likely at some conference before 1980 (?)
Zaheer Ali, "Know the LSI hardware of digital signal processors", EDN, June 21, 1979
Capello, etc., "Completely pipelined architectures for DSP", IEEE Vol ASSP-31, August 1983
S. Chin, C. Brooks, "Microprogramming enhances signal processor's performance", Electronics, Nov 17, 1982
Zemam, Troy Nagle, "A high speed microprogrammable DSP employing distributed arithmetic", IEEE Vol SC-15, Feb 1980
R. Shively, "Architecture of a programmable DSP", IEEE Vol C-31, Jan 1983
IBM SP16
MIT, "Klatt vocal tract model", ICASSP 82 ??
Philips Labs, "A 40 MHz Multiapplicable DSP chip", IEEE Vol SC-17, Feb 1982
TRW, ICASSP 83
David Karlin (Fairchild), "VLSI BB for DSP", ICASSP 1982
Barral, Moreau, "Circuits for Digital signal processing", ICASSP 1984
NTT, "LSI's for DSP", IEEE vol. SC-14, April 1979
F. Mintzer, A. Peled, "The architecture of the real-time signal processor", ICASSP 1982
and dozens of others with key searches in IEEE Xplore
"Plug-in DSP boards", EDN, April 26, 1990
"DSP coprocessor boards", EDN, Sep 13, 1991
"DSP boards help tackle a tough class of AI tasks", Electronics, Aug 21, 1986
G. Pawle, T. Faherty, "DSP development board offers host independence", Computer Design, October 15, 1984

Sunday, January 1, 2012

COP: DSP boards, FPGA

The goal of this section is to cover any DSP COP, with a footprint different from a chip. The two main candidates are boards and FPGA. While FPGA are chips, the very large FPGA are more like boards in terms of price, flexibility and topology. Also they are direct competitors in the PC SOC socket.

Background

Sometime in the 90s, plug-in DSP boards became coprocessor boards. In a terminology which was largely marketing, a DSP board for the VME bus was called a plug-in, but the same board with degraded performance for the ISA bus was a COP board [ref 1,2,3,4].

Description

PC coprocessing socket
One of the most interesting DSP applications of recent years was the financial coprocessor. The Intel architecture allows adding a COP on a very tightly coupled interface. A couple of people [ref 5,6] used this socket to plug in a board which boils down to 1 or several highest-end FPGAs. For the record, what is accelerated is effectively some Matlab functions.
Embedded coprocessing
Also in the embedded world, an FPGA + C64xx is a standard sight, for instance in general purpose boards or wireless infrastructure. While a part of the FPGA is for jelly beans, an even larger part takes care of pre/post processing. Also note that the C64xx has no specific support for coprocessors.

References

" Plug-in DSP boards" EDN April 26, 1990
"DSP coprocessor boards" EDN Sep 13, 1991
"DSP boards help tackle a tough class of AI tasks" electronics, aug 21,1986
G.Pawle, T.Faherty "DSP development board offers host independence" computer design october15,1984
Nallatech
HC 2009??
