Wednesday, September 28, 2011

Found in the webs

Oxford DSP


1.2 Background Checks


  1. Hennessy and Patterson
    1. Computer Architecture
  2. Coprocessors (COP) 
  3. The Core
    • Which type of core? -->  let us start with a DSP of the first kind
    • Is a DSP core best for DSP?
    • Xilinx has a MAC. What is the next level?
  4. Benchmark and benchmarking
    1. BDT
      1. VADD
    2. The noble art of profiling
      1. Profiling an ever changing reconfigurable machine
    3. Optimization and tuning
  5. Fixed Point Dialects
  6. Algorithms, signals, structures, tips and tricks
    1. Signal Processing
    2. Matlab
      1. Interesting feature: is M predication same as CA predication?
  7. Once a (logic) designer always a  pain in the ass designer 
    1. Logic
    2. Arithmetic
    3. Sequential
    4. Xor constructs
  8. SOC architecture
    1. the ideal : NS Dgt Answer Phone -- 3 serial ports

Fixed Point Dialects

Fixed Point (FXP) dialects
The goal of this section is to summarize in as few words as possible the gigantic complexity of FXP.

Background
  1. When 2q30 is 1q30? 
  2. Float or integer
  3. First it was q15
  4. Then came q14
  5. Then came 1q15, 1q31, 2q30 etc..
  6. Then we got lost; 1q31 = 1q15 but 1q15 ~= 1q15!!
  7. Still 

List of FXP dialects
  • Assembly language on first generation DSPs 
  • ITU basic operators
  • Matlab FI
  • Simulink Fixed Point
  • SystemC
  • DSP extensions in C
  • Proprietary FXP languages from CAD companies
Issues

FXP and integer
When translating FP to FXP, the common mistake is to think that replacing the float data types with integer data types will suffice. This would be true if the operations were additions, or based on addition. But as soon as multiplications are involved this does not work so well, and it gets worse with division, transcendental functions, etc.
The reason is that (in 99% of cases) an FXP number is not an integer. It is a fractional number. It can be purely fractional (less than 1) or partly fractional (for instance, a 16-bit word covering the range +31 to -32 has 10 bits of fraction).
Hence it is well known that an integer 16x16 multiply gives 32 bits, while a fractional 16x16 multiply gives 31 bits (plus one saturated case). It is also easy to visualize that an integer division and a fractional division grow in opposite directions.
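To make the growth concrete, here is a small Matlab sketch (my own illustration, not from the original text) contrasting the integer view and the fractional (Q15) view of the same 16x16 multiply:

% Integer view: a 16x16 product can need all 32 bits.
a = int16(-32768); b = int16(-32768);
p_int = int32(a) * int32(b);                  % 2^30 needs a 32-bit signed container

% Fractional (Q15) view: the same bits form a Q30 number; the usual DSP
% convention shifts left by 1 to get Q31, leaving 31 bits of product and
% one corner case, (-1)*(-1), which must be saturated.
q15 = @(x) double(x) / 2^15;                  % interpret int16 bits as Q15
fa = q15(a); fb = q15(b);                     % both are -1.0 in Q15
p_frac = fa * fb;                             % +1.0, not representable in Q31
p_q31 = min(round(p_frac * 2^31), 2^31 - 1);  % saturate to 0x7FFFFFFF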

   



    Saturday, September 17, 2011

    Computer Architecture

    Computer Architecture - Viewpoint
    As a "DSP architect" I never considered Computer Architecture(CA) a major influence on my work. Still, the field is remarkable  by its emphasis on quantitative approach(*) (**) and the number of ideas generated/implemented in the last 30 years.
    (*) dixit Hennessy-Patterson
    (**) Here quantitative approach is opposed to the "free thinker attitude" of the glorious silicon valley architects.

    Computer Architecture - Past, Present, Future
    Obviously, the heyday of CA is long gone. First we had the death of the major Silicon Valley conferences and the end of many CPU architectures, then Microprocessor Report shrinking to a confidential readership. Nowadays, when the main subject of excitement at Hot Chips is virtual machines, you know that the field is in trouble.
    Around 1900 a famous physicist announced that we had found everything in Physics except one little second-order detail. The detail happened to be the source of Einstein's relativity.
    So by analogy, and to be on the safe side, we will announce that we have found everything in Computer Architecture except this little quantitative improvement called "transactional memory".

    Computer Architecture - Value Proposition
    The point I am trying to make is that you cannot be a very good DSP architect without a serious background in CA. You can be very good at designing one product, but not a series of products over time, and even less a hierarchy of differentiated products (such as cores, super cores and chips).
    Now, on the other hand, the beauty of a customised DSP architecture is, by definition, to have no past or future. You are designing a point product (tailored to the task).
    Still, we will insist: ignoring the lessons of 30 years of CA will bite you one day, as bitterly as a mistake in your rounding function. So to fill that gap we will go through a CA checklist.

    Computer Architecture - Checklist
    • Low hanging fruit: Hot Chips conference
      • all 25 years+ of conferences are available on the web
    • Microprocessor Report and Microprocessor Conferences: 
      • were at their best during the war years (x86 vs SPARC vs MIPS vs HP) (1990-2000). 
      • After 2000 it became more of an SOC conference, with a lot more DSPs too, including some interesting coprocessors. 
    • Speaking of Cops, the first Cop was the FPU (Floating Point Unit)
      • the 8087, 80287, 80387 evolution and the opposition to the 68881
    • The Intel i860: one of the rare "beasts" I never had time to analyze in detail; 
      • genuine Computer Architects put it at the top of the list, so..
    • My favorite weirdos
      • Intel iAPX 432
      • Transputer
      • Fairchild F8 
      • and others 
      • because they tried to implement the SPC (Stored Program Control) model in a different and sometimes radical way. And they were always wrong.
    • Transactional memories, for the same reason. But in fact I am interested in intelligent memories, or in using memories as coprocessors
      • IRAM
      • CAM
    • Multithreading, Reconfigurable Computing, Transmeta patents
    • Tensilica and Stretch
    • Keep an eye on the latest chips from Intel (including IDF), one never knows
    • Not recommended: 
      • IEEE computer magazines
      • ARM (boring)
      • GPU (a world in its own) 

    What is a Computer Architecture: My narrow vision
    Think of the role of an architect building a BIG house. He knows all possible styles, all the buildings of the world: pyramids, Bavarian castles, Gaudi.


    For me, it is more a matter of culture than quantitative measures. The analogy for the DSP architect is to know all DSPs ever built. 
    Also the architect must know the construction structures and which material to use, why and where.
    The analogy here is to know your buses and DSP building blocks.

    Sunday, September 11, 2011

    1.3 Building Block Issues


    ====== Building Block ISSUES ======

    1. Naming conventions
      1. Example: addsub21 
      2. The Philosophy of BB
    2. Matching functions to structure
    Naming Conventions

    The biggest of all BB issues is, strangely enough, the organization and naming convention. For people familiar with software libraries this is a trivial problem, but for people designing chips, or instruction sets, or say building blocks for an FPGA library, this can really turn into a nightmare.
    So the right attitude is not to underestimate this problem, but also not to spend too much time on it (*)
    • The best BBs have 8 letters or less.
    • Have a naming convention from the beginning  and STICK TO IT.
    • Expect this naming convention to become dirtier as time goes by. As you design more and more final products, there is obviously positive feedback trickling down to the lower level, and that means redesigning a block and giving it a derived name. For instance a 3-input adder working on a vector of 72 packed bytes, called ADD31_PBV72, becomes ADD31_PBV72A or PBV721 or PBV72v2; bottom line, it complicates your nice naming convention.
    • Expect this naming convention not to be universal; when a BB does not fit the naming convention (between 1 and 10% will not), just call it Jack, Jojo or Kapernic.
    • Instead keep an updated Excel sheet with a minimum of characteristics for all designed (and planned) BBs. This is a real pain, but it is necessary. You cannot judge a BB by its name. As I will show in an example, ADDSUB can literally have more than 100 definitions.
    • Any automatic method helps but in all cases it means human reviewing since you must at least know what you have in stock. Remember that you are a designer not a user.
    • As you go up the food chain, naming conventions must change from a description (MUL16x32) to a catalog entry (DSP5400).
    • There are functional blocks which at least should have the name of the function (FFT). They can also be both descriptive and catalog. Hence FFT16x32 is nice, but then what about the final result: is it 16 or 32 bits? Do you use rounding? Bit reverse? In place? Very soon you adopt the catalog attitude, such as FFT1, FFT2, etc.
    • Parametrized blocks help but are costly in description, hardware, testing and software.
    • A good solution is the concept of toolkit and build: to have a toolkit ready such that new blocks are built, tested and DOCUMENTED only when needed.
    • I strongly recommend using a meaningful name for every build and not the automatic conventions found in software versioning. Firstly, we are speaking about generating different BBs, not different versions of the same BB. Secondly, a BB is by definition simple. It is bounded in functionality, input and output. This is not the case for software.
    Example of building block: addsub
    In this example, we are showing how even a simple function can rapidly get out of hand. First let us start with the name. My naming convention is based on 3 fields: functionName[IO pins][arithmetic scheme]_[data type]. So for instance:

    In this case we will use the Matlab default (FP), which also has a default arithmetic scheme, and we will call the component addsub22 such that:

    function [Z W] = addsub22(X,Y)
      Z = X+Y;
      W = X-Y;
    end


    So far so good. So what can go wrong with the definition? Before we continue, it must be noted that this is NOT the most popular definition. The popular definition has 4 inputs and consists of an independent add and a sub in parallel. This is okay since, using our naming convention, it would be:
    function [z w]=addsub42(x,y,u,v)
        z= x+y;
        w= u-v;
    end

    The Data types
    Using the (b,s,l) Hungarian convention we define the 3 signed integer components:
       function [bZ bW] = addsub22_b(bX,bY)   % int8
       function [sZ sW] = addsub22_s(sX,sY)   % int16
       function [lZ lW] = addsub22_l(lX,lY)   % int32
    We can use standard Matlab FP to define these components. It is a bit wordy, since we must write the code for casting FP to integer, and FP does not offer 64-bit precision.
    The alternative is to use the Matlab "integer" types (int8, int16, int32), but they do not cover all Matlab functions (complex numbers) and offer only one arithmetic scheme (saturation).
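    Purely as an illustration (my own sketch, not from the post), the int16 flavor written with the native Matlab integer type could look like this; note that Matlab integer arithmetic always saturates, which is exactly that single arithmetic scheme:
    function [sZ sW] = addsub22_s(sX,sY)   % int16, saturating by construction
      sX = int16(sX);  sY = int16(sY);     % cast the inputs to the declared type
      sZ = sX + sY;                        % saturating add
      sW = sX - sY;                        % saturating subtract
    end
    For example, addsub22_s(30000,10000) returns 32767 (saturated) and 20000.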

    The modals (arithmetic schemes)
    How to indicate the modals? Contrary to popular belief there are NOT 2 (modulo, sat) but 3 (modulo, sat, promoted). Promoted is the integer equivalent of FP: as the data grows, so does the output.
    function [sZ sW]=addsub22m_s(sX,sY)   % if ovf, results are wrap around
    function [sZ sW]=addsub22s_s(sX,sY)   % if ovf, results are saturated 
    function [sZ sW]=addsub22p_s(sX,sY)   % output is 17-bit so there cannot be ovf;
    (!) Note that the Hungarian notation is plainly wrong for the promoted outputs.
    (!!) And the "p" scheme is not precise enough; the output could be 32-bit (the standard on a CPU).
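    To pin down the three schemes, here is a minimal sketch (my own, with an assumed helper wrap16) of the 16-bit flavors emulated with Matlab doubles holding integer values:
    function [sZ sW] = addsub22m_s(sX,sY)   % modulo: wrap around on overflow
      sZ = wrap16(sX + sY);
      sW = wrap16(sX - sY);
    end
    function [sZ sW] = addsub22s_s(sX,sY)   % saturate: clip to the int16 range
      sZ = min(max(sX + sY, -32768), 32767);
      sW = min(max(sX - sY, -32768), 32767);
    end
    function [Z W] = addsub22p_s(sX,sY)     % promoted: 17-bit result, no overflow
      Z = sX + sY;
      W = sX - sY;
    end
    function y = wrap16(x)                  % two's complement wrap into 16 bits
      y = mod(x + 32768, 65536) - 32768;
    end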


    A vast and unfriendly territory is the Fixed Point data type (FXP). 
    Developing an FXP algorithm in the Matlab native type (FP) might be faster than for integer, so the difficulty is not there. Moreover, Matlab offers its own FXP language, which is based on the native type. 
    The complexity comes from multiple misconceptions about FXP (not covered here). 
    Here is the code
     function [z w] = addsub22_q(x,y,qpoint)
       t = plus(x,y);
       z = qformat(t,qpoint);
       t = minus(x,y);
       w = qformat(t,qpoint);
     end

    Note that the code is flexible enough to take any q format and even integers: an int16 is defined as 16q0 and an int32 as 32q0. 
    But there are also serious limitations. Matlab FP (double) is limited to 53 bits of integer precision. That does not fly well for 48-bit accumulators (or the 108-bit accumulator I saw on a recent piece of DSP IP). 
    Also there is no casting in inputs, only in outputs. Hence the basic arithmetic scheme is FP (promoted). 
    Finally there is no concept of unsigned here.
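    The helper qformat is not defined in the post. Purely as a guess at what such a helper might do (assuming qpoint = [w f], a w-bit signed word with f fraction bits, truncation and output saturation), a minimal sketch could be:
    function y = qformat(x,qpoint)           % assumed layout: qpoint = [w f]
      w = qpoint(1);                         % total word length, sign included
      f = qpoint(2);                         % number of fractional bits
      y = floor(x * 2^f);                    % quantize onto the 1/2^f grid
      y = min(max(y, -2^(w-1)), 2^(w-1)-1);  % saturate to the signed range
      y = y / 2^f;                           % back to a fractional value
    end
    With this reading, an int16 is indeed qpoint = [16 0] and a 1q15 fractional is qpoint = [16 15].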

    Another vastly uncharted territory is sub-word parallelism (SWP).
    Also called packed arithmetic, or more commonly SIMD (or whatever the marketing name of the latest Pentium extension), sub-word parallelism (SWP) is a very powerful BB technique which cannot be ignored.
    Matlab offers inherent parallelism, shapeless operators, packing and predication flags, which make it a good match for SWP. 
    But in the example I will show the difficulties of SWP.
    If we take the BB called addsub42, its SWP version becomes addsub21 such that:
    function z_w = addsub21(x_u, y_v)
      z = extract_left(x_u) + extract_left(y_v);    % left halves: add
      w = extract_right(x_u) - extract_right(y_v);  % right halves: subtract
      z_w = pack(z,w);                              % repack the two results
    end
    While this code is essentially correct, it is not useful. There is still too much vagueness to be used as specs.
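    One way (of many) to remove that vagueness is to fix every assumption: say, two signed 16-bit lanes packed in a uint32 and modulo (wrap-around) lane arithmetic. Under those assumptions only, a concrete sketch could be:
    function z_w = addsub21_p16(x_u, y_v)
      xh = tos16(bitshift(double(x_u), -16));   % left lane of the first operand
      xl = tos16(bitand(double(x_u), 65535));   % right lane of the first operand
      yh = tos16(bitshift(double(y_v), -16));   % left lane of the second operand
      yl = tos16(bitand(double(y_v), 65535));   % right lane of the second operand
      z  = wrap16(xh + yh);                     % left lane: add, wrap on overflow
      w  = wrap16(xl - yl);                     % right lane: sub, wrap on overflow
      z_w = uint32(bitshift(tou16(z), 16) + tou16(w));   % repack the two lanes
    end
    function y = tos16(x)                       % reinterpret 0..65535 as signed
      y = x - 65536*(x >= 32768);
    end
    function y = tou16(x)                       % reinterpret signed as 0..65535
      y = x + 65536*(x < 0);
    end
    function y = wrap16(x)                      % two's complement wrap into 16 bits
      y = mod(x + 32768, 65536) - 32768;
    end
    The point is not this particular sketch, but the number of decisions (lane width, signedness, overflow scheme, lane order) that had to be made to get there.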

    References


      Foot notes
      (*) Not like my friend Klaus, who started a war on Verilog naming conventions.

        Sunday, September 4, 2011

        1 - So you want to be a DSP architect?

        Statement
        A DSP (*) combines the skills of CA, the byzantine expertise of FXP, the beauties of algorithmic design and the complexity of signal processing... so it is a good place for a budding architect.
        We believe the above statement to be true, but for reasons that you do not see in the usual literature.

        Background
        First the bad news
        Having originated at the end of the 70s and gone through multiple phases, the DSP (and the DSP architect) of 2015 is dead!
        1. Firstly, because it is very difficult to come up with something other than the CPU model in the software world. In fact, among the hundreds of attempts, only the DSP had a successful run (30+ years), which unfortunately is coming to an end.
        2. Secondly, for the last 15 years "SOC platforms" have taken over from GP DSPs or GP CPUs as the mainstream chips. In this situation, the quality of the DSP or CPU core does not matter so much; all the specific IP blocks, including software, have more impact.
          1. Or in effect, the quality of the CPU core is to be as transparent as possible. 
            1. Which is another way to say "Software is the killer" 
          2. "nobody has ever been fired when choosing ARM ".
        Let us consider the standard platform. It will have a single or dual core ARM, maybe a DSP, many coprocessors, and perhaps a DMA so smart that it can be called an I/O processor. Obviously most of the software runs on the ARM, which also takes care of the whole data management. Any programmable processor such as a DSP is a hindrance to the whole programming model. So DSPs (or DSP cores) are slowly but surely disappearing from all platforms.

        Now for the good news: 

        What direction to take for a DSP architect?
        1. Become a full blown CPU architect and say bye bye to DSP.
        2. Join the FPGA bandwagon. This seems kind of obvious and a little spartan too.
        3. Learn 4G,5G standards, good luck! 
        4. Change job. Join Wall Streeeet and the analytics crowd.
        5. Fill your specific description here.
        Now for the news from hyperspace: 
        OR... this is the subject of this blog.You want to be a DSP architect? Plenty of work till 2052!
        Let us put it in a few words:
        1. As predicted by the experts, customization has taken over parallelism.
        2. For a given platform, accels or cops (especially DSPs) have grown in number and importance.
        3. Matlab is the de-facto DSP language.
        4. From points 1, 2, 3: Matlab will be used to design DSP accels, cops and any application-specific DSP (AS-DSP).
        5. From point 3: Matlab has no concept of a CPU (such as a programmable DSP core). Matlab's basic concept is to sequence (in order) a series of functions.
        6. The implementation of a Matlab function is done through mapping (not compiling). 
          1. Today we map a simulink block to a FPGA block. 
          2. Tomorrow we will map a Matlab BB (a function which looks like a hardware BB) to an implementation BB (assembler, Verilog).
        7. This methodology is based on a bottom-up approach: developing increasingly complex BBs in M-code (a tiny illustrative sketch follows at the end of this post). 
          1. Same as a Simulink toolset. 
          2. Or the way that TTL parts went from SSI to MSI to LSI. 
        8. Considering that the number of BBs needed is on the order of a few million, we estimate that this methodology will NOT be available for a long, long, long time...    
        So we better start now ...
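        As a (purely hypothetical) taste of the bottom-up style of point 7, here is a radix-2 butterfly expressed as a composition of smaller M-code BBs; the names bfly2 and cmul are my own assumptions, only addsub22 comes from the Building Block post above:
        function [A B] = bfly2(a, b, w)
          t     = cmul(b, w);            % twiddle multiply BB
          [A B] = addsub22(a, t);        % A = a + b*w, B = a - b*w
        end
        function z = cmul(x, y)          % complex multiply BB (native operator here)
          z = x .* y;
        end
        function [Z W] = addsub22(X,Y)   % the BB defined in the Building Block post
          Z = X + Y;
          W = X - Y;
        end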

        (*) Lingo
        Accel: Accelerator
        BB: Building Block
        CA: Computer Architecture
        Cop: coprocessor
        DSP: a Digital Signal Processor chip (TI 66xx ) or Core (TI C64P, Ceva family)
        FXP: Fixed Point

        General References
        Hot chips
        TI web
        http://www.iqmagazineonline.com/archived.php (ARM IQ journal)