Friday, November 27, 2015

T.O.C 27Nov2015

=====================   last updated: 17dec2015 ============================
  1. So you want to be a DSP architect?
    1. THE LONG STORY
      1. DSP Architecture today
      2. Why Matlab?
      3. The SSS (Sad State of Software) story
    2. BACKGROUND CHECKS 
      • Hennessy and Patterson (CA)
      • Coprocessing
      • The "CORE"/the "CELL" / The "SLICE"
      • Benchmarking
        • BDT as an architectural tool
      • Fixed Point dialects
      • Algorithms and Data types
      • Platforms and SOC architecture
      • BUILDING BLOCK ISSUES
        1. Physical limits
        2. Design time, Build time and Run time
      • THE FLOW: METHODOLOGY AND TOOL ISSUES
        1. My stories
          1. Back to 1979: What is descriptive language? 
          2. Mapping vs Compiling: the never ending story?
          3. Soft vs Hard Macros - is Firm the answer?
        2. NPU lessons
        3. Retargetable compilers
        4. Simulation: we KNOW profiling is THE key! got the answer?
        5. Beserkley or Berkeley? and Alberto during that time...
      • FOUND IN THE WEBS
        1. The Matlab Engine 
        2. Is Kurt Keutzer the Donald Trump of Hardwired Processing? 
        3. Another Berkeley Randy Cat?
        4. Found in the cobwebs: my garage
        5. Jeff Bier is promoting CNN! 
        6. Andre Dehon is promoting Multics!
        7. The Trailing Edge
      • DSP (uP) (HISTORY OF ~ ARCHITECTURES) 
        1. DSP of the First Kind (1980-95)
        2. DSP of the Second Kind (1995-2005)
        3. DSP of the Third Kind (2010- ?)
        4. DSP of the lost kind (1995-2005)
        5. DSP  of any kind (1950-2050)
        6. DSP vs Micros
        7. DSP vs FPGA
        8. The Archives and the DSP historian
      • FROM DSP TO DSP
        1. This wonderful world of DSP
          1. A world? More like a sect! 
            1. The Pope (Will), the Cardinal (Jeff) and the Wizard (Gene)  
        2. The DSP Old Timer's Club.
        3. Stop me if you've heard this one before!
        4. DSP history
          1. Bit slice: a saga 1975-1985
          2. Building Block:  was 1992 and IDT  the last DSP BB? or Weitek?
      • TARGET BBs
        1. Analog programs (Basic, HPC and other gizmos)
        2. TI C25 Tips and Tricks 
        3. SPmag Tips and Tricks
        4. SPMag 1990-2005
    3. BB Level 1
      1. Arithmetic BB
        1. add, mul, compare, conditionals, degenerated, compound, muladd
      2. Bit wise BB
        1. Shift, 
        2. Bit-field, 
        3. Byte field, 
        5. Bit count logic (cls, clz, clo, etc.)
        5. Boolean processing 
      3. Vector BB
        1. Vector arithmetic, 
        2. Memory vector search, structured tables, lookup
        3. Shapes (Matlab)
          1. Shape generation
          2. Matlab primitives
          3. Reporting
    4. BB1 DSP STRUCTURES
      1. DSP trilogy 
        1. ALU
        2. BMU
        3. MULT  
      2. MAC
      3. DAU
      4. AGU 
      5. PCU
      6. RF (Register File) and MEM
      7. VPU (Vector Processing Units)
      8. SU (The Shuffle Unit) 
      9. Bit slice: Bit slice building blocks
    5. BB2 DSP FUNCTIONS
      1. Filter
      2. FFT
      3. Correlators & bit comm. engines
      4. Sampling functions
      5. Matlab dsp functions
        1. accumarray
        2. diff
        3. sum
    6. BB3 MATLAB CONSTRUCTS
      1. Find
      2. Predication with Matlab
      3. Matrix functions
    7. BB4 MATH 
      1. Divider
      2. Math.h
      3. Complex numbers
      4. Further with numerical recipes
    8. BB5 CODING
      1. Basic Coders
        1. Coding generalized engine
        2. Gray, Hamming, etc..
        3. Parity
        4. Cyclic codes: Firecode
        5. CRC
        6. Huffman coding
        7. Arithmetic coding
        8. Galois Fields
      2. Convolutional coders
        1. Gsm decode
        2. Hard decision
        3. distab
        4. distaabcc
        5. Viterbi equalizer
        6. Viterbi butterfly
      3. Com. Block coders
      4. Iterative coders
        1. Turbo encoding
        2. Turbo decoding
        3. LDPC
      5. Walsh, Hadamard and cdma
        1. FHT
        2. Happy Chirper.
    9. 5 MATLAB TOOLBOXES
      1. NUFFT
      2. SAR
    10. APPENDIX 

    Sunday, November 22, 2015

    Is Andre Dehon a genuine philosopher and Multics misunderstood?

    The answer is: YES
    =====================================================================
    That beats all for this week!

    We all know how bad Multics was and so Unix was created. From that point history has been written:

    • Unix is the white knight....mac OS ...embedded linux
    • Microsoft windows/dos/PC  is the dark side
    • Multics? dead and buried.. never again.

    So imagine my surprise when, while checking some older MIT stuff from AD (Andre DeHon), I came across the Multics Reunion MIT 2014 with a paper by AD.

    Abstract: At a time when computers are increasingly involved in all aspects of our lives, our computer systems are too easily broken or subverted.  The current state of affairs is, no doubt, unsurprising to Multicians who are painfully aware of the design and security compromises that went into the base design of today's mainstream systems.  The past 30 years has also brought vast changes in the availability and costs of computer hardware as well as significant advances in formal methods.  How do we exploit these advances to make computer systems worthy of the trust we are now placing in them?  We specifically take a clean-slate approach to computer architectures and system designs based on modern costs and threats.  We spend now cheap hardware to reduce or eliminate traditional security-performance tradeoffs and to provide stronger hardware safety and security interlocks that prevent gross security and safety violations even when there are bugs in the code.  We embrace well-known security principles of least and separate privileges and complete mediation of operations.  Our system revisits many pioneering Multics concepts including gates between software components with different-privileges, small and verified system components, and formal information flow properties and guarantees.
    Project paper: http://www.crash-safe.org     

    Now, I consider AD a madman since he went back to the east coast instead of staying in sunny California. But then AD was (is) always the ultimate contrarian, lateral thinker, deep searcher of truth, and so I should not be surprised by such support for Multics.
    Ten years ago he was fighting for the replacement of silicon platforms by {bio? to be checked}. And asked about the prospect, he rightly said that he was an academic and would not dare fight the semiconductor industry...

    Is there a lesson to learn? Obviously, cutting corners and bowing to economic pressure will bite you in the long run. Was Multics the answer to hacking? I have my doubts (but a deep interest in this kind of rhetorical question).

    And I cannot leave without mentioning Andre DeHon's bio
    Andre DeHon received S.B., S.M., and Ph.D. degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 1990, 1993, and 1996 respectively.  From 1996 to 1999, Andre co-ran the BRASS group in the Computer Science Department at the University of California at Berkeley.  From 1999 to 2006, he was an Assistant Professor of Computer Science at the California Institute of Technology. 
    In 2006 he joined the Electrical and Systems Engineering Department at the University of Pennsylvania, where he is now a Full Professor.  He is broadly interested in how we physically implement computations from substrates, including VLSI and molecular electronics, up through architecture, CAD, and programming models.  
    He places special emphasis on spatial programmable architectures (e.g. FPGAs) and interconnect design and optimization.  

    Multics BIO:  Andre DeHon is a bastard child of the tail end of LISP Machine and Multics eras, having been a research assistant for Knight and a teaching assistant for Saltzer.  As a member of MIT's Student Information Processing Board (SIPB), he was part of the group that pushed Multics access to MIT students and was logged in during the decommissioning of MIT-Multics. So, while he never contributed to Multics, he was around in time to learn that there were computer systems that predated Unix and Windows and that did have a principled way to address safety and security.  He hopes the world is now ready for many of the Multics and LISPM ideas that were ahead of their time and have mostly been forgotten during the dark ages of mainstream Internet growth.

    1.4 THE FLOW: METHODOLOGY AND TOOL ISSUES

    The goal of this section is to put together all topics relevant to methodology and tools

    The proposed M2IMPIR flow

    M2IMPIR (Matlab2Implementation) quick description 
    1. Matlab BB (Building Blocks)
      1. Develop specific BBs. 
      2. Reuse BBs, out of one of the several thousands of Toolboxes and libraries.
      3. Build AS-DSP and simulate it.
    2. Map to Ideal Platform (BB to BB)
    3. IMPlementation with Ideal Resources
      1. Run and verify (a tiny sketch of this step follows below)
      2. Iterate 
    M2IMPIR issues
    • There are 3 types of issues: Matlab, Mapping, Implementation platform. 
    1. Matlab: Here the problem is "Matlab BBs". 
      1. Because, more often than not, a Matlab BB ends up as an FPGA IP, i.e. HDL code, i.e. concurrency. And Matlab is anything but concurrent.
      2. The other issues have to do with the language itself. Going through all of them is outside the scope of this section; let us just say the number is large and many are not obvious.
    2. Mapping:  Mapping has a few pitfalls but they are well known. 
    3. Implementation platform:
      1. My current methodology is to use a platform with ideal resources (which is not the same as infinite resources). That's all. 
      2. Even an ideal platform has some issues 
        1. but at least the flow is manageable.
      3. For those interested in implementation platforms and architecture, I propose starting with Andre DeHon's web site (at UPenn), reading all the papers, and going from there.
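
    As a companion to the flow above, here is a minimal sketch of what step 1 (a specific Matlab BB) and step 3.1 (run and verify) can look like. The BB name sat_add16 and the golden model are my own illustrative choices, not part of the flow itself.

    % step 1: a specific BB, here a hypothetical 16-bit saturating add modelled in doubles
    sat_add16 = @(a,b) min(max(a + b, -32768), 32767);
    % step 3.1: run and verify against an independent golden model
    % (Matlab int16 arithmetic saturates by default, so it can serve as the reference)
    a = randi([-32768 32767], 1, 1000);
    b = randi([-32768 32767], 1, 1000);
    ref = double(int16(a) + int16(b));
    assert(isequal(sat_add16(a,b), ref), 'BB disagrees with the golden model');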

    Further Topics

    1. My early life: looking for an oxymoron: the descriptive Language
    2. The NPU lessons
      1. The thesis of Norwich Rich. 
    3. Retargetable compiler: from bit slice to Archelon.
    4. Meet the infinite loop: simulation and profiling on RC platforms (reconfigurable computing).

    Friday, November 20, 2015

    Jeff Bier is promoting CNN

    ----  Jeff Bier is promoting CNN! ------

    Thesis: Neural Networks? not again!

    I used to say "when I hear the word RISC (or speech recog) I draw my guns". Things have slightly changed, so instead these days my current one is "when I hear Neural Network..."
    I never had a really good fight with NN because I never met NN in the open. 
    NN was an esoteric digital technique (like many others) found in the DSP conference papers. Over the years I have kept NN implementations of DCT, FFT, MAC, etc., and never had a chance to take a closer look.
    But recently I have seen  many micro-controllers promoting the technique. 

    Antithesis: Neural Networks are alive and well

    1. Matlab has a Toolbox
      1. and maybe much more than that. 
    2.  Jeff Bier put his money where his mouth is:
      1. The Embedded Vision Alliance invites you to attend an exciting event --> see below:
    Synthesis: No thanks, I pass on that one. I still think it is bleeding edge. 


    Wednesday, November 18, 2015

    DSP architecture today


    The goal of this section is to put in one place any architecture bits and pieces which are interesting for this blog. 

    The generic design methods                                                                                                  
    • Vector processing 
    • SIMD, swp (sub-word parallelism), multi lanes, packed arithmetic... (a packed-arithmetic sketch follows this list)
      • where to stop? 256? 1024?
      • why stop at 1K? Matlab row vectors, column vectors and arrays are an order of magnitude higher
      • Minimum parallelism= 64K 
        • simd of 16  x 16 clusters x 16 cores  x 16 modules 
      • Golden ratio (x4, x16, x64, x256) 
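
    What I mean by packed arithmetic, in plain M-code: four 8-bit lanes packed in one uint32 and added with no carry crossing the lanes (a classic SWAR trick; the values here are made up).

    a  = uint32(hex2dec('11FF7F80'));            % lanes 0x11, 0xFF, 0x7F, 0x80
    b  = uint32(hex2dec('01017F80'));            % lanes 0x01, 0x01, 0x7F, 0x80
    m7 = uint32(hex2dec('7F7F7F7F'));            % low 7 bits of every lane
    m8 = uint32(hex2dec('80808080'));            % the lane MSBs
    lanes = bitxor(bitand(a,m7) + bitand(b,m7), bitand(bitxor(a,b), m8));
    dec2hex(lanes)                               % per-lane (a+b) mod 256: '1200FE00'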

    On My watch                                                                                                                          

    • "ASPI world"
      • Andre DeHon
      • FCCM
      • Xilinx at Large
    • GPU
      • Matlab and GPU
      • CUDA
      • Deep packet searching
    • Hot Chips
    • Tensilica cores and domain-specific platforms
      • Cadence

    They Still Matter

    • ARM Cortex A - family and evolution
      • how far will they go with speculation? 
      • repeating the same mistakes in a new way?
    • Intel ecosystem
      • Altera integration
    • TI Platforms evolution 
      • replacing DSP core with hardwired COP

    Lessons Learnt                                                                                                                         

      • NPU

      ---------------                                                                                                                               


          Saturday, November 14, 2015

          SPmag1990 jan

          SPmag 1990 jan

          (this post was written while listening to: Dead Europe 72 interleaved with NRPS first album)

          We start lucky. In this issue, the main article is a survey of multiplier techniques by the master himself (Fred J. Taylor). Also interesting is the introduction of Matlab version 3.5, including the all-new signal processing toolbox. All the books reviewed and many items are relevant to this work. We will NOT finish with a very fashionable topic: Neural Network.

          BB struct                                                                                                                        

          The multiplier (Fred J. Taylor)

          This is the DSP BB par excellence. There is hardly any algo which does not rely on a multiplier, and it is not cheap. As Fred mentioned, the 16x16 multiplier occupies 40% of the 32020 chip. And, I believe, 80% of the non-memory real estate.
          Now, in our world of ideal resources, why do we bother with the size of the MULT? Because ideal does not mean infinite. If we develop (say) a 2000-million-node chip, each node had better be optimised. And the best choice for a node (or PE) is a MULT. The next best choice is a DSP core, where the MULT is still 80% of the core. (It reminds me of the debate at Xilinx about replacing the embedded 18x18 MULT with a C25-like core to extend their market share; all people cannot be right all of the time.)
          So here are the techniques (flattened) given by Fred:
          • Shift-add or iterative
          • Booth algorithm (and modified~)
          • Wallace trees
          • Cellular arrays (Pezaris, Baugh-Wooley)
          • Systolic Array Multiplier
          • Bit Serial
          • Distributed arithmetic
          • Canonic signed digit number systems
          • Logarithmic number systems
          • Residue Number systems
          With the hindsight of 25 years, what to think? Well, it stands up pretty well, except for the systolic array.
          What about the non-traditional number systems? Frankly, we are NOT very positive about these, because the problem is the time lost in the interface.
          As for the others, bit serial and DA are common FPGA techniques and are available in Matlab too  (how to generate HDL code for a lowpass FIR filter with Distributed Arithmetic (DA) architecture). 
          The parallel techniques are standard logic techniques, but they are still the most promising because the exploration space is vast.
          We have a favorite, which is a kind of sub-word parallelism. The longest delay in a 16x16 MULT is the final 32-bit adder. This delay can be halved by splitting the adder into three 16-bit adders. By the same token we can subdivide further with byte and nibble adders, hence reducing the delay from 30 to 6 logic gates. We simulated this type of architecture in Matlab (using FP!) (see elsewhere).
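
          For the curious, here is a rough re-creation of the split-adder (carry-select) idea in M-code. It is my sketch in doubles, not the original simulation: one low 16-bit adder, two speculative high 16-bit adders, and a select on the low carry.

          a  = randi([0 2^32-1]);  b = randi([0 2^32-1]);
          al = mod(a, 2^16);  ah = floor(a / 2^16);       % split the operands
          bl = mod(b, 2^16);  bh = floor(b / 2^16);
          sl   = al + bl;                                 % low 16-bit adder
          clow = floor(sl / 2^16);                        % its carry out
          sh0  = mod(ah + bh,     2^16);                  % high adder, carry-in = 0
          sh1  = mod(ah + bh + 1, 2^16);                  % high adder, carry-in = 1 (speculative)
          s    = mod(sl, 2^16) + 2^16*(clow*sh1 + (1-clow)*sh0);   % the select
          assert(s == mod(a + b, 2^32))                   % matches a plain 32-bit add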

          Workshop: VLSI for Signal Processing at ICASSP 90 (Edward Lee)

          It is mentioned that several Experimental Parallel Machines for signal processing will be explained. OK, let us see.
          With the hindsight of 25 years, what to think? A bit of a disaster array (sorry, area) and we know the result (the GPU). However, in the M2IMP world, where the resources are ideal and the software is straightforward, each of these EPMs will be given a good look in due time.

          Vector Quantizer (Tran et al., IEEE Tr. on Com., Sep 89)

          It is worth asking if the VQ is a structural or functional (speech specific) BB. 
          In our experience we consider it a generic unit with app-specific parameters.
          Matlab? It is an object in the DSP System Toolbox.
          With the hindsight of 25 years, what to think? This is a good example where Matlab is now the reference. 
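
          To make the "generic unit" point concrete, here is a bare-bones VQ encode in plain M-code (my sketch; the codebook and inputs are made up, and the toolbox object mentioned above wraps the same idea):

          cb = [0 0; 1 1; 2 0; 0 2];                 % codebook: 4 codewords, 2-D
          x  = [0.2 0.1; 1.9 0.2; 0.8 1.1];          % input vectors, one per row
          idx = zeros(size(x,1), 1);
          for n = 1:size(x,1)
              d = sum((cb - repmat(x(n,:), size(cb,1), 1)).^2, 2);   % squared distances
              [~, idx(n)] = min(d);                                  % nearest codeword index
          end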
            

          BB DSP fun                                                                                                                     

          FFT workshop at ICASSP 90 (John Cooley)

          In this workshop the FFT master explains all the options and innards, including the different radices. Great!
          Matlab? FFT is a built-in. The number of samples is totally flexible. The problem is that it is a black box. Not quite all dark, as it is based on FFTW. But we had a mixed experience with the golden-model methodology while trying to develop a radix-9 FFT and comparing it to Matlab, stage by stage. As a matter of fact it was easier in Excel. As I am writing this, it does not make sense, since in effect I am saying that the FFT is non-deterministic. But so is my memory of it. I do remember Marc having a lot of trouble with a Viterbi decoder, but that is a different case because it is a trellis.
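
          The golden-model check itself is trivial; the pain is in matching intermediate stages. For reference, here is the simplest form of the check, a naive DFT against the built-in fft (nothing radix-9 specific, just my illustration):

          N = 9;  x = randn(1,N) + 1j*randn(1,N);
          W = exp(-2j*pi*(0:N-1)'*(0:N-1)/N);    % DFT matrix
          X = (W * x.').';                       % naive O(N^2) DFT
          max(abs(X - fft(x)))                   % should be at round-off level (~1e-15)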

          Further Transforms - workshop at ICASSP 90 (John Cooley)


          In this workshop, John goes further with number-theoretic transforms (NTT) and polynomial transforms.
          Matlab? As far as I know they are not included. Fortunately, the community supplies them:
          • NTT: NTT.m on matlab central. 
          • PT brought me to the Chemnitz site of Daniel Potts, which in turn brought me to the NUFFT toolbox.
          With the hindsight of 25 years, what to think? 
          • FFT: now old hat, {see elsewhere}.
          • NTT: I have no clue, but an NTT engine would be nice (a toy sketch follows this list).
          • NUFFT (non-uniform FFT): an extremely promising field, which I bundled with CS (compressed sampling).
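
          The toy NTT sketch, just to show the mechanics (my own few lines, not the NTT.m file mentioned above): length 8, modulus 17, root of unity 9, all exact in integer arithmetic.

          p = 17;  N = 8;  w = 9;  winv = 2;  Ninv = 15;   % 9*2 = 1 mod 17, 8*15 = 1 mod 17
          n  = 0:N-1;
          F  = mod(w   .^ mod(n'*n, N), p);                % forward matrix (exponents reduced mod N)
          Fi = mod(winv.^ mod(n'*n, N), p);                % inverse matrix
          x  = randi([0 16], N, 1);
          X  = mod(F * x, p);                              % forward NTT
          xr = mod(Ninv * mod(Fi * X, p), p);              % inverse NTT
          assert(isequal(xr, x))                           % exact reconstruction, no rounding error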

          Communication paper: dpll

          Matlab? Not included; see the ecosystem: "Modeling and Simulating an All-Digital Phase-Locked Loop" by Russell Mohn, Epoch Microelectronics Inc.

          Communication paper:  sigma delta 

          Matlab? included

          BB in DSP domain                                                                                                           

          A tutorial at ICASSP 90: SAR (Synthetic Aperture radar) (David Munson) 

          Matlab?  Included, excellent Block diagram and tutorial.  


          BB in Matlab specific                                                                                                      


          These Building Blocks are Matlab-specific because, if they exist, it will more likely be in the shape of M-code than anything else. I have never met them in DSP software libraries or commercial chips.

          The Signal Processing Toolbox is introduced {advert for Matlab 3.5}
          First ever SP toolbox. It includes a large number of functions, many of which are not very common.

          Non-Linear Spectral Estimation Methods (Prabhu et al., IEEE Proceedings, June 1989)
          Here is an article which covers algos that a standard DSP cannot implement. Among these techniques are ARMA, MUSIC and ESPRIT.
          Matlab? They can be found across diverse Matlab toolboxes.
          Also, many of them can be found in the very professional-looking HOSA toolbox, which is available from the Central. {HOSA - Higher Order Spectral Analysis Toolbox, updated 12 Feb 2003: spectral and polyspectral analysis, and time-frequency distributions.}
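
          A quick sanity check of the MUSIC side, assuming the Signal Processing Toolbox pmusic function is on the path (the signal and settings are my own, not from the article):

          fs = 1000;  t = (0:1023)/fs;
          x  = cos(2*pi*120*t) + cos(2*pi*128*t) + 0.5*randn(size(t));   % two close tones in noise
          [S, f] = pmusic(x, 4, 2048, fs);    % p = 4: two real sinusoids
          plot(f, 10*log10(S)), grid on       % expect two sharp peaks near 120 and 128 Hz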

          SUMMARY TABLE 




          Monday, November 9, 2015

          The DSP Historian

          DSPs - Archives

          1. The DSP Historian (see below)
          2. 1978 - 1987: the call-Up
          3. 1988 - 1995
          4. 1996 - 2004
          5. 2004 - Today 
          6. Pre History

          The DSP Historian:
          The goal of this section is to go through all the papers whose topic is "the DSP uP chips". This will become clearer as we go.

          Background
          The knowledge of the "History of DSP uP chips" is a major tool in my "alternative world" methodology to develop AS-DSP IP. As explained elsewhere, I believe that anything after 1990 has little value compared to whatever happened before. Hence the "History of DSP uP chips" becomes 2 major fields:
          • before 1990: the chips and architectures to know
          • after 1990: the field of ideas: sounding board, the mistakes, the real advances, etc..
          Taking that at face value, there are plenty of budding architects who would like a quick summary of DSP uP chips before investing further time.
          Hence, the issue becomes:  where is the best story? "who is the best DSP historian?"

           Anything in my garage?                                                                        
          I will do the usual and go through my stash. In one folder I found stuff which should be available on the web, but not always free.
          • Noted, in a 10-page paper [1]:
            • the first-generation DSPs (AMD 2900 and TI C10) followed by the 2nd generation (C25, DSP16). This is so wrong that you have to admire the author (a lateral thinker? a hyperspace traveller?).
            • Anyway, the point here is that by studying the history of DSPs, the architect is obliged to classify (by generation, by features) and hence make choices, which, as we all know, are the lifeline of a DSP architect. If you cannot even classify DSPs, don't bother.
            • Also noted are references to the pipeline, including the mind-boggling <data stationary versus time stationary>. This, I believe, is a complete red herring, but in the free world of DSP anything is fair and square.
            • OOh, and the DSP32C has a reservation table!! 
            • Finally, another "gem" is Edward Lee's "interleaved pipelined" architecture. Hum?
          • Speaking of Edward Lee, not only is he the grandfather of many things (see BDT) but also the father of the classical paper [2,3], a first, dated 1988: <Programmable DSP Architectures>. If you have access to the paper, you can stop here.
            • There is an IEEE Micro version dated 1990 [4], in which Lee conservatively dropped the architecture term.
              • In those days, only a few people could risk using the term (see MPR Oct 1989).
            • In my case, Lee was far from being the first one. Since 1978, I had already written 4 different versions of the History of DSP chips, including a 5m-long mural, so Lee was of little value to me.
          • At this stage it is worth mentioning the TRILOGY (Will, Gene, Jeff). What were they doing in 1988? For different reasons, none of them had written "THE REFERENCE".
            • Jeff: obviously, being a student of Lee, he postdates his work.
            • Gene: being at full war with his competitors, he was not in a position to write objective reports. Still, good grunting noises could be heard at TI conferences.
            • Will: I came across his first(?) report circa 1986. Primarily a marketing document, but in those days it was a technical gem.
          • Following this, here are some much more recent papers, freely available, all being university courses on DSP.
            • With the title Special Purpose Digital Processors (DSP) [5], a new type of acronym is introduced: the warped acronym. Why not? e.g. General Purpose Processor (CPU). It is a rather exhaustive 29-page lecture-note document from TUHH.
            • Less exhaustive: a Texas-loaded ppt from Austin [6].
            • A French 144-slide pdf, in French, from IRISA [7]. Note the T.O.C. is perfect for a classical story of DSPs.
              • I. Introduction
              • II. MAC/Harvard architectures
              • III. Evolution of DSPs
              • IV. Development flow
            • Finally, [8] is a French paper in English from 2002. The authors are from McGill.
          Gene's law
          As already mentioned, Gene Frantz was kind of busy proselytizing the world to the DSP mantra until, in 2000, he was convinced to write a Millennium paper [9] for IEEE Micro in which he developed his popular (among us DSP architects) Gene's law. Gene's law is the DSP equivalent of Moore's Law.

          First, the not-so-good.
          • Here is the lesson for all of us architects: we DON'T KNOW HOW TO limit ourselves to history. It is always past, present, future. Extrapolation is what sells the paper, and 9 times out of 10 we are wrong.
          • Unfortunately this paper was badly timed, as Moore's law (speed) was going broke.
          • Gene extrapolated to 2010 and came up with the 10 GHz DSP.
            • Myself, around that time I had papers extrapolating speed for CPUs and embedded CPUs: 2010: 10G, 2020: 20G, 2030: 30G for CPUs; you see the trend. Embedded CPUs had a lower slope and were around 5G in 2010.
          • Now, even better is the Moore analogy. Little known but true: in multiple interviews circa 1980, Moore explained that with ultra-VLSI, except for FFT, he had no idea how to fill a logic chip. Good one, Gordon, no wonder you came up with the iAPX 432. In the same way, here is Gene turning the TI slogan "limits to your imagination" on its head; the excerpt: (Determining how to use that processing power effectively will require imagination that goes beyond conventional engineering methodologies.)... ooooh my oh my!
          • In all fairness, the paper was called DSP trends, so...
          Now, for the good things.
          • Moore's law is broken but I would not be so sure about Gene's Law
          • Gene changed architecture by putting power consumption (and efficiency) to the forefront.
            • This is what matters to me. Effectively it means that in the end, after all the low-hanging fruit is gone, there is only 1 technique left to improve efficiency: hardwired is replacing software. Customization is replacing CPU clock. {At this stage parallelism and multicore are just an implementation detail.}
          • Gene extrapolated the 50 GIPS DSP, which did happen and by today even seems pretty tame.
          • Gene mentioned a lot of stuff, details that I picked up, and in the end this is it: MY MOST RECENT PAPER ON THIS TOPIC WITH ANY VALUE.
            From Lee to BDT
            By the 1990s, Jeff Bier was starting an exceptional business <www.BDTI.com>, the B standing for Berkeley (see the story of B&B elsewhere). As part of his duties to the DSP community at large, Jeff was giving lectures on DSP chips. So if you have read Lee's seminal paper, the next logical step is a Jeff paper. Which one? No idea; I have 41 in my stash. Even Jeff does not know how many he made, so better to go to the BDTI web site.
            Also, there is much more to BDTI than DSP chips history and trends. They are THE reference in DSP benchmark (see elsewhere).   

            Krishna Yarlagadda
            By 1996, BDT had the monopoly on "DSP chips - history and trends". Serious talk about breaking it was heard in congress. Courageously, Krishna Yarlagadda attempted to infiltrate DSP conferences with his own version [10], but thinking about it, it was a clever disguise to launch HelloSoft in what became the big revolution of our world of DSP, "outsourcing assembly to India" {see elsewhere}.

            Henry Davis
            Hi Henry! 

            Robert Cushman and the EDN DSP directory
            From 1981 to 1988(?), Robert Cushman, a major contributor to EDN, wrote many articles on DSP. He put together the first EDN (microprocessor-like) DSP directory in 1987. I have all of them till 2008. Going through them in order gives an exceptional view of history.

            THE UBER REFERENCE
            Noted with pleasure that a lot of papers list the classic BDTI book as a reference.
            Phil Lapsley, Jeff Bier, Amit Shoham, Edward A. Lee, "DSP Processor Fundamentals: Architectures and Features," IEEE Press, 1996.
            This book is THE BIBLE reference for DSP architects (see elsewhere). It is much, much more than an introduction to DSP chips.

            References                                                                                                                                      
            1. <honestly I don't know; lost in history>
            2. Edward Lee "Programmable DSP Architectures Part 1" IEEE ASSP oct1988
            3. Edward Lee "Programmable DSP Architectures Part 2" IEEE ASSP jan1989
            4. Edward Lee "Programmable DSPs - a brief overview" IEEE Micro oct1990. This is essentially the intro of paper 1 + an excellent  summary table.  
            5. F. Mayer-Lindenberg, "Special Purpose Digital Processors (DSP)" TUHH (Hamburg University) lecture notes, dated ? 
            6. Brian L. Evans, "Introduction to DSP", University of Texas, Austin
            7. <http://www.irisa.fr/R2D2>   Olivier Sentieys, IRISA, 2005
            8. Benoit Champagne, Fabrice Labeau, "An Introduction to Digital Signal Processors", compiled 2002
            9. Gene Frantz "DIGITAL SIGNAL PROCESSOR TRENDS", IEEE micro nov-dec 2000 p.52
            10. Krishna Yarlagadda, "DSP chips for communications", which conf?? 2003?? (see excellent slide)
            This is a technical slide, illustrating different architecture choices!!
            Editor's note: no links are given as they become obsolete. Google keywords instead!

            Saturday, November 7, 2015

            The philosophy of BB

            The philosophy of BB

            1. A BB is either simple or complex. It should never be complicated. 
              1. The difference between complex and complicated is that complex can always be broken down in a sum of simple things. 
              2. And complicated is, well... complicated. For instance, a lot of software is complicated.
            2. People familiar with the evolution of electronics remember the SSI, MSI, LSI stages and can place the heyday of BBs in the days of Bit Slice.
              1. A bit slice was built with simple elements, and a bit slice was itself the building block used in upper DSP Building Blocks, DSP functional blocks (FFTer, Filter, Correlator), DSP CPUs or even the DEC mini-computer.
            3. For instance, a standard DSP CPU (called a DSP thanks to TI) is made of 4 standard bit-slice blocks (ALU, MULT, AGU, LSU) + memories + I/O space.
            4. A very good DSP design should have a maximum of 3 hierarchical levels (MSI, LSI, Final product). 
              1. Anything above 5 levels is prohibitive and should not be tackled here.
            5. Because we rely on a hierarchy of blocks, testing of each block is vital to the whole process. 
              1. A bug in a low-level block is a disaster since it will be repeated hundreds of times over several final products.
            6. Same thing for an inefficient block but with a twist.
              1. Contrary to a bug, efficiency is a relative term. 
                1. You can redesign the BB with a different name. 
                2. (for sophisticated designers) you can redesign an upward-compatible block with the same name
                  • but then you need a very hefty test suite (see the sketch after this list).
            7. So what about repairing a bug? 
              1. In my book, a bug is both visible and NOT compatible with itself. 
              2. Hence if you repair a bug you don't know the result; somewhere down the line somebody wrote software taking this bug into account or somebody used the BB taking the bug into account.
            8. This is the problem INTRINSIC to the BB approach. You are going BOTTOM UP in the definition. So forget about repairing bugs. Instead, create another BB.
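
            The kind of "hefty test suite" I have in mind, in miniature (the BB here, a rounding right shift, and its "improved" twin are made up purely to show the harness): the new block must reproduce the old one bit-exactly on directed corner cases plus a random soak.

            old_bb = @(x,s) floor((x + 2.^(s-1)) ./ 2.^s);   % shipped BB: rounding arithmetic shift right
            new_bb = @(x,s) floor((x + 2.^(s-1)) ./ 2.^s);   % candidate upward-compatible redesign
            x = [-32768 -1 0 1 7 8 32767 randi([-32768 32767], 1, 1e5)];   % corners + random soak
            for s = 1:8
                assert(isequal(new_bb(x,s), old_bb(x,s)), 'new BB is not upward compatible');
            end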

            Matching functions to structures

            ====== Building Block ISSUES  ======= 

            Matching functions to structures

            When you design a new BB, it is because you need it. Hence we call this need a strictly functional need. There are two ways to go:
            - stick to it and solve the present issue.
            - have a broader view and design a more general purpose brick which can be reused.
            A third case happens when the functional block can be turned into a generic structural block at no additional price. So before designing a new block, it is worth asking oneself:
            Can I make this BB more generic?
            Let us take as an example the additions needed in a complex multiplication:
            zr = xr*kr - xi*ki
            zi = xr*ki + xi*kr
            same as
            zr = p1 - p2
            zi = p3 + p4
            If we use a dual multiplier, the 4 multiplications can be done in 2 cycles.

            Then it must be followed by two BBs for the complete equations. We now write the equations as follows.
            1.  [zr zi] = thru22(p1,p3)            %  zr = p1, zi = p3
            2.  [zi zr] = addsub42(zi,p4,zr,p2)    %  zi = zi+p4, zr = zr-p2
            It is easy to see that the two BBs are generic, but the inputs and the equations are not straightforward.

            The standard alternative is
            1. zr = sub(p1,p2)
            2. zi=  add(p3,p4)
            which is simpler. So what are we talking about here?

            In fact there are two issues with the standard way:
            1. It needs the 4 input data on the first dual multiplication.
            2. It does not work when we want real_imag to be in the same register, so that the complex number is considered an entity made of 2 parts (see the sketch below).
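
            To pin the two generic BBs down, here is a runnable sketch of my reading of them (the names thru22 and addsub42 are from the text above, but the bodies and argument order are my assumption), checked against Matlab's own complex multiply:

            thru22   = @(a,b)     deal(a, b);          % [x y] = thru22(a,b): x = a, y = b
            addsub42 = @(a,b,c,d) deal(a + b, c - d);  % [x y] = addsub42(a,b,c,d): x = a+b, y = c-d
            xr = 3; xi = -2; kr = 1; ki = 4;                    % test operands
            p1 = xr*kr; p2 = xi*ki; p3 = xr*ki; p4 = xi*kr;     % the 4 products (dual MULT, 2 cycles)
            [zr, zi] = thru22(p1, p3);                          % zr = p1, zi = p3
            [zi, zr] = addsub42(zi, p4, zr, p2);                % zi = zi+p4, zr = zr-p2
            assert(zr + 1j*zi == (xr + 1j*xi)*(kr + 1j*ki))     % matches the golden complex multiply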

            Thursday, November 5, 2015

            T.O.C 4Nov2015

            1. So you want to be a DSP architect?
              1. THE LONG STORY
                1. DSP Architecture today
              2. BACKGROUND CHECKS 
              3. BUILDING BLOCK ISSUES
                1. Physical limits
                2. Design time, Build time and Run time
              4. METHODOLOGY AND TOOL ISSUES
              5. FOUND IN THE WEBS
                1. The Matlab Engine 
                2. Is Kurt Keutzer the Donald Trump of Hardwired Processing? 
                3. Found in the cobwebs: my garage
                4. The Trailing Edge
              6. DSP (GP) ARCHITECTURES 
                1. DSP of the First Kind (1980-95)
                2. DSP of the Second Kind (1995-2005)
                3. DSP of the Third Kind (2010- ?)
                4. DSP of the lost kind (1995-2005)
                5. DSP  of any kind (1950-2050)
              7. FROM DSP TO DSP
                1. This wonderful world of DSP
                  1. A world? More like a sect! 
                  2. The Pope (Will), the Cardinal (Jeff) and the Wizard (Gene)
                2. The DSP Old Timer's Club.
                3. Stop me if you've heard this one before!
                4. DSP history
                  1. Bit slice: a saga 1975-1985
                  2. Building Block:  was 1992 and IDT  the last DSP BB? or Weitek?
              8. EXAMPLES of Target AS-DSP
                1. Analog programs (Basic, HPC and other gizmos)
                2. TI C25 Tips and Tricks 
                3. SPmag Tips and Tricks

            2. BB Level 1
              1. Arithmetic BB
              2. Bit wise BB
              3. Vector BB
            3. BB1 DSP STRUCTURES
              1. Processing trilogy (ALU, BMU, MULT)  
              2. MAC
              3. DAU
              4. AGU 
              5. PCU
              6. RF (Register File) and MEM
              7. VPU (Vector Processing Units)
              8. SU (The Shuffle Unit) 
              9. Bit slice: Bit slice building blocks
            4. BB2 DSP FUNCTIONS
              1. Filter
              2. FFT
              3. Correlators & bit comm. engines
              4. Sampling
              5. Matlab functions
            5. BB3 MATLAB BB
            6. BB4 MATH 
              1. including Complex numbers
            7. APPENDIX 

            Is Kurt Keutzer the Donald Trump of Hardwired Processing?

            The answer is No!

            The link
            http://www.eecs.berkeley.edu/~keutzer/

            Background
            In the sputtering days of CPU Architecture, just after the millennium, I remember KK (and Berkeley in general) as a different voice as to which directions to take for future CPUs. Effectively, here are a few arguments of the time (*):
            -  we are now in a hardware cycle as opposed to a software cycle
            -  the next step in MIPs is NOT reached by increasing the clock
            -  instead programmable logic or FPGA or hardware or ASIP (Application Specific IP)
            -  hurray for configurable cores (typ. Tensilica, then Stretch)
            -  hurray hooray for reconfigurable computing (all dead).

            Jump to 2015. As I am going through my notes on possible architecture trends (of interest to this Blog), I have 2 KK papers that I want in electronic format. Googled the paper name + author and got nothing. Lost in the ozone. Okay, I will scan them.
            To cut a long story short, the KK I found is not the one I had in mind (*), but he is still an exciting guy to follow.
            Anyway, here is his web page; among other things, KK mentions that he lost money with Catalytic; sorry, I contributed to that, Kurt!
            Here is his current interest :
             Exploring Design Patterns for Parallel Computing
            which is somehow related to this Blog.
            I downloaded 2 papers:
            - how to map recursion in hardware
            - speculation.
            And the other papers are good too!

            (*) as I remember it; I might be confused, not sure if KK was leading all that.

            1.1 The Long Story


            1.1 - THE LONG STORY


            The goal of this section is to give some meat (and the background) to our base statements. But it does not try to discuss or justify any of our choices.
            • As mentioned in the introduction, we are in the very comfortable position of presenting a system which is outside reality.
              • In effect, all decisions have NO tangible reasons (such as based on cost, design time or available tools and platforms).
              • It is all in the eyes of the beholder.

            Q&A


            • DSP architect is Dead?
              • Today the DSP architect's job splits as: 95% platform, 2% core, 3% tools.
              • PLATFORM: 
                • Is there a DSP architect in the house? No, he is split into pieces
                • The system guy (m-code and simulation) --> 90% of the work
                • The DSP guy (*) --> 10% of the work
                • The firmware guy: can take the job of the DSP guy, especially writing in C
                • The ASIC/FPGA designer:  can take the job of the DSP guy.
            • From above, the REAL DSP architect is the system guy? yes
              • But this guy is not (does not want to be) an architect
              • {..} Here would come all the efforts such as Matlab to C/Verilog, SystemC, etc., and in general all the failures of the tool vendors to provide a flow: system to implementation.


            • Why customisation?
              • Because we used to be vertical, went horizontal and now we are back to vertical.
            • What is accelerated processing? 
              • see  IEEE micro july/aug 2008
            • Why Matlab? 
              • Because it is the most popular system tool.
              • Also, Matlab is much more than a language (Randy's 3 things):
                • a language, *** 
                • an EDE  ****
                • a simulation environment *****
              • Matlab is not a concurrent language (contrary to Verilog). Does that prevent you from sleeping? Here I avoid the use of the word parallelism because of the funny parfor.
            • What is mapping?
              • An analogy: mapping is a hard macro, compiling is a soft macro
              • An approximation: mapping is by hand, compiling uses a tool.
              • An aphorism: mapping is an assembler macro, compiling is C
            • Can mapping be used with a compiler? 
              • You bet! As a matter of fact, it is the only way to be successful.