Wednesday, September 28, 2011

Found in the webs

Oxford DSP


1.2 Background Checks


  1. Hennessy and Patterson
    1. Computer Architecture
  2. Coprocessors (COP) 
  3. The Core
    • Which type of core? -->  let us start with a DSP of the first kind
    • Is a DSP core best for DSP?
    • Xilinx has a MAC. What is the next level?
  4. Benchmark and benchmarking
    1. BDT
      1. VADD
    2. The noble art of profiling
      1. Profiling an ever changing reconfigurable machine
    3. Optimization and tuning
  5. Fixed Point Dialects
  6. Algorithms, signals, structures, tips and tricks
    1. Signal Processing
    2. Matlab
      1. Interesting feature: is M predication same as CA predication?
  7. Once a (logic) designer always a  pain in the ass designer 
    1. Logic
    2. Arithmetic
    3. Sequential
    4. Xor constructs
  8. SOC architecture
    1. the ideal : NS Dgt Answer Phone -- 3 serial ports

Fixed Point Dialects

Fixed Point (FXP) dialects
The goal of this section is to summarize in as few words as possible the gigantic complexity of FXP.

Background
  1. When 2q30 is 1q30? 
  2. Float or integer
  3. First it was q15
  4. Then came q14
  5. Then came 1q15, 1q31, 2q30 etc..
  6. Then we got lost; 1q31 = 1q15 but 1q15 ~= 1q15!!
  7. Still 

List of FXP dialects
  • Assembly language on first generation DSPs 
  • ITU basic operators
  • Matlab FI
  • Simulink Fixed Point
  • SystemC
  • DSP extensions in C
  • Proprietary FXP languages from CAD companies
Issues

FXP and integer
When translating FP to FXP, the common mistake is to think that replacing the float data types with integer data types will suffice. This would be true if the operations were additions, or based on addition. But as soon as multiplications are involved this does not work so well, and it gets worse with division, transcendental functions, etc.
The reason is that (in 99% of cases) an FXP number is not an integer. It is a fractional number. It can be purely fractional (less than 1) or partly fractional (for instance, a 16-bit word covering the range +31 to -32 has 10 bits of fraction).
Hence it is well known that an integer 16x16 multiply gives 32 bits, while a fractional 16x16 multiply gives 31 bits (plus one saturated case). It is also easy to visualize that an integer division and a fractional division grow in opposite directions.
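To make the growth concrete, here is a small Matlab sketch (my own illustration, not from the original text) contrasting the integer view and the fractional (Q15) view of the same 16x16 multiply:

% Integer view: a 16x16 product can need all 32 bits.
a = int16(-32768); b = int16(-32768);
p_int = int32(a) * int32(b);                  % 2^30 needs a 32-bit signed container

% Fractional (Q15) view: the same bits form a Q30 number; the usual DSP
% convention shifts left by 1 to get Q31, leaving 31 bits of product and
% one corner case, (-1)*(-1), which must be saturated.
q15 = @(x) double(x) / 2^15;                  % interpret int16 bits as Q15
fa = q15(a); fb = q15(b);                     % both are -1.0 in Q15
p_frac = fa * fb;                             % +1.0, not representable in Q31
p_q31 = min(round(p_frac * 2^31), 2^31 - 1);  % saturate to 0x7FFFFFFF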

   



    Saturday, September 17, 2011

    Computer Architecture

    Computer Architecture - Viewpoint
    As a "DSP architect" I never considered Computer Architecture(CA) a major influence on my work. Still, the field is remarkable  by its emphasis on quantitative approach(*) (**) and the number of ideas generated/implemented in the last 30 years.
    (*) dixit Hennessy-Patterson
    (**) Here quantitative approach is opposed to the "free thinker attitude" of the glorious silicon valley architects.

    Computer Architecture - Past, Present, Future
    Obviously, the heyday of CA is long gone. First we had the death of the major Silicon Valley conferences and the end of many CPU architectures, then Microprocessor Report shrinking to a confidential readership. Nowadays, when the main subject of excitement at Hot Chips is virtual machines, you know that the field is in trouble.
    Around 1900 a famous physicist announced that we had found everything in Physics except one little second-order detail. The detail happened to be the source of Einstein's relativity.
    So by analogy, and to be on the safe side, we will announce that we have found everything in Computer Architecture except this little quantitative improvement called "transactional memory".

    Computer Architecture - Value Proposition
    The point I am trying to make is that you cannot be a very good DSP architect without a serious background in CA. You can be very good at designing one product, but not a series of products over time, and even less a hierarchy of differentiated products (such as cores, super cores and chips).
    Now, on the other hand, the beauty of a customised DSP architecture is, by definition, to have no past or future. You are designing a point product (tailored to the task).
    Still, we will insist: ignoring the lessons of 30 years of CA will bite you one day, as bitterly as a mistake in your rounding function. So to fill that gap we will go through a CA checklist.

    Computer Architecture - Checklist
    • Low hanging fruit: Hot Chips conference
      • all 25 years+ of conferences are available on the web
    • Microprocessor Report and Microprocessor Conferences: 
      • were at their best during the war years (x86 vs SPARC vs MIPS vs HP) (1990-2000). 
      • After 2000 it became more of an SOC conference, with a lot more DSPs too, including some interesting coprocessors. 
    • Speaking of Cops, the first Cop was the FPU (Floating Point Unit)
      • the 8087, 80287, 80387 evolution and the opposition to the 68881
    • The Intel i860: one of the rare "beasts" I never had time to analyze in detail; 
      • genuine Computer Architects put it at the top of the list, so..
    • My favorite weirdos
      • Intel iAPX 432
      • Transputer
      • Fairchild F8 
      • and others 
      • because they tried to implement the SPC (Stored Program Control) model in a different and sometimes radical way. And they were always wrong.
    • Transactional memories, for the same reason. But in fact I am interested in intelligent memories, or in using memories as coprocessors
      • IRAM
      • CAM
    • Multithreading, Reconfigurable Computing, Transmeta patents
    • Tensilica and Stretch
    • Keep an eye on the latest chips from Intel (including IDF), one never knows
    • Not recommended: 
      • IEEE computer magazines
      • ARM (boring)
      • GPU (a world in its own) 

    What is a Computer Architecture: My narrow vision
    Think of the role of an architect building a BIG house. He knows all possible styles, all the buildings of the world: pyramids, Bavarian castles, Gaudi.


    For me, it is more a matter of culture than quantitative measures. The analogy for the DSP architect is to know all DSPs ever built. 
    Also the architect must know the construction structures and which material to use, why and where.
    The analogy here is to know your buses and DSP building blocks.

    Sunday, September 11, 2011

    1.3 Building Block Issues


    ====== Building Block ISSUES ======

    1. Naming conventions
      1. Example: addsub21 
      2. The Philosophy of BB
    2. Matching functions to structure
    Naming Conventions

    The biggest of all BB issues is, strangely enough, the organization and naming convention. For people familiar with software libraries this is a trivial problem, but for people designing chips, or instruction sets, or say building blocks for an FPGA library, this can really turn into a nightmare.
    So the right attitude is not to underestimate this problem, but also not to spend too much time on it (*)
    • The best BBs have 8 letters or less.
    • Have a naming convention from the beginning  and STICK TO IT.
    • Expect this naming convention to become dirtier as time goes by. As you design more and more final products, there is obviously positive feedback trickling down to the lower level, and that means redesigning a block and giving it a derived name. For instance a 3-input adder working on a vector of 72 packed bytes, called ADD31_PBV72, becomes ADD31_PBV72A or PBV721 or PBV72v2; bottom line, it complicates your nice naming convention.
    • Expect this naming convention not to be universal; when a BB does not fit the naming convention (between 1 and 10% will not), just call it Jack, Jojo or Kapernic.
    • Instead keep an updated Excel sheet with a minimum of characteristics for all designed (and planned) BBs. This is a real pain, but it is necessary. You cannot judge a BB by its name. As I will show in an example, ADDSUB can literally have more than 100 definitions.
    • Any automatic method helps but in all cases it means human reviewing since you must at least know what you have in stock. Remember that you are a designer not a user.
    • As you go up the food chain, naming conventions must change from a description (MUL16x32) to a catalog entry (DSP5400).
    • There are functional blocks which at least should have the name of the function (FFT). They can also be both descriptive and catalog. Hence FFT16x32 is nice, but then what about the final result: is it 16 or 32 bits? Do you use rounding? Bit reverse? In place? Very soon you adopt the catalog attitude, such as FFT1, FFT2, etc.
    • Parametrized blocks help but are costly in description, hardware, testing and software.
    • A good solution is the concept of toolkit and build: to have a toolkit ready such that new blocks are built, tested and DOCUMENTED only when needed.
    • I strongly recommend using a meaningful name for every build and not the automatic conventions found in software versioning. Firstly, we are speaking about generating different BBs, not different versions of the same BB. Secondly, a BB is by definition simple. It is bounded in functionality, input and output. This is not the case for software.
    Example of building block: addsub
    In this example, we are showing how even a simple function can rapidly get out of hand. First let us start with the name. My naming convention is based on 3 fields: functionName[IO pins][arithmetic scheme]_[data type]. So for instance:

    In this case we will use the Matlab default (FP), which also has a default arithmetic scheme, and we will call the component addsub22 such that:

    function [Z W] = addsub22(X,Y)
      Z = X+Y;
      W = X-Y;
    end


    So far so good. So what can go wrong with the definition? Before we continue, it must be noted that this is NOT the most popular definition. The popular definition has 4 inputs and consists of an independent add and a sub in parallel. This is okay since, using our naming convention, it would be:
    function [z w]=addsub42(x,y,u,v)
        z= x+y;
        w= u-v;
    end

    The Data types
    Using the (b,s,l) Hungarian convention we define the 3 signed integer components:
       function [bZ bW] = addsub22_b(bX,bY)   % int8
       function [sZ sW] = addsub22_s(sX,sY)   % int16
       function [lZ lW] = addsub22_l(lX,lY)   % int32
    We can use standard Matlab FP to define these components. It is a bit wordy, since we must write the code for casting FP to integer, and FP does not offer 64-bit precision.
    The alternative is to use the Matlab "integer" types (int8, int16, int32), but they do not cover all Matlab functions (complex numbers) and offer only one arithmetic scheme (saturation).
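    Purely as an illustration (my own sketch, not from the post), the int16 flavor written with the native Matlab integer type could look like this; note that Matlab integer arithmetic always saturates, which is exactly that single arithmetic scheme:
    function [sZ sW] = addsub22_s(sX,sY)   % int16, saturating by construction
      sX = int16(sX);  sY = int16(sY);     % cast the inputs to the declared type
      sZ = sX + sY;                        % saturating add
      sW = sX - sY;                        % saturating subtract
    end
    For example, addsub22_s(30000,10000) returns 32767 (saturated) and 20000.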

    The modals (arithmetic schemes)
    How to indicate the modals? Contrary to popular belief there are NOT 2 (modulo, sat) but 3 (modulo, sat, promoted). Promoted is the integer equivalent of FP: as the data grows, so does the output.
    function [sZ sW]=addsub22m_s(sX,sY)   % if ovf, results are wrap around
    function [sZ sW]=addsub22s_s(sX,sY)   % if ovf, results are saturated 
    function [sZ sW]=addsub22p_s(sX,sY)   % output is 17-bit so there cannot be ovf;
    (!) Note that the Hungarian notation is plainly wrong for the promoted outputs.
    (!!) And the "p" scheme is not precise enough; the output could be 32-bit (the standard on a CPU).
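    To pin down the three schemes, here is a minimal sketch (my own, with an assumed helper wrap16) of the 16-bit flavors emulated with Matlab doubles holding integer values:
    function [sZ sW] = addsub22m_s(sX,sY)   % modulo: wrap around on overflow
      sZ = wrap16(sX + sY);
      sW = wrap16(sX - sY);
    end
    function [sZ sW] = addsub22s_s(sX,sY)   % saturate: clip to the int16 range
      sZ = min(max(sX + sY, -32768), 32767);
      sW = min(max(sX - sY, -32768), 32767);
    end
    function [Z W] = addsub22p_s(sX,sY)     % promoted: 17-bit result, no overflow
      Z = sX + sY;
      W = sX - sY;
    end
    function y = wrap16(x)                  % two's complement wrap into 16 bits
      y = mod(x + 32768, 65536) - 32768;
    end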


    A vast and unfriendly territory is the Fixed Point data type (FXP). 
    Developing an FXP algorithm in the Matlab native type (FP) might be faster than for integer, so the difficulty is not there. Moreover, Matlab offers its own FXP language, which is based on the native type. 
    The complexity comes from multiple misconceptions about FXP (not covered here). 
    Here is the code
     function [z w] = addsub22_q(x,y,qpoint)
       t = plus(x,y);
       z = qformat(t,qpoint);
       t = minus(x,y);
       w = qformat(t,qpoint);
     end

    Note that the code is flexible enough to take any q format and even integers: an int16 is defined as 16q0 and an int32 as 32q0. 
    But there are also serious limitations. Matlab FP (double) is limited to 53 bits of integer precision. That does not fly well for 48-bit accumulators (or the 108-bit accumulator I saw on a recent piece of DSP IP). 
    Also there is no casting in inputs, only in outputs. Hence the basic arithmetic scheme is FP (promoted). 
    Finally there is no concept of unsigned here.
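    The helper qformat is not defined in the post. Purely as a guess at what such a helper might do (assuming qpoint = [w f], a w-bit signed word with f fraction bits, truncation and output saturation), a minimal sketch could be:
    function y = qformat(x,qpoint)           % assumed layout: qpoint = [w f]
      w = qpoint(1);                         % total word length, sign included
      f = qpoint(2);                         % number of fractional bits
      y = floor(x * 2^f);                    % quantize onto the 1/2^f grid
      y = min(max(y, -2^(w-1)), 2^(w-1)-1);  % saturate to the signed range
      y = y / 2^f;                           % back to a fractional value
    end
    With this reading, an int16 is indeed qpoint = [16 0] and a 1q15 fractional is qpoint = [16 15].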

    Another vastly uncharted territory is sub-word parallelism (SWP).
    Also called packed arithmetic, or more commonly SIMD (or whatever the marketing name of the latest Pentium extension), sub-word parallelism (SWP) is a very powerful BB technique which cannot be ignored.
    Matlab offers inherent parallelism, shapeless operators, packing and predication flags, which make it a good match for SWP. 
    But in the example I will show the difficulties of SWP.
    If we take the BB called addsub42, its SWP version becomes addsub21 such that:
    function z_w = addsub21(x_u, y_v)
      z = extract_left(x_u) + extract_left(y_v);    % left halves: add
      w = extract_right(x_u) - extract_right(y_v);  % right halves: subtract
      z_w = pack(z,w);                              % repack the two results
    end
    While this code is essentially correct, it is not useful. There is still too much vagueness to be used as specs.
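    One way (of many) to remove that vagueness is to fix every assumption: say, two signed 16-bit lanes packed in a uint32 and modulo (wrap-around) lane arithmetic. Under those assumptions only, a concrete sketch could be:
    function z_w = addsub21_p16(x_u, y_v)
      xh = tos16(bitshift(double(x_u), -16));   % left lane of the first operand
      xl = tos16(bitand(double(x_u), 65535));   % right lane of the first operand
      yh = tos16(bitshift(double(y_v), -16));   % left lane of the second operand
      yl = tos16(bitand(double(y_v), 65535));   % right lane of the second operand
      z  = wrap16(xh + yh);                     % left lane: add, wrap on overflow
      w  = wrap16(xl - yl);                     % right lane: sub, wrap on overflow
      z_w = uint32(bitshift(tou16(z), 16) + tou16(w));   % repack the two lanes
    end
    function y = tos16(x)                       % reinterpret 0..65535 as signed
      y = x - 65536*(x >= 32768);
    end
    function y = tou16(x)                       % reinterpret signed as 0..65535
      y = x + 65536*(x < 0);
    end
    function y = wrap16(x)                      % two's complement wrap into 16 bits
      y = mod(x + 32768, 65536) - 32768;
    end
    The point is not this particular sketch, but the number of decisions (lane width, signedness, overflow scheme, lane order) that had to be made to get there.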

    References


      Foot notes
      (*) Not like my friend Klaus, who started a war on Verilog naming conventions.

        Sunday, September 4, 2011

        1 - So you want to be a DSP architect?

        Statement
        A DSP (*) combines the skills of CA, the byzantine expertise of FXP, the beauties of algorithmic design and the complexity of signal processing... so it is a good place for a budding architect.
        We believe the above statement to be true, but for reasons that you do not see in the usual literature.

        Background
        First the bad news
        Having originated at the end of the 70s and gone through multiple phases, the DSP (and the DSP architect) of 2015 is dead!
        1. Firstly, because it is very difficult to come up with something other than the CPU model in the software world. In fact, among the hundreds of attempts, only the DSP had a successful run (30+ years), which unfortunately is coming to an end.
        2. Secondly, for the last 15 years "SOC platforms" have taken over from GP DSPs or GP CPUs as the mainstream chips. In this situation, the quality of the DSP or CPU core does not matter so much; all the specific IP blocks, including software, have more impact.
          1. Or in effect, the quality of the CPU core is to be as transparent as possible. 
            1. Which is another way to say "Software is the killer" 
          2. "nobody has ever been fired when choosing ARM ".
        Let us consider the standard platform. It will have a single or dual core ARM, maybe a DSP, many coprocessors, and perhaps a DMA so smart that it can be called an I/O processor. Obviously most of the software runs on the ARM, which also takes care of the whole data management. Any programmable processor such as a DSP is a hindrance to the whole programming model. So DSPs (or DSP cores) are slowly but surely disappearing from all platforms.

        Now for the good news: 

        What direction to take for a DSP architect?
        1. Become a full blown CPU architect and say bye bye to DSP.
        2. Join the FPGA bandwagon. This seems kind of obvious and a little spartan too.
        3. Learn 4G,5G standards, good luck! 
        4. Change job. Join Wall Streeeet and the analytics crowd.
        5. Fill your specific description here.
        Now for the news from hyperspace: 
        OR... this is the subject of this blog.You want to be a DSP architect? Plenty of work till 2052!
        Let us put it in a few words:
        1. As predicted by the experts, customization has taken over parallelism.
        2. For a given platform, accels or cops (especially DSPs) have grown in number and importance.
        3. Matlab is the de-facto DSP language.
        4. From points 1, 2, 3: Matlab will be used to design DSP accels, cops and any application-specific DSP (AS-DSP).
        5. From point 3: Matlab has no concept of a CPU (such as a programmable DSP core). Matlab's basic concept is to sequence (in order) a series of functions.
        6. The implementation of a Matlab function is done through mapping (not compiling). 
          1. Today we map a simulink block to a FPGA block. 
          2. Tomorrow we will map a Matlab BB (a function which looks like a hardware BB) to an implementation BB (assembler, Verilog).
        7. This methodology is based on a bottom-up approach: developing increasingly complex BBs in M-code (a tiny illustrative sketch follows at the end of this post). 
          1. Same as a Simulink toolset. 
          2. Or the way that TTL parts went from SSI to MSI to LSI. 
        8. Considering that the number of BBs needed is on the order of a few million, we estimate that this methodology will NOT be available for a long, long, long time...    
        So we better start now ...
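        As a (purely hypothetical) taste of the bottom-up style of point 7, here is a radix-2 butterfly expressed as a composition of smaller M-code BBs; the names bfly2 and cmul are my own assumptions, only addsub22 comes from the Building Block post above:
        function [A B] = bfly2(a, b, w)
          t     = cmul(b, w);            % twiddle multiply BB
          [A B] = addsub22(a, t);        % A = a + b*w, B = a - b*w
        end
        function z = cmul(x, y)          % complex multiply BB (native operator here)
          z = x .* y;
        end
        function [Z W] = addsub22(X,Y)   % the BB defined in the Building Block post
          Z = X + Y;
          W = X - Y;
        end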

        (*) Lingo
        Accel: Accelerator
        BB: Building Block
        CA: Computer Architecture
        Cop: coprocessor
        DSP: a Digital Signal Processor chip (TI 66xx ) or Core (TI C64P, Ceva family)
        FXP: Fixed Point

        General References
        Hot chips
        TI web
        http://www.iqmagazineonline.com/archived.php (ARM IQ journal)