Friday, October 7, 2011

Coprocessors (COP)

 The goal of this section is to understand the scope and definitions of coprocessors (COP).
  • Difference between processors and COP. 
  • Difference between COP and peripheral.
  • Already we can see that a COP is less than a processor but more than a peripheral.
  • Difference between COP and accelerator
  • At one time the distinction was very clear:
      • a COP sits next to the CPU on the level 1 bus (like an 80387 or, more recently, a GPU)
      • an accelerator sits on the level 3 or system bus (like a Turbo decoder)
  • Difference between COP and I/O processor (IOP)
  • Difference between COP and Intelligent Peripheral (IPERI)
  • At one time the distinction was very clear:
      • a COP is a CPU extension whereas an IPERI is a CPU-agnostic block (see the evolution from the Am9511 to the 8087).
  • Difference between COP and Application Specific Processor (ASP)
  • Difference between COP and DSP
    • At one time the distinction was very clear:
      • a DSP is not a COP!
      • But seen from the software running on the host (ARM), that is exactly what it is becoming: just another API.
    • ... and more
    • For simplicity's sake, we will use the following definition:
      • A COP is less than a CPU but more than a peripheral.
      • A COP is dedicated to filling the system gap between a standard CPU and a standard peripheral. It can be a custom function or an application specific processor.
    • Alternatively, the term "Accelerated processing" fits our definition.
    Background
    At the beginning of the 1980s Intel came up with a floating-point unit, the 8087, which is the de facto first coprocessor. Its instruction set (ISA) and interface are well documented. Following the very successful 8087, Intel came up with a series of COPs: the 80130 for multitasking software, the 82586 for local networks, and the 82730 for text (screen) display. While the 80130 was tightly coupled, the last two were bus coupled (not unlike the 8089 I/O processor). In fact, not surprisingly for Intel, everything was a COP.
    Co-processing was then one of the solutions to the so-called problem of extended processing: how to cover all possible applications without specializing the microprocessor ISA? There were 3 types of solutions:
    1. the macrostore. The most commonly used subroutines are stored in ROM ("ROMed") as micro-instructions instead of being executed from the external program store. Proponent: Texas Instruments.
      1. not the brightest tree in the forest; still, it could make sense in a DSP COP. File under microprogramming techniques.
    2. the coprocessor. The most commonly used subroutines are executed by a specialized processor.
    3. the intelligent peripheral. The advantage is that concurrent processing is easy; the big disadvantage is that it is not transparent to the programmer.
    After 1983, CPU architecture becomes the standardized way, so refer to .. for the rest of the story.

      Topics
      1.  Most interesting was the Motorola answer (68881) to the 8087, which had a similar ISA but a much better, non-blocking interface. In other words, Motorola introduced concurrent processing. Incidentally, Motorola also introduced the first major COP architecture topic: what type of interface to use.
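
The difference between a blocking and a non-blocking interface can be sketched with a toy cycle count. This is only an illustration under simple assumptions, not a model of the actual 8087 or 68881 bus protocol: the `Coprocessor` class, its `latency` parameter and the two `run_*` functions are hypothetical names, and issuing an operation is assumed to cost zero cycles.

```python
class Coprocessor:
    """Toy coprocessor: every operation completes `latency` cycles after issue."""

    def __init__(self, latency):
        self.latency = latency
        self.ready_at = 0      # cycle at which the pending result becomes valid
        self.result = None

    def issue(self, now, a, b):
        """Start a multiply at cycle `now` (issue cost assumed to be zero)."""
        self.result = a * b
        self.ready_at = now + self.latency


def run_blocking(cop, cpu_work_cycles):
    """Blocking interface: the CPU stalls until the coprocessor has finished."""
    now = 0
    cop.issue(now, 3.0, 4.0)
    now = cop.ready_at              # CPU waits for the result
    now += cpu_work_cycles          # independent CPU work starts only afterwards
    return now, cop.result


def run_non_blocking(cop, cpu_work_cycles):
    """Non-blocking interface: the CPU overlaps its own work with the coprocessor."""
    now = 0
    cop.issue(now, 3.0, 4.0)
    now += cpu_work_cycles          # independent CPU work runs concurrently
    now = max(now, cop.ready_at)    # stall only if the result is not ready yet
    return now, cop.result


if __name__ == "__main__":
    print(run_blocking(Coprocessor(latency=20), cpu_work_cycles=15))      # (35, 12.0)
    print(run_non_blocking(Coprocessor(latency=20), cpu_work_cycles=15))  # (20, 12.0)
```

In this sketch the non-blocking version hides the CPU's 15 cycles of independent work behind the coprocessor latency, which is the concurrency gain the interface choice is about.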


      A non exhaustive list of COPs and their applications
      DSP coprocessor: a list of DSP ISA extensions, in roughly chronological order
      While ISA extension and COP are not synonymous, we group them together for simplicity.
      The advantages of a COP over an ISA extension are obvious. The specs, including the interface and the DSP extensions, are physically separate from the rest of the core, hence the COP is easier to model, to implement and to validate.
      • NS FX161 (~ 1991) had a DSP COP
        • amazing trick!! the integer registers and the DSP registers were skewed by 1 bit (because of the Q format)
      • ARM PICCOLO (~1996): point solution for the GSM speech coder.
        • Stands out today for its very original way of interfacing to the core:
        • both tightly coupled and asynchronous data through a FIFO.
          • TBD: integrate notes and ICSPAT 98 Moerman class notes
          • TBD: Matlab model
        • !!! circular addressing on the register file.
          • TBD: Matlab model
        • http://www.cs.umd.edu/class/fall2001/cmsc411/proj01/arm/dsp.html
      • HITACHI: SH-DSP  (1996)
      • PPC: ALTIVEC (circa 1998)
      • Xtensa (circa 2003)
      • dsPIC (2005?)
      • ........
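
The 1-bit skew noted above for the NS FX161 can be motivated with a small Q15 example. This is a sketch of one plausible reading, not a description of the actual FX161 datapath: the product of two Q15 numbers is a Q30 value, so it has to move left by one bit before its upper half can be read back as Q15. All function names are hypothetical, and saturation (the -1 × -1 case) is ignored.

```python
Q15_ONE = 1 << 15   # scale factor of Q15: 1 sign bit, 15 fraction bits


def to_q15(x):
    """Convert a float in [-1, 1) to its Q15 integer representation."""
    return int(round(x * Q15_ONE))


def from_q15(n):
    """Convert a Q15 integer back to a float."""
    return n / Q15_ONE


def mul_integer_view(a, b):
    """Integer view: keep the upper 16 bits of the 32-bit product unchanged."""
    return (a * b) >> 16


def mul_q15_view(a, b):
    """Fractional view: the product is Q30, so skew it left by 1 bit (to Q31)
    before keeping the upper 16 bits, which yields a Q15 result."""
    return ((a * b) << 1) >> 16


if __name__ == "__main__":
    a, b = to_q15(0.5), to_q15(0.25)
    print(from_q15(mul_integer_view(a, b)))  # 0.0625, off by a factor of 2
    print(from_q15(mul_q15_view(a, b)))      # 0.125, the expected 0.5 * 0.25
```

Skewing the DSP registers by one bit relative to the integer registers would make this shift free in hardware, which may be what makes the trick attractive.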

        Design Issues
        • Tightly or Loosely coupled?
        • How do you return data to the core? interrupt?
          • blocking , non blocking
        • Memory hierarchy position
          • Level 0 memory : COP has access to the core Register File.
            • the COP is just another execution unit inside the DAU
          • Level 1 memory: COP has access to level 1 Memory
            • even better: the COP sits in the same place as a level 1 memory
              • see FFTer from ?
          • Level 2 or 3 memory: COP sits on one of the system Buses
        • Instruction Set or not?
          • In theory an instruction set is a good idea, but it implies a lot of added complexities; none of them is major, but the whole can become unmanageable:
            • tool issues
            • C compiler or not
            • opcode design
            • added power consumption due to instruction fetch
          • Parameters are preferable, especially a combination of build-time and run-time parameters.
        • Scheduling techniques
          • "pure" datapath    
          • vector processing (access to data as a block in memory)
            • length, stride
          • pipelined datapath
          • sequential: concept of clock; if cycle==1 ... if cycle==2 ...
          • autonomous: z = FFT64(x)
        • Topology: how many ports? 
          • a port must be a physical reality, not a pointer to a structure.
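
Two of the scheduling styles listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference design: `VectorDescriptor`, `gather`, `cop_dot` and `sequential_mac` are hypothetical names, memory is modelled as a plain Python list, and a dot product stands in for an autonomous operation such as `z = FFT64(x)`.

```python
from dataclasses import dataclass


@dataclass
class VectorDescriptor:
    """Run-time parameters describing one vector in memory (hypothetical layout)."""
    base: int     # index of the first element
    length: int   # number of elements
    stride: int   # distance between consecutive elements


def gather(memory, desc):
    """Read a vector the way a length/stride address generator would."""
    return [memory[desc.base + i * desc.stride] for i in range(desc.length)]


def cop_dot(memory, x, y):
    """Autonomous style: the host passes descriptors only, the COP does the rest."""
    return sum(a * b for a, b in zip(gather(memory, x), gather(memory, y)))


def sequential_mac(acc, a, b):
    """Sequential style: one step per clock cycle, selected by the cycle number."""
    for cycle in (1, 2, 3):
        if cycle == 1:
            op_a, op_b = a, b      # cycle 1: latch the operands
        if cycle == 2:
            prod = op_a * op_b     # cycle 2: multiply
        if cycle == 3:
            acc = acc + prod       # cycle 3: accumulate
    return acc


if __name__ == "__main__":
    memory = list(range(16))
    x = VectorDescriptor(base=0, length=4, stride=2)   # elements 0, 2, 4, 6
    y = VectorDescriptor(base=1, length=4, stride=3)   # elements 1, 4, 7, 10
    print(cop_dot(memory, x, y))       # 96
    print(sequential_mac(10, 3, 4))    # 22
```

Here the vector length could equally be a build-time parameter fixed in hardware, with base and stride left as run-time parameters; the split is a design choice rather than something this sketch settles.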

        Further Topics 
        • Accelerated Processing  (AP)
          • Traditionally AP is divided into several techniques
            • Central Core (CPU) based
              • Specialized CPU
              • CPU + COP(s)
            • Periphery (Non  CPU) based
              • Intelligent peripherals
              • FPGA
            • anything between  Core and Periphery.
              • to simplify: Core is level 0 memory and Periphery is level 2 or 3.
            • anything outside the chip is considered periphery.
            • since our focus is customization, we do not consider massive parallelism as a solution.
          • For our application space (DSP) it is simpler to treat AP and COP as a single topic.

        Advanced Topics
        • FPGA Nodes:  Combining Massive Parallelism (MPP)  and customization
          • MPP machines more or less disappeared from the DSP (and embedded) scene for obvious reasons of programming model and power efficiency.
          • the next generation is based on a slightly different approach
          • you have a switch fabric (say 16x16, i.e. 256 nodes) and each node is dedicated to a function
            • in fact some guys proposed a fabric based on the FFT trellis instead of rows and columns
          • this approach is interesting because it is more sophisticated than our proposed signal graph
            • since we map a Matlab/Simulink flow.
          • We are not familiar with the state of the art, but it does not seem that this type of solution went further than FPGA implementations.
          • And maybe that is the right technology.
          References: my garage, Google and questions
          1. "Making software acceleration simple" Critical Blue, 2002
            1. http://www.criticalblue.com/
            2. What is the Critical Blue philosophy? the methodology? the application space?
            3. Is there a paradigm shift?
            4. Any link to DSP? Matlab?
          2. "OptimoDE ... etc." ARM, Hot Chips, August 2004
            1. http://www.hotchips.org/archives/hc16/3_Tue/12_HC16_Sess9_Pres3_bw.pdf
              1. see also the PPT slides from CCCP, University of Michigan
            2. http://www.iqmagazineonline.com/magazine/pdf/v3_n3_pdf/Pg74_ARM_Phonex.pdf
            3. Originally developed by Adelante (an offspring of a Philips research company). They were partially bought by ARM.
            4. OptimoDE is a general purpose (GP) COP. What is wrong with this approach?
            5. OptimoDE is a GP methodology to design a COP. Advantages and limitations?
            6. It is based on a VLIW core. What is the one big problem with VLIW?
              1. Compare a C55x MAC2 and a C62x MAC2
              2. Compare evolution C62, C64, C64+
              3. What is code footprint?
              4. What is compound instruction?
              5. What is a thick and thin operator (data-path)?
          3. "Creating FPGA-based Co-processors for DSPs using Model Based Designs..." Avnet, Xilinx, April 2009
          4. "Extreme Processing" Max Barron, Instat/MDR, October 14, 2002
            1. This reference, while excellent, illustrates what we do not want to do. Max Barron used the term "extreme" because he delved into architectures that were massively parallel and general purpose.
            2. Here we consider a solution (COP) as being customized for efficiency and specific to a task.
              1. note: efficiency can also mean parallelism
          5. "Accelerator Architecture" IEEE Micro, July/August 2008
          6. Anand Balaram, Andrew Volk "Text coprocessor brings quality to CRT displays" EDN, Feb 17, 1983
            1. Including 80186-82730 interface
            2. Software interface: command block, screen characteristics interface, string pointer list and display data strings
          7. Stan Groves "Standard interface keys processor design" Electronics, Nov 17, 1983
          8. Michael Cruess "The 68000 coprocessor interface an overview" Motorola document dated June 8, 1982
            1. the author is on LinkedIn
