The earliest record of digital filtering techniques (albeit on paper) was in solving problems of astronomy and the compilation of mathematical tables in the early 1600s.  The great mathematician Laplace (c.1779) understood the “z-transform,” the mathematical basis of modern digital signal processing.

During the Great Depression of the ’30s, the U.S. Bureau of Standards retained its surplus employees and set them to developing a variety of mathematical tools. Perhaps the most useful of these was a technique for evaluating the Fourier transform from a number of discrete data points using only multiplications and additions.

This Discrete Fourier Transform (DFT) technique lay dormant for a number of years before sampled-data control systems came into common usage. It was then realized that the Bureau’s technique could be directly applied to analyzing the frequency makeup (or spectral content) in these systems, and further, that this technique was ideal for use with computers, and later, digital signal processors. The Fast Fourier Transform, or FFT, a much faster way of computing the DFT, is a basic DSP algorithm employed in all forms of spectral analysis, from seismic data processing and radar image processing to MP3 audio compression and Wi-Fi, DSL and WiMAX communication, and soon 4th-generation cellular (LTE).
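To make the "multiplications and additions" point concrete, here is a minimal Python sketch of a direct DFT. It is an illustration only, not a reconstruction of the Bureau's procedure: the function names and the test signal are hypothetical, and the sine and cosine values are computed on the fly where a hand or hardware method would more likely read them from a precomputed table.

```python
import math

def dft(samples):
    """Direct DFT: X[k] = sum over n of x[n] * exp(-2j*pi*k*n/N).

    The complex exponential is split into cosine and sine parts so the
    inner loop is nothing more than real multiplies and adds.
    """
    n_points = len(samples)
    spectrum = []
    for k in range(n_points):
        real_sum = 0.0
        imag_sum = 0.0
        for n, x in enumerate(samples):
            angle = 2.0 * math.pi * k * n / n_points
            real_sum += x * math.cos(angle)  # multiply, then add
            imag_sum -= x * math.sin(angle)  # multiply, then add the negative
        spectrum.append((real_sum, imag_sum))
    return spectrum

def magnitudes(spectrum):
    """Magnitude of each (real, imaginary) bin."""
    return [math.hypot(re, im) for re, im in spectrum]

# Hypothetical test signal: a cosine completing 2 cycles in 8 samples.
signal = [math.cos(2.0 * math.pi * 2 * n / 8) for n in range(8)]
mags = magnitudes(dft(signal))
print([round(m, 3) for m in mags])
```

For this input the energy lands in bins 2 and 6 (the positive and negative frequency of the cosine). The double loop costs on the order of N² multiply-add steps, which is the cost the FFT reduces.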

In the early ’70s, scientists were beginning to use off-the-shelf TTL (transistor-transistor-logic) discrete logic chips to implement specialized DSP “engines.” The first systems were relatively slow and consumed lots of space, but the second generation of IC implementations (c.1974) began to use bit-slice logic, like Advanced Micro Devices’ Am2901 TTL 4-bit arithmetic logic unit (ALU). In 1973, TRW bid a military project with the first practical parallel multiplier designs for use with bit-slice ALUs and shipped the first working parts in 1975. But, at several hundred dollars just for the multiplier chip, only the military and government laboratories could afford the approach.

Originally used for implementing “super” minicomputers, the Am2901 found application as the heart of Digital Equipment Corporation’s DECsystem 2020, Data General’s Nova 4 and other mid-sized computer systems of the day. The 2901 and associated chips (consisting of address generators, carry look-ahead logic, program sequencers and fast multipliers—along with memory and I/O circuitry) constituted a basic “building-block” approach to implementing a fast digital signal processor.

In the late ’70s, some of the first commercial applications of DSP used the Am2901 chip family to implement array processors for medical diagnostic equipment like CT (computed tomography) scanners and nuclear magnetic resonance (NMR, now called magnetic resonance imaging, or MRI) systems. The high sales price of such systems justified the high cost of the building-block DSP technology.

For military applications, like radar image processing, the building-block approach proved to be ideal. Because little else was available, the 2901 family chips (and their successors) were also applied to other military DSP programs, such as sonar. Although other IC houses made their own bit-slice chips (and sequencers, etc.), the 290x family architecture has become obsolete (though later implemented as CMOS data-path elements in several ASIC libraries).

Probably the first single-chip implementation of a DSP algorithm was the TMS280 (later renamed the 281A) chip in Texas Instruments’ Speak & Spell™ learning aid introduced in 1978. Implementing Linear Predictive Coding (LPC) for speech synthesis, the device was not programmable, but was controlled by a separate microprocessor (TMS370) and a large ROM containing the library of digitized words. All three chips were implemented in PMOS dynamic logic. The Speak & Spell design team was headed by Gene Frantz (now a TI Principal Fellow). The idea for the product came from Frantz’ boss at the time, Paul Breedlove, who came up with the idea through a series of brainstorming sessions on how to use a hot technology of the day…bubble memories.

This was clearly a consumer product, one that retailed for $49.95 (instead of TI’s design goal of $29.95), rather than the thousands of dollars inherent in earlier military implementations. Speak & Spell was such a wild success that TI couldn’t meet demand, so it kept raising the price. Above all, the product proved the commercial viability of DSP technology in a consumer product.

In 1978, American Microsystems Inc. (AMI) announced the first programmable integrated circuit designed specifically for digital signal processing, the 12-bit S2811, designed by Dick Blasco and his group under the direction of Bill Nicholson. Although of truly innovative circuit design, the chip was implemented in a radical “V-groove” MOS technology and never yielded volume commercial products.

Although AMI was the first to announce a single-chip DSP, Intel Corporation was the first company to actually begin shipping a product. In 1979, Intel introduced the Intel 2920 DSP chip, designed by Marcian (Ted) Hoff (famous for the invention of what some count as the first single-chip MPU, the Intel 4004). Designed as a “drop-in” analog circuit replacement, complete with on-board A/D and D/A converters, the chip was called an “analog signal processor” by Intel; after all, it (digitally) processed analog signals. The 2920 did not have a parallel multiplier and was too slow (with a 600 ns cycle time) to perform useful work in the audio spectrum, where the initial high-volume DSP chip market was eventually to materialize. After lack of success elsewhere, the second wafer lot was sold out to U.S. Robotics for use in then-current 300 bps modems as adaptive equalizers. The 2920 was unsuccessful, and Intel did not capitalize on the fact that (with the on-board A/D and D/A converters) this was the first single-chip codec, for which Intel was awarded the patent.

It was in 1980 that NEC announced the first practical programmable single-chip DSP for the merchant market, the 16-bit µPD7720. Although hampered by primitive development tools, the 122-ns NMOS chip had a (two-cycle) on-chip parallel multiplier which was fast enough to perform useful “work” in the audio spectrum.

This began the first generation of “true” DSP chips, most based on the “Harvard” architecture which employs separate program and data buses for better real-time operation. In the same year, AT&T’s Bell Laboratories introduced the DSP-1, which also had an on-chip parallel multiplier. But the chip was intended for captive use by AT&T’s manufacturing arm, Western Electric Company. Consequently, NEC was the first merchant market vendor of a practical DSP chip.

In 1979, Ed Caudell of Texas Instruments designed the initial architecture of what was later to become TI’s first DSP chip. Caudell had earlier been involved in designing TI’s very popular TMS1000 4-bit MCU. The effort was under TI’s Microprocessor Microcomputer Products (MMP) group headed by Wally Rhines (now CEO of Mentor Graphics) and Jerry Rogers (MMP Design Manager). It was known as the Signal Processing Computer (SPC) Program; John Hughes was the Program Manager and Tony Leigh was the Design Manager. Dr. Surendar Magar was hired in 1980 to optimize the SPC architecture around DSP algorithms. Dr. Magar held a Ph.D. in Signal Processing and had been working for Plessey in the U.K. Soon after joining TI, Dr. Magar recommended the inclusion of a hardware multiplier, which was not in the original SPC specification. Wanda Gass (née English) joined Magar as a key Design Engineer along with others, and the full logic was complete by the end of 1980.

The resulting design was implemented in 3.0 um NMOS and introduced to the world in February, 1982 through Dr. Magar’s classic ISSCC (International Solid State Circuits Conference) paper. The final product, the TMS32010, was announced by Caudell in April, 1982 at the Paris, France ICASSP (International Conference on Acoustics, Speech and Signal Processing). The TMS32010 went into production in 1983 and the DSP Group at that time was headed by Dave French (later VP at Analog Devices Inc. and then CEO of Cirrus Logic Inc.).

TI’s early recognition of the potential of DSP carried it through seven years of “missionary” work before a profit was turned, and by then others realized that it was a market ripe for growth. Three other major semiconductor companies then joined TI and NEC in the programmable DSP chip market: AT&T Microelectronics (later Agere Systems, then merged with LSI Logic to become part of LSI Corp.), Motorola Semiconductor Products Sector (now Freescale Semiconductor) and Analog Devices Inc. Today, there are many other semiconductor houses that employ DSP technology in their products, but for the most part, those chips are not programmable by the user.

First-generation chips generally lacked parallelism, since at least two processor cycles were required to perform a complete multiply-accumulate (MAC) function, and limited on-chip memory required expensive (in terms of real estate and added memory) additions for most applications. The more primitive “engines” of the day (like conventional MCUs) required as many as 60 clock cycles to perform a multiply-accumulate operation.

Other chips offered to the market in this first-generation era were the ITT UDPI-01 (the first announced CMOS single-chip DSP, which never reached the sampling stage) and the Hitachi HD61810, another CMOS chip which saw mostly internal company use.

The most striking improvement in second-generation DSP chips was in the implementation of a single-cycle multiplier-accumulator (MAC), effectively doubling the bandwidth capability of the chips. Direct memory access (DMA) emerged as a way to quickly load new algorithms into the DSP chip. Enhancements to first-generation chips added serial communication ports and timers, and interrupt capability began to emerge for control applications. DSP instruction sets became richer, with event control capability, which further broadened chip utility—allowing true stand-alone capability for many more implementations.
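The multiply-accumulate figures quoted for the first two generations can be made concrete with a small Python sketch. It models one output sample of an FIR filter, a typical DSP kernel built entirely from MAC steps, and then counts the cycles those MACs would cost at 60, 2 and 1 cycles each. The filter taps and input samples are hypothetical, and loop and memory-access overhead is ignored, so the cycle counts are rough lower bounds rather than measured figures for any particular chip.

```python
def fir_output(coefficients, history):
    """One FIR output sample, computed as a chain of multiply-accumulates."""
    accumulator = 0
    mac_count = 0
    for c, x in zip(coefficients, history):
        accumulator += c * x  # one MAC: multiply, then add into the accumulator
        mac_count += 1
    return accumulator, mac_count

def cycles_per_sample(num_taps, cycles_per_mac):
    """Cycles spent on MACs for one output sample (all other overhead ignored)."""
    return num_taps * cycles_per_mac

coeffs = [1, 2, 3, 4]  # hypothetical filter taps
recent = [4, 3, 2, 1]  # hypothetical most-recent input samples
y, macs = fir_output(coeffs, recent)

for label, cost in [("conventional MCU", 60),
                    ("first-generation DSP", 2),
                    ("second-generation DSP", 1)]:
    print(label, cycles_per_sample(macs, cost), "cycles")
```

Halving the cycles per MAC halves the time per output sample at a given clock rate, which is the sense in which the single-cycle MAC doubled the usable signal bandwidth.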

The Fujitsu MB8764 (announced in 1983) was the first of this genre, followed by the TI TMS32020 (in 1985). The TMS32020 was the result of collaborative design efforts between Texas Instruments and their customer, ITT Corp. (then International Telephone and Telegraph Corp.). Dr. Surendar Magar of TI and Dr. Kristine Kneib of ITT were the principal architects of the -020.

Other single-chip DSPs were announced in this second-generation era by Toshiba (T6386/7), STC (DSP-128) and Matsushita (MN1901/9), but they were never successful in the merchant market. (The DSP-128 never sampled, according to our information.) During this time, AT&T continued internal DSP development with the DSP-2 chip. Thomson (now STMicroelectronics) introduced the ST68930/31 in 1986, but confined most marketing efforts to Europe.

Texas Instruments licensed the NMOS 32020 architecture to General Instrument Microelectronics (now Microchip Technology). GI moved the NMOS design into CMOS and in turn provided it back to TI, which resulted in TI’s first CMOS DSP chip, the TMS320C25. The new chip was designed in Japan, and the principal designer of the -C25 was Takashi Takamizawa, later to become TI’s DSP Business Manager in Japan.

The first floating-point DSP chip from a major vendor reluctantly made it to market in this era. AT&T introduced its DSP32 for internal use at 8 MFLOPS in 1984 and began to sell its DSP32 to the merchant market in July, 1986. Thrust into the merchant world by the divestiture of the Bell operating companies, AT&T did not have cohesive DSP-chip marketing direction until late 1987, when the company realized that their first integer DSP chip, the (third-generation-design) 18.2 MIPS DSP-16, could make the company a credible force in the merchant market.

The period of the third generation became the “glory years” for DSP volume shipments by TI, which had captured over 60% of the world single-chip DSP market by 1986. Third-generation changes centered mainly on reconfigurable memory, with flexibility of on- and off-chip memory that could be variously configured for program, data, or coefficients. The degree of parallelism increased even further, with as many as three operations performed in a single clock cycle. Further-expanded instruction sets became evident. In late 1986, Zoran Corp. introduced the first single-chip DSP with CISC-like vector instructions to efficiently perform FFT functions for military applications.

Analog Devices introduced the ADSP-2100 chip, which was unique in that it had a 24-bit instruction word, 16-bit data paths, and had no on-board memory. But, it was designed to access two words of external data on every cycle and had an instruction set optimized to perform FFTs and zero-overhead loops. The ADSP-2100 and its faster successor, the ADSP-2100A, found heavy use in military and imaging applications, while newer members of the family, the ADSP-2101 and ADSP-2105 (both with on-board memory) saw a wider variety of applications.

The Motorola 56000 family of 24-bit chips (for both instructions and data paths) was the first integer DSP optimized for high-fidelity audio applications, and found early acceptance in professional audio processing and music synthesis. Other third-generation integer DSP chips included those from AT&T (the DSP-16A at 40 MIPS by 1988), Hitachi (the DSP-I, sold only in Japan) and TI (TMS320C50).

Coincident with the era of the third generation of integer DSP chips, additional first-generation floating-point DSP chips were announced, including improved units from AT&T (the 25-MFLOPS CMOS DSP32C was announced in 1987), Texas Instruments (TMS320C30), Zoran (ZR34325), Fujitsu (MB86232), STMicroelectronics (ST18940/41), NEC (µPD77230) and Oki (MSM699210).

The emergence of a fourth generation of DSP chips was heralded by announcements made in the early ’90s. Several fourth-generation integer DSP chips were characterized by on-chip codec circuitry, like Motorola’s DSP56156.

As the fourth generation evolved, geometries progressed to submicron (0.8 µm) levels and multiply-accumulate times continued to fall. Additional CISC instructions to accommodate key algorithms became evident for some chip designs.

Second-generation floating-point chips emerged coincident with the introduction of fourth-generation integer chips: AT&T’s DSP3210, Motorola’s DSP-96002, NEC’s µPD77240, TI’s TMS320C40, and somewhat later, Analog Devices’ ADSP-21020.

The fifth generation of DSP chips began in 1994 with Texas Instruments’ TMS320C54xx family. Introduced with a 20 ns (50 MIPS) speed, it was TI’s first chip with a Viterbi accelerator, and the successor to its popular C25 and C50 families. With the accelerator, the chip was clearly intended for communication applications (like modems).
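The pairing of a 20 ns cycle time with a 50 MIPS rating is simple arithmetic, sketched below in Python. The helper name is hypothetical, and the conversion assumes one instruction completes per cycle. Applied to the 122 ns and 600 ns cycle times quoted earlier for the µPD7720 and the Intel 2920, the results are illustrative upper bounds under that assumption, not published ratings.

```python
def mips_from_cycle_time(cycle_ns, instructions_per_cycle=1):
    """Instruction rate in MIPS for a given cycle time in nanoseconds.

    1000 / cycle_ns is the cycle rate in MHz (millions of cycles per
    second); multiplying by instructions per cycle gives MIPS.
    """
    return 1000.0 / cycle_ns * instructions_per_cycle

for name, cycle_ns in [("TMS320C54xx", 20), ("uPD7720", 122), ("Intel 2920", 600)]:
    print(name, round(mips_from_cycle_time(cycle_ns), 2), "MIPS (one instruction per cycle assumed)")
```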

But, from a public relations standpoint, the C54 family was overshadowed by the 1995 sampling of TI’s TMS320C80 (earlier termed a Multimedia Video Processor, or MVP). The C80 consisted of four 64-bit DSP cores along with a 32-bit RISC core. A less-capable, but cheaper version, the C82, was introduced with two DSP cores and the RISC core. Although an extremely powerful chip family, the C80’s programming complexity confined the bulk of its applications to sophisticated image processing, and the chip never achieved general industry acceptance.

Motorola Semiconductor’s Data Communications Operation melded its DSP56002 DSP core with a 68302 microprocessor (MPU) core on the same die, the M68356 “Signal Processing Communications Engine.” In a similar pairing, TI joined its C54 core with an ARM RISC core on a single die and found success in tens of millions of digital cellphones.

Although multiple-MAC designs were earlier available on specialized 8-bit video filter chips like the Inmos (acquired by STMicroelectronics) A121 and Zoran ZR33881, 16-bit programmable DSPs also began taking on multiple MACs late in this generation. Half-micron geometries led to sub-20 ns multiply-accumulate times—across several MACs or several DSP cores, leading to substantially higher bandwidth capability.

Continued improvements of second-generation floating-point DSP chips were introduced coincident with the fifth generation of fixed-point chips. Texas Instruments introduced the TMS320C32 and TMS320C44 chips, while Analog Devices introduced the ADSP-21060 SHARC™ (Super Harvard Architecture Computer).

The late-’90s era of 0.35 um CMOS geometries saw the first introductions of VLIW and superscalar architectures for DSP.

Texas Instruments’ TMS320C6201 was the first user-programmable VLIW DSP chip available. Employing eight ALUs, two of which had MACs, the ’C62 was initially capable of executing 1,600 raw MIPS and 400 DSP MIPS (MMACS) at 200 MHz. The VLIW approach requires an optimizing C-language compiler, and TI invested heavily in developing an efficient compiler. Until the family later moved to 0.18 um geometries, power consumption was a problem. The ’C62x family was announced in February 1997 and began to ship in moderate volumes in Q1/98.
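The two headline ratings follow directly from the clock rate and the number of execution units, as this short Python sketch shows. The function name is hypothetical, and the calculation assumes every unit issues one operation on every clock, which yields a peak figure that real code reaches only when the compiler can keep all units busy.

```python
def peak_rates(clock_mhz, functional_units, mac_units):
    """Peak raw MIPS and peak MMACS, assuming one operation per unit per clock."""
    raw_mips = clock_mhz * functional_units
    mmacs = clock_mhz * mac_units
    return raw_mips, mmacs

# Figures from the text: 200 MHz clock, eight ALUs, two of them with MACs.
c62_mips, c62_mmacs = peak_rates(clock_mhz=200, functional_units=8, mac_units=2)
print(c62_mips, "raw MIPS,", c62_mmacs, "MMACS")
```

The gap between the two numbers is why vendors of that period quoted both: only the MAC rate bounds the throughput of filter and transform kernels.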

Lucent Technologies (later Agere Systems) introduced the DSP16000 family of 16-bit DSPs which featured dual ALUs and dual MACs. The 16000 was optimized for low power consumption and for bank speech coding applications such as required in cellular base stations or Internet Protocol (IP) telephony gateways. Initially rated at 400 MIPS (@200 MHz), the 16000 could go head-to-head with TI’s C62x family in applications where there was no need for the extra (non-DSP) MIPS provided by the TI chip. The DSP16000 began sampling in November 1997 and in 1998, Lucent began shipping the DSP16410, a chip consisting of two 16000 cores on a die. The DSP16410 has been a favorite in GSM cellular base station implementations.

In 1998, a startup company, ZSP Corporation, began sampling its ZSP16400 family of DSPs, which, like the Lucent product, had dual ALUs and dual MACs. However, the ZSP family was based on a 4-issue superscalar architecture and employed a different (proprietary) approach to feeding data to multiple MACs. The ZSP design, initially rated at 400 MIPS (@200 MHz), was acquired by LSI Logic Corp. in mid-1999, and volume shipments of the renamed ZSP400 family began in Q3/99. LSI Logic sold both ZSP chips and licensed ZSP cores to a number of companies. The ZSP chip operations were sold in mid-2006 by LSI Logic to VeriSilicon Corp., which has since expanded the product offerings.

Third-generation floating-point chips emerged in this time frame, beginning with TI’s C67x family based on the VLIW architecture of the fixed-point C62x family. The family is source-code compatible with the C62x and started with a 1-GOPS version. Analog Devices announced its own dual-ALU/dual-MAC product, the floating-point ADSP21100 (“Hammerhead”) family, which was code-compatible with its ADSP21000 (SHARC) family of products. Sampling began in Q4/99. Because of its code compatibility, the Hammerhead presented an instant upgrade for existing sockets.

In 2001, TI introduced a formal pairing of its C55-family DSP core and ARM9-family RISC cores on a single die, formalized as its OMAP™ product family (said to have evolved from Open Multimedia Applications Processor). For reasons explained later in this report, the pairing of DSP and RISC, rather than a single processor architecture for both, has considerable merit in many applications.

By 2003, 0.15 um DSP chips had become commonplace and 0.13 um chips were in volume production. DSP cores also became an ever smaller percentage of the die area as peripherals and (mostly) on-board memory began to dominate the silicon die.

By early 2007, 65-nm versions of Texas Instruments’ C55 family were shipping in volume as part of the OMAP™ family, which became the market leader due to its deployment in hundreds of millions of cellphones annually. But, the company’s 1-GHz C64 VLIW became TI’s flagship “catalog” product. One member of the C64 family includes both Viterbi and turbocoding accelerators, clearly targeting the cellular base station market. The C64 family has since been expanded to three or more cores on a single die, again addressing the base station market. Other C64 implementations employ MPEG4 and graphics accelerators for video multimedia applications, under the DaVinci™ family name.

Motorola has fielded its MSC8144 chip, successor to those first announced in Q4/01. Based on four StarCore VLIW cores (each with four ALUs & MACs) and 11.5 Mbits of on-chip memory, the earlier MSC8122 chip was originally introduced at 300 MHz, but the successor MSC8144 is now shipping at over 1 GHz, and at introduction was rated by Berkeley Design Technology Inc. (BDTI) as the fastest available DSP processor, even as a single-chip implementation.

Formally announced in Q3/01, Analog Devices’ TigerSHARC™, a floating-point VLIW design, was also targeted (unsuccessfully) for the cellular base station market. Uniquely, in addition to traditional “symbol-rate” baseband processing, the chip can also perform high-speed “chip-rate” signal processing functions (required for CDMA cellular operation) that competitors generally assign to ASIC or FPGA implementations. The chip provided a significant jump in performance over the earlier Hammerhead, but it is not code compatible with the ADSP21000 product family, so new tools were required to develop products based on the higher performance product.

In a similar vein, Analog Devices introduced the fixed-point Blackfin™ DSP family, based on the “Frio” core jointly developed with Intel Corporation. The first chip (the ADI21535) was announced in mid-2001 at a 300 MHz clock rate. It was priced at $27 @10K units. The initial superscalar design employed two 16-bit MACs, two 40-bit ALUs and four 8-bit video ALUs, clearly targeting multimedia applications. By mid-2003, the chip family began sampling at 600 MHz (1.2 GMACS), and is currently available at 750 MHz. The architecture is said to scale to at least 1 GHz, a speed that ADI’s older 219x devices would be unlikely to achieve. As with the TigerSHARC, it is not code compatible with the earlier 2100 family devices, so the company has developed new tools to support the new architecture. However, with the 300 MHz version later priced at $4.95 @10K units, the chip achieved strong early market attention.

Intel chose to employ the Frio core in SoC (System on Chip) products under its PCA (Personal Internet Client Architecture) banner. Rather than use the DSP nomenclature (a term which they equate to TI), Intel chose to call the Frio technology “Micro Signal Architecture.” The Frio architecture became part of Intel’s cellular baseband chip, code-named “Hermon” (after Mount Hermon, the highest mountain in Israel), which (along with Intel’s XScale RISC product line) was sold to Marvell Semiconductor in late 2006. Marvell has since expanded on the initial architecture, now offering a cellular baseband chip code-named “Tavor” (after the second highest mountain in Israel). Tavor found a home in RIM’s BlackBerry “Bold” 3G cellphone introduced in late 2008.

Agere Systems introduced (Q2/03) the DSP16411, a 0.13 um successor to the earlier (Lucent) DSP16410 that has been popular in GSM base stations. Clearly, the faster, and code-compatible, 16411 is a natural for retrofitting GSM base stations to GPRS capability. The operation became a division of LSI Corp. shipping its Trident HP chip based on its workhorse DSP16000 and ARM7TDMI cores for the GSM/GPRS/EDGE cellphone market. The cellphone chip operation was later sold to Infineon, but LSI Corp. continues serving the cellular base station market with its Starpro line of multicore chips based on the StarCore DSP core jointly developed with (then) Motorola’s Semiconductor Division.

VeriSilicon now offers its new ZSP600 family as licensed cores. Based on a 6-issue superscalar architecture with 4 MACs running at up to 300 MHz in 0.13 um CMOS, the chip occupies a unique niche. Although earlier designs, like the ZSP400 and ZSP500, were also sold as chips by LSI Logic, VeriSilicon only licenses the IP, in addition to applying the cores in its own ASIC chip designs.

The designs of programmable DSP chips continue to evolve, with VLIW becoming the base architecture of choice for most of the highest performance discrete chips. But all discrete DSP chip vendors are now treating their basic engines as ASIC cores, as central elements in an ASSP (application-specific standard product), like a digital still camera chip, or for a customer-specific (usually high-volume) design like a cellphone baseband chip.

Another clear trend is that licensable RISC cores have evolved to incorporate ever-increasing DSP functionality, either through customizable instruction set architectures or through the addition of SIMD augmentation.

The trend toward ASSPs for vertical markets, like cellphones, cameras and personal media players continues, with off-the-shelf discrete DSPs becoming a diminishing percentage of the (still-growing) DSP-centric silicon market.