Arduino Hardware-assisted SPI: Synchronous Serial Data I/O

Many interesting projects require more digital output bits than the Arduino hardware can support. The usual fix is a chain of 74HC595 serial-in/parallel-out chips, and that tutorial explains pretty well how it works. The shiftOut() library function squirts a byte out through an arbitrary pin, leading with either the high or low bit.

Software SPI: Clock and Data

Just drop one byte into shiftOut() for each ‘595 lined up on your board. Remember to latch the bits (LOW-to-HIGH on RCK @ pin 12 of the ‘595) and enable the output drivers (LOW on -G @ pin 13, similarly) when you’re done sending everything. You can have separate latches-and-enables for each ‘595 if that suits your needs, although then you once again run out of Arduino bits pretty quickly. It’s entirely possible to devote a ‘595 to latches-and-enables for the rest of the chain, but that gets weird in short order.

The scope shot shows that shiftOut() ticks along at 15 µs per bit (clock in the upper trace, data in the lower trace). For back-of-the-envelope purposes, call it 8 kB/s, which is probably less than you expected. If you have a string of 5 external bytes, as I did on a recent project, that’s only 1600 updates / second. It was part of a memory board reader & EPROM programmer: reading an 8 kB ROM chip requires two shift-register runs (one to set the address & data, one to read in the chip output), so the overall rate was on the order of 10 seconds per pass and much worse for programming. You can optimize the number of bits by not shifting out all the bytes, but that’s the general idea.
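The back-of-the-envelope numbers above can be reproduced with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and it assumes every update shifts the whole chain with no software overhead between bytes.

```cpp
// Bytes per second through the chain when each bit takes bitTimeUs microseconds.
constexpr double bytesPerSecond(double bitTimeUs) {
  return 1.0e6 / (bitTimeUs * 8.0);
}

// Complete refreshes per second of a chain that is chainBytes long.
constexpr double updatesPerSecond(double bitTimeUs, int chainBytes) {
  return bytesPerSecond(bitTimeUs) / chainBytes;
}

// Seconds to visit `locations` addresses at `runsPerLocation` chain refreshes each.
constexpr double passSeconds(double bitTimeUs, int chainBytes,
                             long locations, int runsPerLocation) {
  return (double)locations * runsPerLocation / updatesPerSecond(bitTimeUs, chainBytes);
}
```

With 15 µs bits, 5 bytes in the chain, 8192 locations, and two runs per location, passSeconds() comes out a little under 10 seconds, which agrees with the estimate in the text.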

Because ‘595 chips are output-only, in order to get 8 bits of data into the Arduino board, add a 74HC166 parallel-in/serial-out chip to the string. Alas, shiftOut() doesn’t know about input bits, so you’re on your own.
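If you did want to stay with software bit-banging, the input side might look something like the sketch below. It is not from the original project: shiftInByte() and its two hook parameters are hypothetical, and on an Arduino the hooks would wrap digitalRead() on the data pin and digitalWrite() on the clock pin. It assumes the register presents its current bit before each rising clock edge and that the first bit out is the most significant one, which depends on how you wire the ‘166 inputs.

```cpp
#include <stdint.h>

// Clock one byte in from a serial-out shift register, first bit into the MSB.
// readData() returns the level on the data pin (hypothetical hook).
// setClock(level) drives the serial clock pin (hypothetical hook).
template <typename ReadFn, typename ClockFn>
uint8_t shiftInByte(ReadFn readData, ClockFn setClock) {
  uint8_t value = 0;
  for (uint8_t i = 0; i < 8; ++i) {
    // Sample the bit currently sitting on the register's serial output
    value = (uint8_t)((value << 1) | (readData() ? 1 : 0));
    setClock(true);    // rising edge advances the register to its next bit
    setClock(false);
  }
  return value;
}
```

Using a template rather than fixed pin numbers keeps the routine testable on a desktop compiler with a fake register standing in for the hardware.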

Hardware SPI: Clock and Data

If you’re going to have to write some code to get input bits anyway, you may as well use the ATmega168 (and its ilk) hardware SPI as it was intended to be used: for high-speed synchronous serial I/O. This scope shot shows the SPI clock (in the top trace again) ticking along at 1 µs per bit, which is 1/16 the Diecimila’s oscillator frequency. You can pick any power of two between 1/2 and 1/128; I used 1/16 because it’s fast enough to make the rest of the software the limiting factor, while slow enough to not require much attention to layout & so forth.
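The divider choice can be tabulated in code. This sketch follows the SCK frequency table in the ATmega168 data sheet as I understand it (SPR1:SPR0 in SPCR select /4, /16, /64, or /128, and SPI2X in SPSR halves the divisor); the function names are invented for illustration.

```cpp
#include <stdint.h>

// SCK divisor selected by the SPR1:SPR0 bits (spr, 0..3) and the SPI2X bit.
inline uint8_t spiClockDivisor(uint8_t spr, bool spi2x) {
  static const uint8_t base[4] = {4, 16, 64, 128};
  uint8_t divisor = base[spr & 0x03];
  return spi2x ? (uint8_t)(divisor / 2) : divisor;   // SPI2X doubles the clock rate
}

// Time per bit in microseconds for an oscillator running at foscHz.
inline double spiBitTimeUs(double foscHz, uint8_t spr, bool spi2x) {
  return 1.0e6 * spiClockDivisor(spr, spi2x) / foscHz;
}
```

For a 16 MHz oscillator with SPR1:SPR0 = 01 and SPI2X clear, this gives the 1 µs bit time seen in the scope shot.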

Start by Reading The Fine Manual section about the ATmega168’s SPI hardware, starting at page 162.

The pin definitions, being lashed to internal hardware, are not optional. Note that SCK is also the standard Arduino LED, which won’t be a problem unless you need a tremendous amount of drive for a zillion ‘595s. I stuck an additional LED on Arduino digital pin 2.

#define PIN_HEARTBEAT     2             // added LED
#define PIN_SCK          13             // SPI clock (also Arduino LED!)
#define PIN_MISO         12             // SPI data input
#define PIN_MOSI         11             // SPI data output

Initial hardware setup goes in the usual setup() function:

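pinMode(PIN_SCK,OUTPUT);       // set up for "manual" SPI directions
pinMode(PIN_MOSI,OUTPUT);      // master mode does not force MOSI to be an output
pinMode(PIN_MISO,INPUT);       // configure inputs

// SS (Arduino digital pin 10) must be an output or held HIGH, or the SPI hardware drops into Slave mode

SPCR = B01110001;              // Auto SPI: no int, enable, LSB first, master, + edge, leading, f/16
SPSR = B00000000;              // not double data rate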

Basically, the “manual” pin directions let you wiggle the SCK and MOSI bits by hand with digitalWrite() whenever the hardware SPI control is disabled, that is, when the SPE bit in SPCR is cleared.
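To see where the B01110001 value comes from, it may help to build it from named fields. The bit positions below are the ones given for SPCR in the ATmega168 data sheet; the SpiControl struct and makeSPCR() are hypothetical helpers written for this illustration, not part of the project code.

```cpp
#include <stdint.h>

// One field per SPCR bit, named for what a set bit means.
struct SpiControl {
  bool interruptEnable;    // SPIE, bit 7
  bool spiEnable;          // SPE,  bit 6
  bool lsbFirst;           // DORD, bit 5
  bool master;             // MSTR, bit 4
  bool clockIdlesHigh;     // CPOL, bit 3
  bool sampleOnTrailing;   // CPHA, bit 2
  uint8_t rateSelect;      // SPR1:SPR0, bits 1..0
};

// Pack the fields into the byte that would be written to SPCR.
inline uint8_t makeSPCR(const SpiControl &c) {
  return (uint8_t)((c.interruptEnable  << 7) |
                   (c.spiEnable        << 6) |
                   (c.lsbFirst         << 5) |
                   (c.master           << 4) |
                   (c.clockIdlesHigh   << 3) |
                   (c.sampleOnTrailing << 2) |
                   (c.rateSelect & 0x03));
}
```

Filling in no interrupt, enabled, LSB first, master, clock idling low, leading-edge sampling, and rate select 1 yields 0x71, the same bit pattern as B01110001. Clearing spiEnable gives 0x31, which is the state left behind after the SPE bit is cleared.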

Arduino Hardware SPI Schematic

Here’s a chunk of the schematic so you can see how the bits rattle around.

I put the data in a structure that matches the shift register layout, with the first byte (Controls) connected to the ATmega’s MOSI pin and the last byte (DataIn) connected to MISO. The SCK pin drives all of the serial clock pins on the ‘595 and ‘166 chips in parallel. Your structure will certainly be different; this was intended to suck data from a Tek 492 Spectrum Analyzer memory board.

typedef struct {      // external hardware shift register layout
 byte Controls;       // assorted control bits
 word Address;        // address value
 byte DataOut;        // output to external devices
 byte DataIn;         // input from external devices
} SHIFTREG;

SHIFTREG Outbound;    // bits to be shifted out
SHIFTREG Inbound;     // bits as shifted back in

The functions that make it happen are straightforward:

void TogglePin(char bitpin) {
 digitalWrite(bitpin,!digitalRead(bitpin));   // flip the pin from its current state
}

void PulsePin(char bitpin) {
 TogglePin(bitpin);                        // two toggles make one pulse
 TogglePin(bitpin);
}

void EnableSPI(void) {
 SPCR |= 1 << SPE;
}

void DisableSPI(void) {
 SPCR &= ~(1 << SPE);
}

void WaitSPIF(void) {
 while (! (SPSR & (1 << SPIF))) {
//        TogglePin(PIN_HEARTBEAT);       // use these for debugging!
//        TogglePin(PIN_HEARTBEAT);
 }
}

byte SendRecSPI(byte Dbyte) {             // send one byte, get another in exchange
 SPDR = Dbyte;
 WaitSPIF();                              // wait for the transfer to finish
 return SPDR;                             // SPIF will be cleared
}

void CaptureDataIn(void) {                // does not run the shift register!
 digitalWrite(PIN_ENABLE_SHIFT_DI,LOW);   // allow DI bit capture
 PulsePin(PIN_SCK);                       // latch parallel DI inputs
 digitalWrite(PIN_ENABLE_SHIFT_DI,HIGH);  // allow DI bit shifting
}

void RunShiftRegister(void) {
 EnableSPI();                             // turn on the SPI hardware

 Inbound.DataIn  = SendRecSPI(Outbound.DataIn);
 Inbound.DataOut = SendRecSPI(Outbound.DataOut);

 Inbound.Address  =         SendRecSPI(lowByte(Outbound.Address));
 Inbound.Address |= ((word) SendRecSPI(highByte(Outbound.Address))) << 8;

 Inbound.Controls = SendRecSPI(Outbound.Controls);

 PulsePin(PIN_LATCH_DO);                   // make new shift reg contents visible

 DisableSPI();                             // return to manual control
}

Actually using the thing is also straightforward. Basically, you put the data-to-be-sent in the Outbound variables and call RunShiftRegister(), which drops output bytes into SPDR, yanks incoming bytes out, and stuffs them in the Inbound variables. I have separate latch controls for the Controls, Address, and Data chips, although I don’t use them separately here.
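A byte-level model shows why RunShiftRegister() sends the bytes in reverse order, DataIn first and Controls last: each exchange pushes the earlier bytes one stage further from MOSI. The ChainModel below is purely illustrative and assumes a five-stage chain laid out as Controls, Address high, Address low, DataOut, DataIn when counted from the MOSI end; your chain will differ.

```cpp
#include <stdint.h>
#include <stddef.h>

const size_t CHAIN_BYTES = 5;   // Controls, Address high, Address low, DataOut, DataIn

// stage[0] is the register nearest MOSI, stage[CHAIN_BYTES - 1] feeds MISO.
struct ChainModel {
  uint8_t stage[CHAIN_BYTES];

  // One SPI byte exchange: the byte at the MISO end falls out, everything
  // moves one stage along, and the new byte lands in the stage nearest MOSI.
  uint8_t exchange(uint8_t out) {
    uint8_t in = stage[CHAIN_BYTES - 1];
    for (size_t i = CHAIN_BYTES - 1; i > 0; --i) {
      stage[i] = stage[i - 1];
    }
    stage[0] = out;
    return in;
  }
};
```

After five exchanges the first byte sent sits in the stage nearest MISO and the last byte sent sits nearest MOSI, while the five bytes returned arrive starting with whatever was in the MISO-end stage. That is why the first value returned is stored in Inbound.DataIn.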

You must wiggle the parallel latch enable line on the 74HC166 chip before shifting to capture the data, as shown in CaptureDataIn(). That chip also requires a separate pulse on its serial clock line to latch the data, which you do manually with the hardware SPI disabled. If you’re paying attention, you’ll wonder if that clock pulse also screws up the data in the rest of the chips: yes, it does. If this is a problem, you must add some external clock-gating circuitry, disable the ‘595s, or pick a different input shift register chip; it wasn’t a problem for what I was doing.
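To see what that stray clock pulse does to a ‘595, here is a minimal model based on my reading of the 74HC595 data sheet: the serial clock moves bits through an internal shift stage, and only the latch clock (RCK) copies that stage to the output pins. Model595 and its member names are hypothetical, and treating QA as bit 0 is just a convention chosen for the sketch.

```cpp
#include <stdint.h>

// One 74HC595: a shift stage clocked by the serial clock, plus a storage
// stage that drives the output pins and is loaded only by the latch clock.
struct Model595 {
  uint8_t shiftStage;
  uint8_t outputs;

  // Rising edge on the serial clock: shift one position, taking in serialIn.
  void clockSerial(bool serialIn) {
    shiftStage = (uint8_t)((shiftStage << 1) | (serialIn ? 1 : 0));
  }

  // Rising edge on RCK: copy the shift stage to the output pins.
  void clockLatch() {
    outputs = shiftStage;
  }
};
```

In this model one extra serial clock scrambles the shift stage but leaves the pins alone until the next latch pulse. If the whole chain is reloaded before that latch pulse, as RunShiftRegister() does, the damage never becomes visible, which is presumably why it caused no trouble here.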

Here’s a function that reads data from a RAM chip on the Tek memory board, so it must write the address and read the RAM chip’s output. The PIN_DISABLE_DO bit controls the output buffers on the ‘595 that drives the RAM’s data pins; they must be disabled to read data back from the RAM. Don’t worry about the other undefined bits & suchlike; just assume everything does what the comments would have you believe.

byte ReadRAM(word Address) {
 digitalWrite(PIN_DISABLE_DO,HIGH);            // turn off data latch output
 digitalWrite(PIN_BUS_READ,HIGH);              // allow RAM read access

 Outbound.Controls |=  CB_BUS_CLKPH2_MASK;     // set up RAM -CS gate
 Outbound.Address = Address;
 Outbound.DataOut = 0x55;                      // should not be visible
 RunShiftRegister();                           // first run: put address & controls on the board

 digitalWrite(PIN_BUS_N_SYSRAM,LOW);           // activate RAM -CS
 CaptureDataIn();                              // latch RAM data
 digitalWrite(PIN_BUS_N_SYSRAM,HIGH);          //  ... and turn -CS off

 Outbound.Controls &= ~CB_BUS_CLKPH2_MASK;     // disable -CS gate
 RunShiftRegister();                           // tell the board and get data

 return Inbound.DataIn;
}

Hardware SPI - Detail of clock and data timing

Here’s a detailed shot of the outbound bit timing. Notice that the upward clock transitions shift bits into the ‘595 and ‘166 chips, while the SPI output data changes on the downward transitions. You can tweak that to match your hardware if you’re using different shift register chips, by messing with the SPCR settings.
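The SPCR bits that control those edges are CPOL (bit 3) and CPHA (bit 2). As a sketch of the four combinations, going by the data sheet's description of the SPI data modes: the input is sampled on the rising SCK edge whenever CPOL and CPHA are equal, and on the falling edge otherwise. The helper names are invented for this example.

```cpp
#include <stdint.h>

// True when data is sampled on the rising SCK edge for the given mode bits.
inline bool samplesOnRisingEdge(bool cpol, bool cpha) {
  return cpol == cpha;
}

// The same question asked of a complete SPCR value.
inline bool samplesOnRisingEdgeFromSPCR(uint8_t spcr) {
  bool cpol = ((spcr >> 3) & 1) != 0;   // CPOL is bit 3
  bool cpha = ((spcr >> 2) & 1) != 0;   // CPHA is bit 2
  return samplesOnRisingEdge(cpol, cpha);
}
```

The B01110001 setting used here has both bits clear, so sampling happens on rising edges and the output changes on falling edges, matching the scope shot. Chips that shift on the falling edge would want exactly one of the two bits set.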

Bottom line: using the ATmega168 hardware SPI provided a factor-of-15 speedup and serial digital input, too.

15 thoughts on “Arduino Hardware-assisted SPI: Synchronous Serial Data I/O”

  1. Wow, that’s an incredibly helpful post! Thank you!
    I’m in the process of putting together a persistence of vision-style project using LED strips with shift registers, and I think this info will have upped my color resolution by a factor of 10 or so. Brilliant!

    1. Excellent! Glad to help…

      The SPI information isn’t exactly a secret, but until you spend some Quality Time with the hardware data book, it’s really hard to get it working.

      Beware the gotcha with inadvertently setting Slave mode; see the cross-link in Comment 2 for more details. I stumbled hard over that one…

  2. Very interesting paper – thanks. Do you think it would be possible to use the SPI interface to read a stream of high speed data say clocked at 1MHz and push the data out through the USB port?

    best wishes

    1. Definitely not…

      clocked at 1MHz

      The SPI can run at a blistering pace, but the software plucking the incoming bytes from the hopper isn’t all that fast: the overall rate won’t be nearly as high as you’d expect. A 1 MHz bit rate means a byte arrives every 8 µs, which leaves a 16 MHz processor only 128 clock cycles between bytes!

      push the data out through the USB port

      Definitely not. Remember that the FT232 chip is pretending to be a serial port, so you get a start/stop bit pair wrapped around each byte. Not only is the outgoing data rate limited to (I think) 115.2 kb/s, the actual data rate is much slower.

      I think an Arduino is the wrong hammer for that job. In fact, I’m not sure any microcontroller is the right hammer; there just isn’t enough CPU to go around without some external hardware assistance. And, when you need external hardware, a lot of the motivation for using a single-chip microcontroller evaporates…

  3. Thanks for the prompt answer! I thought that I would be pushing my luck using an Arduino. I’m new to all this digital electronics so to a large extent I am feeling my way; there are so many bits of kit which might do the job from custom chip sets to CPLDs and FPGAs – my head hurts.
    “when you need external hardware, a lot of the motivation for using a single-chip microcontroller evaporates…” – my motivation is cost! :-)

    If you can think of a hammer for 1MHz Manchester encoded data please let me know.

    best wishes

    1. 1MHz Manchester encoded data

      I think the bottleneck is on the output side: you’re transliterating a bit stream and trying to stuff it into a USB-shaped hole. You’d be best off looking at the microcontrollers with built-in USB and good support for isochronous transfers.

      This definitely isn’t going to be well-handled by a chip that’s supposed to look like a mouse or a keyboard!

  4. Things have changed a little in the last year. Try a MIPS32 processor running at 80 MHz with built in USB2.0 (OTG) as well as an Ethernet interface (MII output, so it needs an external PHY for 10/100 Base-T), the Microchip PIC32. Several built in SPI ports, serial ports, USB, Ethernet, A/D and more and fairly inexpensive. Great chip for use with an FPGA coprocessor, but beware the parallel port. Nothing wrong with it except it’s a “peripheral” and you have to use a dummy read for inputs as each read gets data from the previous address.

    1. Microchip PIC32

      Now, that is an example of excellent marketing: a new member of the family having absolutely nothing in common with the “original” PIC architecture other than the name!

      Sounds like a good hammer for the job, except for one thing: it lacks the whole Arduino vibe. As we all know, in this day & age you can’t do an embedded system without using an Arduino… [grin]

      I expect the Arduino project to come out with something like that in a few years, as people start bumping into the upper limits of what the ATmega series can handle. Only time will tell.

      1. Indeed, indeed. I have been a fan of Atmel’s 8051 products for years and can see them expanding their portfolio to cover more ground via the AVR family the Arduino is based on. The PIC32 is more suited for Simon’s problem with high speed SPI for streaming applications, anyway. Still waiting for a decent 200MHz plus 8051 flavor myself :P

        1. Amazing how those ancient architectures from back in the days when transistors were a countably finite resource have such staying power. Of course, the fact that the entire die is barely visible allows us to have 50-cent microcontrollers in 8-pin packages, which you couldn’t do with a fancier architecture.

          The first step towards computronium will probably be a Pentium with 64 MB of RAM in a speck the size of a grain of salt…
