Arduino Hardware-assisted SPI: Synchronous Serial Data I/O

Many interesting projects require more digital output bits than the Arduino hardware can support. The usual fix is a string of 74HC595 serial-in/parallel-out shift registers; the Arduino ShiftOut tutorial pretty well explains how that works. The shiftOut() library function squirts a byte out through an arbitrary pin, leading with either the high or low bit.

Software SPI: Clock and Data

Just drop one byte into shiftOut() for each ‘595 lined up on your board. Remember to latch the bits (LOW-to-HIGH on RCK @ pin 12 of the ‘595) and enable the output drivers (LOW on -G @ pin 13, similarly) when you’re done sending everything. You can have separate latches-and-enables for each ‘595 if that suits your needs, although then you once again run out of Arduino bits pretty quickly. It’s entirely possible to devote a ‘595 to latches-and-enables for the rest of the chain, but that gets weird in short order.
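To see why the latch and enable steps matter, here is a minimal host-side model of a single '595. It is only a sketch: the HC595 struct and shiftByteMsbFirst() are hypothetical names invented for illustration, and the model ignores timing entirely. It shows that shifted bits stay invisible until RCK latches them and -G enables the drivers.

```cpp
#include <cstdint>

// Host-side model of one 74HC595 (hypothetical names, for illustration only).
struct HC595 {
    uint8_t shiftReg = 0;           // internal shift register, never visible on the pins
    uint8_t storage  = 0;           // storage register, loaded on the RCK rising edge
    bool    outputEnabled = false;  // true when -G is held LOW

    // One rising edge on the serial clock: shift in one bit and
    // return the bit that falls out of the serial output (QH').
    bool clockIn(bool bit) {
        bool carry = (shiftReg & 0x80) != 0;
        shiftReg = static_cast<uint8_t>((shiftReg << 1) | (bit ? 1 : 0));
        return carry;
    }

    // LOW-to-HIGH on RCK copies the shift register into the storage register.
    void latch() { storage = shiftReg; }

    // Value on the output pins, or -1 while the drivers are tri-stated.
    int pins() const { return outputEnabled ? storage : -1; }
};

// Shift one byte in, high bit first, as shiftOut() does with MSBFIRST.
void shiftByteMsbFirst(HC595 &chip, uint8_t value) {
    for (int i = 7; i >= 0; --i) {
        chip.clockIn(((value >> i) & 1) != 0);
    }
}
```

After shiftByteMsbFirst(chip, 0xA5) the model's pins() still reports the old contents; only latch() followed by enabling the outputs makes 0xA5 appear.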

The scope shot shows that shiftOut() ticks along at 15 µs per bit (clock in the upper trace, data in the lower trace). For back-of-the-envelope purposes, call it 8 kB/s, which is probably less than you expected. If you have a string of 5 external bytes, as I did on a recent project, that’s only 1600 updates / second. It was part of a memory board reader & EPROM programmer: reading an 8 kB ROM chip requires two shift-register runs (one to set the address & data, one to read in the chip output), so the overall rate was on the order of 10 seconds per pass and much worse for programming. You can optimize the number of bits by not shifting out all the bytes, but that’s the general idea.
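The arithmetic behind those numbers is easy to check. This sketch assumes nothing beyond the figures quoted above (15 µs per bit, 5 bytes in the chain, 8192 locations, two shift-register runs per location); the function names are made up for the example, and the result ignores all software overhead outside the shifting itself.

```cpp
// Back-of-the-envelope shift register timing (hypothetical helper names).

// Bytes shifted per second at a given bit time in microseconds.
double bytesPerSecond(double usPerBit) {
    return 1e6 / (usPerBit * 8.0);
}

// Complete updates of a chain of chainBytes shift registers per second.
double updatesPerSecond(double usPerBit, int chainBytes) {
    return bytesPerSecond(usPerBit) / chainBytes;
}

// Time to walk through every location, with several chain updates per location.
double secondsPerPass(double usPerBit, int chainBytes, long locations, int runsPerLocation) {
    return static_cast<double>(locations) * runsPerLocation
           / updatesPerSecond(usPerBit, chainBytes);
}
```

With 15 µs per bit this gives about 8333 bytes/s, about 1667 updates/s for a 5-byte chain, and about 9.8 seconds for secondsPerPass(15.0, 5, 8192, 2), in line with the rounded figures in the text.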

Because ‘595 chips are output-only, getting 8 bits of data into the Arduino board means adding a 74HC166 parallel-in/serial-out chip to the string. Alas, shiftOut() doesn’t know about input bits, so you’re on your own.

Hardware SPI: Clock and Data

If you’re going to have to write some code to get input bits anyway, you may as well use the ATmega168 (and its ilk) hardware SPI as it was intended to be used: for high-speed synchronous serial I/O. This scope shot shows the SPI clock (in the top trace again) ticking along at 1 µs per bit, which is 1/16 the Diecimila’s oscillator frequency. You can pick any power of two between 1/2 and 1/128; I used 1/16 because it’s fast enough to make the rest of the software the limiting factor, while slow enough to not require much attention to layout & so forth.
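The divider comes from three bits: SPR1 and SPR0 in SPCR plus SPI2X in SPSR. The sketch below tabulates that selection as I read it from the ATmega168 datasheet; the function names are hypothetical, and the 16 MHz oscillator in the usage note is the Diecimila value implied by the 1 µs bit time above.

```cpp
// SCK divider chosen by SPI2X (in SPSR) and SPR1:SPR0 (in SPCR).
// Hypothetical helper; the table follows the ATmega168 datasheet.
unsigned spiClockDivider(bool spi2x, bool spr1, bool spr0) {
    static const unsigned base[4] = {4, 16, 64, 128};   // indexed by SPR1:SPR0
    unsigned divider = base[(spr1 ? 2 : 0) | (spr0 ? 1 : 0)];
    return spi2x ? divider / 2 : divider;               // SPI2X doubles the clock rate
}

// Resulting bit time in microseconds for a given oscillator frequency in Hz.
double sckMicrosecondsPerBit(double fOscHz, bool spi2x, bool spr1, bool spr0) {
    return 1e6 * spiClockDivider(spi2x, spr1, spr0) / fOscHz;
}
```

For example, sckMicrosecondsPerBit(16e6, false, false, true) selects f/16 and works out to 1 µs per bit, matching the scope shot.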

Start by Reading The Fine Manual section about the ATmega168’s SPI hardware, starting at page 162.

The pin definitions, being lashed to internal hardware, are not optional. Note that SCK is also the standard Arduino LED, which won’t be a problem unless you need a tremendous amount of drive for a zillion ‘595s. I stuck an additional LED on Arduino digital pin 2.

#define PIN_HEARTBEAT     2             // added LED
#define PIN_SCK          13             // SPI clock (also Arduino LED!)
#define PIN_MISO         12             // SPI data input
#define PIN_MOSI         11             // SPI data output

Initial hardware setup goes in the usual setup() function:

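pinMode(PIN_SCK,OUTPUT);       // set up for "manual" SPI directions
pinMode(PIN_MOSI,OUTPUT);      // master mode does not force MOSI to be an output

pinMode(PIN_MISO,INPUT);       // configure inputs

                               // SS (digital pin 10) must be an output or held HIGH to stay in master mode
SPCR = B01110001;              // Auto SPI: no int, enable, LSB first, master, + edge, leading, f/16
SPSR = B00000000;              // not double data rate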

Basically, the “manual” setup allows you to wiggle the bits by hand with the hardware SPI control disabled.
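If the binary constant for SPCR looks opaque, it can be rebuilt from named bit positions. This is a host-side sketch, so the positions are spelled out with a BIT_ prefix (hypothetical names, chosen so they don't collide with the SPE, DORD, and similar macros that avr-libc provides on the real chip); the positions themselves come from the ATmega168 datasheet.

```cpp
#include <cstdint>

// SPCR bit positions per the ATmega168 datasheet (BIT_ prefix is hypothetical).
constexpr uint8_t BIT_SPR0 = 0;   // clock rate select, low bit
constexpr uint8_t BIT_SPR1 = 1;   // clock rate select, high bit
constexpr uint8_t BIT_CPHA = 2;   // clock phase
constexpr uint8_t BIT_CPOL = 3;   // clock polarity
constexpr uint8_t BIT_MSTR = 4;   // master mode
constexpr uint8_t BIT_DORD = 5;   // data order: 1 = LSB first
constexpr uint8_t BIT_SPE  = 6;   // SPI enable
constexpr uint8_t BIT_SPIE = 7;   // SPI interrupt enable

// Enable, LSB first, master, CPOL = 0, CPHA = 0, f/16, no interrupt.
constexpr uint8_t spcrValue() {
    return static_cast<uint8_t>((1u << BIT_SPE) | (1u << BIT_DORD) |
                                (1u << BIT_MSTR) | (1u << BIT_SPR0));
}
```

spcrValue() evaluates to 0x71, which is the same bit pattern as the B01110001 constant in the setup code.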

Arduino Hardware SPI Schematic

Here’s a chunk of the schematic so you can see how the bits rattle around.

I put the data in a structure that matches the shift register layout, with the first byte (Controls) connected to the ATmega’s MOSI pin and the last byte (DataIn) connected to MISO. The SCK pin drives all of the serial clock pins on the ‘595 and ‘166 chips in parallel. Your structure will certainly be different; this was intended to suck data from a Tek 492 Spectrum Analyzer memory board.

typedef struct {      // external hardware shift register layout
 byte Controls;       // assorted control bits
 word Address;        // address value
 byte DataOut;        // output to external devices
 byte DataIn;         // input from external devices
} SHIFTREG;

SHIFTREG Outbound;    // bits to be shifted out
SHIFTREG Inbound;     // bits as shifted back in

The functions that make it happen are straightforward:

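void TogglePin(char bitpin) {
 digitalWrite(bitpin,!digitalRead(bitpin));   // flip the pin from its current output state
}

void PulsePin(char bitpin) {
 TogglePin(bitpin);                        // two toggles make one pulse
 TogglePin(bitpin);
}

void EnableSPI(void) {
 SPCR |= 1 << SPE;
}

void DisableSPI(void) {
 SPCR &= ~(1 << SPE);
}

void WaitSPIF(void) {
 while (! (SPSR & (1 << SPIF))) {          // spin until the transfer-complete flag sets
//        TogglePin(PIN_HEARTBEAT);       // use these for debugging!
//        TogglePin(PIN_HEARTBEAT);
 }
}

byte SendRecSPI(byte Dbyte) {             // send one byte, get another in exchange
 SPDR = Dbyte;                            // writing SPDR starts the transfer
 WaitSPIF();                              // wait for all eight bits to shift
 return SPDR;                             // SPIF will be cleared
}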

void CaptureDataIn(void) {                // does not run the shift register!
 digitalWrite(PIN_ENABLE_SHIFT_DI,LOW);   // allow DI bit capture
 PulsePin(PIN_SCK);                       // latch parallel DI inputs
 digitalWrite(PIN_ENABLE_SHIFT_DI,HIGH);  // allow DI bit shifting
}

void RunShiftRegister(void) {
 EnableSPI();                             // turn on the SPI hardware

 Inbound.DataIn  = SendRecSPI(Outbound.DataIn);
 Inbound.DataOut = SendRecSPI(Outbound.DataOut);

 Inbound.Address  =         SendRecSPI(lowByte(Outbound.Address));
 Inbound.Address |= ((word) SendRecSPI(highByte(Outbound.Address))) << 8;

 Inbound.Controls = SendRecSPI(Outbound.Controls);

 PulsePin(PIN_LATCH_DO);                   // make new shift reg contents visible

 DisableSPI();                             // return to manual control
}

Actually using the thing is also straightforward. Basically, you put the data-to-be-sent in the Outbound variables and call RunShiftRegister(), which drops output bytes into SPDR, yanks incoming bytes out, and stuffs them in the Inbound variables. I have separate latch controls for the Controls, Address, and Data chips, although I don’t use them separately here.
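The byte order in RunShiftRegister() makes more sense with a picture of the chain in mind. Here is a byte-wide host-side model, a simplification that ignores bit order and treats all five chips as one uniform shift register; ShiftChain and exchange() are hypothetical names, with exchange() standing in for one SendRecSPI() call.

```cpp
#include <cstddef>
#include <cstdint>

// Byte-wide model of the external shift register chain (hypothetical names).
// Real chips shift one bit per clock; this moves whole bytes to show ordering only.
struct ShiftChain {
    static const std::size_t LENGTH = 5;
    uint8_t stage[LENGTH];   // stage[0] sits nearest MOSI, stage[LENGTH - 1] feeds MISO

    // One byte exchange: the byte at the MISO end comes back,
    // everything moves one stage down the chain, and the new byte enters at MOSI.
    uint8_t exchange(uint8_t out) {
        uint8_t in = stage[LENGTH - 1];
        for (std::size_t i = LENGTH - 1; i > 0; --i) {
            stage[i] = stage[i - 1];
        }
        stage[0] = out;
        return in;
    }
};
```

Sending the DataIn placeholder first and Controls last leaves Controls in the stage nearest MOSI, matching the structure layout, while the very first byte that comes back is whatever the input chip at the MISO end had captured.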

You must wiggle the parallel latch enable line on the 74HC166 chip before shifting to capture the data, as shown in CaptureDataIn(). That chip also requires a separate pulse on its serial clock line to latch the data, which you do manually with the hardware SPI disabled. If you’re paying attention, you’ll wonder if that clock pulse also screws up the data in the rest of the chips: yes, it does. If this is a problem, you must add some external clock-gating circuitry, disable the ‘595s, or pick a different input shift register chip; it wasn’t a problem for what I was doing.

Here’s a function that reads data from a RAM chip on the Tek memory board, so it must write the address and read the RAM chip’s output. The PIN_DISABLE_DO bit controls the output buffers on the ‘595 that drives the RAM’s data pins; they must be disabled to read data back from the RAM. Don’t worry about the other undefined bits & suchlike; just assume everything does what the comments would have you believe.

byte ReadRAM(word Address) {
 digitalWrite(PIN_DISABLE_DO,HIGH);            // turn off data latch output
 digitalWrite(PIN_BUS_READ,HIGH);              // allow RAM read access

 Outbound.Controls |=  CB_BUS_CLKPH2_MASK;     // set up RAM -CS gate
 Outbound.Address = Address;
 Outbound.DataOut = 0x55;                      // should not be visible

 digitalWrite(PIN_BUS_N_SYSRAM,LOW);           // activate RAM -CS
 CaptureDataIn();                              // latch RAM data
 digitalWrite(PIN_BUS_N_SYSRAM,HIGH);          //  ... and turn -CS off

 Outbound.Controls &= ~CB_BUS_CLKPH2_MASK;     // disable -CS gate
 RunShiftRegister();                           // tell the board and get data

 return Inbound.DataIn;
}

Hardware SPI - Detail of clock and data timing

Here’s a detailed shot of the outbound bit timing. Notice that the upward clock transitions shift bits into the ‘595 and ‘166 chips, while the SPI output data changes on the downward transitions. If you’re using different shift register chips, you can tweak that to match your hardware by changing the CPOL and CPHA bits in SPCR.
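The relationship between the clock polarity and phase bits and the active edges can be written down compactly. This is a sketch of the standard SPI mode rules as described in the ATmega168 datasheet, with hypothetical names (Edge, sampleEdge, setupEdge); it is a host-side illustration, not code for the chip.

```cpp
enum Edge { RISING_EDGE, FALLING_EDGE };

// Edge on which incoming data is sampled for a given CPOL / CPHA setting.
// CPOL = 0 idles the clock low, so the leading edge is rising;
// CPHA = 0 samples on the leading edge, CPHA = 1 on the trailing edge.
Edge sampleEdge(bool cpol, bool cpha) {
    return (cpol == cpha) ? RISING_EDGE : FALLING_EDGE;
}

// Edge on which the output data changes: always the opposite of the sampling edge.
Edge setupEdge(bool cpol, bool cpha) {
    return (cpol == cpha) ? FALLING_EDGE : RISING_EDGE;
}
```

With CPOL = 0 and CPHA = 0, as in the SPCR value used here, data changes on falling edges and is stable on rising edges, which is what shift registers that clock on the rising edge want.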

Bottom line: using the ATmega168 hardware SPI provided a factor-of-15 speedup and serial digital input, too.