Inserting a few simple floating-point operations between the SPI transfers provides a quick-n-dirty look at the timings.

The corresponding code runs in the ADC end-of-conversion handler:
```cpp
void adc0_isr(void) {
  digitalWriteFast(ANALOG_PIN, HIGH);
  AnalogSample = adc->readSingle();       // fetch just-finished sample

  SPI.beginTransaction(SPISettings(8000000, MSBFIRST, SPI_MODE0));
  digitalWriteFast(DDS_FQUD_PIN, LOW);

  SPI.transfer(DDSBuffer.Phase);          // interleave with FM calculations
  FlipPin(GLITCH_PIN);
  TestFreq += DDSStepFreq;
  FlipPin(GLITCH_PIN);

  SPI.transfer(DDSBuffer.Bits31_24);
  TestFreq -= DDSStepFreq;

  SPI.transfer(DDSBuffer.Bits23_16);
  TestFreq *= DDSStepFreq;

  SPI.transfer(DDSBuffer.Bits15_8);
  FlipPin(GLITCH_PIN);
  TestFreq /= DDSStepFreq;
  FlipPin(GLITCH_PIN);

  SPI.transfer(DDSBuffer.Bits7_0);
  SPI.endTransaction();                   // do not raise FQ_UD until next timer tick!

  digitalWriteFast(ANALOG_PIN, LOW);
}
```
The FlipPin() function twiddling the output bit takes a surprising amount of time, as shown by the first two gaps in the blocks of SPI clocks (D4). Some cursor fiddling at a zoomed scale puts each call at about 300 ns, call it 50-ish cycles. In round numbers, actual code doing useful work will take longer than that.
Double-precision floating-point add, subtract, and multiply each seem to take about 600 ns. That’s entirely survivable if you don’t get carried away.
Double-precision division, on the other paw, eats up 3 μs, so it’s not something you want to casually plunk into an interrupt handler that must finish before the next audio sample arrives in 20 μs.
Overall, the CPU utilization seems way too high for comfort, mostly due to the SPI transfers, even without any computation. I must study the SPI-by-DMA examples to see if it’s a win.