The OLED display has a noticeable delay between writing the first (double-size) line of text and the last line, which seemed odd:

The top trace in this scope shot goes high while the code performs the display update, which involves converting the variables to strings, the characters to bitmaps, then writing the data to the display:

The bottom trace shows that I²C bus activity soaks up nearly all of that time, with very little needed for the computations between the display writes for each text line.
Near the leading edge of the top trace, the code computes the new delta phase value and the X axis DAC output corresponding to that frequency:

    TestCount.fx_64 = MultiplyFixedPt(ScanFreq,CtPerHz);     // compute DDS delta phase
    TestCount.fx_32.low = 0;                                 // truncate count to integer
    TestFreq.fx_64 = MultiplyFixedPt(TestCount,HzPerCt);     // compute actual frequency
    Temp.fx_64 = (DAC_MAX * (ScanFreq.fx_64 - ScanFrom.fx_64)) / ScanWidth.fx_32.high;
    XAxisValue = Temp.fx_32.high;
    WriteDDS(TestCount.fx_32.high);                          // set DDS to new frequency
    XAxisDAC.setVoltage(XAxisValue,DAC_WR);                  // and set X axis to match

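For anyone reading that snippet without the rest of the program, here is a minimal host-side sketch of the arithmetic it seems to rely on. It assumes an unsigned 32.32 fixed-point format (integer part in the high word, fraction in the low word), which is what the fx_32.high / fx_32.low accesses suggest. The names Fixed3232, FromInteger, IntegerPart, TruncateToInteger, MulFixed, and XAxisCounts are hypothetical stand-ins, not the original definitions, and the sketch uses shifts rather than a union.

```cpp
#include <cstdint>

// Hypothetical 32.32 unsigned fixed-point value:
// upper 32 bits = integer part, lower 32 bits = fraction (an assumption).
using Fixed3232 = std::uint64_t;

constexpr Fixed3232 FromInteger(std::uint32_t n) {
    return static_cast<std::uint64_t>(n) << 32;
}

constexpr std::uint32_t IntegerPart(Fixed3232 x) {
    return static_cast<std::uint32_t>(x >> 32);
}

// Equivalent of zeroing the low word: discard the fraction.
constexpr Fixed3232 TruncateToInteger(Fixed3232 x) {
    return x & 0xFFFFFFFF00000000ULL;
}

// Multiply two 32.32 values with 32-bit partial products, keeping the
// middle 64 bits of the 128-bit product. Integer overflow wraps.
inline Fixed3232 MulFixed(Fixed3232 a, Fixed3232 b) {
    const std::uint64_t aHi = a >> 32, aLo = a & 0xFFFFFFFFULL;
    const std::uint64_t bHi = b >> 32, bLo = b & 0xFFFFFFFFULL;
    const std::uint64_t hiHi = aHi * bHi;   // weight 2^64 in the full product
    const std::uint64_t hiLo = aHi * bLo;   // weight 2^32
    const std::uint64_t loHi = aLo * bHi;   // weight 2^32
    const std::uint64_t loLo = aLo * bLo;   // weight 2^0
    // (a * b) >> 32, with the lowest 32 fraction bits dropped
    return (hiHi << 32) + hiLo + loHi + (loLo >> 32);
}

// Scale the position within the scan to DAC counts, mirroring the
// Temp / XAxisValue lines: dacMax * (freq - from) / width, integer part.
// dacMax is a parameter here because the original DAC_MAX value is not shown.
constexpr std::uint32_t XAxisCounts(Fixed3232 freq, Fixed3232 from,
                                    std::uint32_t widthHz, std::uint32_t dacMax) {
    return IntegerPart((dacMax * (freq - from)) / widthHz);
}
```

Called with a frequency halfway through a 1000 Hz scan and an assumed 12-bit full scale of 4095, XAxisCounts lands at half scale, with the fraction discarded just as taking fx_32.high does.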
The burst in the top trace shows the five SPI writes to the DDS (one pulse per byte, with the hardware handling the serialization) and the bottom trace shows four I²C bus writes to the DAC:

A bit more detail shows writing each I²C byte to the DAC requires nine clock pulses (8 data, 1 ack):

The I²C bus ticks along at 400 kHz, with each byte requiring 33.4 µs (including the mandatory downtime around each burst), so the DAC update requires about 100 µs. The MCP4725 datasheet suggests a three-byte “fast mode” write, but there’s not much point in doing so for my simple needs.
The display ticks along at the same pace with far more data.
In round numbers, the entire display update covers the equivalent of 6 single-height text lines (1 double-height + 4 single-height) × 16 characters / line × 64 pixels / character = 6144 pixels.
The first scope shot shows the update requires something close to 90 ms, which allows time for about 2700 bytes (90 ms / 33.4 µs), the equivalent of about 21 k pixels at eight pixels per byte. The SH1106 hardware includes an internal address counter, so there’s no need to transfer an address with each byte; I’m not sure where the overhead, better than a factor of three on these numbers, goes.
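As a sanity check on those round numbers, the arithmetic can be written out as a small sketch using only the figures quoted above: a 400 kHz clock, 33.4 µs per byte, a 90 ms update, and 6 × 16 × 64 pixels. The function and constant names are made up for illustration. Assuming one bit per pixel, the bus could have carried roughly 3.5 times the pixels actually drawn.

```cpp
#include <cstdint>

constexpr double kBusClockHz   = 400e3;  // I2C clock rate from the text
constexpr double kByteTimeUs   = 33.4;   // measured time per byte, gaps included
constexpr double kUpdateTimeMs = 90.0;   // measured display update duration

// Ideal time for one byte on the wire: 8 data bits + 1 ack, in microseconds.
constexpr double IdealByteTimeUs() {
    return 9.0 / kBusClockHz * 1e6;
}

// Number of bytes that fit in the measured update time.
constexpr double BytesInUpdate() {
    return kUpdateTimeMs * 1000.0 / kByteTimeUs;
}

// Pixels in the text area: line equivalents x characters x pixels per character.
constexpr std::uint32_t PixelsDrawn() {
    return 6 * 16 * 64;
}

// Ratio of pixels the bus could have carried (8 per byte, assuming a
// one-bit-per-pixel display) to pixels actually drawn.
constexpr double OverheadRatio() {
    return BytesInUpdate() * 8.0 / PixelsDrawn();
}
```

The ideal byte time comes out well under the measured 33.4 µs, so the gaps between bytes account for about a third of the bus time by themselves.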
Getting a faster update will definitely require lazy screen updates: no writes when there’s no change.
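A lazy update could look something like the following sketch, which remembers the text last written to each row and skips the bus traffic when nothing changed. Everything here is hypothetical: the LazyDisplay class, the DrawRowFn callback standing in for the real display write, and the CountingDraw stub that only counts calls. It also assumes five rows of at most 16 characters and a display that starts out blank.

```cpp
#include <cstddef>
#include <cstring>

constexpr std::size_t kRows = 5;    // 1 double-height + 4 single-height rows (assumed)
constexpr std::size_t kCols = 16;   // characters per row, per the text

// Hypothetical callback type: writes one row of text to the real display.
using DrawRowFn = void (*)(std::size_t row, const char *text);

class LazyDisplay {
public:
    explicit LazyDisplay(DrawRowFn draw) : draw_(draw) {
        for (auto &line : shown_) line[0] = '\0';   // assume a blank display
    }

    // Write the row only if its text differs from what is already shown.
    // Returns true when a display write actually happened.
    bool Update(std::size_t row, const char *text) {
        if (row >= kRows) return false;
        if (std::strncmp(shown_[row], text, kCols) == 0) return false;
        std::strncpy(shown_[row], text, kCols);     // remember the new text
        shown_[row][kCols] = '\0';                  // strncpy may not terminate
        draw_(row, shown_[row]);
        return true;
    }

private:
    DrawRowFn draw_;
    char shown_[kRows][kCols + 1];
};

// Stand-in for the real display write: counts calls instead of using the bus.
static int drawCount = 0;
static void CountingDraw(std::size_t, const char *) { ++drawCount; }
```

Constructing LazyDisplay with CountingDraw and sending the same string to a row twice bumps drawCount only once. Comparing 16 characters in RAM costs far less than pushing a row of bitmaps across the bus at 33.4 µs per byte.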
This probably doesn’t matter, because I can’t watch much faster, but it’s good to know the fancy fixed-point arithmetic isn’t the limiting factor.