Thing-O-Matic: Thermal Runaway!

Well, it finally happened: the Extruder Controller jammed the extruder heater full on and the Thermal Core temperature went on an uncontrolled rise.

This changes the thermal runaway scenario from “It can’t happen here” to “Once is happenstance”.

I’d been tweaking an OpenSCAD model, slicing it, not liking the results shown in Skeinlayer, re-tweaking, re-slicing, and iterating around that loop for quite some time. When I figured I was close to having a good model & G-Code, I turned on the heaters to get the printer ready; heating from a cold start requires about 12 minutes due to the double aluminum build plates and hulking cartridge heater adapters.

I made several (well, many) more iterations through OpenSCAD and slicing, flipped to the Control Panel, and discovered the Thermal Core temperature was passing through 285 °C on the way up. Now, the temperature doesn’t rise abruptly, but it was already far higher than the original 210 °C setpoint; you cannot set an Extruder temperature over 260 °C (!) without passing through a confirmation dialog.

Oddly, the setpoint temperatures for both the Extruder and HBP showed 9 °C. The build platform was cooling off, as you would expect for a setpoint far below the actual temperature, but the Thermal Core heater was jammed on.

An LED indicator on the Z stage shows when the EC switches the cartridge heater on. That LED was lit, so I knew the EC had gone nuts: the heater was on, even though the Thermal Core temperature was far above the setpoint.
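The LED made the diagnosis easy: the heater was on while the temperature sat far above the setpoint. That condition is simple to express in code. The following is a minimal sketch, assuming a controller that knows its setpoint, its measured temperature, and its own heater output. The function name and the 15 °C margin are hypothetical and do not come from the EC firmware. A check like this, running on the same microcontroller that just went nuts, can't be trusted to catch its own failure, which is why a hardware lockout matters.

```python
# Hypothetical runaway check; the margin is an illustrative value, not an EC firmware setting.
RUNAWAY_MARGIN_C = 15.0

def heater_is_runaway(setpoint_c: float, measured_c: float, heater_on: bool,
                      margin_c: float = RUNAWAY_MARGIN_C) -> bool:
    """True when the heater is driven even though the sensor reads well above the setpoint."""
    return heater_on and measured_c > setpoint_c + margin_c

# The situation described above: setpoint shown as 9 °C, core near 285 °C, heater LED lit.
print(heater_is_runaway(setpoint_c=9.0, measured_c=285.0, heater_on=True))    # True
# Normal regulation near the original 210 °C setpoint.
print(heater_is_runaway(setpoint_c=210.0, measured_c=212.0, heater_on=True))  # False
```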

I passed up an opportunity for Science by punching the Emergency Shutdown button on the Thermal Lockout: the Thing-O-Matic went dark. Turned it back on, reconnected RepG, and ran a few yards of smoking hot filament out of the nozzle. Yes, the Thermal Core really was that hot… I ran the filament drive at 25 rev/min for 10 seconds and it sprayed filament like crazy.

The Thermal Lockout had not tripped, very much as expected. Back when I was figuring out where to mount the thermal switches, those measurements suggested that the Thermal Core would probably exceed 325 °C before the 100 °C NC switch at the top of the Thermal Rise opened. The Core hadn’t gotten close to that temperature, but it was on the way!

The 40 °C NO switch glued to the base of the filament drive had long since closed and lit the yellow LED, but even I agree that’s an absurdly low temperature for such a warning. I have no idea what the actual temperatures were, but I’m thinking of putting a pair of thermocouples in the obvious spots.
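The two switches work in opposite senses, which makes them easy to confuse. Here is a simplified model using the nominal ratings from the text. It ignores hysteresis and trip-point tolerance, which real snap-action switches have. Each switch responds to the temperature at its own mounting point, not to the Thermal Core temperature. The function names are hypothetical.

```python
# Nominal trip points from the text; real switches have tolerance and hysteresis.
NC_CUTOFF_C = 100.0   # normally closed switch at the top of the Thermal Rise
NO_WARNING_C = 40.0   # normally open switch glued to the filament drive base

def cutoff_switch_closed(rise_temp_c: float) -> bool:
    """NC switch: conducts (heater allowed) below its rating, opens at or above it."""
    return rise_temp_c < NC_CUTOFF_C

def warning_switch_closed(drive_temp_c: float) -> bool:
    """NO switch: open when cool, closes (lighting the yellow LED) at its rating."""
    return drive_temp_c >= NO_WARNING_C

# Hypothetical readings: a warm filament drive lights the LED,
# but the heater stays enabled until the Thermal Rise itself reaches 100 °C.
print(cutoff_switch_closed(90.0), warning_switch_closed(45.0))  # True True
print(cutoff_switch_closed(105.0))                              # False
```

Because the NC switch senses only the Thermal Rise, the Core can run far above 100 °C before the heater circuit opens.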

I’ve devoted considerable time and energy to eliminating erratic operation and glitchy behavior, to the extent that the printer has behaved flawlessly up to this point. Obviously, something glitched the Extruder Controller, which is basically an Arduino-class microcontroller monitoring two temperature sensors and driving two MOSFETs, while chatting with the Arduino Mega that lives under the Motherboard.

A power glitch will hard-reset the Mega (because I connected +Power Good to -Reset), but the Motherboard firmware (version 2.7, anyway) has the disturbing property of not resetting the EC when it restarts. You can confirm this by turning the heaters on using the Control Panel, closing it, then blipping the Motherboard Reset button: the heaters remain on. I’ll connect +Power Good to the EC’s -Reset the next time I open the box.
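The reset problem comes down to which controller restarts on which event. The following toy model captures the behavior described in the paragraph above. It assumes that a glitch which drops +Power Good does not, by itself, brown out the EC. The class and method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ExtruderController:
    heater_on: bool = False

    def reset(self) -> None:
        # A hardware reset restarts the EC firmware with its heater outputs off.
        self.heater_on = False

@dataclass
class Printer:
    ec: ExtruderController
    ec_reset_tied_to_power_good: bool

    def motherboard_reset_button(self) -> None:
        # As observed with the 2.7 MB firmware: restarting the Motherboard
        # does not reset the EC, so its heater state persists.
        pass

    def power_glitch(self) -> None:
        # +Power Good already drives the Mega's -Reset; the EC restarts
        # only if its -Reset is wired to +Power Good as well.
        if self.ec_reset_tied_to_power_good:
            self.ec.reset()

as_built = Printer(ExtruderController(heater_on=True), ec_reset_tied_to_power_good=False)
as_built.motherboard_reset_button()
print(as_built.ec.heater_on)  # True: the heaters remain on
as_built.power_glitch()
print(as_built.ec.heater_on)  # True: the EC never noticed

planned = Printer(ExtruderController(heater_on=True), ec_reset_tied_to_power_good=True)
planned.power_glitch()
print(planned.ec.heater_on)   # False
```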

The Thing-O-Matic worked fine (again) after the shutdown, but, as far as I’m concerned, all the effort I put into the Thermal Lockout on the Extruder Heaters has been justified. If the EC ran away once, it will run away again.

I don’t have an MBI MK6+ Thermal Core or their Safety Cutout Switch circuit, but a look at the schematic suggests some caution if you use that hardware. The Safety Cutout Switch has (IMO, anyway) several design flaws:

  • The relay is not energized during normal operation. If it fails to energize when the thermal switch activates, the heater remains active.
  • By design, the 12 V relay sees only 9 V when the thermal switch operates. Normal wiring and MOSFET resistances, combined with a sagging power supply, can pull the available voltage below the relay’s 8.4 V must-operate voltage, in which case it may not cut off the heater.
  • The relay cannot operate when the ground connection in the Alarm cable isn’t connected, as is the case in a Cupcake without a matching E-Stop jack. In any printer, a disconnected Alarm cable or a broken ground wire will silently prevent the relay from operating.

I’m willing to be proven wrong on any of those points, but as nearly as I can tell, the cartridge heater in a MK6+ can operate with a completely non-functional “Safety Cutout” and you’ll never know there’s a problem. When installed exactly as directed and with the entire printer working properly, it’ll probably work… but if everything worked properly, you wouldn’t need it.

[Update: MBI recently issued a slipstream update that put the Safety Circuit at Rev B. The relay now attaches to the Extruder Controller MOSFET, thus eliminating the requirement for the Alarm cable’s ground. However, the relay now sees the heater’s high-current voltage drop on both terminals. Measure the actual voltage at the Safety Circuit’s input terminals with both heaters running. If that voltage is less than 11.2 V, the relay would have to operate below its rated must-operate voltage in order to cut off the Extruder heater.]
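The 11.2 V figure follows from the original design numbers, assuming the 3 V difference comes from a resistor in series with the coil. Under that assumption the relay always gets the same 9/12 = 0.75 fraction of its input, so the minimum input is 8.4 V / 0.75 = 11.2 V. Here is a quick sketch of that arithmetic. The constant and function names are hypothetical.

```python
RELAY_MUST_OPERATE_V = 8.4  # must-operate rating quoted above for the 12 V relay
DESIGN_INPUT_V = 12.0
DESIGN_RELAY_V = 9.0        # what the relay sees "by design"

def relay_fraction() -> float:
    # Assumes a series resistor, so the coil gets a fixed fraction of the input.
    return DESIGN_RELAY_V / DESIGN_INPUT_V

def relay_voltage(input_v: float) -> float:
    return input_v * relay_fraction()

def minimum_input_voltage() -> float:
    return RELAY_MUST_OPERATE_V / relay_fraction()

print(f"minimum input: {minimum_input_voltage():.1f} V")  # minimum input: 11.2 V
for v in (12.0, 11.5, 11.0):
    ok = relay_voltage(v) >= RELAY_MUST_OPERATE_V
    print(f"{v:.1f} V in -> {relay_voltage(v):.2f} V at relay,",
          "operates" if ok else "may not operate")
```

A supply sagging to 11.0 V leaves the coil at 8.25 V, which is below the 8.4 V rating.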

The rant at the bottom of that post lists some features that I considered vital when I was designing the Thermal Lockout for my printer. I thought it went without saying that a safety circuit should fail safe: if the safety circuit does not operate, the protected equipment must not operate. As long as the Thermal Switch functions correctly, my Thermal Lockout will fail safe: an inoperative relay, a broken wire, a disconnected switch, or a weak power supply will prevent the printer from starting.
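The difference between the two approaches shows up clearly in a truth table. This sketch compares an energize-to-trip circuit with an energize-to-run circuit. In the energize-to-trip case, the relay must pull in to cut the heater, as in the MK6+ cutout described above. In the energize-to-run case, the heater gets power only while the relay is held in, assuming the coil is fed through the NC thermal switch. The function names, and the reduction of every fault to three booleans, are hypothetical simplifications. Both models assume the thermal switch itself works.

```python
from itertools import product

def trip_style_heater_on(overheated: bool, relay_ok: bool,
                         wiring_ok: bool, supply_ok: bool) -> bool:
    """Energize-to-trip: the heater stays connected unless the relay pulls in."""
    relay_pulls_in = overheated and relay_ok and wiring_ok and supply_ok
    return not relay_pulls_in

def run_style_heater_on(overheated: bool, relay_ok: bool,
                        wiring_ok: bool, supply_ok: bool) -> bool:
    """Energize-to-run: the heater gets power only while the relay is held in
    through the closed NC thermal switch."""
    return (not overheated) and relay_ok and wiring_ok and supply_ok

# Overheated, with every combination of relay, wiring, and supply faults.
for relay_ok, wiring_ok, supply_ok in product((True, False), repeat=3):
    print(f"relay={relay_ok!s:5} wiring={wiring_ok!s:5} supply={supply_ok!s:5}",
          "trip-style heater:", trip_style_heater_on(True, relay_ok, wiring_ok, supply_ok),
          "run-style heater:", run_style_heater_on(True, relay_ok, wiring_ok, supply_ok))
```

In the trip-style results, any single fault leaves the heater on during an overheat. In the run-style results, the heater is never on while overheated, and the same faults in a cool printer simply keep it from starting.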

My circuit is not completely fail-safe, of course, but the most common problems cause a hard shutdown.

It really does matter…

I’m now thinking that a thermal switch on the heated build platform is a Good Thing. I suspect the solder will melt and the connector will fall off before melting the acrylic in an ABP or charring the plywood in an HBP, but it’d be interesting to test that assumption, wouldn’t it?

7 thoughts on “Thing-O-Matic: Thermal Runaway!”

  1. My Probability and Sadistics instructor once said “No matter how improbable, given enough time it’s inevitable.”

    1. Back when I was The Newkid working on the General Registers inside an IBM mainframe, a Senior Engineer pointed out that a marginal circuit with a one-in-a-million chance of failure would crash 40 times a second, so not pushing the design rules was a Fundamental Good Practice.

      We did a lot of design simulation…
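The Senior Engineer's arithmetic works out if the marginal circuit is exercised about 40 million times per second. That rate is inferred from the numbers, not stated: 10⁻⁶ × 4×10⁷/s = 40 failures/s. The same numbers show why “given enough time it’s inevitable”. The chance of at least one failure in n independent tries is 1 − (1 − p)ⁿ, which approaches certainty quickly. Here is a short sketch with hypothetical function names:

```python
def failures_per_second(p_fail: float, events_per_second: float) -> float:
    """Expected failure rate for independent events, each failing with probability p_fail."""
    return p_fail * events_per_second

def prob_at_least_one(p_fail: float, trials: int) -> float:
    """Probability of one or more failures in `trials` independent tries."""
    return 1.0 - (1.0 - p_fail) ** trials

print(f"{failures_per_second(1e-6, 40e6):.1f} failures/s")  # 40.0 failures/s
print(f"{prob_at_least_one(1e-6, 40_000_000):.6f} chance of a crash within one second")
```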

  2. Close to the top of my list of things to look for in case of “middle-of-the-night” malfunctions: metastability problems caused by async inputs to a synchronous system.

    Shows up (rarely, these days) in hardware. But race conditions are pretty common in interrupt-driven systems, which I’d claim are the software equivalent.

    Time to dust off Ye Olde Logic Analyzer?

    1. Time to dust off Ye Olde Logic Analyzer?

      I haven’t a clue what to trigger on, though, and it’d take quite a while before I was confident the printer had a repeatable failure.

      My money is on a power glitch at the edge of both heaters turning on or off, despite the minimum loads that stabilize the voltages. Those problems don’t yield to analysis, so the best you can do is keep reducing the probabilities until they Go Away.

      Or it’s a race condition in the EC firmware. Those problems don’t yield to anything.
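Here is the classic way an interrupt-driven race bites an 8-bit, Arduino-class microcontroller. It is an illustration, not a diagnosis of the EC firmware. A 16-bit variable gets read one byte at a time. If an interrupt handler updates it between the two byte reads, the main loop sees a value that was never stored. This sketch simulates that, assuming the low byte is read first; the actual order depends on the compiler. On AVR parts, the usual cure is to do the read with interrupts disabled, for example inside avr-libc's ATOMIC_BLOCK. The function name is hypothetical.

```python
def torn_read(old_value: int, new_value: int) -> int:
    """Simulate a non-atomic 16-bit read: the low byte comes from the value before
    an interrupt updates the variable, the high byte from the value after."""
    low = old_value & 0xFF          # read before the interrupt fires
    high = (new_value >> 8) & 0xFF  # read after the ISR stored the new value
    return (high << 8) | low

print(torn_read(255, 256))  # 511: neither the old nor the new value
print(torn_read(256, 255))  # 0
```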

  3. Yeah, I try to be conservative when designing stuff. And *insanely* so when designing safety stuff, backups, and that sort of thing. Like you said, it matters.

    1. Most of the time, nothing happens and you really don’t need that fire extinguisher hanging over there by the door. When you need it, though, you need it bad.

      I think it’s unreasonable to insist on complete fail-safe protection for a consumer product, but injecting some solid engineering into the product design couldn’t possibly do any harm. Conservative design practices and safe implementations don’t appear anywhere on the whiteboards; I get the impression everybody would rather just ship something and fix it up later.

      Like, for example, putting a clearly marked Emergency Stop jack on the Motherboard, then shipping firmware without implementing an E-Stop function. Now, we all know E-Stop shouldn’t depend on firmware, but, still …
