Ed Nisley's Blog: Shop notes, electronics, firmware, machinery, 3D printing, laser cuttery, and curiosities. Contents: 100% human thinking, 0% AI slop.
After describing the initial failure there, I’ve been having it boot at 6:15 am and shut down at 8:00 am; if it ever fails to shut down, I know it hasn’t started up correctly.
As you might expect, it’s booted fine every morning for the last week. I think that’s good news…
The /etc/crontab entry to make that happen is:
00 08 * * * root shutdown -P now
I suppose I should put a little file in /etc/cron.d (not /etc/cron.daily, whose scripts run whenever the main crontab says, not at 8:00) rather than futzing with the main crontab file, but I’m lazy.
The /etc/rc.local file starts up a few odds & ends:
date >> /home/ed/startup.txt
echo -n "starting rc.local ... " >> /home/ed/startup.txt
##-- update dyndns record of our external IP address
echo -n "ddclient ... " >> /home/ed/startup.txt
if [ -x /usr/sbin/ddclient ] ; then
ddclient -force
fi
#-- fire off Primenet Mersenne Prime search
echo -n "mprime ... " >> /home/ed/startup.txt
/opt/primenet/mprime -t &
echo -n "scanbuttond ... " >> /home/ed/startup.txt
scanbuttond
echo "done" >> /home/ed/startup.txt
exit 0
Most of that does some really crude logging to a file in my home directory. If it fails to start up, I’ll at least know when it last worked…
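If the crude logging ever grows past a handful of steps, a few shell functions keep rc.local tidy and make the "did it finish?" check explicit. This is a minimal sketch, not what’s actually running on the box: `log_start`, `log_step`, `log_done`, `last_boot_ok`, and the `STARTUP_LOG` variable are made-up names, and it assumes a POSIX /bin/sh. A boot that dies partway through leaves a last line without the trailing `done`.

```shell
#!/bin/sh
# Hypothetical helpers for crude boot-progress logging from /etc/rc.local.
# STARTUP_LOG is an illustrative name; set it before calling the helpers.
STARTUP_LOG=${STARTUP_LOG:-/home/ed/startup.txt}

# Start a new record: the date on its own line, then the opening marker.
log_start() {
    date >> "$STARTUP_LOG"
    printf 'starting rc.local ... ' >> "$STARTUP_LOG"
}

# Append "name ... " with no newline, so every step of one boot
# lands on the same line.
log_step() {
    printf '%s ... ' "$1" >> "$STARTUP_LOG"
}

# Finish the line; a boot that never gets here leaves no "done".
log_done() {
    printf 'done\n' >> "$STARTUP_LOG"
}

# Succeeds only if the most recent record finished (last line ends in "done").
last_boot_ok() {
    tail -n 1 "$STARTUP_LOG" | grep -q 'done$'
}
```

In rc.local, a line like `echo -n "mprime ... " >> /home/ed/startup.txt` would become `log_step mprime`, and `last_boot_ok || echo "last boot did not finish"` answers the morning question from a login shell.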
As I mentioned earlier, I’ve disabled ddclient (to prevent it from snatching Mom’s current IP address out from under dyndns.com) by the simple expedient of renaming /usr/sbin/ddclient to ddclient.off. The script only tests whether /usr/sbin/ddclient is executable, so clearing its execute bit would have worked just as well, but changing the name makes it obvious why it’s not running.
Just to keep the CPU heatsink warm, it runs GIMPS: The Great Internet Mersenne Prime Search. Admittedly, a 2.4 GHz Celeron isn’t exactly the ideal CPU for this task, but every little bit helps. Right now it’s running the torture test with all the memory it wants, but I’ll throttle that back when Mom gets it.
Don’t know what I’ll do if it fails, to tell you the truth.
Memo to self: remember to switch mprime back to normal mode with less memory.
After grounding the obvious metal bits around the desk as shown there, taking some pains to zap the light switch on the wall (rather than the grounded objects) before sitting down, and routing the USB cable away from everything else, the mysterious USB disconnects seem to have Gone Away.
The USB hubs were reporting exactly what happened:
hub 3-0:1.0: port 1 disabled by hub (EMI?), re-enabling...
That wasn’t much help at first, because I couldn’t correlate a static zap with the disconnect. Quite often, it’d be something innocent, like plugging a camera into its USB/charging cradle, with no obvious discharge.
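Correlating a zap with a disconnect goes easier when the dmesg stamps, which count seconds since boot, become wall-clock times you can compare against a note scribbled on the desk. A minimal sketch, assuming Linux (where the `btime` line in /proc/stat gives the boot time in seconds since the epoch) and the `[seconds.microseconds] message` format in these logs; `emi_events` is a hypothetical name. Newer util-linux versions of dmesg have a `-T` option that does much the same conversion, and the dmesg man page warns that these timestamps aren’t updated across suspend/resume, so treat the results as approximate on a machine that sleeps.

```shell
#!/bin/sh
# Hypothetical helper: read dmesg-style lines on stdin, keep only the
# "disabled by hub (EMI?)" events, and print each one as
#   <epoch seconds> <message>
# $1 = boot time in seconds since the epoch.
emi_events() {
    awk -v boot="$1" 'index($0, "disabled by hub (EMI?)") > 0 {
        close_br = index($0, "]")               # end of the [ 1234.567890] stamp
        stamp = substr($0, 2, close_br - 2)     # leading blanks are fine in awk numbers
        msg = substr($0, close_br + 2)          # skip the bracket and one space
        printf "%d %s\n", boot + stamp, msg     # %d truncates to whole seconds
    }'
}
```

Usage: `dmesg | emi_events "$(awk '/^btime/ {print $2}' /proc/stat)"`, then GNU `date -d @SECONDS` turns each epoch number into a readable date.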
The onset of 0 °F weather and the ensuing 0% relative humidity, plus my donning a synthetic fleece jacket while venturing into the rather too-chilly basement laboratory, brought the problem to the fore. An inch-long arc to a light switch gets your attention pretty quick!
Hint: when you know you’re charged, pull a pen from your pocket, get a good grip on the metal pocket clip, and use that to draw the spark from the light switch. The larger surface area contacting your fingers reduces the current density to the mild tingle level, rather than leaving a charred pit on the end of your finger.
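In round numbers, the dielectric strength of air is 1 kV/mm (the textbook figure for a uniform field is closer to 3 kV/mm, but a fingertip-to-screw gap is anything but uniform). That inch-long arc required upwards of 20 kV: not bad for an acrylic jacket!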
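The estimate behind that number, using the round 1 kV/mm figure for the breakdown field E_b and taking the arc length d as one inch (25.4 mm):

```latex
V \approx E_{\mathrm{b}}\, d
  = 1\ \frac{\mathrm{kV}}{\mathrm{mm}} \times 25.4\ \mathrm{mm}
  \approx 25\ \mathrm{kV}
```

Which lands comfortably past 20 kV, even if the inch was a bit generous.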
When we go riding in the winter, we dress in layers of acrylic this and synthetic that, to the extent that simply moving generates a nasty charge. Hence the punchline: nobody moves, nobody gets hurt.
I just tried compiling a program (for an Arduino) and make grumped about a date in the future:
make: Warning: File `Makefile' has modification time 1.4e+02 s in the future
<<< usual compile output snipped >>>
make: warning: Clock skew detected. Your build may be incomplete.
Turns out that the timestamps really were screwy:
[ed@shiitake Solar Data Logger]$ date
Sun Jan 25 10:57:44 EST 2009
[ed@shiitake Solar Data Logger]$ ll
total 28
drwxr-xr-x 2 ed ed 4096 2009-01-25 10:59 applet
-rw-r--r-- 1 ed ed 1920 2009-01-25 10:35 Logger.pde
-rwxr-xr-x 1 ed ed 7719 2009-01-25 10:59 Makefile
-rwxr-xr-x 1 ed ed 7689 2009-01-25 10:53 Makefile~
-rw-r--r-- 1 ed ed 1880 2009-01-25 10:08 Solar Data Logger.pde~
Well, now, how can that be?
The offending files are stored on a file server, not on the machine in front of my Comfy Chair. The two machines’ clocks didn’t quite agree: the server was running slightly in the future.
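Spotting files the server has stamped in the future doesn’t require eyeballing ll output. A minimal sketch, assuming a POSIX shell and find: a freshly created scratch file in the local /tmp carries this machine’s idea of "now", so anything newer than it was stamped by a clock running fast. `future_files` is a hypothetical name, and it only looks at regular files.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under directory $1 whose
# modification time is later than this machine's current time.
future_files() {
    # A freshly created scratch file carries the local clock's "now".
    ref=$(mktemp) || return 1
    find "$1" -type f -newer "$ref" -print
    status=$?
    rm -f "$ref"
    return $status
}
```

Run from the project directory, `future_files .` should have flagged Makefile but not Logger.pde.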
I used an ordinary Kubuntu desktop install on our “file server”, which is basically a Dell Inspiron 531s running headless in the basement. All this is behind a firewall router, so I do not have an Internet-facing machine running X, OK?
Kubuntu has an option that updates the clock automagically, but only once per boot.
Right now, that box claims an uptime of just over 22 days. It’s run for months at a time without any intervention, which is just one of the things I like about Linux.
I think you can see the problem: after three weeks the PC’s internal clock had drifted more than two minutes fast.
I used the Big Hammer technique to whack the server’s clock upside the head:
sudo ntpdate pool.ntp.org
[sudo] password for ed: youwish
25 Jan 11:01:22 ntpdate[23062]: step time server 66.7.96.2 offset -151.277185 sec
That’s about 7 seconds per day, or 151 seconds out of roughly 2 megaseconds: 77 parts per million. The box sits in a basement at 55 °F right now, so there may well be a temperature effect going on.
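The drift arithmetic fits in a tiny awk-in-shell sketch: offset divided by elapsed time gives the fractional frequency error, times 10^6 gives ppm, and times 86400 gives seconds per day. `clock_drift` is a made-up name, and the elapsed time has to come from somewhere (uptime, say); here it’s just an argument.

```shell
#!/bin/sh
# Hypothetical helper: turn a measured clock offset into a drift rate.
#   $1 = offset in seconds (as ntpdate reports it; sign is ignored)
#   $2 = elapsed seconds over which that offset accumulated
clock_drift() {
    awk -v off="$1" -v el="$2" 'BEGIN {
        if (off < 0) off = -off               # a fast local clock shows a negative offset
        frac = off / el                       # fractional frequency error
        printf "%.1f ppm, %.2f s/day\n", frac * 1e6, frac * 86400
    }'
}
```

Feeding it the ntpdate offset of -151.277185 s over an assumed 1.96 megaseconds of uptime (about 22.7 days, consistent with "just over 22 days") reports roughly 77 ppm and a bit under 7 s/day.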
You can set up ntp (www.ntp.org or, better, from a package in your distro) to run continuously in the background and keep the clock in time by slewing it ever so slightly as needed to make the average come out right. I just added an entry to /etc/crontab like this:
That way the clock gets whacked into line once a day when nobody’s looking.
If you’re running a real server with heavy activity, ntp is the right hammer for the job, because you don’t want ntpdate to give you mysterious gaps of a few seconds or, worse, duplicate timestamps when it steps the clock backward. Leap seconds are bad enough.
Memo to Self: set up ntp on the server and then aim all the desktops at it.
I used to do that when I was running a GPS-disciplined oscillator to produce a nearly Stratum 1 clock on my server, but then power got too expensive for that frippery.
Adding those grounding wires from my desk lamp and the aluminum plate under the keyboard / trackballs to the PC case reduced the problem, but didn’t eliminate it.
Grounding Wire on Desk Lamp
Logging all that data, though, pointed to what (I think) is the cause: static discharge. I’ve been touching the screws on the wall switch before sitting down, which pretty much made the problem Go Away. Touching the (now-grounded) desk lamp or the keyboard plate still kills the hub, so the hub inside the PC must be way sensitive.
The disconnect follows the external hub’s cable, which means (I think) the jolt’s entering through that wire. I’ve already tried different cables, but perhaps different routing will help; there’s a huge tangle of wires behind the desk.
After adding ground connections to the lamp and keyboard / trackball tray (doodling off to the PC case), another disconnect after a day of rising expectations:
[37681.592585] hub 3-0:1.0: port 1 disabled by hub (EMI?), re-enabling...
[37681.592595] usb 3-1: USB disconnect, address 4
[37681.592598] usb 3-1.1: USB disconnect, address 5
[37681.644329] usb 3-1.3: USB disconnect, address 6
[37681.720326] usb 3-1.4: USB disconnect, address 7
[37681.875555] usb 3-1: new full speed USB device using uhci_hcd and address 8
[37682.044444] usb 3-1: configuration #1 chosen from 1 choice
[37682.047403] hub 3-1:1.0: USB hub found
[37682.049363] hub 3-1:1.0: 4 ports detected
[37682.371776] usb 3-1.1: new low speed USB device using uhci_hcd and address 9
[37682.508656] usb 3-1.1: configuration #1 chosen from 1 choice
[37682.511687] input: Wacom Graphire3 6x8 as /devices/pci0000:00/0000:00:1d.2/usb3/3-1/3-1.1/3-1.1:1.0/input/input10
[37682.759115] usb 3-1.3: new low speed USB device using uhci_hcd and address 10
[37682.902996] usb 3-1.3: configuration #1 chosen from 1 choice
[37682.920106] input: Microsoft Comfort Curve Keyboard 2000 as /devices/pci0000:00/0000:00:1d.2/usb3/3-1/3-1.3/3-1.3:1.0/input/input11
[37682.957903] input,hidraw2: USB HID v1.11 Keyboard [Microsoft Comfort Curve Keyboard 2000] on usb-0000:00:1d.2-1.3
[37682.977914] input: Microsoft Comfort Curve Keyboard 2000 as /devices/pci0000:00/0000:00:1d.2/usb3/3-1/3-1.3/3-1.3:1.1/input/input12
[37683.033615] input,hidraw3: USB HID v1.11 Device [Microsoft Comfort Curve Keyboard 2000] on usb-0000:00:1d.2-1.3
[37683.238299] usb 3-1.4: new low speed USB device using uhci_hcd and address 11
[37683.377195] usb 3-1.4: configuration #1 chosen from 1 choice
[37683.398232] input: Kensington Kensington Expert Mouse as /devices/pci0000:00/0000:00:1d.2/usb3/3-1/3-1.4/3-1.4:1.0/input/input13
[37683.436882] input,hidraw4: USB HID v1.10 Mouse [Kensington Kensington Expert Mouse] on usb-0000:00:1d.2-1.4
For the last month I’ve been twiddling it every now & again in preparation for my next visit, plus just letting it run to get some power-on hours under my supervision. You’ll find some of the info on that process earlier in the PC Tweakage category.
So it’s been booting up automagically at 6:15 am every morning, which is easier for Mom, but every now & again it wakes up dead. This is why I’m doing a month or two of burn-in here!
The diagnostic LEDs (the ABCD lights on the back panel) are GYGG, which isn’t listed in their hard-to-find LED reference. [Update: maybe now at Optiplex Diagnostic Indicators]
Dell Optiplex GX270 Auto-On Boot Failure LEDs
I did the usual diagnostic stuff. All the Dell diagnostic tests work fine, replugging the memory doesn’t help, and so forth & so on. Running many passes of memtest86+ (from the invaluable System Rescue CD) shows no problems at all.
Called up 800-891-8595, the DFS warranty service number (which is different from the usual Dell route), told my story, and got a call back (!) from the tech. I related the situation, mentioned that I’d set it for auto-on, and he said “Oh, they never got that BIOS code working, it’s never been released, and I’m surprised it works at all.”
Riiiight…
This is a biz machine, the sort acquired in semitrailer loads by big companies with actual IT departments, the ones that automagically wake up their flock of machines for overnight updates. Maybe they trigger auto-on through the LAN port (that’s another BIOS option) these days, but the BIOS wake-up alarm clock function has been available in pretty nearly every Dell I’ve ever owned… and works fine.
This is not rocket science.
Indeed, if anyone’s ever had the slightest problem with Dell’s auto-on, Google shows no sign of it. There’s nothing on the normally loquacious Dell forums. Nay, verily, the GX270 manual itself touts the “advanced feature” of having it turn on at a preset time and day.
Anyhow, he says the LED code shows the problem has something to do with the memory or video chip not starting up in time. That information is in his “internal” debugging info, which is not available to mere customers. He’s unwilling to swap memory (I tried another stick to no avail), let alone the system board.
Conclusion: his assignment is to make me Go Away without spending any money on warranty repairs.
Seeing as how the GX270 was a whopping 100 bucks delivered, I can sympathize with his marching orders, even if I disagree with their outcome.
So maybe Mom’s going to have to get used to turning the box on in the morning; it seems to work perfectly that way. A straightforward crontab entry turns it off in the evening… at least that part still works.
I’ve bought other off-lease & Dell Outlet boxes; they’ve worked fine. This one is a bit more battered than usual, but it’s otherwise in fine shape. It’s even been re-capped; the larger electrolytic caps aren’t the dreaded Nichicon popcorn caps.
Update: It seems to be booting OK with this burn-in regimen.
For the last few months, one of the USB hubs in my PC has been disabling a port connected to an external USB hub. The external hub re-establishes communication with all the devices, but the X server doesn’t take kindly to having devices yanked out from underneath it. After this glitch, my left-hand trackball and tablet are dead in the water; the only way to get ’em working is to log out, restart X, and log back in again. Not pleasant.
Oddly, the keyboard continues to function.
Note that it’s the hub in the PC that’s complaining, not the external hub.
USB connections at PC
I’ve tweaked the obvious things: switched USB ports on the PC, replaced the external hub, powered and un-powered the external hub, rearranged the devices on the hub, moved other devices away from the hub, and so forth and so on.
Now it’s time to start taking notes. The current external hub is a cheap, no-name gadget direct from China that bears a striking resemblance to the tchotchke HP-branded “Made In China” hub a friend picked up at SC06.
Both hubs have a cut-out in the case for a power plug, but the internal circuit boards lack the requisite jack to actually make use of external power.
Here’s the current dmesg dump.
[ 2917.702975] hub 1-0:1.0: port 2 disabled by hub (EMI?), re-enabling...
[ 2917.702986] usb 1-2: USB disconnect, address 3
[ 2917.702989] usb 1-2.2: USB disconnect, address 4
[ 2917.712300] /build/buildd/linux-2.6.24/drivers/input/tablet/wacom_sys.c: wacom_sys_irq - usb_submit_urb failed with result -19
[ 2917.787223] usb 1-2.3: USB disconnect, address 5
[ 2917.831161] usb 1-2.4: USB disconnect, address 6
[ 2918.002420] usb 1-2: new full speed USB device using uhci_hcd and address 7
[ 2918.165658] usb 1-2: configuration #1 chosen from 1 choice
[ 2918.170675] hub 1-2:1.0: USB hub found
[ 2918.172533] hub 1-2:1.0: 4 ports detected
[ 2918.502967] usb 1-2.2: new low speed USB device using uhci_hcd and address 8
[ 2918.662816] usb 1-2.2: configuration #1 chosen from 1 choice
[ 2918.679915] input: Microsoft Comfort Curve Keyboard 2000 as /devices/pci0000:00/0000:00:1d.0/usb1/1-2/1-2.2/1-2.2:1.0/input/input10
[ 2918.721211] input,hidraw2: USB HID v1.11 Keyboard [Microsoft Comfort Curve Keyboard 2000] on usb-0000:00:1d.0-2.2
[ 2918.751759] input: Microsoft Comfort Curve Keyboard 2000 as /devices/pci0000:00/0000:00:1d.0/usb1/1-2/1-2.2/1-2.2:1.1/input/input11
[ 2918.793088] input,hidraw3: USB HID v1.11 Device [Microsoft Comfort Curve Keyboard 2000] on usb-0000:00:1d.0-2.2
[ 2919.010108] usb 1-2.3: new low speed USB device using uhci_hcd and address 9
[ 2919.165963] usb 1-2.3: configuration #1 chosen from 1 choice
[ 2919.187020] input: Kensington Kensington Expert Mouse as /devices/pci0000:00/0000:00:1d.0/usb1/1-2/1-2.3/1-2.3:1.0/input/input12
[ 2919.244778] input,hidraw4: USB HID v1.10 Mouse [Kensington Kensington Expert Mouse] on usb-0000:00:1d.0-2.3
[ 2919.465338] usb 1-2.4: new low speed USB device using uhci_hcd and address 10
[ 2919.617208] usb 1-2.4: configuration #1 chosen from 1 choice
[ 2919.625530] input: Wacom Graphire3 6x8 as /devices/pci0000:00/0000:00:1d.0/usb1/1-2/1-2.4/1-2.4:1.0/input/input13
The external hub is on the white cable (a USB extender cable, which may contribute to the problem) plugged into the port at the far right from the network cable. But I’ve also used an external hub with a different, very official USB A-B cable.
The USB connections on the back panel:
Top row: Zire 71 PDA cradle | External USB hub (the offending one)
[10625.203907] usb 1-2.2: USB disconnect, address 8
[10625.970605] usb 1-2.3: USB disconnect, address 9
[10626.865087] usb 1-2.4: USB disconnect, address 10
[10627.880887] usb 1-2: USB disconnect, address 7
[10638.410365] usb 1-2: new full speed USB device using uhci_hcd and address 11
[10638.579231] usb 1-2: configuration #1 chosen from 1 choice
[10638.582167] hub 1-2:1.0: USB hub found
[10638.584118] hub 1-2:1.0: 4 ports detected
[10638.898574] usb 1-2.1: new low speed USB device using uhci_hcd and address 12
[10639.035458] usb 1-2.1: configuration #1 chosen from 1 choice
[10639.038484] input: Wacom Graphire3 6x8 as /devices/pci0000:00/0000:00:1d.0/usb1/1-2/1-2.1/1-2.1:1.0/input/input14
[10639.301891] usb 1-2.3: new low speed USB device using uhci_hcd and address 13
[10639.445768] usb 1-2.3: configuration #1 chosen from 1 choice
[10639.466865] input: Microsoft Comfort Curve Keyboard 2000 as /devices/pci0000:00/0000:00:1d.0/usb1/1-2/1-2.3/1-2.3:1.0/input/input15
[10639.505029] input,hidraw2: USB HID v1.11 Keyboard [Microsoft Comfort Curve Keyboard 2000] on usb-0000:00:1d.0-2.3
[10639.525686] input: Microsoft Comfort Curve Keyboard 2000 as /devices/pci0000:00/0000:00:1d.0/usb1/1-2/1-2.3/1-2.3:1.1/input/input16
[10639.572404] input,hidraw3: USB HID v1.11 Device [Microsoft Comfort Curve Keyboard 2000] on usb-0000:00:1d.0-2.3
[10639.777083] usb 1-2.4: new low speed USB device using uhci_hcd and address 14
[10639.917975] usb 1-2.4: configuration #1 chosen from 1 choice
[10639.934026] input: Kensington Kensington Expert Mouse as /devices/pci0000:00/0000:00:1d.0/usb1/1-2/1-2.4/1-2.4:1.0/input/input17
[10639.971692] input,hidraw4: USB HID v1.10 Mouse [Kensington Kensington Expert Mouse] on usb-0000:00:1d.0-2.4
[10962.110496] usb 5-5.2: new high speed USB device using ehci_hcd and address 8
[10962.219819] usb 5-5.2: configuration #1 chosen from 1 choice
[12290.756701] usb 5-5.2: USB disconnect, address 8
[54272.946992] hub 1-0:1.0: port 2 disabled by hub (EMI?), re-enabling...
[54272.947001] usb 1-2: USB disconnect, address 11
[54272.947003] usb 1-2.1: USB disconnect, address 12
[54273.003212] usb 1-2.3: USB disconnect, address 13
[54273.071171] usb 1-2.4: USB disconnect, address 14
[54273.214485] usb 1-2: new full speed USB device using uhci_hcd and address 15
[54273.383873] usb 1-2: configuration #1 chosen from 1 choice
[54273.386778] hub 1-2:1.0: USB hub found
[54273.388736] hub 1-2:1.0: 4 ports detected
[54273.699213] usb 1-2.1: new low speed USB device using uhci_hcd and address 16
[54273.840096] usb 1-2.1: configuration #1 chosen from 1 choice
So replacing the hub doesn’t seem to have done anything useful, as expected.
Next step: plug the hub cable into the right-hand port on the bottom row.
Update: Time passes and another disconnect pops up. I’d just poked the USB button on my pocket camera’s cradle. Coincidence? Sometimes it disconnects when I sit down, which suggests a static discharge.
The dmesg dump:
[ 1275.088536] hub 3-0:1.0: port 2 disabled by hub (EMI?), re-enabling...
[ 1275.088547] usb 3-2: USB disconnect, address 5
[ 1275.088549] usb 3-2.1: USB disconnect, address 6
[ 1275.132321] usb 3-2.3: USB disconnect, address 7
[ 1275.220223] usb 3-2.4: USB disconnect, address 8
[ 1275.371502] usb 3-2: new full speed USB device using uhci_hcd and address 9
[ 1275.541198] usb 3-2: configuration #1 chosen from 1 choice
[ 1275.544148] hub 3-2:1.0: USB hub found
[ 1275.546102] hub 3-2:1.0: 4 ports detected
[ 1275.856554] usb 3-2.1: new low speed USB device using uhci_hcd and address 10
[ 1275.993439] usb 3-2.1: configuration #1 chosen from 1 choice
[ 1275.997194] input: Wacom Graphire3 6x8 as /devices/pci0000:00/0000:00:1d.2/usb3/3-2/3-2.1/3-2.1:1.0/input/input10
[ 1276.259871] usb 3-2.3: new low speed USB device using uhci_hcd and address 11
[ 1276.406735] usb 3-2.3: configuration #1 chosen from 1 choice
[ 1276.423836] input: Microsoft Comfort Curve Keyboard 2000 as /devices/pci0000:00/0000:00:1d.2/usb3/3-2/3-2.3/3-2.3:1.0/input/input11
[ 1276.469640] input,hidraw2: USB HID v1.11 Keyboard [Microsoft Comfort Curve Keyboard 2000] on usb-0000:00:1d.2-2.3
[ 1276.490647] input: Microsoft Comfort Curve Keyboard 2000 as /devices/pci0000:00/0000:00:1d.2/usb3/3-2/3-2.3/3-2.3:1.1/input/input12
[ 1276.545539] input,hidraw3: USB HID v1.11 Device [Microsoft Comfort Curve Keyboard 2000] on usb-0000:00:1d.2-2.3
[ 1276.746036] usb 3-2.4: new low speed USB device using uhci_hcd and address 12
[ 1276.896906] usb 3-2.4: configuration #1 chosen from 1 choice
[ 1276.912985] input: Kensington Kensington Expert Mouse as /devices/pci0000:00/0000:00:1d.2/usb3/3-2/3-2.4/3-2.4:1.0/input/input13
[ 1276.960781] input,hidraw4: USB HID v1.10 Mouse [Kensington Kensington Expert Mouse] on usb-0000:00:1d.2-2.4
[ 1280.734257] usb 5-5.2: new high speed USB device using ehci_hcd and address 7
[ 1280.843575] usb 5-5.2: configuration #1 chosen from 1 choice
Bottom Row: Logitech trackball | External USB hub (the offending one) | empty port
Methinks it’s time to start yanking the SD card out of the camera and poking it into the PC’s media reader. That should eliminate one possible source of bus confusion.
Time to reset…
Update: time passes and it’s been working OK… up until I zapped the desk lamp with a teeny static spark, at which moment dmesg reports this:
[49077.518404] hub 3-0:1.0: port 1 disabled by hub (EMI?), re-enabling...
[49077.518413] usb 3-1: USB disconnect, address 2
[49077.518416] usb 3-1.1: USB disconnect, address 3
[49077.562470] usb 3-1.3: USB disconnect, address 4
[49077.658411] usb 3-1.4: USB disconnect, address 5
[49077.805865] usb 3-1: new full speed USB device using uhci_hcd and address 6
[49077.974924] usb 3-1: configuration #1 chosen from 1 choice
[49077.982443] hub 3-1:1.0: USB hub found
[49077.983860] hub 3-1:1.0: 4 ports detected
[49078.302253] usb 3-1.1: new low speed USB device using uhci_hcd and address 7
[49078.439141] usb 3-1.1: configuration #1 chosen from 1 choice
[49078.442156] input: Wacom Graphire3 6x8 as /devices/pci0000:00/0000:00:1d.2/usb3/3-1/3-1.1/3-1.1:1.0/input/input10
[49078.709567] usb 3-1.3: new low speed USB device using uhci_hcd and address 8
[49078.853443] usb 3-1.3: configuration #1 chosen from 1 choice
[49078.875540] input: Microsoft Comfort Curve Keyboard 2000 as /devices/pci0000:00/0000:00:1d.2/usb3/3-1/3-1.3/3-1.3:1.0/input/input11
[49078.920499] input,hidraw2: USB HID v1.11 Keyboard [Microsoft Comfort Curve Keyboard 2000] on usb-0000:00:1d.2-1.3
[49078.941372] input: Microsoft Comfort Curve Keyboard 2000 as /devices/pci0000:00/0000:00:1d.2/usb3/3-1/3-1.3/3-1.3:1.1/input/input12
[49078.999851] input,hidraw3: USB HID v1.11 Device [Microsoft Comfort Curve Keyboard 2000] on usb-0000:00:1d.2-1.3
[49079.212710] usb 3-1.4: new low speed USB device using uhci_hcd and address 9
[49079.351593] usb 3-1.4: configuration #1 chosen from 1 choice
[49079.367726] input: Kensington Kensington Expert Mouse as /devices/pci0000:00/0000:00:1d.2/usb3/3-1/3-1.4/3-1.4:1.0/input/input13
[49079.411112] input,hidraw4: USB HID v1.10 Mouse [Kensington Kensington Expert Mouse] on usb-0000:00:1d.2-1.4
So it looks like there’s really something EMI-ish going on. Perhaps it’s time to look into grounding all the exposed metal around here.
Update: It really was static electricity. More discussion there.