The Smell of Molten Projects in the Morning

Ed Nisley's Blog: Shop notes, electronics, firmware, machinery, 3D printing, laser cuttery, and curiosities. Contents: 100% human thinking, 0% AI slop.

Arduino Mega: Showstopper Workaround

The discussion following that post gave me enough impetus to figure this out. What I have here is not a complete solution, but it seems to solve the immediate problem.

Downside: this will not survive the next regular system update that touches the gcc-avr package (yes, it’s the avr-gcc compiler and the gcc-avr package). Hence, I must write down the details so I can do it all over again…

To review:

The problem is that the avr-gcc cross-compiler produces incorrect code for ATmega1280-class chips with more than 64 KB of Flash space: a register isn’t saved-and-restored around a runtime routine that alters it. Simple sketches (seem to) run without problems, but sketches that instantiate objects crash unpredictably. Because Arduino sketches depend heavily on various objects (like, oh, the Serial routines), nontrivial sketches don’t work.

The workaround is to patch the library routine that invokes the constructors, as detailed in that gcc bug report, to push / pop r20 around the offending constructors. The patch tweaks two spots in the libgcc.S source file, which then gets built into an assortment of chip-specific libgcc.a files during the compile.

I was highly reluctant to do that, simply because I’ve already installed the various gcc packages using pacman (the Arch Linux package manager) and really didn’t want to screw anything up by recompiling & reinstalling gcc from source. It’s certainly possible to update just the avr portion, but I don’t know exactly how to do that and doubt that I could get it right the first time… and I don’t have time to deal with the consequences of that catastrophe.

So I elected to build the avr cross-compiler from source, verify that the as-built libgcc.a file was identical to the failing one, apply the patch, recompile, then manually insert the modified file in the right spot(s) in my existing installation. This is less manly than doing everything automagically, but has a very, very limited downside: I can easily back out the changes.

Here’s how that went down…

The instructions there (see the GCC for the AVR target section) give the overview of what to do. The introduction says:

The default behaviour for most of these tools is to install every thing under the /usr/local directory. In order to keep the AVR tools separate from the base system, it is usually better to install everything into /usr/local/avr.

Arch Linux has the tools installed directly in /usr, not /usr/local or /usr/local/avr, so $PREFIX=/usr. Currently, they’re at version 4.5.1, which is typical for Arch: you always get the most recent upstream packages, warts and all.

Download the gcc-g++ (not gcc-c++ as in the directions) and gcc-core tarballs (from there or, better, the gnu mirrors) into, say, /tmp and unpack them. They’ll both unpack into /tmp/gcc-4.5.1, wherein you create and cd into obj-avr per the directions.
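The download-and-unpack step can be sketched as a pair of shell functions. This is a minimal sketch, not what I actually typed: the function names are made up for illustration, and the mirror URL layout is an assumption you should check against whichever mirror you use.

```shell
# Hypothetical helpers; the mirror URL layout below is an assumption.
GCC_VERSION=4.5.1
GNU_MIRROR=https://ftp.gnu.org/gnu/gcc

# Print the download URL for one tarball ("core" or "g++").
tarball_url() {
    echo "$GNU_MIRROR/gcc-$GCC_VERSION/gcc-$1-$GCC_VERSION.tar.bz2"
}

# Download both tarballs into a work directory (default /tmp), unpack
# them into the shared gcc-$GCC_VERSION tree, then create and enter
# the obj-avr build directory.
fetch_and_unpack() {
    workdir=${1:-/tmp}
    cd "$workdir" || return 1
    for part in core g++; do
        wget "$(tarball_url "$part")" || return 1
        tar -xjf "gcc-$part-$GCC_VERSION.tar.bz2" || return 1
    done
    mkdir -p "gcc-$GCC_VERSION/obj-avr" || return 1
    cd "gcc-$GCC_VERSION/obj-avr"
}
```

Running fetch_and_unpack /tmp should leave the shell sitting in /tmp/gcc-4.5.1/obj-avr, ready for the configure step.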

I opted to feed in the same parameters as the Arch Build System used while installing the original package, rather than what’s suggested in the directions. That’s found in this file:

/var/abs/community/gcc-avr/PKGBUILD

Which contains, among other useful things, this lump of command-line invocation:

../configure --disable-libssp \
               --disable-nls \
               --enable-languages=c,c++ \
               --infodir=/usr/share/info \
               --libdir=/usr/lib \
               --libexecdir=/usr/lib \
               --mandir=/usr/share/man \
               --prefix=/usr \
               --target=avr \
               --with-gnu-as \
               --with-gnu-ld \
               --with-as=/usr/bin/avr-as \
               --with-ld=/usr/bin/avr-ld

Yes, indeed, $PREFIX will wind up as /usr
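If you’d rather not scroll through the PKGBUILD by eye, something like the following should fish out the configure invocation. It’s a rough sketch: show_configure is a made-up name, and it assumes the invocation is the first line mentioning “configure”, continued with trailing backslashes.

```shell
# Print the configure invocation from a PKGBUILD: each line that
# mentions "configure" plus the backslash-continued lines after it.
# $1 = path to the PKGBUILD
show_configure() {
    awk '
        /configure/        { printing = 1 }   # start of the invocation
        printing           { print }
        printing && !/\\$/ { printing = 0 }   # no trailing backslash: done
    ' "$1"
}

# Usage:
#   show_configure /var/abs/community/gcc-avr/PKGBUILD
```

A comment line that happens to mention configure will also match, so eyeball the output before pasting it anywhere.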

Feeding that into ../configure produces the usual torrent of output, ending in success after a minute or two. Firing off the make step is good for 15+ minutes of diversion, even on an 11-BogoMIPS dual-core box. I didn’t attempt to fire up threads for both cores, although I believe that’s a simple option.
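For the record, the “simple option” is GNU make’s -j flag, which I did not try on this build. A hedged sketch, with job_count being a made-up helper name:

```shell
# Pick a job count: the number of online cores, or 2 if that
# can't be determined on this system.
job_count() {
    n=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)
    echo "$n"
}

# Run from obj-avr after configure has finished:
#   make -j"$(job_count)"
```

Whether the gcc 4.5.1 build behaves itself under a parallel make is something I haven’t verified; if it falls over, a plain make still works.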

The existing compiler installation has several libgcc.a files, each apparently set for a specific avr chip:

[ed@shiitake tmp]$ find /usr/lib/gcc/avr/4.5.1/ -name libgcc.a
/usr/lib/gcc/avr/4.5.1/libgcc.a
/usr/lib/gcc/avr/4.5.1/avr35/libgcc.a
/usr/lib/gcc/avr/4.5.1/avr3/libgcc.a
/usr/lib/gcc/avr/4.5.1/avr51/libgcc.a
/usr/lib/gcc/avr/4.5.1/avr4/libgcc.a
/usr/lib/gcc/avr/4.5.1/avr6/libgcc.a
/usr/lib/gcc/avr/4.5.1/avr5/libgcc.a
/usr/lib/gcc/avr/4.5.1/avr31/libgcc.a
/usr/lib/gcc/avr/4.5.1/avr25/libgcc.a

The key to figuring out which of those files needs tweaking lies there, which says (I think) that the ATmega1280 is an avr5 or avr51. Because I have an Arduino Mega that’s affected by this bug, I planned to tweak only these files:

/usr/lib/gcc/avr/4.5.1/libgcc.a
/usr/lib/gcc/avr/4.5.1/avr51/libgcc.a
/usr/lib/gcc/avr/4.5.1/avr5/libgcc.a

I have no idea what the top-level file is used for, but … it seemed like a good idea.
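Rather than guessing, you can ask the compiler driver which libgcc.a it would link for a given chip: gcc’s -print-libgcc-file-name option reports the path, and it honors -mmcu. A small sketch, where libgcc_for_mcu is a made-up wrapper name and the second argument exists only so a different driver can be substituted:

```shell
# Ask the compiler driver which libgcc.a it links for a given MCU.
# $1 = MCU name, $2 = compiler driver (defaults to avr-gcc)
libgcc_for_mcu() {
    "${2:-avr-gcc}" -mmcu="$1" -print-libgcc-file-name
}

# Usage (needs avr-gcc on the PATH):
#   libgcc_for_mcu atmega1280
#   libgcc_for_mcu atmega328p
```

I haven’t recorded what it prints on this box, so treat the list of files above as my working guess rather than the compiler’s answer.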

Now, I innocently expected that the libgcc.a files for a 4.5.1 installation would match the freshly compiled files for a 4.5.1-from-source build, but that wasn’t the case. I don’t know what the difference might be; perhaps there’s an embedded path or timestamp or whatever that makes a difference?

The Arch Linux standard installation of gcc 4.5.1 has these files:

$ find /usr/lib/gcc/avr/4.5.1/ -iname libgcc.a -print0 | xargs -0 ls -l
-rw-r--r-- 1 root root 2251078 Sep  4 16:26 /usr/lib/gcc/avr/4.5.1/avr25/libgcc.a
-rw-r--r-- 1 root root 2256618 Sep  4 16:26 /usr/lib/gcc/avr/4.5.1/avr31/libgcc.a
-rw-r--r-- 1 root root 2252506 Sep  4 16:26 /usr/lib/gcc/avr/4.5.1/avr35/libgcc.a
-rw-r--r-- 1 root root 2256310 Sep  4 16:26 /usr/lib/gcc/avr/4.5.1/avr3/libgcc.a
-rw-r--r-- 1 root root 2250930 Sep  4 16:26 /usr/lib/gcc/avr/4.5.1/avr4/libgcc.a
-rw-r--r-- 1 root root 2251846 Sep 27 12:58 /usr/lib/gcc/avr/4.5.1/avr51/libgcc.a
-rw-r--r-- 1 root root 2251550 Sep 27 12:58 /usr/lib/gcc/avr/4.5.1/avr5/libgcc.a
-rw-r--r-- 1 root root 2252458 Sep  4 16:27 /usr/lib/gcc/avr/4.5.1/avr6/libgcc.a
-rw-r--r-- 1 root root 2251474 Sep 27 12:57 /usr/lib/gcc/avr/4.5.1/libgcc.a

The compilation-from-source using the gcc 4.5.1 tarballs has these files:

$ pwd
/tmp/gcc-4.5.1/obj-avr
$ find -iname libgcc.a -print0 | xargs -0 ls -l
-rw-r--r-- 1 ed ed 2250258 Sep 27 15:51 ./avr/avr25/libgcc/libgcc.a
-rw-r--r-- 1 ed ed 2255798 Sep 27 15:51 ./avr/avr31/libgcc/libgcc.a
-rw-r--r-- 1 ed ed 2251686 Sep 27 15:51 ./avr/avr35/libgcc/libgcc.a
-rw-r--r-- 1 ed ed 2255490 Sep 27 15:51 ./avr/avr3/libgcc/libgcc.a
-rw-r--r-- 1 ed ed 2250110 Sep 27 15:51 ./avr/avr4/libgcc/libgcc.a
-rw-r--r-- 1 ed ed 2251838 Sep 27 15:51 ./avr/avr51/libgcc/libgcc.a
-rw-r--r-- 1 ed ed 2251550 Sep 27 15:51 ./avr/avr5/libgcc/libgcc.a
-rw-r--r-- 1 ed ed 2251638 Sep 27 15:52 ./avr/avr6/libgcc/libgcc.a
-rw-r--r-- 1 ed ed 2251474 Sep 27 15:52 ./avr/libgcc/libgcc.a
-rw-r--r-- 1 ed ed 2250258 Sep 27 15:51 ./gcc/avr25/libgcc.a
-rw-r--r-- 1 ed ed 2255798 Sep 27 15:51 ./gcc/avr31/libgcc.a
-rw-r--r-- 1 ed ed 2251686 Sep 27 15:51 ./gcc/avr35/libgcc.a
-rw-r--r-- 1 ed ed 2255490 Sep 27 15:51 ./gcc/avr3/libgcc.a
-rw-r--r-- 1 ed ed 2250110 Sep 27 15:51 ./gcc/avr4/libgcc.a
-rw-r--r-- 1 ed ed 2251838 Sep 27 15:51 ./gcc/avr51/libgcc.a
-rw-r--r-- 1 ed ed 2251550 Sep 27 15:51 ./gcc/avr5/libgcc.a
-rw-r--r-- 1 ed ed 2251638 Sep 27 15:52 ./gcc/avr6/libgcc.a
-rw-r--r-- 1 ed ed 2251474 Sep 27 15:52 ./gcc/libgcc.a

The top-level files have the same size, but are not identical:

$ diff ./avr/libgcc/libgcc.a ./gcc/libgcc.a
Binary files ./avr/libgcc/libgcc.a and ./gcc/libgcc.a differ
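The same byte-for-byte check, installed tree against build tree, can be scripted for just the three files of interest. This is a sketch with a made-up function name; it assumes the rebuilt files sit under ./gcc in the same per-chip layout as the installed ones, as the listings above suggest.

```shell
# Compare the installed libgcc.a files against rebuilt ones.
# $1 = installed root, $2 = rebuilt root (same per-chip layout)
compare_libs() {
    installed=$1
    rebuilt=$2
    for sub in . avr5 avr51; do
        if cmp -s "$installed/$sub/libgcc.a" "$rebuilt/$sub/libgcc.a"; then
            echo "$sub: identical"
        else
            echo "$sub: differ"
        fi
    done
}

# Usage:
#   compare_libs /usr/lib/gcc/avr/4.5.1 /tmp/gcc-4.5.1/obj-avr/gcc
```

A “differ” result doesn’t say why. Archives built by ar carry per-member timestamps, which is one plausible (unconfirmed) reason two same-size files would not match.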

Haven’t a clue what’s going on with different files in different spots, but I saved the existing files in the installed tree as *.base and copied the new ones from ./gcc/avr* into place. While there are many ways to crash a program, the AnalogInOutSerial demo program ran correctly on a Duemilanove (presumably with the existing libgcc.a) and failed on the Mega (with the recompiled libgcc.a). I saved those files as *.rebuild just in case they come in handy.
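The save-and-swap dance, plus the all-important back-out, might look like this as shell functions. It’s a sketch under my own naming (install_lib and restore_lib are made up), and it needs a root shell when pointed at /usr/lib.

```shell
# Save the installed file as <target>.base (first time only),
# then copy the new file over it.
# $1 = new libgcc.a, $2 = installed libgcc.a
install_lib() {
    new=$1
    target=$2
    [ -e "$target.base" ] || cp -p "$target" "$target.base" || return 1
    cp "$new" "$target"
}

# Back out the change by restoring the saved original.
# $1 = installed libgcc.a
restore_lib() {
    target=$1
    cp -p "$target.base" "$target"
}

# Usage, from /tmp/gcc-4.5.1/obj-avr in a root shell:
#   install_lib ./gcc/avr51/libgcc.a /usr/lib/gcc/avr/4.5.1/avr51/libgcc.a
#   restore_lib /usr/lib/gcc/avr/4.5.1/avr51/libgcc.a
```

Refusing to overwrite an existing *.base copy matters: run the install twice without that check and the “original” you restore is actually the first replacement.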

Manually change the libgcc.S source file (it’s only four lines, I can do this), recompile, and the build process recompiles only the affected files; that’s comforting. Copy those into the installed tree and, lo and behold, the demo program now runs on both the Duemilanove and the Mega.

While it’s too soon to declare victory, the hardware bringup program I’m writing also works, so the initial signs are good.

Thanks to Mark Stanley for blasting me off dead center on this. I didn’t do a complete install, but he got me thinking how to make the least disruptive change…

And a tip o’ the cycling helmet to the whole Free Software collective for making a mid-flight patch like this both feasible and possible: Use The Source!