Automated Scan-and-Enhance: ImageMagick to the Rescue

Mary’s folks enjoy the daily crossword, but they wanted a slightly larger edition… and, after a bit of procrastination, I conjured up an automated way to make it happen, so her father need not do this manually with The GIMP and Xsane.

The scanner, an old HP Scanjet 3970, dropped off the Windows driver list after Vista, so it now runs only with Linux.

Doing the scan is straightforward, as it’s the default scanner:

scanimage --mode Gray --opt_emulategray=yes --resolution 300 -x 115 -y 210 --format=pnm > scan.pnm

The -x and -y options set the scan width and height in millimeters, which should be as small as possible while still covering the whole crossword.
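For a rough idea of how many pixels those millimeters turn into at 300 dpi, here is a small sketch. The mm_to_px helper is my own illustrative name, not part of SANE, and the driver's actual output may differ by a few pixels:

```shell
#!/usr/bin/env bash
# Convert a length in millimeters to pixels at a given resolution,
# rounding to the nearest pixel with integer arithmetic (25.4 mm per inch).
# mm_to_px is a hypothetical helper for illustration only.
mm_to_px() {
    local mm=$1 dpi=$2
    echo $(( (mm * dpi * 10 + 127) / 254 ))
}

echo "width:  $(mm_to_px 115 300) px"
echo "height: $(mm_to_px 210 300) px"
```

That works out to roughly 1358 x 2480 pixels for the 115 x 210 mm scan area.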

The driver produces output image files in either PNM format, which isn’t particularly common these days, or TIFF. ImageMagick knows what to do with both of them; I picked PNM.

Unfortunately, for some unknown reason, the SANE driver produces a severely low-contrast image:

HP3900 Grayscale Scan

ImageMagick can produce a histogram:

convert scan.pnm histogram:hist.png

Which shows the problem:

HP3900 Grayscale Histogram

That’s using the grayscale emulation mode: the driver does a Color scan and converts to Gray mode for the output image. It seems having the driver do the conversion produces better results than scanning directly in Color and then applying ImageMagick, but it’s not my scanner and I don’t have a lot of experience with it.

Given the PNM image:

  • Blow out the contrast
  • Resize the scan to fill the page
  • Crisp up the edges a bit

convert scan.pnm -level 45%,60% -resize 2400x3000 +repage -unsharp 0 trim.png

Which looks like this:

Crossword – contrasty resize
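A plain WIDTHxHEIGHT geometry tells ImageMagick to fit the image inside that box while preserving its aspect ratio, so the result generally won't be exactly 2400x3000. The sketch below estimates the output size. The fit_within helper is a hypothetical name of mine, and the 1358x2480 input is my estimate of a 115 x 210 mm scan at 300 dpi, not a measured value:

```shell
#!/usr/bin/env bash
# Estimate the size of an image after an aspect-preserving fit into a box.
# Usage: fit_within WIDTH HEIGHT BOX_WIDTH BOX_HEIGHT
# fit_within is a hypothetical helper; ImageMagick's own rounding may
# differ by a pixel.
fit_within() {
    local w=$1 h=$2 bw=$3 bh=$4
    if (( w * bh <= h * bw )); then
        # Height is the limiting dimension.
        echo "$(( (w * bh + h / 2) / h ))x${bh}"
    else
        # Width is the limiting dimension.
        echo "${bw}x$(( (h * bw + w / 2) / w ))"
    fi
}

# Assumed scan size: 115 x 210 mm at 300 dpi, about 1358 x 2480 pixels.
fit_within 1358 2480 2400 3000
```

Under those assumptions the height is the limit, giving about 1643x3000 pixels: an enlargement of roughly 21%.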

This being Linux, the best way to print something is with either PostScript or PDF. I used PDF, because then we can look at the results with Reader, a more familiar program than, say, Evince:

convert -density 300 -size 2550x3300 canvas:white trim.png -gravity center -composite page.pdf

Which centers the crossword on the page over a white background with enough margin to keep the printer happy:

Crossword – full page
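The canvas numbers aren't magic: 2550x3300 pixels at 300 dpi is a US Letter page (my assumption, but the arithmetic fits), and the 2400x3000 resize box leaves a guaranteed margin all around. A quick sketch of that arithmetic, with margin being an illustrative helper name:

```shell
#!/usr/bin/env bash
# Page size in pixels for US Letter (8.5 x 11 in) at 300 dpi.
# Inches are given in tenths to stay within integer arithmetic.
dpi=300
page_w=$(( 85 * dpi / 10 ))    # 2550
page_h=$(( 110 * dpi / 10 ))   # 3300

# Margin left on each side when an image is centered on the page.
# margin is a hypothetical helper for illustration only.
margin() {
    local page=$1 image=$2
    echo $(( (page - image) / 2 ))
}

echo "page: ${page_w}x${page_h}"
echo "minimum side margin: $(margin "$page_w" 2400) px"
echo "minimum top margin:  $(margin "$page_h" 3000) px"
```

At 300 dpi, 75 pixels is a quarter inch and 150 pixels is a half inch, so even an image that filled the whole resize box would stay clear of the paper's edges.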

That PDF goes to the default printer queue, where it’s turned into PostScript and comes out exactly like it should:

lp page.pdf

I gimmicked the default printer instance to use only black ink by creating a separate CUPS printer with the appropriate defaults. Other programs pay no attention to that setting and the printer uses colored inks. There is no explanation I can find for any of this; Linux / CUPS printing is basically a black box operation.

In theory, you could print the composited image file as a PNG or some such, but I cannot make it come out the right size in the right place.

You could do all of that in one line, with one huge ImageMagick invocation kicking off the scan and firing the result to the printer, but leaving some intermediate results lying along the trail isn’t necessarily a Bad Thing. I should probably use random temporary file names, though, in the interest of not polluting the namespace.
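One way to get those random names, as a minimal sketch: let mktemp create a private working directory and build the file names inside it. The crossword.XXXXXX template and the variable names are my own illustrative choices:

```shell
#!/usr/bin/env bash
# Sketch: keep the intermediate files in a private, randomly named directory.
# The crossword.XXXXXX template is an illustrative assumption.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/crossword.XXXXXX")

# Remove the directory and its contents when the script exits.
trap 'rm -rf "$workdir"' EXIT

scan="$workdir/scan.pnm"
trim="$workdir/trim.png"
page="$workdir/page.pdf"

echo "intermediate files go in $workdir"
# The scanimage, convert, and lp commands would then use
# "$scan", "$trim", and "$page" in place of fixed file names.
```

Dropping the trap line keeps the intermediate files around for later puzzling, at the cost of cleaning up by hand.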

All this happened remotely, with me signed on through SSH: hooray for the command line. Had to use SCP a few times to fetch those intermediate files to puzzle over the results, too.

The complete Bash script:

scanimage --mode Gray --opt_emulategray=yes --resolution 300 -x 115 -y 210 --format=pnm > /tmp/scan.pnm
convert /tmp/scan.pnm -level 45%,60% -resize 2400x3000 +repage -unsharp 0 /tmp/trim.png
convert -density 300 -size 2550x3300 canvas:white /tmp/trim.png -gravity center -composite /tmp/page.pdf
lp /tmp/page.pdf

A slightly closer scan crop with left and top margins may also work, at the cost of more precise positioning on the scanner:

scanimage --mode Gray --opt_emulategray=yes --resolution 300 -l 5 -t 6 -x 105 -y 190 --format=pnm > /tmp/scan.pnm