Archive for November 18th, 2016

Cropping Images in a PDF

For reasons not relevant here, I had a PDF made from scanned page images with far too much whitespace around the Good Stuff. As with all scanned pages, the margins contain random artifacts that inhibit automagic cropping, so manual intervention was required.

Extract the images as sequentially numbered JPG files:

pdfimages -j mumble.pdf mumble

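Experimentally determine how much whitespace to remove from each edge (the -shave geometry below trims 225 pixels from both the left and right sides and 150 pixels from both the top and bottom), then: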

for f in mumble-0??.jpg ; do convert -verbose "$f" -shave 225x150 "${f%.*}a.jpg" ; done
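
When guessing at a geometry, it helps to know what size the shaved page will be before running the whole batch. A minimal sketch of the arithmetic, where shaved_size is a hypothetical helper name and the 1700x2200 page size is an assumed example (a US Letter page scanned at 200 dpi), not the size of the actual scans:

```shell
# Hypothetical helper: compute the image size left after "-shave XxY".
# -shave removes X pixels from the left AND right edges,
# and Y pixels from the top AND bottom edges.
shaved_size() {
    width=$1 height=$2 shave_x=$3 shave_y=$4
    echo "$(( width - 2 * shave_x ))x$(( height - 2 * shave_y ))"
}

# Assumed example page: 1700x2200 pixels
shaved_size 1700 2200 225 150    # prints 1250x1900
```

The real page dimensions should come from something like identify -format '%wx%h\n' mumble-000.jpg, which ships with ImageMagick alongside convert.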

You could use mogrify to shave the images in-place. However, writing the results to new files leaves the originals untouched, so every trial of a new geometry starts from the original images rather than from already-shaved ones.
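
Because the originals stay put, each iteration amounts to re-running the same loop with a different geometry. One way to make that explicit is a dry-run helper that only prints the commands it would run. This is a sketch: shave_plan is a hypothetical name, and the output naming simply mirrors the loop above.

```shell
# Hypothetical helper: print one convert command per page without running it.
# Usage: shave_plan GEOMETRY FILE...
shave_plan() {
    geometry=$1
    shift
    for f in "$@"; do
        # Same naming scheme as the loop above: mumble-000.jpg -> mumble-000a.jpg
        printf 'convert %s -shave %s %s\n' "$f" "$geometry" "${f%.*}a.jpg"
    done
}

# Preview the commands for two pages
shave_plan 225x150 mumble-000.jpg mumble-001.jpg
```

Once the printed commands look right, pipe the output to sh to actually run them. That shortcut assumes file names without spaces or shell metacharacters, which holds for what pdfimages generates from a simple prefix like mumble.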

Stuff the cropped images back into a PDF:

convert mumble-0??a.jpg mumble-shaved.pdf