
Archive for February 8th, 2013

Fixing LibreOffice Document Graphic File Paths

It turns out that if you put convenient symlinks in your directories, then use them to build a LibreOffice document, LO will cheerfully put those paths into the graphic file links inside its XML files. That will produce horrible breakage on a new system without those links. We’ve come to the conclusion that the only way to keep LO happy is to create a Pictures directory in whatever directory holds the document file, then put all of the document’s image files into that directory, and make sure LO stores relative paths. Of course, this leaves us with the prospect of updating a whole bunch of existing (and, alas, horribly broken) documents by hand, which is unappealing. My previous solution worked for a single file, but now it’s time for some scripting…

This would probably be easier in Python, but Bash works fine after you get the quoting straightened out. This script builds several other scripts that actually do the heavy lifting, because that way you can inspect the scripts before running them to verify that you’re not about to make a bad situation much, much worse. I recommend copying the presentations into another directory, running this script, checking the output scripts, running them by hand, and then copying the fixed files and the Pictures directory back where they belong.
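
The generate-then-inspect pattern can be tried in miniature before pointing it at real documents. This is only a sketch, not part of FixGraphics.sh: the gen_mkdirs function and the scratch directory are made-up names for illustration, and it uses the short -p option for mkdir rather than the GNU-style --parents in the main script.

```shell
#!/bin/bash
# Sketch only: gen_mkdirs and the scratch directory are hypothetical names.

# Emit one quoted mkdir command per directory argument
gen_mkdirs() {
	for d in "$@" ; do
		printf 'mkdir -p "%s"\n' "$d"
	done
}

# Work in a throwaway directory so nothing real gets touched
scratch=$(mktemp -d)
gen_mkdirs "$scratch/Pictures/2012" "$scratch/Pictures/Misc Shots" > "$scratch/mkdirs.sh"

cat "$scratch/mkdirs.sh"     # step 1: read what the generated script will do
bash "$scratch/mkdirs.sh"    # step 2: run it only after reading it
```

The point is the pause between the cat and the bash: the generated file is plain text, so anything that looks wrong can be fixed in an editor before it does damage.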

You must tweak the actual paths to the pictures to match your situation; for these documents, one simple change sufficed for all the image files. Those paths are not variables, because I can barely keep the quoting straight without adding another layer of indirection. Make sure all the paths match up, verify the scripts before you run them, and don’t trust anything you see.

CAUTION: It’s highly likely that the multiple levels of character escaping required to make these listings appear correctly on the screen will produce incorrect results when copied-and-pasted. You can download the script file as FixGraphics.sh.odt, which is a bare-ASCII TXT file (which you must rename to eliminate the ODT extension, then make executable as a shell script), to see how it compares.

The main FixGraphics.sh script, with some key lines highlighted:

#!/bin/bash

echo "Extract list of images from all ODP files"
# -f avoids an error message when images.txt does not exist yet
rm -f images.txt
for f in *odp
do
	unzip -p "$f" content.xml | sed 's/></>\n</g' | grep Cameras | cut -d \" -f 2 | sort -u >> images.txt
done

echo "Make source file name list"
# strip off leading relative pathing, set actual absolute path, un-quote blanks and special characters, add quotes
sed 's/..\/..\/..\/../\/mnt/' images.txt | sed 's/%20/ /g' | sed 's/&amp;/\&/g' | sed 's/^.*/\"&\"/' > source.lst

echo "Make target file name list"
# set relative to current directory
sed 's/\/mnt\/bulkdata\/Cameras\/MCWN/\.\/Pictures/' source.lst > target.lst

echo "Make target directory list"
# must add trailing quote stripped by dirname
rm dirs.lst
cat target.lst | while read tline ; do
	tdir=`dirname "$tline"`
	echo ${tdir}\"
done > dirs.lst

echo "Create target directory structure script"
rm mkdirs.sh
sort -u dirs.lst | while read dline ; do
	echo mkdir --parents ${dline}
done > mkdirs.sh
chmod u+x mkdirs.sh

echo "Create image file copy script"
rm cpjpgs.sh
cat dirs.lst | while read dline ; do
	echo cp -n -t ${dline}
done > cptemp.txt
paste cptemp.txt source.lst > cpjpgs.sh
chmod u+x cpjpgs.sh

echo "Create ODP fixup script"
echo "for f in *odp ; do" > fixodp.sh
echo "unzip -p \"\$f\" content.xml > raw.xml" >> fixodp.sh
echo "sed 's/..\/..\/..\/..\/bulkdata\/Cameras\/MCWN/\.\.\/Pictures/g' raw.xml > content.xml"  >> fixodp.sh
echo "zip \"\$f\" content.xml"  >> fixodp.sh
echo "done" >> fixodp.sh
echo "rm raw.xml content.xml" >> fixodp.sh
chmod u+x fixodp.sh

Run mkdirs.sh, cpjpgs.sh, and fixodp.sh, in that order (the copy needs the directories to exist): then it Just Works.
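
One way to confirm the fix took is to count the links that still point into the old tree. This is a hedged sketch rather than part of the original procedure: count_stale_links is a hypothetical helper, the sample XML fragments are invented, and the bulkdata/Cameras pattern assumes the same paths as the script above.

```shell
#!/bin/bash
# Sketch only: count_stale_links and both sample fragments are hypothetical.

# Read content.xml text on stdin and print how many lines still
# mention the old image tree. grep -c prints 0 but exits non-zero
# when nothing matches, hence the "|| true".
count_stale_links() {
	sed 's/></>\n</g' | grep -c 'bulkdata/Cameras' || true
}

fixed='<draw:frame><draw:image xlink:href="../Pictures/2012/img.jpg"/></draw:frame>'
broken='<draw:frame><draw:image xlink:href="../../../../bulkdata/Cameras/MCWN/2012/img.jpg"/></draw:frame>'

printf '%s\n' "$fixed"  | count_stale_links
printf '%s\n' "$broken" | count_stale_links

# Against a real document (not run here):
#   unzip -p some.odp content.xml | count_stale_links
```

A repaired document should report zero; anything else means a path variant the fixup sed did not anticipate.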

Some of the tricky parts:

The content.xml file may be stored in unformatted mode, with everything mushed together into one huge line. To make it readable and parse-able, insert a newline between each pair of adjoining angle brackets:

sed 's/></>\n</g'
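
To see what that does, here is a small sketch run against an invented one-line fragment rather than a real content.xml. The split_tags name and the sample XML are assumptions for illustration, and the \n in the replacement relies on GNU sed; other seds may treat it differently.

```shell
#!/bin/bash
# Sketch only: split_tags and the sample fragment are hypothetical.
# The \n in the replacement is a GNU sed feature.

# Put each tag on its own line by breaking between ">" and "<"
split_tags() {
	sed 's/></>\n</g'
}

sample='<draw:frame><draw:image xlink:href="../../../../bulkdata/Cameras/MCWN/2012/a%20b.jpg"/></draw:frame>'

# One tag per line, then the same grep and cut the main script uses
printf '%s\n' "$sample" | split_tags | grep Cameras | cut -d \" -f 2
```

The cut step only works because the link is the first quoted attribute on its line in this sample; if your documents order the attributes differently, the field number will need adjusting.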

This burst of line noise un-escapes the file name from the way LO stores it internally. Note that the middle sed command really does match the literal escape sequence ampersand-amp-semicolon and replaces it with a backslash-escaped (hence literal) ampersand, while the unescaped ampersand in the last one is the sed-ism for “the whole matching string”:

sed 's/%20/ /g' | sed 's/&amp;/\&/g' | sed 's/^.*/\"&\"/'
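
The same three substitutions can be folded into one sed invocation, which makes them easier to try on a test string. This is a sketch under assumptions: unescape_name is a hypothetical name, the file name is invented, and it handles only the two escapes that showed up in these documents (%20 and the ampersand entity), not the full set of URL or XML escapes.

```shell
#!/bin/bash
# Sketch only: unescape_name and the sample file name are hypothetical.

# 1. %20 becomes a blank
# 2. the XML entity for ampersand becomes a literal & (\& in the replacement)
# 3. the whole line (& in the replacement) gets wrapped in double quotes
unescape_name() {
	sed -e 's/%20/ /g' -e 's/&amp;/\&/g' -e 's/^.*$/"&"/'
}

printf '%s\n' '../../../../bulkdata/Cameras/MCWN/Tom%20&amp;%20Jerry.jpg' | unescape_name
```

Writing the last expression with plain double quotes works because the whole sed program sits inside single quotes, which removes one layer of backslashes to keep track of.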

The difference between these two sed strings indicates the actual relative path to the Pictures subdirectory in the filesystem and the faked relative path from the LO pseudo-subdirectory where the document stores its internal state. The string of periods in the second command shows what LO stored for the original files in our documents; your mileage will certainly differ:

sed 's/\/mnt\/bulkdata\/Cameras\/MCWN/\.\/Pictures/' source.lst > target.lst
sed 's/..\/..\/..\/..\/bulkdata\/Cameras\/MCWN/\.\.\/Pictures/g' raw.xml > content.xml
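
If the forest of backslashes gets hard to read, sed accepts a different delimiter after the s, which lets the slashes in the paths stand unescaped. The following is a sketch of the same two rewrites in that style, not the script as published: to_target and to_href are hypothetical names, the sample paths are invented, and the dots in the pattern are escaped here so they match only literal periods.

```shell
#!/bin/bash
# Sketch only: to_target, to_href, and the sample paths are hypothetical.

# Filesystem view: absolute source path becomes ./Pictures
to_target() {
	sed 's|/mnt/bulkdata/Cameras/MCWN|./Pictures|'
}

# Document view: LO's stored relative path becomes ../Pictures
to_href() {
	sed 's|\.\./\.\./\.\./\.\./bulkdata/Cameras/MCWN|../Pictures|g'
}

printf '%s\n' '"/mnt/bulkdata/Cameras/MCWN/2012/img.jpg"' | to_target
printf '%s\n' 'xlink:href="../../../../bulkdata/Cameras/MCWN/2012/img.jpg"' | to_href
```

The one-dot versus two-dot difference in the replacements is the same one described above: the copy runs from the document's directory, while the stored link is resolved from one level further in.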

I don’t know how they could make the file linkages work better, but it’d be really nice if there were a less horrible way to fix the breakage.
