Blog Backup: Incremental Media

The recipe for incrementally copying media files since the previous blog backup works like this:

grep attachment_url *xml > attach.txt
sed 's/^.*http/http/' attach.txt | sed 's/<\/wp.*//' > download.txt
wget -nc -w 2 --no-verbose --random-wait --force-directories --directory-prefix=Media/ -i download.txt

The -nc sets the “no clobber” option, which (paradoxically) simply avoids downloading a duplicate of an existing file. Otherwise, it’d download the file and glue on a *.1 suffix, which isn’t a desirable outcome. The myriad (thus far, 0.6 myriad) already-copied files generate a massive stream of messages along the lines of File ‘mumble’ already there; not retrieving.

Adding --no-verbose will cut the clutter and emit some comfort messages.

There seems no way to recursively fetch only newer media files directly from the WordPress file URL with -r -N; the site redirects the http:// requests to the base URL, which doesn’t know about bare media files and coughs up a “not found” error.

Advertisements

  1. Leave a comment

Spam comments vanish. Comment moderation may cause a delay.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s