The recipe for incrementally copying media files added since the previous blog backup works like this:
grep attachment_url *xml > attach.txt
sed 's/^.*http/http/' attach.txt | sed 's/<\/wp.*//' > download.txt
wget -nc -w 2 --no-verbose --random-wait --force-directories --directory-prefix=Media/ -i download.txt
-nc sets the “no clobber” option, which (paradoxically) simply avoids downloading a duplicate of an existing file. Otherwise, it’d download the file and glue on a *.1 suffix, which isn’t a desirable outcome. The myriad (thus far, 0.6 myriad) already-copied files generate a massive stream of messages along the lines of:
File ‘mumble’ already there; not retrieving.
--no-verbose cuts the clutter while still emitting some comfort messages.
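As a rough sketch, the grep and sed steps of the recipe can be folded into one shell function that reads the export XML on stdin. The function name extract_attachment_urls and the sort -u step that drops repeated URLs are illustrative additions, not part of the recipe, and the sketch assumes each attachment URL sits as plain text inside its wp:attachment_url element:

```shell
#!/bin/sh
# Sketch: read WordPress export XML on stdin and print one attachment URL
# per line. The function name and the sort -u dedupe are assumptions added
# for illustration; the grep and sed expressions come from the recipe.
extract_attachment_urls() {
    grep attachment_url |
        sed -e 's/^.*http/http/' -e 's/<\/wp.*//' |
        sort -u
}

# Hypothetical usage, assuming the export files sit in the current directory:
#   cat *xml | extract_attachment_urls > download.txt
#   wget -nc -w 2 --no-verbose --random-wait --force-directories \
#        --directory-prefix=Media/ -i download.txt
```

Feeding the files through cat also keeps grep from prefixing each match with a file name, although the first sed expression would strip that prefix anyway.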
There seems to be no way to recursively fetch only newer media files directly from the WordPress file URL with -r -N; the site redirects the http:// requests to the base URL, which doesn’t know about bare media files and coughs up a “not found” error.