Disclaimer: You are looking at a post I wrote some time ago. The information and opinions contained within may be outdated and may differ from my current views. Please proceed accordingly.

wget Output Conditionals

Sep 11, 2005 8:58 AM

Do any of my techie readers know a simple way to solve the following problem? (Or where would be a good place to ask, e.g. a newsgroup that hasn't fallen out of use, or something like Perl Monks?)

Once a day, I back up my del.icio.us bookmarks with the following:

wget -q -t1 --http-user=josephgrossberg --http-passwd=password -O foo.txt http://del.icio.us/api/posts/all

However, if the response is empty for some reason (e.g. del.icio.us is down), my file is overwritten with nothing.

After reading through some man pages, I found the promising --no-run-if-empty option for xargs:

If the standard input does not contain any non blanks, do not run the command. Normally, the command is run once even if there is no input.

I tested it out and something like echo "joe" | xargs --no-run-if-empty echo > foo.txt behaved correctly (it wrote to the file), as did echo "" | xargs --no-run-if-empty echo > foo.txt (it did not write to the file).

However, this still doesn't work with wget:

wget -q -t1 --http-user=josephgrossberg --http-passwd=password http://del.icio.us/api/posts/all | xargs --no-run-if-empty echo > foo.txt

The foo.txt file gets overwritten with 0 bytes.

Now, I suppose there could be something "obvious" I'm missing, but is there any canonical UNIX way to solve this problem other than writing a full-fledged bash script or piping it to Perl or the like?
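
For what it's worth, here is a likely explanation, offered as a sketch rather than a diagnosis. Two things seem to be at work. First, without -O, wget saves the response to a file named after the last part of the URL (here, "all") instead of printing it, so nothing reaches the pipe; -O - would send it to standard output. Second, the shell opens and truncates the target of > before any command in the pipeline runs, so foo.txt is emptied whether or not xargs ever runs echo. The snippet below shows the second effect in a scratch directory. The name demo_dir is made up for illustration, "true" stands in for a download that printed nothing, and -r is the short form of --no-run-if-empty.

```shell
# Scratch directory so that no real backup is touched (demo_dir is a made-up name).
demo_dir=$(mktemp -d)

# Pretend this is yesterday's good backup.
printf 'yesterday\n' > "$demo_dir/foo.txt"

# `true` prints nothing, standing in for a failed download.
# xargs -r correctly skips echo, but the shell has already
# opened and truncated foo.txt to set up the redirection.
true | xargs -r echo > "$demo_dir/foo.txt"

if [ -s "$demo_dir/foo.txt" ]; then
    echo "foo.txt kept its contents"
else
    echo "foo.txt is now empty"
fi
```

This prints "foo.txt is now empty", which suggests that any fix has to write to a different file first and only replace foo.txt afterwards.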


Comments: wget Output Conditionals

If you want portable, I'd go with this:

$ wget -O foo.txt.new [...]
$ if [ `wc -c < foo.txt.new` -gt 0 ]; then mv foo.txt.new foo.txt; else rm foo.txt.new; fi

But this only protects against a zero-byte response. Wouldn't it be better to save your backups with YYYYMMDD embedded in the filename, and keep only the last N days' worth?
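
A rough sketch of that date-stamped idea, under some assumptions: the function name keep_backup, the posts-YYYYMMDD.xml naming and the three arguments are all made up for illustration, and "last N days" is approximated by keeping the N newest files, which matches only if the job runs once a day.

```shell
# keep_backup NEW_FILE BACKUP_DIR KEEP   (hypothetical helper, not a standard tool)
# Files a non-empty download under a date-stamped name, then prunes old copies.
keep_backup() {
    new_file=$1
    backup_dir=$2
    keep=$3

    # Reject an empty download, the same idea as the wc -c check above.
    if [ ! -s "$new_file" ]; then
        rm -f "$new_file"
        return 1
    fi

    mkdir -p "$backup_dir" || return 1
    mv "$new_file" "$backup_dir/posts-$(date +%Y%m%d).xml" || return 1

    # YYYYMMDD names sort chronologically, so in reverse order
    # everything after the first $keep entries is stale.
    ls "$backup_dir"/posts-*.xml | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
        rm -f "$old"
    done
}
```

You would download to a temporary name first (wget -O foo.txt.new [...], as above) and then run something like: keep_backup foo.txt.new "$HOME/backups" 14. A second run on the same day overwrites that day's copy.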

Posted by: Dossy Shiobara on September 11, 2005 11:19 AM | permalink

Dossy:

Thanks for the response.

FWIW, I download the XML to the same filename every day, but have it all under CVS. This gives me the daily snapshots, without a whole mess of files to worry about.

Posted by: Joe Grossberg on September 11, 2005 12:14 PM | permalink

Ah, see -- if you're using CVS, then I say just fetch with wget and delete the file if it's empty. Then follow with "cvs ci" and "cvs up". If a new file was fetched, "cvs ci" checks it in and "cvs up" is a no-op. If the fetch was zero-byte, you rm'ed the file, so "cvs ci" won't check in an update and "cvs up" will pull the last repository copy back out.

$ wget ...
$ test \! -s filename.txt && rm filename.txt
$ cvs ci -mblah filename.txt
$ cvs up filename.txt

Posted by: Dossy Shiobara on September 11, 2005 9:34 PM | permalink
