Xidel - HTML/XML data extraction tool

"Xidel is a command line tool to download and extract data from html/xml pages." Neat -- I usually wind up writing some Python to do this kind of thing.

British Library: Free Data Services

The BL catalogue is available in convenient formats (note that it's also downloadable from archive.org). I should make ALMS use this as a source; it can't possibly be worse than Amazon's data!

Ian Bicking: a blog :: lxml: an underappreciated web scraping library

Useful overview of lxml, which is the module I really ought to use for random XML/HTML parsing.

XML Alternatives

