Foggy sponges and kittens

Sponges are easy. (By “sponge” I mean code that soaks an entire file into memory before doing anything with it.) Nobody likes to write loops with lots of error handling for IOError or trying to figure out what exactly rewind() does. Plus, we can’t write our favorite dot-chain-of-death map/select/flatten/zip monstrosity until that file’s contents are in something that implements Enumerable, right? As a result of this type of Ruby-think, I’ve seen far too many “working with files” examples that boil down to contents = File.new("something.txt").readlines or IO.read("blah.txt"), and that should really be titled “files are hard, let’s go sponging”.

Sadly, sponges also kill kittens. Not really, but they get pretty close. When you are dealing with file input and write a sponge, what you’re really saying is “I can’t wait until this grows, consumes all memory, and gets OOM killed!” It’s a false economy: you’re trading superficial Ruby idiom simplicity for a design that is built to fail as your data set grows.

If you write a sponge today, you will almost surely rewrite it tomorrow.

This came up again recently with respect to dealing with Amazon S3 and a rewrite we had to do because (surprise) it was consuming all the memory on a Resque machine and getting killed. The engineers assigned solved some of the hard problems (dealing with the core algorithm that is spongy), but left an obvious sponge right at the center of things, ensuring the “fix the OOM” branch would suffer a different but related OOM once merged.

This is where we get to Fog. Fog is a really impressive gem and you should probably be using it if you’re dealing with any cloud services and writing Ruby. But we were creating a sponge with Fog when putting and getting potentially large files from S3. The seductive “one clean line” solution was staring at me from yet another pull request.

It seemed odd to me. I know Wes. He and the others who work on Fog are really smart people. The gem is widely used and contributed to. It seemed unlikely that Fog::Storage was genuinely unusable for any file too big to fit in local memory and that nobody had noticed. That would be a silly design deficiency.

And here’s where the corollary to “A bad workman blames his tools” should kick in: “A good workman blames himself.” There had to be something I was missing. I had either skipped some documentation or was too stupid to fully understand the very clever (and beautiful) code that makes up Fog. As it turns out, both were true. In the process of digging, code reading, and thinking, I found my answers. It’s not particularly well documented nor obvious from the code, but Fog supports streaming instead of sponging.
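
Briefly, and with placeholder bucket and file names (the shape of the calls is what matters, not the specifics), it looks something like this:

require 'fog'

storage = Fog::Storage.new(
  :provider              => 'AWS',
  :aws_access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
  :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY']
)
directory = storage.directories.get('some-bucket')

# Streaming "create": hand Fog an open File instead of a String and the body
# goes up in chunks rather than being slurped into memory first.
File.open('/tmp/huge_input.dat', 'rb') do |file|
  directory.files.create(:key => 'huge_input.dat', :body => file)
end

# Streaming "get": pass a block and the response arrives chunk by chunk (this
# is the part that actually lives down in Excon), so it can go straight to disk.
File.open('/tmp/huge_output.dat', 'wb') do |file|
  directory.files.get('huge_input.dat') do |chunk, remaining, total|
    file.write(chunk)
  end
end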

The “create” example is right there in the Fog::Storage docs, but is easy to look past if you’re not used to dealing with IO-like objects. The “get” example, however, is not documented in a clear place, nor is it obvious from a cursory reading of the code that it takes a block variation (the implementation is actually over in Wes’s similarly excellent Excon library).

Consider sponges an anti-pattern. Watch for them. Avoid them. Get comfortable with chunked or line-based processing. Add StringIO or thorough mocks to your TDD toolkit for testing things that expect IO-like objects.
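
For the common cases, the non-sponge version is barely more code. A minimal sketch (process here stands in for whatever you actually do with each piece):

require 'stringio'

# Line-oriented: never holds more than one line in memory.
File.foreach("something.txt") do |line|
  process(line)
end

# Chunk-oriented, for binary data or absurdly long lines:
File.open("blah.txt", "rb") do |io|
  while chunk = io.read(64 * 1024)
    process(chunk)
  end
end

# And in tests, StringIO stands in for anything that expects an IO:
fake_file = StringIO.new("line one\nline two\n")
fake_file.each_line { |line| process(line) }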

And for goodness sake, think of the kittens.

Capybara, synchronize, and monkeys that climb ivory towers

I like Capybara. It’s an incredibly cool tool. Gone are the days when you need an army of bleary-eyed QA team members following Word document scripts for click testing. Now it is possible to grab Capybara, and optionally something like Cucumber, and develop your QA team into a true automation team without also having to bring in an engineering resource to build a whole tool chain on-site.

But there is a certain ivory-tower idealism to Capybara that can make it difficult to use with some applications. The authors believe that pretty much all browser testing should be via the DOM. If you click a button or load a page and can’t assert something about its content, selectors, appearance, or other properties via XPath- or CSS-driven finders, there is probably something wrong with your application.

And they’re absolutely right. If clicking a button doesn’t fire an event that comes back and modifies your DOM, how is your feature remotely usable? If Capybara can’t detect the success of an Ajax call as reflected in your DOM, how the heck is your user getting feedback? The DOM should have mutated: a class should now be present on that button that turned it green, or the word “Success” should now appear at the top. The Capybara authors are dead on: this is how it should be.

But this isn’t how it is. Sometimes life gets in the way of ideology.

In a particular case I encountered it was simply more efficient to do a little page tweaking with jQuery rather than insisting the new feature’s MVP adhere to Capybara’s ivory tower proscriptions. Normally that wouldn’t be a big deal, but we have an additional wrinkle: All our JavaScript is loaded in a non-blocking, asynchronous manner to improve page render time. That means the much-beloved $ may not exist right when your test code runs.

We’ve already artfully dealt with the complexities of asynchronous JS loading in our application, but it still meant intermittent failures in our black box suite whenever the network was sluggish enough that jQuery hadn’t loaded by the time Capybara ran execute_script. If you poke around the Internet for this category of problem with asynchronous JS or Ajax, you’ll end up in a mire of nearly identical Stack Exchange answers describing how people “fixed” their problems with wait_until. I’d argue most of these people made things worse for their app without realizing it.

Noting the trend, the Capybara authors removed wait_until in the 2.0.x releases and replaced it with a harder-to-reach synchronize. Rightly, they’ve asserted that you shouldn’t ever really need it, since Capybara has a system of waiting and retrying built into all its query code; if you’re wrapping a DOM search in your own timer, you’ve probably failed to understand something fundamental about Capybara.
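
To make that concrete (the selectors here are invented), Capybara’s own finders already poll and retry, so leaning on them is almost always enough:

click_button "Save"

# Both of these quietly retry for up to Capybara.default_wait_time, which
# gives the Ajax round trip a fair chance to finish before anything fails.
page.find("button.save.success")          # waits for the class to show up
page.find(".flash", :text => "Success")   # waits for the text to appear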

In our case, however, I need to loop back to the fact that we’re not dealing with the DOM here: the presence of jQuery is not a DOM event at all. Either $ is defined or it isn’t. Try as I might, I couldn’t both play nice with Capybara’s testing philosophy and get work done.

So, I reached for my least favorite tool: the monkey patch. Ugh, I feel dirty just typing the name…
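
The patch amounts to something like this (a sketch: the wait_for_jquery name and the polling interval are mine, but the shape is the point):

class Capybara::Session
  # Poll until window.jQuery exists, giving up after Capybara's normal wait time.
  def wait_for_jquery
    deadline = Time.now + Capybara.default_wait_time
    until evaluate_script("typeof window.jQuery !== 'undefined'")
      raise "jQuery never loaded; is the async loader wedged?" if Time.now > deadline
      sleep 0.05
    end
  end

  # Run a little jQuery script, but only once $ is actually defined.
  def execute_jquery(script)
    wait_for_jquery
    execute_script(script)
  end
end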

Essentially, these two new methods on Capybara::Session bring the characteristic Capybara zen-like calm to my jQuery execution. Just as Capybara will patiently let your Ajax calls complete while watching for “Success” to appear, calls to execute_jquery will run little jQuery scripts after waiting patiently for $ to be defined.

It really is grotesque and I don’t like it. But it works. Perhaps in the near future I can modify our JS loader to indicate various states via hints in the DOM. Or I can take the time to rewrite the problem assertions in pure JavaScript instead of jQuery (though I’m not inclined to take the “don’t use that useful tool” route). But maybe it is time to have a bit of a tussle with the Capybara folks and find out if a facility like this might be a worthwhile addition to the framework.

They’ve built a timeout-driven system for DOM testing. Why isn’t a similar timeout system available to evaluate_script?

Mongo Mapper and hybrid deployments: a first hack

At work we’re using RoR atop MySQL to build our core web application. As with many companies, we’re realizing that certain types of scaling problems may be better solved in the NoSQL world. While looking into MongoDB, I’ve found it somewhat surprising that the two dominant Rails-ready adapters (mongo_mapper and Mongoid) don’t really answer the question of hybrid deployments. In fact, much of the “getting started” documentation includes a section on ripping out ActiveRecord (since you obviously won’t need it now that you’re playing with a new NoSQL data store).

Similarly, there are few, if any, stories of people migrating just parts of their schema into MongoDB. Either people aren’t doing this or nobody is blogging about it. I found several conversations about people just abandoning the idea and converting wholesale to MongoDB. Given the 40-minute time span between some of these postings, these are either the greatest coders ever or they’re converting toys/demos/explorations rather than years-old applications with dozens of tables and tens of thousands of regular users.

As I started to work on things, it became clear that the document-oriented nature of MongoDB was going to necessitate some changes in how we do things. Gone (in some ways) are a number of the easy and familiar ActiveRecord methods for associations between records. This goes double for a system with some parts in a relational database and other parts in a document store. Rather than just typing belongs_to and moving on, you end up writing your own accessors and mutators over and over again.
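
Here’s the kind of boilerplate I mean, with a hypothetical Comment document that needs to point back at a relational User:

class Comment
  include MongoMapper::Document

  key :body,    String
  key :user_id, Integer

  # Hand-rolled glue back to the relational side, repeated with the names
  # changed for every association that crosses the document/relational boundary.
  def user
    User.find(user_id)
  end

  def user=(value)
    self.user_id = value.id
  end
end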

This, of course, is silly. I’ve only been coding Ruby for a little while, but it was clear that I should be able to DRY this up a little. Plus, mongo_mapper (which I’m tentatively using at the moment) is built around plugins for its core functionality, making it easy to extend.

I’m still not great at Ruby, Rails, or some of the idioms standard to development in this world, but it didn’t take me long to write something that provides belongs_to_ar: a convenience variation on key that creates the key field and automatically handles loading and storing references to ActiveRecord objects (the ones still in your relational database).

module MongoAdapt
  module ExternalKey

    def self.included(model)
      model.plugin MongoAdapt::ExternalKey
    end

    module ClassMethods
      def belongs_to_ar(name, *args)
        key :"#{name}_id", Integer
        create_link_methods_for(name)
      end

      private
        def create_link_methods_for(name)
          # See MongoMapper::Plugins::Keys
          class_name = name.to_s.classify
          key_name = :"#{name.to_s}_id"
          accessors_module.module_eval <<-end_eval
            def #{name}
              #{class_name}.find(read_key(:#{key_name}))
            end

            def #{name}=(value)
              write_key(:#{key_name}, value.id)
            end

            def #{name}?
              read_key(:#{key_name}).present?
            end
          end_eval

          include accessors_module
        end
    end
  end
end

MongoMapper::Document.append_inclusions(MongoAdapt::ExternalKey)

With this loaded, I can now do belongs_to_ar :user on a MongoMapper::Document and get a user_id field in the underlying document as well as user and user= methods that operate on User objects. This, of course, only handles one direction of the relationship (going from a MongoMapper document to an ActiveRecord object).
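
Using the same hypothetical Comment/User pair from earlier, it looks like this in practice:

class Comment
  include MongoMapper::Document
  key :body, String

  belongs_to_ar :user   # adds a user_id key plus user, user=, and user? methods
end

comment = Comment.new(:body => "Nice post!")
comment.user = User.find(42)   # writes 42 into the document's user_id field
comment.save
comment.user                   # loads User.find(42) back out of the relational database
comment.user?                  # => true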

I imagine I'll refine this a bit further and post some more about the challenges of a hybrid deployment as I go along. For one thing, it needs a more Ruby-ish (what the hell is the equivalent term to python-ic?) name.

Stupid Ruby tricks

Over at work, I’m rapidly getting up to speed on Ruby and Rails. It’s fun and a little wacky (and, not to mention, very fast-paced for us right now). This snippet cracked me up when I saw Jared use it while debugging.
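
It amounts to a tiny monkey patch, roughly this (a sketch; the exact class you reopen depends on your Rails version):

class ActionController::Request
  # Blow up with the full request dump while poking at a controller.
  def wtf!
    raise inspect
  end
end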

Totally unnecessary, but man is it fun to type request.wtf! when doing initial debugging on a controller.

In support of BYOBW 2010

Easter is fast approaching and that can mean only one thing: a few people are trying to block the Bring Your Own Big Wheel event that has happened for the last few years on my street. This year, several people have undertaken the work required to make this a legal, permitted, and insured event for the whole family. There’s a permit hearing coming up and here is the letter I’m filing in support of the event.

My name is Kyle VanderBeek and I reside at 8xx Vermont Street, the last turn on the curvy portion of Vermont Street in Potrero Hill. I am writing this letter to express my support for the Bring Your Own Big Wheel (BYOBW) event being considered for April 4, 2010.

I moved into my current apartment just before the BYOBW event in 2008. When I heard it was happening, I was incredibly pleased and decided to turn it into an opportunity to gather with my friends. These events transform an otherwise normal Sunday into a time of celebration and community. I get to make food, be a host, and join my friends on the balcony to watch the spectacle. Many of us also choose to participate in what is truly a uniquely San Francisco experience; one that is family-friendly, joyous, and wacky.

I have enjoyed BYOBW for the last two years, and am encouraged by the additional organizational efforts being put forth this year (including more portable restroom facilities and monitors along the course). It has always been a low impact event, and I anticipate that being true this year as well.

As a directly-impacted neighbor, I urge the City of San Francisco to allow this annual event to continue and by doing so support the right of the people to assemble and be silly.

Thank you.

I really hope everything goes well. I can’t wait to throw a party!

Finding my first Python reference cycle

I’ve worked in Python for nearly six years now and, up to this point, I’ve not had to deal with the king of memory-related gotchas: the reference cycle. Simply put, a reference cycle happens when two or more objects create a cyclic directed graph in their path of references (ok, not that simple). The simplest form is the “I point to you, you point to me, neither of our reference counts get to zero” scenario.

Usually, this isn’t a big deal. Although objects don’t get immediately collected when dereferenced, the garbage collector will eventually notice that there is some wasted memory, do the more expensive cycle detection work, and free the objects. In many scenarios, like single-run scripts, cron jobs, or low-load applications, you’ll never notice; Python does its job, keeps things clean, and stays out of your way. If, however, you’re writing a high-performance internet server intended to handle the ever-increasing demands of the modern Internet, you just might be screwed. That GC work will cost you throughput and scalability. In the worst case, the GC won’t be invoked in time and you’ll hemorrhage memory (or at least gobble it up in competition with other things your system might need to do). In this case, initial numbers left us with a customer-detectable throughput degradation when a particular security feature was enabled.

Without further ado, here is the distilled pattern that ran us into trouble.

class Filter:
    def __init__(self, callback):
        self._callback = callback

    def update(self, data):
        """Override and call parent."""
        self._callback(data)

class AddLF(Filter):
    def update(self, data):
        data = data + '\n'
        Filter.update(self, data)


class AddCRLF(Filter):
    def update(self, data):
        data = data + '\r\n'
        Filter.update(self, data)


class Foo:
    def __init__(self, dos_mode=False):
        if dos_mode:
            self._filter = AddCRLF(self.the_callback)
        else:
            self._filter = AddLF(self.the_callback)
        self._filtered_data = list()

    def update(self, data):
        self._filter.update(data)

    def the_callback(self, data):
        self._filtered_data.append(data)

    def all_data(self):
        return ''.join(self._filtered_data)

At a glance, this isn’t an unreasonable design pattern. If you’re writing a cross-platform tool, or something that needs to listen on a socket and account for variations and flaws in protocol implementations, passing the input through a filter to clean it up is reasonable. It’s even somewhat pythonic in that a callback is used, allowing for various patterns of code re-use and subclassing in both the filters and the consumer.

The usage pattern is something like this: for each chunk of data you receive, call update(), the main class hands the data off to the filter, the filter does its thing, and uses the callback to pass the data back up.

# Many of you will recognize this pattern from crypto digests
f = Foo()
f.update('asdf')
f.update('jkl;')
print f.all_data()
f = None
# At this point ref-counting could have reclaimed the object

Pay attention to that last line: when we assign f = None, under normal circumstances the reference count would hit zero, and all objects should be collected. Here this is not the case!

Once I realized this region of code was the likely culprit, I started playing with the gc module and its debugging abilities. Most importantly, using the gc.DEBUG_SAVEALL setting causes the garbage collector to (instead of completing collection) store all unreachable objects in a list named gc.garbage. This can be examined for hints.

import gc, pprint

gc.set_debug(gc.DEBUG_SAVEALL)
print gc.get_count()
print gc.collect()
pprint.pprint(gc.garbage)

Here is the output from that single object creation.

(249, 5, 0)
6
[{'_filter': <__main__.AddLF instance at 0x7f23ec3261b8>,
  '_filtered_data': ['asdf\n', 'jkl;\n']},
 <__main__.Foo instance at 0x7f23ec326170>,
 <bound method Foo.the_callback of <__main__.Foo instance at 0x7f23ec326170>>,
 {'_callback': <bound method Foo.the_callback of <__main__.Foo instance at 0x7f23ec326170>>},
 <__main__.AddLF instance at 0x7f23ec3261b8>,
 ['asdf\n', 'jkl;\n']]

You might be able to see the loop already. We have, left over, a Foo instance, an AddLF instance (which we know is stored as a member variable of Foo), and (most importantly) a reference to a bound method on Foo. This bound method holds a reference back to the Foo instance, completing the cycle.

What the original designer of this code probably didn’t realize is that passing that callback when instantiating a Filter would create that loop (via the bound instance method, which has a reference to the instance). After a minor refactoring, I’ve gotten rid of the cycle and reference counts hit zero when the last external reference to a Foo object is eliminated. Most importantly all the related data is freed as well; in this case Foo._filtered_data (which could grow to be very large) will get freed up as well.

This anti-pattern is only one step away from the simplest case reference cycle, but it had real consequences. With it gone, memory is managed far more efficiently and this feature is usable in a demanding, high-performance environment.

My first time being censored

I’m not terribly active on the “public” Internet. I have my blog, which I suspect is mostly read by my friends and a few seekers of Python knowledge. I belong to a forum here or comment on a favorite blog there. I use Facebook quite a bit to keep up with friends and pass things I’ve found among my social circle. I just don’t bother much with public discussions since they seem to devolve into nonsense, and I don’t see the point in playing that game. I don’t even allow comments on this blog, since I don’t see the need to host other people’s idiocy or vitriol. About the only place you’d find me with any sort of regularity is reddit, but even that is sporadic.

I’m even a fairly quiet skeptic, preferring to share what I’m learning with those around me rather than engaging in public text-based shouting matches. The events of this past weekend started out that way: an old friend of mine from massage school, Sam, posted a link to Examiner.com that erroneously claimed that home birth was proven as safe as hospital birth. Since many medical studies are badly reported, I checked the article.

I immediately recognized the study from the University of BC in Canada. I’d already read it, and it had been similarly misrepresented elsewhere. It’s not that the study’s conclusions are false, it’s that they are narrow. Home birth has similar infant mortality rates only in Canada where a single type of certification exists: the Nurse Midwife. This is a college-educated position with clinic hours, lots of training, and a great deal of time spent doing both hospital and home births. They’re full-on nurses. That’s not the case in the US where we have several different types of midwives, some with surprisingly little training (and potentially zero clinic hours). The Skeptical OB is where I’d seen the study before, and she covers it well. The bottom line is that these lesser trained midwives have an infant mortality rate triple that of the highly trained Canadian midwives.

So it’s fair to say the result isn’t meant to be generalized, right? I recognized the over-zealous rhetoric of someone who really wants to believe that all things natural are automatically better. The author on Examiner.com was calling herself the “natural parenting examiner”, so that’s to be expected. But it doesn’t excuse misreporting a study, especially since the original authors specifically warn against over-generalizing the result beyond its intended scope.

So I commented. I asked her to cite her sources (she had not linked, a clue that she might be getting her information third-hand). I provided a link to the Skeptical OB coverage. I was polite and brief.

But then curiosity got the best of me. Having spent a considerable amount of time reading all sides of various health issues, I suspected I was reading the work of another enthusiastic but careless advocate. So I clicked on her bio and found this:

Katie Drinkard is a self-proclaimed natural parenting guru who has done extensive research on natural pregnancy and parenting. Katie presents vital information in a personal manner while providing the latest research on natural parenting topics – ranging from birth through the childhood years.

Right. Well, that’s not exactly a stellar CV. Lack of formal training certainly doesn’t make one wrong (remember, I’m a software engineer commenting on health issues), but dropping words like “guru” has always unnerved me. I already suspected what I was going to find when I clicked on her other articles, but I had to be sure.

Naturally, one title that jumped out at me was “Swine Flu vaccine contains diseased flesh of African Monkeys”. No, I’m not kidding, she actually wrote this. Surely she was just being hyperbolic! Nope. Another predictive sense was tingling in my brain as I read this gem of a final paragraph:

Scientists create the Swine Flu vaccine (and other vaccines) by injecting monkeys with the virus and allowing the disease to take over. Later, the monkey is then killed and its diseased organs are used to make the ingredients of vaccines given to the public.

Oh, the stupid, it burns! I just knew this sort of wrong-headed nonsense could only come from a single source: Mike Adams at Natural News. Sure enough, in seconds I’d found his article making the same bogus claim.

What makes it bogus? In the process of seeking the “truth” in the text of a vaccine patent, Mr. Adams (not a doctor in any sense) inferred from the line “producing said virus using a cell line isolated from the kidney of an African Green Monkey” that there must exist a warehouse of angry, sick monkeys being fed into a meat grinder. Well, that’s not the case. Anyone who has spent more than an hour actually trying to understand vaccines will key on the words “cell line” and know that they’re referring to some laboratory-grade line of perpetually grown cells that can be used as a growth medium. Sure enough, again within seconds, I’d found that there is a widely used line named “Vero” that was isolated from African Green Monkey kidney epithelial cells way back in 1962.

So I made my way back to the Examiner.com article and assured the author that there was no monkey holocaust. I quickly pointed out how she was in error, and provided an explanation of cell lines and how they’re used. Four or five sentences and I was done.

A day later, both of my comments were gone. They appear to have been deleted. Granted, there is no way I can prove this to anyone since I don’t have access to the site’s history or databases. I can, however, say that Ms. Drinkard did update her birthing post using the link I provided. I suppose that’s something, but she didn’t correct or clarify anything.

I’m disappointed.

Love/Hate and Good Wood Mods

I play too much Rock Band. My music game affliction started back with the original Guitar Hero on the PS2. I was doing the mega-geek rounds around Fry’s one day and saw it on the demo stand. I played a single song and knew this was something I wanted at home. Mitch had the PS2 already, so I went for it. We played for hours and he quickly started kicking my ass. But, damn, it was fun to get awarded points and stars for being good at not actually playing music.

Fast forward a couple of years, an upgrade to an Xbox 360, and a few finger strains battling through “Raining Blood”, and I was still having fun. I’m not great at it (I’m really not even a full-fledged “Hard” player with the guitar), but who cares? I get better here and there, and keep enjoying myself. I picked up Rock Band and drooled over the DLC, still content playing the guitar. But, oh, the temptation. I’d tried the drums elsewhere. I wanted them…

Eventually I gave in and bought the Rock Band drum kit and started wailing away trying to teach myself a new skill. Even though it had never really waned, my interest was renewed. Bit by bit I got my hands and foot to do the right things in the right order. Granted, having moved into an apartment, I had to get my drum fix when the downstairs neighbor was out. I was like a fiend, sneaking a hit whenever I could.

Sadly, my enthusiasm took a toll on the drum kit (as many people have found). After seeing them used by YouTube drum stud azuritereaction, I chose and ordered a set of Good Wood Mods (their first, extremely home-made variant).

In April (after a long wait) they arrived. I giggled at the simple packing material. I installed them. I played them. I loved them! Immediately I realized why several of the best players use Good Wood Mods: they’re insanely fast. Compared to the plastic-backed gum rubber pads, they’re a dream to play and take far less effort. If you stay relaxed, you can play entire songs on “Expert” while barely moving your wrists. The bounce you feel in the sticks is much less jarring and closer to that of a real drum head. There’s none of that dead thud at all. And, most importantly, they’re quiet. Very quiet.

Since I work at a place that also loves Rock Band (and houses a 360), I couldn’t wait to show the guys. Only two days into owning the kit, I brought it into work. As we gathered in our game room at the end of the day, several guys lined up to give them a whirl. They got rave reviews. Rave, I say! Everyone agreed with me that they are a huge leap forward over the stock heads.

Then it all came unraveled. One of the people who loves playing the game but still plays stiffly (and hits hard) sat down. Part way through his first song, bang, the blue pad tanked. It became intermittent and only registered some of the strikes. Damnit! I took the set home to test it a little more and, bang, the yellow pad suffered the same fate. Damnit again! Fortunately, I’m a geek (as are the makers, which is awesome). I broke out the soldering iron, exchanged some emails with the guys who make them, dismantled the heads, and fixed the broken connections (just the solder joints on the piezo sensors).

Sadly, this hasn’t been an isolated issue. I trust my repairs, but I’ve had to do or re-do this same sort of fix on all but one of the heads. There is something fundamentally flawed in the design. I suspect it’s the combination of foam inserts and floating piezo mounts, which creates friction at the edge of the wire’s jacket and a tugging movement that strains the braided wire. I fully expect to do this repair again, though I’m trying various applications of glue as a protective covering at the edge of the solder joint.

Now, the big question… would I recommend these to other people? In three words: Yes and No. I absolutely love playing these over the standard pads. However, their initial build quality was questionable and I continue to have to make repairs (usually with a couple of months of playing in between). That said, the guys who make them have scaled up and found a manufacturing partner. Apparently, gone is the questionable-but-endearing use of plywood, varied screws, and slices of PVC pipe. If you buy a set now, I suspect you’ll get a fundamentally similar but massively revised product built to higher standards. If you play a lot and have a few spare dollars, go for it. I’m even eying a second set.

If you’re curious, I also took a few pictures of my Good Wood Mods install and repairs.

Start your Python project with optparse and logging

Python continues to be my favorite language to hack in. It’s useful for tasks big and small and has the advantage of being more readable and maintainable than a lot of other scripting languages. I remember when, having learned the basics of Python, I decided to rewrite my home-built CD-to-MP3 script suite from its Perl incarnation (Perl was my previous favorite language for system programming projects). I knocked that project out in only about 2 hours, including a major redesign made easy by Python’s object syntax and built-in pickle serialization.

Since then, I’ve usually chosen Python for a variety of system programming tasks. This includes things like making backups, deploying software, configuration file templating, web site scraping, health monitoring, and some bigger data crunching/graphing. Time after time, these quick projects (often at work) turned into something bigger. Eventually they get rolled into a product or adopted by the production operations group as standard kit.

At some point I realized that there was something that I was doing that made this transition from “quick hack” to “standard tool” easy: I always start my projects with logging and optparse in place from day zero. During development, this means that I don’t have to scatter print-style debugging statements throughout the code, I can just use logging.debug() and turn them on and off at will (via command line flags). Once deployed or passed on, it means other people using it can immediately start interacting with my script just like other familiar Unix utilities.

So, here is a variation on what my typical Python script starts out with.

#!/usr/bin/python

import logging

def foo():
    """These will only get output if you turn up verbosity."""
    logging.debug("This is debug.")
    logging.info("This is info.")

def bar():
    """These will all be output a default logging levels."""
    logging.warn("Warning!  Things are getting scary.")
    logging.error("Uh-oh, something is wrong.")
    try:
        raise Exception("ZOMG tacos.")
    except:
        logging.exception("Just like error, but with a traceback.")

if '__main__' == __name__:
    # Late import, in case this project becomes a library, never to be run as main again.
    import optparse

    # Populate our options, -h/--help is already there for you.
    optp = optparse.OptionParser()
    optp.add_option('-v', '--verbose', dest='verbose', action='count',
                    help="Increase verbosity (specify multiple times for more)")
    # Parse the arguments (defaults to parsing sys.argv).
    opts, args = optp.parse_args()

    # Here would be a good place to check what came in on the command line and
    # call optp.error("Useful message") to exit if all is not well.

    log_level = logging.WARNING # default
    if opts.verbose == 1:
        log_level = logging.INFO
    elif opts.verbose >= 2:
        log_level = logging.DEBUG

    # Set up basic configuration, out to stderr with a reasonable default format.
    logging.basicConfig(level=log_level)

    # Do some actual work.
    foo()
    bar()

This is obviously a pretty minimal setup, but it achieves what most people need. You get option parsing and checking along with usage information. You get warnings and errors spat out to sys.stderr (leaving sys.stdout for actual program output). You can specify -v to crank up verbosity.

More than once, I’ve had a pager-frazzled sysadmin ask me how the heck to use the new tool. I get to reply, “Just run it with --help, it has full usage instructions built in.” This makes sysadmins happy; really happy. It means they don’t have to remember odd semantics or dig around for the how-to page on the internal wiki at 3:00am when they need to use it. Culturally, there is almost nothing more valuable than a sysadmin who has had success with your code, and nothing worse than one that has had trouble with it. To me, the first step in granting a sysadmin that sort of success is to make your code act like everything else in /usr/bin and that’s why I’ll continue starting my projects in this manner.

Caching WordPress content to appease Google Page Speed

I installed the new Google Page Speed plug-in for Firebug (under Firefox) and played with it a little. I’m impressed. It takes a number of ideas from the similarly awesome YSlow plug-in and goes a few steps further, plus adds a cleaner UI. Of course, I found a couple of really obvious things I could fix.

One of the easiest things is to make sure that your static content is cache-friendly. It’s one of those simple tricks that really does have an impact both on the number of bits you have to serve and on the perceived speed of your pages. Sadly, it gets left out of a lot of CMS/Blog systems (including WordPress). To speed things up, I created an Apache .htaccess file with the following:

<FilesMatch "\.(gif|jpe?g|png|css|js)$">
ExpiresActive On
ExpiresDefault "access plus 2 months"
Header set Cache-Control "max-age=5184000, public"
</FilesMatch>

For my purposes, I put this particular set of directives in wordpress/wp-content/.htaccess which means that all my theme and plug-in resources (including images, JavaScript, and CSS) get an Expires header 2 months into the future. It also includes a matching 2-month Cache-Control “public” header to let caches know that shared object caching is fine. With this, both web caches and the browser cache are far more likely to use the copy of these resources they have laying around.

Your mileage may vary (YMMV), but this is good placement for my particular purposes. It may be safe to put these directives all the way at the top of your document root. As with any caching, be prepared to pay some mild penalty whenever these resources actually do change; you’re ceding precise control over when a new version gets served up in order to gain performance (a trade-off you’ll have to weigh). Granted, there are cache-busting strategies (such as versioned file names) that you can use to get around this, too.

The other thing that impresses me about Page Speed is that it actually provides you with better versions of things you have, no extra software required! If it notices you could be serving a better-compressed image or a minified JavaScript file, it’ll tell you how much you could be saving and provide the optimized version for you at a single click. Very slick.

Next I have to figure out if DreamHost can help me support gzip’ed content…

Update: Bah! I should have just poked around a little. I found a page about turning on more DEFLATE functionality under DreamHost’s Apache 2.2 setup. They already automatically deflate several content types, but these don’t include CSS or JavaScript. It’s a pretty easy addition to the .htaccess file I already have.

# gzip more stuff
<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/css application/x-javascript application/javascript
</IfModule>

Unlike the way I did the Expires headers above, this particular technique uses the content’s MIME type when choosing when to apply the DEFLATE output filter. And just like that, I’ve got smaller content!