01/01

So You Wanna Ship Fast?

we all have a lot to learn from the little red hen

For Formula One, the 1994 San Marino Grand Prix was a turning point for the sport. Tragically, over that single weekend, drivers Ayrton Senna and Roland Ratzenberger lost their lives in two separate crashes on two different days. Expert reports from the FIA, the governing body of F1, determined that both accidents were avoidable. F1 have since made significant efforts to improve driver safety.

Raised cockpit sides were mandated to protect drivers' heads from lateral impacts, the monocoque shell around the cars was strengthened, tracks were redesigned, run-off areas were added, and safety technology has improved significantly since. One of the most impactful developments was the HANS (Head And Neck Support) device, introduced to F1 in 2003, nine years later. In the 30 years preceding 1994 there were 34 deaths in F1 races; in the 30 years since, there has been only one.

Yet the question remains: why did it take the death of the greatest driver the sport had ever known for F1 to do something about the problem? They addressed the problem only after the cost of inaction exceeded the cost of prevention. In software, a lot of us are still racing in the pre-1994 era.


The HANS device, the most important improvement in motorsport safety since the seatbelt. It existed in 1994 and may have saved Ratzenberger's life.

We're entering a really cool time to be a software engineer. I ship more than I ever have and try things I would never have had time for. Genuinely, the only limits I feel now are my imagination and token costs. The startup class and some of the more cutting-edge companies have already bought in and automated a large amount of their dev work (check out the BoundaryML guys for people doing this right). Legacy companies have, rightfully, been a lot slower moving... but they are starting to feel it.

So legacy companies do what they do best: get their top guys on it and/or hire and consult outward, both with varying degrees of success. If you are a leader at one of these companies, I will save you the consulting fees now:

Nothing has changed. It is not easier to ship great software, just faster. You will still need to do the groundwork to enable this.

I want it to be easy to build great software too, but AI is not a magic wand, despite what the icon everyone seems to be using for their AI features might imply. So what do I mean by nothing has changed? "We have bleeding-edge, hyper-scalable agents to throw at any task you can think of!" That may be true, but in reality the fundamentals of what a developer needs and what a state-of-the-art large language model needs aren't that different:

What We Need to Ship Great Software | AI | Devs
Fast Feedback (CI/CD, Tests)
Clear Specifications
Loosely Coupled Architecture
Observability
Ease of Deployment
Caffeine

If that table looks like a description of your infrastructure, good. If it looks like a wish list, we need to talk. Start by auditing your current reality:

  1. How long does it take your team to deploy?

  2. How long does it take to run your automated tests? Do you have any?

  3. If something goes wrong, how easy is it to diagnose the cause?

  4. Can you clearly articulate business requirements to your engineering team so they can architect and build your product?

  5. How hard is it to make changes to your software?

  6. Do you make the devs pay for their own coffee?

All of these are mission-critical to building great software.

If you don't already have the guardrails built, you're going to have a lot of problems when you eventually move to agents doing the bulk of your development work (and you will). Agents by their nature need context. On the way in, they need information: detailed, easily accessible product documentation. On the way out, they need a way to validate their work: automated builds, tests, and ideally review. Quality is a tax you are going to pay at some point in the development cycle.

The smart companies get it done as early as possible, ship cleanly, and move on to the next feature. Others don't. Statistically speaking, you probably work at a "don't". According to research from the Standish Group, that's about 60% of the software industry, compared to 26% that are doing the right thing. If you're somewhere in the middle, keep going: the 26% is closer than you think, and you can get there. So what do we need to change to ship fast? Fortunately, this is a well-studied field, and there are even metrics you can track to see whether you're improving.

That which is measured improves. That which is measured and reported improves exponentially.

Karl Pearson - British Mathematician

I am talking about the DORA metrics. If you're not familiar, these are:

  • Change Lead Time: how long it takes for your team to make and deploy a change
  • Deployment Frequency: how often you deploy
  • Failed Deployment Recovery Time: how long it takes to recover from a failed deployment that requires immediate intervention
  • Change Failure Rate: the ratio of deployments that require an immediate hotfix
  • Deployment Rework Rate: the ratio of deployments that are unplanned but happen as a result of an incident in production
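None of these need special tooling to get started. As a rough illustration, here is a minimal Python sketch that computes all five from a list of deployment records. The `Deployment` record, its field names, and the `dora_metrics` helper are hypothetical (your CI/CD system will export something different), and using the median for the two time-based metrics is my own choice, not something DORA prescribes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape: adapt the fields to whatever your CI/CD tool exports.
    committed_at: datetime                   # when the change was first committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # needed immediate intervention (rollback or hotfix)
    recovered_at: Optional[datetime] = None  # when service was restored, if it failed
    unplanned: bool = False                  # deployed in response to a production incident


def dora_metrics(deploys: list[Deployment], window_days: int) -> dict[str, float]:
    """Summarise a non-empty list of deployments made over `window_days` days."""
    total = len(deploys)
    failures = [d for d in deploys if d.failed]
    # Change lead time: commit -> production, in hours.
    lead_hours = [(d.deployed_at - d.committed_at) / timedelta(hours=1) for d in deploys]
    # Recovery time: measured from the failed deployment to restoration (an approximation;
    # measuring from when the failure was detected would also be reasonable).
    recovery_hours = [
        (d.recovered_at - d.deployed_at) / timedelta(hours=1)
        for d in failures
        if d.recovered_at is not None
    ]
    return {
        "change_lead_time_hours": median(lead_hours),
        "deployments_per_day": total / window_days,
        "failed_deployment_recovery_hours": median(recovery_hours) if recovery_hours else 0.0,
        "change_failure_rate": len(failures) / total,
        "deployment_rework_rate": sum(d.unplanned for d in deploys) / total,
    }


# Example: three deployments over a one-week window.
t0 = datetime(2024, 1, 1, 9, 0)
history = [
    Deployment(t0, t0 + timedelta(hours=4)),
    Deployment(t0, t0 + timedelta(hours=26), failed=True, recovered_at=t0 + timedelta(hours=27)),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=2), unplanned=True),
]
metrics = dora_metrics(history, window_days=7)
print(metrics)
```

The numbers themselves matter less than the trend: run something like this every sprint and watch which way each metric moves after you change something.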

This all feels a little nebulous and generic, though, right? These are general indicators: smells that point you toward issues. What matters isn't whether your change lead time is good or bad; it's what is driving it. To quote Tolstoy's Anna Karenina: "All happy families are alike; each unhappy family is unhappy in its own way." The same is true for under-performing organisations.

When things go wrong, your team needs to spend the time working out what went wrong and how to mitigate against it in future. Implement the change and track its success over time. Did it help? If not, why not? None of this needs to be purely reactive, either, but it should be data-driven. Talk to your team. Most talented people at under-performing companies are very opinionated and passionate about how things can improve. Your job, though, is to keep the conversation productive and to separate the signal from the noise. It may sound airy, but encouraging a culture of psychological safety is just as important. If something were wrong, how comfortable would your team actually be telling you?

You know, outside of blowhards like myself.

And just so we're clear, the best time to start improving all of this was yesterday. That raises the question: what if my company doesn't do anything? Well, other companies (ones that don't exist yet, ones with better or luckier decision makers) are going to come along, and in a few years they'll be leagues ahead of yours. This is already happening at the bleeding edge of software. Companies like JetBrains and Microsoft had what most would have considered sizable moats with tools like IntelliJ and Visual Studio Code. Those moats were utterly decimated by plucky startups like Cursor and Windsurf, and the incumbents are only now beginning to catch up. When your industry changes, will you be able to react quickly?

I distinctly remember being three years old, lying in bed while my mother read me the picture book The Little Red Hen. The little red hen had found some wheat seeds and decided to plant them. She asked the other farm animals to help her, but they weren't interested. She tended the wheat and it grew and grew. Eventually she harvested it and made bread. Of course, everyone wanted a slice, but she refused: none of them deserved one, because they hadn't done the work. In the version of the story my mother read me, the hen did eventually share the bread with everyone, because kindness is an important lesson to teach children.

In reality, no one is going to offer you bread. You're going to need to do the work. And unlike F1, we don't need to wait for the crash to build the safety devices. Our HANS devices already exist. The only question is whether you'll install them before you need them.
