In a recent post I praised forecasts based on velocity and said everyone should use them. I would be remiss if I didn’t point out that using velocity to directly estimate end dates will cause you to be late. The culprit here, as in much of development, is bugs.
Some tools go to great lengths to lead you astray. In the Pivotal Tracker FAQ, you find this answer to “Why can’t I estimate bugs?”
By default, only features can be estimated. In contrast to features, bugs tend to emerge over time, and while they are a necessary part of your project, they can be thought of as an ongoing cost of doing business.
Tracker’s automatic velocity calculation frees you from having to account for this cost. By measuring velocity in terms of features only, Tracker can estimate how much real, business-valued work can be completed in future iterations, allowing you to predict when project milestones might be achieved, and allowing you to experiment with how any change of scope might affect such milestones.
After all that, they admit that you can enable bug estimation if you must. That’s good. Still, they pack an amazing amount of wrong into those few sentences: suggesting that features don’t tend to emerge over time, for example; or that software development should be thought of in project terms (the root of much evil); or that the business value of software can be considered independently of defects (every defect reduces value, and past some defect threshold the software is worthless); or the subtle suggestion at the end that the impact of scope increases will be linear. But I’ll stick to the impact of bugs on velocity.
If you follow their recommendation, how badly your estimates miss depends on how you manage your rework cycle. I’ve included the diagram here for reference.
You could choose to measure velocity purely on features, ignoring the bottom half of the diagram. In that case all bugs accumulate (while you’re off adding business value), and you’re tacking a potentially large but un-estimated pile of work onto the end of your plan. Depending on the defect rate, the bug-fix effort can easily exceed the feature work. I have to assume that’s not what the Pivotal folks have in mind.
In the best case, you fix bugs as you go. In this scenario, bugs do have an immediate impact on your velocity, and your predictions will be better, but keep in mind that the only way velocity can fully account for defects is if these assumptions hold:
1. You find all of your defects with no delay.
2. You fix all of your (must have) defects before moving on to the next bit of feature work.
The first assumption will never hold in the real world; the second is just extremely unlikely.
In any other scenario, your velocity calculation will be optimistic and your projections wrong. Going back to the Pivotal example, if you don’t fix all your bugs before moving on, you must estimate them to keep your velocity reasonably correct.
In general, if you have a highly effective test process and keep the known bug count at a constant value, the delay will also be constant. But if you allow bugs to accumulate un-estimated in either the known or unknown state, your projections will be off by a growing amount as work proceeds. And that could turn out to be a big problem.
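To see how fast that gap grows, here’s a toy back-of-the-envelope simulation (all numbers are hypothetical, not from Tracker): a feature-only velocity forecast versus the iterations actually needed when each point of “done” feature work quietly spawns un-estimated bug-fix work.

```python
# Toy model: feature-only velocity forecast vs. reality when each
# completed feature point quietly spawns un-estimated bug-fix work.
# All numbers here are hypothetical.

def project_iterations(feature_points, velocity):
    """The naive forecast: feature backlog divided by feature velocity."""
    return feature_points / velocity

def simulate(feature_points, velocity, rework_per_point):
    """Iterations actually needed when bug-fix work, invisible to the
    velocity number, must still be done before you're finished."""
    remaining_features = feature_points
    bug_backlog = 0.0
    iterations = 0
    while remaining_features > 0 or bug_backlog > 0:
        iterations += 1
        done = min(velocity, remaining_features)
        remaining_features -= done
        bug_backlog += done * rework_per_point   # defects created, un-estimated
        fix = min(velocity - done, bug_backlog)  # leftover capacity fixes bugs
        bug_backlog -= fix
    return iterations

print(project_iterations(100, 10))   # forecast: 10.0 iterations
print(simulate(100, 10, 0.3))        # actually needed: 13
```

With 30% rework, the “ten-iteration” plan really takes thirteen, and the gap widens with the defect rate; estimating the bugs is what puts that hidden work back into the forecast.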
I edited the Pivotal quote slightly for readability. Pivotal is one of the best tools around, so the sheer amount of misunderstanding suggested in those paragraphs is some indication of how much folklore and urban legend still drive engineering management.
The Rework Cycle is a simple, realistic model of software engineering. Here we extend that model to show how interactions with customers increase development complexity. Like the original, this version was derived from simulations using system dynamics.
On the left of the diagram, customers increase scope, adding to the backlog of work to be done, and in all likelihood, pushing out the completion date. On the right, they review the completed work and ask for changes. Both actions can cause outsized variations in development performance.
My doctor is pretty good, so I wasn’t surprised when she noticed I was fat. She backed up her observation with a metric: my weight. She also noted that the trend was not good. (Thank you, Google, for all those brownies.) When you consider blood pressure, pulse, oxygen level, cell counts, etc., maintaining human health is all about the numbers.
Not so with software development, where demonstrably useful numbers are not often seen in normal practice. Here are nine tips for using metrics for better process health:
1. Don’t measure what you won’t use
Metrics are expensive and tedious to gather. Unless they’ll drive a decision, don’t collect them.
2. Embrace the limitations of your numbers
One of my hackers challenged code coverage as an inaccurate measure of test effectiveness. While he was correct, it was irrelevant. The role of a metric is to reduce uncertainty. Life is not a math quiz where only perfect answers count.
The diagram shows a system dynamics model of software development I liberated from a class presentation at MIT. The researcher’s model was much more complex, but this is the heart of it. They dubbed this loop “the rework cycle.” In their model, it accounted to a large degree for the success or failure of a development effort.
The model says progress is the product of headcount and per-person productivity. The quality of the work determines how many defects are created, and a defect-discovery process limits how quickly those defects are found and re-enter the pipeline as more work.
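In code, that loop might look like this minimal discrete-time sketch. The parameters (team size, productivity, quality, discovery rate) are illustrative, not taken from the MIT model: progress is people × productivity, a fraction of “done” work is secretly defective, and a discovery rate controls how quickly the latent defects surface and flow back into the backlog.

```python
# Minimal discrete-time sketch of the rework cycle. All parameter
# values are illustrative, not from the MIT researchers' model.

def rework_cycle(scope, people, productivity, quality, discovery_rate,
                 max_steps=200):
    """Returns the steps needed to finish all work, including rework."""
    work_to_do = scope
    undiscovered_rework = 0.0
    for step in range(1, max_steps + 1):
        rate = min(people * productivity, work_to_do)
        work_to_do -= rate
        # A fraction (1 - quality) of "completed" work is actually defective.
        undiscovered_rework += rate * (1 - quality)
        # Discovery is gradual: only part of the latent defects surface
        # each step, re-entering the backlog as more work to do.
        discovered = undiscovered_rework * discovery_rate
        undiscovered_rework -= discovered
        work_to_do += discovered
        if work_to_do < 1e-6 and undiscovered_rework < 1e-6:
            return step
    return max_steps

# Perfect quality finishes in scope / (people * productivity) steps;
# imperfect quality stretches the schedule well past that.
print(rework_cycle(scope=100, people=5, productivity=2,
                   quality=1.0, discovery_rate=0.25))   # 10
print(rework_cycle(scope=100, people=5, productivity=2,
                   quality=0.8, discovery_rate=0.25))
```

The interesting behavior is exactly what the researchers found: lowering quality or slowing discovery doesn’t just add a constant tax, it stretches the tail of the project, because late-found defects arrive as fresh work when you thought you were done.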
Are story points about complexity or time? Mike (Agile Estimation) Cohn was explicit:
point-based estimating is about the time the work will take
In short, story points are a flimsy, undersized hospital gown draped over real, time-based estimates. As much as you want to hide them, they’re gonna show through.
Some processes associated with Agile (continuous integration, unit testing, short, functional delivery cycles, customer representation, etc.) help us build better software. But not story points.
Jeff was nice enough to provide a login where I could evaluate tenXer with real data. What I learned (other than that Jeff has been shirking his coding duties) was how the data comes together to provide a useful picture of an engineer’s work.
most programmers believe a great hacker is several times more productive than a marginal hacker, while simultaneously believing that it’s impossible to measure hacker productivity
I believe that quote because I wrote it.
I also believe you can measure hacker productivity by looking at the code they write, and for more or less the same reason. I developed a measure of productivity and experimented with it with some friends. In this post, I’ll tell you how to calculate it (it’s easy), and how to use these powers - along with measures of quality and test coverage, of course - for good and not evil.
One of the most persistent beliefs in corporate management is that money motivates people to do a better job. It’s used to justify exorbitant executive salaries and underlies the whole performance-appraisal/salary-increase dance. It distracts managers from the vital challenge of increasing work’s intrinsic motivational power.
And it’s not true.
Granted, I’m more likely to work for you if you offer me $100 than if you offer me $10, all else equal. But the link from that to, say, creating a bonus plan that increases hacker productivity, is tenuous. Here’s why:
There’s an excellent, slightly scatological article in The Atlantic on computer scientist Larry Smarr’s quest for the quantified self. The article is by Mark Bowden of Black Hawk Down fame. If you’re interested in the intersection of Big Data, metrics and medicine you should check it out.
On a related note, I recently came across a profile in the New York Times on tenXer, a startup that claims it can turn a ‘1x’ engineer into a ‘10x’ one through data-mining and gamification. The Economist also picked up on tenXer, showing that if nothing else, they have a flair for PR. The media interest derives no doubt from tenXer’s remarkable founder Jeff Ma, a former member of the MIT Blackjack Team featured in the film ‘21’.
I signed up for their beta test to check it out. I don’t want to rush to judgement based solely on their beta, so take this with a grain of salt: From what I’ve seen, their approach is shallow, to the extent that I wonder how much they really understand the great sausage factory of software development. In an interview, Ma states that software is just the start, rather than the sole focus of tenXer; if so, the lack of depth is understandable.
The basic idea is that they gather metrics (counts of check-ins, lines changed, emails sent, bugs fixed, etc.) from a variety of sources (GMail, Pivotal Tracker, GitHub, Phabricator and Jira to start with) and provide visualization tools to help you track your “progress” and encouragements to beat your prior bests.
I have two concerns: First, how all these stats relate to effective software development is unclear. Contrast tenXer with the depth of Larry Smarr’s inquiries into his health. Smarr understands that without a model of how it fits together, data is meaningless.
My second concern is that tenXer is all about individual performance. Software development is a team sport, so beware the local optima. The last thing you want is someone optimizing his personal check-in rate to make the tenXer leader board.
Still, I think this is a startup to watch. Data mining is the wave of the future in software engineering and it’s great to see startups moving into this space.
(Full-disclosure: TenXer is funded by Google Ventures and my bank account is funded by Google, Inc. The opinions expressed here are my own. Google Ventures is unaware of my existence, etc. etc.)