<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><atom:link rel="hub" href="http://tumblr.superfeedr.com/" xmlns:atom="http://www.w3.org/2005/Atom"/><description>

Deathray Research is Larry White’s software engineering blog.  Larry is an engineering manager and hacker at Google, and lives in Beverly, MA.  He’s been managing large software projects for years and finally thinks he knows what he’s doing.* The opinions expressed here are his own.

*Actually, he thought he knew what he was doing the whole time.

PS - I bought the domain deathrayresearch.com years ago thinking I would use it for a startup. Or a blog, maybe.</description><title>Deathray Research</title><generator>Tumblr (3.0; @deathrayresearch)</generator><link>http://deathrayresearch.tumblr.com/</link><item><title>If you forecast using velocity, you're going to be late</title><description>&lt;p&gt;In &lt;a href="http://deathrayresearch.tumblr.com/post/27257008711/story-points-reconsidered" target="_blank"&gt;a recent post&lt;/a&gt; I praised forecasts based on velocity and said everyone should use them. I would be remiss if I didn&amp;#8217;t point out that using velocity to directly estimate end dates will cause you to be late. The culprit here, as in much of development, is bugs.&lt;/p&gt;
&lt;p&gt;Some tools go to great lengths to lead you astray. In the Pivotal Tracker FAQ, you find this answer to &amp;#8220;&lt;a href="https://www.pivotaltracker.com/help/faq#whycantiestimatebugsandchores" target="_blank"&gt;Why can&amp;#8217;t I estimate bugs?&lt;/a&gt;&amp;#8221;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;By default, only features can be estimated. In contrast to features, bugs tend to emerge over time, and while they are a necessary part of your project, they can be thought of as an ongoing cost of doing business.&lt;/p&gt;
&lt;p&gt;Tracker&amp;#8217;s automatic velocity calculation frees you from having to account for this cost. By measuring velocity in terms of features only, Tracker can estimate how much real, business-valued work can be completed in future iteration, &lt;em&gt;allowing you to predict when project milestones might be achieved, &lt;/em&gt;and allow you to experiment with how any change of scope might affect such milestones&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;After all that, they admit that you can enable bug estimation if you must. That&amp;#8217;s good. But they pack an amazing amount of wrong into those few sentences: the suggestion that features don&amp;#8217;t tend to emerge over time, for example; or that software development should be thought of in project terms (the root of much evil); or that the business value of software can be considered independently of defects (all defects reduce value, and above some threshold the software would be worthless); or the subtle suggestion at the end that the impact of scope increases will be linear. But I&amp;#8217;ll stick to the impact of bugs on velocity. [1]&lt;/p&gt;
&lt;p&gt;If you follow their recommendation, how wrong your estimates will be depends on how you manage your &lt;a href="http://deathrayresearch.tumblr.com/post/27691807173/the-rework-cycle" target="_blank"&gt;rework cycle&lt;/a&gt;. I&amp;#8217;ve included the diagram here for reference.&lt;/p&gt;
&lt;p&gt;&lt;img height="300" src="http://24.media.tumblr.com/tumblr_litu19CgWM1qhcbvmo1_1280.jpg" width="600"/&gt;&lt;/p&gt;
&lt;p&gt;You could choose to measure velocity purely on features, ignoring the bottom half of the diagram. In that case all bugs accumulate (while you&amp;#8217;re off adding business value), and you&amp;#8217;re adding a potentially large, un-estimated pile of work to the end of your plan. Depending on the defect rate, the bug-fix effort can easily exceed the feature work. I have to assume that&amp;#8217;s not what the Pivotal folks have in mind.&lt;/p&gt;
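&lt;p&gt;A back-of-the-envelope sketch of that last point, in Python. The function names and the numbers (100 ideal days of features, 10 ideal days of capacity per iteration, 1.2 days of bug fixing per day of feature work) are hypothetical. They are chosen only to show the shape of the problem when velocity counts features and every bug waits until the end.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def feature_only_forecast(feature_days: float, capacity: float) -> float:
    """Iterations predicted when velocity counts feature work only."""
    return feature_days / capacity


def deferred_bug_finish(feature_days: float, capacity: float,
                        bug_days_per_feature_day: float) -> float:
    """Iterations actually needed if all bug fixing is deferred to the end."""
    bug_days = feature_days * bug_days_per_feature_day  # the un-estimated pile
    return (feature_days + bug_days) / capacity


print(feature_only_forecast(100, 10))       # 10.0
print(deferred_bug_finish(100, 10, 1.2))    # 22.0
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these assumed rates the bug pile (120 days) is bigger than the feature work (100 days), and the real plan is more than twice as long as the forecast.&lt;/p&gt;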
&lt;p&gt;In the best case, you fix bugs as you go. In this scenario, bugs do have an immediate impact on your velocity, and your predictions will be better. But keep in mind that velocity can fully account for defects only if these assumptions hold:&lt;/p&gt;
&lt;p&gt;1. You find &lt;em&gt;all&lt;/em&gt; of your defects &lt;em&gt;with no delay&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;2. You &lt;em&gt;fix all of your (must-have) defects before moving on&lt;/em&gt; to the next bit of feature work.&lt;/p&gt;
&lt;p&gt;The first assumption will never hold in the real world; the second is just extremely unlikely.&lt;/p&gt;
&lt;p&gt;In any other scenario, your velocity calculation will be optimistic and your projections wrong.  Going back to the Pivotal example, if you don&amp;#8217;t fix all your bugs before moving on, you &lt;em&gt;must&lt;/em&gt; estimate them to keep your velocity reasonably correct.&lt;/p&gt;
&lt;p&gt;In general, if you have a highly effective test process and keep the known bug count at a constant value, the delay will also be constant. But if you allow bugs to accumulate un-estimated in either the known or unknown state, your projections will be off by a growing amount as work proceeds. And that could turn out to be a big problem.&lt;/p&gt;
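&lt;p&gt;A minimal simulation of the two policies, in Python. It assumes a fixed capacity per iteration and a constant amount of bug work created per day of feature work (both hypothetical). For each iteration it reports how much bug work is hidden from a feature-only velocity, expressed in iterations. Fixing last iteration&amp;#8217;s bugs first keeps that error constant. Deferring them makes the error grow every iteration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def hidden_work_in_iterations(policy: str, iterations: int,
                              capacity: float = 10.0,
                              bug_ratio: float = 0.3) -> list[float]:
    """Per iteration: unfixed bug work divided by the feature-only velocity.

    capacity: ideal days of work the team completes per iteration
    bug_ratio: days of bug fixing created per day of feature work
    """
    backlog = 0.0  # known, unfixed bug work in ideal days
    errors = []
    for _ in range(iterations):
        if policy == "fix_as_you_go":
            fixed = min(backlog, capacity)  # last iteration's bugs come first
        elif policy == "defer":
            fixed = 0.0                     # bugs wait until the end
        else:
            raise ValueError(f"unknown policy: {policy}")
        backlog -= fixed
        features = capacity - fixed         # what a feature-only velocity sees
        backlog += features * bug_ratio     # new bugs from this iteration's features
        if features == 0:
            errors.append(float("inf"))
        else:
            errors.append(backlog / features)
    return errors


for policy in ("fix_as_you_go", "defer"):
    print(policy, [round(e, 2) for e in hidden_work_in_iterations(policy, 5)])
# fix_as_you_go [0.3, 0.3, 0.3, 0.3, 0.3]
# defer [0.3, 0.6, 0.9, 1.2, 1.5]
```
&lt;/code&gt;&lt;/pre&gt;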
&lt;p&gt;Footnote:&lt;/p&gt;
&lt;p&gt;[1] I edited the Pivotal quote slightly for readability. Pivotal is one of the best tools around, so the sheer amount of misunderstanding suggested in those paragraphs is some indication of how much folklore and urban legends still drive engineering management. &lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/29049190376</link><guid>http://deathrayresearch.tumblr.com/post/29049190376</guid><pubDate>Thu, 09 Aug 2012 08:23:00 -0400</pubDate><category>estimation</category></item><item><title>Customers ruin everything</title><description>&lt;p&gt;The &lt;a href="http://deathrayresearch.tumblr.com/post/27691807173/the-rework-cycle" target="_blank"&gt;Rework Cycle&lt;/a&gt; is a simple, realistic model of software engineering. Here we extend that model to show how interactions with customers increase development complexity. Like the original, this version was derived from simulations using &lt;a href="http://en.wikipedia.org/wiki/System_dynamics" target="_blank"&gt;system dynamics&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img align="middle" height="280" src="http://media.tumblr.com/tumblr_m8g4bryiDh1qgzfom.png" width="560"/&gt;&lt;/p&gt;
&lt;p&gt;On the left of the diagram, customers increase scope, adding to the backlog of work to be done, and in all likelihood, pushing out the completion date. On the right, they review the completed work and ask for changes. Both actions can cause outsized variations in development performance. &lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;There are three common reasons for changes:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;We built the wrong thing due to a communication failure between the customer and the development team. &lt;/li&gt;
&lt;li&gt;Work was done as specified, but the customer didn&amp;#8217;t know what they needed when they asked for it.  &lt;/li&gt;
&lt;li&gt;Work was done as specified, but by the time it was completed, it was obsolete.  How much work falls into this category depends on the rate of change in the industry and how quickly we deliver. Failing to keep pace with the competition is a special case. &lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;Each of these concerns can be reduced. The first two can be reduced through better requirements gathering and, especially, prototyping and customer testing. The third can be reduced by increasing velocity.&lt;/p&gt;
&lt;p&gt;Feedback cycles in the process amplify the effects of these changes. Not only is there more work, but more code inevitably means more defects that need to be found and fixed, which increases the work in the original rework cycle.  Projected delivery dates will fluctuate more and delivery will likely be later. &lt;/p&gt;
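&lt;p&gt;A minimal discrete-time sketch of that amplification, in Python. It is not the system dynamics model behind the diagram, and the rates are hypothetical. In each iteration, the completed work sends one fraction back as customer changes and another fraction back as defects. Because both feed the same backlog, total work is roughly scope / (1 - change_rate - defect_rate). As a result, a modest change rate stretches the schedule more than its size suggests.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def iterations_to_finish(scope: float, capacity: float, change_rate: float,
                         defect_rate: float, done_threshold: float = 0.01,
                         max_iterations: int = 10_000) -> int:
    """Iterations until the remaining backlog drops below done_threshold."""
    if change_rate + defect_rate >= 1:
        raise ValueError("rework never converges when the rates sum to 1 or more")
    backlog = scope
    for iteration in range(1, max_iterations + 1):
        done = min(backlog, capacity)
        backlog -= done
        backlog += done * change_rate   # customers review the work and ask for changes
        backlog += done * defect_rate   # all work, including changes, creates defects
        if done_threshold > backlog:
            return iteration
    raise RuntimeError("did not finish within max_iterations")


print(iterations_to_finish(100, 10, change_rate=0.0, defect_rate=0.2))   # 16
print(iterations_to_finish(100, 10, change_rate=0.15, defect_rate=0.2))  # 21
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these assumed rates, a 15% change rate adds roughly 30% to the schedule, because the changes also generate defects of their own.&lt;/p&gt;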
&lt;p&gt;I&amp;#8217;m struck by the structural similarities between this chart and the one below. &lt;/p&gt;
&lt;p&gt;&lt;img height="264" src="http://www.triz-journal.com/archives/2002/05/f/08a.gif" width="586"/&gt;&lt;/p&gt;
&lt;p&gt;This is Deming&amp;#8217;s model of production viewed as a system. In both models you have customers using products and providing feedback that leads to redesign and redevelopment. While one comes from manufacturing and the other from software, they&amp;#8217;re similar because responding to customer feedback is one of the deep principles of business success. Unfortunately, most project plans don&amp;#8217;t fully account for these dynamics in their forecasts.&lt;/p&gt;
&lt;p&gt;Simulations like this one increase the accuracy of our forecasts and provide greater insight into our process. They&amp;#8217;re less common in engineering than in fields like epidemiology and environmental science, but with an increasing interest in data mining that&amp;#8217;s about to change. &lt;/p&gt;
&lt;p&gt;Soon to come: a model that incorporates basic management policies and decisions. &lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/28957534702</link><guid>http://deathrayresearch.tumblr.com/post/28957534702</guid><pubDate>Tue, 07 Aug 2012 23:39:00 -0400</pubDate><category>process models</category><category>system dynamics</category><category>simulation</category></item><item><title>9 steps to effective metrics</title><description>&lt;p&gt;My doctor is pretty good, so I wasn&amp;#8217;t surprised when she noticed I was fat. She backed up her observation with a metric: my weight. She also noted that the trend was not good. (Thank you, Google, for all those brownies.) When you consider blood pressure, pulse, oxygen level, cell counts, etc., maintaining human health is all about the numbers.&lt;/p&gt;
&lt;p&gt;Not so with software development, where demonstrably useful numbers are rarely seen in normal practice. Here are nine tips for using metrics for better process health:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Don&amp;#8217;t measure what you won&amp;#8217;t use&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Metrics are expensive and tedious to gather. Unless they&amp;#8217;ll drive a decision, don&amp;#8217;t collect them. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Embrace the limitations of your numbers&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;One of my hackers challenged code coverage as an inaccurate measure of test effectiveness. He was correct, but it was irrelevant. The role of a metric is to reduce uncertainty. Life is not a math quiz where only perfect answers count.&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Your metrics will be wrong; you need to know how&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Niederman and Boyum[1] note:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;There is always more than one way to measure something.&lt;/li&gt;
&lt;li&gt;Measurements are error prone.&lt;/li&gt;
&lt;li&gt;Even when dead-on, measurements are often just an approximation for what you really want to know.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;That last point is critical. When we use weight to track our health all the measurement problems apply: Weight is merely a proxy for fitness, not a direct measure of it. And fitness may itself be a proxy for another goal: &lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Lester Burnham: I figured you guys might be able to give me some pointers. I need to shape up. Fast. &lt;/p&gt;
&lt;p&gt;Jim Olmeyer: Are you just looking to lose weight, or do you want increased strength and flexibility as well? &lt;/p&gt;
&lt;p&gt;Lester Burnham: I want to look good naked! [2]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;An objective measurement for looking good naked is beyond the scope of this post, but be aware of the difference between what you want to know and what you can measure.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Simplicity trumps theory&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In theory, measuring calories consumed and burned has several advantages over just weighing yourself:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;It tells you in advance if you&amp;#8217;ll lose weight (it&amp;#8217;s a leading indicator; weight is a lagging indicator)&lt;/li&gt;
&lt;li&gt;Both variables are under your direct control. &lt;/li&gt;
&lt;li&gt;It avoids confounding issues like the impact of added muscle on your weight&lt;/li&gt;
&lt;li&gt;Home scales are inaccurate.&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;But caloric intake and use are difficult to measure with any accuracy. Weighing yourself is flawed, but easy and &amp;#8216;good enough&amp;#8217;. Rough, frequent measures are often better than theoretically correct ones. &lt;/p&gt;
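&lt;p&gt;A small illustration of why rough, frequent measures work, in Python. It assumes (hypothetically) a steady loss of three pounds per week and a scale that is off by up to two pounds on any given day. A single reading can be off by far more than a day&amp;#8217;s real change (about 0.4 pounds). Even so, a least-squares trend line through sixteen weeks of daily readings recovers the weekly rate closely.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import random


def least_squares_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


rng = random.Random(42)          # fixed seed so the sketch is repeatable
true_slope = -3 / 7              # hypothetical: three pounds per week, per day
days = list(range(16 * 7))       # sixteen weeks of daily weigh-ins
readings = [220 + true_slope * d + rng.uniform(-2.0, 2.0) for d in days]

estimate = least_squares_slope(days, readings)
print(f"estimated trend: {estimate * 7:.2f} lb/week")
```
&lt;/code&gt;&lt;/pre&gt;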
&lt;p&gt;&lt;strong&gt;5. Metrics without a model teach us nothing&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A few years ago I was serious about losing weight. I didn&amp;#8217;t count calories, but ate less and exercised more. I recorded my exercise. I weighed myself regularly on the more accurate gym scale. I charted my weight on a timeline: If progress stalled, I exercised more. &lt;/p&gt;
&lt;p&gt;I lost about three pounds per week for four months. Measurements were no problem. They were flawed, but useful because I knew &lt;em&gt;how&lt;/em&gt; they were flawed and had a good model of the underlying process:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Food provides energy.  What isn&amp;#8217;t needed is converted to fat.&lt;/li&gt;
&lt;li&gt;Exercise uses energy and reduces fat if calories used exceed calories taken in.&lt;/li&gt;
&lt;li&gt;Exercise increases muscle mass, which increases weight.&lt;/li&gt;
&lt;li&gt;Increasing muscle mass makes calories burn faster, but that effect lags.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;And so on. This is neither complete nor fully accurate, but it worked. With a good model, flawed metrics can be effective. Without one, perfect measurements won&amp;#8217;t help. Your &lt;a href="http://deathrayresearch.tumblr.com/post/27691807173/whats-right-with-this-picture" title="A simple model of software development" target="_blank"&gt;model of software development&lt;/a&gt; must be clear. This is a corollary to Deming&amp;#8217;s rule that experience without theory teaches you nothing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;6. Respect the difference between critical variables and indicator variables&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Critical variables interact with the largest number of other variables. Control those and you can exert great influence on the system.&lt;/p&gt;
&lt;p&gt;Indicator variables &lt;em&gt;depend&lt;/em&gt; on other variables, but have little impact. Your weight and a project&amp;#8217;s schedule are indicator variables. Trying to manage them directly is like breaking the glass on your dashboard and moving the dials with your fingers. If you want results you have to get under the hood. Good models include both indicator and critical variables, and know which is which.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;7. Put raw numbers in context&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Joan Magretta[3] uses weight-loss to convey some additional insights:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;span&gt;&amp;#8220;If we learn that Tyler weighs 145 pounds we know something objective, but it isn&amp;#8217;t, to use the managerial term, &amp;#8216;actionable.&amp;#8217; If we learn next that Tyler is a six-foot-tall man the data begins to tell one story. If Tyler is a five-foot-tall woman, it&amp;#8217;s quite another story. Now add one more piece of context. Suppose we know that three months ago, Tyler weighed over 200 pounds. That gives us not just a story, but a call for urgent intervention.&amp;#8221; &lt;/span&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;She makes data meaningful by taking a measure and putting it in context. First, she converts it to a ratio by comparing it to height. (This is what the Body Mass Index (BMI) does.) Then she puts it in an historical context by stating what it was 3 months ago.&lt;/p&gt;
&lt;p&gt;BMI provides more information than weight alone, but the benefit of ratios is often greater still. Adult height doesn’t change, so over time a chart of BMI and a chart of weight will look much the same. When both numbers in a ratio change at once, a chart of the ratio can reveal information hidden in the raw numbers.&lt;/p&gt;
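&lt;p&gt;The arithmetic behind Magretta&amp;#8217;s example, plus one ratio in which both numbers move, sketched in Python. The BMI formula is the standard imperial one (703 times pounds divided by inches squared). Attaching the 200-pound history to the five-foot Tyler is my assumption. The bug and code-size figures are invented for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def bmi(weight_lb: float, height_in: float) -> float:
    """Body Mass Index from pounds and inches (standard 703 factor)."""
    return 703 * weight_lb / height_in ** 2


# Same raw number, different context: 145 lb at six feet vs. five feet.
print(f"{bmi(145, 72):.1f}")   # 19.7
print(f"{bmi(145, 60):.1f}")   # 28.3
# Historical context (hypothetical): the five-foot Tyler at 200 lb three months ago.
print(f"{bmi(200, 60):.1f}")   # 39.1

# Hypothetical software ratio where both numbers change: open bugs per KLOC.
open_bugs = [40, 60]     # the raw count rises...
kloc = [20, 50]          # ...but the code base grew faster
print([b / k for b, k in zip(open_bugs, kloc)])  # [2.0, 1.2]
```
&lt;/code&gt;&lt;/pre&gt;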
&lt;p&gt;As Tufte writes: “Nearly all the interesting worlds we seek to understand are multivariate in nature.”[4] Three of his six principles of analytic design (comparison, multivariate, and integration of evidence) are based on visually relating and combining metrics to produce new information. This is particularly true in complex systems like software projects, which are characterized by many causally interrelated variables and dominated by feedback loops.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;8. Combine metrics for greater insight&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Don&amp;#8217;t calculate lots of unrelated metrics. Pareto tells us only a few really matter, so use additional metrics to bolster the key ones. Current Open Bug Count says more about quality when backed by test coverage. Current Total Work Remaining is misleading without the percentage of tasks yet to be estimated. In both cases, the second metric tells us about the accuracy of the first. &lt;/p&gt;
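&lt;p&gt;One way to pair a key metric with the metric that tells you how far to trust it, sketched in Python. The &lt;code&gt;Task&lt;/code&gt; type, the task list, and the 20% warning threshold are all hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    estimate_days: Optional[float]  # None means "not estimated yet"


def work_remaining(tasks: list[Task], max_unestimated: float = 0.2) -> str:
    """Report total estimated work alongside the share of tasks with no estimate."""
    estimates = [t.estimate_days for t in tasks if t.estimate_days is not None]
    total = sum(estimates)
    unestimated = 1 - len(estimates) / len(tasks)
    warning = " (low confidence)" if unestimated > max_unestimated else ""
    return f"{total:g} days remaining, {unestimated:.0%} of tasks unestimated{warning}"


backlog = [Task("login", 3), Task("search", 5), Task("billing", None),
           Task("reports", 8), Task("export", None)]
print(work_remaining(backlog))
# 16 days remaining, 40% of tasks unestimated (low confidence)
```
&lt;/code&gt;&lt;/pre&gt;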
&lt;p&gt;&lt;strong&gt;9. Use your program to help the team improve. &lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Once you have everything set up, you can begin the hard and creative work of helping your team improve, but as Magretta cautions: &amp;#8220;don&amp;#8217;t lose sight of the underlying human behavior.&amp;#8221; Trends tell us how things change, but not why and not what to do about it. Using your program to reward or, especially, punish individuals is a sure-fire way to encourage cheating or counterproductive behavior.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As my doctor will attest, I regained all the weight I once lost. No charts illustrated the decline. &amp;#8220;Numbers are essential to organizational performance,&amp;#8221; writes Magretta. &amp;#8220;Doing the numbers begins with the simple act of measurement. If you want to know, objectively, how much you weigh, you have to get on the scale.&amp;#8221; In other words, without measurement you can&amp;#8217;t manage and without management (the process, not the &amp;#8216;suits&amp;#8217;) you can&amp;#8217;t succeed.  &lt;/p&gt;
&lt;p&gt;Here, summarized, are the 9 points:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;Don&amp;#8217;t measure what you won&amp;#8217;t use&lt;/li&gt;
&lt;li&gt;Embrace the limitations of your numbers&lt;/li&gt;
&lt;li&gt;Your metrics will be wrong; you need to know how&lt;/li&gt;
&lt;li&gt;Simplicity trumps theory&lt;/li&gt;
&lt;li&gt;Base your metrics on a model of the system&lt;/li&gt;
&lt;li&gt;Respect the difference between critical variables and indicator variables&lt;/li&gt;
&lt;li&gt;Put raw numbers in context&lt;/li&gt;
&lt;li&gt;Combine metrics for greater insight&lt;/li&gt;
&lt;li&gt;Use your program to help the team improve&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;Footnotes:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;Derrick Niederman, David Boyum: &lt;a href="http://www.amazon.com/What-Numbers-Say-Mastering-Numerical/dp/0767909984" target="_blank"&gt;What the Numbers Say: A Field Guide to Mastering Our Numerical World&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;American Beauty&lt;/li&gt;
&lt;li&gt;Joan Magretta: &lt;em&gt;&lt;a href="http://www.amazon.com/What-Management-Works-Everyones-Business/dp/0743203186" target="_blank"&gt;What Management Is: How It Works and Why It’s Everyone’s Business&lt;/a&gt;.&lt;/em&gt;&lt;span&gt; &lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;Edward Tufte: &lt;a href="http://www.amazon.com/Beautiful-Evidence-Edward-R-Tufte/dp/0961392177" target="_blank"&gt;Beautiful Evidence&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;</description><link>http://deathrayresearch.tumblr.com/post/27760631409</link><guid>http://deathrayresearch.tumblr.com/post/27760631409</guid><pubDate>Sun, 22 Jul 2012 09:02:00 -0400</pubDate><category>metrics</category></item><item><title>The rework cycle</title><description>&lt;p&gt;&lt;img alt="Systems model" height="330" src="http://24.media.tumblr.com/tumblr_litu19CgWM1qhcbvmo1_1280.jpg" width="650"/&gt;&lt;/p&gt;
&lt;p&gt;The diagram shows a system dynamics model of software development I liberated from a &lt;a href="http://mit.uvt.rnu.tn/OcwWeb/Engineering-Systems-Division/ESD-36JFall-2003/CourseHome/index.htm" target="_blank"&gt;class&lt;/a&gt; presentation at MIT. The researcher&amp;#8217;s model was much more complex, but this is the heart of it. They dubbed this loop &amp;#8220;the rework cycle.&amp;#8221; In their model, it accounted to a large degree for the success or failure of a development effort.&lt;/p&gt;
&lt;p&gt;The model says progress is determined by the number of people multiplied by how productive each is. The quality of the work determines how many bugs are created. A defect discovery process limits how quickly those defects are found and re-enter the process as more work.&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;The model is idealized in one sense: Given fixed requirements, productivity and quality, the system will run with complete predictability, like a wind-up toy, until the work is done and the bugs are fixed. &lt;/p&gt;
&lt;p&gt;You can plug in some numbers and see. Say there are 10 requirements and one programmer who implements a requirement every two days. If the programmer creates one bug per requirement and it takes half a day to fix each, then the project will complete in 25 days, assuming the bugs are found without delay and I didn’t screw up the math. &lt;/p&gt;
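&lt;p&gt;The same arithmetic, written out in Python as a check on the math (the variable names are mine, not the model&amp;#8217;s):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
requirements = 10
days_per_requirement = 2.0   # one requirement every two days
bugs_per_requirement = 1
days_per_fix = 0.5           # half a day per bug

feature_days = requirements * days_per_requirement                # 20.0
fix_days = requirements * bugs_per_requirement * days_per_fix     # 5.0
print(feature_days + fix_days)  # 25.0
```
&lt;/code&gt;&lt;/pre&gt;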
&lt;p&gt;But one thing separates this model from typical agile or Gantt-based models: it includes cycles, just like real-world processes. In this model, the bug fixes can themselves have defects, pushing the projected end date out further as they&amp;#8217;re discovered. &lt;/p&gt;
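&lt;p&gt;A sketch of that cycle in Python, extending the 10-requirement example with one hypothetical parameter. That parameter, &lt;code&gt;bugs_per_fix&lt;/code&gt;, is the average number of new defects each fix introduces. Each round of fixes spawns a smaller round, so the fix work forms a geometric series. It sums to the first round&amp;#8217;s fix time divided by (1 - bugs_per_fix), and it converges only if bugs_per_fix is below 1.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def project_days(requirements: int = 10, days_per_requirement: float = 2.0,
                 bugs_per_requirement: float = 1.0, days_per_fix: float = 0.5,
                 bugs_per_fix: float = 0.2, tolerance: float = 1e-9) -> float:
    """Total days when bug fixes can themselves introduce defects."""
    if bugs_per_fix >= 1:
        raise ValueError("each fix creates at least one new bug; work never ends")
    days = requirements * days_per_requirement
    open_bugs = requirements * bugs_per_requirement
    while open_bugs > tolerance:
        days += open_bugs * days_per_fix  # fix this round of bugs
        open_bugs *= bugs_per_fix          # the fixes introduce a smaller round
    return days


print(project_days(bugs_per_fix=0.0))            # 25.0
print(round(project_days(bugs_per_fix=0.2), 2))  # 26.25
```
&lt;/code&gt;&lt;/pre&gt;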
&lt;p&gt;For more realism, you need only add customers, management, or both.  With that, most of the problems of large systems engineering can be explained. In future posts, we’ll consider both the Customers-Ruin-Everything and the Corrupting-Influence-of-Management extensions.&lt;/p&gt;
&lt;p&gt;Having a simple, yet complete model of your development process is key to making improvements. What&amp;#8217;s yours?&lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/27691807173</link><guid>http://deathrayresearch.tumblr.com/post/27691807173</guid><pubDate>Sat, 21 Jul 2012 07:19:00 -0400</pubDate><category>process models</category><category>system dynamics</category><category>simulation</category></item><item><title>Story points reconsidered</title><description>&lt;p&gt;Are story points about complexity or time? Mike (Agile Estimation) Cohn was &lt;a href="http://www.mountaingoatsoftware.com/blog/its-effort-not-complexity/" target="_blank"&gt;explicit&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;span&gt;point-based estimating is about the &lt;strong&gt;time&lt;/strong&gt; the work will take&lt;/span&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In short, story points are a flimsy, undersized hospital gown draped over real, time-based estimates. As much as you want to hide them, they&amp;#8217;re gonna show through.&lt;/p&gt;
&lt;p&gt;Some processes associated with Agile (continuous integration, unit testing, short, functional delivery cycles, customer representation, etc.) help us build better software. But not story points.&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;Jeff Sutherland, founding father of Scrum, is a proponent of story points. &lt;a href="http://scrum.jeffsutherland.com/2010/04/story-points-why-are-they-better-than.html" target="_blank"&gt;Jeff argues&lt;/a&gt; that because one programmer does a task faster than another you can&amp;#8217;t forecast based on hours completed. He seems not to recognize that the performance variation between hackers is the same whether you estimate in points or hours. The unit of measurement is irrelevant. &lt;/p&gt;
&lt;p&gt;He continues:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;span&gt;we estimate everything in points for the Product Owner so that he create a release roadmap based on team velocity and adjust the plan if velocity changes.&lt;/span&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Or, you could estimate in &lt;em&gt;days&lt;/em&gt;[1], calculate a velocity, and adjust the plan if the velocity changes. Again, points add nothing.&lt;/p&gt;
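&lt;p&gt;A minimal Python sketch of why the unit doesn&amp;#8217;t matter, assuming velocity is simply the average amount of estimated work completed per iteration. The sketch expresses the same (hypothetical) history in ideal days and in &amp;#8220;points&amp;#8221; worth any constant number of days. The forecast comes out the same.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def iterations_remaining(remaining: float, completed_per_iteration: list[float]) -> float:
    """Forecast = remaining estimated work / average work completed per iteration."""
    velocity = sum(completed_per_iteration) / len(completed_per_iteration)
    return remaining / velocity


history_days = [8, 10, 12]   # ideal days completed in each past iteration
remaining_days = 60

days_per_point = 2           # any constant conversion gives the same answer
history_points = [d / days_per_point for d in history_days]
remaining_points = remaining_days / days_per_point

print(iterations_remaining(remaining_days, history_days))      # 6.0
print(iterations_remaining(remaining_points, history_points))  # 6.0
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Velocity does the calibration either way. The choice of unit only changes the labels.&lt;/p&gt;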
&lt;p&gt;In his defense of story points, Jeff cites research that showed:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Not knowing the velocity of production of the teams is the root cause of 100% failure of release plans to be accurate in their board meetings.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In other words, story points are good because velocity is good. &lt;em&gt;But velocity and story points are two different things&lt;/em&gt;. Using velocity for scheduling is a massive step forward. If you&amp;#8217;re not doing it, start. Look closely at Jeff&amp;#8217;s arguments and you see they hold if you substitute &lt;em&gt;velocity&lt;/em&gt; for &lt;em&gt;story points&lt;/em&gt;. &lt;/p&gt;
&lt;p&gt;Beyond that, the statement ignores every source of schedule error that velocity doesn&amp;#8217;t cover. Velocity corrects only a constant error in the estimates of known tasks. When schedules go to hell, it&amp;#8217;s usually because:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;tasks are added or requirements change - &lt;em&gt;same velocity, different end date&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;people leave the project (or get added) - &lt;em&gt;velocity shifts because the project structure changed&lt;/em&gt;, or&lt;/li&gt;
&lt;li&gt;things seemed to be on track, but a big hidden pile of bugs gets found, and panic ensues - &lt;em&gt;measured progress (velocity) wasn&amp;#8217;t real&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;These must be addressed by other means.&lt;/p&gt;
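&lt;p&gt;A hypothetical Python sketch of the contrast between the error velocity does correct and the first item in that list. If every task consistently takes 1.5 times its estimate, measured velocity drops to match and a velocity-based forecast stays right. If 20 days of new scope arrive after the forecast is made, velocity is unchanged but the end date moves. The capacity, bias, and scope numbers are invented.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
CAPACITY = 10.0  # real ideal days of work the team does per iteration
BIAS = 1.5       # every task actually takes 1.5x its estimate


def measured_velocity() -> float:
    """Estimated days completed per iteration, as velocity would record it."""
    return CAPACITY / BIAS


def actual_iterations(remaining_estimate: float) -> float:
    """Iterations the remaining work really takes, given the constant bias."""
    return remaining_estimate * BIAS / CAPACITY


forecast = 40 / measured_velocity()   # forecast made with 40 estimated days left
print(round(forecast, 6), actual_iterations(40))        # 6.0 6.0 (bias absorbed)
print(round(forecast, 6), actual_iterations(40 + 20))   # 6.0 9.0 (scope is not)
```
&lt;/code&gt;&lt;/pre&gt;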
&lt;p&gt;&lt;strong&gt;Castles in the air&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Another common argument is that it&amp;#8217;s better to estimate in points because you&amp;#8217;re comparing one task to another: this one&amp;#8217;s twice as long, or whatever. This argument is typically illustrated by the &lt;em&gt;Parable of the Two Runners&lt;/em&gt;. One runner finishes a course in 5 minutes (&amp;#8220;That&amp;#8217;s a five-minute course,&amp;#8221; she says), the other in ten (&amp;#8220;That&amp;#8217;s a ten-minute course&amp;#8221;). When they see a course that&amp;#8217;s twice as long, they disagree about how long it will take, but they both agree that it&amp;#8217;s twice as long. Happiness.&lt;/p&gt;
&lt;p&gt;&lt;img align="left" height="450" src="http://1.bp.blogspot.com/_oA2SLury9N0/S9Nfef0uK0I/AAAAAAAAAnk/dfg0KKZqOnE/s1600/castles-in-the-sky.jpg" width="300"/&gt;But they can agree &lt;em&gt;because&lt;/em&gt; distance is measured in clear, universally understood units. Just like days. Everyone knows how big one is.&lt;/p&gt;
&lt;p&gt;Measuring with points isn&amp;#8217;t like using meters instead of yards. It&amp;#8217;s like everyone having their own &amp;#8220;meter-long&amp;#8221; rod to calibrate with, and the rods are all different. Seriously, how big is a point? Each rod is based on some estimate of how long some task might take. (If we could do that reliably we wouldn&amp;#8217;t be having this discussion.) On different projects you&amp;#8217;re not even talking about the same task. &lt;/p&gt;
&lt;p&gt;Is my database task bigger than your two-point design task? The only way to know is to convert each to common units (like hours) and then decide. Estimates don&amp;#8217;t get more accurate by being more abstract. &lt;/p&gt;
&lt;p&gt;Another common argument is that it&amp;#8217;s faster to estimate story points. Maybe, but I think part of the benefit comes from the sound practice of using an exponential (1, 2, 4, 8, etc.) or Fibonacci scale instead of striving for pointless, unachievable accuracy (111.5 days). This is another practice that should be widely emulated.&lt;/p&gt;
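&lt;p&gt;A minimal Python sketch of snapping a raw guess onto a coarse scale. Measuring closeness by ratio (log distance) rather than by difference is my assumption. It suits an exponential scale, where the gap between buckets grows with their size.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math

POWERS_OF_TWO = [1, 2, 4, 8, 16, 32]
FIBONACCI = [1, 2, 3, 5, 8, 13, 21]


def snap_to_scale(raw_estimate: float, scale: list[int]) -> int:
    """Return the scale value closest to raw_estimate by ratio; ties go to the larger value."""
    if not raw_estimate > 0:
        raise ValueError("estimates must be positive")
    return min(scale, key=lambda v: (abs(math.log(v / raw_estimate)), -v))


print(snap_to_scale(6, POWERS_OF_TWO))   # 8
print(snap_to_scale(6, FIBONACCI))       # 5
print(snap_to_scale(111.5, FIBONACCI))   # 21 (the largest bucket)
```
&lt;/code&gt;&lt;/pre&gt;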
&lt;p&gt;In any case, what percent of your project is given to estimation?  Don&amp;#8217;t introduce a new, fuzzy unit of measure to optimize &lt;em&gt;a small&lt;/em&gt; &lt;em&gt;fraction&lt;/em&gt; of the plan. In his classic work &lt;em&gt;The Diffusion of Innovations&lt;/em&gt;, Rogers identifies &amp;#8216;compatibility&amp;#8217; with existing technologies and beliefs as one of five factors influencing the adoption rate for innovations. Calendars are a key technology. Introducing a new measurement system based on fuzzy concepts will not inspire confidence in your customers. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Story points mean never having to say you&amp;#8217;re sorry&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;But if people like Sutherland, Cohn and Martin Fowler all use story points they must be good for something. They&amp;#8217;re good for this:&lt;/p&gt;
&lt;p&gt;When a one story point task takes two days, no one says you were wrong. You didn&amp;#8217;t say it was a one day task; you said it was a one &lt;em&gt;story-point&lt;/em&gt; task. If you do your initial estimate in days, someone will inevitably come back to that first, raw estimate and try to force you to meet it. These people are everywhere. They learned everything they know about management from &lt;em&gt;The Apprentice, &lt;/em&gt;and actually believe they&amp;#8217;re &amp;#8220;playing hardball&amp;#8221; or &amp;#8220;results oriented&amp;#8221;, rather than &amp;#8220;counter-productive&amp;#8221;, or &amp;#8220;stupid&amp;#8221;.&lt;/p&gt;
&lt;p&gt;I feel for you, but it&amp;#8217;s not worth the trouble of introducing a new &amp;#8220;scale&amp;#8221;. The reason is that it&amp;#8217;s a one-time fix. The first time you generate dates from your points, you have a baseline some savant-idiot from &amp;#8220;the business&amp;#8221; can club you with. &lt;/p&gt;
&lt;p&gt;Success in engineering management lies mostly in not doing stupid things to try to meet unachievable dates. That requires education, communication, political ability, and a skin like rhino hide. Metrics sleight-of-hand won&amp;#8217;t cut it. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Agile estimation provides two tools everyone should use: Velocity, and an exponential scale. &lt;a href="http://deathrayresearch.tumblr.com/post/4503505772/the-pathology-of-estimates" title="The pathology of estimates" target="_blank"&gt;Your estimates will always be wrong&lt;/a&gt;. If you calculate velocity based on real progress and use &lt;em&gt;that&lt;/em&gt; to make predictions, they&amp;#8217;ll be better. They&amp;#8217;ll be about as good, in fact, as fallible humans can make them.&lt;/p&gt;
&lt;p&gt;Footnotes:&lt;/p&gt;
&lt;p&gt;[1] All references to days mean ideal-hacker days. Let velocity handle the conversion to calendar time as well as any estimation error.&lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/27257008711</link><guid>http://deathrayresearch.tumblr.com/post/27257008711</guid><pubDate>Sun, 15 Jul 2012 09:02:00 -0400</pubDate></item><item><title>These will make you smarter</title><description>&lt;p&gt;Ten of the best books from the Deathray Research bibliography. Guaranteed to make you smarter about software engineering and the world. Inspired by the book, &lt;a href="http://www.amazon.com/This-Will-Make-You-Smarter/dp/0062109391/" target="_blank"&gt;This Will Make You Smarter&lt;/a&gt;, and my teenage son, who said today &amp;#8220;All books are self-help books&amp;#8221;.  Couldn&amp;#8217;t agree more. &lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="http://www.amazon.com/Making-Things-Work-Solving-Problems/dp/0965632822" target="_blank"&gt;Making Things &lt;/a&gt;&lt;a href="http://www.amazon.com/Making-Things-Work-Solving-Problems/dp/0965632822" target="_blank"&gt;Work&lt;/a&gt;&lt;a href="http://www.amazon.com/Making-Things-Work-Solving-Problems/dp/0965632822" target="_blank"&gt;: Solving Complex Problems in a Complex World&lt;/a&gt;&lt;/em&gt;. The author is a complexity theory guy from MIT.  Full of interesting ideas. Very theoretical, but, thankfully, no math.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="http://www.amazon.com/Predictable-Surprises-Disasters-Prevent-Leadership/dp/1591391784" target="_blank"&gt;Predictable Surprises: The Disasters You Should Have Seen Coming and How to Prevent Them&lt;/a&gt;&lt;/em&gt;. Explains the recurrence of certain types of disasters by showing that they have deep economic, political and cognitive roots that repeatedly prevent people from recognizing and avoiding. Obvious parallels to software project management.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="http://www.amazon.com/Logic-Failure-Things-Wrong-Right/dp/0805041605" target="_blank"&gt;The Logic of Failure&lt;/a&gt;&lt;/em&gt;&lt;a href="http://www.amazon.com/Logic-Failure-Things-Wrong-Right/dp/0805041605" target="_blank"&gt;:&lt;em&gt; Recognizing and Avoiding Errors in Complex Situations&lt;/em&gt;&lt;/a&gt;. An introduction to complex systems (like software development) that explains why they’re prone to mis-management and failure. Also see the presentations from the &lt;em&gt;&lt;a href="http://mit.uvt.rnu.tn/OcwWeb/Engineering-Systems-Division/ESD-36JFall-2003/Readings/index.htm" target="_blank"&gt;System and Project Management course&lt;/a&gt; &lt;/em&gt;at&lt;span&gt; &lt;/span&gt;&lt;a href="http://mit.uvt.rnu.tn/OcwWeb/Engineering-Systems-Division/ESD-36JFall-2003/Readings/index.htm" target="_blank"&gt;MIT&lt;/a&gt;. It&lt;span&gt; &lt;/span&gt;&lt;span&gt;covers the application of system dynamics to project management. It would be at the top of the list if it were a book.. &lt;/span&gt;&lt;/p&gt;
&lt;p class="p2"&gt;&lt;em&gt;&lt;a href="http://www.amazon.com/Evolution-Manufacturing-Systems-at-Toyota/dp/0195123204" target="_blank"&gt;The Evolution of Manufacturing Systems at Toyota&lt;/a&gt;&lt;/em&gt;  The best book about Toyota. It is very theoretical. The most relevant chapter describes all of Toyota as an information processing system, an approach that works extremely well for software development, which I stole for that very reason.&lt;/p&gt;
&lt;p class="p2"&gt;&lt;a href="http://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374275637/ref=sr_1_1?s=books&amp;amp;ie=UTF8&amp;amp;qid=1342292030&amp;amp;sr=1-1&amp;amp;keywords=thinking+fast+and+slow" target="_blank"&gt;Thinking Fast and Slow&lt;/a&gt;. Kahneman&amp;#8217;s summation of his life&amp;#8217;s research on biases and decision-making. &lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="http://www.amazon.com/Better-Surgeons-Performance-Atul-Gawande/dp/0805082115" target="_blank"&gt;Better: A Surgeon’s Notes on Performance&lt;/a&gt;&lt;/em&gt;. Inspiring book about how process improvement can save lives&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="http://www.amazon.com/Factory-Physics-Second-Wallace-Hopp/dp/0256247951" target="_blank"&gt;Factory Physics: Foundations of Manufacturing Management&lt;/a&gt;&lt;/em&gt;.  This is a fascinating, deep analysis of factory operations.  Long on deep principles, short on bs.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="http://www.amazon.com/How-Measure-Anything-Intangibles-Business/dp/0470539399/ref=sr_1_1?s=books&amp;amp;ie=UTF8&amp;amp;qid=1342097005&amp;amp;sr=1-1&amp;amp;keywords=how+to+measure+anything" target="_blank"&gt;How to Measure Anything: Finding the Value of ‘Intangibles’ in Business&lt;/a&gt;.&lt;/em&gt; A useful guide to using measurement to improve your understanding of the real world.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="http://www.amazon.com/Modern-Firm-Organizational-Performance-Management/dp/0198293763" target="_blank"&gt;The Modern Firm&lt;/a&gt;. &lt;/em&gt; A fine, short book connecting organizational design to performance through the miracle of managerial economics, by the top guy in the field. Very good on incentives.&lt;/p&gt;
&lt;p class="p1"&gt;&lt;em&gt;&lt;a href="http://www.amazon.com/Success-Open-Source-Steven-Weber/dp/0674012925" target="_blank"&gt;The Success of Open Source&lt;/a&gt;.&lt;/em&gt; Written by a Berkeley political scientist, it looks at open source as a production system. Highly recommend it to anyone who cares about software development. If there’s a more insightful book about Open Source tell me what it is.&lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/27210067781</link><guid>http://deathrayresearch.tumblr.com/post/27210067781</guid><pubDate>Sat, 14 Jul 2012 15:56:00 -0400</pubDate></item><item><title>tenXer revisited</title><description>&lt;p&gt;I had a brief, and pleasant, conversation with &lt;a href="http://www.tenxer.com" target="_blank"&gt;tenXer&lt;/a&gt; CEO &lt;a href="http://en.wikipedia.org/wiki/Jeff_Ma" target="_blank"&gt;Jeff Ma&lt;/a&gt; yesterday.  We talked about metrics, performance improvement and where tenXer is heading. &lt;/p&gt;
&lt;p&gt;Jeff was nice enough to provide a login where I could evaluate tenXer with real data. What I learned (other than that Jeff has been shirking his coding duties) was how the data comes together to provide a useful picture of an engineer&amp;#8217;s work. &lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;Today, tenXer gets three key things right:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;They make it fun. The emphasis is on positive feedback. There&amp;#8217;s a leader board; there&amp;#8217;s no loser board. &lt;/li&gt;
&lt;li&gt;They make it easy.  There&amp;#8217;s no data entry required. It&amp;#8217;s generated by your normal work activities.&lt;/li&gt;
&lt;li&gt;They make it non-threatening.  You own your own data. There&amp;#8217;s no big-brother thing going on. &lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;All are critical to making metrics effective. The other key takeaway is that Jeff and tenXer are dead serious about moving from the high-level data they aggregate today to the deeper principles of software productivity. I would have known this already had I &lt;a href="http://blog.tenxer.com/2012/06/18/the-first-small-step-measuring-activity/" target="_blank"&gt;read his blog post&lt;/a&gt; saying as much before I blogged that tenXer needs to do this. My bad.&lt;/p&gt;
&lt;p&gt;I was right about one thing, though. These guys are worth keeping an eye on.&lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/27054937940</link><guid>http://deathrayresearch.tumblr.com/post/27054937940</guid><pubDate>Thu, 12 Jul 2012 12:02:00 -0400</pubDate></item><item><title>Measuring productivity thru code-mining</title><description>&lt;blockquote&gt;
&lt;p&gt;most programmers believe a great hacker is &lt;a href="http://deathrayresearch.tumblr.com/post/26418932457/tenxer-the-quantified-life-and-the-quantified-hacker" title="tenXer, the quantified life and the quantified hacker" target="_blank"&gt;several times&lt;/a&gt; more productive than a marginal hacker, while simultaneously believing that it’s impossible to measure hacker productivity&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I believe that quote because &lt;a href="http://deathrayresearch.tumblr.com/post/4367093927/software-metrics-and-ethics" title="Software, Metrics and Ethics" target="_blank"&gt;I wrote it&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I also believe you can measure hacker productivity by looking at the code they write, and for more or less the same reason. I developed a measure of productivity and experimented with it with some friends. In this post, I&amp;#8217;ll tell you how to calculate it (it&amp;#8217;s easy), and how to use these powers - along with measures of quality and test coverage, of course - for good and not evil. &lt;!-- more --&gt;&lt;/p&gt;
&lt;div&gt;
&lt;p&gt;When I say measure, I mean reducing uncertainty through observation, not achieving perfection. This measure is imperfect, but useful. And it only tells you about output, not whether you&amp;#8217;re building the right thing. You have Agile and Lean Startup tools for that.&lt;/p&gt;
&lt;/div&gt;
&lt;div&gt;Looking at code is key: In no other craft is the record so complete. Despite limitations in version control systems[0], your source code revision history contains the most complete and accurate development data available. Bug and project data don&amp;#8217;t come close, though they get more attention.&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;The metrics:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Productivity is a measure of change over time, so the process has three steps:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;Measure one version of some code&lt;/li&gt;
&lt;li&gt;Measure how much it differs from another version&lt;/li&gt;
&lt;li&gt;Total those differences over some time intervals (code changes per week, for example)&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;That&amp;#8217;s it. Because the approach is rooted in information theory, I refer (pretentiously and imprecisely) to the size metric as the &amp;#8220;&lt;em&gt;Information content&lt;/em&gt;&amp;#8221; of the code, and the change metric as the &amp;#8220;&lt;em&gt;information distance&lt;/em&gt;&amp;#8221; between two versions of the same code. They&amp;#8217;re based on the insight that compression can be used to measure the &amp;#8216;distance&amp;#8217; between two texts, first presented in a paper called &lt;a href="http://prl.aps.org/abstract/PRL/v88/i4/e048702" title="Language Trees and Zipping" target="_blank"&gt;Language Trees and Zipping&lt;/a&gt; by Benedetto, Caglioti, and Loreto. Their (controversial) findings were summarized by the &lt;a href="http://www.economist.com/node/975770" title="Computers and Language: The Elements of Style" target="_blank"&gt;Economist&lt;/a&gt;. &lt;/p&gt;
&lt;p&gt;And now, our first metric: &lt;em&gt;The &amp;#8216;information content&amp;#8217; of a program is its compressed size in bytes&lt;/em&gt;. Why compressed? In short, because it squeezes out the redundancy and boilerplate in the code. And because we need it compressed for the second metric.&lt;/p&gt;
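&lt;p&gt;A minimal Python sketch of the first metric. Python's standard library has no libbsc binding, so this assumes bz2, another block-sorting compressor, as a stand-in; absolute numbers will differ from libbsc's. The getters helper is hypothetical sample data. Doubling a file of near-identical boilerplate should roughly double its raw size while adding comparatively little to its compressed size, which is the redundancy-squeezing described above.&lt;/p&gt;

```python
import bz2


def information_content(source: str) -> int:
    """The post's 'information content': compressed size in bytes."""
    # Assumption: bz2 stands in for libbsc, which has no stdlib binding.
    return len(bz2.compress(source.encode("utf-8"), compresslevel=9))


def getters(n: int) -> str:
    """Hypothetical boilerplate: n near-identical Java getters."""
    return "".join(
        f"public String getField{i}() {{ return field{i}; }}\n" for i in range(n)
    )


small, large = getters(20), getters(40)
raw_growth = len(large.encode("utf-8")) - len(small.encode("utf-8"))
packed_growth = information_content(large) - information_content(small)
print(f"raw size grew by {raw_growth} bytes, compressed size by {packed_growth}")
```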
&lt;p&gt;Here&amp;#8217;s the other metric: &lt;em&gt;The &amp;#8216;information distance&amp;#8217; between two versions of the same code is the compressed size of both versions concatenated, minus the compressed size of the original.&lt;/em&gt; To convert this to a productivity measure, sum the distance measures over some time interval. &lt;/p&gt;
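&lt;p&gt;A sketch of the second metric plus the roll-up into weekly totals, under the same bz2-for-libbsc assumption. The (commit_date, old_text, new_text) record shape and the ISO-week bucketing are illustrative choices, not something the post prescribes; real records would come from walking your version-control history.&lt;/p&gt;

```python
import bz2
from collections import defaultdict
from datetime import date


def compressed_size(text: str) -> int:
    # Assumption: bz2 stands in for libbsc.
    return len(bz2.compress(text.encode("utf-8"), compresslevel=9))


def information_distance(original: str, revised: str) -> int:
    """Compressed size of both versions concatenated, minus the original's."""
    return compressed_size(original + revised) - compressed_size(original)


def weekly_productivity(changes):
    """Sum distances per ISO week.

    `changes` holds hypothetical (commit_date, old_text, new_text) records.
    """
    totals = defaultdict(int)
    for day, old, new in changes:
        year, week, _ = day.isocalendar()
        totals[(year, week)] += information_distance(old, new)
    return dict(totals)


# Hypothetical history of one file across two weeks.
v1 = "".join(f"int f{i}() {{ return {i}; }}\n" for i in range(100))
v2 = v1 + "int g() { return f1() + f2(); }\n"
v3 = v2.replace("return 7;", "return 7 * 7;")
history = [
    (date(2012, 7, 2), v1, v2),
    (date(2012, 7, 4), v2, v3),
    (date(2012, 7, 10), v3, v3 + "// todo\n"),
]
weekly = weekly_productivity(history)
print(weekly)
```

&lt;p&gt;An empty revision, like the deleted file in footnote [2], scores exactly 0, while comparing a version with itself scores small but not zero, because the compressor still pays a few bytes to encode the repeat.&lt;/p&gt;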
&lt;p&gt;&lt;strong&gt;A little light theory&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In information theory, the Kolmogorov complexity of a string is defined as the length of &lt;img align="left" alt="Kolmogorov" height="350" src="http://upload.wikimedia.org/wikipedia/commons/0/08/Kolm_complexity_lect.jpg" width="438"/&gt;the shortest program, in a given language, that produces it. The more regular (less complex) a string is, the shorter that description and the smaller its compressed representation. Hence &amp;#8216;aaaaaaaaaaaa&amp;#8217; can be represented in fewer bytes than a random string of the same length.&lt;/p&gt;
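&lt;p&gt;A quick way to see the regular-versus-random point with a real compressor, here bz2 standing in for the theoretical minimum. The strings are longer than the 12-character example because at that size the compressor's fixed header overhead swamps the difference; the seeded random generator just makes the run reproducible.&lt;/p&gt;

```python
import bz2
import random


def compressed_size(data: bytes) -> int:
    # bz2 is only a practical proxy; true Kolmogorov complexity is uncomputable.
    return len(bz2.compress(data, compresslevel=9))


n = 1200
regular = b"a" * n
rng = random.Random(42)  # fixed seed so the "random" string is reproducible
irregular = bytes(rng.choice(b"abcdefghijklmnopqrstuvwxyz") for _ in range(n))

print("regular:  ", compressed_size(regular))
print("irregular:", compressed_size(irregular))
```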
&lt;p&gt;Kolmogorov complexity can&amp;#8217;t be computed exactly, but a compressor&amp;#8217;s output gives a practical upper bound on it, so the relative sizes of two strings compressed with the same algorithm approximate their relative Kolmogorov complexity. Given its regularity, a program file consisting mainly of accessors (&amp;#8220;public void setFoo(String foo)&amp;#8230;&amp;#8221;) should compress better than one where each line is unique. This fits roughly with notions of simple and complex code. Compression also minimizes the effect of differences in formatting, variable naming, and so on.&lt;/p&gt;
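&lt;p&gt;A sketch of the accessor comparison, again assuming bz2. Both inputs are hypothetical: sixty generated setters versus sixty lines of random identifiers and numbers, a crude stand-in for code where every line is unique. Comparing compression ratios rather than raw sizes keeps the two files on the same footing.&lt;/p&gt;

```python
import bz2
import random


def compression_ratio(text: str) -> float:
    raw = text.encode("utf-8")
    return len(bz2.compress(raw, compresslevel=9)) / len(raw)


# Hypothetical accessor-heavy file: 60 generated setters.
accessors = "".join(
    f"public void setFoo{i}(String foo{i}) {{ this.foo{i} = foo{i}; }}\n"
    for i in range(60)
)

# Hypothetical file where every line differs, seeded for repeatability.
rng = random.Random(7)


def ident() -> str:
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(6))


unique = "".join(f"{ident()} = {ident()} + {rng.randint(0, 9999)};\n" for _ in range(60))

print(f"accessors:    {compression_ratio(accessors):.2f}")
print(f"unique lines: {compression_ratio(unique):.2f}")
```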
&lt;p&gt;&lt;strong&gt;Why I do it this way&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The measures combine two key aspects of output: the amount and complexity of the code produced. There are other nice properties:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;It&amp;#8217;s fairly easy to implement&lt;/li&gt;
&lt;li&gt;It has a reasonable theoretical basis&lt;/li&gt;
&lt;li&gt;It&amp;#8217;s language independent: the same implementation can be used to measure Ruby,  Java, Python, Lisp, etc.[1]. Metrics like Cyclomatic Complexity require language-specific implementations.&lt;/li&gt;
&lt;li&gt;It can be used to measure HTML, XML, CSS, maybe even documentation, as well as actual program code. Maintaining these files is a big part of what hackers do.&lt;/li&gt;
&lt;li&gt;All changes produce a positive result.[2] If you refactor code to make it simpler and smaller, the zipped combination will still be larger than the zipped original, so it shows as an addition to productivity. That&amp;#8217;s not the case with LOC, NCSS and Cyclomatic Complexity, but going from 500 LOC to 480&amp;#160;&lt;em&gt;really is&lt;/em&gt; progress and should be counted as such.[3]&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Yeah, but does it work?&lt;/strong&gt; &lt;/p&gt;
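&lt;p&gt;A first, basic check is the refactoring claim in the last bullet of the list above. This sketch assumes bz2 as the compressor and uses made-up Java-like before/after text: thirty copy-pasted null checks collapse into one helper, so the line count drops, yet the information distance still comes out positive.&lt;/p&gt;

```python
import bz2


def compressed_size(text: str) -> int:
    # Assumption: bz2 stands in for libbsc.
    return len(bz2.compress(text.encode("utf-8"), compresslevel=9))


def information_distance(original: str, revised: str) -> int:
    return compressed_size(original + revised) - compressed_size(original)


# Hypothetical "before": 30 copy-pasted validation blocks, 3 lines each.
before = "".join(
    f"if (input{i} == null) {{\n    throw new IllegalArgumentException(\"input{i}\");\n}}\n"
    for i in range(30)
)
# Hypothetical "after": one helper plus 30 one-line calls.
after = (
    "static void require(Object o, String name) {\n"
    "    if (o == null) throw new IllegalArgumentException(name);\n}\n"
    + "".join(f"require(input{i}, \"input{i}\");\n" for i in range(30))
)

loc_delta = after.count("\n") - before.count("\n")
print("LOC delta:", loc_delta)
print("information distance:", information_distance(before, after))
```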
&lt;p&gt;We take the most basic definition of &amp;#8216;work&amp;#8217; to mean that larger differences between two files consistently produce larger information distance values and small differences smaller ones.&lt;/p&gt;
&lt;p&gt;To test that, we reasoned that each time a file is edited it tends to diverge further from the original. We compared over 100 versions of a single file individually against the original and plotted the calculated distances in version order. As expected, each subsequent version was further from the original than its predecessors, except in a few cases. In those cases, code that was not in the original had been removed, so that version &lt;em&gt;really&lt;/em&gt; &lt;em&gt;was&lt;/em&gt; more like the original than its predecessor. All this suggests the method is a reasonably consistent measure of change.&lt;/p&gt;
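&lt;p&gt;The drift experiment can be approximated on synthetic data. This sketch is not the original setup: it assumes bz2 as the compressor, builds a hypothetical 200-line file, produces ten versions that each overwrite ten more lines with random text, and measures every version against the original. The distances should climb from version to version.&lt;/p&gt;

```python
import bz2
import random


def compressed_size(text: str) -> int:
    # Assumption: bz2 stands in for libbsc.
    return len(bz2.compress(text.encode("utf-8"), compresslevel=9))


def information_distance(original: str, revised: str) -> int:
    return compressed_size(original + revised) - compressed_size(original)


rng = random.Random(1)


def random_line() -> str:
    """A hypothetical rewritten line: 40 random lowercase letters."""
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(40)) + "\n"


# Hypothetical original file of 200 lines.
lines = [f"value{i} = compute({i});\n" for i in range(200)]
original = "".join(lines)

# Ten simulated versions; each overwrites ten more lines, so edits accumulate.
versions = []
for step in range(10):
    for j in range(step * 10, step * 10 + 10):
        lines[j] = random_line()
    versions.append("".join(lines))

distances = [information_distance(original, v) for v in versions]
print(distances)
```

&lt;p&gt;With real histories, expect the occasional dip described above, when a version deletes code that was never in the original.&lt;/p&gt;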
&lt;p&gt;&lt;strong&gt;Possible applications&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We tried this metric as part of a larger code-mining experiment. In it, we calculated the changes between every version of every file for a multi-year project. Here are a few findings: &lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Productivity did not vary with experience size, though I thought it would. &lt;/li&gt;
&lt;li&gt;Neither adding nor reducing headcount made productivity go up. Cost obviously does vary with team size, so that&amp;#8217;s something to consider when sizing your team.&lt;/li&gt;
&lt;li&gt;That whole &amp;#8220;10X&amp;#8221; thing, is it for real? It appeared to be.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Measuring the relative investment in test versus production code is an interesting application. We simply calculated productivity separately for production and test code and looked at the ratio. If you know what your total staffing costs are, this is an easy way to estimate what you spent on tests. Comparing this ratio across projects could help quantify the cost/benefit ratio for automation.&lt;/p&gt;
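&lt;p&gt;A sketch of the test-versus-production split, assuming bz2 as the compressor. The path heuristic in is_test_file, the sample change records, and the staffing figure are all hypothetical; substitute your repository's conventions and real costs.&lt;/p&gt;

```python
import bz2


def compressed_size(text: str) -> int:
    # Assumption: bz2 stands in for libbsc.
    return len(bz2.compress(text.encode("utf-8"), compresslevel=9))


def information_distance(original: str, revised: str) -> int:
    return compressed_size(original + revised) - compressed_size(original)


def is_test_file(path: str) -> bool:
    # Hypothetical convention: adjust to your repository's layout.
    return "/test/" in path or path.endswith("Test.java")


def share_spent_on_tests(changes) -> float:
    """Fraction of total information distance that went into test code.

    `changes` holds hypothetical (path, old_text, new_text) records.
    """
    test_total = prod_total = 0
    for path, old, new in changes:
        distance = information_distance(old, new)
        if is_test_file(path):
            test_total += distance
        else:
            prod_total += distance
    total = test_total + prod_total
    return test_total / total if total else 0.0


prod_old = "".join(f"int f{i}(int x) {{ return x + {i}; }}\n" for i in range(50))
prod_new = prod_old + "int g(int x) { return f1(x) * f2(x); }\n"
test_old = "void testF1() { assertEquals(2, f1(1)); }\n"
test_new = test_old + "void testG() { assertEquals(6, g(1)); }\n"
changes = [
    ("src/Calc.java", prod_old, prod_new),
    ("src/test/CalcTest.java", test_old, test_new),
]
share = share_spent_on_tests(changes)
staffing_cost = 1_000_000  # hypothetical annual staffing cost, in dollars
print(f"{share:.0%} of change went to tests, about ${share * staffing_cost:,.0f}")
```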
&lt;p&gt;&lt;strong&gt;Measuring the productivity of individual hackers&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;An obvious application is to see who writes the most code. Since we pulled all the metadata from our repository with each change set, this was fairly easy to calculate. This metric is &lt;a href="http://deathrayresearch.tumblr.com/post/4367093927/software-metrics-and-ethics" title="Software, Metrics and Ethics" target="_blank"&gt;fraught with danger&lt;/a&gt;, however, so proceed with caution. You won&amp;#8217;t be able to resist peeking, but don&amp;#8217;t publicize these numbers or use them for reviews. Don&amp;#8217;t even tell your boss unless you&amp;#8217;re sure she can show the same restraint.&lt;/p&gt;
&lt;p&gt;And - please - don&amp;#8217;t try to directly manage this number (&amp;#8220;10% more productivity next quarter, please!&amp;#8221;). Productivity is a dependent variable; focus on variables that are both (1) relatively independent and (2) high leverage. More on this in another post I&amp;#8217;ve been writing for like a year now. (speaking of productivity.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Caveats:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;These ideas are experimental. All metrics should be subjected to repeated testing, but I decided to put this out there now because I&amp;#8217;m not sure when or if I&amp;#8217;ll get back to it.&lt;/li&gt;
&lt;li&gt;Always consider the sum of a person&amp;#8217;s contribution. Some people who scored high on productivity reported very few bugs over the same period. Writing good bug reports is a useful activity that reduces the time available for coding.&lt;/li&gt;
&lt;li&gt;The choice of compression algorithm is critical. Gzip produces inconsistent results. We got our best results using libbsc on the default settings. YMMV.&lt;/li&gt;
&lt;li&gt;If you measure productivity and don&amp;#8217;t track quality, you&amp;#8217;ll get what you deserve.&lt;/li&gt;
&lt;/ul&gt;&lt;div&gt;&lt;strong&gt;Footnotes:&lt;/strong&gt;&lt;/div&gt;
&lt;p&gt;[0] SVN didn&amp;#8217;t track the connection between renamed files very well. &lt;/p&gt;
&lt;p&gt;[1] If you want to strip comments before zipping, you need separate code for C-style comments, etc. Also note that I&amp;#8217;m not suggesting that results for different languages are comparable.&lt;/p&gt;
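&lt;p&gt;For the comment stripping mentioned in footnote [1], a naive sketch for C-style comments (C, C++, Java, JavaScript and similar). It is a regular-expression shortcut, not a parser: it does not understand string or character literals, so a // inside a string would be cut too. A sturdier version would need a small tokenizer per language, and languages with # comments need their own variant.&lt;/p&gt;

```python
import re

# Naive patterns: they do not understand string or character literals,
# so a "//" or "/*" inside a string literal would be stripped as well.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//[^\n]*")


def strip_c_style_comments(source: str) -> str:
    """Remove /* ... */ block comments first, then // line comments."""
    return LINE_COMMENT.sub("", BLOCK_COMMENT.sub("", source))


sample = "int x = 1; // counter\n/* block\n   comment */int y = 2;\n"
print(repr(strip_c_style_comments(sample)))
```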
&lt;p&gt;[2] If a file is removed completely, the distance would be 0. This isn&amp;#8217;t awful, though, because removing a file is easy, and you could probably compensate by adding a few points.  Changing other code to make a file obsolete may involve real work, but that &lt;em&gt;would&lt;/em&gt; be picked up in the distance metrics.&lt;/p&gt;
&lt;p&gt;[3] You can count lines added, removed, and changed, but the way diff algorithms decide when something has been changed, rather than removed and added, often seems arbitrary, if not just wrong.&lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/26975107981</link><guid>http://deathrayresearch.tumblr.com/post/26975107981</guid><pubDate>Wed, 11 Jul 2012 09:13:00 -0400</pubDate><category>code-mining</category><category>metrics</category></item><item><title>Money can't buy me performance</title><description>&lt;p&gt;One of the most persistent beliefs in corporate management is that money motivates people to do a better job. It&amp;#8217;s used to justify exorbitant executive salaries and underlies the whole performance-appraisal/salary-increase dance. It distracts managers from the vital challenge of increasing work&amp;#8217;s intrinsic motivational power.&lt;/p&gt;
&lt;p&gt;And it&amp;#8217;s not true. &lt;/p&gt;
&lt;p&gt;Granted, I&amp;#8217;m more likely to work for you if you offer me $100 than if you offer me $10, all else equal.  But the link from that to, say, creating a bonus plan that increases hacker productivity, is tenuous. Here&amp;#8217;s why:&lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;First, a key assumption is that you can tell if I do a better job.  If I know you can&amp;#8217;t, I&amp;#8217;m not motivated to work harder, though I&amp;#8217;m happy to take your money.  With hackers it&amp;#8217;s extremely difficult to make fine-grained distinctions about performance because every work assignment is unique, and there&amp;#8217;s massive noise in the form of meetings and other non-coding work.&lt;/p&gt;
&lt;p&gt;Second, it&amp;#8217;s hard to design an incentive system that doesn&amp;#8217;t produce negative side effects: Pay a bonus for more output, you get more bugs and less teamwork. Pay for fewer bugs, get less output. Pay $10 for every bug fixed and &lt;a href="http://dilbert.com/strips/comic/1995-11-13/" title="Dilbert minivan" target="_blank"&gt;Wally gets a new minivan&lt;/a&gt;.  The bigger the bonus, the greater the adverse side effects. &lt;/p&gt;
&lt;p&gt;Nothing illustrates this better than Wall Street, which offers enormous incentive pay without earning a reputation for ethical behavior or performance. In the long run, technology firms that offer exceptional pay risk attracting people whose primary interest is money. When growth slows, these people can enrich themselves only by grabbing a bigger piece of the pie. You may need to adopt a Microsoft-style org chart: [1]&lt;/p&gt;
&lt;p&gt;&lt;img alt="Microsoft Org Chart" height="260" src="http://www.globalnerdy.com/wordpress/wp-content/uploads/2011/07/microsoft-org-chart.jpg" width="400"/&gt;&lt;/p&gt;
&lt;p&gt;Third, the &amp;#8216;decreasing marginal utility of money&amp;#8217; is Economics 101. It says that the motivational impact of an extra dollar decreases as the total amount of money increases. Hackers are among the best paid professionals in the world and every time you pay them more, you decrease the effectiveness of their next raise.&lt;/p&gt;
&lt;p&gt;Not only does economic theory itself cast doubt on the utility (pun intended) of monetary rewards, but their limits are also backed by accepted psychological theory and empirical studies.&lt;/p&gt;
&lt;p&gt;&lt;img align="left" alt="Abraham Maslow" height="333" src="http://upload.wikimedia.org/wikipedia/en/e/e0/Abraham_Maslow.jpg" width="264"/&gt;&lt;/p&gt;
&lt;p&gt;As psychological theories go, few have the acceptance and prestige of Abraham Maslow&amp;#8217;s &lt;a href="http://upload.wikimedia.org/wikipedia/commons/thumb/6/60/Maslow%27s_Hierarchy_of_Needs.svg/800px-Maslow%27s_Hierarchy_of_Needs.svg.png" title="The Hierarchy" target="_blank"&gt;Hierarchy of Human Needs&lt;/a&gt;.  First described in his paper &amp;#8220;&lt;a href="http://psychclassics.yorku.ca/Maslow/motivation.htm" title="A Theory of Human Motivation" target="_blank"&gt;A Theory of Human Motivation&lt;/a&gt;&amp;#8221;, Maslow&amp;#8217;s Hierarchy posited that human motivation is determined by a set of common needs, and that these are arranged in a pyramid, with the more pressing and immediate at the bottom. These have to be satisfied before you move up the pyramid. &lt;/p&gt;
&lt;p&gt;At the very bottom are physiological needs like breathing. Until you&amp;#8217;ve satisfied those, nothing else matters, but once you&amp;#8217;re breathing easily, I can&amp;#8217;t motivate you with an offer of even more breathing.&lt;/p&gt;
&lt;p&gt;Money, it seems, works the same way. If you don&amp;#8217;t have enough to adequately clothe, feed and house your family, it dominates your motivation, but past that stage money is less motivating than friendship and learning.&lt;/p&gt;
&lt;p&gt;Large-scale statistical studies bear this out. As The Economist recently &lt;a href="http://www.economist.com/node/21556989" title="Chinese study on wealth and happiness" target="_blank"&gt;reported&lt;/a&gt;, a study in China tracked contentment over a 20 year period during which the economy grew by 400%. The findings:  &lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;China was ideal for testing expectations of a correlation between economic growth and well-being. But what they found was “no evidence of a marked increase in life satisfaction” to match the rise in prosperity during that period.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Numerous other studies show similar results.&lt;/p&gt;
&lt;p&gt;Unfortunately, the relationship between satisfaction and wealth is not symmetrical. Decreased compensation causes more unhappiness than an increase of the same amount.  And if you pay someone a $10K bonus this year and an $8K bonus next year, it will feel to them like a $2K decrease. &lt;/p&gt;
&lt;p&gt;Not only do extrinsic rewards like bonuses lose their motivational power as people become better off, there is considerable evidence that they &lt;a href="http://www.spring.org.uk/2009/10/how-rewards-can-backfire-and-reduce-motivation.php" title="Extrinsic rewards/Intrinsic motivation" target="_blank"&gt;reduce &lt;em&gt;intrinsic&lt;/em&gt; motivation&lt;/a&gt;. For an entertaining summary of this argument, check out &lt;a href="http://www.ted.com/talks/dan_pink_on_motivation.html" title="Dan on Ted" target="_blank"&gt;Dan Pink on TED&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the business world, problems with incentive compensation aren&amp;#8217;t breaking news. Frederick Herzberg&amp;#8217;s 1968 classic &amp;#8220;&lt;a href="http://www.facilitif.eu/user_files/file/herzburg_article.pdf" title="One more time..." target="_blank"&gt;One More Time: How Do You Motivate People&lt;/a&gt;&amp;#8221; has been reprinted 1.2 million times, more than any other article in the history of the Harvard Business Review. For Herzberg, money is a &amp;#8220;hygiene factor&amp;#8221;, which causes dissatisfaction when it&amp;#8217;s inadequate, but generates very limited motivation when it&amp;#8217;s increased above that point.&lt;/p&gt;
&lt;p&gt;If this is all common knowledge, why is incentive pay still so widely used? A &lt;a href="http://www.mckinseyquarterly.com/Motivating_people_Getting_beyond_money_2460/" title="McKinsey Study" target="_blank"&gt;McKinsey study&lt;/a&gt; suggests that &amp;#8220;many executives hesitate to challenge the traditional managerial wisdom: money is what really counts.&amp;#8221; But consider that most executives have generous bonus plans: Could the real answer be that they&amp;#8217;re incented to not see the truth?&lt;/p&gt;
&lt;p&gt;&amp;#8212;&amp;#8212;&amp;#8212;-&lt;/p&gt;
&lt;p&gt;[1] This org chart is an &amp;#8220;enhancement&amp;#8221; created by Joey deVilla, based on the original, awesomely clever version created by &lt;a href="http://www.bonkersworld.net/organizational-charts/" title="Bonker's World" target="_blank"&gt;Bonkers World&lt;/a&gt;.&lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/26565678611</link><guid>http://deathrayresearch.tumblr.com/post/26565678611</guid><pubDate>Thu, 05 Jul 2012 13:13:00 -0400</pubDate></item><item><title>tenXer, the quantified life and the quantified hacker</title><description>&lt;p&gt;There&amp;#8217;s an excellent, slightly scatological &lt;a href="http://www.theatlantic.com/magazine/print/2012/07/the-measured-man/9018/" title="The Measured Man" target="_blank"&gt;article&lt;/a&gt; in The Atlantic on computer scientist Larry Smarr&amp;#8217;s quest for the quantified self. The article is by Mark Bowden of Black Hawk Down fame. If you&amp;#8217;re interested in the intersection of Big Data, metrics and medicine you should check it out.  &lt;/p&gt;
&lt;p&gt;On a related note, I recently came across a &lt;a href="http://bits.blogs.nytimes.com/2012/06/07/former-card-counters-new-start-up-helps-count-productivity/" title="Measuring productivity" target="_blank"&gt;profile&lt;/a&gt; in the New York Times on &lt;a href="http://www.tenxer.com" title="TenXer home page" target="_blank"&gt;tenXer&lt;/a&gt;, a startup that claims it can turn a &amp;#8216;1x&amp;#8217; engineer into a &amp;#8216;10x&amp;#8217; one through data-mining and gamification. The &lt;a href="http://www.economist.com/blogs/babbage/2012/06/super-star-programmers" title="Super Star Programmers" target="_blank"&gt;Economist&lt;/a&gt; also picked up on tenXer, showing that, if nothing else, they have a flair for PR. The media interest no doubt derives from tenXer&amp;#8217;s remarkable founder Jeff Ma, a former member of the MIT Blackjack Team featured in the film &amp;#8216;21&amp;#8217;.&lt;/p&gt;
&lt;p&gt;I signed up for their beta test to check it out. I don&amp;#8217;t want to rush to judgement based solely on their beta, so take this with a grain of salt: From what I&amp;#8217;ve seen, their approach is shallow, to the extent that I wonder how much they really understand the great sausage factory of &lt;a href="http://deathrayresearch.tumblr.com/post/4195081226/whats-right-with-this-picture-the-diagram-shows" title="Software as a dynamic system" target="_blank"&gt;software development&lt;/a&gt;.  In an interview, Ma states that software is just the start, rather than the sole focus of tenXer; if so, the lack of depth is understandable.&lt;/p&gt;
&lt;p&gt;The basic idea is that they gather metrics (counts of check-ins, lines changed, emails sent, bugs fixed, etc.) from a variety of sources (GMail, Pivotal Tracker, GitHub, Phabricator and Jira to start with) and provide visualization tools to help you track your &amp;#8220;progress&amp;#8221;, along with encouragement to beat your prior bests.&lt;/p&gt;
&lt;p&gt;I have two concerns: First, how all these stats relate to effective software development is unclear. Contrast tenXer with the depth of Larry Smarr&amp;#8217;s inquiries into his health. Smarr understands that without a model of how it fits together, data is meaningless.  &lt;/p&gt;
&lt;p&gt;My second concern is that tenXer is all about individual performance. Software development is a team sport, so beware the local optima. The last thing you want is someone optimizing his personal check-in rate to make the tenXer leader board. &lt;/p&gt;
&lt;p&gt;Still, I think this is a startup to watch. Data mining is the wave of the future in software engineering and it&amp;#8217;s great to see startups moving into this space. &lt;/p&gt;
&lt;p&gt;(Full disclosure: tenXer is funded by Google Ventures and my bank account is funded by Google, Inc. The opinions expressed here are my own. Google Ventures is unaware of my existence, etc. etc.)&lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/26418932457</link><guid>http://deathrayresearch.tumblr.com/post/26418932457</guid><pubDate>Tue, 03 Jul 2012 10:05:00 -0400</pubDate><category>metrics</category><category>code-mining</category></item><item><title>Positive Software Engineering</title><description>&lt;p&gt;I came across a story recently on Hacker News:&lt;/p&gt;
&lt;p&gt;&lt;a class="question-hyperlink" href="http://programmers.stackexchange.com/questions/154733/my-boss-decided-to-add-a-person-to-blame-field-to-every-bug-report-how-can-i" target="_blank"&gt;My boss decided to add a “person to blame” field to every bug report. How can I convince him that it&amp;#8217;s a bad idea?&lt;/a&gt;&amp;#8221;&lt;/p&gt;
&lt;p&gt;Dear OP, Good luck with that. &lt;/p&gt;
&lt;p&gt;&lt;img align="left" alt="Yeah, I'm gonna have to ask you to go ahead and blame your peers when something goes wrong." height="250" src="http://cdn.memegenerator.net/instances/400x/22763891.jpg" width="300"/&gt;Your boss isn&amp;#8217;t (necessarily) a bad guy. Software development is a complex dynamic system so it&amp;#8217;s hard to see what&amp;#8217;s really going on. Intuitions about how to fix things are often wrong and many management interventions are unproductive.&lt;/p&gt;
&lt;p&gt;What&amp;#8217;s unfortunate about this particular fail is not that it&amp;#8217;s ineffective, but that it poisons the environment. Organizations can develop a pervasive culture of finger-pointing, fear and defensiveness. It becomes impossible to discuss issues because anything imperfect has to be someone&amp;#8217;s &amp;#8216;fault&amp;#8217;. The result feels like the org-chart of fear from Joseph Heller&amp;#8217;s novel &amp;#8220;Something Happened&amp;#8221;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;span&gt;“In my department there are six people who are afraid of me, and one small secretary who is afraid of all of us. I have one other person working for me who is not afraid of anyone, not even me, and I would fire him quickly, but I’m afraid of him.”&lt;/span&gt;&lt;span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;All of this takes a toll on the people. If you take an activity that many people happily do for free, combine it with some of the highest salaries of any profession, and produce a work-life that sucks, that&amp;#8217;s sub-optimal. It&amp;#8217;s not, unfortunately, unusual: many software people are &lt;a href="http://www.computerworld.com/s/article/9143194/Surveys_IT_job_satisfaction_plummets_to_all_time_low" title="IT Job Satisfaction survey" target="_blank"&gt;less than thrilled with their work&lt;/a&gt;. &lt;/p&gt;
&lt;p&gt;Why should anyone care? Two reasons: First, there is considerable evidence that &lt;a href="http://www.nytimes.com/2011/09/04/opinion/sunday/do-happier-people-work-harder.html" title="Do happier people work harder? NYTimes" target="_blank"&gt;employee well-being has a positive, causal impact on performance&lt;/a&gt;. Second, if you can structure work so that it supports, rather than undermines your team&amp;#8217;s well-being, then you should. It&amp;#8217;s morally the right thing to do, and you&amp;#8217;ll make your own work life more meaningful in the process.&lt;/p&gt;
&lt;p&gt;Improved well-being is clearly both a motivator for, and a desired outcome of, Agile development practices, but to my mind Agile doesn&amp;#8217;t go far enough. I&amp;#8217;m looking for connections between how we build software and the relatively new fields of positive psychology and positive organizational scholarship. &lt;a href="http://www.ted.com/talks/martin_seligman_on_the_state_of_psychology.html" title="Seligman on TED" target="_blank"&gt;Positive psychology&lt;/a&gt; is concerned with taking healthy people and increasing their well-being. By analogy, we might take healthy development teams and help them really thrive. For lack of a better name, let&amp;#8217;s call this line of inquiry &amp;#8216;Positive Software Engineering&amp;#8217;.&lt;/p&gt;
&lt;p&gt;FWIW, here&amp;#8217;s my real answer to the OP: Instead of putting a &amp;#8220;person to blame&amp;#8221; field in every bug report, suggest he put all the hackers&amp;#8217; names (and maybe photos, too) in the product itself. Most products have an &amp;#8220;About&amp;#8221; link or dialog. Put it there.&lt;/p&gt;
&lt;p&gt;No one wants their name on something they&amp;#8217;re not proud of. Open source projects credit their hackers; game developers, too. If he needs a further precedent, show him how Steve Jobs put &lt;a href="http://www.vintagecomputing.com/index.php/archives/391" title="A machine with your name on it" target="_blank"&gt;the original Mac developers&amp;#8217; signatures inside every Mac.&lt;/a&gt; Your boss can be positive, and be like Steve. What&amp;#8217;s not to like?&lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/26359964520</link><guid>http://deathrayresearch.tumblr.com/post/26359964520</guid><pubDate>Mon, 02 Jul 2012 14:37:00 -0400</pubDate><category>software engineering</category><category>well-being</category><category>positive software engineering</category></item><item><title>Save the baby code</title><description>&lt;p&gt;Code reviews are both a standard part of the development process and the biggest wasted resource in software engineering.&lt;/p&gt;
&lt;p&gt;&lt;img align="left" src="http://upload.wikimedia.org/wikipedia/commons/thumb/7/76/Virginia_Apgar.jpg/220px-Virginia_Apgar.jpg" alt="Virginia Apgar and a newborn" width="220" height="265"/&gt;&lt;/p&gt;
&lt;p&gt;Approaches vary from face-to-face discussions to online systems like &lt;a title="Review Board" target="_blank" href="http://www.reviewboard.org/"&gt;Review Board&lt;/a&gt;.  They share two things: They&amp;#8217;re arguably the most effective way to assess code quality, and they&amp;#8217;re expensive. &lt;/p&gt;
&lt;p&gt;Yet even as we pay experts to evaluate the actual code, we manage with metrics like code coverage and defect counts that provide indirect (and possibly delayed) signals about its health. If we could somehow quantify those reviews, the insights could lead to improvement.&lt;/p&gt;
&lt;p&gt;Faced with a similar problem, Virginia Apgar published a paper in 1953 titled &lt;em&gt;&amp;#8220;&lt;a target="_blank" href="http://profiles.nlm.nih.gov/ps/retrieve/ResourceMetadata/CPBBKG"&gt;A Proposal for a New Method of Evaluation of the Newborn Infant&lt;/a&gt;&amp;#8221;&lt;/em&gt; that changed obstetric and neonatal practice around the world. She did it by devising a simple 10-point scale that rated newborns in five categories, such as muscle tone and color, awarding 0 to 2 points for each.&lt;/p&gt;
&lt;p&gt;In the words of Atul Gawande, Apgar&amp;#8217;s score &amp;#8220;turned an intangible and impressionistic clinical concept, the health of newborn babies, into numbers that people could collect and compare.&amp;#8221; This led to two kinds of innovation: one produced new techniques to save babies with low scores; the other brought advances that raised average scores. The result was a 16X improvement in infant mortality and 140,000 lives saved each year in the US alone.&lt;/p&gt;
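&lt;p&gt;As a rough illustration of how simple the scale is, here is a minimal Python sketch of the scoring rule. It assumes the five signs from Apgar&amp;#8217;s original proposal (heart rate, respiratory effort, reflex irritability, muscle tone and color); the function name and the example ratings are made up for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Minimal sketch of Apgar-style scoring: five signs, each rated 0, 1 or 2,
# summed into a single 0-10 score. Names and ratings are illustrative.
APGAR_SIGNS = ("heart_rate", "respiratory_effort", "reflex_irritability",
               "muscle_tone", "color")


def apgar_total(ratings):
    """Sum the five 0-2 ratings in `ratings`, a dict keyed by sign name."""
    missing = set(APGAR_SIGNS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for sign in APGAR_SIGNS:
        if ratings[sign] not in (0, 1, 2):
            raise ValueError(f"{sign} must be rated 0, 1 or 2")
    return sum(ratings[sign] for sign in APGAR_SIGNS)


newborn = {"heart_rate": 2, "respiratory_effort": 2, "reflex_irritability": 2,
           "muscle_tone": 2, "color": 1}
print(apgar_total(newborn))  # 9
```
&lt;/code&gt;&lt;/pre&gt;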
&lt;p&gt;To do this, Apgar first demonstrated that her score was a true measure of newborn health. She divided 2,096 newborns into three groups according to their scores. Mortality for the middle group was an order of magnitude worse than the best group, while the lowest group&amp;#8217;s mortality was an order of magnitude worse still:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Infants receiving 0, 1 or 2 scores: 14%&lt;/li&gt;
&lt;li&gt;Infants receiving 3, 4, 5, 6, 7  scores: 1.1% &lt;/li&gt;
&lt;li&gt;Infants receiving 8, 9, 10 scores: 0.13%&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Having established the score&amp;#8217;s effectiveness, she went on to demonstrate the advantages of one technique over another by comparing the scores they produced.  The results for ways to deliver anesthesia, for example&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Spinal anesthesia: 8.0 &lt;/li&gt;
&lt;li&gt;General anesthesia: 5.0  &lt;/li&gt;
&lt;li&gt;Epidural or caudal: 6.3&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;showed clear differences between the techniques. The result was the widespread adoption of the rating system and ongoing competition among doctors, hospitals and researchers for improved scores.&lt;/p&gt;
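&lt;p&gt;Both of Apgar&amp;#8217;s analyses boil down to grouping and averaging, which can be sketched in a few lines of Python. The records below are invented placeholders, not data from her paper, and the helper names are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Sketch of Apgar's two analyses: (1) outcome rate per score band, used to
# validate the score; (2) mean score per technique, used to compare practices.
from collections import defaultdict

# Map each possible 0-10 score to the three bands described in the post.
BAND = {s: "0-2" for s in range(0, 3)}
BAND.update({s: "3-7" for s in range(3, 8)})
BAND.update({s: "8-10" for s in range(8, 11)})


def rate_by_band(records):
    """records: (score, died) pairs. Returns each band's mortality rate."""
    tallies = defaultdict(lambda: [0, 0])  # band: [deaths, infants]
    for score, died in records:
        tally = tallies[BAND[score]]
        tally[0] += int(died)
        tally[1] += 1
    return {band: deaths / n for band, (deaths, n) in tallies.items()}


def mean_score_by_group(scored):
    """scored: (group, score) pairs. Returns each group's mean score."""
    sums = defaultdict(lambda: [0, 0])  # group: [score total, count]
    for group, score in scored:
        sums[group][0] += score
        sums[group][1] += 1
    return {group: total / n for group, (total, n) in sums.items()}


records = [(1, True), (2, False), (4, False), (6, True),
           (7, False), (7, False), (9, False), (10, False)]
print(rate_by_band(records))  # {'0-2': 0.5, '3-7': 0.25, '8-10': 0.0}
print(mean_score_by_group([("spinal", 8), ("spinal", 9),
                           ("general", 5), ("general", 4)]))
# {'spinal': 8.5, 'general': 4.5}
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The same two steps carry over to code: first check that a score separates good outcomes from bad ones, and only then use it to compare practices.&lt;/p&gt;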
&lt;p&gt;What does this have to do with code reviews?  The health of newborn code is also an &amp;#8220;intangible and impressionistic&amp;#8221; concept. It needs an Apgar score so that teams can learn and improve.  &lt;/p&gt;
&lt;p&gt;There are complications. First, a baby is a baby, but check-ins vary from a one-line bug fix to a huge body of code. This can potentially be addressed by normalizing scores with respect to the amount of code reviewed. Second, no single attribute of code health is as unambiguous as death. This is more troubling, but it can be approached the way Apgar approached infant health: devising a score and comparing it to actual results. In this case, the results might be defect counts and other measures of quality.&lt;/p&gt;
&lt;p&gt;Here is my first pass based on conversations with a few hackers: First, I would measure correctness as a raw count of identified defects.  For the remaining criteria, I would assign a rating of 0, 1 or 2 points.  The categories are:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Readability: (Inadequately documented or poor naming, Acceptable or NA, Clearly documented, well-chosen names)&lt;/li&gt;
&lt;li&gt;Test coverage: (Inadequate, NA or marginal, Fully covered)&lt;/li&gt;
&lt;li&gt;Simplicity: (More complex than necessary, Acceptable or NA, Complexity appropriate to requirements)&lt;/li&gt;
&lt;li&gt;Performance: (Inadequate and material, NA or immaterial, Appropriate to requirements)&lt;/li&gt;
&lt;li&gt;Reuse: (Inadequate or inappropriate use of existing code, NA, Appropriate use of existing code)&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Like babies and their Apgar scores, the code would be rated twice: Once on first submission and once with approval (unless, of course, it was approved on first review).&lt;/p&gt;
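&lt;p&gt;To make the proposal concrete, here is a hypothetical Python sketch of recording one such review score. The class and field names are illustrative assumptions, not part of any existing review tool. NA counts as the middle rating, as in the list above, and dividing the defect count by lines reviewed is just one possible way to do the normalization mentioned earlier.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Hypothetical record of a code-review "Apgar score": five categories rated
# 0-2 (NA counts as 1), plus a separate raw count of identified defects.
from dataclasses import dataclass

CATEGORIES = ("readability", "test_coverage", "simplicity",
              "performance", "reuse")


@dataclass
class ReviewScore:
    ratings: dict        # category name: 0, 1 or 2
    defects: int         # raw count of defects identified in the review
    lines_reviewed: int  # size of the change, used for normalization

    def total(self):
        """Sum of the five category ratings, from 0 to 10."""
        for category in CATEGORIES:
            if self.ratings.get(category) not in (0, 1, 2):
                raise ValueError(f"{category} needs a rating of 0, 1 or 2")
        return sum(self.ratings[c] for c in CATEGORIES)

    def defects_per_kloc(self):
        """Defects per 1,000 reviewed lines, so big and small check-ins compare."""
        if self.lines_reviewed == 0:
            raise ValueError("lines_reviewed must be nonzero")
        return 1000 * self.defects / self.lines_reviewed


# Rated twice: once on first submission, once at approval.
first = ReviewScore({"readability": 1, "test_coverage": 0, "simplicity": 1,
                     "performance": 1, "reuse": 2}, defects=3, lines_reviewed=250)
approved = ReviewScore({"readability": 2, "test_coverage": 2, "simplicity": 1,
                        "performance": 1, "reuse": 2}, defects=0, lines_reviewed=270)
print(first.total(), approved.total())  # 5 8
print(first.defects_per_kloc())         # 12.0
```
&lt;/code&gt;&lt;/pre&gt;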
&lt;p&gt;Other, better approaches are possible.  What would you do? &lt;/p&gt;
&lt;p&gt;By themselves the scores do nothing to improve your process, just as Apgar scores alone don&amp;#8217;t improve an infant&amp;#8217;s health.  The important step, one that will challenge your knowledge and creativity, is to relate them to your other data, understand what this tells you about your process and invent ways to improve things. &lt;/p&gt;
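&lt;p&gt;One way to start relating the scores to your other data is to check whether they move with a later quality signal, such as defects found after release in the same changes. The Python sketch below computes a plain Pearson correlation; the numbers are invented, and a real analysis would need far more changes and some care about confounders.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Sketch: do review scores track a later quality signal? A strongly negative
# r would mean higher review scores went with fewer escaped defects.
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) in (0, 1):
        raise ValueError("need two equal-length sequences of 2+ values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


review_scores = [9, 8, 6, 5, 3]    # hypothetical scores at approval
escaped_defects = [0, 1, 1, 3, 4]  # hypothetical defects found later
r = pearson(review_scores, escaped_defects)
print(f"r = {r:.2f}")
```
&lt;/code&gt;&lt;/pre&gt;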
</description><link>http://deathrayresearch.tumblr.com/post/5736417135</link><guid>http://deathrayresearch.tumblr.com/post/5736417135</guid><pubDate>Sun, 22 May 2011 13:12:00 -0400</pubDate><category>software engineering</category><category>metrics</category><category>code reviews</category></item><item><title>The pathology of estimates</title><description>&lt;p&gt;I recently sparred gently on Twitter with Scott Ambler over an assertion that repeatedly renegotiating schedules was evidence of unethical behavior. Others on the thread equated the practice with lying.&lt;/p&gt;
&lt;p&gt;Not so.&lt;/p&gt;
&lt;p&gt;(Full disclosure: &lt;a target="_blank" href="http://www.ambysoft.com/scottAmbler.html"&gt;Scott&lt;/a&gt; is a highly respected thought-leader in the Agile community and the Chief Methodologist at IBM Rational. And I&amp;#8217;m jealous.)&lt;/p&gt;
&lt;p&gt;&lt;img align="left" src="http://upload.wikimedia.org/wikipedia/commons/thumb/a/a4/BernardMadoff.jpg/220px-BernardMadoff.jpg" alt="Bernie Madoff" width="220" height="279"/&gt;&lt;strong&gt;Bernie Madoff Syndrome&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When we witness a problem, we search for the cause. Often the trail seems to lead to the (mis)behavior of a few ‘bad apples’.&lt;/p&gt;
&lt;p&gt;Bad-apple theory suggests the recent near-collapse of the worldwide financial system was caused by the misdeeds of a handful of Bernie Madoff types. The solution is to punish these people and discourage imitators.&lt;/p&gt;
&lt;p&gt;The theory is attractive because it has a simple story line. It&amp;#8217;s morally satisfying. Cause and effect are tied neatly together, letting us off the hook from the hard work of changing the system itself. &lt;/p&gt;
&lt;p&gt;I&amp;#8217;m not going to argue that lying and deceit don’t sometimes play a role, but we should be wary of attributing the success of some projects to the virtues of Agile, and the failure of others to individual vice. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Many bad apples?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To find other explanations, we begin by observing that the sliding schedule problem is not uncommon. Time and software are old enemies:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;#8220;More software projects have gone awry for lack of calendar time than for all other causes combined.&amp;#8221; Fred Brooks, &lt;em&gt;The Mythical Man-Month&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;While it&amp;#8217;s possible that engineering management attracts a disproportionate share of incompetent and unethical people, it&amp;#8217;s more likely that the fault is systemic. When a problem recurs regularly in spite of the best efforts of bright, resourceful people, we can assume it has deep roots. &lt;/p&gt;
&lt;p&gt;Bazerman and Watkins argue that some problems recur because they fall into a kind of “sweet spot” for failure, where political interests, organizational dysfunction and cognitive limits align to block corrective action. Software blowups fit this model nicely.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Bad brains, and good ones&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;span&gt;&lt;strong&gt;Dr. Frederick Frankenstein&lt;/strong&gt;: Ah! Very good. Would you mind telling me whose brain I DID put in? &lt;br/&gt;&lt;strong&gt;Igor&lt;/strong&gt;: Abby someone. &lt;br/&gt;&lt;strong&gt;Dr. Frederick Frankenstein&lt;/strong&gt;: [&lt;em class="fine"&gt;pause, then&lt;/em&gt;] Abby someone. Abby who? &lt;br/&gt;&lt;strong&gt;Igor&lt;/strong&gt;: Abby&amp;#8230; Normal. &lt;/span&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Unfortunately, it&amp;#8217;s not just abnormal brains that cause problems. Daniel Kahneman won a Nobel Prize in Economics for his research on biased decision-making under uncertainty. Its application in the financial markets helped others win more lucrative prizes, even without cheating.&lt;/p&gt;
&lt;p&gt;&lt;img height="179" width="150" alt="Daniel Kahneman" src="http://upload.wikimedia.org/wikipedia/commons/c/c8/Daniel_KAHNEMAN.jpg" align="left"/&gt;&lt;/p&gt;
&lt;p&gt;More broadly, Kahneman and his followers studied a host of common heuristics and biases. One finding: people have a clear bias toward optimistic time estimates, known as the &lt;em&gt;Planning Fallacy&lt;/em&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“The phenomenon is not limited to commercial mega-projects&amp;#8230; and its occurrence does not depend on deliberate deceit or untested technologies.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In fact, it&amp;#8217;s been shown to hold even for simple household tasks. Stranger still, a person can be pessimistic about plans in general, e.g. “Software projects always run late,” and still be too optimistic in their own planning: “I think I can finish on time.”&lt;/p&gt;
&lt;p&gt;Ironically, more detailed planning can actually make people more optimistic.  The theory is that optimism derives from a mental image of success; more detail makes for a more compelling image.&lt;/p&gt;
&lt;p&gt;Another common cure, having people estimate their own work, may also backfire. While this should increase commitment and incentives for accuracy, it goes against human nature: people tend to be optimistic regarding their own plans, but more realistic regarding the plans of others.&lt;/p&gt;
&lt;p&gt;While the planning fallacy affects estimates generally, the heuristic called &lt;em&gt;Anchoring and Adjustment&lt;/em&gt; undermines our attempts at revision. People often unconsciously start with a reference point (the &amp;#8216;anchor&amp;#8217;) and then ‘adjust’ from that to derive an estimate. The problem is that the adjustments are usually too small. The current scheduled completion date can be an anchor that contaminates subsequent efforts to create a viable schedule. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;From Madoff to Gandhi&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;How powerful is this effect? People were asked to estimate how old Gandhi was when&lt;img height="242" width="200" alt="Gandhi" src="http://upload.wikimedia.org/wikipedia/commons/thumb/0/03/MKGandhi.jpg/200px-MKGandhi.jpg" align="right"/&gt; he died, but first, half of them were asked whether he was older than 9, and the others whether he was younger than 140. (Obviously his age at death was between the two.) The first group&amp;#8217;s average estimate was 50; the second group&amp;#8217;s was 67, a difference of 17 years due to completely irrelevant anchors.&lt;/p&gt;
&lt;p&gt;Then there&amp;#8217;s &lt;em&gt;The Confirmation Trap.&lt;/em&gt; Given an assertion like “We can do that in six months,” we have a strong and harmful tendency to seek supporting evidence and stop when we find some. But the presence of supporting evidence doesn&amp;#8217;t make a plan achievable. Beforehand, a plan can only be proven &lt;em&gt;infeasible&lt;/em&gt;. Searching for proof that a plan is infeasible is an uncommon project activity, to say the least.&lt;/p&gt;
&lt;p&gt;How common are these problems? In the words of one researcher: “One of the most robust findings in the psychology of prediction is that people’s predictions tend to be optimistically biased.” If you think you&amp;#8217;re immune, you&amp;#8217;re suffering from a &lt;em&gt;positive illusion&lt;/em&gt; bias. Positive illusions not only harm our estimates; they lead us to think that everything will work out fine, undermining our motivation to act before things get out of hand. Other biases have similar effects:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Discounting the Future: We tend to avoid small costs in the present that would prevent large problems in the future. &lt;/li&gt;
&lt;li&gt;Status quo bias: We tend to avoid actions involving any clear harm, even when the benefits of the action greatly outweigh its costs.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;In short, human nature leads us astray. It leads us to underestimate the task at hand and to delay corrective action when needed. If that qualifies as unethical, then we&amp;#8217;re all guilty at least some of the time:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;#8220;when we make mistakes, we shrug and say that we are human. As bats are batty and slugs are sluggish, our own species is synonymous with screwing up.&amp;#8221; Kathryn Schulz, &lt;em&gt;Being Wrong: Adventures in the Margin of Error&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;These biases partly explain the persistence of sliding schedules. If nothing else, they should make us think twice about hammering people who miss a deadline.  I plan to follow this up with posts on other factors that may play a role, but I&amp;#8217;d rather not say exactly when I&amp;#8217;ll be done.&lt;/p&gt;
&lt;p&gt;(Sources on cognitive limits include Schulz, Bazerman &amp;amp; Watkins, Kahneman et al., and Gilovich et al. All are listed in the &lt;a title="Bibliography" target="_self" href="http://deathrayresearch.tumblr.com/bibliography"&gt;bibliography&lt;/a&gt; with links to their Amazon pages.&lt;/p&gt;
&lt;p&gt;Mel Brooks&amp;#8217; &lt;em&gt;Young Frankenstein&lt;/em&gt; was my source on problems arising from installing an abnormal brain in a giant, homemade creature.)&lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/4503505772</link><guid>http://deathrayresearch.tumblr.com/post/4503505772</guid><pubDate>Sun, 10 Apr 2011 16:35:00 -0400</pubDate><category>software engineering</category><category>estimation</category></item><item><title>Software, Metrics and Ethics</title><description>&lt;blockquote&gt;
&lt;p&gt;&amp;#8220;It&amp;#8217;s impossible to move, to live, to operate at any level without leaving traces, bits, seemingly meaningless fragments of personal information.&amp;#8221; William Gibson&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;One of the themes of this site is that the lack of transparency in the development process is a leading cause of mismanagement. This need not be the case.&lt;/p&gt;
&lt;p&gt;Nearly every aspect of software development leaves a digital trace. Analyzing those traces can help eliminate the fog surrounding software development. I believe the current state of the process can be made available to decision makers. I also believe, though it&amp;#8217;s unproven, that the quality and productivity of teams and individual hackers can be measured by analyzing the traces the process leaves behind.&lt;/p&gt;
&lt;p&gt;Whether or not that assertion is true, it raises the ethical question implied by the quote from William Gibson: Under what circumstances is it acceptable to put the process under the microscope?&lt;/p&gt;
&lt;p&gt;The question is not new, though it is new to software development, where what hackers do is generally thought too complex to measure. In my experience, most programmers believe a great hacker is several times more productive than a marginal hacker, while simultaneously believing that it&amp;#8217;s impossible to measure hacker productivity.&lt;/p&gt;
&lt;p&gt;There is good reason to doubt this. The next time you tell your phone to play a song or have Google translate something, remember that you&amp;#8217;re watching statistical natural language processing at work, and that natural languages are far more complex than programming languages. Their complexity hasn&amp;#8217;t prevented valuable progress from being made. Look more broadly and examples abound of the successful application of statistics in systems that are more complex than software development. Our metrics are weak not because software is so complex, but because our data sucks.&lt;/p&gt;
&lt;p&gt;There is also good reason not to ignore the question of hacker productivity. In the long run, the only way to keep programming jobs in high wage locations is through demonstrably superior productivity. &lt;/p&gt;
&lt;p&gt;The question of how to measure performance in an ethical and non-threatening way is old news in industries where statistical process control (SPC) is common. Years ago I had the privilege of studying at NYU with W. Edwards Deming, who was renowned for having taught SPC in Japan after the war. Japan, of course, taught it to the rest of the world by decimating its low-quality competitors. If you can drive to work without wondering whether your car will break down, you owe something to Deming.&lt;/p&gt;
&lt;p&gt;In Deming&amp;#8217;s view the ultimate goal of process improvement was to &amp;#8220;provide jobs and more jobs.&amp;#8221; He saw this both as a moral imperative and a practical necessity: only &amp;#8220;driving out fear&amp;#8221; could prevent the sabotage of the metrics needed for SPC. Because of that, he spent relatively less time discussing math and more time teaching managers how NOT to misinterpret data. And he consistently emphasized employee morale and security.[1]&lt;/p&gt;
&lt;p&gt;If we are to make effective use of data in software engineering we need to be equally vigilant. The data must be used only to (1) help lower-performing individuals improve, and (2) help move the team as a whole to a higher average. If it turns out that one of your people just doesn&amp;#8217;t have the ability to be a strong hacker, it&amp;#8217;s on you to find a way for them to contribute. If this happens often, you need to work on your hiring process. Having good data may help. What you don&amp;#8217;t do is fire them. The first time someone uses your data as grounds for termination, you&amp;#8217;re lost.&lt;/p&gt;
&lt;p&gt;[An aside: Other people are starting to move in the direction of analyzing data produced as a normal part of the development process. Michael Feathers had a &lt;a target="_blank" href="http://michaelfeathers.typepad.com/michael_feathers_blog/2011/03/data-rich-development.html"&gt;recent post&lt;/a&gt; on his excellent blog that mentions several different SCM-mining efforts underway, though they&amp;#8217;re a bit different from what I have in mind.]&lt;/p&gt;
&lt;p&gt;[1] Well, that, and the infinite stupidity of America&amp;#8217;s corporate leadership. He was a pretty cranky guy on that subject. &lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/4367093927</link><guid>http://deathrayresearch.tumblr.com/post/4367093927</guid><pubDate>Tue, 05 Apr 2011 12:00:00 -0400</pubDate><category>metrics</category><category>empirical software engineering</category></item><item><title>Finding meaning in manual tests</title><description>&lt;p&gt;&lt;p class="p1"&gt;How do you assess the overall quality of your application when you have too many manual/functional acceptance tests to run them all after every sprint?  Perhaps you’ve been working on an application for some time and want to predict when the quality will be good enough to ship.&lt;/p&gt;
&lt;p class="p1"&gt;(Here some will say, “We don’t need manual tests; we have unit tests for everything.” If your automated suites thoroughly test integration and fully exercise your UI, fine. Otherwise, we&amp;#8217;ll assume that you need or want to augment your automated tests.)&lt;/p&gt;
&lt;p class="p1"&gt;One approach is to run all the manual tests for a functional area with each iteration. This is often coordinated with a push to fix bugs in the same area. It&amp;#8217;s an efficient way to use testing resources, and when coordinated with a bug sweep, it helps you find the things you broke when you swept.  &lt;/p&gt;
&lt;p class="p1"&gt;Be aware, though, that it tells you little about the quality of the entire application. A different approach, which can be used in combination with a focused testing effort, is to select a set of tests at random and execute those. &lt;/p&gt;
&lt;p class="p1"&gt;Specifically:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Select a different random set of tests to run with each iteration.&lt;/li&gt;
&lt;li&gt;Execute each test and record whether it passes or fails.&lt;/li&gt;
&lt;li&gt;Calculate an overall pass rate for the suite.&lt;/li&gt;
&lt;/ul&gt;&lt;p class="p1"&gt;Easy. Now what do you do with the failing tests? In terms of learning about your application, it doesn’t matter whether you fix the issue or not, but it&amp;#8217;s &lt;em&gt;essential&lt;/em&gt; that if you do fix it, you don’t change the original pass rate. That just pollutes your data.&lt;/p&gt;
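&lt;p class="p1"&gt;The procedure is simple enough to sketch in a few lines of Python. The test IDs, the seed and the simulated verdicts below are hypothetical; in practice the verdicts come from whoever executes the manual tests, and they are recorded once, before any fixes.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Sketch of the sampling procedure: draw a fresh random sample each
# iteration, record pass/fail once, and compute the sample pass rate.
import random


def sample_tests(all_test_ids, sample_size, seed=None):
    """Simple random sample without replacement, drawn anew per iteration."""
    rng = random.Random(seed)
    return rng.sample(list(all_test_ids), sample_size)


def pass_rate(results):
    """results: dict of test_id to True (pass) or False (fail).
    Record these once; later fixes must not overwrite them."""
    return sum(results.values()) / len(results)


suite = [f"TC-{i:04d}" for i in range(1, 1001)]  # 1,000 documented tests
picked = sample_tests(suite, 278, seed=42)
# Simulated verdicts: pretend every tenth test picked failed.
results = {tid: (k % 10 != 0) for k, tid in enumerate(picked)}
print(len(picked), round(pass_rate(results), 3))  # 278 0.899
```
&lt;/code&gt;&lt;/pre&gt;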
&lt;p class="p1"&gt;Let&amp;#8217;s say 90% of your sample tests pass. Can you assume that 90% of the tests you didn’t run would also pass? Not necessarily. What’s cool about sampling is that it tells you how much to trust your results.&lt;/p&gt;
&lt;p class="p1"&gt;&lt;strong&gt;How many tests is enough?&lt;/strong&gt;&lt;/p&gt;
&lt;p class="p1"&gt;To know how many tests to run for a given level of precision, you can use a sample size calculator like the one at &lt;a href="http://www.surveysystem.com/sscalc.htm" target="_blank"&gt;http://www.surveysystem.com/sscalc.htm&lt;/a&gt;.&lt;/p&gt;
&lt;p class="p1"&gt;To calculate sample size, you have to provide some guidance. First, tell it how many tests you&amp;#8217;re sampling &lt;em&gt;from&lt;/em&gt;. This is your &lt;em&gt;population&lt;/em&gt;. Let&amp;#8217;s assume you have 1,000 tests.&lt;/p&gt;
&lt;p class="p1"&gt;Next, select a &lt;em&gt;confidence interval&lt;/em&gt; of, say, plus or minus 5%. If your sample tests pass at 90%, you can now say the pass rate for &lt;em&gt;all&lt;/em&gt; tests (run and not run combined) is probably 90% plus or minus 5%, i.e. between 85% and 95%.&lt;/p&gt;
&lt;p class="p1"&gt;Note that I said “probably.” To be more specific, select a &lt;em&gt;confidence level&lt;/em&gt; (usually 95% or 99%). If you pick 95%, you can now be very specific about what &amp;#8220;probably&amp;#8221; means: “&lt;em&gt;I’m 95% sure&lt;/em&gt; the pass rate for all tests is between 85% and 95%.” Or rather, you could say that if your sample were big enough. In this case, the calculator shows that you&amp;#8217;d have to run 278 randomly selected tests for that level of precision.&lt;/p&gt;
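&lt;p class="p1"&gt;It helps to see where a figure like plus or minus 5% comes from. Below is a minimal Python sketch, assuming the common normal approximation with a finite population correction (not necessarily exactly what any particular calculator does). Here z = 1.96 corresponds to a 95% confidence level.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Margin of error for an observed pass rate: normal approximation plus a
# finite population correction (FPC), since we sample from a fixed suite.
from math import sqrt


def margin_of_error(pass_rate, sample_size, population, z=1.96):
    """Half-width of the approximate confidence interval for the suite-wide rate."""
    standard_error = sqrt(pass_rate * (1 - pass_rate) / sample_size)
    fpc = sqrt((population - sample_size) / (population - 1))
    return z * standard_error * fpc


# 278 tests sampled from a 1,000-test suite:
print(f"{margin_of_error(0.50, 278, 1000):.1%}")  # 5.0% (worst case, 50% pass rate)
print(f"{margin_of_error(0.90, 278, 1000):.1%}")  # 3.0%
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p class="p1"&gt;Because a plus or minus 5% target is planned around the worst case of a 50% pass rate, an observed 90% pass rate from the same sample yields a somewhat tighter interval.&lt;/p&gt;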
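&lt;p class="p1"&gt;The 278 can be reproduced with the textbook sample-size formula for a proportion, again assuming a worst-case 50% pass rate and a finite population correction. This Python sketch illustrates that formula; it is not the linked calculator&amp;#8217;s actual code.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Textbook sample size for estimating a proportion, with a finite
# population correction. p = 0.5 is the worst case (largest sample).
from math import ceil


def sample_size(population, margin, z=1.96, p=0.5):
    """Tests to run so the estimate lands within +/- margin at the given z."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2         # infinite-population size
    return ceil(n0 / (1 + (n0 - 1) / population))  # finite population correction


print(sample_size(1000, 0.05))  # 278
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p class="p1"&gt;Passing z = 2.576 instead of 1.96 corresponds to the 99% confidence level.&lt;/p&gt;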
&lt;p class="p1"&gt;&lt;strong&gt;The moral of the story&lt;/strong&gt;&lt;/p&gt;
&lt;p class="p1"&gt;If that seems like a lot of tests for little precision, then you&amp;#8217;ve uncovered the most important lesson here. Think about how many times you’ve seen someone use a similar pass rate, &lt;em&gt;taken from an even smaller, non-random sample&lt;/em&gt;, and act like it was perfectly accurate: “Last month we passed 91%, this month we passed 90%. Why are we getting worse?”&lt;/p&gt;
&lt;p class="p1"&gt;If you&amp;#8217;re using sampling, you know that a difference that small is probably meaningless. The real value of being precise about the limits of your knowledge is that it can keep you from chasing random fluctuations and making things worse. The way to judge your improvement is to wait until you have a handful of results, plot them and look for trends.&lt;/p&gt;
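&lt;p class="p1"&gt;To check whether a one-point change like the one above is more than noise, a two-proportion z-test is one standard tool (a swapped-in technique, not something the linked calculator provides). The Python sketch below assumes 278 sampled tests in each of two iterations, with 253 and 250 passes, roughly 91% and 90%.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Two-proportion z-test (normal approximation): is the change in pass rate
# between two independently sampled iterations larger than sampling noise?
from math import sqrt


def two_proportion_z(passed_a, n_a, passed_b, n_b):
    """z statistic for the difference between two observed pass rates."""
    p_a, p_b = passed_a / n_a, passed_b / n_b
    pooled = (passed_a + passed_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


z = two_proportion_z(253, 278, 250, 278)  # last month vs. this month
print(f"z = {z:.2f}")  # z = 0.43
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p class="p1"&gt;A z of about 0.43 is far inside the plus or minus 1.96 band for 95% confidence, so the apparent decline is indistinguishable from sampling noise. The advice above still applies: collect several iterations, plot them and look for a trend.&lt;/p&gt;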
&lt;p class="p1"&gt;&lt;strong&gt;The frame is not the universe&lt;/strong&gt;&lt;/p&gt;
&lt;p class="p1"&gt;Before ending, we should be clear about one more thing: all we really know is the pass rate for our tests. We&amp;#8217;ve been making an implicit assumption that the suite, if we ran every test in it, would provide an accurate measure of overall quality. That remains to be proven.&lt;/p&gt;
&lt;p class="p1"&gt;If you think of each test as exercising a particular path through the application, then some terms from sampling theory can help make the remaining limits of our understanding clearer:&lt;/p&gt;
&lt;p class="p1"&gt;Universe: What we really want to measure. In this case perhaps our quality over the set of all possible user paths.&lt;/p&gt;
&lt;p class="p1"&gt;Frame: The set of accessible paths from which we draw our sample. In this case, the list of paths we’ve documented as unique test cases.&lt;/p&gt;
&lt;p class="p1"&gt;Sample: A randomly selected subset of the frame.  &lt;/p&gt;
&lt;p class="p1"&gt;In software terms, we need to understand the coverage our test suite provides. There are numerous ways we can define coverage, but that’s a subject for another day. &lt;/p&gt;&lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/4340271504</link><guid>http://deathrayresearch.tumblr.com/post/4340271504</guid><pubDate>Mon, 04 Apr 2011 11:29:00 -0400</pubDate><category>metrics</category><category>quality</category></item><item><title>A note on quality</title><description>&lt;blockquote&gt;
&lt;p&gt;&lt;span&gt;&amp;#8220;We never use a screwdriver in the last week. We hammer the screws in. We slam solder on the connections, cannibalize parts from other televisions if we run out of the right ones, use glue or hammers to fix switches that were never meant for that model. All the time management is pressing us to work faster, to make the target so we all get our bonuses.&amp;#8221; Worker in a Soviet television factory quoted in Milgrom &amp;amp; Roberts: &lt;em&gt;Economics, Organization and Management&lt;/em&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Every hacker, at one time or another, has committed the software equivalent of hammering in screws when a deadline approaches. It&amp;#8217;s understood that quality suffers, but quality can mean many things. Here we talk about three kinds of quality: &lt;em&gt;design quality&lt;/em&gt;, &lt;em&gt;conformance quality&lt;/em&gt; and &lt;em&gt;total quality&lt;/em&gt;.[1]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Design Quality&lt;/strong&gt; is a statement of intent, a measure of how well the product &lt;em&gt;as designed&lt;/em&gt; would appeal to the market&amp;#8217;s true needs and desires if it were made perfectly. It has nothing to do with the hidden details of the internal architecture and everything to do with the user. The original iPhone had great design quality.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conformance Quality&lt;/strong&gt; is the degree to which the actual product reflects that design.&lt;/p&gt;
&lt;p&gt;A product with few bugs, but many issues closed as &amp;#8220;Works as designed,&amp;#8221; may have good conformance quality and poor design quality. Usability issues are design quality problems. Products that no one wants, that cost too much to operate, or that aren&amp;#8217;t competitive are other examples. &lt;em&gt;In our &lt;a target="_blank" href="http://s3.amazonaws.com/data.tumblr.com/tumblr_litu19CgWM1qhcbvmo1_1280.jpg?AWSAccessKeyId=AKIAJ6IHWSU3BX3X7X3Q&amp;amp;Expires=1301746412&amp;amp;Signature=MCzi5BKbdd2jdWUuLFxCEpZ2SMY%3D"&gt;simple model of software development&lt;/a&gt;, what we&amp;#8217;ve labeled &amp;#8220;quality&amp;#8221; is conformance quality&lt;/em&gt;. We&amp;#8217;ll add design quality when we add customers to the model.&lt;/p&gt;
&lt;p&gt;Typical bugs are conformance failures: the product doesn&amp;#8217;t perform as specified. But if you host your software, so are performance problems, deployment problems, hardware issues and scaling problems. All detract from the user experience and create a gap between the intended value and what you delivered.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Total Quality&lt;/strong&gt; is the combination of design quality and conformance quality. The combination is not additive: poor design quality destroys the product&amp;#8217;s value, even if the conformance quality is very high. A product has good total quality when the implementation conforms closely to a design that meets the market&amp;#8217;s expectations.  &lt;/p&gt;
&lt;p&gt;We distinguish between design quality and conformance quality for a reason: most software organizations invest far more in finding bugs than they do in the quality of their requirements, usability, or the competitiveness of the overall design. For startups, at least, this is starting to change as more adopt the &lt;a target="_blank" href="http://steveblank.com/2010/01/25/whats-a-startup-first-principles/"&gt;Lean Startup&lt;/a&gt; approach that makes customer development (essentially market research) an equal partner with product development.&lt;/p&gt;
&lt;p&gt;[1] The distinction between these three definitions of quality is stolen with pride from &lt;a target="_blank" href="http://en.wikipedia.org/wiki/Kaoru_Ishikawa"&gt;Kaoru Ishikawa&lt;/a&gt;. &lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/4257262150</link><guid>http://deathrayresearch.tumblr.com/post/4257262150</guid><pubDate>Fri, 01 Apr 2011 08:52:06 -0400</pubDate><category>lean startup</category><category>quality</category><category>software engineering</category></item><item><title>A note on defect discovery</title><description>&lt;p&gt;When people see defect discovery in the &lt;a target="_blank" href="http://s3.amazonaws.com/data.tumblr.com/tumblr_litu19CgWM1qhcbvmo1_1280.jpg?AWSAccessKeyId=AKIAJ6IHWSU3BX3X7X3Q&amp;amp;Expires=1301596435&amp;amp;Signature=fw9r8nWp1WyttgBOG%2BJiV2i2RCU%3D"&gt;development model&lt;/a&gt;, they naturally think of quality assurance engineers hammering away on the product. But that&amp;#8217;s just one approach, and not the most effective. (The most effective way is to demo the application to an important customer.)&lt;/p&gt;
&lt;p&gt;Defect discovery also includes:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Static code analysis (including that performed by the IDE)&lt;/li&gt;
&lt;li&gt;Compilation&lt;/li&gt;
&lt;li&gt;Manual desktop testing by hackers&lt;/li&gt;
&lt;li&gt;Code reviews&lt;/li&gt;
&lt;li&gt;Unit tests&lt;/li&gt;
&lt;li&gt;Integration tests&lt;/li&gt;
&lt;li&gt;Customer beta tests&lt;/li&gt;
&lt;li&gt;Demos (of course)&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;And so on.&lt;/p&gt;
&lt;p&gt;We rarely think of what the hacker does as defect discovery, but much of it is. One methodology developed at IBM in the eighties took developer testing so seriously that it eliminated it. The Cleanroom approach aimed to provide both near-zero-defect code and guaranteed levels of reliability. And it did.[1]&lt;/p&gt;
&lt;p&gt;The approach included several novel elements, but the strangest to current practitioners is that developers had their compilers taken away, so they could do no testing at all. A separate team performed every test execution and recorded the results. Having a complete history of &lt;em&gt;all&lt;/em&gt; test executions helped make it possible to build probability models that could forecast the number of errors in production. &lt;/p&gt;
&lt;p&gt;Viewed another way, Cleanroom Development tells you not just how many known defects there are, but how many &lt;em&gt;unknown&lt;/em&gt; defects remain in the application.[2] On the day you launch, that&amp;#8217;s one of the two things you most want to know. &lt;/p&gt;
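&lt;p&gt;To make the MTTF idea concrete, here is a minimal sketch. It is not the Cleanroom certification model itself; it swaps in the simplest textbook estimator, which assumes a constant failure rate (an exponential model), so MTTF is estimated as total execution time divided by the number of failures observed. The &lt;code&gt;TestRun&lt;/code&gt; record and the sample numbers are hypothetical. Passing runs count toward total execution time, which is one reason a complete history of &lt;em&gt;all&lt;/em&gt; executions matters.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class TestRun:
    """One recorded test execution (hypothetical record format)."""
    duration_hours: float  # execution time under a realistic usage profile
    failed: bool           # True if the run ended in a failure


def estimate_mttf(history):
    """Estimate mean time to failure from a complete test-execution log.

    Assumes a constant failure rate (exponential model), for which the
    maximum-likelihood estimate is total execution time / failures seen.
    """
    total_time = sum(run.duration_hours for run in history)
    failures = sum(1 for run in history if run.failed)
    if failures == 0:
        # With no failures the estimate is unbounded; only a lower
        # confidence bound could be reported, not a point estimate.
        raise ValueError("no failures recorded; MTTF cannot be point-estimated")
    return total_time / failures


# Hypothetical log: 12.0 hours of execution in total, 2 failures.
history = [
    TestRun(2.0, False),
    TestRun(3.5, True),
    TestRun(1.5, False),
    TestRun(4.0, True),
    TestRun(1.0, False),
]
print(estimate_mttf(history))  # 6.0
```

&lt;p&gt;A constant-rate estimate like this only summarizes one fixed version of the code; if defects are being fixed between runs, a reliability growth model is the more appropriate tool.&lt;/p&gt;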
&lt;p&gt;[1] This is not the only reason why no one uses it.&lt;/p&gt;
&lt;p&gt;[2] More precisely, it let you confidently project the mean time to failure (MTTF) for the application.&lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/4221651480</link><guid>http://deathrayresearch.tumblr.com/post/4221651480</guid><pubDate>Wed, 30 Mar 2011 20:46:00 -0400</pubDate><category>Rework-Cycle</category><category>software engineering</category><category>Cleanroom Software Engineering</category><category>Defect Discovery</category></item><item><title>Romantic Agile and the universal theory of big software</title><description>&lt;p&gt;&lt;em&gt;&amp;#8220;Experience alone, without theory, teaches management nothing about what to do to improve quality and competitive position, nor how to do it.&amp;#8221;&lt;/em&gt; W. Edwards Deming&lt;/p&gt;
&lt;p&gt;For some people, software engineering is a solved problem and Agile is the solution. If a project is small enough, management enlightened enough, and customers sufficiently supportive or powerless (in other words, if it&amp;#8217;s an easy project), Agile is the way to go. &lt;/p&gt;
&lt;p&gt;But it&amp;#8217;s not enough: &lt;/p&gt;
&lt;ul&gt;&lt;li&gt;It leaves open the question of how to continue improving. How do you outperform the other &amp;#8220;Agilistas&amp;#8221; and build a firm-specific advantage in engineering? &lt;/li&gt;
&lt;li&gt;As an individual, you compete globally with engineers also using Agile who have a significant cost advantage. How will you feed your kids ten years from now?&lt;/li&gt;
&lt;li&gt;Software projects differ. Complex projects differ from simple ones. Web software differs from medical device firmware. When goals differ, any given technique will serve some of them better than others. To customize the process, you need to combine a solid theory with intense study of your own situation.&lt;/li&gt;
&lt;li&gt;Agile methods focus heavily on coding and testing and say less about things like requirements and operations. It&amp;#8217;s impossible to optimize an entire process while focusing on one part. &lt;/li&gt;
&lt;li&gt;Agile is likely not the final advance in development methodology. Open Source, for example, is a far more radical rethinking of software production, and one with important lessons for developers. &lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;In manufacturing, the techniques that make up Just-in-Time (to which Agile is often compared) are less important than the realization that every aspect of the process is a control to be manipulated, coupled with a &amp;#8220;relentless pursuit of understanding and improvement.&amp;#8221; &lt;/p&gt;
&lt;p&gt;That brings to mind another lesson from Just-in-Time: the distinction between &amp;#8220;pragmatic&amp;#8221; JIT, characterized by a patient and exhaustive focus &amp;#8220;on the concrete details of the production process&amp;#8221;, and the &amp;#8220;quasi-mystical hyperbole&amp;#8221; of &amp;#8220;romantic JIT.&amp;#8221; Much of the writing about Agile (&amp;#8220;People over Processes&amp;#8221;) tends toward the romantic, making it simultaneously more appealing and less useful.&lt;/p&gt;
&lt;p&gt;None of this makes Agile &amp;#8216;wrong&amp;#8217;. It isn&amp;#8217;t, but too many people use it as a cookbook, and when they need to improvise, they&amp;#8217;re stuck. They need to learn to &lt;a href="http://www.amazon.com/Modernist-Cuisine-Art-Science-Cooking/dp/0982761007" target="_blank"&gt;cook&lt;/a&gt;, not just follow recipes. &lt;/p&gt;
&lt;p&gt;Is there a &amp;#8216;universal theory of big software&amp;#8217;? I think the pieces exist, rooted in economics, system dynamics, and other disciplines. They&amp;#8217;re just waiting for someone to pull them all together. &lt;/p&gt;
&lt;p&gt;That someone is not me.&lt;/p&gt;
&lt;p&gt;I&amp;#8217;m reminded of William James&amp;#8217;s remarks on his own groundbreaking text &lt;em&gt;Principles of Psychology&lt;/em&gt;. He called it &amp;#8220;a loathsome, distended, tumefied, bloated, dropsical mass, testifying to but two facts: 1st, that there is no such thing as a &lt;em&gt;science&lt;/em&gt; of psychology, and 2nd, that WJ is an incapable.&amp;#8221; &lt;/p&gt;
&lt;p&gt;In the posts that follow, I hope to produce something worthy of similar praise. &lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/4183058307</link><guid>http://deathrayresearch.tumblr.com/post/4183058307</guid><pubDate>Tue, 29 Mar 2011 08:13:00 -0400</pubDate><category>agile</category><category>software engineering</category></item><item><title>Build your own Airline Reservation System</title><description>&lt;p&gt;&amp;#8220;&lt;em&gt;Air Canada suspended activity related to the implementation of a new reservations system under development with ITA Software.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The carrier recorded a second-quarter impairment charge of C$67 million (US$61.9 million) related to the development of the system, dubbed Polaris&lt;/em&gt;.&amp;#8221; - Air Transport World, June 2010.&lt;/p&gt;
&lt;p&gt;I worked on Polaris and was responsible for a few parts. Problems were not unexpected. Of nine major project failures listed in one paper, two were airline reservation systems: a United Airlines system cancelled in the early 1970s after $50 million was spent, and an American Airlines system cancelled in 1992 after burning through $125 million. &lt;/p&gt;
&lt;p&gt;Airline reservation systems (ARS) are among the largest and most complex software systems ever built. The absence of listed failures after 1992 is not because people figured out how to build them, but because, to the best of my knowledge, no one attempted a new, from-the-ground-up reservation system for a major international carrier between American&amp;#8217;s attempt and Polaris. &lt;/p&gt;
&lt;p&gt;Although Air Canada timed out and walked away, the work on Polaris has borne fruit. The flight schedule management system we built is in production. The inventory control system has been purchased by American Airlines, and the core reservation system and departure control systems are being considered by other airlines. &lt;/p&gt;
&lt;p&gt;What made Polaris interesting to me was its size: hundreds of person-years of effort, with several hundred people working on it at its peak. I&amp;#8217;ve been building large software systems for nearly 20 years and thought I knew it all, but saw immediately that I didn&amp;#8217;t know how to build something &lt;em&gt;this&lt;/em&gt; big and complex.&lt;/p&gt;
&lt;p&gt;Now I feel like I&amp;#8217;m starting to get it. In subsequent posts I&amp;#8217;ll attempt to describe what I think I learned, and then you too can build an ARS for fun and profit!&lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/4170305887</link><guid>http://deathrayresearch.tumblr.com/post/4170305887</guid><pubDate>Mon, 28 Mar 2011 19:26:00 -0400</pubDate></item><item><title>Ramblings about software development</title><description>&lt;p&gt;This is where I write down all the stuff I&amp;#8217;m thinking about before I forget it. &lt;/p&gt;</description><link>http://deathrayresearch.tumblr.com/post/4115205277</link><guid>http://deathrayresearch.tumblr.com/post/4115205277</guid><pubDate>Sat, 26 Mar 2011 16:25:43 -0400</pubDate></item></channel></rss>
