Premature Unoptimization

October 5, 2010

In my last entry, I may have been guilty of premature unoptimization. Everybody has heard of the evils of premature optimization, so doing the opposite must be the root of all good! In my example, I refactored code to make it more readable, proposing that LINQ extension methods made for a more elegant approach. Here’s a reminder:

Before:

byte[] md5FixedSizeByteArray = new byte[8];
byte[] md5ByteArray = md5.ComputeHash(encoding.GetBytes(url));
Array.Copy(md5ByteArray, 0, md5FixedSizeByteArray, 0, 8);

After:

byte[] md5FixedSizeByteArray = md5.ComputeHash(encoding.GetBytes(url)).Take(8).ToArray();

I knew that this rewrite might incur a slight performance loss, but figured it was in the noise. Coincidentally, soon after that post, an email thread appeared at work about the performance of .NET managed code. One email mentioned poor performance for these types of LINQ operations, because they tend to do unsavory things behind the scenes, like creating temporary objects on the heap. The discussion made me curious enough to measure it.

What I measured: the new “elegant” approach incurs a 9% increase in CPU. And that does not include costs hidden from my measurements, such as the extra memory footprint and the garbage collection needed to clean up after those short-lived enumerator objects.
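For the curious, here is the shape of the measurement. This is a minimal sketch, not my actual harness; the class name, iteration count, and sample URL are all made up for illustration:

using System;
using System.Diagnostics;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

class HashTruncationBenchmark
{
    const int Iterations = 1000000; // illustrative; large enough to swamp timer noise

    static void Main()
    {
        MD5 md5 = MD5.Create();
        byte[] urlBytes = Encoding.UTF8.GetBytes("http://example.com/some/link"); // placeholder input

        md5.ComputeHash(urlBytes); // warm up so JIT compilation stays out of the timed loops

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            byte[] md5FixedSizeByteArray = new byte[8];
            byte[] md5ByteArray = md5.ComputeHash(urlBytes);
            Array.Copy(md5ByteArray, 0, md5FixedSizeByteArray, 0, 8);
        }
        sw.Stop();
        Console.WriteLine("Array.Copy:   {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            byte[] md5FixedSizeByteArray = md5.ComputeHash(urlBytes).Take(8).ToArray();
        }
        sw.Stop();
        Console.WriteLine("Take/ToArray: {0} ms", sw.ElapsedMilliseconds);
    }
}

The hashing dominates both loops, which is exactly why the LINQ overhead showing up at all is notable.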

The difference is primarily in how the two versions allocate memory. The first allocates an array of exactly the right size, then copies directly into it. The second version, well, what does it do? Without using a decompiler like Reflector to examine the code, I can offer an educated guess: Take() returns an object that contains the logic for selecting the first 8 values of the array. Most likely that is a heap allocation for an IEnumerable, meaning that looping through it is a series of function calls, with no more memcpy. Then ToArray() iterates the enumerator while allocating a destination array via dynamic growth, like a List (or a C++ vector); in the worst case, it might build a temporary List and copy it to an Array, though I give it the benefit of the doubt and assume this unnecessary copy is removed. In this example I am lucky, since such a short array doesn’t need to grow much, but similar code could bite me under different use cases, especially when a resize requires relocating the data.
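To make that guess concrete, here is roughly the shape I imagine, written out by hand. This is a simplified sketch of the idea, not the actual framework source (the real Enumerable class is surely more elaborate):

using System;
using System.Collections.Generic;

static class LinqGuess
{
    // Guess at Take(): a lazy iterator. The compiler turns this into a
    // heap-allocated state machine, and retrieving each element is a
    // MoveNext() call rather than part of one memcpy-style block copy.
    public static IEnumerable<T> MyTake<T>(IEnumerable<T> source, int count)
    {
        foreach (T item in source)
        {
            if (count-- <= 0) yield break;
            yield return item;
        }
    }

    // Guess at ToArray(): the length of an arbitrary IEnumerable isn't
    // known up front, so grow a buffer by doubling, List-style...
    public static T[] MyToArray<T>(IEnumerable<T> source)
    {
        T[] buffer = new T[4];
        int count = 0;
        foreach (T item in source)
        {
            if (count == buffer.Length)
            {
                T[] bigger = new T[buffer.Length * 2];
                Array.Copy(buffer, bigger, count);
                buffer = bigger;
            }
            buffer[count++] = item;
        }
        // ...then copy once more into an exactly-sized result.
        T[] result = new T[count];
        Array.Copy(buffer, result, count);
        return result;
    }
}

If that guess is close, each of the 8 bytes travels through a MoveNext()/Current call pair instead of riding along in one Array.Copy, which is where I suspect most of that 9% goes.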

The odds are stacked against the rewrite, so I fully expected it to perform worse. The real question is: was it bad for me to unoptimize this code? Does readability trump performance? (To be fair, the original code was not unreadable, but as developers we are commonly confronted with similar tradeoffs in more egregious cases, so the same question holds.)

Joe Duffy’s article makes the case that even though you shouldn’t prematurely optimize, you also shouldn’t write code that performs poorly when you have the option not to. He points out an example where using LINQ resulted in an order of magnitude worse performance than the alternative. It’s an interesting case because it counters what I advocated. He even points out that those new to LINQ (like me) are likely to do such things! And it’s a point well taken… as a LINQ newbie I may tend to overuse it.

On the other hand, we choose our language because it helps us get the job done. I expect that managed languages have overhead, and using nice declarative and functional approaches even more so. If I wanted to code each loop and control each memory allocation I’d use C, but I want C# (and LINQ) so I can focus on the bigger picture: how to process my data.

My rewrite would hurt performance if it were in the critical path of execution, and I would not have made it if that were the case. But here it’s just one of a zillion things that need to happen when processing the link, and it doesn’t even register in the profiler. Just as you need context to decide whether an optimization is premature, you need context to decide whether clearer (or, in this case, more succinct) code is better.

So for now, I’ll leave it as is. At my last job, we had a running joke that we’d intentionally add delays so that later, when we needed to turn the performance up to 11, we could remove the offending code and be heroes. Even though it was a gag, sometimes when you find some low-hanging performance fruit, it does feel like it was left there intentionally, just waiting to be harvested. I suppose that by leaving my prematurely unoptimized code in place, if and when it becomes the bottleneck, somebody can fix it and be a hero.
