Monday, 25 February 2008

TDD is easy!

I've been following Jacob Proffitt's recent posts, which focus on challenging some of the conventional wisdom around TDD ([1], [2], [3], cross-posted to TheRuntime.com). (At least, that is my guess at his intention. Alternatively he could just be trying to incite flamewars :)). I'm always happy to have my assumptions challenged, as such challenges can help lead to deeper understanding, so I've been careful not to dismiss Jacob's work and instead have had a good think about it all.

While I disagree with some of his assumptions, logic and conclusions, I definitely agree with the pointy end of his argument: that plain old unit testing (POUT) without TDD is better than coding without any unit tests. This became especially apparent to me after reading Michael Feathers' Working Effectively with Legacy Code, which helped me realise that once you have code under test, you can refactor and improve the design with not-so-reckless abandon, eventually bringing you to the nice, clean type of design that TDD aims to help you achieve in the first place.

Jacob suggests that the majority of benefits generally attributed to TDD are actually just benefits of POUT and good design principles, and because TDD is more difficult than POUT, the industry should focus on getting people unit testing independently of TDD. By focusing on this distinction between POUT and TDD, I started thinking about his assumption that TDD is more difficult than POUT. What is it about TDD that makes it so hard compared with POUT?

TDD itself is trivially simple: write a test, make it pass, refactor to improve the design. As far as processes go, that's dead simple. So why do people have trouble with it? Anecdotally, the kinds of problems I normally hear (and have experienced) are things like "How do I test code that relies on a database?", "How can I test this UI logic?", "How can I test this unit without setting up loads of dependencies?". There is a common thread to all these issues: "How do I test [X]?". Unit testing. POUT. Completely independent of TDD, people have problems designing testable software, and writing unit tests for their designs. Which is understandable: good design is hard.

So is TDD actually the easy part? Are POUT and general software design challenges the real sources of people's difficulties? Regardless of whether you use TDD or not, maybe the real challenge is unchanged: developing a good, testable design and writing effective unit tests for it.

If there is any truth to this then it does make a good case for TDD. The main aim of TDD is helping developers to get a good design that, as a side effect of the process, can be unit tested. Rather than increasing the barrier to entry for unit testing (or POUTing :)) as Jacob suggests, maybe TDD has the opposite effect by guiding developers to make better (or at least more informed) design choices. That's not to say that you can't get a good, unit tested design without TDD of course. It is simply to say that if you are having trouble with TDD, switching to POUT may not help you much. As you will still be facing the same design and testing challenges, but without the feedback provided by TDD, it might actually exacerbate the problem!

To var or not to var...

Interesting thread on the ALT.NET mailing list, where Ilya is talking about some of the debate on the ReSharper 4 team surrounding the new var keyword, and the rules they ended up using for R# suggestions.

In my .NET 3.5 work so far I have tended to shy away from var, preferring to be explicit about my type declarations. I even turned off R#'s suggestions to use var instead of an explicit type. After reading Ilya's messages on the thread, I am reconsidering the issue, especially now I understand the logic behind when R# suggests var.

One of the conditions is when the right hand side of an initialisation explicitly shows the type. For example:

Dictionary<A,B> dict = new Dictionary<A,B>();
/* OR */
var dict = new Dictionary<A,B>();

In this case, why all the additional typing? We are already specifying what type we want, why duplicate it on the left and right sides of the expression? Both expressions are equivalent, we are just leaving the compiler to work out the obvious bits.
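As a rough sketch of that equivalence, the snippet below declares the same dictionary both ways and checks that the compiler treats them identically. The class and method names (VarDemo, SameType) and the string/int type arguments are made up for illustration; they stand in for the A and B above.

```csharp
using System;
using System.Collections.Generic;

public static class VarDemo
{
    // Declares the same kind of dictionary with and without var and
    // reports whether both variables behave as the same type.
    public static bool SameType()
    {
        Dictionary<string, int> explicitDict = new Dictionary<string, int>();
        var inferredDict = new Dictionary<string, int>();

        // var is still statically typed. This assignment only compiles because
        // the compiler inferred Dictionary<string, int> for inferredDict.
        Dictionary<string, int> alias = inferredDict;
        alias.Add("answer", 42);

        // inferredDict = "oops";  // would not compile: a string is not a Dictionary<string, int>

        return explicitDict.GetType() == inferredDict.GetType()
            && inferredDict["answer"] == 42;
    }

    public static void Main()
    {
        Console.WriteLine(SameType());
    }
}
```

The commented-out line is the useful part: var changes how much you type, not how strictly the compiler checks it.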

I'm beginning to think eliminating this duplication is a good thing. It removes a bit of noise from the code and lets you focus on the important stuff, without losing any type safety.

Obviously care needs to be taken not to abuse this feature, but for simple cases where the type is immediately obvious from the code then I'm starting to think var might be the way to go.

Update 2008-03-05: Ilya has posted some more arguments in favour of using var on his blog.

Friday, 22 February 2008

Cheat's way to show an indicator while images are loading

There are several cool, interesting ways to use Javascript to show progress indicators while images are downloading to your page. The method shown in this post is not one of them.

I have a simple ASP.NET page that has two dynamically generated images. The images are charts streamed from another ASPX page (e.g. <img src="MakeChart.aspx?someParamsHere" />), and they can sometimes take a couple of seconds to render on the page. This delay means people temporarily see a very bare page that looks like it has finished loading, and might start clicking around or navigating away before they get to see the pretty pictures. What I wanted was to give people some indication that stuff was happening, and that they should stick around for a few seconds.

Rather than whipping out ye olde JavaScript, I chose to keep things really simple. I created a CSS class to display the progress indicator as a background on the element holding the dynamic image. While the image is loading the background shows through. Once it is done, the background is covered by the final image.

<style>
.ChartPlaceholder {
  background-image: url("Images/loadingPlaceholder.png");
  background-position: center;
  background-repeat: no-repeat;
  height: <%= ChartHeight %>px;
  width: <%= ChartWidth %>px;
}
</style>
...
<div class="ChartPlaceholder">
  <asp:Image runat="server" ID="SomeDynamicChart"  />
</div>

In this case I have embedded the DIV's height and width defined in the ASPX straight into the CSS declaration, as it seemed reasonable in the context in which I am currently using it, but you can obviously use other methods to make the DIV render sensibly prior to the image coming through.

A bit hacky, but very simple, and it worked nicely in Firefox and IE7.

Monday, 11 February 2008

Emacs key bindings everywhere

I used to use Emacs for everything, and I loved it. You could do absolutely everything really efficiently from the keyboard using a number of arcane key combinations, and once the learning curve had been overcome you could absolutely fly through your work.

Now that I use Visual Studio for my coding work, MS Word for documentation, and NotePad++ for everything else, I find myself wishing that I didn't have to move positions on the keyboard to move the cursor, delete characters and words, and do basic document navigation. I know VS.NET has Emacs key bindings available, but I find that their incompleteness means I have to mentally figure out whether to use Windows or Emacs shortcuts. This is especially difficult when switching between applications.

So I decided to whip up a quick AutoHotKey script that reproduces the basic Emacs document navigation key bindings, sticking to the shortcuts that don't interfere too much with standard Windows shortcuts (for example, I left Ctrl-V alone :)). You can toggle "Emacs Mode" on and off using the CapsLock key, and because it uses AutoHotKey to translate key strokes, it works everywhere your keyboard does :). I've been using it for months and found it quite useful, so I thought I'd put it online before I lose it. :)

You can download the script from here [ZIP]. You'll also need AutoHotKey. I'm far from an expert on AutoHotKey, so feel free to comment or email if you find any problems with the script.

Key bindings

The keys mapped by the script are described below. The mappings aren't perfect; they simply translate Emacs keys into similar Windows key combinations. Because different applications implement concepts like "next word" differently, sometimes you get results that are slightly different to expected. C is Control, M is the meta key (Alt).

CapsLock   Toggle Emacs mode on/off
C-p        Previous line (move up)
C-n        Next line (move down)
C-f        Forward one character (move right). Note: conflicts with normal "find" shortcut
C-b        Back one character (move left). Note: conflicts with normal "bold" shortcut
M-f        Forward one word
M-b        Back one word
C-a        Start of line. Note: conflicts with normal "Select all" shortcut
C-e        End of line
C-<        Start of page
C->        End of page
C-_        Undo
C-d        Delete character after cursor
M-d        Delete word after cursor
M-Del      Delete word before cursor
C-k        Kill line
C-w        Cut region
M-w        Copy region
C-y        Paste (no kill ring, so don't get full Emacs yank ability)

If you want to use regular CapsLock functionality, try holding down Shift while pressing CapsLock. For me at least, this lets me toggle CapsLock on and off without being caught by the script.

Wednesday, 6 February 2008

A brief look at the logic of TDD

[Test] 
public void Should_do_tdd() { 
  Assert.That(me.ShouldTdd, Is.EqualTo("Not sure?"));
}

Jacob Proffitt has a post questioning whether TDD provides any benefits over Plain Old Unit Testing (POUT). POUT itself has many benefits, many of which became apparent to me reading Michael Feathers' Working Effectively with Legacy Code. In the comments on Jacob's post I said I thought it unlikely that TDD would be proved better than POUT, due to problems measuring, and even defining, software quality. Jacob replied that he wasn't interested in a proof, but was simply after the reasons why TDD might help. So I thought I would have a quick run through the logic of why TDD might help you write better software. I'm going to steer away from benefits of TDD that you can also get from POUT (like low coupling/high cohesion etc), and focus on my interpretation of the rationale behind TDD.

Please note: I am not trying to convert anyone. I firmly believe that you should use what works for you. If a particular tool doesn't help you, don't use it. If you can't see any value from an approach and feel it is a waste of time to examine further, don't try it. I'm also far from an expert on TDD. But seeing as Jacob took the time to reply to my comment, I thought I should at least return the favour :-)

Quick TDD review

TDD is a design tool/process. You write a test first that describes some behaviour that your production code should exhibit. You run the test and it fails because you haven't implemented that behaviour yet. You then write some minimal code that makes the test pass. Finally, you refactor the code to remove duplication and improve the design, re-running the tests to make sure you haven't broken the behaviour. This process leads to the TDD slogan: "Red, Green, Refactor", red and green being the colours of the status bar in most xUnit test frameworks to show failure and success.
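To make the cycle concrete, here is a minimal sketch of one pass through it. Everything in it is hypothetical: Greeter, Greet and the expected greeting are invented for illustration, and a hand-rolled check stands in for a real framework such as NUnit so the snippet stays self-contained.

```csharp
using System;

// Hypothetical production class. In a test-first flow this would be
// written only after the test below already existed and was failing.
public class Greeter
{
    public string Greet(string name)
    {
        return "Hello, " + name + "!";
    }
}

public static class GreeterTests
{
    // Red: written first, this fails (it would not even compile) until Greeter exists.
    // Green: the minimal Greet implementation above makes it pass.
    // Refactor: tidy the code, re-running this test after each change.
    public static void Should_greet_by_name()
    {
        var greeter = new Greeter();
        string greeting = greeter.Greet("Dave");
        if (greeting != "Hello, Dave!")
        {
            throw new Exception("Expected 'Hello, Dave!' but got '" + greeting + "'");
        }
    }

    public static void Main()
    {
        Should_greet_by_name();
        Console.WriteLine("Green");
    }
}
```

Note that the test had to decide how Greeter is called before Greeter was written, which is the design pressure discussed in the sections that follow.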

So what?

What is the logic of that? Testing code that doesn't even exist yet? Crazy!

Well let's start with the obvious stuff. First up, you get unit tests. Unit tests are good, and let you refactor safely. POUT/test last obviously does this too. You get quick feedback on whether the code you have just written breaks anything. You can do this with a POUT approach as well, depending on how quickly you write your tests, or whether you use manual testing for feedback. TDD provides a bit more assurance that you will get the quickest possible feedback and will always have a decent amount of code coverage, but with sufficient discipline a POUT approach can accomplish all of this. And we trust our development team right? We don't need some process to force them to do the right thing.

TDD as a design tool

TDD is a tool for incrementally improving the design of your code. The unit testing side of it is simply a nice side effect, to the point where BDD has been proposed as an alternate presentation of the technique to eliminate the apparent confusion caused by the word "test". So how can TDD be used to improve design?

Well, first up you are specifying the exact behaviour you want to implement before writing the code to do it. This has several effects. You are writing the logical interface to your class before the class itself (interface, not public interface IInterface{}). This takes some of the guesswork out of determining how your code is going to be called. You have to deal with the interface to your class in order to test it, so if it is painful to use you can immediately tell and do something about it. According to my pseudo-logic, this can help ensure production code that needs to talk to your object should have a well designed interface through which to do so easily.

Specifying the behaviour you want first also helps focus you on one piece of code at a time. You are just trying to pass the test, not immediately trying to solve every problem the class will ever face. This can help reduce feature creep in your design. If you find that a "divide and conquer" approach can make problem-solving easier, then isolating the one thing you want to achieve in this way can also make coding easier.

If you have never written a class that exposes an interface that turns out to be fairly unusable in Real Life™, or if you have never worked for a while getting your class to handle a situation that will never actually occur (e.g. "What if this class needs to take a DateTime as an input instead of just ints?", followed by a large effort to produce a generic version of your class that is only ever called as MyClass<int>) then you are a much better developer than I (probably a given), and you may not get any benefit from TDD. TDD doesn't prevent these problems, but it can help make these problems immediately apparent and therefore potentially avoidable.

The refactoring step is an essential part of the design aspect of TDD. The idea is that you have just gotten your test to pass, so your class' behaviour for that specific test is correct. The very next step is to improve the design. Remove any duplication, extract any methods or rename variables as required to make the code more readable and understandable. Sprout any new classes that are required to better encapsulate the data or behaviour. Refactoring is not unique to TDD, but TDD helps you to perform these design improvements in small, hopefully easy, increments. If your test is difficult to write or implement, this feedback tells you your design might need tweaking, or that your initial specification needs more clarity. Your design is evolving based on continuous feedback from your immediate requirements and your actual code. This can help avoid large, difficult refactorings and unnecessary refactoring.

By improving the design at each step, in theory you make the next step easy to perform. The goal is having code that is always easy to change, making you more responsive to changes in business needs and requirements, and making bugs easier to fix (that's right! You still get bugs doing TDD! :-P).

Useless point on the "magic" of TDD

Aside from my convoluted version of the theory behind TDD, I feel TDD has the potential to change the way you think about coding. For me TDD had a profound effect on how I think about software design, to the point where even when not doing TDD I still find myself thinking in terms of passing tests and getting feedback from incremental coding steps. I feel this has made me a better developer, providing me with another way of thinking about problems. The extra perspective may not help you in every situation, or ever, but it does give you another avenue to pursue when you are stuck.

I also find I come up with much cleaner designs using this different perspective. Robert Martin's Coffee Maker example [PDF] illustrates some OO design traps. In my experience the incremental design encouraged by TDD helps avoid some of those traps. Of course, this is all anecdotal hand-waving, hence this section's title :-)

Clarifications and conclusions

Refactoring, incrementally improving design, isolating behaviour, writing testable code, etc can all be achieved without TDD. TDD is just a tool to help you do all these things. If you do them fine without TDD, then TDD is probably not going to help you. If the technique sounds interesting to you, then your best bet is to give it a try and see if you see any benefits over and above POUT (I first got into TDD following this example. I converted it to C# and NUnit, commented out all but one test, and started coding).

I have not listed anything that is a dramatic, inspiring benefit of TDD. That is because TDD is simply a tool to help achieve what we are all trying to do anyway: writing good, well designed software. If you do that already then TDD may just seem like more work. This is probably why people that feel passionately about TDD can have a hard time selling it to people that are skeptical of it. It isn't dramatic. There is no "if you do this you will become almost as popular as Justice Gray" aspect to it.

I should also be clear that, for me at least, TDD did not come easy. I am definitely still learning, but my confidence and my results improve the more I use it. I feel it has been worthwhile. YMMV. If you have a mentor or a colleague to learn with, this process may be easier.

As a closing aside, I think everyone will agree that at least people are debating the merits of TDD vs. a simple, good bank of unit tests, rather than trying to justify the practice of unit testing itself. :-) Hopefully that means I won't ever have to face the "We don't have time to unit test" discussion again ;-)

Monday, 4 February 2008

Using LINQ to XML to migrate blog posts

During my recent dalliance with other blog hosts, I ended up playing around with LINQ to XML and was quite impressed with how much nicer it was than the pre-LINQ .NET classes for XML parsing. Here's a quick overview of how I migrated all my Blogger posts to Community Server (before deciding to stick with Blogger :)), with a focus on the LINQ to XML bits.

First my compulsory disclaimers: this was a rush job. This was the most disgusting, hacky code I have ever written. I have written C64 Basic programs with better separation of concerns. If you use any of this code for anything at all you'd have to be insane! I did write the code test-first, but definitely skipped the all-important "design" part of TDD. Ugly ugly ugly. With that said, let's try and get something useful out of the spaghetti.

Getting Blogger posts in XML format

First step was to retrieve all the posts in XML. You can get this directly from a Blogger url:

http://(your_blog).blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500

If you have more than 500 posts, adjust the querystring or use a couple of batches to get the results. You can download this using .NET (webClient.DownloadFile(url, feedFile);, using System.Net.WebClient), or use the highly technical method of downloading from your browser :-) The resulting XML looks something like this:

<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?>
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'>
  <!-- Some nodes relating to your blog here. -->
  <entry>
    <id>...</id>
    <published>...</published>
    <updated>...</updated>
    <category scheme='http://www.blogger.com/atom/ns#' term='daves drivel'/>
    <category scheme='http://www.blogger.com/atom/ns#' term='shameless self promotion'/>    
    <title type='text'>...</title>
    <content type='html'>(blog post content here)</content>
    <link rel='alternate' type='text/html' href='http://davesquared.blogspot.com/2008/02/sample-post.html' title='Sample post'/>
    ...
  </entry>
  <!-- Lots more <entry /> nodes -->
</feed>    
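For blogs with more than 500 posts, the batching mentioned above could be sketched roughly as follows. The helper name BuildBatchUrls and its parameters are hypothetical; only the URL shape and the start-index and max-results querystring values come from the feed URL shown earlier. Each resulting URL could then be handed to webClient.DownloadFile.

```csharp
using System;
using System.Collections.Generic;

public static class BloggerFeedUrls
{
    // Builds one feed URL per batch. start-index is 1-based, matching the
    // feed URL shown earlier in the post.
    public static List<string> BuildBatchUrls(string blogName, int totalPosts, int batchSize)
    {
        var urls = new List<string>();
        for (int start = 1; start <= totalPosts; start += batchSize)
        {
            urls.Add(string.Format(
                "http://{0}.blogspot.com/atom.xml?redirect=false&start-index={1}&max-results={2}",
                blogName, start, batchSize));
        }
        return urls;
    }

    public static void Main()
    {
        // 1200 posts in batches of 500 needs three requests.
        foreach (string url in BuildBatchUrls("your_blog", 1200, 500))
        {
            Console.WriteLine(url);
        }
    }
}
```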
    

Basic XML parsing using LINQ to XML

As you can see from the XML above, the posts are described by <entry /> nodes. Normally we'd be breaking out the XmlDocument or XmlTextReader or similar and working some XPath magic. Or we could just use LINQ to XML. Here's a simple example:

XDocument xml = XDocument.Load(new StringReader(feed));
XNamespace xmlns = "http://www.w3.org/2005/Atom";
var entries = from feedElements in xml.Descendants(xmlns + "entry")
              select new {
                Title = feedElements.Element(xmlns + "title").Value,
                Content = feedElements.Element(xmlns + "content").Value
              };                  
 

The first thing we do is load the XML into an XDocument (in System.Xml.Linq). Looking at the XML feed, the XML namespace used is "http://www.w3.org/2005/Atom". I haven't found an XmlNamespaceManager-style approach to handling namespaces in LINQ to XML, so I just put this in an xmlns variable and will append it as I go. If you know how to specify a default namespace for XDocuments, please let me know :)

The next step is selecting the title and content elements from each <entry/> node. The xml.Descendants(XName) method returns an IEnumerable<XElement>, which we tell to filter out all elements other than "entry" nodes.  The entries variable will now contain an IEnumerable<> of an anonymous type. Each item in the enumeration will have a Title and Content property representing the blog title and content parsed from the XML. We can iterate over this, use entries.ToArray() to perform our query, or use other LINQ goodness like entries.Take(5) (cue jazz) to further filter our results.
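Here is a small self-contained sketch of that query run against a tiny in-memory feed, showing that nothing is enumerated until Take and ToArray run. The FeedTitles class, the FirstTitles method and the three sample entries are invented for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class FeedTitles
{
    // A made-up, cut-down feed with three entries.
    public const string SampleFeed =
        "<feed xmlns='http://www.w3.org/2005/Atom'>" +
        "<entry><title type='text'>First</title><content type='html'>one</content></entry>" +
        "<entry><title type='text'>Second</title><content type='html'>two</content></entry>" +
        "<entry><title type='text'>Third</title><content type='html'>three</content></entry>" +
        "</feed>";

    public static string[] FirstTitles(string feed, int count)
    {
        XDocument xml = XDocument.Load(new StringReader(feed));
        XNamespace xmlns = "http://www.w3.org/2005/Atom";
        var entries = from feedElements in xml.Descendants(xmlns + "entry")
                      select new {
                          Title = feedElements.Element(xmlns + "title").Value,
                          Content = feedElements.Element(xmlns + "content").Value
                      };

        // The query above is only a description. It is executed here, when
        // Take limits the results and ToArray enumerates them.
        return entries.Take(count).Select(entry => entry.Title).ToArray();
    }

    public static void Main()
    {
        foreach (string title in FirstTitles(SampleFeed, 2))
        {
            Console.WriteLine(title);
        }
    }
}
```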

Notice how we specify all the node names as strings? They are actually of type XName (or XNamespace in the case of the xmlns variable). There is no public constructor to create an XName, but instead an implicit conversion from String is defined. This gives us the ease of using strings to specify node names (which is quite natural when working with XML), with the benefits of having strong typing around the name to access properties like LocalName and Namespace. In our query we have to prefix the XName, like entry, with the namespace xmlns, to make sure our nodes resolve properly, hence all the xmlns + "entry" style code.
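The conversions described above can be seen in isolation in this short sketch. The XNameDemo class and BuildEntryName method are illustrative names only; the XName and XNamespace members used are the standard System.Xml.Linq ones.

```csharp
using System;
using System.Xml.Linq;

public static class XNameDemo
{
    public static XName BuildEntryName()
    {
        // Implicit conversion from string to XNamespace.
        XNamespace atom = "http://www.w3.org/2005/Atom";
        // XNamespace + string produces a namespace-qualified XName.
        XName entry = atom + "entry";
        return entry;
    }

    public static void Main()
    {
        XName entry = BuildEntryName();
        Console.WriteLine(entry.LocalName);
        Console.WriteLine(entry.NamespaceName);

        // Implicit conversion from string to XName, with no namespace.
        XName plain = "entry";
        // The two names differ, which is why an unqualified "entry"
        // would not match the nodes in the Atom feed.
        Console.WriteLine(plain == entry);
    }
}
```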

Getting all the post data using LINQ to XML

Now let's get strongly typed objects from our XML feed.

public class BlogEntry {
  public String Title;
  public String Content;
  public String[] Categories;
  public String OriginalLink;
  public String Published;
  public String Updated;
}

I've been lazy here and am parsing the published and updated dates as simple strings (see, I told you this was hacky!). There are two field declarations of interest here (the rest have one-to-one relationships with XML elements). The first is the Categories array. These are specified in the XML as children of <entry/> nodes, with the term attribute holding the pertinent information:

<entry>
  ...
  <category scheme='http://www.blogger.com/atom/ns#' term='daves drivel'/>
  <category scheme='http://www.blogger.com/atom/ns#' term='shameless self promotion'/>    
  ... 
 

The other is the OriginalLink field. I wanted to put a link back to the original post from the new blog to be clear about the source and so that people could see any comments they made (I would have taken the comments over as well, but only had the MetaWeblog API to work with). So I needed the original post link, which I could parse out of one of the <link /> nodes that has a rel attribute value of "alternate":

<entry>
  ...
  <link rel='alternate' type='text/html' 
      href='http://davesquared.blogspot.com/2008/02/sample-post.html' title='Sample post'/>
  ...
 

Armed with this knowledge, let's tackle the new LINQ to XML query:

var entries = 
  from feedElements in xml.Descendants(xmlns + "entry")
  select new BlogEntry() {
    Title = feedElements.Element(xmlns + "title").Value,
    Content = feedElements.Element(xmlns + "content").Value,
    OriginalLink = feedElements.Elements(xmlns + "link")
             .Where(link => link.Attribute("rel").Value == "alternate")
             .Select(link => link.Attribute("href").Value)
             .First(),
    Published = feedElements.Element(xmlns + "published").Value,
    Updated = feedElements.Element(xmlns + "updated").Value,
    Categories = feedElements.Elements(xmlns + "category")
                   .Select(category => category.Attribute("term").Value)
                   .ToArray()
  };
 

The main changes from our original query are emphasised. First up, we are now working with strongly typed BlogEntry objects, rather than anonymous types. The entries variable is now an IEnumerable<BlogEntry>, which we can actually return from our parser method (vars only work locally).

We are also using nested queries to drag out the Categories and OriginalLink (you can do this using the sugary "from ... in ... select" style as well, but I found it easier to use the methods explicitly in this case). For categories we are simply selecting the term attributes from all the <category/> nodes in our entry. For the original link, we use .Where() to filter all the <link/> nodes to only include ones with a rel attribute equal to "alternate", select the value of the href attribute, and then take the .First() (there should only be one).
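For comparison, the same two nested queries might look like this in the "from ... in ... select" style. This is only a sketch: the QuerySyntaxDemo class, its method names and the sample entry (including the made-up rel='self' link) are assumptions for illustration, and it uses the (string) cast on XAttribute instead of .Value so that a missing attribute yields null rather than an exception.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class QuerySyntaxDemo
{
    // A cut-down entry, loosely based on the feed sample, with a second
    // made-up link so the "alternate" filter has something to exclude.
    public const string SampleEntry =
        "<entry xmlns='http://www.w3.org/2005/Atom'>" +
        "<category scheme='http://www.blogger.com/atom/ns#' term='daves drivel'/>" +
        "<category scheme='http://www.blogger.com/atom/ns#' term='shameless self promotion'/>" +
        "<link rel='self' href='http://example.invalid/feed'/>" +
        "<link rel='alternate' type='text/html' " +
        "href='http://davesquared.blogspot.com/2008/02/sample-post.html' title='Sample post'/>" +
        "</entry>";

    static readonly XNamespace xmlns = "http://www.w3.org/2005/Atom";

    public static string OriginalLink(XElement entry)
    {
        // Query-syntax equivalent of Where(...).Select(...).First().
        return (from link in entry.Elements(xmlns + "link")
                where (string)link.Attribute("rel") == "alternate"
                select (string)link.Attribute("href")).First();
    }

    public static string[] Categories(XElement entry)
    {
        // Query-syntax equivalent of Select(...).ToArray().
        return (from category in entry.Elements(xmlns + "category")
                select (string)category.Attribute("term")).ToArray();
    }

    public static void Main()
    {
        XElement entry = XElement.Parse(SampleEntry);
        Console.WriteLine(OriginalLink(entry));
        Console.WriteLine(string.Join(", ", Categories(entry)));
    }
}
```

The query syntax still needs the outer parentheses and a method call for First() and ToArray(), which may be why the explicit method chain reads more naturally for these two cases.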

Finishing up

The final steps in the migration were doing some regex-ing of the content to get the post id (slug for Wordpressors), and translating links to my own articles to point to the new site (so clicking on internal links on the new blog kept readers in the new blog). The last bit in particular was a bit tricky, as it needed a first pass to parse into a dictionary so I could look up the new urls as required. Here's the horribly hacky code if you're interested:

BloggerParser parser = new BloggerParser();
IDictionary<String, BlogEntry> entries = parser.ParseFeed(feed).ToDictionary(entry => entry.GetSlug());
foreach (BlogEntry entry in entries.Values) {
  entry.Content =
    Regex.Replace(entry.Content,
    @"http://davesquared.blogspot.com/\d{4}/\d{2}/(?<slug>.*?)\.html",
    delegate(Match match) {
      return entries[match.Groups["slug"].Value].GetNewLink();
    },
    RegexOptions.IgnoreCase | RegexOptions.Singleline);
}
    

After parsing into a dictionary (parser.ParseFeed() returns our IEnumerable<BlogEntry> from our earlier LINQ to XML adventures), we try and replace any internal links in each post's content with the link to the new post, using the slug as a unique index to look up BlogEntry objects. Ugly but effective.
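The link-rewriting step can be tried on its own with a plain dictionary standing in for the parsed entries. This is a sketch under stated assumptions: LinkRewriter, Rewrite and the newblog.example URL are hypothetical, the dictionary maps slug to new link directly instead of going through GetSlug() and GetNewLink(), and unlike the original it escapes the dots in the host name and leaves links with unknown slugs untouched rather than throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkRewriter
{
    // Replaces links to old posts with their new addresses, looked up by slug.
    public static string Rewrite(string content, IDictionary<string, string> newLinksBySlug)
    {
        return Regex.Replace(
            content,
            @"http://davesquared\.blogspot\.com/\d{4}/\d{2}/(?<slug>.*?)\.html",
            match =>
            {
                string slug = match.Groups["slug"].Value;
                string newLink;
                // Unknown slugs keep their original link instead of throwing.
                return newLinksBySlug.TryGetValue(slug, out newLink) ? newLink : match.Value;
            },
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    public static void Main()
    {
        var links = new Dictionary<string, string>
        {
            { "sample-post", "http://newblog.example/sample-post" }
        };
        string content =
            "See <a href=\"http://davesquared.blogspot.com/2008/02/sample-post.html\">this post</a>.";
        Console.WriteLine(Rewrite(content, links));
    }
}
```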

The final step in all of this was to post all the transformed posts to the new site, which you can do using the MetaWeblog API. As far as I can tell, it all went remarkably well. :-)

So there you go. This was my first real experience working with LINQ to XML, and I found it a fair bit easier than XmlDocument tweaking and mumbling various XPath incantations. Hope this helps. :-)

Sunday, 3 February 2008

Moving up, moving out, moving back...

After throwing myself headlong into the process of migrating my blog to a shiny new weblogs.asp.net address, I found myself a bit disillusioned with the animated banner ads that appeared on the blog over there. Seeing as I'm my most frequent reader, it seemed like a bit of a waste to have ads (or gaps of whitespace for Adblock users ;-) ) on my own resource. I don't really want to be looking up some configuration setting for NHibernate only to be interrupted by an animated banner telling me to buy stuff.

So, much like George Costanza in "The Revenge", I'm heading back to work and pretending nothing ever happened :-)

One thing I have learned from the experience is to really appreciate some of the services we have like Blogger, Wordpress.com, Feedburner, Google Code etc. that really offer a lot without asking much (or anything) in return.

Another thing I learned was to look before I leap. I actually knew that already, but was blinded by my years of longing for an MVP-like blog on weblogs.asp.net, right next to Mr Gu, Frans, et al.

And finally, I learned that anyone that reads my blog or blog feed, and enjoys it for some strange reason, should make sure they are subscribed to my Feedburner feed, so you can continue to suffer my inane blog posts no matter where my blog is hosted. :)

Not sure what to do with the old new blog. I might just keep it there as a reminder to me that the grass always looks greener on the other side... :)