Showing posts with label featured. Show all posts
Showing posts with label featured. Show all posts

Saturday, 29 March 2008

99 problems with Python (1-10)

No, this is not a rant about problems I have with Python. I actually quite like it. Rather, it's my attempt to complete some of the Dr Werner Hett's 99 Prolog logic problems using Python, as a way of getting to learn a bit about the language. I first heard about the 99 Problems from Joel and Ryan Lanciaux's efforts to solve the problems using F#, who were in turn inspired by the Ruby implementations on the Curious Coding blog.

Keep in mind that I'm new to Python (have jumped into this exercise straight after reading the excellent Dive into Python), so much of this might be pretty clumsy. I'd love to hear any suggestions for improvements. I am also not entirely keeping with the original spirit of the Prolog exercise in terms of logic programming, as my main purpose is familiarising myself with Python. I have, however, tried to do things in a fairly Pythonic, functional-style rather than C/C++/Java/C#, imperative-style. I'm also not concerned with error handling or edge cases, just in getting the basic syntax and approach right. I'm not sure I'll get through all 99 problems, but at the very least here's the first 10.

Implementation overview

I did the exercise in Eclipse with PyDev plugin, and used the Python unittest module to implement each solution as a test to make it easy to run and verify each solution. You can obviously run all this using IDLE or whatever Python runner you like. Here's the basic file structure I started off with:

import unittest

class NinetyNineProblemsFixture(unittest.TestCase):
    
  def testXX_Description(self):
    def solutionToProblem(input): pass          
    input = "some input"
    expected = "expected output for successful implementation"    
    self.assertEqual(expected, solutionToProblem(input))  
  
if __name__ == '__main__':
  unittest.main()

I wrote a test for each problem, then implemented the solution as an inner function within the test. The if __name__ == '__main__' line lets you run the tests by simply running your *.py file through Python. I tended to run using PyTest.py from within Eclipse/PyDev.

Back to table of contents...

P01 Find the last element of a list

  def test01_FindLastElementOfList(self):
    def getLast(list):
      return list[-1]
    list = [1, 3, 7, 14]
    self.assertEqual(14, getLast(list))

Lists are great in Python. They have lots of nice features that make them very pleasant to work with. This example shows one of these features, you can use a negative index to get list items from the end of the list. For example, list[-3] will get the third last item in the list. In this case, we want the last element, so we use list[-1].

Back to table of contents...

P02 Find the last but one element of a list

Same as P01, but we want the second last item:

  def test02_FindLastButOne(self):
    def getLastButOne(list):
      return list[-2]
    list = [1, 3, 7, 14]
    self.assertEqual(7, getLastButOne(list))

Back to table of contents...

P03 Find the K'th element of a list

Basic array index for this. Lists indicies are zero-based in Python (as they should be :)), but the problem definition wants to translate the reference to 1-based.

  def test03_FindKthElement(self):
    def getKth(list, k):
      return list[k-1]
    list = [1, 3, 7, 14]
    self.assertEqual(3, getKth(list, 2))

Back to table of contents...

P04 Find the number of elements of a list

Cheating for this one and just use the built in len function:

  def test04_NumberOfElementsInList(self):
    def getCount(list):
      return len(list)
    list = [1, 3, 7, 14]    
    self.assertEqual(4, getCount(list))   

Back to table of contents...

P05 Reverse a list

Here I played around with two different approaches so as to learn a bit about Python list slicing.

Update (2 April 2008): Initially I did not realise that there was a built-in reversed() function. I originally had something silly like this:

    def reverse(l):
      newList = list(l)      
      return newList.reverse() or newList

The list.reverse function does an in-place reversal of the list elements and returns None (the Python null value), so I had to clone the list first so as not to affect the original, then OR it to return the list reference itself. Thanks to Mark et al. on Reddit for showing me the light! :)

  def test05_ReverseAList(self):
    def reverse(l):
      return list(reversed(aList))
    def manualReverse(list):          
      return list[::-1]
    sampleList = [1, 3, 7, 14]
    self.assertEqual([14, 7, 3, 1], reverse(sampleList))
    self.assertEqual([14, 7, 3, 1], manualReverse(sampleList))

The reversed() function returns an iterator, so it is wrapped in a list() constructor to convert it to a new list. I used l for the method parameter, as I couldn't figure out how to appropriately qualify references to the list class when it was hidden by a parameter called list (cue embarrassed smiley).

List slicing

The second approach was to use Python's list slicing. A slice is a range of values copied from an original list. This means we have no side effects like we do using the in place reverse(). The basic syntax for a slice is:

list[indexOfFirstElementInSlice : indexOfFirstElement_Not_InSlice : optionalStep]

Omitting the first argument starts the slice from the first list element. Omitting the second argument takes the remainder of elements in the list. The step can be set to, say, 2 to take every second list element. It's worth noticing that the second argument is exclusive, so that we can the following from the Python interpreter:

>>> aList = range(0, 6)
>>> aList
[0, 1, 2, 3, 4, 5]
>>> #From index 1 up to, but excluding, index 5:
>>> aList[1:5]
[1, 2, 3, 4]
>>> aList[1::2]
[1, 3, 5]

In our case we want the slice to include all the elements, but we want to take them in reverse order, which is how we end up with list[::-1]. Right, that was all a lot of explanation for not-much-code, but I thought I'd point out some Python basics along the way.

Back to table of contents...

P06 Find out whether a list is a palindrome

A palindrome is something that reads the same forward as it does backwards, like "Go hang a salami, I'm a lasagna hog"* (well, ignoring punctuation and spaces). I tried two different approaches for this one too.

  def test06_IsListAPalindrome(self):
    def isPalindrome(aList):
      return aList == aList[::-1]
    def isPalindromeRecursive(aList):
      if (len(aList)<=1): return True
      top, tail = aList[0], aList[-1]          
      if (top != tail): return False
      return isPalindromeRecursive(aList[1:-1])
    palindromes = [
                   ['a', 'b', 'c', 'b', 'a'],
                   [1, 10, 20, 30, 30, 20, 10, 1],
                   ['a', 'a']                    
                   ]
    nonPalindromes = [
                      ['a', 'b', 'c', 'd'],
                      [1, 10, 20, 20, 10, 2],
                      ['a', 'v']
                      ]    
    self.assertTrue(all([isPalindrome(x) for x in palindromes]))
    self.assertTrue(all([not isPalindrome(x) for x in nonPalindromes]))
    
    self.assertTrue(all([isPalindromeRecursive(x) for x in palindromes]))
    self.assertTrue(all([not isPalindromeRecursive(x) for x in nonPalindromes]))

The first way is cheating, but effective. Simply use return aList == aList[::-1] to compare the list with a reverse slice, using the same slicing syntax used for P05. This will obviously ensure the list reads the same forwards as backwards.

The second way uses recursion, as well as logical list operations to get the same effect. Let's break it down:

if (len(aList)<=1): return True

A list of 0 or 1 elements will read the same forwards as backwards, right? So yes, we have a palindrome.

top, tail = aList[0], aList[-1]          
if (top != tail): return False

Ok, now we look at first and last elements of the list (the latter of which I inconveniently named tail, which is normally used to refer to the remainder of the list besides the first element, head/tail semantics). This line also shows how you can do multiple assignments over one line in Python. You can also use a similar syntax return multiple values from one function, which we'll see later if you make it that far without getting bored :). The second line of the fragment then compares these elements -- if they are not equal then the list can't be a palindrome (e.g. 1 2 3 2 7, the 1 and 7 don't match so the list isn't a palindrome).

If the first and last elements are equal, then we have a potential palindrome, and use tail recursion to check whether a slice of the list, excluding the first and last elements, is a palindrome. Which seems to work nicely.

* A fine lesson in palindromes is available from Mr Yankovic album. You might be able to find a version somewhere without too much trouble, strictly for educational purposes.

Back to table of contents...

P07 Flatten a nested list structure

This problem just wants us to take a nested list, like [1, [2, 3], [4, 5, [6, 7]]], and flatten it out to a 1-dimensional list, like [1, 2, 3, 4, 5, 6, 7]. After coming up with a fairly ugly solution I ended up peeking at the Curious Coding solution, and adapted it to Python. This is also the first problem in the set marked as a 2-star problem, or "intermediate difficulty" (the others have all been marked as 1-star, or "easy").

  def test07_FlattenAList(self):
    def flatten(aList):
      flatList = []      
      for item in aList:
        if (type(item)==list):
          flatList.extend(flatten(item))
        else:
          flatList.append(item)          
      return flatList
    self.assertEqual(
                     [1, 2, 3, 4, 5, 6, 7], 
                     flatten([1, [2, 3], [4, 5, [6, 7]]])
                     )

Points of note from this problem is the use of the type() method (well, type is actually a "type", which, like everything in Python, is an object). This is used to check if the current element is a list. If so, the flattened list is extended by the flattened version of that sub-list (via a recursive call to flatten()). If not, the element is appended to the flatList.

The extend() method adds the items from a list to another list, whereas append() adds the item from as a single element to the list. This is probably best shown with an example:

>>> aList = range(0,6)
>>> aList
[0, 1, 2, 3, 4, 5]
>>> bList = list(aList)
>>> bList
[0, 1, 2, 3, 4, 5]
>>> anotherList
[6, 7]
>>> aList.append(anotherList)
>>> bList.extend(anotherList)
>>> aList
[0, 1, 2, 3, 4, 5, [6, 7]]
>>> bList
[0, 1, 2, 3, 4, 5, 6, 7]

Back to table of contents...

P08 Eliminate consecutive duplicates of list elements

This is another 2-star, intermediate problem, but it was pretty easy to whip through using Python's list comprehensions. Let's have a quick look at the list comprehension syntax first:

[itemInNewList for itemInNewList in someList if someConditionIsMet]

The square braces ([, ]) indicate we are creating a new list. In the new list we will include itemInNewList as an element, for the itemInNewList values in someList (or other iterable), that meet someConditionIsMet (the if someConditionIsMet is optional). Quick example:

>>> aList = range(0,6)
>>> aList
[0, 1, 2, 3, 4, 5]
>>> [item for item in aList if item < 3]
[0, 1, 2]

All a bit LINQ-like to me (well, more correctly LINQ is inspired by features in functional programming, such as monads [PDF]).

Back to problem 8, the aim is to eliminate consecutive duplicates from a list. So we would like to create a new list by iterating over a source list, and only including elements that are not the same as the previous element.

  def test08_EliminateConsecutiveDuplicates(self):
    def compress(aList):
      return [item for index, item in enumerate(aList) if index==0 or item != aList[index-1]]
    sampleList = ['a','a','a','a','b','c','c','a','a','d','e','e','e','e']
    self.assertEqual(
                     ['a','b','c','a','d','e'],
                     compress(sampleList)                     
                     )

The only tricky bit with this list comprehension compared with the trivial example given above is the use of the enumerate(aList) function which returns two values per iteration, the item and the index of that item. We use these variables in the list comprehension condition to check whether this is the first list item (which we want to include), or if the list item duplicates the previous one (which we want to exclude).

I think that's awesome :)

Back to table of contents...

From comments: avoiding enumerate()

Arnar (Blogger profile link) left a nice solution for this problem in the comments that doesn't require the use of enumerate(). I've reproduced it here (updating the naming convention to match the example above):

    def compress(aList):
        return aList[:1] + [aList[i] for i in range(1, len(aList)) if aList[i-1] != aList[i]]

This first creates a list containing only the head element (aList[:1]), and concatenates it with a list comprehension that eliminates duplicates. Rather than using enumerate() to get the index and item, Arnar used range(1, len(aList)) (which I have blogged about before) to generate the relevant indexes and then accesses the items direct from the list.

The nice thing about this approach is that by starting with aList[:1], we simplify the list comprehension condition I originally had (if index==0 or item != aList[index-1]). I'm not sure if there is a clear advantage of using enumerate() or range() to iterate over, but it definitely gave me another way of thinking about the problem. Thanks Arnar! :)

From comments: using itertools.groupby

This example added 3 April 2008

I had a number of helpful comments suggesting I use the itertools standard Python library for a number of these examples. Thanks to Arnar, Niall, Mohammad and everyone else that suggested this and wrote in with examples. I originally intended to do this exercise without using any imports (well, except for unittest), but Arnar managed to convince me to stop being so silly. :)

The particular itertools function we are looking at is groupby(iterable[, key]). This function returns a list of tuples. The first item in the tuple is the key for a particular group, and the second is an iterator over the group itself (i.e. (key, group)). So how are keys specified? By default, it is the item's identity or value (so if you group [1, 1, 1, 2, 3], you will get a group of 1s, like this:

>>> from itertools import groupby
>>> aList = [1,1,1,2,3]
>>> [list(group) for key, group in groupby(aList)]
[[1, 1, 1], [2], [3]]
>>> [key for key, group in groupby(aList)]
[1, 2, 3]

Note we have to import the function from the itertools library first using from itertools import groupby (Dive Into Python has a great explanation about how to import stuff). As the group returned is an iterator over a group, we can turn it into a list using the constructor list(group). The last command issue also shows what keys groupby() finds when no key argument is supplied. We can also provide groupby() with a function used to calculate keys:

>>> aList
[1, 1, 1, 2, 3]
>>> [list(group) for key, group in groupby(aList, lambda x: x<3)]
[[1, 1, 1, 2], [3]]

Here we use a simple lambda function to sort the list into groups: those less than 3, and not. :) Armed with just enough knowledge to be dangerous, let's try and solve problem 8 again. To eliminate consecutive duplicates, all we really want to do is get the keys from the list:

  def test08_EliminateConsecutiveDuplicates(self):
    def compress(aList):        
        return [key for key, group in groupby(aList)]
    sampleList = ['a','a','a','a','b','c','c','a','a','d','e','e','e','e']
    self.assertEqual(['a','b','c','a','d','e'], compress(sampleList))

And this passes nicely. But hold on a minute! How come we have duplicated keys in there? There are to 'a' elements! I said the default keys were each item's identity, and they are both 'a'! Why are you lying Dave? WHY?!?!

Ok, I'm better now. The reason is that groupby() is implemented using an iterator. It goes through the list picking out items in the 'a' group, then hits 'b'. This has a different identity, and so starts a new group. The function's iterator has now moved past the initial group of 4 'a' elements, and has basically completely forgotten about them, so the next time it hits an 'a' it creates a new group. Clear as mud? Check out the documentation and it should clear things up ( worked for me :) )

Back to table of contents...

P09 Pack consecutive duplicates of list elements into sublists

Another 2 star problem, this one wants us to convert consecutive duplicates in a list into a sub-lists. The original problem says something about only packing duplicates if list contains duplicates, but the example and implementations I have seen do not seem to do this, so I'll follow convention of blindly packing each element into a sub-list.

  def test09_PackDuplicatesIntoSubLists(self):
    def pack(aList):
      packedList = []
      for index, item in enumerate(aList):
        if index==0 or item != aList[index-1]:
          packedList.append([item])
        else:  
          packedList[-1].append(item)
      return packedList
    
    sampleList = ['a','a','a','a','b','c','c','a','a','d','e','e','e','e']
    self.assertEqual(
                     [['a','a','a','a'],['b'],['c','c'],['a','a'],['d'],['e','e','e','e']],
                     pack(sampleList)                     
                     )

I couldn't find a nice way of doing this list comprehension style, so I resorted to a simple imperative operation. We start with an empty packedList, and use the nice enumerate() that was so useful in problem 8 to iterate over the source list. If this is the first element, or if this element is not duplicating the previous item, we will append a one element, sub-list to packedList (packedList.append([item])). This means that every element of packedList will be a list itself. This is important, because the else branch of this condition relies on this fact to simply add duplicate elements to the previous sub-list (packedList[-1].append(item)).

At this point I'm seriously loving how easy it is to use Python lists: slicing, list comprehensions, negative indexing etc. Test passes, so it's off to our final question for this set.

From comments: another itertools.groupby alternative

This example added 3 April 2008.

As noted in the follow ups to problem 8, a number of commenters pointed me to the groupby() function (see problem 8 for an explanation). We can use it here too, but this time we want the groups rather than the keys:

  def test09_PackDuplicatesIntoSubLists(self):
    def pack(aList):
        return [list(group) for key, group in groupby(aList)]
    sampleList = ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd', 'e', 'e', 'e', 'e']
    self.assertEqual(
                     [['a','a','a','a'], ['b'], ['c','c'], ['a','a'], ['d'], ['e','e','e','e']],
                     pack(sampleList)                     
                    )

Back to table of contents...

P10 Run-length encoding of a list

This problem relates to problem 9, but instead of showing multiple elements in each sub-list, we just want a count of how many duplicates there are. Let's have a look at the test assertion to make this clearer:

    ...
    sampleList = ['a','a','a','a','b','c','c','a','a','d','e','e','e','e']
    self.assertEqual(
                     [[4,'a'],[1,'b'],[2,'c'],[2,'a'],[1,'d'],[4,'e']],
                     encode(sampleList)
                     )

First I tackled this the easy way, using copy-and-paste reuse from problem 9 and changing the action on each branch. The implementation is show below, with differences from problem 9 emphasised:

    def encode(aList):
      encodedList = []
      for index, item in enumerate(aList):
        if index==0 or item != aList[index-1]:
          encodedList.append([1, item])
        else:  
          encodedList[-1][0] += 1
      return encodedList

Here, rather than appending to sub-lists all the time, we are keeping a structure that contains the number of duplicates and the item being duplicated in each list. That worked, passing the test show above, but I also tried a less-objectional form of reuse of the pack() function from P09 by calling it from a list comprehension:

    def encode2(aList):
      return [[len(packed), packed[0]] for packed in pack(aList)]    

This list comprehension is making a new list from every sub-list returned by the pack() function. Each item is a two item list ([len(packed), packed[0]]). The first item is the length of the sub-list (i.e. how many duplicates we have), and the second is the first item of the sub-list (i.e. the item being duplicated). Here is the complete test:

  def test10_RunLengthEncodeList(self):    
    def encode(aList):
      encodedList = []
      for index, item in enumerate(aList):
        if index==0 or item != aList[index-1]:
          encodedList.append([1, item])
        else:  
          encodedList[-1][0] += 1
      return encodedList
    def pack(aList):
      packedList = []
      for index, item in enumerate(aList):
        if index==0 or item != aList[index-1]:
          packedList.append([item])
        else:  
          packedList[-1].append(item)
      return packedList
    def encode2(aList):
      return [[len(packed), packed[0]] for packed in pack(aList)]
      
    sampleList = ['a','a','a','a','b','c','c','a','a','d','e','e','e','e']
    self.assertEqual(
                     [[4,'a'],[1,'b'],[2,'c'],[2,'a'],[1,'d'],[4,'e']],
                     encode(sampleList)
                     )
    self.assertEqual(
                     [[4,'a'],[1,'b'],[2,'c'],[2,'a'],[1,'d'],[4,'e']],
                     encode2(sampleList)
                     )   

From comments: yet another itertools.groupby alternative

This example added 3 April 2008.

As noted in the follow ups to problem 8 and problem 9, I really should have been using the groupby() function for some of these problems (see here for an explanation).

  def test10_RunLengthEncodeList(self):    
    def encode3(aList):
      return [[len(list(group)), key] for key, group in groupby(aList)]
    sampleList = ['a','a','a','a','b','c','c','a','a','d','e','e','e','e']
    self.assertEqual(
                     [[4,'a'], [1,'b'], [2,'c'], [2,'a'], [1,'d'], [4,'e']],
                     encode3(sampleList)
                    )

This is pretty similar in form to the second approach used (encode2()), but without having to call pack first.

Back to table of contents...

Conclusion

Because all the problems so far revolved around lists, an area in which Python excels, I think all of these implementations came out quite well, especially considering I have no idea what I am doing when it comes to Python (or at all, some might argue!).

Looking at the Ruby implementations, there are a number of similarities between the Python versions I came up with. To be fair, I did peek at some of the Ruby solutions for hints in keeping to a functional programming style, but my main point here is that the language differences for basic stuff seem fairly superficial. I found both the Ruby versions and the Python versions very easy to understand.

Perhaps because it has been many, many years since I worked with Haskell, I found the F# samples from the Lanciaux brothers much more difficult to understand. The language semantics are just so different. Although all of them seem more natural to me than the original Prolog solutions :-) Take a look at the solutions to problem 8 and see if you agree:

All in all I had a thoroughly good time using Python for this exercise, and will definitely be looking for any excuse to use it again in future. At least for this basic stuff, the language just seems so concise and easy express your intention through the syntax. If you have suggestions for improvements or find any bugs with this please leave a comment or send me an email (tchepak at gmail). [UPDATE: Thanks to all those who have helped out with comments so far!]

Back to table of contents...

Change log

  • 2008-04-03: Added itertools.groupby() versions of problems 8, 9, 10 after some more helpful comments. Added this change log.
  • 2008-04-01: Added non-enumerate version for problem 8 after getting some helpful comments.

Wednesday, 6 February 2008

A brief look at the logic of TDD

[Test] 
public void Should_do_tdd() { 
  Assert.That(me.ShouldTdd, Is.EqualTo("Not sure?"));
}

Jacob Proffitt has a post questioning whether TDD provides any benefits over Plain Old Unit Testing (POUT). POUT itself has many benefits, many of which became apparent to me reading Michael Feather's Working Effectively with Legacy Code. In the comments on Jacob's post I said I thought it unlikely that TDD would be proved better than POUT, due to problems measuring, and even defining, software quality. Jacob replied that he wasn't interested in a proof, but was simply after the reasons why TDD might help. So I thought I would have a quick run through the logic of why TDD might help you write better software. I'm going to steer away from benefits of TDD that you can also get from POUT (like low coupling/high cohesion etc), and focus on the my interpretation of the rationale behind TDD.

Please note: I am not trying to convert anyone. I firmly believe that you should use what works for you. If a particular tool doesn't help you, don't use it. If you can't see any value from an approach and feel it is a waste of time to examine further, don't try it. I'm also far from an expert on TDD. But seeing as Jacob took the time to reply to my comment, I thought I should at least return the favour :-)

Quick TDD review

TDD is design tool/process. You write a test first that describes some behaviour that your production code should exhibit. You run the test and it fails because you haven't implemented that behaviour yet. You then write some minimal code that makes the test pass. Finally, you refactor the code to remove duplication and improve the design, re-running the tests to make sure you haven't broken the behaviour. This process leads to the TDD slogan: "Red, Green, Refactor", red and green being the colours of the status bar in most xUnit test frameworks to show failure and success.

So what?

What is the logic of that? Testing code that doesn't even exist yet? Crazy!

Well let's start with the obvious stuff. First up, you get unit tests. Unit tests are good, and let you refactor safely. POUT/test last obviously does this too. You get quick feedback on whether the code you have just written breaks anything. You can do this with a POUT approach as well, depending on how quickly you write your tests, or whether you use manual testing for feedback. TDD provides a bit more assurance that you will get the quickest possible feedback and will always have a decent amount of code coverage, but with sufficient discipline a POUT approach can accomplish all of this. And we trust our development team right? We don't need some process to force them to do the right thing.

TDD as a design tool

TDD is a tool for incrementally improving the design of your code. The unit testing side of it is simply a nice side effect, to the point where BDD has been proposed as an alternate presentation of the technique to eliminate the apparent confusion caused by the word "test". So how can TDD be used to improve design?

Well, first up you are specifying the exact behaviour you want to implement before writing the code to do it. This has several effects. You are writing the logical interface to your class before the class itself (interface, not public interface IInterface{}). This takes some of the guesswork out of determining how your code is going to be called. You have to deal with the interface to your class in order to test it, so if it is painful to use you can immediately tell and do something about it. According to my pseudo-logic, this can help ensure production code that needs to talk to your object should have a well designed interface through which to do so easily.

Specifying the behaviour you want first also helps focus you on one piece of code at a time. You are just trying to pass the test, not immediately trying to solve every problem the class will ever face. This can help reduce feature creep in your design. If you find that a "divide and conquer" approach can make problem-solving easier, then isolating the one thing you want to achieve in this way can also make coding easier.

If you have never written a class that exposes an interface that turns out to be fairly unusable in Real LifeTM, or if you have never worked for a while getting your class to handle a situation that will never actually occur (e.g. "What if this class needs to take a DateTime as an input instead of just ints?", followed by a large effort to produce a generic version of your class that is only ever called as MyClass<int>) then you are a much better developer than I (probably a given), and you may not get any benefit from TDD. TDD doesn't prevent these problems, but it can help make these problems immediately apparent and therefore potentially avoidable.

The refactoring step is an essential part of the design aspect of TDD. The idea is that you have just gotten your test to pass, so your class' behaviour for that specific test is correct. The very next step is to improve the design. Remove any duplication, extract any methods or rename variables as required to make the code more readable and understandable. Sprout any new classes that are required to better encapsulate the data or behaviour. Refactoring is not unique to TDD, but TDD helps you to perform these design improvements in small, hopefully easy, increments. If your test is difficult to write or implement, this feedback tells you your design might need tweaking, or that you initial specification needs more clarity. Your design is evolving based on continuous feedback from your immediate requirements and your actual code. This can help avoid large, difficult refactorings and unnecessary refactoring.

By improving the design at each step, in theory you make the next step easy to perform. The goal is having code that is always easy to change, making your more responsive to changes in business needs and requirements, and making bugs easier to fix (that's right! You still get bugs doing TDD! :-P).

Useless point on the "magic" of TDD

Aside from my convoluted version of the theory behind TDD, I feel TDD has the potential to change the way you think about coding. For me TDD had a profound effect on how I think about software design, to the point where even when not doing TDD I still find myself thinking in terms of passing tests and getting feedback from incremental coding steps. I feel this has made me a better developer, providing me with another way of thinking about problem. The extra perspective may not help you in every situation, or ever, but it does give you another avenue to pursue when you are stuck.

I also find I come up with much cleaner designs using this different perspective. The Robert Martin's Coffee Maker example [PDF] illustrates some OO design traps. In my experience the incremental design encouraged by TDD helps avoid some of those traps. Of course, this is all anecdotal hand-waving, hence this section's title :-)

Clarifications and conclusions

Refactoring, incrementally improving design, isolating behaviour, writing testable code, etc can all be achieved without TDD. TDD is just a tool to help you do all these things. If you do them fine without TDD, then TDD is probably not going to help you. If the technique sounds interesting to you, then your best bet is to give it a try and see if you see any benefits over and above POUT (I first got into TDD following this example. I converted it to C# and NUnit, commented out all but one test, and started coding).

I have not listed anything that is a dramatic, inspiring benefit of TDD. That is because TDD is simply a tool to help achieve what we are all trying to do anyway: writing good, well designed software. If you do that already then TDD may just seem like more work. This is probably why people that feel passionately about TDD can have a hard time selling it to people that are skeptical of it. It isn't dramatic. There is no "if you do this you will become almost as popular as Justice Gray" aspect to it.

I should also be clear that, for me at least, TDD did not come easy. I am definitely still learning, but my confidence and my results improve the more I use it. I feel it has been worthwhile. YMMV. If you have a mentor or a colleague to learn with this process may be easier.

As a closing aside, I think everyone will agree that at least people are debating the merits or TDD vs. a simple, good bank of units tests, rather than trying to justify the practice of unit testing itself. :-) Hopefully that means I won't ever have to face the "We don't have time to unit test" discussion again ;-)

Thursday, 3 January 2008

Messing around with various ORMs

I recently had a quick muck around with a few .NET ORM tools. I took a basic, contrived scenario and tried to implement the same operations using the tools. Here is a quick index:

The examples all go through same steps: the basic configuration required to get going, inserting a couple of records, and then getting some data back out with some simple queries, including traversing a many-to-many relationship.

8 Jan 2008: Quick clarification -- this series is not intended as a review of these tools. It is simply a quick guide to configuring and performing some very basic operations with them. Because all the posts use a (very simple) common scenario, you may be able to draw some basic conclusions about the tools, but in the end you'll probably have to evaluate them for yourself with respect to your own requirements. In which case these posts may help you get started :)

Wednesday, 17 October 2007

Getting to grips with ReSharper

On the couple of occasions I have talked people into trying JetBrains ReSharper (R#), the first thing they seem to notice is that it can optimise your using statements, and that it puts are bar down the right hand side of your editor showing you all the mistakes you have made in bright red and yellow stripes. There is obviously a lot more to ReSharper than that, and I've noticed people seem to get as much out of it as they put into it. Here are some of the features I found useful when I first started getting to know ReSharper 3.0.

Navigation

ReSharper introduces many nice ways of finding your way around your code. The File Structure View provides a list of all the members in your current class. Double clicking on a member jumps to that point in the code. You can also rearrange members and manage code #region blocks.

The mechanics of the File Structure View are fantastic, but to me the biggest advantage is the additional awareness it provides. At a glance you can take in all the members in the current file, which means you always have a good awareness of the context in which you are working, such as the class' current responsibilities.

I dock my File Structure View window to the right of VS and find it invaluable. You can access this window via the ReSharper >> Windows menu, or Ctrl + Alt + F.

Screen shot of File Structure View

ReSharper also provides a number of shortcuts to jump to a file, type, or symbol (Ctrl + T, Ctrl + Shft + T and Shft + Alt + T respectively). This lets you start typing part of the file/type/symbol you are looking for, and pops up an auto-complete box that allows you to quickly jump to the item you are after. Generally much faster than using Solution Explorer.

Other navigation features include an improved (over the standard VS) "Find Usages" feature, and a "Navigate from here..." menu that lets you jump to base, type declaration, function exits and more.

Code Generation

Mashing Alt + Insert opens a code generation menu, from which you can access wizards to generate properties with backing fields, constructors, interface implementations, and even delegations to class members. When I write "wizards", don't go picturing Clippy or some equivalent monstrosity -- these are actually helpful! For example, the constructor generator lets you select the fields/properties you wish to initialise in the constructor's parameters and then dumps the constructor into your class. The wizards are generally very keyboard friendly as well, so you don't have to madly chase your rodent around trying to click obscure buttons or links.

Generate constructor screenshot

Templates

The templates in ReSharper are incredible. There are a couple of templates build in (try typing "foreach<TAB>" into a method that takes any IEnumerable like an array as a parameter. No really, go try it! I'll wait...), but the real power comes from creating your own Live or Surround With template.

The templates support parameter substitution (i.e. you put a token in your template), and through the template builder UI you can specify exactly how these parameters are populated at run time. For example, you can tell ReSharper to guess the substitution based on the current type, current parameters, or even transform an existing name (such as using the camel-cased version of a Pascal-cased property name).

Edit template example

The educated guesses ReSharper uses are generally spot on, and are used in a number of places throughout the product, such as some of the Quick Fixes.

You use a template by typing part of the name into VS, and selecting the template from the autocomplete menu that pops up.

Refactoring

The standard Visual Studio refactorings are awful. On every PC I have used it has been almost entirely unusable. Trying to rename a private class member tended to cause a complete recompilation of the entire solution and basically take an eternity. ReSharper's refactorings work in milliseconds, and provide a whole lot more than VS.

The standard ReSharper 3.0 keymap (not the IntelliJ/ReSharper 2.x one) lets you mash Ctrl + Shft + R to open a context-sensitive refactoring menu. For example, if you highlight a passage of code and hit Ctrl + Shft + R, you can choose Extract Method to move that code to its own method. ReSharper will either instantly apply your refactoring, or open one of its helpful wizards if it needs a bit more data from you to continue.

There are also several chords you can press as short cuts to various refactorings, like Ctrl + R, Ctrl + V to apply the Introduce Variable refactoring. The refactoring features of ReSharper are extremely powerful, but in my experience require the most amount of work to learn how to use effectively. Every day I seem to find a new way to use a ReSharper refactoring to make coding easier.

The screen shots below show the "Refactor This" menu that appears after a value is selected and Ctrl + Shft + R is pressed. The second shot shows the Introduce Variable wizard, and the third shows the result.

Refactor This context menu

Introduce variable refactoring example

Result of introduce variable refactoring

Intellisense improvements

ReSharper, depending on which options you have specified (ReSharper menu >> Options), augments the standard Intellisense offered by Visual Studio. One option I turn on is to show more info about members, such as the return types. It is a good idea to play around with this to best suit the way you work.

The main thing ReSharper offers in this area is smart autocomplete and type autocomplete. Instead of the default Intellisense triggers such as Ctrl + Space, you can press Ctrl + Shft + Space or Shft + Alt + Space to get ReSharper to intelligently restrict the autocompletion options base on your current context. For example, if you are initialising a variable from a method call and and trigger a type-autocomplete, ReSharper will only show you a list of methods that return a compatible type.

Quick fixes

This is the best feature in my book. Whenever ReSharper finds an error, a suggestion, or recognises a number of common actions in the current context, it will flash up a light bulb and underline the relevant code. The colour of the light bulb and underline will depend on what type of quick fix ReSharper has identified (error, suggestion etc).

When you see Quick fix icon, press Alt + Enter to open the Quick Fix context menu, which will solve all your problems for you.

An example that is really useful when doing TDD is when you write some test code first that does not yet exist. The missing method will be written in red, and the quick fix icon will appear. Pressing Alt + Enter will display the menu shown below:

Quick fix menu to create a method

If you select "Create method...", a method stub will be generated, and you can TAB through the fields in the template ReSharper uses and fill in the details. The screen shot below shows ReSharper's suggestions for the parameter name, or you can specify a completely different name.

Quick fix process suggests a method name

Another TAB press and you can replace the throw new NotImplementedException(); line with your required implementation:

After quick fix method creation

But wait, there's more!

There is a Unit Test Runner, an option to cycle through your copy/paste clipboard (Ctrl + Shft + V), more code colouring options (such as greying out dead/unused code), automatic xml-doc generation for parameters, automatic importing of relevant namespaces, dead code removal and more!

Can I have my commission now?

There is so much to ReSharper, and you really will get as much out of it as you put into it. Once you become familiar with it you will find your code almost writing itself. I really recommend exploring the options it offers (ReSharper menu >> Options), as well as looking through and trying all the features. If you want an idea of exactly what can be accomplished, have a look at this video from Ilya Ryzhenkov, .NET Tools Product Manager at JetBrains.

There are only a couple of downsides to ReSharper. One is that you need a decent PC to run it on anything but a trivially sized project -- it is a real memory hog, despite improvements being made over v2. Secondly, it is quite easy to become dependent on ReSharper, and feel lost when you go back into normal VS-land. Sometimes I feel like I need to jump back into emacs to remember how to do stuff old-skool :) The other problem with the product is that it leaves you hankering for JetBrains to develop a Visual Studio replacement, say, Intelli.Net :)

Hope this post helps give your ReSharpering a jumpstart. :)

Tuesday, 5 June 2007

Explaining good code to non-geeks

I commonly encounter a lot of blank looks when I try and explain what a programmer does. Most people know it involves making the computer do stuff, but it pretty much goes down hill from there unless you have a geeky audience. This can pose a problem when explaining concepts like good quality code to non-IT business representatives and managers. This post is an effort to provide an understandable example of good code for non-geeks.

First up some disclaimers. This is a contrived example for illustrative purposes only. It is not meant to be a shining example of fantastic design, but to show some of the considerations that go into producing good code. I also provide some approximate definitions (approxinitions?) of some concepts. These may not be very accurate, but are designed to give the reader an idea of a concept without getting drowned in the details. Lastly, try not to get bogged down in the details (How does that work? How does it connect to a database? What's a database? What language is that? Why am I reading this strange person's blog?). Instead, concentrate on the concepts being communicated. You don't need the details to get the gist. Right? Ok, let's go.

A problem and first pass at a solution
We are writing an application that needs to send emails to a group of people. To do this we will write a class. Think of a class as group of related behaviours (functions) and data (fields, properties). Let's see some code:

class EmailSender {
function SendEmails() {
listOfPeople = database.LoadPeople();
foreach person in listOfPeople {
emailService.SendEmail(welcomeEmail, person);
}
}
}
Even if this looks a bit daunting, you can probably still follow through what we are doing. Here we are creating a new class called EmailSender that contains a SendEmails() function. We can call a function through a class. This is usually expressed in a className.functionName() format, so in this case we could run our function by writing EmailSender.SendEmails().

Within our function we load a list of people from the database, and then for every person in that list we send them an email by calling the
emailService.SendEmail(helloEmail, person) function. The values in the parenthesis is information we are passing to the function. In this case we are telling the function which type of email to send (a helloEmail, whatever that is) to a particular person. In this example we are assuming that the emailService and database classes are already written somewhere for us to use.

Adding more features
We now have a function that will send welcome emails to all the people in some database. Hooray! Now we realise that our application does not just need to send a generic email to everyone, but we also need to send emails to everyone that is about to go on holidays. Let's add a new function to do this:
class EmailSender {
function SendEmails() {
listOfPeople = database.LoadPeople();
foreach person in listOfPeople {
emailService.SendEmail(welcomeEmail, person);
}
}
function SendEmailsToHolidayers() {
listOfPeople = database.LoadPeopleGoingOnHolidays();
foreach person in listOfPeople {
emailService.SendEmail(haveANiceTripEmail, person);
}
}
}
This new function is very similar to the first. It loads a different list of people (people gong on holidays), and then sends each person a haveANiceTripEmail instead of a welcome email. Finally, lets send some nasty emails to people who have not paid their bills to our company.
class EmailSender {
function SendEmails() {
listOfPeople = database.LoadPeople();
foreach person in listOfPeople {
emailService.SendEmail(welcomeEmail, person);
}
}
function SendEmailsToHolidayers() {
listOfPeople = database.LoadPeopleGoingOnHolidays();
foreach person in listOfPeople {
emailService.SendEmail(haveANiceTripEmail, person);
}
}
function SendEmailsToLatePayers() {
listOfPeople = database.LoadPeopleWithOutstandingDebts();
foreach person in listOfPeople {
emailService.SendEmail(weAreSendingSomeGoonsEmail, person);
}
}
}
And we are finished. Simple right?

Problems with our class
Problems? Our class is perfect! We just wrote it, it's beautiful! The last code sample may work, but it has a lot of duplication. All three functions load a list of people from the database, then send an email to each person. This duplication is a warning signal to the programmer that something is wrong with the code.

We wrote more functional code than was required, repeating the same logic in several places. Functional code is the lines that do stuff, rather than the semantics of defining classes and functions. The code is also hard to maintain. For example, say we want to record in the database that an email is being sent before we sent it. We then need to change the following lines in 3 places:
  foreach person in listOfPeople {
database.RecordEmailIsBeingSent(weAreSendingSomeGoonsEmail, person);
emailService.SendEmail(weAreSendingSomeGoonsEmail, person);
}
It is easy to miss one of these lines once our class grows to have more functions, which encourages bugs. It also means if there is one bug in a repeated part of the code it will be faithfully reproduced in the duplicated parts of the code. This kind of duplication can also make it difficult to understand the intent of the code, as the important logic is obscured by other bits of functionality.

A good indicator of good code is a lack of duplication. This makes for smaller amounts of functional code (which means less work as we are solving each problem only once, and less places for bugs to hide), and code that is easier to change and maintain (faster and cheaper to add features and support).


So let's take a giant leap and see if we can turn this code into good code. This process is known as refactoring (changing the design without modifying the behaviour). To non-technical people refactoring can seem a waste of time. Why spend time changing something behind-the-scenes when the overall output remains the same? If it ain't broke, why fix it? To technical people, refactoring is essential and will reap great rewards the second someone finds a bug or wants to change the application slightly. In reality the code is broken, unless part of the initial requirements was for our application to cost a huge amount to support and be nearly impossible to change.

Refactoring to good code
To eliminate this duplication in our functions, let's think about exactly what each function is trying to achieve.
  • Load a list of people from a database. The list will be different for different email types, but the end result will still be a list of people.
  • We need to send an email to every person in the list.
  • The email content will be different for each email type.
So if we can extract the functionality which gets the list of people, and the email content, we will be left with the main logic behind sending these emails.
class EmailSender {
function LoadRelevantPeople();
function EmailContent();
function SendEmails() {
listOfPeople =
LoadRelevantPeople();
foreach person in listOfPeople {
emailService.SendEmail(
EmailContent(), person);
}
}
}
How does that look? We don't have any duplication now, but we have lost our multiple email types. Notice how clearly the SendEmails() function now expresses our intention: we load a list of relevant people, then send an email to each one. We also have some blank functions, LoadRelevantPeople() and EmailContent(). These functions represent the parts of our original code that differed between each function. We will take advantage of some programming magic called inheritance to inject the relevant functionality.

Inheritance is a relation between a parent class and a child class. The child class automatically gets the functions and properties of the parent, but can also add new functions, and
override the behaviour of the parent's functions. We will use this techniques to implement our different types of emails.
class EmailSender {
function LoadRelevantPeople();
function EmailContent();
function SendEmails() {
listOfPeople =
LoadRelevantPeople();
foreach person in listOfPeople {
emailService.SendEmail(
EmailContent(), person);
}
}
}
class WelcomeEmailSender inheritsFrom EmailSender {
function LoadRelevantPeople() {
return database.LoadPeople();
}
function EmailContent() { return welcomeEmail; }
}
class HolidayEmailSender inheritsFrom EmailSender {
function LoadRelevantPeople() {
return database.LoadPeopleGoingOnHolidays();
}
function EmailContent() { return haveANiceTripEmail; }
}
class LatePayersEmailSender inheritsFrom EmailSender {
function LoadRelevantPeople() {
return database.LoadPeopleWithOutstandingDebts();
}
function EmailContent() { return weAreSendingSomeGoonsEmail; }
}
We can now send emails like this: HolidayEmailSender.SendEmails(). This will use the SendEmails() function from our parent EmailSender class, which will fill-in-the-blanks with HolidayEmailSender's implementation of LoadRelevantPeople() and EmailContent().

What do we gain by doing this?
  • We have separated the logic of sending an email to a group of people with the particulars of getting that list of people and email content. This makes it easier to understand each discrete block of logic.
  • We can now add new email types without modifying the original EmailSender class. This helps us to avoid introducing bugs into that class. It also means we do not need to understand all the details of the EmailSender class, so our application is easier to modify.
  • We can change the EmailSender class to record all the emails being sent without affecting the child classes. This example was used in the previous section and required 3 modifications. Now we only require one.
  • We can separately test each unit of logic. This approach is more amenable to automated unit testing techniques.
  • We have eliminated duplication.
What do we lose?
  • In this case, but not always, brevity. By removing duplication we have actually increased the number of lines we wrote from 20 to 28. We are using a new class for each function rather than the 3 line functions we were using initially. This is more to do with this example than the refactoring process in general.
  • Directness. Before we could look at a function and see every step. Now we have introduced a layer of indirection or abstraction to the process. It is now easier to understand each discrete block of logic, but takes more brain power to understand the entire behaviour.
Which is better? Well, ending up with a bit more code is an unfortunate side effect of using a small example for this post. You can probably see that as the size and number of duplicated areas increase, this new approach will actually require less lines than the original.

Even so, the extra lines are not really an issue in this case. The extra lines are not
functional code, they are simple class definitions and so forth, rather than code that actually does stuff. Normally these non-functional lines are written for programmer's by their code editing programs. They are also the kind of things that are automatically checked by the code compiler, so unlike functional lines of code they will generally not be harbouring bugs.

So the balance lies in the gains versus introducing indirection. Bear in mind that this is a contrived example, and that most software is significantly more complicated. As things get more complicated, abstraction and indirection become compulsory. The human brain can only handle so much complexity, so the natural response is to divide and conquer these problems by using abstractions. Programmer's are used to these abstractions as they work with them every day, which further reduces the problems caused by the indirection.

A key point is that changing software can be very costly. Anything we can do to mitigate that cost will pay huge dividends over the life of the software. Programmer's can do this by removing duplication and creating high quality code.

To me the gains seem to significantly outweigh the shortcomings of our changes. On balance, I would always choose our revised example.

Parting thoughts...
The structure and design of the code itself can have a big impact on the cost of supporting, maintaining and extending software. Refactoring is sometimes dismissed by non-geeks as an optional or low-value activity. In reality it is essential. Skipping it incurs a
technical debt, which you will need to pay for next time the application needs to be modified.

The technique we used is an example of the
Template Method design pattern. This technique relies on a main function that contains the logic of what we are trying to achieve (the template), and uses child classes to fill in the blanks on variable behaviour. The are many others ways of removing duplication from code, each with varying applicability to individual situations, and each offering specific strengths and weaknesses.

There is no way to mandate or standardise on a particular technique for all situations. It would be disastrous to declare that every time a programmer writes an email feature into software they must use a Template Method. Each case needs to be evaluated on its merits. Writing good code depends on the skills and experience of the programmer. Good code can not be rubber-stamped and produced en masse, not by standardisation nor by graphical tools, pre-built platforms, advanced IDEs, or any method other than applying good professional practices to every individual case.

Coding is sometimes considered a low-value part of software development (read: monkey work), but good programmers can contribute a huge amount to a project by making effective decisions at this level.

I hope this has helped to show some of the features of good code, why it is worth the effort, and also highlighted some of the issues programmers face when trying to write good code.

Wednesday, 30 May 2007

TDD and unit testing misconceptions

Today I test-drove a feature into some legacy code. It was my first shot at using TDD in a real situation (as opposed to my usual contrived play-around scenarios), and it actually went fantastically well. This post is not about all the stuff that worked, but about some common misconceptions that were revealed when something did not work so well.

There is currently a huge amount of work going on in the rest of the code base (read: the whole thing is broken and the web project will not compile), so I did not get a chance to perform integration tests on the feature, and of course a bug crept in. This lead my colleague (hi mate!), upon joyfully discovering the bug, to make a hysterical joke: "I thought you said you unit-tested this thing, how can it have a bug!?!!" (haw haw, <rollsEyes /> :-P).

Now hilarity aside, this jest actually touches on a number of common* misconceptions about both TDD and unit testing.

Firstly, TDD is a design tool, not a testing tool. The fact you end up with a couple of tests afterwards is just a nice bonus. The real value is producing low coupling between modules which results in a testable, maintainable design, and generally a sensible, consumer-focussed API. In this case TDD served me really well, allowing me to inject a nice, self-contained behaviour into a number of legacy classes.

Secondly, unit testing itself does not eliminate bugs. Even with 100% code coverage there will be bugs. Even with full coverage, a comprehensive suite of integration tests, automated user acceptance tests, hordes of dedicated and specialist testers and a beta testing program of 12 million users there will still be bugs**!

So what's the point of unit tests? For one, they indicate that all the tested cases seem to work. As we can't usually exhaustively test, this means that every case left untested might be broken, but at least we have some cases that work. If we pick our test cases carefully then we can have some measure of confidence that lots of code units work in lots of cases. Unit tests also enable us to refactor with some level of protection, and can help us to catch regressions that can occur during development (such as when merging conflicted changes, working with multiple developers etc.).

In my case today the bug was in a thin slice (~10 lines) of untested code used to plumb in the TDD code to the legacy code (all the TDD'ed code works as advertised -- so far). This plumbing code was difficult to test as it tied in with the ASP.NET page lifecycle and HttpContext. So it was untested and ergo, bug ridden :-). However the unit tests helped here enormously. I knew exactly where the bug probably was not, because my tests passed for a similar case. This left only a handful of places where the bug could be hiding. It took about 30 seconds to find the source***, 5 more to fix it, 10 more to commit to SVN.

Which leads to another point about TDD that caused me some grief when I first started -- you need to be aware of exactly what your tests are testing. You need to know the class under test, the behaviour being tested, and what is not being tested. Get any of these things mixed up and you can end up testing a stub object that is never going to be used anyway. And worse, you lose the advantage of knowing where to look for bugs that have crept in.

The last misconception for this post is around the definition of unit tests. Unit tests are NOT integration tests. Both are important. Just because all your units work in isolation doesn't mean they'll get along with each other.

So there you have it. I found TDD in this case was very helpful, great fun and also very successful. It helped produce low-defect (zero so far) code, resulted in a nice design and API, has given a framework for future refactoring, and also helped debug untested code. And gave me an excuse to ramble on in a large blog post. TDD -- it's fan-tastic!****

* By "common" I mean they seem to crop up on blogs, blog comments and groups. I have no idea what the majority of developer's think, only my current audience ( er, me :-) )

** Actually, this might be sufficient for "Hello, World!".

*** I had "==" instead of "!=" on a condition.

**** Can I have some advertising revenue now please Mr Beck? :-)