Saturday, 29 March 2008

99 problems with Python (1-10)

No, this is not a rant about problems I have with Python. I actually quite like it. Rather, it's my attempt to complete some of the Dr Werner Hett's 99 Prolog logic problems using Python, as a way of getting to learn a bit about the language. I first heard about the 99 Problems from Joel and Ryan Lanciaux's efforts to solve the problems using F#, who were in turn inspired by the Ruby implementations on the Curious Coding blog.

Keep in mind that I'm new to Python (have jumped into this exercise straight after reading the excellent Dive into Python), so much of this might be pretty clumsy. I'd love to hear any suggestions for improvements. I am also not entirely keeping with the original spirit of the Prolog exercise in terms of logic programming, as my main purpose is familiarising myself with Python. I have, however, tried to do things in a fairly Pythonic, functional-style rather than C/C++/Java/C#, imperative-style. I'm also not concerned with error handling or edge cases, just in getting the basic syntax and approach right. I'm not sure I'll get through all 99 problems, but at the very least here's the first 10.

Implementation overview

I did the exercise in Eclipse with PyDev plugin, and used the Python unittest module to implement each solution as a test to make it easy to run and verify each solution. You can obviously run all this using IDLE or whatever Python runner you like. Here's the basic file structure I started off with:

import unittest

class NinetyNineProblemsFixture(unittest.TestCase):
    
  def testXX_Description(self):
    def solutionToProblem(input): pass          
    input = "some input"
    expected = "expected output for successful implementation"    
    self.assertEqual(expected, solutionToProblem(input))  
  
if __name__ == '__main__':
  unittest.main()

I wrote a test for each problem, then implemented the solution as an inner function within the test. The if __name__ == '__main__' line lets you run the tests by simply running your *.py file through Python. I tended to run using PyTest.py from within Eclipse/PyDev.

Back to table of contents...

P01 Find the last element of a list

  def test01_FindLastElementOfList(self):
    def getLast(list):
      return list[-1]
    list = [1, 3, 7, 14]
    self.assertEqual(14, getLast(list))

Lists are great in Python. They have lots of nice features that make them very pleasant to work with. This example shows one of these features, you can use a negative index to get list items from the end of the list. For example, list[-3] will get the third last item in the list. In this case, we want the last element, so we use list[-1].

Back to table of contents...

P02 Find the last but one element of a list

Same as P01, but we want the second last item:

  def test02_FindLastButOne(self):
    def getLastButOne(list):
      return list[-2]
    list = [1, 3, 7, 14]
    self.assertEqual(7, getLastButOne(list))

Back to table of contents...

P03 Find the K'th element of a list

Basic array index for this. Lists indicies are zero-based in Python (as they should be :)), but the problem definition wants to translate the reference to 1-based.

  def test03_FindKthElement(self):
    def getKth(list, k):
      return list[k-1]
    list = [1, 3, 7, 14]
    self.assertEqual(3, getKth(list, 2))

Back to table of contents...

P04 Find the number of elements of a list

Cheating for this one and just use the built in len function:

  def test04_NumberOfElementsInList(self):
    def getCount(list):
      return len(list)
    list = [1, 3, 7, 14]    
    self.assertEqual(4, getCount(list))   

Back to table of contents...

P05 Reverse a list

Here I played around with two different approaches so as to learn a bit about Python list slicing.

Update (2 April 2008): Initially I did not realise that there was a built-in reversed() function. I originally had something silly like this:

    def reverse(l):
      newList = list(l)      
      return newList.reverse() or newList

The list.reverse function does an in-place reversal of the list elements and returns None (the Python null value), so I had to clone the list first so as not to affect the original, then OR it to return the list reference itself. Thanks to Mark et al. on Reddit for showing me the light! :)

  def test05_ReverseAList(self):
    def reverse(l):
      return list(reversed(aList))
    def manualReverse(list):          
      return list[::-1]
    sampleList = [1, 3, 7, 14]
    self.assertEqual([14, 7, 3, 1], reverse(sampleList))
    self.assertEqual([14, 7, 3, 1], manualReverse(sampleList))

The reversed() function returns an iterator, so it is wrapped in a list() constructor to convert it to a new list. I used l for the method parameter, as I couldn't figure out how to appropriately qualify references to the list class when it was hidden by a parameter called list (cue embarrassed smiley).

List slicing

The second approach was to use Python's list slicing. A slice is a range of values copied from an original list. This means we have no side effects like we do using the in place reverse(). The basic syntax for a slice is:

list[indexOfFirstElementInSlice : indexOfFirstElement_Not_InSlice : optionalStep]

Omitting the first argument starts the slice from the first list element. Omitting the second argument takes the remainder of elements in the list. The step can be set to, say, 2 to take every second list element. It's worth noticing that the second argument is exclusive, so that we can the following from the Python interpreter:

>>> aList = range(0, 6)
>>> aList
[0, 1, 2, 3, 4, 5]
>>> #From index 1 up to, but excluding, index 5:
>>> aList[1:5]
[1, 2, 3, 4]
>>> aList[1::2]
[1, 3, 5]

In our case we want the slice to include all the elements, but we want to take them in reverse order, which is how we end up with list[::-1]. Right, that was all a lot of explanation for not-much-code, but I thought I'd point out some Python basics along the way.

Back to table of contents...

P06 Find out whether a list is a palindrome

A palindrome is something that reads the same forward as it does backwards, like "Go hang a salami, I'm a lasagna hog"* (well, ignoring punctuation and spaces). I tried two different approaches for this one too.

  def test06_IsListAPalindrome(self):
    def isPalindrome(aList):
      return aList == aList[::-1]
    def isPalindromeRecursive(aList):
      if (len(aList)<=1): return True
      top, tail = aList[0], aList[-1]          
      if (top != tail): return False
      return isPalindromeRecursive(aList[1:-1])
    palindromes = [
                   ['a', 'b', 'c', 'b', 'a'],
                   [1, 10, 20, 30, 30, 20, 10, 1],
                   ['a', 'a']                    
                   ]
    nonPalindromes = [
                      ['a', 'b', 'c', 'd'],
                      [1, 10, 20, 20, 10, 2],
                      ['a', 'v']
                      ]    
    self.assertTrue(all([isPalindrome(x) for x in palindromes]))
    self.assertTrue(all([not isPalindrome(x) for x in nonPalindromes]))
    
    self.assertTrue(all([isPalindromeRecursive(x) for x in palindromes]))
    self.assertTrue(all([not isPalindromeRecursive(x) for x in nonPalindromes]))

The first way is cheating, but effective. Simply use return aList == aList[::-1] to compare the list with a reverse slice, using the same slicing syntax used for P05. This will obviously ensure the list reads the same forwards as backwards.

The second way uses recursion, as well as logical list operations to get the same effect. Let's break it down:

if (len(aList)<=1): return True

A list of 0 or 1 elements will read the same forwards as backwards, right? So yes, we have a palindrome.

top, tail = aList[0], aList[-1]          
if (top != tail): return False

Ok, now we look at first and last elements of the list (the latter of which I inconveniently named tail, which is normally used to refer to the remainder of the list besides the first element, head/tail semantics). This line also shows how you can do multiple assignments over one line in Python. You can also use a similar syntax return multiple values from one function, which we'll see later if you make it that far without getting bored :). The second line of the fragment then compares these elements -- if they are not equal then the list can't be a palindrome (e.g. 1 2 3 2 7, the 1 and 7 don't match so the list isn't a palindrome).

If the first and last elements are equal, then we have a potential palindrome, and use tail recursion to check whether a slice of the list, excluding the first and last elements, is a palindrome. Which seems to work nicely.

* A fine lesson in palindromes is available from Mr Yankovic album. You might be able to find a version somewhere without too much trouble, strictly for educational purposes.

Back to table of contents...

P07 Flatten a nested list structure

This problem just wants us to take a nested list, like [1, [2, 3], [4, 5, [6, 7]]], and flatten it out to a 1-dimensional list, like [1, 2, 3, 4, 5, 6, 7]. After coming up with a fairly ugly solution I ended up peeking at the Curious Coding solution, and adapted it to Python. This is also the first problem in the set marked as a 2-star problem, or "intermediate difficulty" (the others have all been marked as 1-star, or "easy").

  def test07_FlattenAList(self):
    def flatten(aList):
      flatList = []      
      for item in aList:
        if (type(item)==list):
          flatList.extend(flatten(item))
        else:
          flatList.append(item)          
      return flatList
    self.assertEqual(
                     [1, 2, 3, 4, 5, 6, 7], 
                     flatten([1, [2, 3], [4, 5, [6, 7]]])
                     )

Points of note from this problem is the use of the type() method (well, type is actually a "type", which, like everything in Python, is an object). This is used to check if the current element is a list. If so, the flattened list is extended by the flattened version of that sub-list (via a recursive call to flatten()). If not, the element is appended to the flatList.

The extend() method adds the items from a list to another list, whereas append() adds the item from as a single element to the list. This is probably best shown with an example:

>>> aList = range(0,6)
>>> aList
[0, 1, 2, 3, 4, 5]
>>> bList = list(aList)
>>> bList
[0, 1, 2, 3, 4, 5]
>>> anotherList
[6, 7]
>>> aList.append(anotherList)
>>> bList.extend(anotherList)
>>> aList
[0, 1, 2, 3, 4, 5, [6, 7]]
>>> bList
[0, 1, 2, 3, 4, 5, 6, 7]

Back to table of contents...

P08 Eliminate consecutive duplicates of list elements

This is another 2-star, intermediate problem, but it was pretty easy to whip through using Python's list comprehensions. Let's have a quick look at the list comprehension syntax first:

[itemInNewList for itemInNewList in someList if someConditionIsMet]

The square braces ([, ]) indicate we are creating a new list. In the new list we will include itemInNewList as an element, for the itemInNewList values in someList (or other iterable), that meet someConditionIsMet (the if someConditionIsMet is optional). Quick example:

>>> aList = range(0,6)
>>> aList
[0, 1, 2, 3, 4, 5]
>>> [item for item in aList if item < 3]
[0, 1, 2]

All a bit LINQ-like to me (well, more correctly LINQ is inspired by features in functional programming, such as monads [PDF]).

Back to problem 8, the aim is to eliminate consecutive duplicates from a list. So we would like to create a new list by iterating over a source list, and only including elements that are not the same as the previous element.

  def test08_EliminateConsecutiveDuplicates(self):
    def compress(aList):
      return [item for index, item in enumerate(aList) if index==0 or item != aList[index-1]]
    sampleList = ['a','a','a','a','b','c','c','a','a','d','e','e','e','e']
    self.assertEqual(
                     ['a','b','c','a','d','e'],
                     compress(sampleList)                     
                     )

The only tricky bit with this list comprehension compared with the trivial example given above is the use of the enumerate(aList) function which returns two values per iteration, the item and the index of that item. We use these variables in the list comprehension condition to check whether this is the first list item (which we want to include), or if the list item duplicates the previous one (which we want to exclude).

I think that's awesome :)

Back to table of contents...

From comments: avoiding enumerate()

Arnar (Blogger profile link) left a nice solution for this problem in the comments that doesn't require the use of enumerate(). I've reproduced it here (updating the naming convention to match the example above):

    def compress(aList):
        return aList[:1] + [aList[i] for i in range(1, len(aList)) if aList[i-1] != aList[i]]

This first creates a list containing only the head element (aList[:1]), and concatenates it with a list comprehension that eliminates duplicates. Rather than using enumerate() to get the index and item, Arnar used range(1, len(aList)) (which I have blogged about before) to generate the relevant indexes and then accesses the items direct from the list.

The nice thing about this approach is that by starting with aList[:1], we simplify the list comprehension condition I originally had (if index==0 or item != aList[index-1]). I'm not sure if there is a clear advantage of using enumerate() or range() to iterate over, but it definitely gave me another way of thinking about the problem. Thanks Arnar! :)

From comments: using itertools.groupby

This example added 3 April 2008

I had a number of helpful comments suggesting I use the itertools standard Python library for a number of these examples. Thanks to Arnar, Niall, Mohammad and everyone else that suggested this and wrote in with examples. I originally intended to do this exercise without using any imports (well, except for unittest), but Arnar managed to convince me to stop being so silly. :)

The particular itertools function we are looking at is groupby(iterable[, key]). This function returns a list of tuples. The first item in the tuple is the key for a particular group, and the second is an iterator over the group itself (i.e. (key, group)). So how are keys specified? By default, it is the item's identity or value (so if you group [1, 1, 1, 2, 3], you will get a group of 1s, like this:

>>> from itertools import groupby
>>> aList = [1,1,1,2,3]
>>> [list(group) for key, group in groupby(aList)]
[[1, 1, 1], [2], [3]]
>>> [key for key, group in groupby(aList)]
[1, 2, 3]

Note we have to import the function from the itertools library first using from itertools import groupby (Dive Into Python has a great explanation about how to import stuff). As the group returned is an iterator over a group, we can turn it into a list using the constructor list(group). The last command issue also shows what keys groupby() finds when no key argument is supplied. We can also provide groupby() with a function used to calculate keys:

>>> aList
[1, 1, 1, 2, 3]
>>> [list(group) for key, group in groupby(aList, lambda x: x<3)]
[[1, 1, 1, 2], [3]]

Here we use a simple lambda function to sort the list into groups: those less than 3, and not. :) Armed with just enough knowledge to be dangerous, let's try and solve problem 8 again. To eliminate consecutive duplicates, all we really want to do is get the keys from the list:

  def test08_EliminateConsecutiveDuplicates(self):
    def compress(aList):        
        return [key for key, group in groupby(aList)]
    sampleList = ['a','a','a','a','b','c','c','a','a','d','e','e','e','e']
    self.assertEqual(['a','b','c','a','d','e'], compress(sampleList))

And this passes nicely. But hold on a minute! How come we have duplicated keys in there? There are to 'a' elements! I said the default keys were each item's identity, and they are both 'a'! Why are you lying Dave? WHY?!?!

Ok, I'm better now. The reason is that groupby() is implemented using an iterator. It goes through the list picking out items in the 'a' group, then hits 'b'. This has a different identity, and so starts a new group. The function's iterator has now moved past the initial group of 4 'a' elements, and has basically completely forgotten about them, so the next time it hits an 'a' it creates a new group. Clear as mud? Check out the documentation and it should clear things up ( worked for me :) )

Back to table of contents...

P09 Pack consecutive duplicates of list elements into sublists

Another 2 star problem, this one wants us to convert consecutive duplicates in a list into a sub-lists. The original problem says something about only packing duplicates if list contains duplicates, but the example and implementations I have seen do not seem to do this, so I'll follow convention of blindly packing each element into a sub-list.

  def test09_PackDuplicatesIntoSubLists(self):
    def pack(aList):
      packedList = []
      for index, item in enumerate(aList):
        if index==0 or item != aList[index-1]:
          packedList.append([item])
        else:  
          packedList[-1].append(item)
      return packedList
    
    sampleList = ['a','a','a','a','b','c','c','a','a','d','e','e','e','e']
    self.assertEqual(
                     [['a','a','a','a'],['b'],['c','c'],['a','a'],['d'],['e','e','e','e']],
                     pack(sampleList)                     
                     )

I couldn't find a nice way of doing this list comprehension style, so I resorted to a simple imperative operation. We start with an empty packedList, and use the nice enumerate() that was so useful in problem 8 to iterate over the source list. If this is the first element, or if this element is not duplicating the previous item, we will append a one element, sub-list to packedList (packedList.append([item])). This means that every element of packedList will be a list itself. This is important, because the else branch of this condition relies on this fact to simply add duplicate elements to the previous sub-list (packedList[-1].append(item)).

At this point I'm seriously loving how easy it is to use Python lists: slicing, list comprehensions, negative indexing etc. Test passes, so it's off to our final question for this set.

From comments: another itertools.groupby alternative

This example added 3 April 2008.

As noted in the follow ups to problem 8, a number of commenters pointed me to the groupby() function (see problem 8 for an explanation). We can use it here too, but this time we want the groups rather than the keys:

  def test09_PackDuplicatesIntoSubLists(self):
    def pack(aList):
        return [list(group) for key, group in groupby(aList)]
    sampleList = ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd', 'e', 'e', 'e', 'e']
    self.assertEqual(
                     [['a','a','a','a'], ['b'], ['c','c'], ['a','a'], ['d'], ['e','e','e','e']],
                     pack(sampleList)                     
                    )

Back to table of contents...

P10 Run-length encoding of a list

This problem relates to problem 9, but instead of showing multiple elements in each sub-list, we just want a count of how many duplicates there are. Let's have a look at the test assertion to make this clearer:

    ...
    sampleList = ['a','a','a','a','b','c','c','a','a','d','e','e','e','e']
    self.assertEqual(
                     [[4,'a'],[1,'b'],[2,'c'],[2,'a'],[1,'d'],[4,'e']],
                     encode(sampleList)
                     )

First I tackled this the easy way, using copy-and-paste reuse from problem 9 and changing the action on each branch. The implementation is show below, with differences from problem 9 emphasised:

    def encode(aList):
      encodedList = []
      for index, item in enumerate(aList):
        if index==0 or item != aList[index-1]:
          encodedList.append([1, item])
        else:  
          encodedList[-1][0] += 1
      return encodedList

Here, rather than appending to sub-lists all the time, we are keeping a structure that contains the number of duplicates and the item being duplicated in each list. That worked, passing the test show above, but I also tried a less-objectional form of reuse of the pack() function from P09 by calling it from a list comprehension:

    def encode2(aList):
      return [[len(packed), packed[0]] for packed in pack(aList)]    

This list comprehension is making a new list from every sub-list returned by the pack() function. Each item is a two item list ([len(packed), packed[0]]). The first item is the length of the sub-list (i.e. how many duplicates we have), and the second is the first item of the sub-list (i.e. the item being duplicated). Here is the complete test:

  def test10_RunLengthEncodeList(self):    
    def encode(aList):
      encodedList = []
      for index, item in enumerate(aList):
        if index==0 or item != aList[index-1]:
          encodedList.append([1, item])
        else:  
          encodedList[-1][0] += 1
      return encodedList
    def pack(aList):
      packedList = []
      for index, item in enumerate(aList):
        if index==0 or item != aList[index-1]:
          packedList.append([item])
        else:  
          packedList[-1].append(item)
      return packedList
    def encode2(aList):
      return [[len(packed), packed[0]] for packed in pack(aList)]
      
    sampleList = ['a','a','a','a','b','c','c','a','a','d','e','e','e','e']
    self.assertEqual(
                     [[4,'a'],[1,'b'],[2,'c'],[2,'a'],[1,'d'],[4,'e']],
                     encode(sampleList)
                     )
    self.assertEqual(
                     [[4,'a'],[1,'b'],[2,'c'],[2,'a'],[1,'d'],[4,'e']],
                     encode2(sampleList)
                     )   

From comments: yet another itertools.groupby alternative

This example added 3 April 2008.

As noted in the follow ups to problem 8 and problem 9, I really should have been using the groupby() function for some of these problems (see here for an explanation).

  def test10_RunLengthEncodeList(self):    
    def encode3(aList):
      return [[len(list(group)), key] for key, group in groupby(aList)]
    sampleList = ['a','a','a','a','b','c','c','a','a','d','e','e','e','e']
    self.assertEqual(
                     [[4,'a'], [1,'b'], [2,'c'], [2,'a'], [1,'d'], [4,'e']],
                     encode3(sampleList)
                    )

This is pretty similar in form to the second approach used (encode2()), but without having to call pack first.

Back to table of contents...

Conclusion

Because all the problems so far revolved around lists, an area in which Python excels, I think all of these implementations came out quite well, especially considering I have no idea what I am doing when it comes to Python (or at all, some might argue!).

Looking at the Ruby implementations, there are a number of similarities between the Python versions I came up with. To be fair, I did peek at some of the Ruby solutions for hints in keeping to a functional programming style, but my main point here is that the language differences for basic stuff seem fairly superficial. I found both the Ruby versions and the Python versions very easy to understand.

Perhaps because it has been many, many years since I worked with Haskell, I found the F# samples from the Lanciaux brothers much more difficult to understand. The language semantics are just so different. Although all of them seem more natural to me than the original Prolog solutions :-) Take a look at the solutions to problem 8 and see if you agree:

All in all I had a thoroughly good time using Python for this exercise, and will definitely be looking for any excuse to use it again in future. At least for this basic stuff, the language just seems so concise and easy express your intention through the syntax. If you have suggestions for improvements or find any bugs with this please leave a comment or send me an email (tchepak at gmail). [UPDATE: Thanks to all those who have helped out with comments so far!]

Back to table of contents...

Change log

  • 2008-04-03: Added itertools.groupby() versions of problems 8, 9, 10 after some more helpful comments. Added this change log.
  • 2008-04-01: Added non-enumerate version for problem 8 after getting some helpful comments.

Thursday, 20 March 2008

IEnumerable<T> and ForEach()

Why is there no ForEach() extension method defined for IEnumerable<T> in System.Linq.Enumerable for .NET 3.5? There is a ForEach() on Array and List<T>, but not for IEnumerable<T> which would seem a fairly natural place for it.

public static class Extensions {
  public static void ForEach<T>(this IEnumerable<T> source, Action<T> action) {
    foreach (var item in source) {
      action(item);
    }
  }
}

For some reason, when I'm in LINQy mode, my natural reaction is to try ForEach() over foreach, and I'm always a bit surprised when it doesn't work for IEnumerable<T>:

//Lambda goodness:
parameters.ForEach(parameter => doSomethingTo(parameter));

//Old stylz:
foreach (var parameter in parameters) {
  doSomethingTo(parameter);
}

Bit nit-picky I know, but I noticed Daniel Cazzulino made the same observation:

"I added a ForEach extension method to IEnumerable. How come it's missing in .NET 3.5? :S"

So at least I'm not completely alone on this :-) Anyone else wonder about this?

Wednesday, 19 March 2008

Implicitly typed arrays in C# 3.0

Please excuse the short post and general quietness on my blog, but am typing with a newly installed, titanium screw through a tendon and fractured bone of my finger (fixing an "avulsion of the distal phalanx" apparently)*.

ReSharper 4 just found me another little C# 3 feature to save a bit of typing (handy** given my current condition ;-)). First, let's start with a basic array declaration and initialisation:

Person[] people = new Person[] {new Person("Ann"), new Person("Bill")};

Now C# has the new var keyword to get the type from the right hand side of the assignment:

var people = new Person[] {new Person("Ann"), new Person("Bill")};

But it also has implicitly typed arrays***, which means we can write this:

var people = new[] {new Person("Ann"), new Person("Bill")};

As with var, the compiler is clever enough to strongly type the array appropriately from its initialisation. All three examples are functionally equivalent.

Sure, only a minor impact on the amount of typing you do, but the main benefit is reducing the noise surrounding your code and making it easier to get the intention of the code. (I'm currently trying to learn Python, and am getting an appreciation for removing unnecessary noise -- I'm even liking the whole "significant whitespace" thang :-) )

* Basketball is not a contact sport
** Excuse the pun
*** This link gives a really good overview of the new C# 3.0 features

Wednesday, 12 March 2008

Tweaking your Blogger blog

A non-programming post today. Here is a collection of miscellaneous tweaks I have used on my Blogger blog, just in case I need to find them again. Some of them may require certain Blogger options enabled (like archiving etc). I haven't changed much from the defaults, so if something does not seem to work you might want to check your options and see if that accounts for the difference.

Warning: back up your template before making any changes to it!!! As with all my content, all tweaks here are completely without warranty, and may destroy your blog and/or the universe as we know it. You have been warned. :-)

Tracking visits

There are lots of statistics providers out there. I use Google Analytics. It is easy to setup. Once you have signed up you will be given a snippet of JavaScript code to include on all pages you want tracked on your site. For basic Blogger blogs like mine, this just means editing your template and pasting the tracking code just before the closing </body> tag.

The Analytics terms of use also require you to include an easily accessible Privacy Policy on your site. I did this by creating a new post called "Privacy Policy", and adding an HTML footer to my blog (Dashboard -> Layout -> Page Elements) to include a link to the policy.

Tracking feed readers

Sign up to Feedburner and you can track how many people subscribe to your RSS/ATOM feed. It can also do some tricks around formatting your feed. You can then re-point you Blogger feed to your new feed address: Dashboard -> Settings -> Site Feed and enter the feed address in the Post Feed Redirect URL field. Also a good idea to throw a purdy feed icon and subscribe link on one of your sidebar widgets.

Sensible search engine indexing

I found that searching my blog using Google commonly displayed results from my archive pages. This means that the post I was searching for appeared somewhere amongst my 15 other posts for that archived month.

To fix this I added some META tags to my Blogger template to give search engines some hints about my content. In the <HEAD /> of my template I added the following:

<b:if cond='data:blog.pageType == "archive"'>
  <meta content='noindex,follow' name='robots'/>
</b:if>
<b:if cond='data:blog.pageType == "index"'>
  <b:if cond='data:blog.url != data:blog.homepageUrl'>
    <meta content='noindex,follow' name='robots'/>
  </b:if>
</b:if> 

The first META tag tells robots not to include archive pages in their index, but to follow all links on the page (which go to the full post at its permanent address). The next is optional, and does a similar thing for index pages of your blog.

Since Googlebot has made a few rounds with the new META tags, I have a much easier time finding my own content, and as a side benefit have noticed a lot more visitors coming from search engines (some of whom I've even been able to help!).

Adding pages to your blog

Unlike Wordpress et al. Blogger doesn't provide additional pages for your site (well, not if you aren't using the FTP option). If I need to add pages I tend to write a new post with an appropriate title and use its permalink (see Privacy Policy or Search Results). Downside is that these go out on your feed :( (although you can work around this by changing the post's date to a long time ago, or a galaxy far, far away). Lucky for me I don't have any readers :).

Custom search results

Now that I've got my blog well indexed with my shiny new META tags, and know how to add pages to my blog, I want to add a search page that is a bit more advanced than the out-of-the-box Blogger search (which is still good, but doesn't abbreviate post content by default, and also makes finding older posts difficult).

My approach was to use a Google Customer Search Engine (CSE). I created one just for my blog, and added davesquared.blogspot.com/*/*/*.html as the only site and URL pattern for which results would be returned (this pattern will depend on your permalink format -- it works for mine!).

CSE provides a custom, Google-hosted search page for your search engine, but also provides some neat ways of embedding the search results within your site, including a nifty AJAX popup (see the Code link on your search engine's Control Panel). I found the nifty AJAX popup annoying, and ideally wanted something that should work everywhere, so instead I decided to put the search box in a widget on my sidebar, and have it post back to a Search Results page I created (see previous tweak on adding pages).

First to create the search input I added an HTML/JS Page Element to my sidebar, then added a slightly modified version of the code Google provides to host results on a non-Google page using an IFRAME. The modifications are to try and support browsers with javascript turned off, or without javascript support.

<!-- Google CSE Search Box Begins  -->
<form id="searchbox_(your_unique_search_engine_code)" action="http://www.google.com/cse" onsubmit="davesquared_submitSearchBoxWidget()">
  <input value="(your_unique_search_engine_code)" name="cx" type="hidden"/>
  <input value="" name="cof" type="hidden"/>
  <input style="font-size: 0.9em" name="q" size="20" type="text"/>
  <input style="font-size: 0.9em" value="Search" name="sa" type="submit"/>
</form>
<!-- Google CSE Search Box Ends -->
<script type="text/javascript">
function davesquared_submitSearchBoxWidget() {
  var searchBox = document.getElementById("searchbox_(your_unique_search_engine_code)");
  searchBox.action = "http://davesquared.blogspot.com/2008/01/search.html";
  searchBox.cof.value="FORID:11";
}
</script>

You need to change the (your_unique_search_engine_code) to the one provided by the CSE code sample (it looks something like 12312983712387129:asdfjkh23jh). You also need to change the searchBox.action line in the JS function to point to your own page for hosting the results. I've left mine in as a sample. Finally, you might want to change the JS function names so they don't say davesquared :-). The idea behind this is that if JS isn't enabled, a normal post request will be to the Google-hosted version of the search engine. It isn't nicely integrated with your site, but at least it works for non-JS toting users. If JS is enabled, it will update the search box to post to the results page on your own site.

Lastly, you need to add a bit of code to your search results page (http://davesquared.blogspot.com/2008/01/search.html for me). Again, you can get this code from your custom search engine's control panel.

<!-- Google Search Result Snippet Begins -->
<div id="results_(your_unique_search_engine_code)"></div>
<script type="text/javascript">
  var googleSearchIframeName = "results_(your_unique_search_engine_code)";
  var googleSearchFormName = "searchbox_(your_unique_search_engine_code)";
  var googleSearchFrameWidth = 600;
  var googleSearchFrameborder = 0;
  var googleSearchDomain = "www.google.com";
  var googleSearchPath = "/cse";
</script>
<script type="text/javascript" src="http://www.google.com/afsonline/show_afs_search.js"></script>
<!-- Google Search Result Snippet Ends -->

And that's it! Bit of work, but I find it invaluable for helping me trawl through my drivel to find some useful information. :-)

Line breaks

Basic one, but one I stuffed up when first starting out. Blogger can automatically insert <br/> breaks whenever you add a new line. This makes things easy if you never have markup in your posts. I always have markup in my posts, and commonly compose in Notepad++ and plain HTML. It can also make things difficult if you publish from a client like Windows Live Writer. You can turn off this auto-line break behaviour by Dashboard -> Settings -> Formatting and changing the Convert line breaks option. Be aware that this can stuff up your older posts a bit if you were relying on the previous behaviour.

Dotnetkicker

An ex-colleague of mine (I forced him to subscribe to my blog before leaving) submitted one of my posts to DotNetKicks. No I didn't force him that time, only found out when Analytics said I actually had some visitors :-). I did like the idea of putting a counter on the post itself so I could see if anyone voted for it, and also to make it easier on anyone that feels a post is worth submitting. You can manually put the code on each post, but it is fairly easy to add to your post template.

Go into Dashboard -> Layout -> Edit HTML, and make sure you tick Expand Widget Templates. First thing I did was to create an "includable" item (just chucked it between other <b:includable/> definitions, before <b:includable id='feedLinks' />).

<b:includable id='dotnetkicker'>
  <div>
    <a expr:href='"http://www.dotnetkicks.com/kick/?url=" + data:post.url + "&amp;title=" + data:post.title'>
     <img alt='kick it on DotNetKicks.com' border='0' expr:src='"http://www.dotnetkicks.com/Services/Images/KickItImageGenerator.ashx?url=" + data:post.url' style='border:none;'/>
    </a>
  </div>
</b:includable>

Note that this can stuff up if you have unusual characters in your data:post.title, so watch out for that.

The next step is to include the includable in the post itself. In <b:includable id='post' var='post' />, find where you want to put the kicker and include it like this: <b:include data='post' name='dotnetkicker'/>. I put mine after the data:post.body. End result was this:

...
<div class='post-body'>
  <p><data:post.body/></p>
  <b:include data='post' name='dotnetkicker'/>
  <div style='clear: both;'/> <!-- clear for photos floats -->
</div>
...

Highlight your own comments

I like on some blogs how the author's comments are display in a slightly different style, so you can quickly skim through the author's responses to feedback. This tweak might depend on what base template you use, but it should be fairly easy to apply the basic approach. First, add a style to your template (Dashboard -> Layout -> Edit HTML):

dd.comment-by-author {
  background: #F3F3F3;
}

You'll obviously need to customise the style to fit with your own blog. Next, in the <b:includable id='comments' var='post'/> code, find the loop that renders the comments (<b:loop values='data:post.comments' var='comment'>) and update it to include the following condition (changes emphasised):

<b:if cond='data:comment.author == data:post.author'>
  <dd class='comment-body comment-by-author'>
    <p><data:comment.body/></p>
  </dd>
<b:else/>
  <dd class='comment-body'>
    <b:if cond='data:comment.isDeleted'>
      <span class='deleted-comment'><data:comment.body/></span>
    <b:else/>
      <p><data:comment.body/></p>
    </b:if>
  </dd>
</b:if>

This selectively applies the comment-by-author style to comments (I couldn't find a nice way to do it using an expression (expr) on the DD class attribute).

Fixing standard comments link

When viewing stories on the front page of your blog there is generally an "x comments" link. I always expect this to go off to the full post and jump to an anchor at the comments section. Instead, by default, it takes you to Blogger's dorky comment submission form. This displays the comments in a fairly unfriendly format, and is hosted at www.blogger.com, rather than integrated into your blog's look and feel.

This is easy to fix by jumping into your template again (Dashboard -> Layout -> Edit HTML, make sure Expand Widget Templates is checked). Look for code that looks similar to the following an change the link to that shown emphasised below:

<b:if cond='data:blog.pageType != "item"'>
  <b:if cond='data:post.allowComments'>
    <a alt='View any comments left on this post' class='comment-link' 
      expr:href='data:post.url + "#comments"'>
      <b:if cond='data:post.numComments == 1'>
        1 <data:top.commentLabel/>
      <b:else/>
        <data:post.numComments/> <data:top.commentLabelPlural/>
      </b:if>
    </a>
  </b:if>
</b:if>

This makes the link go straight to the full post and jumps to the comments section. Readers can then go off to the dorky Blogger form to post a comment if they want, otherwise they can read the comments and scroll up to read the full story if they like. Down side is potentially one extra click if readers want to leave a comment, but to me this behaviour is closer to my expectations when reading posts. If the extra click worries you, you could always add a direct "Post a comment" link to the original Blogger form.

Previous posts

Here's a couple of previous posts that relate to Blogger tweaks:

Hope this helps!